
Sufficient Statistics and Maximum Likelihood Estimators for the Uniform Distribution

Theorem

Let’s assume we have a random sample $\mathbf{X} := \left( X_{1} , \cdots , X_{n} \right) \sim U \left( 0 , \theta \right)$ following a uniform distribution.

The sufficient statistic $T$ and maximum likelihood estimator $\hat{\theta}$ for $\theta$ are as follows: $$ \begin{align*} T =& \max_{k=1 , \cdots , n} X_{k} \\ \hat{\theta} =& \max_{k=1 , \cdots , n} X_{k} \end{align*} $$
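
As a quick numerical sanity check (not part of the proof), here is a minimal Python sketch that simulates a sample and computes the statistic. It assumes numpy is available, and the values $\theta = 3$ and $n = 200$ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

theta = 3.0  # true parameter (arbitrary choice for illustration)
n = 200      # sample size (arbitrary)

# Random sample X_1, ..., X_n from U(0, theta)
x = rng.uniform(0.0, theta, size=n)

# Both the sufficient statistic T and the MLE are the sample maximum.
T = x.max()
theta_hat = T

print(f"true theta = {theta}, theta_hat = max X_k = {theta_hat:.4f}")
```

Since every observation lies below $\theta$, the sample maximum approaches $\theta$ from below as $n$ grows.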

Proof

Strategy: Practicality aside, the sufficient statistic and maximum likelihood estimator of the uniform distribution are must-know statistics, especially for homework, midterms, and final exams. They can be derived directly from the definitions, but doing so is not always straightforward at first.

Sufficient statistic and maximum likelihood estimator for location families: Suppose we have a random sample $X_{1} , \cdots , X_{n} \sim X$ from a location family with probability density function $f_{X} \left( x ; \theta \right) = f_{X} \left( x - \theta \right)$. Depending on the support of $X$, the sufficient statistic and maximum likelihood estimator are obtained as follows:

  • if the support of $X$ is bounded above, use $\max X_{k}$
  • if the support of $X$ is bounded below, use $\min X_{k}$

The support of $U \left( 0 , \theta \right)$ is bounded above by $\theta$, so the sufficient statistic and maximum likelihood estimator can easily be guessed from the corollary above; still, let's derive them directly for an intuitive understanding.

Sufficient Statistic

Product of indicator functions: $$ \prod_{i=1}^{n} I_{(-\infty, \theta]} \left( x_{i} \right) = I_{(-\infty, \theta]} \left( \max_{i \in [n]} x_{i} \right) $$

$$ \begin{align*} f \left( \mathbf{x} ; \theta \right) =& \prod_{k=1}^{n} f \left( x_{k} ; \theta \right) \\ =& \prod_{k=1}^{n} {{ 1 } \over { \theta }} I_{[0,\theta]} \left( x_{k} \right) \\ =& {{ 1 } \over { \theta^{n} }} I_{[0,\theta]} \left( \max x_{k} \right) \\ =& {{ 1 } \over { \theta^{n} }} I_{[0,\theta]} \left( \max x_{k} \right) \cdot 1 \end{align*} $$

Neyman Factorization Theorem: Consider a random sample $X_{1} , \cdots , X_{n}$ with the same probability mass/density function $f \left( x ; \theta \right)$ for parameter $\theta \in \Theta$. A statistic $Y = u_{1} \left( X_{1} , \cdots , X_{n} \right)$ is a sufficient statistic for $\theta$ if and only if there exist two non-negative functions $k_{1} , k_{2} \ge 0$ such that: $$ f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) = k_{1} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] k_{2} \left( x_{1} , \cdots , x_{n} \right) $$ where $k_{2}$ must not depend on $\theta$.

Taking $k_{1} \left[ \max x_{k} ; \theta \right] = \theta^{-n} I_{[0,\theta]} \left( \max x_{k} \right)$ and $k_{2} \left( x_{1} , \cdots , x_{n} \right) = 1$, which does not depend on $\theta$, the Neyman Factorization Theorem shows that $T := \max X_{k}$ is a sufficient statistic for $\theta$.
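
The factorization itself can be verified numerically. Here is a minimal sketch, again assuming numpy; the sample and the candidate values of $\theta$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = 3.0                           # arbitrary true parameter
x = rng.uniform(0.0, theta_true, size=30)  # arbitrary sample

def joint_density(x, theta):
    """Product of the individual U(0, theta) densities."""
    return np.prod(np.where((x >= 0) & (x <= theta), 1.0 / theta, 0.0))

def factored_form(x, theta):
    """k1(max x_k; theta) * k2(x), with k2 = 1 as in the factorization above."""
    k1 = (1.0 / theta ** len(x)) * float(x.max() <= theta)
    return k1 * 1.0

for theta in [2.0, 3.0, 4.0]:              # arbitrary candidate parameter values
    assert np.isclose(joint_density(x, theta), factored_form(x, theta))
```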

Maximum Likelihood Estimator

$$ L \left( \theta ; \mathbf{x} \right) = f \left( \mathbf{x} ; \theta \right) = {{ 1 } \over { \theta^{n} }} I_{[0,\theta]} \left( \max x_{k} \right) $$ The likelihood function of the random sample is as above. Since it contains an indicator function, there is no point in trying to maximize it by taking partial derivatives with respect to $\theta$.

Definition of Maximum Likelihood Estimator: An estimator $\hat{\theta} := \hat{\theta} \left( \mathbf{X} \right)$ that satisfies the following is called Maximum Likelihood Estimator, or MLE for short: $$ \hat{\theta} = \argmax L \left( \theta ; \mathbf{X} \right) $$

Based on the definition of the maximum likelihood estimator, instead of worrying about the whole likelihood function, it suffices to consider only $\hat{\theta} \ge \max X_{k}$, because $\hat{\theta} < \max X_{k}$ results in $L = 0$. For $\theta \ge \max X_{k}$ the likelihood equals $\theta^{-n}$, which is strictly decreasing in $\theta$, so the likelihood is largest at the smallest admissible value; a candidate like $\max X_{k} + 700$ only makes the likelihood smaller. Therefore the maximum likelihood estimator is $\hat{\theta} = \max X_{k}$.
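
To see this argument concretely, one can evaluate the likelihood over a grid of candidate values of $\theta$ and check that it peaks at the sample maximum. The following is a minimal sketch, again assuming numpy; the simulation settings and the grid are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
theta_true = 3.0                            # arbitrary true parameter
x = rng.uniform(0.0, theta_true, size=100)  # arbitrary sample

def likelihood(theta, x):
    """L(theta; x) = theta^(-n) * I_[0, theta](max x_k)."""
    return (1.0 / theta ** len(x)) * float(x.max() <= theta)

grid = np.linspace(0.01, 6.0, 100_000)      # grid of candidate theta values
L = np.array([likelihood(t, x) for t in grid])

print(f"max X_k     = {x.max():.5f}")
print(f"grid argmax = {grid[np.argmax(L)]:.5f}")  # agrees with max X_k up to grid resolution
```

For $\theta < \max x_{k}$ the likelihood is zero, and for $\theta \ge \max x_{k}$ it decreases like $\theta^{-n}$, so the grid argmax lands on the first grid point at or above $\max x_{k}$.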