

Sufficient Statistics and Maximum Likelihood Estimators of a Normal Distribution

Theorem

Suppose a given random sample $\mathbf{X} := \left( X_{1} , \cdots , X_{n} \right) \sim N \left( \mu , \sigma^{2} \right)$ follows a normal distribution.

The sufficient statistic $T$ and maximum likelihood estimator $\left( \hat{\mu}, \widehat{\sigma^{2}} \right)$ for $\left( \mu, \sigma^{2} \right)$ are as follows: $$ \begin{align*} T =& \left( \sum_{k} X_{k}, \sum_{k} X_{k}^{2} \right) \\ \left( \hat{\mu}, \widehat{\sigma^{2}} \right) =& \left( {{ 1 } \over { n }} \sum_{k} X_{k}, {{ 1 } \over { n }} \sum_{k} \left( X_{k} - \overline{X} \right)^{2} \right) \end{align*} $$
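As a quick numerical illustration, here is a minimal sketch (assuming NumPy; the sample size and the true values of $\mu$ and $\sigma$ below are arbitrary, hypothetical choices) computing the sufficient statistic and the maximum likelihood estimators from a simulated sample:

```python
import numpy as np

rng = np.random.default_rng(42)
n, mu, sigma = 1000, 2.0, 1.5                  # hypothetical illustration values
x = rng.normal(loc=mu, scale=sigma, size=n)    # simulated normal sample

# Sufficient statistic T = (sum of X_k, sum of X_k^2)
T = (x.sum(), np.square(x).sum())

# Maximum likelihood estimators: sample mean and the 1/n (biased) variance
mu_hat = x.mean()
sigma2_hat = np.square(x - mu_hat).mean()      # same as np.var(x, ddof=0)

print("T =", T)
print("mu_hat =", mu_hat, "sigma2_hat =", sigma2_hat)
```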

Proof

Sufficient Statistic

$$ \begin{align*} f \left( \mathbf{x} ; \mu, \sigma^{2} \right) =& \prod_{k=1}^{n} f \left( x_{k} ; \mu, \sigma^{2} \right) \\ =& \prod_{k=1}^{n} {{ 1 } \over { \sqrt{2 \pi} \sigma }} \exp \left[ - {{ 1 } \over { 2 }} \left( {{ x_{k} - \mu } \over { \sigma }} \right)^{2} \right] \\ =& {{ 1 } \over { \sqrt{2 \pi}^{n} \sigma^{n} }} \exp \left[ - {{ 1 } \over { 2 \sigma^{2} }} \sum_{k=1}^{n} x_{k}^{2} \right] \exp \left[ {{ \mu } \over { \sigma^{2} }} \sum_{k=1}^{n} x_{k} \right] \exp \left[ - {{ 1 } \over { 2 \sigma^{2} }} n \mu^{2} \right] \\ \overset{\mu}{=}& \exp \left[ {{ \mu } \over { \sigma^{2} }} \sum_{k=1}^{n} x_{k} - {{ 1 } \over { 2 \sigma^{2} }} n \mu^{2} \right] \cdot {{ 1 } \over { \sqrt{2 \pi}^{n} \sigma^{n} }} \exp \left[ - {{ 1 } \over { 2 \sigma^{2} }} \sum_{k=1}^{n} x_{k}^{2} \right] \\ \overset{\sigma}{=}& {{ 1 } \over { \sqrt{2 \pi}^{n} \sigma^{n} }} \exp \left[ - {{ 1 } \over { 2 \sigma^{2} }} \sum_{k=1}^{n} x_{k}^{2} \right] \exp \left[ {{ \mu } \over { \sigma^{2} }} \sum_{k=1}^{n} x_{k} \right] \exp \left[ - {{ 1 } \over { 2 \sigma^{2} }} n \mu^{2} \right] \cdot 1 \end{align*} $$

Neyman Factorization Theorem: Consider a random sample $X_{1} , \cdots , X_{n}$ that has the same probability mass/density function $f \left( x ; \theta \right)$ for a parameter $\theta \in \Theta$. A statistic $Y = u_{1} \left( X_{1} , \cdots , X_{n} \right)$ is a sufficient statistic for $\theta$ if there exist two non-negative functions $k_{1} , k_{2} \ge 0$ satisfying the following: $$ f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) = k_{1} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] k_{2} \left( x_{1} , \cdots , x_{n} \right) $$ Note that $k_{2}$ must not depend on $\theta$.

In the last two lines above, the joint density is regrouped with respect to $\mu$ and $\sigma$, respectively: in each case the factor depending on the parameter involves the data only through $\sum_{k} x_{k}$ and $\sum_{k} x_{k}^{2}$, and the remaining factor does not depend on that parameter. Therefore, by the Neyman Factorization Theorem, $T := \left( \sum_{k} X_{k}, \sum_{k} X_{k}^{2} \right)$ is a sufficient statistic for $\left( \mu, \sigma^{2} \right)$.
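To see why only $T$ matters, the following sketch (again assuming NumPy; the two size-3 samples are hand-constructed, hypothetical examples) builds two different samples sharing the same $\sum_{k} x_{k}$ and $\sum_{k} x_{k}^{2}$ and checks that their joint densities agree for every choice of $(\mu, \sigma)$:

```python
import numpy as np

def joint_pdf(x, mu, sigma):
    """Joint normal density of an i.i.d. sample x at parameters (mu, sigma)."""
    return np.prod(np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma))

# First sample: sum = 3, sum of squares = 5
x1 = np.array([0.0, 1.0, 2.0])

# Second sample: fix 1.5, then choose b, c with b + c = 1.5 and b^2 + c^2 = 2.75,
# i.e. the roots of t^2 - 1.5 t - 0.25 = 0, so the totals match those of x1
d = np.sqrt(1.5 ** 2 + 4 * 0.25)
x2 = np.array([1.5, (1.5 + d) / 2, (1.5 - d) / 2])

print(x1.sum(), np.square(x1).sum())   # 3.0 5.0
print(x2.sum(), np.square(x2).sum())   # 3.0 5.0

# Equal T implies equal likelihood at every parameter value
for mu in (-1.0, 0.0, 2.5):
    for sigma in (0.5, 1.0, 3.0):
        assert np.isclose(joint_pdf(x1, mu, sigma), joint_pdf(x2, mu, sigma))
```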

Maximum Likelihood Estimator

$$ \begin{align*} \log L \left( \mu, \sigma^{2} ; \mathbf{x} \right) =& \log f \left( \mathbf{x} ; \mu, \sigma^{2} \right) \\ =& - n \log \left( \sigma \sqrt{2 \pi} \right) - {{ 1 } \over { 2 \sigma^{2} }} \sum_{k=1}^{n} x_{k}^{2} + {{ \mu } \over { \sigma^{2} }} \sum_{k=1}^{n} x_{k} - {{ 1 } \over { 2 \sigma^{2} }} n \mu^{2} \end{align*} $$

The log-likelihood function of the random sample is as shown above, and at a maximum of the likelihood the partial derivatives with respect to $\mu$ and $\sigma$ must be $0$. First, setting the partial derivative with respect to $\mu$ to $0$, $$ \begin{align*} & 0 = {{ 1 } \over { \sigma^{2} }} \sum_{k=1}^{n} x_{k} - {{ 1 } \over { \sigma^{2} }} n \mu \\ \implies & \mu = {{ 1 } \over { n }} \sum_{k=1}^{n} x_{k} \end{align*} $$

thus, regardless of $\sigma$, $\hat{\mu} = \sum_{k=1}^{n} X_{k} / n$ holds. Next, setting the partial derivative with respect to $\sigma$ to $0$, $$ \begin{align*} & 0 = - {{ n } \over { \sigma }} + {{ 1 } \over { \sigma^{3} }} \sum_{k=1}^{n} x_{k}^{2} - {{ 2 \mu } \over { \sigma^{3} }} \sum_{k=1}^{n} x_{k} + {{ 1 } \over { \sigma^{3} }} n \mu^{2} \\ \implies & n \sigma^{2} = \sum_{k=1}^{n} x_{k}^{2} - 2 \mu \sum_{k=1}^{n} x_{k} + n \mu^{2} \\ \implies & \sigma^{2} = {{ 1 } \over { n }} \sum_{k=1}^{n} \left( x_{k} - \mu \right)^{2} \end{align*} $$

thus, substituting $\hat{\mu} = \overline{X} = \sum_{k=1}^{n} X_{k} / n$, we obtain $\widehat{\sigma^{2}} = \sum_{k=1}^{n} \left( X_{k} - \overline{X} \right)^{2} / n$.
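As a final sanity check, the sketch below (assuming NumPy and SciPy; the simulated sample is again a hypothetical choice) numerically maximizes the log-likelihood above and compares the result with the closed-form estimators:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=-1.0, scale=2.0, size=500)   # hypothetical simulated sample
n = x.size

def neg_log_likelihood(theta):
    mu, sigma = theta
    # -log L = n log(sigma sqrt(2 pi)) + (1 / (2 sigma^2)) * sum (x_k - mu)^2,
    # the negative of the expanded log-likelihood derived above
    return n * np.log(sigma * np.sqrt(2 * np.pi)) + np.sum((x - mu) ** 2) / (2 * sigma ** 2)

# Numerical maximization (sigma kept positive via a lower bound)
res = minimize(neg_log_likelihood, x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])
mu_num, sigma_num = res.x

# Closed-form maximum likelihood estimators
mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat) ** 2)

print(mu_num, mu_hat)               # numerical vs. closed-form mean
print(sigma_num ** 2, sigma2_hat)   # numerical vs. closed-form variance
```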