
Neyman Factorization Theorem Proof

Theorem

Suppose a random sample $X_{1} , \cdots , X_{n}$ has common probability mass/density function $f \left( x ; \theta \right)$ for a parameter $\theta \in \Theta$. The statistic $Y_{1} = u_{1} \left( X_{1} , \cdots , X_{n} \right)$ is a sufficient statistic for $\theta$ if and only if there exist two non-negative functions $k_{1} , k_{2} \ge 0$ satisfying the following. $$ f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) = k_{1} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] k_{2} \left( x_{1} , \cdots , x_{n} \right) $$ Here, $k_{2}$ must not depend on $\theta$.
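For a concrete illustration of the factorization, take a random sample from $N \left( \theta , 1 \right)$ with $Y_{1} = u_{1} \left( X_{1} , \cdots , X_{n} \right) = \sum_{i=1}^{n} X_{i}$; this particular model is only an example, and any model works as long as the two factors separate as above. From $\sum_{i} \left( x_{i} - \theta \right)^{2} = \sum_{i} \left( x_{i} - \bar{x} \right)^{2} + \left( y_{1} - n \theta \right)^{2} / n$, $$ f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) = \underbrace{ \exp \left( - {{ \left( y_{1} - n \theta \right)^{2} } \over { 2 n }} \right) }_{ k_{1} \left( y_{1} ; \theta \right) } \cdot \underbrace{ \left( 2 \pi \right)^{-n/2} \exp \left( - {{ 1 } \over { 2 }} \sum_{i=1}^{n} \left( x_{i} - \bar{x} \right)^{2} \right) }_{ k_{2} \left( x_{1} , \cdots , x_{n} \right) } $$ so $\sum_{i} X_{i}$ is a sufficient statistic for $\theta$.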

Proof

Definition of Sufficient Statistic: If there exists a function $H \left( x_{1} , \cdots , x_{n} \right)$ that does not depend on $\theta \in \Theta$ such that $$ {{ f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) } \over { f_{Y_{1}} \left( u_{1} \left( x_{1} , \cdots, x_{n} \right) ; \theta \right) }} = H \left( x_{1} , \cdots , x_{n} \right) $$ then $Y_{1}$ is called a sufficient statistic for $\theta$.
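As a quick numerical sanity check of this ratio, here is a minimal sketch assuming NumPy and SciPy are available, using the $N \left( \theta , 1 \right)$ example from above with $Y_{1} = \sum_{i} X_{i} \sim N \left( n \theta , n \right)$: the quotient of the joint density by $f_{Y_{1}}$ comes out the same no matter which $\theta$ is plugged in.

```python
import numpy as np
from scipy import stats

# Sketch (assumed model): X_1, ..., X_n ~ N(theta, 1) and Y_1 = sum(X_i) ~ N(n*theta, n).
# The ratio joint / f_{Y_1} from the definition should not depend on theta.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=5)  # one fixed sample x_1, ..., x_5
n, y1 = len(x), x.sum()

for theta in [0.0, 1.0, 2.5]:
    joint = stats.norm.pdf(x, loc=theta, scale=1.0).prod()          # f(x_1;theta) ... f(x_n;theta)
    marginal = stats.norm.pdf(y1, loc=n * theta, scale=np.sqrt(n))  # f_{Y_1}(y_1;theta)
    print(f"theta={theta}: ratio={joint / marginal:.6f}")           # same value for every theta
```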

We prove this only for continuous probability distributions. Refer to Casella for proofs on discrete probability distributions.


$(\Rightarrow)$

By the definition of a sufficient statistic, the joint density already factors as $$ f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) = f_{Y_{1}} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] H \left( x_{1} , \cdots , x_{n} \right) $$ so taking $k_{1} = f_{Y_{1}}$ and $k_{2} = H$, which does not depend on $\theta$, gives the claimed factorization.


$(\Leftarrow)$

Suppose the factorization holds, and consider the transformation $$ \begin{align*} y_{1} &:= u_{1} \left( x_{1} , \cdots , x_{n} \right) \\ y_{2} &:= u_{2} \left( x_{1} , \cdots , x_{n} \right) \\ &\vdots \\ y_{n} &:= u_{n} \left( x_{1} , \cdots , x_{n} \right) \end{align*} $$ where $u_{2} , \cdots , u_{n}$ are auxiliary functions chosen so that the transformation is one-to-one.

For convenience, write the inverse transformation as follows and let $J$ denote its Jacobian.

$$ \begin{align*} x_{1} &:= w_{1} \left( y_{1} , \cdots , y_{n} \right) \\ x_{2} &:= w_{2} \left( y_{1} , \cdots , y_{n} \right) \\ &\vdots \\ x_{n} &:= w_{n} \left( y_{1} , \cdots , y_{n} \right) \end{align*} $$

Then, for $w_{i} = w_{i} \left( y_{1} , \cdots , y_{n} \right)$, the joint probability density function $g$ of $Y_{1} , \cdots , Y_{n}$ is $$ g \left( y_{1} , \cdots , y_{n} ; \theta \right) = k_{1} \left( y_{1} ; \theta \right) k_{2} \left( w_{1} , \cdots , w_{n} \right) \left| J \right| $$ and the marginal probability density function $f_{Y_{1}}$ of $Y_{1}$ is $$ \begin{align*} f_{Y_{1}} \left( y_{1} ; \theta \right) =& \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g \left( y_{1} , \cdots , y_{n} ; \theta \right) d y_{2} \cdots d y_{n} \\ =& k_{1} \left( y_{1} ; \theta \right) \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left| J \right| k_{2} \left( w_{1} , \cdots , w_{n} \right) d y_{2} \cdots d y_{n} \end{align*} $$ Since $k_{2}$ does not depend on $\theta$ and $J$ does not involve $\theta$ either, the integral on the right is a function of $y_{1}$ alone; denote it temporarily by $m \left( y_{1} \right)$. $$ f_{Y_{1}} \left( y_{1} ; \theta \right) = k_{1} \left( y_{1} ; \theta \right) m \left( y_{1} \right) $$

If $m \left( y_{1} \right) = 0$, then trivially $f_{Y_{1}} \left( y_{1} ; \theta \right) = 0$. Assuming $m \left( y_{1} \right) > 0$, we can write $$ k_{1} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] = {{ f_{Y_{1}} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] } \over { m \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) \right] }} $$

Substituting this into the assumed factorization yields $$ \begin{align*} f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) =& k_{1} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] k_{2} \left( x_{1} , \cdots , x_{n} \right) \\ =& {{ f_{Y_{1}} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] } \over { m \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) \right] }} k_{2} \left( x_{1} , \cdots , x_{n} \right) \\ =& f_{Y_{1}} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] {{ k_{2} \left( x_{1} , \cdots , x_{n} \right) } \over { m \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) \right] }} \end{align*} $$ Since neither $k_{2}$ nor $m$ depends on $\theta$, the last factor plays the role of $H \left( x_{1} , \cdots , x_{n} \right)$ in the definition, so $Y_{1}$ is a sufficient statistic for $\theta$.
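To see this bookkeeping in a concrete case, return to the $N \left( \theta , 1 \right)$ example used above: with $k_{1} \left( y_{1} ; \theta \right) = \exp \left( - \left( y_{1} - n \theta \right)^{2} / 2 n \right)$ and $Y_{1} = \sum_{i} X_{i} \sim N \left( n \theta , n \right)$, $$ f_{Y_{1}} \left( y_{1} ; \theta \right) = {{ 1 } \over { \sqrt{2 \pi n} }} \exp \left( - {{ \left( y_{1} - n \theta \right)^{2} } \over { 2 n }} \right) $$ so $m \left( y_{1} \right) = \left( 2 \pi n \right)^{-1/2}$ is a constant free of $\theta$, and $$ {{ k_{2} \left( x_{1} , \cdots , x_{n} \right) } \over { m \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) \right] }} = \sqrt{2 \pi n} \left( 2 \pi \right)^{-n/2} \exp \left( - {{ 1 } \over { 2 }} \sum_{i=1}^{n} \left( x_{i} - \bar{x} \right)^{2} \right) $$ is exactly the $\theta$-free $H \left( x_{1} , \cdots , x_{n} \right)$ demanded by the definition.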