Neyman Factorization Theorem Proof
Theorem
Let a random sample $X_{1} , \cdots , X_{n}$ have the common probability mass/density function $f \left( x ; \theta \right)$ for a parameter $\theta \in \Theta$. The statistic $Y_{1} = u_{1} \left( X_{1} , \cdots , X_{n} \right)$ is a sufficient statistic for $\theta$ if and only if there exist two non-negative functions $k_{1} , k_{2} \ge 0$ satisfying the following. $$ f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) = k_{1} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] k_{2} \left( x_{1} , \cdots , x_{n} \right) $$ Here, $k_{2}$ must not depend on $\theta$.
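For example, for a random sample from a Poisson distribution with mean $\theta$ (an added illustration, not part of the theorem itself), the joint probability mass function factors as
$$ f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) = \prod_{i=1}^{n} \frac{e^{-\theta} \theta^{x_{i}}}{x_{i}!} = \underbrace{e^{-n\theta} \theta^{\sum_{i} x_{i}}}_{k_{1} \left[ \sum_{i} x_{i} ; \theta \right]} \cdot \underbrace{\frac{1}{\prod_{i} x_{i}!}}_{k_{2} \left( x_{1} , \cdots , x_{n} \right)} $$
so the theorem gives that $Y_{1} = \sum_{i} X_{i}$ is a sufficient statistic for $\theta$.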
Proof
Definition of Sufficient Statistic: If, for some function $H \left( x_{1} , \cdots , x_{n} \right)$ that does not depend on $\theta \in \Theta$, $$ {{ f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) } \over { f_{Y_{1}} \left( u_{1} \left( x_{1} , \cdots, x_{n} \right) ; \theta \right) }} = H \left( x_{1} , \cdots , x_{n} \right) $$ then $Y_{1}$ is called a Sufficient Statistic for $\theta$. Here $f_{Y_{1}}$ denotes the probability mass/density function of $Y_{1} = u_{1} \left( X_{1} , \cdots , X_{n} \right)$.
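As a sketch of how this definition is checked in practice (again an added illustration, assuming $X_{1} , \cdots , X_{n}$ are independent exponential variables with rate $\theta$, so that $Y_{1} = \sum_{i} X_{i}$ follows a Gamma$\left( n , \theta \right)$ distribution),
$$ {{ f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) } \over { f_{Y_{1}} \left( \sum_{i} x_{i} ; \theta \right) }} = \frac{\theta^{n} e^{-\theta \sum_{i} x_{i}}}{\dfrac{\theta^{n} \left( \sum_{i} x_{i} \right)^{n-1} e^{-\theta \sum_{i} x_{i}}}{(n-1)!}} = \frac{(n-1)!}{\left( \sum_{i} x_{i} \right)^{n-1}} $$
which does not depend on $\theta$, so $Y_{1} = \sum_{i} X_{i}$ is sufficient for $\theta$ in the sense of this definition.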
We prove this only for continuous probability distributions. Refer to Casella for the proof in the discrete case.
$(\Rightarrow)$
By the definition of a sufficient statistic, $f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) = f_{Y_{1}} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] H \left( x_{1} , \cdots , x_{n} \right)$, so taking $k_{1} = f_{Y_{1}}$ and $k_{2} = H$ gives the factorization immediately.
$(\Leftarrow)$
In addition to $y_{1} = u_{1} \left( x_{1} , \cdots , x_{n} \right)$, introduce auxiliary functions $u_{2} , \cdots , u_{n}$ so that the following transformation is one-to-one. $$ \begin{align*} y_{1} &:= u_{1} \left( x_{1} , \cdots , x_{n} \right) \\ y_{2} &:= u_{2} \left( x_{1} , \cdots , x_{n} \right) \\ &\vdots \\ y_{n} &:= u_{n} \left( x_{1} , \cdots , x_{n} \right) \end{align*} $$
For convenience, write the inverse transformation as follows and denote its Jacobian by $J$.
$$ \begin{align*} x_{1} &:= w_{1} \left( y_{1} , \cdots , y_{n} \right) \\ x_{2} &:= w_{2} \left( y_{1} , \cdots , y_{n} \right) \\ &\vdots \\ x_{n} &:= w_{n} \left( y_{1} , \cdots , y_{n} \right) \end{align*} $$
Then, writing $w_{i} = w_{i} \left( y_{1} , \cdots , y_{n} \right)$, the joint probability density function $g$ of $Y_{1} , \cdots , Y_{n}$ is $$ g \left( y_{1} , \cdots , y_{n} ; \theta \right) = k_{1} \left( y_{1} ; \theta \right) k_{2} \left( w_{1} , \cdots , w_{n} \right) \left| J \right| $$ and the marginal probability density function $f_{Y_{1}}$ of $Y_{1}$ is $$ \begin{align*} f_{Y_{1}} \left( y_{1} ; \theta \right) &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g \left( y_{1} , \cdots , y_{n} ; \theta \right) d y_{2} \cdots d y_{n} \\ &= k_{1} \left( y_{1} ; \theta \right) \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left| J \right| k_{2} \left( w_{1} , \cdots , w_{n} \right) d y_{2} \cdots d y_{n} \end{align*} $$
Since $k_{2}$ does not depend on $\theta$ and $J$ likewise involves no $\theta$, the integral on the right-hand side can be expressed as a function of $y_{1}$ alone, which we temporarily denote by $m \left( y_{1} \right)$. $$ f_{Y_{1}} \left( y_{1} ; \theta \right) = k_{1} \left( y_{1} ; \theta \right) m \left( y_{1} \right) $$
If $m \left( y_{1} \right) = 0$, then trivially $f_{Y_{1}} \left( y_{1} ; \theta \right) = 0$. Now, assuming $m \left( y_{1} \right) > 0$, we can write $$ k_{1} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] = {{ f_{Y_{1}} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] } \over { m \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) \right] }} $$
Substituting this into the given factorization yields $$ \begin{align*} f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) &= k_{1} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] k_{2} \left( x_{1} , \cdots , x_{n} \right) \\ &= {{ f_{Y_{1}} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] } \over { m \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) \right] }} k_{2} \left( x_{1} , \cdots , x_{n} \right) \\ &= f_{Y_{1}} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] {{ k_{2} \left( x_{1} , \cdots , x_{n} \right) } \over { m \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) \right] }} \end{align*} $$
Since neither $k_{2}$ nor $m$ depends on $\theta$, $Y_{1}$ is, by definition, a sufficient statistic for $\theta$.
■
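To make the $(\Leftarrow)$ computation concrete, here is a small sketch for $n = 2$ with $f \left( x ; \theta \right) = \theta e^{-\theta x}$, $x > 0$ (an added illustration, not part of the original proof). Take $y_{1} = u_{1} \left( x_{1} , x_{2} \right) = x_{1} + x_{2}$ and $y_{2} = x_{2}$, so that $x_{1} = y_{1} - y_{2}$, $x_{2} = y_{2}$ and $\left| J \right| = 1$. The factorization holds with $k_{1} \left( y_{1} ; \theta \right) = \theta^{2} e^{-\theta y_{1}}$ and $k_{2} \left( x_{1} , x_{2} \right) = \mathbb{1} \left( x_{1} > 0 \right) \mathbb{1} \left( x_{2} > 0 \right)$, so
$$ f_{Y_{1}} \left( y_{1} ; \theta \right) = \int_{-\infty}^{\infty} k_{1} \left( y_{1} ; \theta \right) k_{2} \left( y_{1} - y_{2} , y_{2} \right) \left| J \right| d y_{2} = \theta^{2} e^{-\theta y_{1}} \int_{0}^{y_{1}} d y_{2} = \theta^{2} y_{1} e^{-\theta y_{1}} $$
Here $m \left( y_{1} \right) = y_{1}$, and $f_{Y_{1}} \left( y_{1} ; \theta \right) = k_{1} \left( y_{1} ; \theta \right) m \left( y_{1} \right)$ is exactly the Gamma$\left( 2 , \theta \right)$ density, as the proof predicts.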