
Minimal Sufficient Statistic

Definition 1

Let $T \left( \mathbf{X} \right)$ be a sufficient statistic. If for every other sufficient statistic $T' \left( \mathbf{X} \right)$, $T \left( \mathbf{x} \right)$ can be expressed as a function of $T' \left( \mathbf{x} \right)$, then $T \left( \mathbf{X} \right)$ is called a Minimal Sufficient Statistic.

Theorem

Let $f \left( \mathbf{x} ; \theta \right)$ be the probability density function or probability mass function of a sample $\mathbf{X}$. If $T$ satisfies, for every pair of realizations $\mathbf{x} , \mathbf{y}$,
$$ {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} \text{ is constant as a function of } \theta \iff T (\mathbf{x}) = T (\mathbf{y}) $$
then $T \left( \mathbf{X} \right)$ is a minimal sufficient statistic for $\theta$.

Explanation

The condition in the definition, that $T$ can be expressed through every other sufficient statistic $T'$, is a very mathematical statement. It means that any other sufficient statistic $T'$ still carries enough to recover $T$, so $T$ reduces the data at least as much as $T'$ does; the fact that this must hold for 'all' $T'$ is what justifies the word 'minimal'.

Example

(Continuing from the tangible example of a sufficient statistic)

$$ X_{1} , \cdots , X_{n} \sim U (0,\theta) \text{ with } f \left( x ; \theta \right) = \begin{cases} 1/\theta & , \text{if } x \in (0,\theta) \\ 0 & , \text{otherwise} \end{cases} = {{ 1 } \over { \theta }} I_{(0,\theta)} (x) $$

Consider a random sample obtained from a uniform distribution with maximum parameter $\theta$. The maximum of the sample, $\max_{k} X_{k} = X_{(n)}$, can serve as a sufficient statistic for $\theta$. Intuitively, there doesn't seem to be a better sufficient statistic, but let's check it against the theorem above.

Product of Indicator Functions:
$$ \prod_{i=1}^{n} I_{(-\infty, \theta]} \left( x_{i} \right) = I_{(-\infty, \theta]} \left( \max_{i \in [n]} x_{i} \right) $$

$$ \begin{align*} \mathbf{x} :=& \left( x_{1} , \cdots , x_{n} \right) \\ \mathbf{y} :=& \left( y_{1} , \cdots , y_{n} \right) \end{align*} $$
For two data points $\mathbf{x} , \mathbf{y}$, the ratio of the joint probability density functions is, by the product of indicator functions,
$$ \begin{align*} {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} =& {{ \prod_{k = 1}^{n} \theta^{-1} I_{(0,\theta)} \left( x_{k} \right) } \over { \prod_{k = 1}^{n} \theta^{-1} I_{(0,\theta)} \left( y_{k} \right) }} \\ =& {{ \theta^{-n} I_{(0,\theta)} \left( \max_{k} x_{k} \right) } \over { \theta^{-n} I_{(0,\theta)} \left( \max_{k} y_{k} \right) }} \\ =& {{ I_{(0,\theta)} \left( \max_{k} x_{k} \right) } \over { I_{(0,\theta)} \left( \max_{k} y_{k} \right) }} \end{align*} $$
and this takes the same value regardless of $\theta$ if $\max_{k} x_{k} = \max_{k} y_{k}$, and only in that case. Therefore $\max_{k} X_{k}$ is not only a sufficient statistic for $\theta$ but is guaranteed to be minimal.
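The criterion is also easy to check numerically. Below is a minimal sketch (not from the source) that evaluates the likelihood ratio of a $U(0,\theta)$ sample over a few values of $\theta$; the sample values, the grid of $\theta$'s, and the helper names `likelihood` and `ratio` are illustrative choices.

```python
import numpy as np

def likelihood(sample, theta):
    """Joint density of an i.i.d. U(0, theta) sample: theta^(-n) * 1{0 < every observation < theta}."""
    sample = np.asarray(sample, dtype=float)
    inside = np.all((sample > 0) & (sample < theta))
    return theta ** (-len(sample)) * float(inside)

def ratio(a, b, theta):
    """f(a; theta) / f(b; theta), returning inf/NaN when the denominator vanishes."""
    fa, fb = likelihood(a, theta), likelihood(b, theta)
    if fb == 0.0:
        return float("inf") if fa > 0.0 else float("nan")
    return fa / fb

x = [0.2, 0.7, 1.4]  # max = 1.4
y = [1.4, 0.3, 0.9]  # max = 1.4 -> same value of T = max as x
z = [0.5, 0.6, 1.9]  # max = 1.9 -> different value of T

for theta in [1.5, 2.0, 5.0]:
    print(f"theta={theta}: f(x)/f(y) = {ratio(x, y, theta):.3f}, f(x)/f(z) = {ratio(x, z, theta):.3f}")

# f(x)/f(y) stays 1.000 for every theta (constant in theta, since max x = max y),
# while f(x)/f(z) jumps from inf (theta = 1.5) to 1.000 (theta >= 1.9), so it depends on theta.
```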

Proof

For convenience, assume that we do not need to consider cases where $f \left( \mathbf{x} ; \theta \right) = 0$. From now on, let the range of $\mathbf{X}$ be $\mathcal{X}$, and suppose that $f \left( \mathbf{x} ; \theta \right) > 0$ holds for all $\mathbf{x} \in \mathcal{X}$ and all $\theta$.

Part 1. Sufficiency

Denote the image of $\mathcal{X}$ under $T$ by $\mathcal{T} := T \left( \mathcal{X} \right)$, and for every $t \in \mathcal{T}$ define the subset $A_{t} := \left\{ \mathbf{x} \in \mathcal{X} : T (\mathbf{x}) = t \right\} \subset \mathcal{X}$, so that the sets $A_{t}$ form a partition of $\mathcal{X}$. For each $t$, pick one element $\mathbf{x}_{t} \in A_{t}$, and write $\mathbf{x}_{T(\mathbf{x})}$ for the representative corresponding to each $\mathbf{x} \in \mathcal{X}$; since $\mathbf{x}$ and $\mathbf{x}_{T(\mathbf{x})}$ belong to the same set $A_{T(\mathbf{x})}$, we have $T \left( \mathbf{x} \right) = T \left( \mathbf{x}_{T(\mathbf{x})} \right)$. From the assumption,
$$ T (\mathbf{x}) = T (\mathbf{y}) \implies {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} \text{ is constant as a function of } \theta $$
so the function $h : \mathcal{X} \to \mathbb{R}$ defined by
$$ h \left( \mathbf{x} \right) := {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{x}_{T(\mathbf{x})} ; \theta \right) }} $$
does not depend on $\theta$. Therefore, defining the function $g : \mathcal{T} \to \mathbb{R}$ by $g(t ; \theta) := f \left( \mathbf{x}_{t} ; \theta \right)$, the following holds:
$$ \begin{align*} & f \left( \mathbf{x} ; \theta \right) \\ =& f \left( \mathbf{x}_{T \left( \mathbf{x} \right)} ; \theta \right) {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{x}_{T \left( \mathbf{x} \right)} ; \theta \right) }} \\ =& g \left( T \left( \mathbf{x} \right) ; \theta \right) \cdot h \left( \mathbf{x} \right) \end{align*} $$

Neyman Factorization Theorem: Let a random sample $X_{1} , \cdots , X_{n}$ have the same probability mass/density function $f \left( x ; \theta \right)$ for parameter $\theta \in \Theta$. A statistic $Y = u_{1} \left( X_{1} , \cdots , X_{n} \right)$ being a sufficient statistic for $\theta$ means that there exist two non-negative functions $k_{1} , k_{2} \ge 0$ that satisfy
$$ f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) = k_{1} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] k_{2} \left( x_{1} , \cdots , x_{n} \right) $$
Here, $k_{2}$ must not depend on $\theta$.

According to the factorization theorem, $T$ is a sufficient statistic for $\theta$.
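To see this construction in a concrete case (this instantiation is my own, not part of the cited proof), take the uniform example above with $T(\mathbf{x}) = \max_{k} x_{k}$: there $f \left( \mathbf{x} ; \theta \right) = \theta^{-n} I_{(0,\theta)} \left( \max_{k} x_{k} \right)$, and since any representative $\mathbf{x}_{t}$ also has maximum $t$, the two factors come out as
$$ g(t ; \theta) = \theta^{-n} I_{(0,\theta)} (t), \qquad h \left( \mathbf{x} \right) = {{ \theta^{-n} I_{(0,\theta)} \left( \max_{k} x_{k} \right) } \over { \theta^{-n} I_{(0,\theta)} \left( \max_{k} x_{k} \right) }} = 1 $$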


Part 2. Minimality

Consider a sufficient statistic $T'$ other than $T$. According to the factorization theorem, there exist two functions $g'$ and $h'$ that satisfy
$$ f \left( \mathbf{x} ; \theta \right) = g' \left( T' \left( \mathbf{x} \right) ; \theta \right) \cdot h' \left( \mathbf{x} \right) $$
For any two realizations $\mathbf{x}$ and $\mathbf{y}$ satisfying $T' \left( \mathbf{x} \right) = T' \left( \mathbf{y} \right)$, the ratio
$$ {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} = {{ g' \left( T' (\mathbf{x}) ; \theta \right) h' \left( \mathbf{x} \right) } \over { g' \left( T' (\mathbf{y}) ; \theta \right) h' \left( \mathbf{y} \right) }} = {{ h' (\mathbf{x}) } \over { h' (\mathbf{y}) }} $$
does not depend on $\theta$, hence by the assumption
$$ {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} \text{ is constant as a function of } \theta \implies T (\mathbf{x}) = T (\mathbf{y}) $$
we obtain $T (\mathbf{x}) = T (\mathbf{y})$. That is,
$$ T' (\mathbf{x}) = T' (\mathbf{y}) \implies T (\mathbf{x}) = T (\mathbf{y}) $$
and so, for some function $\lambda$,
$$ T (\mathbf{x}) = \lambda \left( T' (\mathbf{x}) \right) $$
Therefore $T$ is a function of $T'$, and $T$ is a minimal sufficient statistic.
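As a quick illustration of this final step (my own example, not part of the cited proof), note that in the uniform example the vector of order statistics $T' (\mathbf{X}) = \left( X_{(1)} , \cdots , X_{(n)} \right)$ is also a sufficient statistic, and the function $\lambda$ that recovers $T$ from it is simply the projection onto the last coordinate:
$$ T (\mathbf{x}) = \max_{k} x_{k} = x_{(n)} = \lambda \left( x_{(1)} , \cdots , x_{(n)} \right), \qquad \lambda \left( t_{1} , \cdots , t_{n} \right) := t_{n} $$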


  1. Casella. (2001). Statistical Inference (2nd Edition): p280. ↩︎