Minimal Sufficient Statistic
Definition 1
Let $T \left( \mathbf{X} \right)$ be a sufficient statistic. If for every other sufficient statistic $T ' \left( \mathbf{X} \right)$, $T \left( \mathbf{x} \right)$ can be expressed as a function of $T ' \left( \mathbf{x} \right)$, then $T \left( \mathbf{X} \right)$ is called a Minimal Sufficient Statistic.
Theorem
Let $f \left( \mathbf{x} ; \theta \right)$ be the probability density function or probability mass function of a sample $\mathbf{X}$. Suppose a statistic $T \left( \mathbf{X} \right)$ satisfies, for every pair of realizations $\mathbf{x} , \mathbf{y}$, $$ {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} \text{ is constant as a function of } \theta \iff T (\mathbf{x}) = T (\mathbf{y}) $$ Then $T \left( \mathbf{X} \right)$ is a minimal sufficient statistic for $\theta$.
Explanation
The condition in the definition that $T$ can be expressed as a function of every other sufficient statistic $T '$ is a very mathematical way of putting things. It means that whatever other sufficient statistic $T '$ we pick, knowing $T ' \left( \mathbf{x} \right)$ already determines $T \left( \mathbf{x} \right)$, so $T$ reduces the data at least as much as $T '$ does; the fact that this must hold for ‘all’ $T '$ is what justifies the word ‘minimal’.
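For instance, in the uniform example below, both the sample maximum $X_{(n)}$ and the pair of extreme order statistics $\left( X_{(1)} , X_{(n)} \right)$ are sufficient for $\theta$, and the maximum can be recovered from the pair via the map $\lambda (a , b) := b$: $$ X_{(n)} = \lambda \left( X_{(1)} , X_{(n)} \right) $$ In this sense $X_{(n)}$ is a function of the richer statistic, whereas the pair cannot in turn be computed from $X_{(n)}$ alone.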
Example
(Continuing from the tangible example of a sufficient statistic)
$$ X_{1} , \cdots , X_{n} \sim U (0,\theta) \text{ with } f \left( x ; \theta \right) = \begin{cases} {{ 1 } \over { \theta }} & , \text{if } x \in (0,\theta) \\ 0 & , \text{otherwise} \end{cases} = {{ 1 } \over { \theta }} I_{(0,\theta)} (x) $$
Consider a random sample obtained from a uniform distribution with maximum parameter $\theta$. The sample maximum $\max_{k} X_{k} = X_{(n)}$ is a sufficient statistic for $\theta$. Intuitively there does not seem to be a better sufficient statistic, but let us verify this with the theorem above.
Product of Indicator Functions: $$ \prod_{i=1}^{n} I_{(-\infty, \theta]} \left( x_{i} \right) = I_{(-\infty, \theta]} \left( \max_{i \in [n]} x_{i} \right) $$
$$ \begin{align*} \mathbf{x} :=& \left( x_{1} , \cdots , x_{n} \right) \\ \mathbf{y} :=& \left( y_{1} , \cdots , y_{n} \right) \end{align*} $$ For two realizations $\mathbf{x} , \mathbf{y}$, the ratio of the joint probability density functions is, by the product of indicator functions above, $$ \begin{align*} {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} =& {{ \prod_{k = 1}^{n} \theta^{-1} I_{(0,\theta)} \left( x_{k} \right) } \over { \prod_{k = 1}^{n} \theta^{-1} I_{(0,\theta)} \left( y_{k} \right) }} \\ =& {{ \theta^{-n} I_{(0,\theta)} \left( \max_{k} x_{k} \right) } \over { \theta^{-n} I_{(0,\theta)} \left( \max_{k} y_{k} \right) }} \\ =& {{ I_{(0,\theta)} \left( \max_{k} x_{k} \right) } \over { I_{(0,\theta)} \left( \max_{k} y_{k} \right) }} \end{align*} $$ and this ratio takes the same value for every $\theta$ if and only if $\max_{k} x_{k} = \max_{k} y_{k}$. Therefore, $\max_{k} x_{k}$ is not merely a sufficient statistic for $\theta$; it is guaranteed to be minimal.
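The following is a minimal numerical sketch of this criterion in Python. The function name `joint_density`, the sample vectors, and the $\theta$ grid are ad hoc choices for illustration; the grid is picked so that every denominator stays positive, avoiding the $0/0$ case.

```python
import numpy as np

def joint_density(x, theta):
    """Joint density of an i.i.d. U(0, theta) sample evaluated at x."""
    x = np.asarray(x, dtype=float)
    if np.all((x > 0) & (x < theta)):
        return theta ** (-x.size)
    return 0.0

# theta values chosen so that every denominator below is strictly positive
theta_grid = [0.7, 0.8, 1.0, 1.5, 2.0]

x = [0.3, 0.6, 0.4]   # max = 0.6
y = [0.6, 0.2, 0.5]   # max = 0.6  (same maximum as x)
z = [0.2, 0.5, 0.9]   # max = 0.9  (different maximum)

ratio_same = [joint_density(x, t) / joint_density(y, t) for t in theta_grid]
ratio_diff = [joint_density(z, t) / joint_density(x, t) for t in theta_grid]

print(ratio_same)  # [1.0, 1.0, 1.0, 1.0, 1.0] -> constant in theta
print(ratio_diff)  # [0.0, 0.0, 1.0, 1.0, 1.0] -> changes with theta
```

On this grid the ratio is free of $\theta$ exactly when the two maxima coincide, which is what the theorem asks for.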
Proof
For convenience, assume that we do not need to consider cases where $f \left( \mathbf{x} ; \theta \right) = 0$. From now on, let the range of $\mathbf{X}$ be $\mathcal{X}$, and assume that $f \left( \mathbf{x} ; \theta \right) > 0$ for all $\mathbf{x} \in \mathcal{X}$ and all $\theta$.
Part 1. Sufficiency
Write $\mathcal{T} := T \left( \mathcal{X} \right)$ for the image of $\mathcal{X}$ under $T$, and for every $t \in \mathcal{T}$ define the subset $$ A_{t} := \left\{ \mathbf{x} \in \mathcal{X} : T (\mathbf{x}) = t \right\} \subset \mathcal{X} $$ so that the sets $A_{t}$ form a partition of $\mathcal{X}$. For each $t$, pick one element $\mathbf{x}_{t} \in A_{t}$, and for every $\mathbf{x} \in \mathcal{X}$ write $\mathbf{x}_{T(\mathbf{x})}$ for the representative chosen from the partition element containing $\mathbf{x}$. Since $\mathbf{x}$ and $\mathbf{x}_{T(\mathbf{x})}$ belong to the same $A_{t}$, we have $T \left( \mathbf{x} \right) = T \left( \mathbf{x}_{T(\mathbf{x})} \right)$. From the assumption, $$ {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} \text{ is constant as a function of } \theta \impliedby T (\mathbf{x}) = T (\mathbf{y}) $$ thus the function $h : \mathcal{X} \to \mathbb{R}$ defined by $$ h \left( \mathbf{x} \right) := {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{x}_{T(\mathbf{x})} ; \theta \right) }} $$ does not depend on $\theta$. Therefore, defining the function $g : \mathcal{T} \to \mathbb{R}$ by $g(t ; \theta) := f \left( \mathbf{x}_{t} ; \theta \right)$, the following holds: $$ \begin{align*} & f \left( \mathbf{x} ; \theta \right) \\ =& f \left( \mathbf{x}_{T \left( \mathbf{x} \right)} ; \theta \right) {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{x}_{T \left( \mathbf{x} \right)} ; \theta \right) }} \\ =& g \left( T \left( \mathbf{x} \right) ; \theta \right) \cdot h \left( \mathbf{x} \right) \end{align*} $$
Neyman Factorization Theorem: Let a random sample $X_{1} , \cdots , X_{n}$ have the same probability mass/density function $f \left( x ; \theta \right)$ for parameter $\theta \in \Theta$. A statistic $Y = u_{1} \left( X_{1} , \cdots , X_{n} \right)$ being a sufficient statistic for $\theta$ means that there exist two non-negative functions $k_{1} , k_{2} \ge 0$ that satisfy: $$ f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) = k_{1} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] k_{2} \left( x_{1} , \cdots , x_{n} \right) $$ Note, $k_{2}$ must not be dependent on $\theta$.
According to the factorization theorem, $T$ is a sufficient statistic for $\theta$.
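To make the construction of Part 1 concrete, here is a small sketch on a hypothetical i.i.d. Poisson$(\theta)$ sample with $T(\mathbf{x}) = \sum_{k} x_{k}$; the model, the finite slice of the sample space, and the $\theta$ values are all illustration-only assumptions, not part of the statement above. It fixes one representative $\mathbf{x}_{t}$ in each $A_{t}$, checks that $h(\mathbf{x}) = f(\mathbf{x};\theta)/f(\mathbf{x}_{T(\mathbf{x})};\theta)$ is free of $\theta$, and verifies the factorization $f = g \cdot h$.

```python
from math import exp, factorial
from itertools import product

def f(x, theta):
    """Joint pmf of an i.i.d. Poisson(theta) sample evaluated at x."""
    p = 1.0
    for xi in x:
        p *= exp(-theta) * theta ** xi / factorial(xi)
    return p

def T(x):
    """Sufficient statistic for the Poisson mean: the sample total."""
    return sum(x)

space = list(product(range(4), repeat=2))   # a finite slice of the sample space
thetas = [0.5, 1.0, 2.0]

# For each value t of T, fix one representative x_t in A_t = {x : T(x) = t}.
representative = {}
for x in space:
    representative.setdefault(T(x), x)

for x in space:
    xt = representative[T(x)]
    # h(x) = f(x; theta) / f(x_{T(x)}; theta) should not depend on theta ...
    h_values = [f(x, th) / f(xt, th) for th in thetas]
    assert max(h_values) - min(h_values) < 1e-12
    h = h_values[0]
    # ... and f(x; theta) should factor as g(T(x); theta) * h(x), g(t; theta) = f(x_t; theta).
    for th in thetas:
        assert abs(f(x, th) - f(xt, th) * h) < 1e-12

print("f(x; theta) = g(T(x); theta) * h(x) holds on the finite grid")
```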
Part 2. Minimality
Now consider any sufficient statistic $T '$ other than $T$. By the factorization theorem, there exist two functions $g '$ and $h '$ that satisfy: $$ f \left( \mathbf{x} ; \theta \right) = g' \left( T ' \left( \mathbf{x} \right) ; \theta \right) \cdot h ' \left( \mathbf{x} \right) $$ For any two realizations $\mathbf{x}$ and $\mathbf{y}$ satisfying $T ' \left( \mathbf{x} \right) = T ' \left( \mathbf{y} \right)$, $$ {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} = {{ g' \left( T ' (\mathbf{x}) ; \theta \right) h ' \left( \mathbf{x} \right) } \over { g' \left( T ' (\mathbf{y}) ; \theta \right) h ' \left( \mathbf{y} \right) }} = {{ h ' (\mathbf{x}) } \over { h ' (\mathbf{y}) }} $$ does not depend on $\theta$, hence by the assumption $$ {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} \text{ is constant as a function of } \theta \implies T (\mathbf{x}) = T (\mathbf{y}) $$ we obtain $T (\mathbf{x}) = T (\mathbf{y})$. That is, $$ T ' (\mathbf{x}) = T ' (\mathbf{y}) \implies T (\mathbf{x}) = T (\mathbf{y}) $$ so $T (\mathbf{x})$ depends on $\mathbf{x}$ only through $T ' (\mathbf{x})$, and for some function $\lambda$, $$ T (\mathbf{x}) = \lambda \left( T ' (\mathbf{x}) \right) $$ Therefore $T$ is a function of $T '$, and since $T '$ was arbitrary, $T$ is a minimal sufficient statistic.
■
Casella. (2001). Statistical Inference (2nd Edition): p280. ↩︎