
Minimal Sufficient Statistic

Definition 1

Let $T \left( \mathbf{X} \right)$ be a sufficient statistic. If for every other sufficient statistic $T' \left( \mathbf{X} \right)$, $T \left( \mathbf{x} \right)$ can be expressed as a function of $T' \left( \mathbf{x} \right)$, then $T \left( \mathbf{X} \right)$ is called a Minimal Sufficient Statistic.

Theorem

Let $f \left( \mathbf{x} ; \theta \right)$ be the probability density function or probability mass function of a sample $\mathbf{X}$. If $T$ satisfies, for every pair of realizations $\mathbf{x} , \mathbf{y}$,
$$ {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} \text{ is constant as a function of } \theta \iff T (\mathbf{x}) = T (\mathbf{y}) $$
then $T \left( \mathbf{X} \right)$ is a minimal sufficient statistic for $\theta$.

Explanation

The condition in the definition, that $T$ can be expressed through every other sufficient statistic $T'$, is a very mathematical statement. It means that any other sufficient statistic $T'$ still carries enough to recover $T$, so $T$ reduces the data at least as much as $T'$ does; the fact that this must hold for 'all' $T'$ is what justifies the word 'minimal'.

Example

(Continuing from the tangible example of a sufficient statistic)

$$ X_{1} , \cdots , X_{n} \sim U (0,\theta) \text{ with } f \left( x ; \theta \right) = \begin{cases} 1/\theta & , \text{if } x \in (0,\theta) \\ 0 & , \text{otherwise} \end{cases} = {{ 1 } \over { \theta }} I_{(0,\theta)} (x) $$

Consider a random sample obtained from a uniform distribution with maximum parameter $\theta$. The maximum of the sample, $\max_{k} X_{k} = X_{(n)}$, can serve as a sufficient statistic for $\theta$. Intuitively, there doesn't seem to be a better sufficient statistic, but let's check it against the theorem above.

Product of Indicator Functions:
$$ \prod_{i=1}^{n} I_{(-\infty, \theta]} \left( x_{i} \right) = I_{(-\infty, \theta]} \left( \max_{i \in [n]} x_{i} \right) $$

$$ \begin{align*} \mathbf{x} :=& \left( x_{1} , \cdots , x_{n} \right) \\ \mathbf{y} :=& \left( y_{1} , \cdots , y_{n} \right) \end{align*} $$
For two data points $\mathbf{x} , \mathbf{y}$, the ratio of the joint probability density functions is, by the product of indicator functions,
$$ \begin{align*} {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} =& {{ \prod_{k = 1}^{n} \theta^{-1} I_{(0,\theta)} \left( x_{k} \right) } \over { \prod_{k = 1}^{n} \theta^{-1} I_{(0,\theta)} \left( y_{k} \right) }} \\ =& {{ \theta^{-n} I_{(0,\theta)} \left( \max_{k} x_{k} \right) } \over { \theta^{-n} I_{(0,\theta)} \left( \max_{k} y_{k} \right) }} \\ =& {{ I_{(0,\theta)} \left( \max_{k} x_{k} \right) } \over { I_{(0,\theta)} \left( \max_{k} y_{k} \right) }} \end{align*} $$
and this takes the same value regardless of $\theta$ if $\max_{k} x_{k} = \max_{k} y_{k}$, and only in that case. Therefore $\max_{k} X_{k}$ is not only a sufficient statistic for $\theta$ but is guaranteed to be minimal.
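The criterion is also easy to check numerically. Below is a minimal sketch (not from the source) that evaluates the likelihood ratio of a $U(0,\theta)$ sample over a few values of $\theta$; the sample values, the grid of $\theta$'s, and the helper names `likelihood` and `ratio` are illustrative choices.

```python
import numpy as np

def likelihood(sample, theta):
    """Joint density of an i.i.d. U(0, theta) sample: theta^(-n) * 1{0 < every observation < theta}."""
    sample = np.asarray(sample, dtype=float)
    inside = np.all((sample > 0) & (sample < theta))
    return theta ** (-len(sample)) * float(inside)

def ratio(a, b, theta):
    """f(a; theta) / f(b; theta), returning inf/NaN when the denominator vanishes."""
    fa, fb = likelihood(a, theta), likelihood(b, theta)
    if fb == 0.0:
        return float("inf") if fa > 0.0 else float("nan")
    return fa / fb

x = [0.2, 0.7, 1.4]  # max = 1.4
y = [1.4, 0.3, 0.9]  # max = 1.4 -> same value of T = max as x
z = [0.5, 0.6, 1.9]  # max = 1.9 -> different value of T

for theta in [1.5, 2.0, 5.0]:
    print(f"theta={theta}: f(x)/f(y) = {ratio(x, y, theta):.3f}, f(x)/f(z) = {ratio(x, z, theta):.3f}")

# f(x)/f(y) stays 1.000 for every theta (constant in theta, since max x = max y),
# while f(x)/f(z) jumps from inf (theta = 1.5) to 1.000 (theta >= 1.9), so it depends on theta.
```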

Proof

For convenience, assume that we do not need to consider cases where $f \left( \mathbf{x} ; \theta \right) = 0$. From now on, let the range of $\mathbf{X}$ be $\mathcal{X}$, and suppose that $f \left( \mathbf{x} ; \theta \right) > 0$ holds for all $\mathbf{x} \in \mathcal{X}$ and all $\theta$.

Part 1. Sufficiency

Denote the image of $\mathcal{X}$ under $T$ by $\mathcal{T} := T \left( \mathcal{X} \right)$, and for every $t \in \mathcal{T}$ define the subset $A_{t} := \left\{ \mathbf{x} \in \mathcal{X} : T (\mathbf{x}) = t \right\} \subset \mathcal{X}$, so that the sets $A_{t}$ form a partition of $\mathcal{X}$. For each $t$, pick one element $\mathbf{x}_{t} \in A_{t}$, and write $\mathbf{x}_{T(\mathbf{x})}$ for the representative corresponding to each $\mathbf{x} \in \mathcal{X}$; since $\mathbf{x}$ and $\mathbf{x}_{T(\mathbf{x})}$ belong to the same set $A_{T(\mathbf{x})}$, we have $T \left( \mathbf{x} \right) = T \left( \mathbf{x}_{T(\mathbf{x})} \right)$. From the assumption,
$$ T (\mathbf{x}) = T (\mathbf{y}) \implies {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} \text{ is constant as a function of } \theta $$
so the function $h : \mathcal{X} \to \mathbb{R}$ defined by
$$ h \left( \mathbf{x} \right) := {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{x}_{T(\mathbf{x})} ; \theta \right) }} $$
does not depend on $\theta$. Therefore, defining the function $g : \mathcal{T} \to \mathbb{R}$ by $g(t ; \theta) := f \left( \mathbf{x}_{t} ; \theta \right)$, the following holds:
$$ \begin{align*} & f \left( \mathbf{x} ; \theta \right) \\ =& f \left( \mathbf{x}_{T \left( \mathbf{x} \right)} ; \theta \right) {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{x}_{T \left( \mathbf{x} \right)} ; \theta \right) }} \\ =& g \left( T \left( \mathbf{x} \right) ; \theta \right) \cdot h \left( \mathbf{x} \right) \end{align*} $$

Neyman Factorization Theorem: Let a random sample $X_{1} , \cdots , X_{n}$ have the same probability mass/density function $f \left( x ; \theta \right)$ for parameter $\theta \in \Theta$. A statistic $Y = u_{1} \left( X_{1} , \cdots , X_{n} \right)$ being a sufficient statistic for $\theta$ means that there exist two non-negative functions $k_{1} , k_{2} \ge 0$ that satisfy
$$ f \left( x_{1} ; \theta \right) \cdots f \left( x_{n} ; \theta \right) = k_{1} \left[ u_{1} \left( x_{1} , \cdots , x_{n} \right) ; \theta \right] k_{2} \left( x_{1} , \cdots , x_{n} \right) $$
Here, $k_{2}$ must not depend on $\theta$.

According to the factorization theorem, $T$ is a sufficient statistic for $\theta$.
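To see this construction in a concrete case (this instantiation is my own, not part of the cited proof), take the uniform example above with $T(\mathbf{x}) = \max_{k} x_{k}$: there $f \left( \mathbf{x} ; \theta \right) = \theta^{-n} I_{(0,\theta)} \left( \max_{k} x_{k} \right)$, and since any representative $\mathbf{x}_{t}$ also has maximum $t$, the two factors come out as
$$ g(t ; \theta) = \theta^{-n} I_{(0,\theta)} (t), \qquad h \left( \mathbf{x} \right) = {{ \theta^{-n} I_{(0,\theta)} \left( \max_{k} x_{k} \right) } \over { \theta^{-n} I_{(0,\theta)} \left( \max_{k} x_{k} \right) }} = 1 $$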


Part 2. Minimality

Consider a sufficient statistic $T'$ other than $T$. According to the factorization theorem, there exist two functions $g'$ and $h'$ that satisfy
$$ f \left( \mathbf{x} ; \theta \right) = g' \left( T' \left( \mathbf{x} \right) ; \theta \right) \cdot h' \left( \mathbf{x} \right) $$
For any two realizations $\mathbf{x}$ and $\mathbf{y}$ satisfying $T' \left( \mathbf{x} \right) = T' \left( \mathbf{y} \right)$, the ratio
$$ {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} = {{ g' \left( T' (\mathbf{x}) ; \theta \right) h' \left( \mathbf{x} \right) } \over { g' \left( T' (\mathbf{y}) ; \theta \right) h' \left( \mathbf{y} \right) }} = {{ h' (\mathbf{x}) } \over { h' (\mathbf{y}) }} $$
does not depend on $\theta$, hence by the assumption
$$ {{ f \left( \mathbf{x} ; \theta \right) } \over { f \left( \mathbf{y} ; \theta \right) }} \text{ is constant as a function of } \theta \implies T (\mathbf{x}) = T (\mathbf{y}) $$
we obtain $T (\mathbf{x}) = T (\mathbf{y})$. That is,
$$ T' (\mathbf{x}) = T' (\mathbf{y}) \implies T (\mathbf{x}) = T (\mathbf{y}) $$
and so, for some function $\lambda$,
$$ T (\mathbf{x}) = \lambda \left( T' (\mathbf{x}) \right) $$
Therefore $T$ is a function of $T'$, and $T$ is a minimal sufficient statistic.
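As a quick illustration of this final step (my own example, not part of the cited proof), note that in the uniform example the vector of order statistics $T' (\mathbf{X}) = \left( X_{(1)} , \cdots , X_{(n)} \right)$ is also a sufficient statistic, and the function $\lambda$ that recovers $T$ from it is simply the projection onto the last coordinate:
$$ T (\mathbf{x}) = \max_{k} x_{k} = x_{(n)} = \lambda \left( x_{(1)} , \cdots , x_{(n)} \right), \qquad \lambda \left( t_{1} , \cdots , t_{n} \right) := t_{n} $$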


  1. Casella. (2001). Statistical Inference (2nd Edition): p280. ↩︎