Minimal Sufficient Statistic
Definition
Let $T(X)$ be a sufficient statistic. If, for every other sufficient statistic $T'(X)$, $T(\mathbf{x})$ can be expressed as a function of $T'(\mathbf{x})$, then $T(X)$ is called a Minimal Sufficient Statistic.
Theorem
Let $f(\mathbf{x};\theta)$ be the probability density function or probability mass function of a sample $X$. Suppose that for every pair of realizations $\mathbf{x}, \mathbf{y}$, the statistic $T$ satisfies
$$ \frac{f(\mathbf{x};\theta)}{f(\mathbf{y};\theta)} \text{ is constant as a function of } \theta \iff T(\mathbf{x}) = T(\mathbf{y}). $$
Then $T(X)$ is a minimal sufficient statistic for θ.
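To see how this criterion is applied, here is a quick check for a Bernoulli sample, added as an illustration (the uniform-distribution example below is the article's own). For $X_1, \cdots, X_n \sim \text{Bernoulli}(p)$,
$$ \frac{f(\mathbf{x};p)}{f(\mathbf{y};p)} = \frac{p^{\sum_k x_k} (1-p)^{n - \sum_k x_k}}{p^{\sum_k y_k} (1-p)^{n - \sum_k y_k}} = \left( \frac{p}{1-p} \right)^{\sum_k x_k - \sum_k y_k} $$
is constant as a function of $p$ if and only if $\sum_k x_k = \sum_k y_k$, so $T(X) = \sum_k X_k$ is a minimal sufficient statistic for $p$.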
Explanation
The condition in the definition, that T can be expressed as a function of every other sufficient statistic T′, is a rather abstract statement. Intuitively, if T is a function of T′, then T summarizes the data at least as coarsely as T′ does; a sufficient statistic T′ from which T cannot be recovered would, in that sense, be lacking compared to T. The fact that the condition must hold for 'all' T′ is what justifies the word 'minimal'.
Example
(Continuing the concrete example from the article on sufficient statistics)
$$ X_1, \cdots, X_n \sim U(0,\theta) \quad \text{with} \quad f(x;\theta) = \begin{cases} 1/\theta & \text{if } x \in (0,\theta) \\ 0 & \text{otherwise} \end{cases} = \frac{1}{\theta} I_{(0,\theta)}(x) $$
Consider a random sample drawn from a uniform distribution with upper bound θ. The maximum of the sample, $\max_k X_k = X_{(n)}$, is a sufficient statistic for θ. Intuitively there seems to be no better sufficient statistic, but let us verify this using the theorem above.
Product of Indicator Functions:
$$ \prod_{i=1}^{n} I_{(-\infty,\theta]}(x_i) = I_{(-\infty,\theta]}\left( \max_{i \in [n]} x_i \right) $$
$$ \mathbf{x} := (x_1, \cdots, x_n), \qquad \mathbf{y} := (y_1, \cdots, y_n) $$
For two data points $\mathbf{x}, \mathbf{y}$, by the product of indicator functions above, the ratio of the joint probability density functions is
$$ \frac{f(\mathbf{x};\theta)}{f(\mathbf{y};\theta)} = \frac{\prod_{k=1}^{n} \theta^{-1} I_{(0,\theta)}(x_k)}{\prod_{k=1}^{n} \theta^{-1} I_{(0,\theta)}(y_k)} = \frac{\theta^{-n} I_{(0,\theta)}\left( \max_k x_k \right)}{\theta^{-n} I_{(0,\theta)}\left( \max_k y_k \right)} = \frac{I_{(0,\theta)}\left( \max_k x_k \right)}{I_{(0,\theta)}\left( \max_k y_k \right)}, $$
and this takes the same value regardless of θ if $\max_k x_k = \max_k y_k$, and only in that case. Therefore $\max_k X_k$ is not merely a sufficient statistic for θ; it is guaranteed to be minimal.
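This equivalence can also be checked numerically. The following Python sketch, added here as a sanity check (the function names are illustrative, not from any library), evaluates the likelihood ratio on a grid of θ values for a pair of samples with equal maxima and a pair with different maxima:

```python
import numpy as np

def joint_pdf(x, theta):
    """Joint U(0, theta) density: theta^(-n) if every x_i lies in (0, theta), else 0."""
    x = np.asarray(x, dtype=float)
    return theta ** (-x.size) * float(np.all((x > 0) & (x < theta)))

def ratio(x, y, theta):
    """Likelihood ratio f(x; theta) / f(y; theta); inf where the denominator vanishes."""
    num, den = joint_pdf(x, theta), joint_pdf(y, theta)
    return num / den if den > 0 else float("inf")

thetas = [2.2, 2.6, 3.5, 10.0]
x = [0.3, 0.7, 2.0]  # max = 2.0
y = [1.1, 0.2, 2.0]  # max = 2.0 (same as x)
z = [0.3, 0.7, 3.0]  # max = 3.0 (different)

# Equal maxima: the ratio is 1 for every theta > 2.0, i.e. constant in theta.
print([ratio(x, y, t) for t in thetas])  # [1.0, 1.0, 1.0, 1.0]

# Different maxima: for 2.0 < theta < 3.0 the denominator vanishes,
# so the ratio depends on theta.
print([ratio(x, z, t) for t in thetas])  # [inf, inf, 1.0, 1.0]
```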
Proof
For convenience, assume that we need not consider points where $f(\mathbf{x};\theta) = 0$: from now on, let $\mathcal{X}$ denote the support of $X$, so that $f(\mathbf{x};\theta) > 0$ holds for all $\mathbf{x} \in \mathcal{X}$ and all θ.
Part 1. Sufficiency
Write the image of $\mathcal{X}$ under $T$ as $\mathcal{T} := T(\mathcal{X})$, and for each $t \in \mathcal{T}$ define the subset
$$ A_t := \{ \mathbf{x} \in \mathcal{X} : T(\mathbf{x}) = t \} \subset \mathcal{X}, $$
so that the sets $A_t$ form a partition of $\mathcal{X}$. For each $t$, pick one element $\mathbf{x}_t \in A_t$, and for each $\mathbf{x} \in \mathcal{X}$ write $\mathbf{x}_{T(\mathbf{x})}$ for the representative chosen from the class containing $\mathbf{x}$. Then $\mathbf{x}$ and $\mathbf{x}_{T(\mathbf{x})}$ both lie in $A_{T(\mathbf{x})}$, so $T(\mathbf{x}) = T(\mathbf{x}_{T(\mathbf{x})})$. From the assumption,
$$ \frac{f(\mathbf{x};\theta)}{f(\mathbf{y};\theta)} \text{ is constant as a function of } \theta \impliedby T(\mathbf{x}) = T(\mathbf{y}), $$
thus,
$$ h(\mathbf{x}) := \frac{f(\mathbf{x};\theta)}{f(\mathbf{x}_{T(\mathbf{x})};\theta)} $$
the function $h : \mathcal{X} \to \mathbb{R}$ defined above does not depend on θ, since $T(\mathbf{x}) = T(\mathbf{x}_{T(\mathbf{x})})$. Therefore, defining the function $g : \mathcal{T} \to \mathbb{R}$ as $g(t;\theta) := f(\mathbf{x}_t;\theta)$, the following holds:
$$ \begin{aligned} f(\mathbf{x};\theta) &= f(\mathbf{x}_{T(\mathbf{x})};\theta) \cdot \frac{f(\mathbf{x};\theta)}{f(\mathbf{x}_{T(\mathbf{x})};\theta)} \\ &= g(T(\mathbf{x});\theta) \cdot h(\mathbf{x}) \end{aligned} $$
Neyman Factorization Theorem: Let a random sample $X_1, \cdots, X_n$ have the same probability mass/density function $f(x;\theta)$ for a parameter $\theta \in \Theta$. A statistic $Y = u_1(X_1, \cdots, X_n)$ is a sufficient statistic for θ if and only if there exist two non-negative functions $k_1, k_2 \ge 0$ satisfying
$$ f(x_1;\theta) \cdots f(x_n;\theta) = k_1 \left[ u_1(x_1, \cdots, x_n); \theta \right] \, k_2(x_1, \cdots, x_n). $$
Note that $k_2$ must not depend on θ.
According to the factorization theorem, T is a sufficient statistic for θ.
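To make this concrete, in the uniform example above the factorization can be written out explicitly; this identification of $k_1$ and $k_2$ is spelled out here for illustration:
$$ \prod_{k=1}^{n} \frac{1}{\theta} I_{(0,\theta)}(x_k) = \underbrace{\theta^{-n} \, I_{(0,\theta)}\!\left( \max_k x_k \right)}_{k_1 \left[ \max_k x_k ; \, \theta \right]} \cdot \underbrace{I_{(0,\infty)}\!\left( \min_k x_k \right)}_{k_2(\mathbf{x})}, $$
where $k_2$ does not depend on θ, confirming once more that $\max_k X_k$ is sufficient.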
Part 2. Minimality
Now consider a sufficient statistic $T'$ other than $T$. By the factorization theorem, there exist two functions $g'$ and $h'$ satisfying
$$ f(\mathbf{x};\theta) = g'(T'(\mathbf{x});\theta) \cdot h'(\mathbf{x}). $$
For any two realizations $\mathbf{x}$ and $\mathbf{y}$ satisfying $T'(\mathbf{x}) = T'(\mathbf{y})$, the ratio
$$ \frac{f(\mathbf{x};\theta)}{f(\mathbf{y};\theta)} = \frac{g'(T'(\mathbf{x});\theta) \, h'(\mathbf{x})}{g'(T'(\mathbf{y});\theta) \, h'(\mathbf{y})} = \frac{h'(\mathbf{x})}{h'(\mathbf{y})} $$
does not depend on θ. Hence, by the assumption
$$ \frac{f(\mathbf{x};\theta)}{f(\mathbf{y};\theta)} \text{ is constant as a function of } \theta \implies T(\mathbf{x}) = T(\mathbf{y}), $$
we obtain $T(\mathbf{x}) = T(\mathbf{y})$. That is,
$$ T'(\mathbf{x}) = T'(\mathbf{y}) \implies T(\mathbf{x}) = T(\mathbf{y}), $$
and so $T(\mathbf{x})$ depends on $\mathbf{x}$ only through $T'(\mathbf{x})$: setting $\lambda(t') := T(\mathbf{x})$ for any $\mathbf{x}$ with $T'(\mathbf{x}) = t'$ is well defined by the implication above, and for this function $\lambda$,
$$ T(\mathbf{x}) = \lambda(T'(\mathbf{x})). $$
Therefore $T$ is a function of $T'$; since $T'$ was an arbitrary sufficient statistic, $T$ is a minimal sufficient statistic.
■