
Complete Statistic

Definition 1

Let the set of parameters be $\Theta$. The family $\left\{ f(t ; \theta) : \theta \in \Theta \right\}$ that collects the probability density functions or probability mass functions $f(t ; \theta)$ of a statistic $T := T(\mathbf{X})$ of the sample $\mathbf{X}$ is said to be **complete** if it satisfies
$$ \forall \theta, \ E_{\theta} g(T) = 0 \implies \forall \theta, \ P_{\theta} \left( g(T) = 0 \right) = 1 $$
and $T(\mathbf{X})$ is called a **complete statistic**.
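To make the definition concrete, consider the standard Binomial example (a reconstruction of the usual textbook argument, in the spirit of the reference cited above). Let $X_{1}, \dots, X_{n}$ be iid $\text{Bernoulli}(\theta)$ with $\theta \in (0, 1)$ and $T := \sum_{i} X_{i} \sim \text{Bin}(n, \theta)$. If $E_{\theta} g(T) = 0$ for all $\theta$, then
$$ 0 = \sum_{t=0}^{n} g(t) \binom{n}{t} \theta^{t} (1 - \theta)^{n-t} = (1 - \theta)^{n} \sum_{t=0}^{n} g(t) \binom{n}{t} \left( \frac{\theta}{1 - \theta} \right)^{t} $$
The right-hand side is $(1-\theta)^{n}$ times a polynomial in $\theta / (1 - \theta)$ that vanishes for every $\theta \in (0, 1)$, so all of its coefficients must be $0$; hence $g(t) = 0$ for $t = 0, 1, \dots, n$, and $P_{\theta} \left( g(T) = 0 \right) = 1$. This family is complete, and $T$ is a complete statistic.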

Explanation

In speech, just as nobody uses the Korean term for an ancillary statistic and simply says [ancillary statistic], a complete statistic is likewise usually referred to by its English pronunciation, [complete statistic].

Note

What needs attention is that being complete refers not to a specific probability distribution but to a family of distributions; that is, we think about $\left\{ f(t;\theta) : \theta \in \Theta \right\}$ over all $\theta$ and only then call $T$ a complete statistic. To summarize more clearly (a concrete illustration follows the list):

  • What is complete is the family of distributions $\left\{ f(t;\theta) : \theta \in \Theta \right\}$.
  • The complete statistic itself is not that family of distributions but the statistic $T$.
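This distinction matters because the same statistic can be complete for one family and not for another. As a standard illustration (not from the original text), take a single observation $T = X$:

  • For the family $\left\{ \text{Uniform}(0, \theta) : \theta > 0 \right\}$, if $E_{\theta} g(X) = \frac{1}{\theta} \int_{0}^{\theta} g(t) \, dt = 0$ for all $\theta > 0$, then $\int_{0}^{\theta} g(t) \, dt = 0$ for every $\theta$, forcing $g = 0$ almost everywhere: $T$ is complete.
  • For the family $\left\{ \text{Uniform}(-\theta, \theta) : \theta > 0 \right\}$, the nonzero choice $g(t) = t$ gives $E_{\theta} g(X) = 0$ for every $\theta$, while $P_{\theta} \left( g(X) = 0 \right) = 0$: $T$ is not complete.

Same $T$, different family, opposite verdicts, because completeness belongs to $\left\{ f(t;\theta) : \theta \in \Theta \right\}$.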

However, $g$ does not appear in the two-point summary itself. The reason the definition of a complete statistic feels like grasping at clouds is that it is impossible to tell what exactly $g$ and $T$ are, and the phrase ‘for all $\theta$’ only completes the confusion.

An Intuitively Easy Statistic

Let’s approach this with simple intuition. Setting aside for a moment the definition’s move of taking the expected value of $g(T)$, consider just the situation where the two expected values $E_{\theta} g_{1}(T)$ and $E_{\theta} g_{2}(T)$ are equal. Expressed in integral form,
$$ \int_{\mathbb{R}} g_{1}(t) f(t;\theta) \, dt = E_{\theta} g_{1}(T) = E_{\theta} g_{2}(T) = \int_{\mathbb{R}} g_{2}(t) f(t;\theta) \, dt $$
[ NOTE: In actual calculations, $T$ has meaning as the integration variable $dt$, and the essence of the statistic is absorbed into $f(t;\theta)$. ]

Since $f(t;\theta)$ is a probability density function, it is not identically zero, and yet the above equation holds, which probably means $g_{1}$ and $g_{2}$ are the same. Of course, one could satisfy the equation by picking a new pair $g_{1} \ne g_{2}$ every time the form of the probability density function changes with $\theta$, but if the equality must hold for all $\theta$, there wouldn’t be many options left for $g_{1}$ and $g_{2}$. Most likely, for almost all $t$, the following must be true:
$$ g_{1}(t) = g_{2}(t) $$
In other words, $g_{1}$ and $g_{2}$ are almost surely the same function, meaning the probability that they agree, missing only a few points, is $100\%$. Put simply in terms of equations,
$$ P_{\theta} \left( g_{1}(T) = g_{2}(T) \right) = 100\% $$
Of course, this is only the story when $E_{\theta} g_{1}(T) = E_{\theta} g_{2}(T)$:
$$ \forall \theta \in \Theta, \ E_{\theta} g_{1}(T) = E_{\theta} g_{2}(T) \implies \forall \theta \in \Theta, \ P_{\theta} \left( g_{1}(T) = g_{2}(T) \right) = 100\% $$
Returning to $g$ and setting $g := g_{2} - g_{1}$, we obtain the somewhat awkward definition of a complete statistic. The reason the definition starts abruptly with $g$ is that $g_{1}$ and $g_{2}$ are not actually required to express the concept. [ NOTE: Textbooks may also explain this in relation to unbiased estimators, which again is not essential to the definition of a complete statistic. ]
$$ \forall \theta, \ E_{\theta} \left[ g_{2}(T) - g_{1}(T) \right] = 0 \implies \forall \theta, \ P_{\theta} \left( g_{2}(T) - g_{1}(T) = 0 \right) = 100\% $$
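The claim that ‘there wouldn’t be many options’ can be checked numerically. Below is a minimal sketch (the Binomial family, the value $n = 5$, and the grid of $\theta$ values are my own choices for illustration): requiring $E_{\theta} g(T) = 0$ at just $n + 1$ distinct values of $\theta$ already pins $g$ down to the zero function on the support of $T$.

```python
import numpy as np
from math import comb

n = 5  # T ~ Binomial(n, theta); support {0, 1, ..., n}

# Each theta contributes one linear constraint on g:
#   sum_t g(t) * P_theta(T = t) = E_theta g(T) = 0.
thetas = np.linspace(0.1, 0.9, n + 1)

# M[i, t] = P_{theta_i}(T = t): the Binomial pmf on the support.
M = np.array([[comb(n, t) * th ** t * (1 - th) ** (n - t)
               for t in range(n + 1)]
              for th in thetas])

rank = np.linalg.matrix_rank(M)
print("rank of M:", rank)                    # expect 6 = n + 1
print("dim of null space:", (n + 1) - rank)  # expect 0: only g = 0 works
```

Since the null space is trivial, even finitely many constraints leave no room for a nonzero $g$ here; the definition’s $\forall \theta$ demands still more.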

If $T$ is not a complete statistic, one could suspect, as suggested above, that something is seriously wrong with $f(t;\theta)$. A statistic $T$ whose probability density function $f(t;\theta)$ changes so wildly and unpredictably for some $\theta$ would be considered non-standard.
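For contrast, here is a quick Monte Carlo look at a standard non-complete family (the Uniform counterexample is textbook material; the numerical check itself is just a sketch of my own). Here the failure comes not from a wildly changing density but simply from symmetry: $g(x) = x$ integrates to zero against every member of the family.

```python
import numpy as np

rng = np.random.default_rng(42)

def g(x):
    # Nonzero function whose expectation vanishes for every theta.
    return x

for theta in [0.5, 1.0, 3.0, 10.0]:
    x = rng.uniform(-theta, theta, size=1_000_000)  # X ~ Uniform(-theta, theta)
    gx = g(x)
    print(f"theta={theta:5.1f}   E[g(X)] ~ {gx.mean():+.4f}   "
          f"P(g(X)=0) ~ {(gx == 0.0).mean():.4f}")

# E[g(X)] stays near 0 for every theta, yet P(g(X) = 0) is 0,
# so the Uniform(-theta, theta) family is not complete.
```

$E_{\theta} g(X) \approx 0$ for every $\theta$ even though $g$ is nonzero, which is exactly what the definition forbids.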

Throughout this discussion, complete statistics have not shown any particularly good properties; rather, they seem almost too obvious and commonsensical. However, if there is even one exceptional $\theta$, then $T$ becomes a statistic that is completely unreliable in any logical development that builds on these seemingly obvious, commonsensical facts.

Summary

To summarize the possibly confusing points about $g$, $T$, and $\forall \theta$ in one line each:

  • $g$: Honestly, since it is going to end up being $0$, there is no need to read anything more into it than that; just think of $g(t)$ as a function destined to be $0$.
  • $T$: It may seem to disappear from the formula at first glance, but in fact it lives on as $f(t;\theta)$ itself.
  • $\forall \theta$: Completeness is achieved in the sense that not even a single exceptional $\theta$ is allowed.

On the Universality of Completeness

It’s a bit of a complicated story. If you’re solely interested in statistics, you can skip this part. Even if you read and understand it, remember that this is not an academic consensus but the author’s personal opinion, so don’t take it too seriously.

In ordinary mathematics, unlike in mathematical statistics, completeness usually refers to the completeness defined on a metric space. Whether Cauchy sequences converge to points inside the space is a significant matter in mathematics, and in places like measure spaces, completeness can be discussed without explicitly mentioning Cauchy sequences.
$$ p: x \overset{?}{\triangle} X \implies q: x \in X $$
However, whenever completeness is discussed, regardless of the fine print of the definition or what the relation $\overset{?}{\triangle}$ stands for, the conditional’s conclusion $q$ involves something like $x \in X$. No matter what differs or how far it is abstracted, the universal form of completeness is that something of our interest belongs to the original set.
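To see the shape the author means, here is one way to line up three standard notions of completeness side by side (my own arrangement of well-known definitions, not a claim drawn from any source):
$$ \begin{array}{lll} \text{metric space:} & p: (x_{n}) \text{ is Cauchy in } X & \implies q: \lim x_{n} \in X \\ \text{measure space:} & p: N \subset A, \ \mu(A) = 0, \ A \in \mathcal{F} & \implies q: N \in \mathcal{F} \\ \text{statistics:} & p: \forall \theta, \ E_{\theta} g(T) = 0 & \implies q: \forall \theta, \ P_{\theta} \left( g(T) = 0 \right) = 1 \end{array} $$
In the first two rows, $q$ really does say that something belongs to the original structure; the statistical version does not obviously fit that shape, which is exactly the mismatch discussed below.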

For a long time, the author has asked many scholars about this and searched various documents 2 3, including Wikipedia, but has concluded that there is no precise relation between the completeness of statistics and this universal form of completeness. If anything, only in the sense of allowing no exceptional $\theta$ could one speak of the complete statistic $T$, by crafting a statement like $\theta \in \Theta$ so that the parameter set encapsulates every $\theta$ of our interest.

The only thing that might be somewhat similar in terms of formula is the completeness of the sigma field discussed in measure theory. 4 This might be covered later if there’s a chance and if I remember.


  1. Casella. (2001). Statistical Inference (2nd Edition): p285. ↩︎

  2. https://stats.stackexchange.com/questions/53107/meaning-of-completeness-of-a-Statistic ↩︎

  3. https://mathworld.wolfram.com/CompleteOrthogonalSystem.html ↩︎

  4. Capinski. (1999). Measure, Integral and Probability: p40. ↩︎