
Complete Statistic

Definition 1

Let the set of parameters be $\Theta$. The family $\left\{ f(t ; \theta) : \theta \in \Theta \right\}$ that collects the probability density functions or probability mass functions $f(t ; \theta)$ of a statistic $T := T(\mathbf{X})$ of the sample $\mathbf{X}$ is said to be **complete** if it satisfies
$$ \forall \theta, \ E_{\theta} g(T) = 0 \implies \forall \theta, \ P_{\theta} \left( g(T) = 0 \right) = 1 $$
and $T(\mathbf{X})$ is called a **complete statistic**.
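To make the definition concrete, consider the standard Binomial example (a reconstruction of the usual textbook argument, in the spirit of the reference cited above). Let $X_{1}, \dots, X_{n}$ be iid $\text{Bernoulli}(\theta)$ with $\theta \in (0, 1)$ and $T := \sum_{i} X_{i} \sim \text{Bin}(n, \theta)$. If $E_{\theta} g(T) = 0$ for all $\theta$, then
$$ 0 = \sum_{t=0}^{n} g(t) \binom{n}{t} \theta^{t} (1 - \theta)^{n-t} = (1 - \theta)^{n} \sum_{t=0}^{n} g(t) \binom{n}{t} \left( \frac{\theta}{1 - \theta} \right)^{t} $$
The right-hand side is $(1-\theta)^{n}$ times a polynomial in $\theta / (1 - \theta)$ that vanishes for every $\theta \in (0, 1)$, so all of its coefficients must be $0$; hence $g(t) = 0$ for $t = 0, 1, \dots, n$, and $P_{\theta} \left( g(T) = 0 \right) = 1$. This family is complete, and $T$ is a complete statistic.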

Explanation

In speech, just as nobody uses the Korean term for an ancillary statistic and simply says [ancillary statistic], a complete statistic is likewise usually referred to by its English pronunciation, [complete statistic].

Note

What needs attention is that being complete refers not to a specific probability distribution but to a family of distributions; that is, we think about $\left\{ f(t;\theta) : \theta \in \Theta \right\}$ over all $\theta$ and only then call $T$ a complete statistic. To summarize more clearly (a concrete illustration follows the list):

  • What is complete is the family of distributions $\left\{ f(t;\theta) : \theta \in \Theta \right\}$.
  • The complete statistic itself is not that family of distributions but the statistic $T$.
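This distinction matters because the same statistic can be complete for one family and not for another. As a standard illustration (not from the original text), take a single observation $T = X$:

  • For the family $\left\{ \text{Uniform}(0, \theta) : \theta > 0 \right\}$, if $E_{\theta} g(X) = \frac{1}{\theta} \int_{0}^{\theta} g(t) \, dt = 0$ for all $\theta > 0$, then $\int_{0}^{\theta} g(t) \, dt = 0$ for every $\theta$, forcing $g = 0$ almost everywhere: $T$ is complete.
  • For the family $\left\{ \text{Uniform}(-\theta, \theta) : \theta > 0 \right\}$, the nonzero choice $g(t) = t$ gives $E_{\theta} g(X) = 0$ for every $\theta$, while $P_{\theta} \left( g(X) = 0 \right) = 0$: $T$ is not complete.

Same $T$, different family, opposite verdicts, because completeness belongs to $\left\{ f(t;\theta) : \theta \in \Theta \right\}$.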

However, $g$ does not appear in the two-point summary itself. The reason the definition of a complete statistic feels like grasping at clouds is that it is impossible to tell what exactly $g$ and $T$ are, and the phrase ‘for all $\theta$’ only completes the confusion.

An Intuitively Easy Statistic

Let’s approach this with simple intuition. Setting aside for a moment the definition’s move of taking the expected value of $g(T)$, consider just the situation where the two expected values $E_{\theta} g_{1}(T)$ and $E_{\theta} g_{2}(T)$ are equal. Expressed in integral form,
$$ \int_{\mathbb{R}} g_{1}(t) f(t;\theta) \, dt = E_{\theta} g_{1}(T) = E_{\theta} g_{2}(T) = \int_{\mathbb{R}} g_{2}(t) f(t;\theta) \, dt $$
[ NOTE: In actual calculations, $T$ has meaning as the integration variable $dt$, and the essence of the statistic is absorbed into $f(t;\theta)$. ]

Since $f(t;\theta)$ is a probability density function, it is not identically zero, and yet the above equation holds, which probably means $g_{1}$ and $g_{2}$ are the same. Of course, one could satisfy the equation by picking a new pair $g_{1} \ne g_{2}$ every time the form of the probability density function changes with $\theta$, but if the equality must hold for all $\theta$, there wouldn’t be many options left for $g_{1}$ and $g_{2}$. Most likely, for almost all $t$, the following must be true:
$$ g_{1}(t) = g_{2}(t) $$
In other words, $g_{1}$ and $g_{2}$ are almost surely the same function, meaning the probability that they agree, missing only a few points, is $100\%$. Put simply in terms of equations,
$$ P_{\theta} \left( g_{1}(T) = g_{2}(T) \right) = 100\% $$
Of course, this is only the story when $E_{\theta} g_{1}(T) = E_{\theta} g_{2}(T)$:
$$ \forall \theta \in \Theta, \ E_{\theta} g_{1}(T) = E_{\theta} g_{2}(T) \implies \forall \theta \in \Theta, \ P_{\theta} \left( g_{1}(T) = g_{2}(T) \right) = 100\% $$
Returning to $g$ and setting $g := g_{2} - g_{1}$, we obtain the somewhat awkward definition of a complete statistic. The reason the definition starts abruptly with $g$ is that $g_{1}$ and $g_{2}$ are not actually required to express the concept. [ NOTE: Textbooks may also explain this in relation to unbiased estimators, which again is not essential to the definition of a complete statistic. ]
$$ \forall \theta, \ E_{\theta} \left[ g_{2}(T) - g_{1}(T) \right] = 0 \implies \forall \theta, \ P_{\theta} \left( g_{2}(T) - g_{1}(T) = 0 \right) = 100\% $$
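The claim that ‘there wouldn’t be many options’ can be checked numerically. Below is a minimal sketch (the Binomial family, the value $n = 5$, and the grid of $\theta$ values are my own choices for illustration): requiring $E_{\theta} g(T) = 0$ at just $n + 1$ distinct values of $\theta$ already pins $g$ down to the zero function on the support of $T$.

```python
import numpy as np
from math import comb

n = 5  # T ~ Binomial(n, theta); support {0, 1, ..., n}

# Each theta contributes one linear constraint on g:
#   sum_t g(t) * P_theta(T = t) = E_theta g(T) = 0.
thetas = np.linspace(0.1, 0.9, n + 1)

# M[i, t] = P_{theta_i}(T = t): the Binomial pmf on the support.
M = np.array([[comb(n, t) * th ** t * (1 - th) ** (n - t)
               for t in range(n + 1)]
              for th in thetas])

rank = np.linalg.matrix_rank(M)
print("rank of M:", rank)                    # expect 6 = n + 1
print("dim of null space:", (n + 1) - rank)  # expect 0: only g = 0 works
```

Since the null space is trivial, even finitely many constraints leave no room for a nonzero $g$ here; the definition’s $\forall \theta$ demands still more.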

If $T$ is not a complete statistic, one could suspect, as suggested above, that something is seriously wrong with $f(t;\theta)$. A statistic $T$ whose probability density function $f(t;\theta)$ changes so wildly and unpredictably for some $\theta$ would be considered non-standard.
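For contrast, here is a quick Monte Carlo look at a standard non-complete family (the Uniform counterexample is textbook material; the numerical check itself is just a sketch of my own). Here the failure comes not from a wildly changing density but simply from symmetry: $g(x) = x$ integrates to zero against every member of the family.

```python
import numpy as np

rng = np.random.default_rng(42)

def g(x):
    # Nonzero function whose expectation vanishes for every theta.
    return x

for theta in [0.5, 1.0, 3.0, 10.0]:
    x = rng.uniform(-theta, theta, size=1_000_000)  # X ~ Uniform(-theta, theta)
    gx = g(x)
    print(f"theta={theta:5.1f}   E[g(X)] ~ {gx.mean():+.4f}   "
          f"P(g(X)=0) ~ {(gx == 0.0).mean():.4f}")

# E[g(X)] stays near 0 for every theta, yet P(g(X) = 0) is 0,
# so the Uniform(-theta, theta) family is not complete.
```

$E_{\theta} g(X) \approx 0$ for every $\theta$ even though $g$ is nonzero, which is exactly what the definition forbids.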

Throughout this discussion, complete statistics have not shown any particularly good properties; rather, they seem almost too obvious and commonsensical. However, if there is even one exceptional $\theta$, then $T$ becomes a statistic that is completely unreliable in any logical development that builds on these seemingly obvious, commonsensical facts.

Summary

To summarize the possibly confusing points about $g$, $T$, and $\forall \theta$ in one line each:

  • $g$: Honestly, since it is going to end up being $0$, there is no need to read anything more into it than that; just think of $g(t)$ as a function destined to be $0$.
  • $T$: It may seem to disappear from the formula at first glance, but in fact it lives on as $f(t;\theta)$ itself.
  • $\forall \theta$: Completeness is achieved in the sense that not even a single exceptional $\theta$ is allowed.

On the Universality of Completeness

It’s a bit of a complicated story. If you’re solely interested in statistics, you can skip this part. Even if you read and understand it, remember that this is not an academic consensus but the author’s personal opinion, so don’t take it too seriously.

In ordinary mathematics, unlike in mathematical statistics, completeness usually refers to the completeness defined on a metric space. Whether Cauchy sequences converge to points inside the space is a significant matter in mathematics, and in places like measure spaces, completeness can be discussed without explicitly mentioning Cauchy sequences.
$$ p: x \overset{?}{\triangle} X \implies q: x \in X $$
However, whenever completeness is discussed, regardless of the fine print of the definition or what the relation $\overset{?}{\triangle}$ stands for, the conditional’s conclusion $q$ involves something like $x \in X$. No matter what differs or how far it is abstracted, the universal form of completeness is that something of our interest belongs to the original set.
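To see the shape the author means, here is one way to line up three standard notions of completeness side by side (my own arrangement of well-known definitions, not a claim drawn from any source):
$$ \begin{array}{lll} \text{metric space:} & p: (x_{n}) \text{ is Cauchy in } X & \implies q: \lim x_{n} \in X \\ \text{measure space:} & p: N \subset A, \ \mu(A) = 0, \ A \in \mathcal{F} & \implies q: N \in \mathcal{F} \\ \text{statistics:} & p: \forall \theta, \ E_{\theta} g(T) = 0 & \implies q: \forall \theta, \ P_{\theta} \left( g(T) = 0 \right) = 1 \end{array} $$
In the first two rows, $q$ really does say that something belongs to the original structure; the statistical version does not obviously fit that shape, which is exactly the mismatch discussed below.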

For a long time, the author has asked many scholars about this and searched various documents 2 3, including Wikipedia, but has concluded that there is no precise relation between the completeness of statistics and this universal form of completeness. If anything, only in the sense of allowing no exceptional $\theta$ could one speak of the complete statistic $T$, by crafting a statement like $\theta \in \Theta$ so that the parameter set encapsulates every $\theta$ of our interest.

The only thing that might be somewhat similar in terms of formula is the completeness of the sigma field discussed in measure theory. 4 This might be covered later if there’s a chance and if I remember.


  1. Casella. (2001). Statistical Inference (2nd Edition): p285. ↩︎

  2. https://stats.stackexchange.com/questions/53107/meaning-of-completeness-of-a-Statistic ↩︎

  3. https://mathworld.wolfram.com/CompleteOrthogonalSystem.html ↩︎

  4. Capinski. (1999). Measure, Integral and Probability: p40. ↩︎