
Proof of Basu's Theorem

Theorem

If $T \left( \mathbf{X} \right)$ is a complete statistic as well as a minimal sufficient statistic, then $T \left( \mathbf{X} \right)$ is independent of all ancillary statistics.

Description

Basu’s theorem is one of the most important results related to sufficient statistics, since it yields the very strong conclusion that certain statistics are independent. Intuitively, a sufficient statistic contains all of the information about the parameter $\theta$, while an ancillary statistic carries no information about $\theta$, so the two naturally seem as though they should be independent. This intuition alone does not suffice, however: completeness, in addition to minimality, is the property required to actually guarantee independence.

A classic result is that the sample mean $\overline{X}$ and the sample variance $S^{2}$ of a sample from a normal distribution are independent. This can also be shown without Basu’s theorem, for example through Student’s theorem, but Basu’s theorem offers a slightly more general method of proof.
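As a quick numerical sanity check of this classic result, the sketch below simulates many normal samples and estimates the correlation between $\overline{X}$ and $S^{2}$. The sample size, number of replications, and the parameters $\mu, \sigma$ are arbitrary illustrative choices, and a near-zero correlation is of course only a symptom of independence, not a proof of it.

```python
import numpy as np

# Sketch: empirically check that the sample mean and sample variance of a
# normal sample are (at least) uncorrelated.  Sample size, replication count,
# and the parameters mu, sigma are arbitrary illustrative choices.
rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0
n, reps = 10, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)          # sample mean of each replication
s2 = samples.var(axis=1, ddof=1)     # unbiased sample variance of each replication

print(np.corrcoef(xbar, s2)[0, 1])   # close to 0, consistent with independence
```

Replacing the normal draws with a skewed distribution such as the exponential makes the estimated correlation clearly positive, reflecting that this independence is special to the normal family.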

Proof 1

Strategy: We prove the case of a discrete probability distribution. If $S \left( \mathbf{X} \right)$ is an ancillary statistic for the parameter $\theta$, then $P \left( S \left( \mathbf{X} \right) = s \right)$ does not depend on $\theta$, and by the definition of a sufficient statistic, the conditional probability $$ P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) = P \left( \mathbf{X} \in \left\{ \mathbf{x} : S \left( \mathbf{x} \right) = s \right\} \mid T \left( \mathbf{X} \right) = t \right) $$ does not depend on $\theta$ either. Thus, it suffices to show the following: $$ P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) = P \left( S \left( \mathbf{X} \right) = s \right) \qquad , \forall t $$
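To spell out why this suffices: if the displayed equality holds for every $t$, then for every $\theta$ the joint distribution factorizes, $$ P_{\theta} \left( S \left( \mathbf{X} \right) = s , T \left( \mathbf{X} \right) = t \right) = P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) = P \left( S \left( \mathbf{X} \right) = s \right) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) $$ which is exactly the statement that $S \left( \mathbf{X} \right)$ and $T \left( \mathbf{X} \right)$ are independent.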


By the law of total probability: $$ P \left( S \left( \mathbf{X} \right) = s \right) = \sum_{t} P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) $$ Meanwhile, multiplying both sides of $1 = \sum_{t} P_{\theta} \left( T \left( \mathbf{X} \right) = t \right)$ by $P \left( S \left( \mathbf{X} \right) = s \right)$ gives: $$ \begin{align*} P \left( S \left( \mathbf{X} \right) = s \right) \cdot 1 =& P \left( S \left( \mathbf{X} \right) = s \right) \cdot \sum_{t} P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) \\ =& \sum_{t} P \left( S \left( \mathbf{X} \right) = s \right) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) \end{align*} $$ Now define the statistic $$ g(t) := P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) - P \left( S \left( \mathbf{X} \right) = s \right) $$ which does not depend on $\theta$. Subtracting the two displayed identities then gives, for every $\theta$: $$ \begin{align*} E_{\theta} g (T) =& \sum_{t} g(t) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) \\ =& \sum_{t} P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) - \sum_{t} P \left( S \left( \mathbf{X} \right) = s \right) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) \\ =& P \left( S \left( \mathbf{X} \right) = s \right) - P \left( S \left( \mathbf{X} \right) = s \right) \\ =& 0 \end{align*} $$
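These identities can be checked by brute-force enumeration in a small discrete model. The sketch below uses a toy location family that is purely an illustrative assumption, not from the text: $X_{i} = \theta + B_{i}$ with $B_{i} \sim \text{Bernoulli}(1/2)$ i.i.d. for $i = 1, 2$, with $T = X_{1} + X_{2}$ (sufficient) and $S = X_{1} - X_{2}$ (ancillary). It confirms $E_{\theta} g(T) = 0$ for every $\theta$; note that in this toy family $T$ happens not to be complete, and indeed $g$ is not identically zero, which is exactly why the completeness assumption below is needed to finish the proof.

```python
from fractions import Fraction
from itertools import product

# Toy discrete location family (illustrative assumption): X_i = theta + B_i,
# B_i ~ Bernoulli(1/2) i.i.d., i = 1, 2.  T = X1 + X2 is sufficient and
# S = X1 - X2 is ancillary, but T is not complete.
half = Fraction(1, 2)

def joint(theta):
    """Exact joint pmf of (X1, X2) for a given integer theta."""
    return {(theta + b1, theta + b2): half * half for b1, b2 in product((0, 1), repeat=2)}

def pmf_S(theta, s):
    return sum(p for (x1, x2), p in joint(theta).items() if x1 - x2 == s)

def pmf_T(theta, t):
    return sum(p for (x1, x2), p in joint(theta).items() if x1 + x2 == t)

def cond_S_given_T(theta, s, t):
    num = sum(p for (x1, x2), p in joint(theta).items() if x1 - x2 == s and x1 + x2 == t)
    return num / pmf_T(theta, t)

s = 0  # the event {S = 0}; P(S = 0) = 1/2 for every theta (ancillarity)
for theta in range(-2, 3):
    support_T = {x1 + x2 for (x1, x2) in joint(theta)}
    # g(t) = P(S = s | T = t) - P(S = s); does not depend on theta
    g = {t: cond_S_given_T(theta, s, t) - pmf_S(theta, s) for t in support_T}
    expectation = sum(g[t] * pmf_T(theta, t) for t in support_T)
    print(theta, g, expectation)  # E_theta g(T) = 0, yet g is not identically 0
```

Because exact fractions are used, the printed expectations are exactly $0$ rather than merely numerically small.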

Definition of a complete statistic: $$ \forall \theta, E_{\theta} g (T) = 0 \implies \forall \theta, P_{\theta} \left( g(T) = 0 \right) = 1 $$

Since $T \left( \mathbf{X} \right)$ is assumed to be a complete statistic and $E_{\theta} g (T) = 0$ for all $\theta$, it follows that for all $\theta$: $$ P_{\theta} \left( g(T) = 0 \right) = 1 $$ In other words, for every possible $t$: $$ P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) = P \left( S \left( \mathbf{X} \right) = s \right) \qquad , \forall t $$ Since $s$ was arbitrary, the factorization noted above holds for every $s$ and $t$, and therefore $S \left( \mathbf{X} \right)$ and $T \left( \mathbf{X} \right)$ are independent, which completes the proof.


  1. Casella. (2001). Statistical Inference (2nd Edition): p287. ↩︎