
Proof of Basu's Theorem 📂Mathematical Statistics

Proof of Basu's Theorem

Theorem

If $T \left( \mathbf{X} \right)$ is a complete statistic as well as a minimal sufficient statistic, then $T \left( \mathbf{X} \right)$ is independent of every ancillary statistic.

Description

Basu’s theorem is one of the most important results related to sufficient statistics, because it yields the very strong conclusion that certain statistics are independent. Intuitively, a sufficient statistic contains all of the information about the parameter $\theta$, while an ancillary statistic carries no information about $\theta$, so the two naturally seem as if they should be independent. For that independence to actually be guaranteed, however, the statistic must also be complete, in addition to being minimal sufficient.

A classic result is that the sample mean $\overline{X}$ and the sample variance $S^{2}$ of a random sample from a normal distribution are independent. This fact can also be shown without Basu’s theorem, for example through Student’s theorem, but Basu’s theorem offers a slightly more general method of proof.
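Since this example is concrete, it can also be checked numerically. The following is a minimal simulation sketch in Python (NumPy), not part of the original argument: it draws many normal samples, computes $\overline{X}$ and $S^{2}$ for each, and checks that the two statistics look unrelated. A near-zero correlation is of course only a sanity check, not a proof of independence.

```python
import numpy as np

# Minimal simulation sketch: check that the sample mean and sample variance
# of a normal sample appear independent, as Basu's theorem guarantees.
rng = np.random.default_rng(0)

n = 10          # sample size
reps = 100_000  # number of simulated samples

samples = rng.normal(loc=3.0, scale=2.0, size=(reps, n))
xbar = samples.mean(axis=1)        # sample mean of each sample
s2 = samples.var(axis=1, ddof=1)   # unbiased sample variance of each sample

# Correlation between the two statistics should be close to 0.
print("corr(xbar, s2) =", np.corrcoef(xbar, s2)[0, 1])

# The distribution of S^2 should look the same whether xbar is small or large.
low, high = xbar < np.median(xbar), xbar >= np.median(xbar)
print("mean of S^2 given small xbar:", s2[low].mean())
print("mean of S^2 given large xbar:", s2[high].mean())
```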

Proof 1

Strategy: we prove the case of a discrete probability distribution. Suppose $S \left( \mathbf{X} \right)$ is an ancillary statistic for the parameter $\theta$. Then $P \left( S \left( \mathbf{X} \right) = s \right)$ does not depend on $\theta$, and by the definition of a sufficient statistic, the conditional probability
$$ P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) = P \left( \mathbf{X} \in \left\{ \mathbf{x} : S \left( \mathbf{x} \right) = s \right\} \mid T \left( \mathbf{X} \right) = t \right) $$
does not depend on $\theta$ either. Thus, it suffices to show the following:
$$ P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) = P \left( S \left( \mathbf{X} \right) = s \right) \qquad , \forall t $$
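To see why this suffices: if the conditional probability agrees with the marginal probability for every $t$, then the joint pmf factorizes for every $\theta$,
$$ \begin{align*} P_{\theta} \left( S \left( \mathbf{X} \right) = s , T \left( \mathbf{X} \right) = t \right) =& P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) \\ =& P \left( S \left( \mathbf{X} \right) = s \right) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) \end{align*} $$
which is exactly the independence of $S \left( \mathbf{X} \right)$ and $T \left( \mathbf{X} \right)$.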


According to the law of total probability:
$$ P \left( S \left( \mathbf{X} \right) = s \right) = \sum_{t} P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) $$
Meanwhile, multiplying both sides of $1 = \sum_{t} P_{\theta} \left( T \left( \mathbf{X} \right) = t \right)$ by $P \left( S \left( \mathbf{X} \right) = s \right)$ gives:
$$ \begin{align*} P \left( S \left( \mathbf{X} \right) = s \right) \cdot 1 =& P \left( S \left( \mathbf{X} \right) = s \right) \cdot \sum_{t} P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) \\ =& \sum_{t} P \left( S \left( \mathbf{X} \right) = s \right) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) \end{align*} $$
Defining the following statistic:
$$ g(t) := P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) - P \left( S \left( \mathbf{X} \right) = s \right) $$
then for all $\theta$:
$$ \begin{align*} E_{\theta} g (T) =& \sum_{t} g(t) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) \\ =& P \left( S \left( \mathbf{X} \right) = s \right) - P \left( S \left( \mathbf{X} \right) = s \right) \\ =& 0 \end{align*} $$
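Spelling out the second equality: substituting the definition of $g$ and using the two expansions of $P \left( S \left( \mathbf{X} \right) = s \right)$ obtained above,
$$ \begin{align*} \sum_{t} g(t) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) =& \sum_{t} P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) - \sum_{t} P \left( S \left( \mathbf{X} \right) = s \right) P_{\theta} \left( T \left( \mathbf{X} \right) = t \right) \\ =& P \left( S \left( \mathbf{X} \right) = s \right) - P \left( S \left( \mathbf{X} \right) = s \right) \end{align*} $$
where the first sum equals $P \left( S \left( \mathbf{X} \right) = s \right)$ by the law of total probability and the second sum equals $P \left( S \left( \mathbf{X} \right) = s \right)$ by the previous display. Note also that $g$ does not depend on $\theta$, since neither of its two terms does; this is what makes the appeal to completeness below legitimate.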

Definition of a complete statistic:
$$ \forall \theta, E_{\theta} g (T) = 0 \implies \forall \theta, P_{\theta} \left( g(T) = 0 \right) = 1 $$

Since $T \left( \mathbf{X} \right)$ is assumed to be a complete statistic, for all possible $t$ and all $\theta$ it follows that:
$$ P_{\theta} \left( g(T) = 0 \right) = 1 $$
In other words, the following holds:
$$ P \left( S \left( \mathbf{X} \right) = s \mid T \left( \mathbf{X} \right) = t \right) = P \left( S \left( \mathbf{X} \right) = s \right) \qquad , \forall t $$
As noted in the strategy, this means $S \left( \mathbf{X} \right)$ and $T \left( \mathbf{X} \right)$ are independent.


  1. Casella. (2001). Statistical Inference (2nd Edition): p287. ↩︎