Proof of Cochran's Theorem

Theorem

Let a sample $\mathbf{X} = \left( X_{1} , \cdots , X_{n} \right)$ be iid and follow a normal distribution, $X_{1} , \cdots , X_{n} \overset{\text{iid}}{\sim} N \left( 0, \sigma^{2} \right)$. For symmetric matrices $A_{1} , \cdots , A_{k} \in \mathbb{R}^{n \times n}$ with ranks $r_{j}$, suppose the random variables $Q_{1} , \cdots , Q_{k}$ are expressed as the random vector quadratic forms $Q_{j} := \mathbf{X}^{T} A_{j} \mathbf{X}$, and that the sum of squares of the sample satisfies $\sum_{i=1}^{n} X_{i}^{2} = \sum_{j=1}^{k} Q_{j}$. Then the following holds.
$$ \forall j , {\frac{ Q_{j} }{ \sigma^{2} }} \sim \chi^{2} \left( r_{j} \right) \land \forall j_{1} \ne j_{2} , Q_{j_{1}} \perp Q_{j_{2}} \iff \sum_{j=1}^{k} r_{j} = n $$
In other words, the condition that the $Q_{j}$ are mutually independent and each $Q_{j} / \sigma^{2}$ follows the chi-square distribution $\chi^{2} \left( r_{j} \right)$ is equivalent to the sum of the ranks $r_{j}$ equaling the sample size $n$.

Explanation

This theorem provides the theoretical foundation for the analysis of variance, where the F-test is used.
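As a concrete illustration (a sketch of my own, not part of the original post): the classical decomposition $\sum_{i=1}^{n} X_{i}^{2} = n \bar{X}^{2} + \sum_{i=1}^{n} \left( X_{i} - \bar{X} \right)^{2}$ corresponds to $A_{1} = J_{n} / n$ (rank $1$) and $A_{2} = I_{n} - J_{n} / n$ (rank $n - 1$), where $J_{n}$ is the all-ones matrix. Since $1 + (n - 1) = n$, Cochran's theorem says $Q_{1} / \sigma^{2} \sim \chi^{2} (1)$, $Q_{2} / \sigma^{2} \sim \chi^{2} (n - 1)$, and $Q_{1} \perp Q_{2}$. The simulation below checks this numerically; all variable names are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, sigma, reps = 10, 2.0, 20_000

# Classical decomposition: A1 = J_n / n (rank 1), A2 = I_n - J_n / n (rank n - 1)
A1 = np.full((n, n), 1.0 / n)
A2 = np.eye(n) - A1

X = rng.normal(0.0, sigma, size=(reps, n))            # each row is one sample
Q1 = np.einsum("ri,ij,rj->r", X, A1, X) / sigma**2    # X^T A1 X / sigma^2
Q2 = np.einsum("ri,ij,rj->r", X, A2, X) / sigma**2    # X^T A2 X / sigma^2

# Marginal distributions: should be consistent with chi^2(1) and chi^2(n-1)
print(stats.kstest(Q1, "chi2", args=(1,)).pvalue)
print(stats.kstest(Q2, "chi2", args=(n - 1,)).pvalue)

# Independence (a necessary consequence): sample correlation should be near 0
print(np.corrcoef(Q1, Q2)[0, 1])
```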

Proof

$(\implies)$ Assume that the $Q_{j}$ are mutually independent and that $Q_{j} / \sigma^{2} \sim \chi^{2} \left( r_{j} \right)$ holds for every $j$.

Addition of random variables: If $X_{i} \sim \chi^{2} \left( r_{i} \right)$ are mutually independent, then
$$ \sum_{i=1}^{n} X_{i} \sim \chi^{2} \left( \sum_{i=1}^{n} r_{i} \right) $$

Since the $Q_{j} / \sigma^{2}$ are independent and each follows a chi-square distribution with $r_{j}$ degrees of freedom, their sum also follows a chi-square distribution, as below.
$$ \sum_{j=1}^{k} {\frac{ Q_{j} }{ \sigma^{2} }} \sim \chi^{2} \left( \sum_{j=1}^{k} r_{j} \right) $$
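As a quick numerical sanity check (my own sketch, not part of the proof), the sum of independent chi-square draws can be compared against $\chi^{2} \left( \sum_{j} r_{j} \right)$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
ranks = [2, 3, 5]                                   # example degrees of freedom
total = sum(ranks)

# Sum of independent chi-square variables with the given degrees of freedom
samples = sum(rng.chisquare(r, size=100_000) for r in ranks)

# Kolmogorov-Smirnov test against chi^2(sum of ranks); a large p-value is consistent
print(stats.kstest(samples, "chi2", args=(total,)).pvalue)
```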

Derivation of the chi-square distribution from the standard normal distribution: If $X \sim N \left( \mu , \sigma^{2} \right)$, then
$$ V = \left( \frac{X - \mu}{\sigma} \right)^{2} \sim \chi^{2} (1) $$

Since $X_{1} , \cdots , X_{n}$ follow a normal distribution, $X_{i}^{2} / \sigma^{2} \sim \chi^{2} (1)$ holds for each $i$, and since the $X_{i}$ are independent, their sum follows a chi-square distribution as shown below.
$$ \sum_{i=1}^{n} {\frac{ X_{i}^{2} }{ \sigma^{2} }} \sim \chi^{2} \left( n \right) $$
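A minimal numerical check of this step (again my own sketch, under the same $N(0, \sigma^{2})$ assumption as above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, sigma = 10, 2.0

# Sum of squares of n iid N(0, sigma^2) draws, scaled by sigma^2
X = rng.normal(0.0, sigma, size=(100_000, n))
S = (X**2).sum(axis=1) / sigma**2

# Should be consistent with chi^2(n)
print(stats.kstest(S, "chi2", args=(n,)).pvalue)
```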

Since $\sum_{i=1}^{n} X_{i}^{2} = \sum_{j=1}^{k} Q_{j}$ was given in the premise, the same random variable follows both $\chi^{2} (n)$ and $\chi^{2} \left( \sum_{j=1}^{k} r_{j} \right)$, so $n = \sum_{j=1}^{k} r_{j}$ must hold.


$(\impliedby)$ Assume that $\sum_{j=1}^{k} r_{j} = n$ holds.

$$ \begin{align*} \sum_{j=1}^{k} Q_{j} =& \mathbf{X}^{T} \left( A_{1} + \cdots + A_{k} \right) \mathbf{X} \\ =& \mathbf{X}^{T} \mathbf{X} \\ =& \sum_{i=1}^{n} X_{i}^{2} \end{align*} $$
Since $\sum_{i=1}^{n} X_{i}^{2} = \sum_{j=1}^{k} Q_{j}$ holds by the premise, $\mathbf{X}^{T} \left( A_{1} + \cdots + A_{k} \right) \mathbf{X} = \mathbf{X}^{T} I_{n} \mathbf{X}$ for every $\mathbf{X}$, and since the $A_{j}$ are symmetric it follows that $I_{n} = \sum_{j=1}^{k} A_{j}$. Defining the matrix $B_{j} = I_{n} - A_{j}$, $B_{j}$ equals the sum of all the matrices $A_{1} , \cdots , A_{k}$ except $A_{j}$, that is, $B_{j} = \sum_{i \ne j} A_{i}$.
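Continuing the classical two-term example from the Explanation (an illustration of my own, with $A_{1} = J_{n} / n$ and $A_{2} = I_{n} - J_{n} / n$), the identities $I_{n} = A_{1} + A_{2}$ and $B_{1} = A_{2}$ can be checked directly:

```python
import numpy as np

n = 10
A1 = np.full((n, n), 1.0 / n)   # J_n / n, rank 1
A2 = np.eye(n) - A1             # I_n - J_n / n, rank n - 1

B1 = np.eye(n) - A1             # B_j = I_n - A_j

print(np.allclose(A1 + A2, np.eye(n)))   # I_n = A_1 + A_2
print(np.allclose(B1, A2))               # B_1 equals the sum of the remaining A's
```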

Subadditivity of matrix rank: The rank of a matrix is subadditive. That is, for two matrices $A, B$, the following holds.
$$ \rank \left( A + B \right) \le \rank A + \rank B $$

Defining $R_{j_{0}}$ as the rank of $B_{j_{0}}$, the rank of the matrix sum $B_{j_{0}} = \sum_{j \ne j_{0}} A_{j}$ is less than or equal to the sum of the individual ranks, yielding the following inequality.
$$ R_{j_{0}} = \rank B_{j_{0}} = \rank \left( \sum_{j \ne j_{0}} A_{j} \right) \le \sum_{j \ne j_{0}} r_{j} = \sum_{j=1}^{k} r_{j} - r_{j_{0}} = n - r_{j_{0}} $$
On the other hand, since $I_{n} = A_{j_{0}} + B_{j_{0}}$, subadditivity also gives $n \le r_{j_{0}} + R_{j_{0}} \implies n - r_{j_{0}} \le R_{j_{0}}$, so exactly $R_{j_{0}} = n - r_{j_{0}}$ holds.

Since $B_{j_{0}}$ is symmetric with rank $n - r_{j_{0}}$, it has exactly $r_{j_{0}}$ eigenvalues equal to $0$. The eigenvalues $\lambda$ of $B_{j_{0}}$ must satisfy $\det \left( B_{j_{0}} - \lambda I \right) = 0$, which can be rewritten as follows given $B_{j_{0}} = I_{n} - A_{j_{0}}$.
$$ \det \left( I_{n} - A_{j_{0}} - \lambda I_{n} \right) = 0 \iff \det \left( A_{j_{0}} - \left( 1 - \lambda \right) I_{n} \right) = 0 $$
Thus $1 - \lambda$ is an eigenvalue of $A_{j_{0}}$ whenever $\lambda$ is an eigenvalue of $B_{j_{0}}$, so the $r_{j_{0}}$ zero eigenvalues of $B_{j_{0}}$ give $A_{j_{0}}$ exactly $r_{j_{0}}$ eigenvalues equal to $1$. Meanwhile, since $A_{j_{0}}$ is symmetric with rank $r_{j_{0}}$, it has exactly $n - r_{j_{0}}$ zero eigenvalues; these two groups account for all $n$ eigenvalues, so $A_{j_{0}}$ has exactly $r_{j_{0}}$ eigenvalues equal to $1$ and the rest are all $0$.
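For the same illustrative matrices (again a sketch of my own), numpy confirms the rank and eigenvalue claims: $B_{1}$ has rank $n - r_{1}$ with exactly $r_{1}$ zero eigenvalues, and $A_{1}$ has eigenvalues only $0$ and $1$:

```python
import numpy as np

n = 10
A1 = np.full((n, n), 1.0 / n)   # rank r_1 = 1
B1 = np.eye(n) - A1             # rank should be n - r_1 = 9

print(np.linalg.matrix_rank(B1))                      # 9
print(np.sum(np.isclose(np.linalg.eigvalsh(B1), 0)))  # exactly r_1 = 1 zero eigenvalue

# Eigenvalues of A1 are 1 minus those of B1: only 0's and 1's
eig_A1 = np.linalg.eigvalsh(A1)
print(np.allclose(np.sort(eig_A1), [0.0] * (n - 1) + [1.0]))
```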

Symmetric real matrix whose only eigenvalues are $0$ and $1$: If a symmetric matrix $A \in \mathbb{R}^{n \times n}$ has all eigenvalues equal to $0$ or $1$, then $A$ is an idempotent matrix.

Condition for the chi-square distribution of a quadratic form of a normal random vector: Let a sample $\mathbf{X} = \left( X_{1} , \cdots , X_{n} \right)$ be iid following a normal distribution, $X_{1} , \cdots , X_{n} \overset{\text{iid}}{\sim} N \left( 0, \sigma^{2} \right)$. For a symmetric matrix $A \in \mathbb{R}^{n \times n}$ with rank $r \le n$, if the random vector quadratic form is defined as $Q = \sigma^{-2} \mathbf{X}^{T} A \mathbf{X}$, then the following holds.
$$ Q \sim \chi^{2} (r) \iff A^{2} = A $$

Every one of the symmetric real matrices $A_{1} , \cdots , A_{k}$ is an idempotent matrix since its eigenvalues are only $0$ and $1$, and since $A_{j}$ has rank $r_{j}$, $Q_{j} / \sigma^{2}$ follows the chi-square distribution $\chi^{2} \left( r_{j} \right)$.
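To illustrate the idempotency criterion numerically (my own sketch), one can contrast an idempotent matrix, whose quadratic form fits $\chi^{2} \left( \rank A \right)$, with a non-idempotent symmetric matrix, whose quadratic form does not:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, sigma, reps = 10, 2.0, 50_000

A_idem = np.eye(n) - np.full((n, n), 1.0 / n)   # idempotent, rank n - 1
A_not = 2.0 * A_idem                            # symmetric but not idempotent

X = rng.normal(0.0, sigma, size=(reps, n))
Q_idem = np.einsum("ri,ij,rj->r", X, A_idem, X) / sigma**2
Q_not = np.einsum("ri,ij,rj->r", X, A_not, X) / sigma**2

print(np.allclose(A_idem @ A_idem, A_idem))                 # True
print(stats.kstest(Q_idem, "chi2", args=(n - 1,)).pvalue)   # large: fits chi^2(n-1)
print(stats.kstest(Q_not, "chi2", args=(n - 1,)).pvalue)    # tiny: does not fit
```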

Hogg-Craig theorem: Let a sample $\mathbf{X} = \left( X_{1} , \cdots , X_{n} \right)$ be iid following a normal distribution, $X_{1} , \cdots , X_{n} \overset{\text{iid}}{\sim} N \left( 0, \sigma^{2} \right)$. For symmetric matrices $A_{1} , \cdots , A_{k} \in \mathbb{R}^{n \times n}$, suppose the random variables $Q_{1} , \cdots , Q_{k}$ are expressed as the random vector quadratic forms $Q_{i} := \mathbf{X}^{T} A_{i} \mathbf{X}$, and define the symmetric matrix $A$ and the random variable $Q$ as follows.
$$ \begin{align*} A =& A_{1} + \cdots + A_{k} \\ Q =& Q_{1} + \cdots + Q_{k} \end{align*} $$
If $Q / \sigma^{2}$ follows the chi-square distribution $\chi^{2} (r)$, if $Q_{i} / \sigma^{2} \sim \chi^{2} \left( r_{i} \right)$ for $i = 1 , \cdots , k-1$, and if $Q_{k} \ge 0$, then $Q_{1} , \cdots , Q_{k}$ are mutually independent and $Q_{k} / \sigma^{2}$ follows the chi-square distribution $\chi^{2} \left( r_{k} \right)$ with $r_{k} = r - r_{1} - \cdots - r_{k-1}$ degrees of freedom.

The conditions of the Hogg-Craig theorem are met: $Q / \sigma^{2} = \sum_{i=1}^{n} X_{i}^{2} / \sigma^{2} \sim \chi^{2} (n)$, each $Q_{j} / \sigma^{2} \sim \chi^{2} \left( r_{j} \right)$ as shown above, and $Q_{k} = \mathbf{X}^{T} A_{k} \mathbf{X} \ge 0$ since the symmetric idempotent matrix $A_{k}$ is positive semidefinite. Therefore, by the Hogg-Craig theorem, $Q_{1} , \cdots , Q_{k}$ are mutually independent.
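To close with a $k = 3$ illustration of the full statement (again a sketch of my own, with illustrative block-projection matrices), take three coordinate projections whose ranks sum to $n$; the resulting quadratic forms should each fit the stated chi-square distribution and be pairwise uncorrelated, a necessary consequence of independence:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, sigma, reps = 10, 2.0, 50_000

# Three orthogonal coordinate projections with ranks 3 + 3 + 4 = n
blocks = [range(0, 3), range(3, 6), range(6, 10)]
As = []
for b in blocks:
    A = np.zeros((n, n))
    A[list(b), list(b)] = 1.0   # 1's on the diagonal entries of the block
    As.append(A)

X = rng.normal(0.0, sigma, size=(reps, n))
Qs = [np.einsum("ri,ij,rj->r", X, A, X) / sigma**2 for A in As]

for Q, b in zip(Qs, blocks):
    print(stats.kstest(Q, "chi2", args=(len(b),)).pvalue)   # each fits chi^2(rank)

# Pairwise sample correlations should all be near 0
C = np.corrcoef(Qs)
print(C[0, 1], C[0, 2], C[1, 2])
```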