Proof of Craig's Theorem
Theorem
Let the sample $\mathbf{X} = \left( X_{1} , \cdots , X_{n} \right)$ consist of iid normal variables $X_{1} , \cdots , X_{n} \overset{\text{iid}}{\sim} N \left( 0, \sigma^{2} \right)$. For symmetric matrices $A, B \in \mathbb{R}^{n \times n}$, define the quadratic forms in the random vector $\mathbf{X}$ as $Q_{1} := \sigma^{-2} \mathbf{X}^{T} A \mathbf{X}$ and $Q_{2} := \sigma^{-2} \mathbf{X}^{T} B \mathbf{X}$. Then the following holds.
$$ Q_{1} \perp Q_{2} \iff A B = O_{n} $$
In other words, $Q_{1}$ and $Q_{2}$ being independent is equivalent to $A B = O_{n}$.
- $O_{n}$ is a zero matrix of size $n \times n$.
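Although not part of the proof, the statement can be sanity-checked numerically. The following is a minimal Monte Carlo sketch (assuming NumPy; the matrices $A$, $B$ and all names are illustrative choices, not from the text): it builds symmetric matrices with $A B = O_{n}$ and confirms that the sample correlations between $Q_{1}$, $Q_{2}$ and between their squares are near zero, as independence requires. Zero correlation is of course only a necessary condition for independence, so this is an illustration rather than a verification.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, reps = 4, 2.0, 200_000

# Symmetric A, B with AB = O: put their non-zero eigenvalues on
# disjoint eigenvectors of a common orthogonal matrix Gamma.
Gamma, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Gamma.T @ np.diag([3.0, 1.0, 0.0, 0.0]) @ Gamma
B = Gamma.T @ np.diag([0.0, 0.0, 2.0, 5.0]) @ Gamma
assert np.allclose(A @ B, 0.0)

X = sigma * rng.standard_normal((reps, n))          # each row ~ N(0, sigma^2 I_n)
Q1 = np.einsum("ij,jk,ik->i", X, A, X) / sigma**2   # row-wise X^T A X / sigma^2
Q2 = np.einsum("ij,jk,ik->i", X, B, X) / sigma**2

# Independence implies zero correlation (necessary, not sufficient).
print(np.corrcoef(Q1, Q2)[0, 1])        # should be close to 0
print(np.corrcoef(Q1**2, Q2**2)[0, 1])  # should also be close to 0
```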
Explanation
This theorem is used in the proof of the Hogg-Craig theorem.
Proof 1
Let $\mathbf{0}_{n}$ be the $n$-dimensional zero vector and $I_{n}$ the $n \times n$ identity matrix. Suppose $\rank A = r \le n$ and $\rank B = s \le n$, and let $\lambda_{1} , \cdots , \lambda_{r}$ be the non-zero eigenvalues of $A$.
Spectral Theory: If $A$ is a Hermitian matrix, it is unitarily diagonalizable: $$ A = A^{\ast} \implies A = Q^{\ast} \Lambda Q $$
Since the real matrix $A$ is symmetric, it is a Hermitian matrix, so its spectral decomposition $A = \Gamma_{1}^{T} \Lambda_{1} \Gamma_{1}$ exists. Without loss of generality, suppose the first $r$ diagonal elements of $\Lambda_{1}$ are the eigenvalues $\lambda_{1} , \cdots , \lambda_{r}$ and the remaining are $0$. Let $\Gamma_{11}^{T} \in \mathbb{R}^{n \times r}$ be the submatrix consisting of the first $r$ columns of $\Gamma_{1}^{T} \in \mathbb{R}^{n \times n}$, and let $\Lambda_{11} = \diag \left( \lambda_{1} , \cdots , \lambda_{r} \right)$ be the corresponding submatrix of $\Lambda_{1}$. In this notation, $A$ can be written in two ways as follows: $$ A = \Gamma_{1}^{T} \Lambda_{1} \Gamma_{1} = \Gamma_{11}^{T} \Lambda_{11} \Gamma_{11} $$ Defining the random vector $\mathbf{W}_{1}$ as $\mathbf{W}_{1} = \sigma^{-1} \Gamma_{11} \mathbf{X}$, $Q_{1}$ can be expressed as follows: $$ \begin{align*} Q_{1} =& \sigma^{-2} \mathbf{X}^{T} A \mathbf{X} \\ =& \sigma^{-2} \mathbf{X}^{T} \Gamma_{11}^{T} \Lambda_{11} \Gamma_{11} \mathbf{X} \\ =& \left( \sigma^{-1} \Gamma_{11} \mathbf{X} \right)^{T} \Lambda_{11} \left( \sigma^{-1} \Gamma_{11} \mathbf{X} \right) \\ =& \mathbf{W}_{1}^{T} \Lambda_{11} \mathbf{W}_{1} \end{align*} $$ Through the same process, express $Q_{2}$ in terms of $B$, which has rank $s$, as follows: $$ \begin{align*} B =& \Gamma_{2}^{T} \Lambda_{2} \Gamma_{2} & \\ =& \Gamma_{21}^{T} \Lambda_{22} \Gamma_{21} & , \Lambda_{22} \in \mathbb{R}^{s \times s} \\ \implies Q_{2} =& \mathbf{W}_{2}^{T} \Lambda_{22} \mathbf{W}_{2} & , \mathbf{W}_{2} := \sigma^{-1} \Gamma_{21} \mathbf{X} \end{align*} $$
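As a quick numerical illustration of this reduction (a sketch assuming NumPy; the rank-$r$ matrix and the sample are arbitrary choices), one can confirm that $A = \Gamma_{11}^{T} \Lambda_{11} \Gamma_{11}$ and that $Q_{1}$ computed directly agrees with $\mathbf{W}_{1}^{T} \Lambda_{11} \mathbf{W}_{1}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, sigma = 5, 2, 1.5

C = rng.standard_normal((n, r))
A = C @ C.T                              # symmetric with rank r

eigval, eigvec = np.linalg.eigh(A)       # A = eigvec @ diag(eigval) @ eigvec.T
Gamma1 = eigvec.T                        # so A = Gamma1.T @ Lambda1 @ Gamma1
keep = np.abs(eigval) > 1e-10            # the r non-zero eigenvalues
Gamma11, Lambda11 = Gamma1[keep], np.diag(eigval[keep])
assert np.allclose(A, Gamma11.T @ Lambda11 @ Gamma11)

x = sigma * rng.standard_normal(n)       # one draw of X ~ N(0, sigma^2 I_n)
w1 = Gamma11 @ x / sigma                 # W1 = Gamma11 X / sigma
assert np.allclose(x @ A @ x / sigma**2, # Q1 computed directly ...
                   w1 @ Lambda11 @ w1)   # ... equals W1^T Lambda11 W1
```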
If we let $\mathbf{W} := \begin{bmatrix} \mathbf{W}_{1} \\ \mathbf{W}_{2} \end{bmatrix}$, it can also be expressed as follows: $$ \mathbf{W} = \sigma^{-1} \begin{bmatrix} \Gamma_{11} \\ \Gamma_{21} \end{bmatrix} \mathbf{X} $$
Normality of the Linear Transformation of Multivariate Normal Distribution: For a matrix $A \in \mathbb{R}^{m \times n}$ and vector $\mathbf{b} \in \mathbb{R}^{m}$, the linear transformation $\mathbf{Y} = A \mathbf{X} + \mathbf{b}$ of a random vector that follows a multivariate normal distribution still follows the multivariate normal distribution $N_{m} \left( A \mu + \mathbf{b} , A \Sigma A^{T} \right)$.
Since the sample $\mathbf{X}$ follows the $n$-variate normal distribution $N_{n} \left( \mathbf{0}_{n} , \sigma^{2} I_{n} \right)$, its linear transformation $\mathbf{W}$ also follows a multivariate normal distribution with mean $\mathbf{0}_{r+s}$. Because the factor $\sigma^{-1}$ cancels $\sigma^{2}$ and $\Gamma_{11} \Gamma_{11}^{T} = I_{r}$, $\Gamma_{21} \Gamma_{21}^{T} = I_{s}$, the covariance matrix is as follows: $$ \Var \mathbf{W} = \begin{bmatrix} I_{r} & \Gamma_{11} \Gamma_{21}^{T} \\ \Gamma_{21} \Gamma_{11}^{T} & I_{s} \end{bmatrix} $$
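This block structure can also be checked numerically. The sketch below (assuming NumPy; `reduced_frame` is a hypothetical helper mirroring the construction of $\Gamma_{11}$ and $\Gamma_{21}$ above) computes $\Var \mathbf{W}$ directly as the stacked matrix times its transpose and confirms the identity blocks:

```python
import numpy as np

def reduced_frame(M, tol=1e-10):
    """Rows of the orthogonal factor belonging to non-zero eigenvalues,
    i.e. Gamma_11 (or Gamma_21) in the notation of the proof."""
    eigval, eigvec = np.linalg.eigh(M)
    return eigvec.T[np.abs(eigval) > tol]

rng = np.random.default_rng(2)
n = 5
C1, C2 = rng.standard_normal((n, 2)), rng.standard_normal((n, 3))
A, B = C1 @ C1.T, C2 @ C2.T              # symmetric, ranks 2 and 3
Gamma11, Gamma21 = reduced_frame(A), reduced_frame(B)
r, s = Gamma11.shape[0], Gamma21.shape[0]

L = np.vstack([Gamma11, Gamma21])        # the matrix multiplying sigma^{-1} X
VarW = L @ L.T                           # = L (sigma^2 I_n) L.T / sigma^2
assert np.allclose(VarW[:r, :r], np.eye(r))
assert np.allclose(VarW[r:, r:], np.eye(s))
assert np.allclose(VarW[:r, r:], Gamma11 @ Gamma21.T)   # generally non-zero
```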
Now, the product $AB$ of $A$ and $B$ can be represented as follows for $U := \Gamma_{11}^{T} \Lambda_{11}$ and $V := \Lambda_{22} \Gamma_{21}$. $$ \begin{align*} AB =& \Gamma_{11}^{T} \Lambda_{11} \Gamma_{11} \cdot \Gamma_{21}^{T} \Lambda_{22} \Gamma_{21} \\ =& \left[ \Gamma_{11}^{T} \Lambda_{11} \right] \Gamma_{11} \Gamma_{21}^{T} \left[ \Lambda_{22} \Gamma_{21} \right] \\ =& U \Gamma_{11} \Gamma_{21}^{T} V \\ =& U \left[ \Gamma_{11} \Gamma_{21}^{T} V \right] \end{align*} $$
$(\impliedby)$ Assume $A B = O_{n}$.
Since $U = \Gamma_{11}^{T} \Lambda_{11} \in \mathbb{R}^{n \times r}$ has full column rank, its kernel is the trivial vector space $\left\{ \mathbf{0}_{r} \right\}$ containing only the zero vector, so for the following to hold, $\Gamma_{11} \Gamma_{21}^{T} V$ must be a zero matrix. $$ O_{n} = AB = U \left[ \Gamma_{11} \Gamma_{21}^{T} V \right] $$ Meanwhile, transposing $\Gamma_{11} \Gamma_{21}^{T} V = O$ and noting that $V^{T} = \Gamma_{21}^{T} \Lambda_{22}$ also has full column rank, $\Gamma_{21} \Gamma_{11}^{T}$ must be a zero matrix for the following to hold: $$ V^{T} \Gamma_{21} \Gamma_{11}^{T} = O $$
Applying the same argument to $B^{T} A^{T} = \left( A B \right)^{T} = O_{n}$, one can show that $\Gamma_{11} \Gamma_{21}^{T}$ is also a zero matrix, and thus the covariance matrix of $\mathbf{W}$ is as follows: $$ \Var \mathbf{W} = \begin{bmatrix} I_{r} & \Gamma_{11} \Gamma_{21}^{T} \\ \Gamma_{21} \Gamma_{11}^{T} & I_{s} \end{bmatrix} = \begin{bmatrix} I_{r} & O \\ O & I_{s} \end{bmatrix} $$
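Numerically, the effect of $A B = O_{n}$ is easy to see: the eigenvector blocks of $A$ and $B$ become mutually orthogonal, so the off-diagonal blocks of $\Var \mathbf{W}$ vanish. A minimal sketch (assuming NumPy; the matrices are arbitrary examples built to satisfy $A B = O_{n}$):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
Gamma, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Gamma.T @ np.diag([2.0, 7.0, 0.0, 0.0, 0.0]) @ Gamma
B = Gamma.T @ np.diag([0.0, 0.0, 0.0, 4.0, 1.0]) @ Gamma
assert np.allclose(A @ B, 0.0)

def reduced_frame(M, tol=1e-10):
    # rows of the orthogonal factor belonging to non-zero eigenvalues
    eigval, eigvec = np.linalg.eigh(M)
    return eigvec.T[np.abs(eigval) > tol]

Gamma11, Gamma21 = reduced_frame(A), reduced_frame(B)
assert np.allclose(Gamma11 @ Gamma21.T, 0.0)   # off-diagonal block of Var W is O
```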
In Multivariate Normal Distribution, Independence and Zero Correlation Are Equivalent: $$ \begin{align*} \mathbf{X} =& \begin{bmatrix} \mathbf{X}_{1} \\ \mathbf{X}_{2} \end{bmatrix} & : \Omega \to \mathbb{R}^{n} \\ \mu =& \begin{bmatrix} \mu_{1} \\ \mu_{2} \end{bmatrix} & \in \mathbb{R}^{n} \\ \Sigma =& \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} & \in \mathbb{R}^{n \times n} \end{align*} $$ For a random vector $\mathbf{X} \sim N_{n} \left( \mu , \Sigma \right)$ that follows a multivariate normal distribution, with $\mathbf{X}$, $\mu$, and $\Sigma$ partitioned in block form as above, the following holds: $$ \mathbf{X}_{1} \perp \mathbf{X}_{2} \iff \Sigma_{12} = \Sigma_{21} = O $$
Since $\mathbf{W}$ follows a multivariate normal distribution and these off-diagonal blocks are zero, it follows that $\mathbf{W}_{1} \perp \mathbf{W}_{2}$, and thus $Q_{1} = \mathbf{W}_{1}^{T} \Lambda_{11} \mathbf{W}_{1}$ and $Q_{2} = \mathbf{W}_{2}^{T} \Lambda_{22} \mathbf{W}_{2}$, being functions of $\mathbf{W}_{1}$ and $\mathbf{W}_{2}$ respectively, are also independent of each other.
$(\implies)$ Assume $Q_{1} \perp Q_{2}$.
If $Q_{1}$ and $Q_{2}$ are independent, the following holds for all $\left( t_{1} , t_{2} \right)$ in a neighborhood of $(0, 0)$. $$ E \left[ \exp \left( t_{1} Q_{1} + t_{2} Q_{2} \right) \right] = E \left( \exp t_{1} Q_{1} \right) \cdot E \left( \exp t_{2} Q_{2} \right) $$
Here, $t_{1} Q_{1} + t_{2} Q_{2}$ can be expressed as follows: $$ \begin{align*} & t_{1} Q_{1} + t_{2} Q_{2} \\ =& t_{1} \sigma^{-2} \mathbf{X}^{T} A \mathbf{X} + t_{2} \sigma^{-2} \mathbf{X}^{T} B \mathbf{X} \\ =& \sigma^{-2} \mathbf{X}^{T} \left( t_{1} A + t_{2} B \right) \mathbf{X} \end{align*} $$ Since $t_{1} A + t_{2} B$ is still a symmetric matrix, $t_{1} Q_{1} + t_{2} Q_{2}$ is also a quadratic form in random vectors.
Moment Generating Function of a Quadratic Form in a Normal Random Vector: Suppose the sample $\mathbf{X} = \left( X_{1} , \cdots , X_{n} \right)$ consists of iid normal variables $X_{1} , \cdots , X_{n} \overset{\text{iid}}{\sim} N \left( 0, \sigma^{2} \right)$. For a symmetric matrix $A \in \mathbb{R}^{n \times n}$ of rank $r \le n$, the moment generating function of the quadratic form $Q = \sigma^{-2} \mathbf{X}^{T} A \mathbf{X}$ is given as follows: $$ M_{Q} (t) = \prod_{i=1}^{r} \left( 1 - 2 t \lambda_{i} \right)^{-1/2} = \det \left( I_{n} - 2 t A \right)^{-1/2} \qquad , | t | < 1 / (2 \lambda_{1}) $$ Here, $I_{n} \in \mathbb{R}^{n \times n}$ is the identity matrix, and $\det$ is the determinant. $\lambda_{1} \ge \cdots \ge \lambda_{r}$ are the non-zero eigenvalues of $A$, listed in descending order without loss of generality.
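The two expressions for the moment generating function in this quoted result can be compared numerically. A short sketch (assuming NumPy; $A$ is an arbitrary rank-deficient symmetric matrix and $t$ is chosen inside the admissible range):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
C = rng.standard_normal((n, 3))
A = C @ C.T                                   # symmetric, rank 3
lam = np.linalg.eigvalsh(A)
lam_nz = lam[np.abs(lam) > 1e-10]             # the non-zero eigenvalues

t = 0.9 / (2 * np.max(np.abs(lam_nz)))        # inside |t| < 1/(2 lambda_1)
mgf_product = np.prod(1 - 2 * t * lam_nz) ** (-0.5)
mgf_det = np.linalg.det(np.eye(n) - 2 * t * A) ** (-0.5)
assert np.allclose(mgf_product, mgf_det)
```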
Properties of Orthogonal Matrices: The determinant of an orthogonal matrix is either $1$ or $-1$.
The determinant of the orthogonal matrix $\Gamma_{1}$ is $\pm 1$, so regardless of whether it is $1$ or $-1$, $\det \Gamma_{1}^{T} \det \Gamma_{1} = \left( \det \Gamma_{1} \right)^{2} = 1$. Thus, the moment generating function of $t_{1} Q_{1} + t_{2} Q_{2}$ can be expressed as follows: $$ \begin{align*} & \left( E \left[ \exp \left( t_{1} Q_{1} + t_{2} Q_{2} \right) \right] \right)^{-2} \\ =& \det \left( I_{n} - 2 t_{1} A - 2 t_{2} B \right) \\ =& \det \left( \Gamma_{1}^{T} \Gamma_{1} - 2 t_{1} \Gamma_{1}^{T} \Lambda_{1} \Gamma_{1} - 2 t_{2} \Gamma_{1}^{T} \Gamma_{1} B \Gamma_{1}^{T} \Gamma_{1} \right) \\ =& \det \Gamma_{1}^{T} \det \left( I_{n} - 2 t_{1} \Lambda_{1} - 2 t_{2} \Gamma_{1} B \Gamma_{1}^{T} \right) \det \Gamma_{1} \\ =& \det \left( I_{n} - 2 t_{1} \Lambda_{1} - 2 t_{2} D \right) \end{align*} $$ Here, $D := \Gamma_{1} B \Gamma_{1}^{T} \in \mathbb{R}^{n \times n}$ is written as a block matrix, with $D_{11} \in \mathbb{R}^{r \times r}$ and $D_{22} \in \mathbb{R}^{(n-r) \times (n-r)}$: $$ D = \begin{bmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{bmatrix} $$
Leaving $D$ as is, and using $\det \left( I_{n} - 2 t_{2} B \right) = \det \Gamma_{1}^{T} \det \left( I_{n} - 2 t_{2} D \right) \det \Gamma_{1} = \det \left( I_{n} - 2 t_{2} D \right)$, the product of the moment generating functions of $Q_{1}$ and $Q_{2}$ can be written as: $$ \begin{align*} & \left[ E \left( \exp t_{1} Q_{1} \right) \cdot E \left( \exp t_{2} Q_{2} \right) \right]^{-2} \\ =& \prod_{i=1}^{r} \left( 1 - 2 t_{1} \lambda_{i} \right) \det \left( I_{n} - 2 t_{2} D \right) \end{align*} $$
A simple summary of the development so far yields: $$ \det \left( I_{n} - 2 t_{1} \Lambda_{1} - 2 t_{2} D \right) = \prod_{i=1}^{r} \left( 1 - 2 t_{1} \lambda_{i} \right) \det \left( I_{n} - 2 t_{2} D \right) $$ Here, it is unnecessary to expand the determinants and examine every term; comparing the coefficients of the highest power of $t_{1}$ suffices. On the RHS it is easy to see that the coefficient of $\left( - 2 t_{1} \right)^{r}$ is $\prod_{i=1}^{r} \lambda_{i} \det \left( I_{n} - 2 t_{2} D \right)$, but the LHS requires more care. Partition the matrix inside the determinant according to the blocks of $D$. Since $\Lambda_{1}$ can be written as $\Lambda_{1} = \begin{bmatrix} \Lambda_{11} & O \\ O & O \end{bmatrix}$, the variable $t_{1}$ appears only in the upper-left block $I_{r} - 2 t_{1} \Lambda_{11} - 2 t_{2} D_{11}$, so the terms containing $t_{1}^{r}$ must take all $r$ of those diagonal entries, leaving $\det \left( I_{n-r} - 2 t_{2} D_{22} \right)$ as the remaining factor. Hence the coefficient of $\left( - 2 t_{1} \right)^{r}$ on the LHS is $\prod_{i=1}^{r} \lambda_{i} \det \left( I_{n-r} - 2 t_{2} D_{22} \right)$. Comparing the coefficients of $\left( - 2 t_{1} \right)^{r}$ on both sides and dividing by $\prod_{i=1}^{r} \lambda_{i} \ne 0$ yields: $$ \det \left( I_{n-r} - 2 t_{2} D_{22} \right) = \det \left( I_{n} - 2 t_{2} D \right) $$
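Since this identity was derived from $Q_{1} \perp Q_{2}$, it can be checked numerically on a pair $A$, $B$ with $A B = O_{n}$, for which independence indeed holds by the other direction. A sketch assuming NumPy, with example matrices chosen so that the non-zero eigenvalues of $A$ occupy the first $r$ diagonal positions of $\Lambda_{1}$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 5, 2
G, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.array([3.0, 1.5, 0.0, 0.0, 0.0])     # eigenvalues of A (non-zero first)
mu  = np.array([0.0, 0.0, 2.0, 0.0, 4.0])     # eigenvalues of B
A, B = G.T @ np.diag(lam) @ G, G.T @ np.diag(mu) @ G
assert np.allclose(A @ B, 0.0)                # so Q1 and Q2 are independent

Gamma1, Lambda1 = G, np.diag(lam)             # A = Gamma1^T Lambda1 Gamma1
D = Gamma1 @ B @ Gamma1.T
t1, t2 = 0.05, 0.03

lhs = np.linalg.det(np.eye(n) - 2*t1*Lambda1 - 2*t2*D)
rhs = np.prod(1 - 2*t1*lam[:r]) * np.linalg.det(np.eye(n) - 2*t2*D)
assert np.allclose(lhs, rhs)

# ... and the block consequence derived above
D22 = D[r:, r:]
assert np.allclose(np.linalg.det(np.eye(n - r) - 2*t2*D22),
                   np.linalg.det(np.eye(n) - 2*t2*D))
```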
The following two claims will now be made:
- The non-$0$ eigenvalues of $D_{22}$ and $D$ are the same.
- The sum of squares of the elements of a symmetric real matrix is equal to the sum of squares of its eigenvalues.
Claim 1: The non-$0$ eigenvalues of $D_{22}$ and $D$ are the same.
Properties of Determinants: Let $A, B$ be $n \times n$ matrices and $k$ a constant. The determinant has the following properties:
- (a) $\det(kA) = k^{n}\det(A)$
If $\lambda \ne 0$ is defined as $\lambda := \left( 2 t_{2} \right)^{-1}$, then, since both sides of the identity above are polynomials in $t_{2}$ and agree on a neighborhood of $0$, they agree for every $t_{2} \ne 0$, and the following is obtained: $$ \begin{align*} & \det \left( I_{n-r} - {\frac{ 1 }{ \lambda }} D_{22} \right) = \det \left( I_{n} - {\frac{ 1 }{ \lambda }} D \right) \\ \implies & \left( - \lambda \right)^{r} \cdot \left( - \lambda \right)^{n-r} \det \left( I_{n-r} - {\frac{ 1 }{ \lambda }} D_{22} \right) = \left( - \lambda \right)^{n} \det \left( I_{n} - {\frac{ 1 }{ \lambda }} D \right) \\ \implies & \left( - \lambda \right)^{r} \det \left( D_{22} - \lambda I_{n-r} \right) = \det \left( D - \lambda I_{n} \right) \end{align*} $$
Equivalence Condition for Eigenvalues: For $\lambda$ to be an eigenvalue of $A$, the following equivalence condition holds: $$ \det (A -\lambda I) = 0 $$
Ultimately, for any $\lambda \ne 0$, $\det \left( D_{22} - \lambda I_{n-r} \right) = 0$ holds if and only if $\det \left( D - \lambda I_{n} \right) = 0$, so the non-zero eigenvalues of $D_{22}$ and $D$ are the same.
Claim 2: The sum of squares of the elements of a symmetric real matrix is equal to the sum of squares of its eigenvalues.
Trace of the Power of a Diagonalizable Matrix: Suppose $A \in \mathbb{C}^{n \times n}$ is a diagonalizable matrix and $k \in \mathbb{N}$ a natural number. For eigenvalues $\lambda_{1} , \cdots , \lambda_{n}$ of $A$, the following holds: $$ \operatorname{tr} A^{k} = \sum_{i=1}^{n} \lambda_{i}^{k} $$ Here, $\operatorname{tr}$ is the trace.
Since the sum of squares of the elements of a symmetric matrix $M = \left( m_{ij} \right)$ can be represented as the trace of the matrix squared $\operatorname{tr} M^{T} M$, the following holds: $$ \sum_{i,j} m_{ij}^{2} = \operatorname{tr} M^{T} M = \operatorname{tr} M^{2} = \sum_{l=1}^{n} \lambda_{l}^{2} $$
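Claim 2 is easy to confirm numerically for any real symmetric matrix (a minimal sketch assuming NumPy; $M$ is an arbitrary symmetrized random matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((4, 4))
M = (M + M.T) / 2                                       # arbitrary real symmetric matrix
assert np.allclose(np.sum(M**2),                        # sum of squared entries ...
                   np.sum(np.linalg.eigvalsh(M)**2))    # ... equals sum of squared eigenvalues
```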
Both $D$ and $D_{22}$ are real symmetric by definition, and hence diagonalizable. By Claim 1 they share the same non-zero eigenvalues, so by Claim 2 the sum of squares of the elements of $D$ equals the sum of squares of the elements of $D_{22}$. But the former also includes the squares of the elements of $D_{11}$, $D_{12}$, and $D_{21}$, and since squares of real numbers cannot cancel, these blocks must all be zero matrices. For the product of $\Lambda_{1}$ and $D$, the following is calculated: $$ \Lambda_{1} D = \begin{bmatrix} \Lambda_{11} & O \\ O & O_{n-r} \end{bmatrix} \begin{bmatrix} O_{r} & O \\ O & D_{22} \end{bmatrix} = O_{n} $$ Concurrently, since $A = \Gamma_{1}^{T} \Lambda_{1} \Gamma_{1}$ and $B = \Gamma_{1}^{T} D \Gamma_{1}$ (from $D = \Gamma_{1} B \Gamma_{1}^{T}$), one obtains: $$ \begin{align*} & A B \\ =& \Gamma_{1}^{T} \Lambda_{1} \Gamma_{1} \Gamma_{1}^{T} D \Gamma_{1} \\ =& \Gamma_{1}^{T} \Lambda_{1} D \Gamma_{1} \\ =& O_{n} \end{align*} $$
■
Hogg et al. (2018). Introduction to Mathematical Statistics (8th Edition): p562. ↩︎