
Proof of Craig's Theorem

Theorem

Let the sample $\mathbf{X} = \left( X_{1} , \cdots , X_{n} \right)$ be iid following a normal distribution, $X_{1} , \cdots , X_{n} \overset{\text{iid}}{\sim} N \left( 0, \sigma^{2} \right)$. For symmetric matrices $A, B \in \mathbb{R}^{n \times n}$ and the random variables $Q_{1}$ and $Q_{2}$ defined as the quadratic forms $Q_{1} := \sigma^{-2} \mathbf{X}^{T} A \mathbf{X}$ and $Q_{2} := \sigma^{-2} \mathbf{X}^{T} B \mathbf{X}$, the following holds.

$$ Q_{1} \perp Q_{2} \iff A B = O_{n} $$

In other words, $Q_{1}$ and $Q_{2}$ being independent is equivalent to $AB = O_{n}$.


  • $O_{n}$ is the $n \times n$ zero matrix.

Explanation

This theorem is used in the proof of the Hogg-Craig theorem.
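Although it plays no role in the proof, the statement can be sanity-checked numerically. The sketch below (Python with numpy; all names are illustrative) builds symmetric $A$ and $B$ supported on mutually orthogonal eigenspaces, so that $AB = O_{n}$ by construction, and checks that the sample correlation of $Q_{1}$ and $Q_{2}$ is near zero, as independence demands.

```python
import numpy as np

# Sanity-check sketch, not part of the proof: construct A, B with AB = O
# by supporting them on orthogonal eigenspaces, then simulate Q1, Q2.
rng = np.random.default_rng(0)
n, sigma = 4, 2.0

G, _ = np.linalg.qr(rng.standard_normal((n, n)))     # orthonormal columns
A = G[:, :2] @ np.diag([1.0, 3.0]) @ G[:, :2].T      # acts on first 2 directions
B = G[:, 2:] @ np.diag([2.0, 5.0]) @ G[:, 2:].T      # acts on last 2 directions
assert np.allclose(A @ B, 0)                         # AB = O by construction

X = sigma * rng.standard_normal((100_000, n))        # rows iid N(0, sigma^2 I)
Q1 = np.einsum("ij,jk,ik->i", X, A, X) / sigma**2    # row-wise quadratic forms
Q2 = np.einsum("ij,jk,ik->i", X, B, X) / sigma**2

# Independence implies zero correlation (the converse needs the theorem).
print(np.corrcoef(Q1, Q2)[0, 1])  # close to 0
```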

Proof [1]

Let $\mathbf{0}_{n}$ be the $n$-dimensional zero vector and $I_{n}$ the $n \times n$ identity matrix. Suppose $\operatorname{rank} A = r \le n$ and $\operatorname{rank} B = s \le n$, and let $\lambda_{1} , \cdots , \lambda_{r}$ be the non-zero eigenvalues of $A$.

Spectral Theory: If $A$ is a Hermitian matrix, it is unitarily diagonalizable: $$ A = A^{\ast} \implies A = Q^{\ast} \Lambda Q $$

Since the real matrix $A$ is symmetric, it is Hermitian, and a spectral decomposition $\Gamma_{1}^{T} \Lambda_{1} \Gamma_{1}$ exists. Without loss of generality, suppose the first $r$ diagonal elements of $\Lambda_{1}$ are the eigenvalues $\lambda_{1} , \cdots , \lambda_{r}$ and the remaining are $0$. Let $\Gamma_{11}^{T} \in \mathbb{R}^{n \times r}$ be the submatrix consisting of the first $r$ columns of $\Gamma_{1}^{T} \in \mathbb{R}^{n \times n}$, so that the corresponding submatrix of $\Lambda_{1}$ is $\Lambda_{11} = \operatorname{diag} \left( \lambda_{1} , \cdots , \lambda_{r} \right)$. In this notation, $A$ can be written in two ways:
$$ A = \Gamma_{1}^{T} \Lambda_{1} \Gamma_{1} = \Gamma_{11}^{T} \Lambda_{11} \Gamma_{11} $$
Define the random vector $\mathbf{W}_{1} := \sigma^{-1} \Gamma_{11} \mathbf{X}$; then $Q_{1}$ can be expressed as follows:
$$ \begin{align*} Q_{1} =& \sigma^{-2} \mathbf{X}^{T} A \mathbf{X} \\ =& \sigma^{-2} \mathbf{X}^{T} \Gamma_{11}^{T} \Lambda_{11} \Gamma_{11} \mathbf{X} \\ =& \left( \sigma^{-1} \Gamma_{11} \mathbf{X} \right)^{T} \Lambda_{11} \left( \sigma^{-1} \Gamma_{11} \mathbf{X} \right) \\ =& \mathbf{W}_{1}^{T} \Lambda_{11} \mathbf{W}_{1} \end{align*} $$
By the same process, express $Q_{2}$ in terms of $B$, which has rank $s$:
$$ \begin{align*} B =& \Gamma_{2}^{T} \Lambda_{2} \Gamma_{2} & \\ =& \Gamma_{21}^{T} \Lambda_{22} \Gamma_{21} & , \Lambda_{22} \in \mathbb{R}^{s \times s} \\ \implies Q_{2} =& \mathbf{W}_{2}^{T} \Lambda_{22} \mathbf{W}_{2} & , \mathbf{W}_{2} := \sigma^{-1} \Gamma_{21} \mathbf{X} \end{align*} $$
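The reduced decomposition above is easy to reproduce numerically. A minimal sketch (numpy; names like `Gamma11` are illustrative) extracts $\Gamma_{11}$ and $\Lambda_{11}$ from an eigendecomposition and confirms $Q_{1} = \mathbf{W}_{1}^{T} \Lambda_{11} \mathbf{W}_{1}$:

```python
import numpy as np

# Reduced spectral decomposition: keep only the non-zero eigenvalues.
# Gamma11 stacks the corresponding eigenvectors as rows, so that
# A = Gamma11.T @ Lambda11 @ Gamma11.
rng = np.random.default_rng(1)
n, r, sigma = 5, 2, 1.5

G, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = G[:, :r] @ np.diag([2.0, -1.0]) @ G[:, :r].T     # symmetric, rank r

eigval, eigvec = np.linalg.eigh(A)
keep = np.abs(eigval) > 1e-10                        # non-zero eigenvalues
Lambda11 = np.diag(eigval[keep])
Gamma11 = eigvec[:, keep].T                          # r x n, orthonormal rows
assert np.allclose(A, Gamma11.T @ Lambda11 @ Gamma11)

x = rng.standard_normal(n)
w1 = Gamma11 @ x / sigma                             # W1 = sigma^{-1} Gamma11 X
print(np.isclose(x @ A @ x / sigma**2, w1 @ Lambda11 @ w1))  # True
```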

If we let $\mathbf{W} := \begin{bmatrix} \mathbf{W}_{1} \\ \mathbf{W}_{2} \end{bmatrix}$, it can also be expressed as follows: $$ \mathbf{W} = \sigma^{-1} \begin{bmatrix} \Gamma_{11} \\ \Gamma_{21} \end{bmatrix} \mathbf{X} $$

Normality of the Linear Transformation of a Multivariate Normal Vector: For a matrix $A \in \mathbb{R}^{m \times n}$ and vector $\mathbf{b} \in \mathbb{R}^{m}$, the linear transformation $\mathbf{Y} = A \mathbf{X} + \mathbf{b}$ of a random vector $\mathbf{X} \sim N_{n} \left( \mu , \Sigma \right)$ still follows a multivariate normal distribution, namely $N_{m} \left( A \mu + \mathbf{b} , A \Sigma A^{T} \right)$.

Since the sample $\mathbf{X}$ follows the $n$-variate normal distribution $N_{n} \left( \mathbf{0}_{n} , \sigma^{2} I_{n} \right)$, its linear transformation $\mathbf{W}$ also follows a multivariate normal distribution, with mean $\mathbf{0}_{r+s}$ and covariance matrix $$ \operatorname{Var} \mathbf{W} = \begin{bmatrix} I_{r} & \Gamma_{11} \Gamma_{21}^{T} \\ \Gamma_{21} \Gamma_{11}^{T} & I_{s} \end{bmatrix} $$ where the identity blocks follow from the orthonormality of the rows of $\Gamma_{11}$ and $\Gamma_{21}$.
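As a quick Monte Carlo check of this covariance (numpy sketch; illustrative names): with $\operatorname{Var} \mathbf{X} = \sigma^{2} I_{n}$, the covariance of $\mathbf{W}$ is $C C^{T}$ for $C = \begin{bmatrix} \Gamma_{11} \\ \Gamma_{21} \end{bmatrix}$, and the empirical covariance of simulated $\mathbf{W}$ should show the stated block structure.

```python
import numpy as np

# Monte Carlo check that Var W has identity diagonal blocks.
rng = np.random.default_rng(2)
n, r, s, sigma = 6, 2, 3, 1.2
G1, _ = np.linalg.qr(rng.standard_normal((n, n)))
G2, _ = np.linalg.qr(rng.standard_normal((n, n)))
Gamma11, Gamma21 = G1[:, :r].T, G2[:, :s].T          # orthonormal rows

C = np.vstack([Gamma11, Gamma21])                    # (r+s) x n
X = sigma * rng.standard_normal((200_000, n))        # rows iid N(0, sigma^2 I)
W = X @ C.T / sigma                                  # rows of W = sigma^{-1} C x
V_hat = np.cov(W.T)                                  # empirical Var W

print(np.allclose(V_hat[:r, :r], np.eye(r), atol=0.02))            # ~ I_r
print(np.allclose(V_hat[r:, r:], np.eye(s), atol=0.02))            # ~ I_s
print(np.allclose(V_hat[:r, r:], Gamma11 @ Gamma21.T, atol=0.02))  # cross block
```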

Now, for $U := \Gamma_{11}^{T} \Lambda_{11}$ and $V := \Lambda_{22} \Gamma_{21}$, the product $AB$ can be represented as follows: $$ \begin{align*} AB =& \Gamma_{11}^{T} \Lambda_{11} \Gamma_{11} \cdot \Gamma_{21}^{T} \Lambda_{22} \Gamma_{21} \\ =& \left[ \Gamma_{11}^{T} \Lambda_{11} \right] \Gamma_{11} \Gamma_{21}^{T} \left[ \Lambda_{22} \Gamma_{21} \right] \\ =& U \Gamma_{11} \Gamma_{21}^{T} V \\ =& U \left[ \Gamma_{11} \Gamma_{21}^{T} V \right] \end{align*} $$


($\impliedby$) Assume $AB = O_{n}$.

Since $U = \Gamma_{11}^{T} \Lambda_{11}$ has full column rank, the kernel of $U$ is the trivial vector space $\left\{ \mathbf{0}_{r} \right\}$ containing only the zero vector, so for the following to hold, $\Gamma_{11} \Gamma_{21}^{T} V$ must be a zero matrix: $$ O_{n} = AB = U \left[ \Gamma_{11} \Gamma_{21}^{T} V \right] $$ Meanwhile, transposing $\Gamma_{11} \Gamma_{21}^{T} V = O$ gives $$ V^{T} \Gamma_{21} \Gamma_{11}^{T} = O $$ and since $V^{T}$ also has full column rank, $\Gamma_{21} \Gamma_{11}^{T}$ must be a zero matrix.

Using similar logic for $B^{T} A^{T} = O_{n}$, one can show that $\Gamma_{11} \Gamma_{21}^{T}$ is a zero matrix, thus the covariance matrix of $\mathbf{W}$ is as follows: $$ \operatorname{Var} \mathbf{W} = \begin{bmatrix} I_{r} & \Gamma_{11} \Gamma_{21}^{T} \\ \Gamma_{21} \Gamma_{11}^{T} & I_{s} \end{bmatrix} = \begin{bmatrix} I_{r} & O \\ O & I_{s} \end{bmatrix} $$

In a Multivariate Normal Distribution, Independence and Zero Correlation Are Equivalent: $$ \begin{align*} \mathbf{X} =& \begin{bmatrix} \mathbf{X}_{1} \\ \mathbf{X}_{2} \end{bmatrix} & : \Omega \to \mathbb{R}^{n} \\ \mu =& \begin{bmatrix} \mu_{1} \\ \mu_{2} \end{bmatrix} & \in \mathbb{R}^{n} \\ \Sigma =& \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} & \in \mathbb{R}^{n \times n} \end{align*} $$ For $\mathbf{X}$, $\mu$, and $\Sigma$ partitioned in block form as above, given a random vector $\mathbf{X} \sim N_{n} \left( \mu , \Sigma \right)$ that follows a multivariate normal distribution, the following holds: $$ \mathbf{X}_{1} \perp \mathbf{X}_{2} \iff \Sigma_{12} = \Sigma_{21} = O $$

Since $\mathbf{W}$ is multivariate normal and its cross-covariance blocks are zero, one can infer that $\mathbf{W}_{1} \perp \mathbf{W}_{2}$, and thus $Q_{1} = \mathbf{W}_{1}^{T} \Lambda_{11} \mathbf{W}_{1}$ and $Q_{2} = \mathbf{W}_{2}^{T} \Lambda_{22} \mathbf{W}_{2}$, being functions of $\mathbf{W}_{1}$ and $\mathbf{W}_{2}$ respectively, are also independent of each other.
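The key step of this direction can also be observed numerically. A sketch (numpy; `nonzero_eigvec_rows` is a hypothetical helper) builds $A$, $B$ with $AB = O_{n}$ and confirms that the cross block $\Gamma_{11} \Gamma_{21}^{T}$ of $\operatorname{Var} \mathbf{W}$ vanishes:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
G, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = G[:, :2] @ np.diag([1.0, 4.0]) @ G[:, :2].T
B = G[:, 2:4] @ np.diag([3.0, 2.0]) @ G[:, 2:4].T
assert np.allclose(A @ B, 0)                         # the hypothesis AB = O

def nonzero_eigvec_rows(M, tol=1e-10):
    """Eigenvectors of M with non-zero eigenvalue, stacked as rows."""
    val, vec = np.linalg.eigh(M)
    return vec[:, np.abs(val) > tol].T

Gamma11 = nonzero_eigvec_rows(A)                     # plays the role of Gamma11
Gamma21 = nonzero_eigvec_rows(B)                     # plays the role of Gamma21
print(np.allclose(Gamma11 @ Gamma21.T, 0))           # True: Var W is block diagonal
```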


($\implies$) Assume $Q_{1} \perp Q_{2}$.

If $Q_{1}$ and $Q_{2}$ are independent, the following holds for $\left( t_{1} , t_{2} \right)$ in a neighborhood of $(0, 0)$: $$ E \left[ \exp \left( t_{1} Q_{1} + t_{2} Q_{2} \right) \right] = E \left( \exp t_{1} Q_{1} \right) \cdot E \left( \exp t_{2} Q_{2} \right) $$

Here, $t_{1} Q_{1} + t_{2} Q_{2}$ can be expressed as follows: $$ \begin{align*} & t_{1} Q_{1} + t_{2} Q_{2} \\ =& \sigma^{-2} t_{1} \mathbf{X}^{T} A \mathbf{X} + \sigma^{-2} t_{2} \mathbf{X}^{T} B \mathbf{X} \\ =& \sigma^{-2} \mathbf{X}^{T} \left( t_{1} A + t_{2} B \right) \mathbf{X} \end{align*} $$ Since $t_{1} A + t_{2} B$ is still a symmetric matrix, $t_{1} Q_{1} + t_{2} Q_{2}$ is also a quadratic form in the random vector $\mathbf{X}$.

Moment Generating Function of a Quadratic Form in a Normal Random Vector: Suppose the sample $\mathbf{X} = \left( X_{1} , \cdots , X_{n} \right)$ is iid and follows a normal distribution, $X_{1} , \cdots , X_{n} \overset{\text{iid}}{\sim} N \left( 0, \sigma^{2} \right)$. The moment generating function of the quadratic form $Q = \sigma^{-2} \mathbf{X}^{T} A \mathbf{X}$, for a symmetric matrix $A \in \mathbb{R}^{n \times n}$ with rank $r \le n$, is given as follows: $$ M_{Q} (t) = \prod_{i=1}^{r} \left( 1 - 2 t \lambda_{i} \right)^{-1/2} = \det \left( I_{n} - 2 t A \right)^{-1/2} \qquad , | t | < 1 / 2 \lambda_{1} $$ Here, $I_{n} \in \mathbb{R}^{n \times n}$ is the identity matrix, and $\det$ is the determinant. Without loss of generality, $\lambda_{1} \ge \cdots \ge \lambda_{r}$ lists the non-zero eigenvalues of $A$ in descending order.
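The two closed forms of this moment generating function agree because the zero eigenvalues of $A$ contribute factors of $1$ to the determinant. The following sketch confirms this for a random symmetric $A$ (numpy; illustrative names, with $t$ chosen inside the stated radius):

```python
import numpy as np

# Check: prod_i (1 - 2 t lambda_i)^{-1/2} == det(I - 2 t A)^{-1/2}.
rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                                    # symmetric
lam = np.linalg.eigvalsh(A)
lam_nz = lam[np.abs(lam) > 1e-10]                    # non-zero eigenvalues

t = 0.9 / (2 * np.max(np.abs(lam_nz)))               # inside the convergence radius
lhs = np.prod(1 - 2 * t * lam_nz) ** -0.5
rhs = np.linalg.det(np.eye(n) - 2 * t * A) ** -0.5
print(np.isclose(lhs, rhs))  # True
```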

Properties of Orthogonal Matrices: The determinant of an orthogonal matrix is either $1$ or $-1$.

The determinant of the orthogonal matrix $\Gamma_{1}$ is $\pm 1$, and whether it is $1$ or $-1$, $\det \Gamma_{1}^{T} \cdot \det \Gamma_{1} = 1$. Thus, the moment generating function of $t_{1} Q_{1} + t_{2} Q_{2}$ can be expressed as follows: $$ \begin{align*} & \left( E \left[ \exp \left( t_{1} Q_{1} + t_{2} Q_{2} \right) \right] \right)^{-2} \\ =& \det \left( I_{n} - 2 t_{1} A - 2 t_{2} B \right) \\ =& \det \left( \Gamma_{1}^{T} \Gamma_{1} - 2 t_{1} \Gamma_{1}^{T} \Lambda_{1} \Gamma_{1} - 2 t_{2} \Gamma_{1}^{T} \Gamma_{1} B \Gamma_{1}^{T} \Gamma_{1} \right) \\ =& \det \Gamma_{1}^{T} \det \left( I_{n} - 2 t_{1} \Lambda_{1} - 2 t_{2} \Gamma_{1} B \Gamma_{1}^{T} \right) \det \Gamma_{1} \\ =& \det \left( I_{n} - 2 t_{1} \Lambda_{1} - 2 t_{2} D \right) \end{align*} $$ Here, $D = \Gamma_{1} B \Gamma_{1}^{T} \in \mathbb{R}^{n \times n}$ is written as a block matrix with $D_{11} \in \mathbb{R}^{r \times r}$: $$ D = \begin{bmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{bmatrix} $$

Since $\det \left( I_{n} - 2 t_{2} B \right) = \det \left( I_{n} - 2 t_{2} D \right)$ by the same manipulation, writing down the product of the moment generating functions of $Q_{1}$ and $Q_{2}$ gives: $$ \begin{align*} & \left[ E \left( \exp t_{1} Q_{1} \right) \cdot E \left( \exp t_{2} Q_{2} \right) \right]^{-2} \\ =& \prod_{i=1}^{r} \left( 1 - 2 t_{1} \lambda_{i} \right) \det \left( I_{n} - 2 t_{2} D \right) \end{align*} $$

A simple summary of the development so far yields: $$ \det \left( I_{n} - 2 t_{1} \Lambda_{1} - 2 t_{2} D \right) = \prod_{i=1}^{r} \left( 1 - 2 t_{1} \lambda_{i} \right) \det \left( I_{n} - 2 t_{2} D \right) $$ Here it is unnecessary to actually calculate the determinants and examine every term; investigating the powers of $t_{1}$ suffices. It is easy to observe that the coefficient of $\left( - 2 t_{1} \right)^{r}$ on the right-hand side is $\prod_{i=1}^{r} \lambda_{i} \det \left( I_{n} - 2 t_{2} D \right)$; the left-hand side is considerably more challenging. To proceed, break down the matrix inside the determinant according to the blocks of $D$. Since $\Lambda_{1}$ can be written as $\Lambda_{1} = \begin{bmatrix} \Lambda_{11} & O \\ O & O \end{bmatrix}$, the part corresponding to $D_{11}$ contributes $\det \left( I_{r} - 2 t_{1} \Lambda_{11} - 2 t_{2} D_{11} \right)$, and the rest contributes $\det \left( I_{n-r} - 2 t_{2} D_{22} \right)$. From this, one can ascertain that the coefficient of $\left( - 2 t_{1} \right)^{r}$ on the left-hand side is $\prod_{i=1}^{r} \lambda_{i} \det \left( I_{n-r} - 2 t_{2} D_{22} \right)$, and comparing the coefficients of $t_{1}^{r}$ on both sides yields: $$ \det \left( I_{n-r} - 2 t_{2} D_{22} \right) = \det \left( I_{n} - 2 t_{2} D \right) $$
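The coefficient extraction on the left-hand side is a generic determinant fact, independent of the probabilistic setting, and can be verified symbolically. A sketch (Python with sympy; illustrative names and a small random symmetric $D$):

```python
import numpy as np
import sympy as sp

# For Lambda1 = diag(lambda_1, ..., lambda_r, 0, ..., 0), the coefficient of
# t1^r in det(I_n - 2 t1 Lambda1 - 2 t2 D) is
# (-2)^r * prod(lambda_i) * det(I_{n-r} - 2 t2 D22), for any symmetric D.
t1, t2 = sp.symbols("t1 t2")
n, r = 4, 2
lam = [2, 3]                                         # non-zero eigenvalues of A
Lambda1 = sp.diag(*lam, 0, 0)
E = np.random.default_rng(5).integers(-3, 4, (n, n))
D = sp.Matrix((E + E.T).tolist())                    # symmetric test matrix

lhs = sp.expand(sp.det(sp.eye(n) - 2*t1*Lambda1 - 2*t2*D))
coeff = lhs.coeff(t1, r)                             # coefficient of t1^r
expected = (-2)**r * lam[0]*lam[1] * sp.det(sp.eye(n - r) - 2*t2*D[r:, r:])
print(sp.expand(coeff - expected) == 0)              # True
```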

The following two claims will now be made:

  • The non-zero eigenvalues of $D_{22}$ and $D$ are the same.
  • The sum of squares of the elements of a symmetric real matrix is equal to the sum of squares of its eigenvalues.

Claim 1: The non-zero eigenvalues of $D_{22}$ and $D$ are the same.

Properties of Determinants: Let $A, B$ be $n \times n$ matrices and $k$ a constant. The determinant has the following properties:

  • (a) $\det(kA) = k^{n}\det(A)$

If $\lambda \ne 0$ is defined as $\lambda := \left( 2 t_{2} \right)^{-1}$, the following is obtained: $$ \begin{align*} & \det \left( I_{n-r} - {\frac{ 1 }{ \lambda }} D_{22} \right) = \det \left( I_{n} - {\frac{ 1 }{ \lambda }} D \right) \\ \implies & \left( - \lambda \right)^{r} \cdot \left( - \lambda \right)^{n-r} \det \left( I_{n-r} - {\frac{ 1 }{ \lambda }} D_{22} \right) = \left( - \lambda \right)^{n} \det \left( I_{n} - {\frac{ 1 }{ \lambda }} D \right) \\ \implies & \left( - \lambda \right)^{r} \det \left( D_{22} - \lambda I_{n-r} \right) = \det \left( D - \lambda I_{n} \right) \end{align*} $$

Equivalence Condition for Eigenvalues: $\lambda$ is an eigenvalue of $A$ if and only if: $$ \det (A -\lambda I) = 0 $$

Ultimately, since $\left( - \lambda \right)^{r} \ne 0$, a given $\lambda \ne 0$ satisfies $\det \left( D_{22} - \lambda I_{n-r} \right) = 0$ if and only if it satisfies $\det \left( D - \lambda I_{n} \right) = 0$, so the non-zero eigenvalues of $D_{22}$ and $D$ are the same.

Claim 2: The sum of squares of the elements of a symmetric real matrix is equal to the sum of squares of its eigenvalues.

Trace of the Power of a Diagonalizable Matrix: Suppose $A \in \mathbb{C}^{n \times n}$ is a diagonalizable matrix and $k \in \mathbb{N}$ a natural number. For the eigenvalues $\lambda_{1} , \cdots , \lambda_{n}$ of $A$, the following holds: $$ \operatorname{tr} A^{k} = \sum_{i=1}^{n} \lambda_{i}^{k} $$ Here, $\operatorname{tr}$ is the trace.

Since the sum of squares of the elements of a symmetric matrix $M = \left( m_{ij} \right)$ can be represented as the trace $\operatorname{tr} M^{T} M$, the following holds for its eigenvalues $\lambda_{1} , \cdots , \lambda_{n}$: $$ \sum_{i,j} m_{ij}^{2} = \operatorname{tr} M^{T} M = \operatorname{tr} M^{2} = \sum_{l=1}^{n} \lambda_{l}^{2} $$
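Claim 2 is also easy to confirm numerically for a random symmetric matrix (numpy sketch; illustrative names):

```python
import numpy as np

# Claim 2: for real symmetric M, sum of squared entries = tr(M^T M)
# = tr(M^2) = sum of squared eigenvalues.
rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
M = (M + M.T) / 2                                    # symmetric

sum_sq_entries = np.sum(M**2)
sum_sq_eigvals = np.sum(np.linalg.eigvalsh(M) ** 2)
print(np.isclose(sum_sq_entries, sum_sq_eigvals))    # True
```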


Both $D$ and $D_{22}$ are symmetric and real by definition, and hence diagonalizable. By Claim 1 they have the same non-zero eigenvalues, so by Claim 2 the sum of squares of the elements of $D$ equals that of $D_{22}$. Since every element of $D$ is real, the elements outside the $D_{22}$ block contribute non-negative squares that must sum to zero, and therefore $D_{11}$, $D_{12}$, and $D_{21}$ must all be zero matrices. For the product of $\Lambda_{1}$ and $D$, the following is calculated: $$ \Lambda_{1} D = \begin{bmatrix} \Lambda_{11} & O \\ O & O_{n-r} \end{bmatrix} \begin{bmatrix} O_{r} & O \\ O & D_{22} \end{bmatrix} = O_{n} $$ Concurrently, since $A = \Gamma_{1}^{T} \Lambda_{1} \Gamma_{1}$ and $D = \Gamma_{1} B \Gamma_{1}^{T}$, i.e. $B = \Gamma_{1}^{T} D \Gamma_{1}$, one obtains: $$ \begin{align*} & A B \\ =& \Gamma_{1}^{T} \Lambda_{1} \Gamma_{1} \Gamma_{1}^{T} D \Gamma_{1} \\ =& \Gamma_{1}^{T} \Lambda_{1} D \Gamma_{1} \\ =& O_{n} \end{align*} $$


  1. Hogg et al. (2018). Introduction to Mathematical Statistics (8th Edition): p562. ↩︎