Proof of Craig's Theorem
Theorem
Let the sample $X = (X_1, \cdots, X_n)$ be iid and normally distributed, $X_1, \cdots, X_n \overset{\text{iid}}{\sim} N(0, \sigma^2)$. For symmetric matrices $A, B \in \mathbb{R}^{n \times n}$, define the quadratic forms in random vectors $Q_1 := \sigma^{-2} X^T A X$ and $Q_2 := \sigma^{-2} X^T B X$. Then the following holds:
$$
Q_1 \perp Q_2 \iff AB = O_n
$$
In other words, $Q_1$ and $Q_2$ are independent if and only if $AB = O_n$.
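The theorem can be sanity-checked numerically. The following sketch (not part of the original proof; the matrices, seed, and sample size are arbitrary choices) builds two projections with $AB = O_n$ and confirms by Monte Carlo that transforms of $Q_1$ and $Q_2$ are uncorrelated, while a pair with $AB \neq O_n$ is visibly correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, N = 4, 2.0, 200_000

# Random orthogonal R; A and B project onto orthogonal eigenspaces, so AB = O.
R, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = R @ np.diag([1.0, 1.0, 0.0, 0.0]) @ R.T
B = R @ np.diag([0.0, 0.0, 1.0, 1.0]) @ R.T
assert np.allclose(A @ B, np.zeros((n, n)))

X = sigma * rng.standard_normal((N, n))              # N iid samples of X
Q1 = np.einsum('ij,jk,ik->i', X, A, X) / sigma**2    # row-wise X^T A X / sigma^2
Q2 = np.einsum('ij,jk,ik->i', X, B, X) / sigma**2

print(np.corrcoef(Q1, Q2)[0, 1])      # ~ 0, consistent with independence
print(np.corrcoef(Q1**2, Q2)[0, 1])   # ~ 0 for nonlinear transforms too

# Contrast: overlapping eigenspaces give AB != O and a clearly non-zero correlation.
B2 = R @ np.diag([0.0, 1.0, 1.0, 0.0]) @ R.T
Q3 = np.einsum('ij,jk,ik->i', X, B2, X) / sigma**2
print(np.corrcoef(Q1, Q3)[0, 1])      # ~ 0.5
```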
This theorem is used in the proof of the Hogg-Craig theorem.
Proof
Let $\mathbf{0}_n$ be the $n$-dimensional zero vector and $I_n$ the $n \times n$ identity matrix. Suppose $\operatorname{rank} A = r \le n$ and $\operatorname{rank} B = s \le n$ hold, and let $\lambda_1, \cdots, \lambda_r$ be the non-zero eigenvalues of $A$.
Spectral Theory: If $A$ is a Hermitian matrix, it is unitarily diagonalizable:
$$
A = A^* \implies A = Q^* \Lambda Q
$$
Since the real matrix $A$ is symmetric, it is Hermitian, so it has a spectral decomposition $A = \Gamma_1^T \Lambda_1 \Gamma_1$. Without loss of generality, suppose the first $r$ diagonal elements of $\Lambda_1$ are the eigenvalues $\lambda_1, \cdots, \lambda_r$ and the remaining are $0$. If $\Gamma_{11}^T \in \mathbb{R}^{n \times r}$ denotes the submatrix of $\Gamma_1^T \in \mathbb{R}^{n \times n}$ consisting of its first $r$ columns, the corresponding submatrix of $\Lambda_1$ is $\Lambda_{11} = \operatorname{diag}(\lambda_1, \cdots, \lambda_r)$. In this notation, $A$ can be written in two ways:
$$
A = \Gamma_1^T \Lambda_1 \Gamma_1 = \Gamma_{11}^T \Lambda_{11} \Gamma_{11}
$$
Define the random vector $W_1 := \sigma^{-1} \Gamma_{11} X$; then $Q_1$ can be expressed as follows:
$$
\begin{aligned}
Q_1 &= \sigma^{-2} X^T A X \\
&= \sigma^{-2} X^T \Gamma_{11}^T \Lambda_{11} \Gamma_{11} X \\
&= \left( \sigma^{-1} \Gamma_{11} X \right)^T \Lambda_{11} \left( \sigma^{-1} \Gamma_{11} X \right) \\
&= W_1^T \Lambda_{11} W_1
\end{aligned}
$$
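As a quick check of this reduction (an illustrative sketch with arbitrary numbers, not from the source), one can build a rank-deficient symmetric $A$, extract $\Gamma_{11}$ and $\Lambda_{11}$ from its eigendecomposition, and confirm that both expressions for $Q_1$ agree:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, sigma = 5, 3, 2.0

C = rng.standard_normal((n, r))
A = C @ C.T                                # symmetric, rank r
w, Vec = np.linalg.eigh(A)                 # A = Vec diag(w) Vec^T
keep = np.abs(w) > 1e-10
G11 = Vec[:, keep].T                       # Gamma_11: r x n, orthonormal rows
L11 = np.diag(w[keep])                     # Lambda_11 = diag(lambda_1, ..., lambda_r)
assert np.allclose(A, G11.T @ L11 @ G11)   # reduced spectral decomposition

X = sigma * rng.standard_normal(n)
W1 = G11 @ X / sigma
print(X @ A @ X / sigma**2)                # Q1 computed directly
print(W1 @ L11 @ W1)                       # Q1 = W1^T Lambda_11 W1, same value
```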
Through the same process, $Q_2$ can be expressed in terms of $B$, which has rank $s$:
$$
B = \Gamma_2^T \Lambda_2 \Gamma_2 = \Gamma_{21}^T \Lambda_{22} \Gamma_{21} \implies Q_2 = W_2^T \Lambda_{22} W_2, \qquad \Lambda_{22} \in \mathbb{R}^{s \times s}, \quad W_2 := \sigma^{-1} \Gamma_{21} X
$$
If we let $W := \begin{bmatrix} W_1 \\ W_2 \end{bmatrix}$, it can also be expressed as follows:
$$
W = \sigma^{-1} \begin{bmatrix} \Gamma_{11} \\ \Gamma_{21} \end{bmatrix} X
$$
Normality of the Linear Transformation of a Multivariate Normal Distribution: For a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $b \in \mathbb{R}^m$, the linear transformation $Y = AX + b$ of a random vector $X$ that follows a multivariate normal distribution still follows a multivariate normal distribution, $N_m(A\mu + b, A \Sigma A^T)$.
Since the sample $X$ follows the $n$-variate normal distribution $N_n(\mathbf{0}_n, \sigma^2 I_n)$, its linear transformation $W$ also follows a multivariate normal distribution, with mean $\mathbf{0}_{r+s}$ and covariance matrix
$$
\operatorname{Var} W = \begin{bmatrix} I_r & \Gamma_{11} \Gamma_{21}^T \\ \Gamma_{21} \Gamma_{11}^T & I_s \end{bmatrix}
$$
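This block structure can be displayed directly (again an illustrative sketch): the diagonal blocks of $\Gamma \Gamma^T$ are identities because the rows of $\Gamma_{11}$ and $\Gamma_{21}$ are orthonormal eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, s = 6, 2, 3

def thin_factor(rank):
    # orthonormal eigenvector rows of a random symmetric matrix of the given rank
    C = rng.standard_normal((n, rank))
    w, Vec = np.linalg.eigh(C @ C.T)
    return Vec[:, np.abs(w) > 1e-10].T

G11, G21 = thin_factor(r), thin_factor(s)
G = np.vstack([G11, G21])
Cov = G @ G.T                                 # Var W for X ~ N(0, sigma^2 I)
print(np.allclose(Cov[:r, :r], np.eye(r)))    # True: Gamma_11 Gamma_11^T = I_r
print(np.allclose(Cov[r:, r:], np.eye(s)))    # True: Gamma_21 Gamma_21^T = I_s
print(np.allclose(Cov[:r, r:], G11 @ G21.T))  # off-diagonal block as claimed
```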
Now the product $AB$ of $A$ and $B$ can be written as follows, where $U := \Gamma_{11}^T \Lambda_{11}$ and $V := \Lambda_{22} \Gamma_{21}$:
$$
\begin{aligned}
AB &= \Gamma_{11}^T \Lambda_{11} \Gamma_{11} \cdot \Gamma_{21}^T \Lambda_{22} \Gamma_{21} \\
&= \left[ \Gamma_{11}^T \Lambda_{11} \right] \Gamma_{11} \Gamma_{21}^T \left[ \Lambda_{22} \Gamma_{21} \right] \\
&= U \Gamma_{11} \Gamma_{21}^T V \\
&= U \left[ \Gamma_{11} \Gamma_{21}^T V \right]
\end{aligned}
$$
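A numerical spot-check of this factorization (illustrative only, using the same thin-decomposition construction as above):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, s = 6, 2, 3

def thin_parts(rank):
    C = rng.standard_normal((n, rank))
    M = C @ C.T
    w, Vec = np.linalg.eigh(M)
    keep = np.abs(w) > 1e-10
    return M, Vec[:, keep].T, np.diag(w[keep])    # matrix, Gamma, Lambda

A, G11, L11 = thin_parts(r)
B, G21, L22 = thin_parts(s)
U = G11.T @ L11                                   # n x r, full column rank
V = L22 @ G21                                     # s x n, full row rank
print(np.allclose(A @ B, U @ (G11 @ G21.T) @ V))  # True
```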
$(\Leftarrow)$ Assume $AB = O_n$.

Since $U = \Gamma_{11}^T \Lambda_{11}$ has full column rank, its kernel is the trivial vector space $\{ \mathbf{0}_r \}$ containing only the zero vector, so for the following to hold, $\Gamma_{11} \Gamma_{21}^T V$ must be a zero matrix:
$$
O_n = AB = U \left[ \Gamma_{11} \Gamma_{21}^T V \right]
$$
Meanwhile, transposing $\Gamma_{11} \Gamma_{21}^T V = O$ gives the following, and since $V^T$ also has full column rank, $\Gamma_{21} \Gamma_{11}^T$ must be a zero matrix:
$$
V^T \Gamma_{21} \Gamma_{11}^T = O
$$
Applying the same logic to $B^T A^T = (AB)^T = O_n$, one can show that $\Gamma_{11} \Gamma_{21}^T$ is also a zero matrix; thus the covariance matrix of $W$ is as follows:
$$
\operatorname{Var} W = \begin{bmatrix} I_r & \Gamma_{11} \Gamma_{21}^T \\ \Gamma_{21} \Gamma_{11}^T & I_s \end{bmatrix} = \begin{bmatrix} I_r & O \\ O & I_s \end{bmatrix}
$$
In a Multivariate Normal Distribution, Independence and Zero Correlation Are Equivalent: Let
$$
X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} : \Omega \to \mathbb{R}^n, \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \in \mathbb{R}^n, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \in \mathbb{R}^{n \times n}
$$
For $X$, $\mu$, and $\Sigma$ partitioned in block form as above, given a random vector $X \sim N_n(\mu, \Sigma)$ that follows a multivariate normal distribution, the following holds:
$$
X_1 \perp X_2 \iff \Sigma_{12} = \Sigma_{21} = O
$$
Since the sample $X$ is normal and the cross-covariance blocks of $\operatorname{Var} W$ vanish, one can infer that $W_1 \perp W_2$; thus $Q_1 = W_1^T \Lambda_{11} W_1$ and $Q_2 = W_2^T \Lambda_{22} W_2$, being functions of $W_1$ and $W_2$ respectively, are also independent of each other.
$(\Rightarrow)$ Assume $Q_1 \perp Q_2$.

If $Q_1$ and $Q_2$ are independent, the following holds for $(t_1, t_2)$ in a neighborhood of $(0, 0)$:
$$
E\left[ \exp\left( t_1 Q_1 + t_2 Q_2 \right) \right] = E\left( e^{t_1 Q_1} \right) \cdot E\left( e^{t_2 Q_2} \right)
$$
Here, $t_1 Q_1 + t_2 Q_2$ can be expressed as follows:
$$
\begin{aligned}
t_1 Q_1 + t_2 Q_2 &= \sigma^{-2} t_1 X^T A X + \sigma^{-2} t_2 X^T B X \\
&= \sigma^{-2} X^T \left( t_1 A + t_2 B \right) X
\end{aligned}
$$
Since $t_1 A + t_2 B$ is still a symmetric matrix, $t_1 Q_1 + t_2 Q_2$ is also a quadratic form in random vectors.
Moment Generating Function of a Quadratic Form in a Normal Random Vector: Suppose the sample $X = (X_1, \cdots, X_n)$ is iid and normally distributed, $X_1, \cdots, X_n \overset{\text{iid}}{\sim} N(0, \sigma^2)$. The moment generating function of the quadratic form in random vectors $Q = \sigma^{-2} X^T A X$, for a symmetric matrix $A \in \mathbb{R}^{n \times n}$ with rank $r \le n$, is given as follows:
$$
M_Q(t) = \prod_{i=1}^r \left( 1 - 2t\lambda_i \right)^{-1/2} = \det\left( I_n - 2tA \right)^{-1/2}, \qquad |t| < \frac{1}{2\lambda_1}
$$
Here, $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix, and $\det$ is the determinant. Without loss of generality, $\lambda_1 \ge \cdots \ge \lambda_r$ lists the non-zero eigenvalues of $A$ in descending order.
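This closed form can be checked by simulation. The sketch below (arbitrary $A$, $\sigma$, and a $t$ well inside the stated radius; not part of the source) compares a Monte Carlo estimate of $E[e^{tQ}]$ with $\det(I_n - 2tA)^{-1/2}$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 3, 1.5
C = rng.standard_normal((n, 2))
A = C @ C.T                                  # symmetric PSD, rank 2
lam1 = np.linalg.eigvalsh(A).max()
t = 0.15 / lam1                              # well inside |t| < 1/(2 lambda_1)

X = sigma * rng.standard_normal((500_000, n))
Q = np.einsum('ij,jk,ik->i', X, A, X) / sigma**2
print(np.exp(t * Q).mean())                          # Monte Carlo estimate
print(np.linalg.det(np.eye(n) - 2*t*A) ** -0.5)      # closed form, close
```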
Properties of Orthogonal Matrices: The determinant of an orthogonal matrix is either 1 or −1.
The determinant of the orthogonal matrix $\Gamma_1$ is $\pm 1$; whichever it is, $\det \Gamma_1^T \cdot \det \Gamma_1 = \left( \det \Gamma_1 \right)^2 = 1$. Thus the moment generating function of $t_1 Q_1 + t_2 Q_2$ can be expressed as follows:
$$
\begin{aligned}
\left( E\left[ \exp\left( t_1 Q_1 + t_2 Q_2 \right) \right] \right)^{-2} &= \det\left( I_n - 2t_1 A - 2t_2 B \right) \\
&= \det\left( \Gamma_1^T \Gamma_1 - 2t_1 \Gamma_1^T \Lambda_1 \Gamma_1 - 2t_2 \Gamma_1^T \Gamma_1 B \Gamma_1^T \Gamma_1 \right) \\
&= \det \Gamma_1^T \det\left( I_n - 2t_1 \Lambda_1 - 2t_2 \Gamma_1 B \Gamma_1^T \right) \det \Gamma_1 \\
&= \det\left( I_n - 2t_1 \Lambda_1 - 2t_2 D \right)
\end{aligned}
$$
Here, write $D := \Gamma_1 B \Gamma_1^T \in \mathbb{R}^{n \times n}$ as a block matrix, with $D_{11} \in \mathbb{R}^{r \times r}$:
$$
D = \begin{bmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{bmatrix}
$$
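The orthogonal-conjugation step above is easy to confirm numerically for arbitrary symmetric $A$ and $B$ (a sketch; the values of $t_1$ and $t_2$ are arbitrary small numbers):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

w, Vec = np.linalg.eigh(A)
G1 = Vec.T                      # Gamma_1: rows are eigenvectors of A
L1 = np.diag(w)                 # so A = G1^T L1 G1
D = G1 @ B @ G1.T

t1, t2 = 0.07, -0.03
print(np.linalg.det(np.eye(n) - 2*t1*A - 2*t2*B))
print(np.linalg.det(np.eye(n) - 2*t1*L1 - 2*t2*D))   # equal
```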
Leaving $D$ as is and writing down the product of the moment generating functions of $Q_1$ and $Q_2$ gives:
$$
\left[ E\left( e^{t_1 Q_1} \right) \cdot E\left( e^{t_2 Q_2} \right) \right]^{-2} = \prod_{i=1}^r \left( 1 - 2t_1 \lambda_i \right) \det\left( I_n - 2t_2 D \right)
$$
Equating the two expressions above, which the independence assumption makes equal, yields:
$$
\det\left( I_n - 2t_1 \Lambda_1 - 2t_2 D \right) = \prod_{i=1}^r \left( 1 - 2t_1 \lambda_i \right) \cdot \det\left( I_n - 2t_2 D \right)
$$
Here, it is unnecessary to actually compute the determinants and examine every term; investigating the powers of $t_1$ suffices. On the RHS, the coefficient of $(-2t_1)^r$ is easily seen to be $\prod_{i=1}^r \lambda_i \det\left( I_n - 2t_2 D \right)$; the LHS is considerably more challenging. Since $\Lambda_1$ can be written as $\Lambda_1 = \begin{bmatrix} \Lambda_{11} & O \\ O & O \end{bmatrix}$, the variable $t_1$ appears only in the first $r$ diagonal entries of $I_n - 2t_1 \Lambda_1 - 2t_2 D$. The term of degree $r$ in $t_1$ must therefore take $-2t_1 \lambda_i$ from each of those $r$ entries, leaving the complementary minor $\det\left( I_{n-r} - 2t_2 D_{22} \right)$ corresponding to the block $D_{22}$. From this, one ascertains the coefficient of $(-2t_1)^r$ on the LHS is $\prod_{i=1}^r \lambda_i \det\left( I_{n-r} - 2t_2 D_{22} \right)$, and comparing the coefficients of $(-2t_1)^r$ on both sides yields:
$$
\det\left( I_{n-r} - 2t_2 D_{22} \right) = \det\left( I_n - 2t_2 D \right)
$$
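Since the coefficient extraction is the subtlest step of the proof, here is a symbolic spot-check (a sketch using sympy with arbitrary integer entries, not part of the source): it expands the LHS determinant as a polynomial in $t_1$ and compares the coefficient of $t_1^r$ with $(-2)^r \lambda_1 \lambda_2 \det\left( I_{n-r} - 2t_2 D_{22} \right)$.

```python
import sympy as sp

t1, t2 = sp.symbols('t1 t2')
n, r = 4, 2
lam = [3, 5]                                   # non-zero eigenvalues of A
Lambda1 = sp.diag(lam[0], lam[1], 0, 0)
D = sp.Matrix([[1, 2, 0, 1],                   # arbitrary symmetric D
               [2, -1, 3, 0],
               [0, 3, 2, 2],
               [1, 0, 2, -2]])

M = sp.eye(n) - 2*t1*Lambda1 - 2*t2*D
coeff_lhs = sp.Poly(M.det(), t1).coeff_monomial(t1**r)
D22 = D[r:, r:]
coeff_rhs = (-2)**r * lam[0] * lam[1] * (sp.eye(n - r) - 2*t2*D22).det()
print(sp.simplify(coeff_lhs - coeff_rhs))      # 0
```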
The following two claims will now be made:

- The non-zero eigenvalues of $D_{22}$ and $D$ are the same.
- The sum of squares of the elements of a symmetric real matrix is equal to the sum of squares of its eigenvalues.

Claim 1: The non-zero eigenvalues of $D_{22}$ and $D$ are the same.
Properties of Determinants: Let $A$, $B$ be $n \times n$ matrices, and $k$ a constant. The determinant has the following properties:
- (a) $\det(kA) = k^n \det(A)$
For $\lambda \neq 0$ defined as $\lambda := (2t_2)^{-1}$, the following equivalence is obtained:
$$
\begin{aligned}
& \det\left( I_{n-r} - \frac{1}{\lambda} D_{22} \right) = \det\left( I_n - \frac{1}{\lambda} D \right) \\
\implies\ & (-\lambda)^r \cdot (-\lambda)^{n-r} \det\left( I_{n-r} - \frac{1}{\lambda} D_{22} \right) = (-\lambda)^n \det\left( I_n - \frac{1}{\lambda} D \right) \\
\implies\ & (-\lambda)^r \det\left( D_{22} - \lambda I_{n-r} \right) = \det\left( D - \lambda I_n \right)
\end{aligned}
$$
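The middle step, multiplying through by $(-\lambda)^n$ via property (a), can be spot-checked numerically (a sketch with an arbitrary symmetric matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
D = rng.standard_normal((n, n)); D = (D + D.T) / 2

for lam in (0.7, -1.3, 2.5):
    lhs = (-lam)**n * np.linalg.det(np.eye(n) - D / lam)
    rhs = np.linalg.det(D - lam * np.eye(n))
    print(np.isclose(lhs, rhs))   # True: (-lam)^n det(I - D/lam) = det(D - lam I)
```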
Equivalence Condition for Eigenvalues: $\lambda$ is an eigenvalue of $A$ if and only if:
$$
\det(A - \lambda I) = 0
$$
Ultimately, a non-zero $\lambda$ satisfies $\det\left( D_{22} - \lambda I_{n-r} \right) = 0$ if and only if it satisfies $\det\left( D - \lambda I_n \right) = 0$, so the non-zero eigenvalues of $D_{22}$ and $D$ are the same.
Claim 2: The sum of squares of the elements of a symmetric real matrix is equal to the sum of squares of its eigenvalues.
Trace of the Power of a Diagonalizable Matrix: Suppose $A \in \mathbb{C}^{n \times n}$ is a diagonalizable matrix and $k \in \mathbb{N}$ a natural number. For the eigenvalues $\lambda_1, \cdots, \lambda_n$ of $A$, the following holds:
$$
\operatorname{tr} A^k = \sum_{i=1}^n \lambda_i^k
$$
Here, $\operatorname{tr}$ is the trace.
Since the sum of squares of the elements of a symmetric matrix $M = (m_{ij})$ can be represented as the trace $\operatorname{tr} M^T M$ of the matrix squared, the following holds:
$$
\sum_{i,j} m_{ij}^2 = \operatorname{tr} M^T M = \operatorname{tr} M^2 = \sum_{l=1}^n \lambda_l^2
$$
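A one-line numerical confirmation of this identity for a random symmetric matrix (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((6, 6)); M = (M + M.T) / 2
print(np.sum(M**2))                          # sum of squared entries
print(np.sum(np.linalg.eigvalsh(M)**2))      # sum of squared eigenvalues, equal
```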
Both $D$ and $D_{22}$ are real symmetric by definition, and hence diagonalizable, so the sum of squares of the elements of each equals the sum of squares of its eigenvalues. By Claim 1 these eigenvalue sums agree, and hence the sum of squares of the elements of $D$ equals that of $D_{22}$. But the elements of $D$ also include those of $D_{11}$, $D_{12}$, and $D_{21}$, and since all elements are real, their squares are non-negative; therefore $D_{11}$, $D_{12}$, and $D_{21}$ must all be zero matrices. The product of $\Lambda_1$ and $D$ is then calculated as:
$$
\Lambda_1 D = \begin{bmatrix} \Lambda_{11} & O \\ O & O_{n-r} \end{bmatrix} \begin{bmatrix} O_r & O \\ O & D_{22} \end{bmatrix} = O_n
$$
Meanwhile, since $A = \Gamma_1^T \Lambda_1 \Gamma_1$ and $D = \Gamma_1 B \Gamma_1^T$ hold, i.e. $B = \Gamma_1^T D \Gamma_1$, one obtains:
$$
\begin{aligned}
AB &= \Gamma_1^T \Lambda_1 \Gamma_1 \cdot \Gamma_1^T D \Gamma_1 \\
&= \Gamma_1^T \Lambda_1 D \Gamma_1 \\
&= O_n
\end{aligned}
$$
■