The Moment Generating Function of a Quadratic Form of a Normally Distributed Random Vector
Theorem
Let the sample $\mathbf{X} = \left( X_{1} , \cdots , X_{n} \right)$ consist of iid normal random variables, that is, $X_{1} , \cdots , X_{n} \overset{\text{iid}}{\sim} N \left( 0, \sigma^{2} \right)$, and consider a Symmetric Matrix $A \in \mathbb{R}^{n \times n}$ with Rank $r \le n$. The Moment Generating Function of the Quadratic Form of a Random Vector $Q = \sigma^{-2} \mathbf{X}^{T} A \mathbf{X}$ is expressed as follows: $$ M_{Q} (t) = \prod_{i=1}^{r} \left( 1 - 2 t \lambda_{i} \right)^{-1/2} = \det \left( I_{n} - 2 t A \right)^{-1/2} \qquad , | t | < 1 / ( 2 \lambda_{1} ) $$ where $I_{n} \in \mathbb{R}^{n \times n}$ is the Identity Matrix and $\det$ is the Determinant. The values $\lambda_{1} \ge \cdots \ge \lambda_{r}$ are the nonzero Eigenvalues of $A$, arranged in descending order without loss of generality.
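As a quick numerical sanity check, the following sketch compares a Monte Carlo estimate of $M_{Q}(t)$ with both closed forms from the theorem. It is a minimal illustration, assuming NumPy; the rank-2 construction of $A$, the value of $\sigma$, the sample size, and the choice of $t$ are all arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a rank-deficient symmetric matrix A (here n = 4, rank r = 2).
n, sigma = 4, 1.5
B = rng.standard_normal((n, 2))
A = B @ B.T                                # symmetric, rank 2
eigvals = np.linalg.eigvalsh(A)
lams = eigvals[np.abs(eigvals) > 1e-10]    # the r nonzero eigenvalues
t = 0.1 / (2 * np.max(np.abs(lams)))       # safely inside |t| < 1/(2*lambda_1)

# Monte Carlo estimate of M_Q(t) = E[exp(t * X^T A X / sigma^2)].
X = sigma * rng.standard_normal((1_000_000, n))
Q = np.einsum('ij,jk,ik->i', X, A, X) / sigma**2
mc = np.exp(t * Q).mean()

# The two closed forms from the theorem.
prod_form = np.prod(1 - 2 * t * lams) ** (-0.5)
det_form = np.linalg.det(np.eye(n) - 2 * t * A) ** (-0.5)

print(mc, prod_form, det_form)   # all three should agree closely
```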
Description
This theorem is utilized in the Proof of the Hogg-Craig Theorem.
Proof 1
Let the $n$-dimensional Zero Vector be denoted by $\mathbf{0}_{n}$.
Spectral Decomposition: In Spectral Theory, a symmetric matrix is orthogonally diagonalized as $A = \Gamma^{T} \Lambda \Gamma$, where $\Gamma \in \mathbb{R}^{n \times n}$ is an Orthogonal Matrix whose rows are the eigenvectors $e_{k}^{T}$ and $\Lambda = \operatorname{diag} \left( \lambda_{1} , \cdots , \lambda_{n} \right)$; equivalently, $A$ is expressed in terms of a summation of Eigenpairs $\left\{ \left( \lambda_{k} , e_{k} \right) \right\}_{k=1}^{n}$ as follows: $$ A = \sum_{k=1}^{n} \lambda_{k} e_{k} e_{k}^{T} $$
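The rank-one expansion can be verified directly; a minimal sketch with NumPy (the random symmetric matrix is an arbitrary choice) rebuilds $A$ from the eigenpairs returned by `numpy.linalg.eigh`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random symmetric matrix and its eigendecomposition.
n = 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
lams, vecs = np.linalg.eigh(A)      # columns of `vecs` are the eigenvectors e_k

# Rebuild A as the sum of rank-one eigenpair terms lambda_k e_k e_k^T.
A_rebuilt = sum(lams[k] * np.outer(vecs[:, k], vecs[:, k]) for k in range(n))
print(np.allclose(A, A_rebuilt))    # True
```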
Since $A$ is a Symmetric Matrix with Rank $r$, only $r$ of its eigenvalues are nonzero, and $Q$ can be represented via spectral decomposition as follows: $$ \begin{align*} Q =& \sigma^{-2} \mathbf{X}^{T} A \mathbf{X} \\ =& \sigma^{-2} \mathbf{X}^{T} \sum_{i=1}^{n} \lambda_{i} e_{i} e_{i}^{T} \mathbf{X} \\ =& \sum_{i=1}^{r} \lambda_{i} \left( \mathbf{X}^{T} e_{i} \sigma^{-1} \right) \left( \sigma^{-1} e_{i}^{T} \mathbf{X} \right) \\ =& \sum_{i=1}^{r} \lambda_{i} \left( \sigma^{-1} e_{i}^{T} \mathbf{X} \right)^{T} \left( \sigma^{-1} e_{i}^{T} \mathbf{X} \right) \\ =& \sum_{i=1}^{r} \lambda_{i} \left( \sigma^{-1} e_{i}^{T} \mathbf{X} \right)^{2} \end{align*} $$ where the terms with $\lambda_{r+1} = \cdots = \lambda_{n} = 0$ vanish. Let $\Gamma_{1} \in \mathbb{R}^{r \times n}$ be the matrix whose rows are $e_{1}^{T} , \cdots , e_{r}^{T}$, that is, the first $r$ rows of $\Gamma$, and if we consider the Random Vector $\mathbf{W}$ to be $\mathbf{W} = \sigma^{-1} \Gamma_{1} \mathbf{X}$, then $\mathbf{W} = \left( W_{1} , \cdots , W_{r} \right)$ becomes an $r$-dimensional random vector. $$ \begin{bmatrix} W_{1} \\ \vdots \\ W_{r} \end{bmatrix} = \mathbf{W} = \sigma^{-1} \Gamma_{1} \mathbf{X} = \begin{bmatrix} \sigma^{-1} e_{1}^{T} \mathbf{X} \\ \vdots \\ \sigma^{-1} e_{r}^{T} \mathbf{X} \end{bmatrix} $$ Therefore, $Q$ can be expressed as follows: $$ Q = \sum_{i=1}^{r} \lambda_{i} \left( \sigma^{-1} e_{i}^{T} \mathbf{X} \right)^{2} = \sum_{i=1}^{r} \lambda_{i} W_{i}^{2} $$
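This reduction holds pointwise for every realization of $\mathbf{X}$, which the following sketch checks for a single draw (again a minimal illustration assuming NumPy, with an arbitrary rank-2 matrix and $\sigma$).

```python
import numpy as np

rng = np.random.default_rng(2)

# Rank-2 symmetric A, one sample X, and the reduction Q = sum_i lambda_i W_i^2.
n, sigma = 4, 2.0
B = rng.standard_normal((n, 2))
A = B @ B.T
lams, vecs = np.linalg.eigh(A)
keep = np.abs(lams) > 1e-10              # indices of the r nonzero eigenvalues
Gamma1 = vecs[:, keep].T                 # r x n matrix with rows e_i^T

X = sigma * rng.standard_normal(n)
Q_direct = X @ A @ X / sigma**2
W = Gamma1 @ X / sigma
Q_reduced = np.sum(lams[keep] * W**2)
print(np.isclose(Q_direct, Q_reduced))   # True
```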
Since each component of the random vector $\mathbf{X}$ follows the normal distribution $N \left( 0 , \sigma^{2} \right)$, $\mathbf{X}$ follows a Multivariate Normal Distribution $N_{n} \left( \mathbf{0}_{n} , \sigma^{2} I_{n} \right)$, and since the rows of $\Gamma_{1}$ are orthonormal eigenvectors, it follows from the definition of $\Gamma_{1}$ that $\Gamma_{1} \Gamma_{1}^{T} = I_{r}$.
Normality of Linear Transformations of Multivariate Normal Distributions: Given a Matrix $A \in \mathbb{R}^{m \times n}$ and a Vector $\mathbf{b} \in \mathbb{R}^{m}$, the Linear Transformation $\mathbf{Y} = A \mathbf{X} + \mathbf{b}$ of a Random Vector $\mathbf{X} \sim N_{n} \left( \mu , \Sigma \right)$ following a Multivariate Normal Distribution still follows a multivariate normal distribution $N_{m} \left( A \mu + \mathbf{b} , A \Sigma A^{T} \right)$.
By the Normality of Linear Transformations of Multivariate Normal Distributions, $\mathbf{W}$ is found to follow the $r$-dimensional multivariate normal distribution $N_{r} \left( \mathbf{0}_{r} , I_{r} \right)$ as follows: $$ \begin{align*} \mathbf{W} =& \sigma^{-1} \Gamma_{1} \mathbf{X} + \mathbf{0}_{r} \\ \implies \mathbf{W} \sim & N_{r} \left( \sigma^{-1} \Gamma_{1} \mathbf{0}_{n} + \mathbf{0}_{r} , \left( \sigma^{-1} \Gamma_{1} \right) \left( \sigma^{2} I_{n} \right) \left( \sigma^{-1} \Gamma_{1} \right)^{T} \right) \\ \implies \mathbf{W} \sim & N_{r} \left( \mathbf{0}_{r} + \mathbf{0}_{r} , \Gamma_{1} I_{n} \Gamma_{1}^{T} \right) \\ \implies \mathbf{W} \sim & N_{r} \left( \mathbf{0}_{r} , I_{r} \right) \end{align*} $$
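Both facts used here are easy to observe empirically. The sketch below (assuming NumPy; the matrix construction and sample size are arbitrary) checks $\Gamma_{1} \Gamma_{1}^{T} = I_{r}$ exactly and the covariance of $\mathbf{W}$ approximately.

```python
import numpy as np

rng = np.random.default_rng(3)

n, r, sigma = 4, 2, 1.5
B = rng.standard_normal((n, r))
A = B @ B.T
lams, vecs = np.linalg.eigh(A)
Gamma1 = vecs[:, np.abs(lams) > 1e-10].T           # r x n, orthonormal rows

# Gamma1 has orthonormal rows, so Gamma1 Gamma1^T = I_r ...
print(np.allclose(Gamma1 @ Gamma1.T, np.eye(r)))   # True

# ... and W = Gamma1 X / sigma has sample covariance close to I_r.
X = sigma * rng.standard_normal((200_000, n))
W = X @ Gamma1.T / sigma
print(np.round(np.cov(W.T), 2))                    # approximately the identity
```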
Derivation of the Chi-Squared Distribution in Standard Normal Distribution: If $X \sim N(\mu,\sigma ^2)$, $$ V=\left( { X - \mu \over \sigma} \right) ^2 \sim \chi ^2 (1) $$
This indicates that the components $W_{1} , \cdots , W_{r}$ of $\mathbf{W}$ are iid, each following a Standard Normal Distribution, so each $W_{i}^{2}$ follows a Chi-Squared Distribution $\chi^{2} (1)$.
Moment Generating Function of the Chi-Squared Distribution: The Moment Generating Function of a random variable following a chi-squared distribution with Degrees of Freedom $r$ is expressed as follows: $$m(t) = (1-2t)^{-r/2} \qquad , t < {{ 1 } \over { 2 }}$$
Therefore, since $Q$ is a Linear Combination of independent Random Variables following a chi-squared distribution, its Moment Generating Function is as follows: $$ \begin{align*} & M_{Q} (t) \\ =& E \left[ \exp \left( t Q \right) \right] \\ =& E \left[ \exp \left( t \sum_{i=1}^{r} \lambda_{i} W_{i}^{2} \right) \right] \\ =& E \left[ \prod_{i=1}^{r} \exp \left( t \lambda_{i} W_{i}^{2} \right) \right] \\ =& \prod_{i=1}^{r} E \left[ \exp \left( t \lambda_{i} W_{i}^{2} \right) \right] \\ =& \prod_{i=1}^{r} \left( 1 - 2 t \lambda_{i} \right)^{-1/2} \qquad , | t | < 1 / ( 2 \lambda_{1} ) \end{align*} $$ where the expectation factors into a product by the independence of $W_{1} , \cdots , W_{r}$, and the final equality applies the chi-squared moment generating function at $t \lambda_{i}$.
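Each factor of this product can be checked on its own; the sketch below (a minimal illustration assuming NumPy, with arbitrary values of $\lambda$ and $t$) compares a Monte Carlo estimate of one factor against the chi-squared MGF evaluated at $t \lambda$.

```python
import numpy as np

rng = np.random.default_rng(4)

# One factor of the product: E[exp(t * lam * W^2)] for W ~ N(0, 1) should
# match (1 - 2 t lam)^{-1/2}, the chi-squared MGF evaluated at t * lam.
lam, t = 1.7, 0.1                        # arbitrary values with |t * lam| < 1/2
W = rng.standard_normal(1_000_000)
print(np.exp(t * lam * W**2).mean())     # Monte Carlo estimate
print((1 - 2 * t * lam) ** (-0.5))       # closed form; agrees closely
```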
Properties of Orthogonal Matrices: The determinant of an orthogonal matrix is either $1$ or $-1$.
Finally, since the full matrix of eigenvectors $\Gamma \in \mathbb{R}^{n \times n}$ is an orthogonal matrix, its determinant is $\pm 1$, $$ 1 = \det I_{n} = \det \left( \Gamma^{T} \Gamma \right) = \det \Gamma^{T} \det \Gamma $$ and since $\det \Gamma^{T} = \det \Gamma$, the two factors share the same sign, whether $1$ or $-1$, so their product is $1$. By obtaining the determinant of $I_{n} - 2 t A$, an alternative form of $M_{Q} (t)$ can be derived. $$ \begin{align*} & \det \left( I_{n} - 2 t A \right) \\ =& \det \left( \Gamma^{T} \Gamma - 2 t \Gamma^{T} \Lambda \Gamma \right) \\ =& \det \left( \Gamma^{T} \left( I_{n} - 2 t \Lambda \right) \Gamma \right) \\ =& \det \Gamma^{T} \det \left( I_{n} - 2 t \Lambda \right) \det \Gamma \\ =& \left( \pm 1 \right) \cdot \det \left( I_{n} - 2 t \Lambda \right) \cdot \left( \pm 1 \right) \\ =& \det \left( I_{n} - 2 t \Lambda \right) \\ =& \det \begin{bmatrix} 1 - 2 t \lambda_{1} & 0 & \cdots & 0 \\ 0 & 1 - 2 t \lambda_{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \\ =& \prod_{i=1}^{r} \left( 1 - 2 t \lambda_{i} \right) \end{align*} $$ where the last $n - r$ diagonal entries equal $1$ because $\lambda_{r+1} = \cdots = \lambda_{n} = 0$. The proof is concluded by raising both sides to the power $-1/2$. $$ \det \left( I_{n} - 2 t A \right)^{-1/2} = \prod_{i=1}^{r} \left( 1 - 2 t \lambda_{i} \right)^{-1/2} $$
■
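The determinant identity at the heart of this last step can also be confirmed numerically; the following sketch (assuming NumPy, with an arbitrary rank-3 matrix and choice of $t$) compares $\det(I_{n} - 2tA)$ with the product over the nonzero eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(5)

# Check det(I_n - 2tA) = prod_i (1 - 2t lambda_i) over the nonzero eigenvalues.
n = 5
B = rng.standard_normal((n, 3))
A = B @ B.T                                   # symmetric, rank 3
lams = np.linalg.eigvalsh(A)
nonzero = lams[np.abs(lams) > 1e-10]
t = 0.1 / (2 * np.max(np.abs(nonzero)))

lhs = np.linalg.det(np.eye(n) - 2 * t * A)
rhs = np.prod(1 - 2 * t * nonzero)            # zero eigenvalues contribute factor 1
print(np.isclose(lhs, rhs))                   # True
```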
Hogg et al. (2018). Introduction to Mathematical Statistics (8th Edition): pp. 557–558. ↩︎