Conditional Mean and Variance of the Multivariate Normal Distribution

Formulas

Bivariate Normal Distribution

$$ \left( X, Y \right) \sim N_{2} \left( \begin{bmatrix} \mu_{X} \\ \mu_{Y} \end{bmatrix} , \begin{bmatrix} \sigma_{X}^{2} & \rho \sigma_{X} \sigma_{Y} \\ \rho \sigma_{X} \sigma_{Y} & \sigma_{Y}^{2} \end{bmatrix} \right) $$

When a random vector $\left( X,Y \right)$ follows a bivariate normal distribution as above, $X | Y$ follows a univariate normal distribution and the conditional mean and variance are as follows. $$ \begin{align*} E \left( X | Y \right) =& \mu_{X} + \rho {{ \sigma_{X} } \over { \sigma_{Y} }} \left( Y - \mu_{Y} \right) \\ \operatorname{Var} \left( X | Y \right) =& \left( 1 - \rho^{2} \right) \sigma_{X}^{2} \end{align*} $$
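As a quick numerical sanity check of these formulas (a sketch; the parameter values below are hypothetical, not from the text), one can sample from a bivariate normal distribution, keep only the samples whose $Y$ falls in a thin slice around some $y_{0}$, and compare the empirical mean and variance of $X$ in that slice with the closed-form conditional mean and variance.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
mu_x, mu_y = 1.0, -2.0
sigma_x, sigma_y, rho = 2.0, 3.0, 0.6

rng = np.random.default_rng(0)
cov = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                [rho * sigma_x * sigma_y, sigma_y**2]])
samples = rng.multivariate_normal([mu_x, mu_y], cov, size=2_000_000)

# Condition on Y ≈ y0 by keeping samples in a thin slice around y0.
y0 = 0.5
x_slice = samples[np.abs(samples[:, 1] - y0) < 0.05, 0]

# Closed-form conditional mean and variance from the formulas above.
cond_mean = mu_x + rho * (sigma_x / sigma_y) * (y0 - mu_y)
cond_var = (1 - rho**2) * sigma_x**2

print(x_slice.mean(), cond_mean)  # empirical vs. theoretical mean
print(x_slice.var(), cond_var)    # empirical vs. theoretical variance
```

The slice width trades bias against sample size; with a narrow slice and many draws, the empirical values land close to the formulas.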

Multivariate Normal Distribution 1

$$ \begin{align*} \mathbf{X} =& \begin{bmatrix} \mathbf{X}_{1} \\ \mathbf{X}_{2} \end{bmatrix} & : \Omega \to \mathbb{R}^{n} \\ \mu =& \begin{bmatrix} \mu_{1} \\ \mu_{2} \end{bmatrix} & \in \mathbb{R}^{n} \\ \Sigma =& \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} & \in \mathbb{R}^{n \times n} \end{align*} $$ Given a random vector $\mathbf{X} \sim N_{n} \left( \mu , \Sigma \right)$ following a multivariate normal distribution, with $\mathbf{X}$, $\mu$, $\Sigma$ partitioned into blocks as above and $\mathbf{X}_{1} : \Omega \to \mathbb{R}^{m}$, the conditional random vector $\mathbf{X}_{1} | \mathbf{X}_{2} : \Omega \to \mathbb{R}^{m}$ still follows a multivariate normal distribution, and specifically has the following mean vector and covariance matrix. $$ \mathbf{X}_{1} | \mathbf{X}_{2} \sim N_{m} \left( \mu_{1} + \Sigma_{12} \Sigma_{22}^{-1} \left( \mathbf{X}_{2} - \mu_{2} \right) , \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right) $$
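The general formula translates directly into a small helper (a sketch; the function name `conditional_normal` and the example numbers are my own, not from the text) that slices $\mu$ and $\Sigma$ into blocks and returns the conditional mean and covariance. Reducing it to $n = 2$, $m = 1$ recovers the bivariate formulas above.

```python
import numpy as np

def conditional_normal(mu, Sigma, m, x2):
    """Mean and covariance of X1 | X2 = x2 for X ~ N(mu, Sigma),
    where X1 is the first m coordinates of X."""
    mu1, mu2 = mu[:m], mu[m:]
    S11, S12 = Sigma[:m, :m], Sigma[:m, m:]
    S21, S22 = Sigma[m:, :m], Sigma[m:, m:]
    cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)  # mu1 + S12 S22^{-1}(x2 - mu2)
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)        # S11 - S12 S22^{-1} S21
    return cond_mean, cond_cov

# Sanity check against the bivariate formulas: n = 2, m = 1.
sx, sy, rho = 2.0, 3.0, 0.6
mu = np.array([1.0, -2.0])
Sigma = np.array([[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]])
mean, cov = conditional_normal(mu, Sigma, 1, np.array([0.5]))
# Bivariate formulas: mu_X + rho (sx/sy)(y - mu_Y) and (1 - rho^2) sx^2
print(mean[0], 1.0 + rho * (sx / sy) * (0.5 + 2.0))
print(cov[0, 0], (1 - rho**2) * sx**2)
```

Using `np.linalg.solve` rather than forming $\Sigma_{22}^{-1}$ explicitly is the usual numerically safer choice.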

Derivations

Simple, Complicated, and Special Derivations

Readers unfamiliar with matrix algebra may find even the bivariate normal distribution challenging. The following derivation is simple enough for a high-school graduate to follow, but the development itself is tedious, and it goes no further than the mean and variance of the bivariate case.

The probability density function of the bivariate normal distribution is as follows. $$ f(x,y) = {{ 1 } \over { 2 \pi \sigma_{X} \sigma_{Y} \sqrt{1 - \rho^{2}} }} \exp \left[ - {{ 1 } \over { 2 \left( 1 - \rho^{2} \right) }} \left[ \left( {{ x - \mu_{X} } \over { \sigma_{X} }} \right)^{2} + \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right)^{2}- 2 \rho {{ \left( x - \mu_{X} \right) \left( y - \mu_{Y} \right) } \over { \sigma_{X} \sigma_{Y} }} \right] \right] $$

$$ \begin{align*} & f \left( x | y \right) \\ =& {{ f \left( x , y \right) } \over { f_{Y}(y) }} \\ =& {{ {{ 1 } \over { 2 \pi \sigma_{X} \sigma_{Y} \sqrt{1 - \rho^{2}} }} e^{\left[ - {{ 1 } \over { 2 \left( 1 - \rho^{2} \right) }} \left[ \left( {{ x - \mu_{X} } \over { \sigma_{X} }} \right)^{2} + \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right)^{2}- 2 \rho {{ \left( x - \mu_{X} \right) \left( y - \mu_{Y} \right) } \over { \sigma_{X} \sigma_{Y} }} \right] \right]} } \over { {{ 1 } \over { \sqrt{2 \pi} \sigma_{Y} }} e^{ \left[ - {{ 1 } \over { 2 }} \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right)^{2} \right] } }} \\ =& {{ 1 } \over { \sqrt{2 \pi} \sigma_{X} \sqrt{1 - \rho^{2}} }} {{ e^{\left[ - {{ 1 } \over { 2 \left( 1 - \rho^{2} \right) }} \left[ \left( {{ x - \mu_{X} } \over { \sigma_{X} }} \right)^{2} + \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right)^{2}- 2 \rho {{ \left( x - \mu_{X} \right) \left( y - \mu_{Y} \right) } \over { \sigma_{X} \sigma_{Y} }} \right] \right]} } \over { e^{\left[ - {{ 1 - \rho^{2} } \over { 2 \left( 1 - \rho^{2} \right) }} \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right)^{2} \right]} }} \\ =& {{ 1 } \over { \sqrt{2 \pi} \sigma_{X} \sqrt{1 - \rho^{2}} }} \exp \left[ - {{ 1 } \over { 2 \left( 1 - \rho^{2} \right) }} \left[ \left( {{ x - \mu_{X} } \over { \sigma_{X} }} \right)^{2} + \rho^{2} \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right)^{2}- 2 \rho {{ \left( x - \mu_{X} \right) \left( y - \mu_{Y} \right) } \over { \sigma_{X} \sigma_{Y} }} \right] \right] \\ =& {{ 1 } \over { \sqrt{2 \pi} \sigma_{X} \sqrt{1 - \rho^{2}} }} \exp \left[ - {{ 1 } \over { 2 \left( 1 - \rho^{2} \right) }} \left[ \left( {{ x - \mu_{X} } \over { \sigma_{X} }} \right) - \rho \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right) \right]^{2} \right] \\ =& {{ 1 } \over { \sqrt{2 \pi} \sigma_{X} \sqrt{1 - \rho^{2}} }} \exp \left[ - {{ 1 } \over { 2 \sigma_{X}^{2} \left( 1 - \rho^{2} \right) }} \left[ x - \mu_{X} - \rho \sigma_{X} \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right) \right]^{2} \right] \\ =& {{ 1 } \over { \sqrt{2 \pi} \sigma_{X} \sqrt{1 - \rho^{2}} }} \exp \left[ - {{ 1 } \over { 2 \sigma_{X}^{2} \left( 1 - \rho^{2} \right) }} \left[ x - \mu_{X} - \rho {{ \sigma_{X} } \over { \sigma_{Y} }} \left( y - \mu_{Y} \right) \right]^{2} \right] \end{align*} $$

This is the same as the probability density function of the univariate normal distribution with mean $\mu_{X} + \rho {{ \sigma_{X} } \over { \sigma_{Y} }} \left( y - \mu_{Y} \right)$ and variance $\left( 1 - \rho^{2} \right) \sigma_{X}^{2}$.
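The final line of the derivation can also be checked numerically (a sketch; the parameter and evaluation-point values are hypothetical): compute $f(x|y) = f(x,y) / f_{Y}(y)$ directly from the joint and marginal densities, and compare it with the univariate normal density at the derived mean and variance.

```python
import math

# Hypothetical parameters, chosen only for illustration.
mu_x, mu_y, sx, sy, rho = 1.0, -2.0, 2.0, 3.0, 0.6

def f_joint(x, y):
    """Bivariate normal density f(x, y)."""
    zx, zy = (x - mu_x) / sx, (y - mu_y) / sy
    q = (zx**2 + zy**2 - 2 * rho * zx * zy) / (1 - rho**2)
    return math.exp(-q / 2) / (2 * math.pi * sx * sy * math.sqrt(1 - rho**2))

def f_normal(x, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-(x - mean)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

x, y = 0.7, 0.5
cond = f_joint(x, y) / f_normal(y, mu_y, sy**2)  # f(x | y) = f(x, y) / f_Y(y)
mean = mu_x + rho * (sx / sy) * (y - mu_y)       # derived conditional mean
var = (1 - rho**2) * sx**2                        # derived conditional variance
print(cond, f_normal(x, mean, var))               # the two values agree
```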

Difficult, Simple, and General Derivations

Let $\mathbf{W} := \mathbf{X}_{1} - \Sigma_{12} \Sigma_{22}^{-1} \mathbf{X}_{2}$ and $p := n - m \in \mathbb{N}$. Then, for the identity matrix $I_{k} \in \mathbb{R}^{k \times k}$ and zero matrix $O \in \mathbb{R}^{p \times m}$, it can be represented as follows. $$ \begin{bmatrix} \mathbf{W} \\ \mathbf{X}_{2} \end{bmatrix} = \begin{bmatrix} I_{m} & - \Sigma_{12} \Sigma_{22}^{-1} \\ O & I_{p} \end{bmatrix} \begin{bmatrix} \mathbf{X}_{1} \\ \mathbf{X}_{2} \end{bmatrix} $$

Normality of Linear Transformations: For a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $\mathbf{b} \in \mathbb{R}^{m}$, the linear transformation $\mathbf{Y} = A \mathbf{X} + \mathbf{b}$ of a random vector $\mathbf{X} \sim N_{n} \left( \mu , \Sigma \right)$ following a multivariate normal distribution still follows a multivariate normal distribution $N_{m} \left( A \mu + \mathbf{b} , A \Sigma A^{T} \right)$.

Since $\begin{bmatrix} \mathbf{W} \\ \mathbf{X}_{2} \end{bmatrix}$ is a linear transformation of a multivariate normal random vector, it again follows a multivariate normal distribution. In particular, $\mathbf{W}$ has mean vector $$ E \mathbf{W} = \mu_{1} - \Sigma_{12} \Sigma_{22}^{-1} \mu_{2} $$ and, writing $A := \begin{bmatrix} I_{m} & - \Sigma_{12} \Sigma_{22}^{-1} \\ O & I_{p} \end{bmatrix}$, the covariance matrix is obtained as follows. $$ \begin{align*} \operatorname{Cov} \begin{bmatrix} \mathbf{W} \\ \mathbf{X}_{2} \end{bmatrix} =& A \Sigma A^{T} \\ =& \begin{bmatrix} I_{m} & - \Sigma_{12} \Sigma_{22}^{-1} \\ O & I_{p} \end{bmatrix} \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \begin{bmatrix} I_{m} & O^{T} \\ - \Sigma_{22}^{-1} \Sigma_{21} & I_{p} \end{bmatrix} \\ =& \begin{bmatrix} \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} & O^{T} \\ O & \Sigma_{22} \end{bmatrix} \\ \implies \operatorname{Cov} \mathbf{W} =& \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \end{align*} $$ Here, $A^{T}$ is the transpose of $A$.
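The block computation $A \Sigma A^{T}$ can be verified for a concrete covariance matrix (a sketch; the random positive definite $\Sigma$ and block sizes below are hypothetical): the top-left block should be the Schur complement $\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$, the off-diagonal blocks should vanish, and the bottom-right block should remain $\Sigma_{22}$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p = 2, 3
n = m + p

# Random symmetric positive definite covariance matrix (hypothetical example).
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)
S11, S12 = Sigma[:m, :m], Sigma[:m, m:]
S21, S22 = Sigma[m:, :m], Sigma[m:, m:]

# A = [[I, -Sigma12 Sigma22^{-1}], [O, I]] from the derivation.
A = np.block([[np.eye(m), -S12 @ np.linalg.inv(S22)],
              [np.zeros((p, m)), np.eye(p)]])
C = A @ Sigma @ A.T

schur = S11 - S12 @ np.linalg.inv(S22) @ S21  # Sigma11 - Sigma12 Sigma22^{-1} Sigma21
print(np.allclose(C[:m, :m], schur))  # top-left block is the Schur complement
print(np.allclose(C[:m, m:], 0))      # off-diagonal blocks vanish
print(np.allclose(C[m:, m:], S22))    # bottom-right block is unchanged
```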

Equivalence of Independence and Zero Covariance: Let’s assume that $\left( \mathbf{X}_{1} , \mathbf{X}_{2} \right) \sim N_{n} \left( \mu , \Sigma \right)$ follows a multivariate normal distribution. $$ \mathbf{X}_{1} \perp \mathbf{X}_{2} \iff \Sigma_{12} = \Sigma_{21} = O $$

Since $\mathbf{X}$ follows a multivariate normal distribution, zero covariance $\operatorname{Cov} \left( \mathbf{W} , \mathbf{X}_{2} \right) = O$ guarantees the independence of $\mathbf{W}$ and $\mathbf{X}_{2}$. Therefore, $\mathbf{W} | \mathbf{X}_{2}$ is simply $\mathbf{W}$ itself without any condition, and it can be represented as follows. $$ \mathbf{W} | \mathbf{X}_{2} \sim N_{m} \left( \mu_{1} - \Sigma_{12} \Sigma_{22}^{-1} \mu_{2} , \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right) $$ Since $\mathbf{W}$ was defined to satisfy $\mathbf{X}_{1} = \mathbf{W} + \Sigma_{12} \Sigma_{22}^{-1} \mathbf{X}_{2}$, reversing the transformation yields the following. $$ \mathbf{X}_{1} | \mathbf{X}_{2} \sim N_{m} \left( \mu_{1} + \Sigma_{12} \Sigma_{22}^{-1} \left( \mathbf{X}_{2} - \mu_{2} \right) , \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right) $$
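As a final check of the general result (a sketch; the 3-dimensional $\mu$, $\Sigma$, and conditioning value below are hypothetical), one can sample $\mathbf{X} \sim N_{3}(\mu, \Sigma)$ with $\mathbf{X}_{1} \in \mathbb{R}^{2}$ and scalar $\mathbf{X}_{2}$, keep the samples whose $\mathbf{X}_{2}$ lies in a thin slice around a fixed value, and compare the empirical mean of $\mathbf{X}_{1}$ in that slice with $\mu_{1} + \Sigma_{12} \Sigma_{22}^{-1} \left( x_{2} - \mu_{2} \right)$.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical 3-dimensional example: X1 in R^2, X2 in R (m = 2, p = 1).
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.8],
                  [0.5, 1.5, -0.3],
                  [0.8, -0.3, 1.0]])  # symmetric positive definite
X = rng.multivariate_normal(mu, Sigma, size=2_000_000)

x2 = 0.0
mask = np.abs(X[:, 2] - x2) < 0.05  # condition on X2 ≈ x2
empirical_mean = X[mask, :2].mean(axis=0)

# Theoretical conditional mean: mu1 + Sigma12 Sigma22^{-1} (x2 - mu2).
S12, S22 = Sigma[:2, 2:], Sigma[2:, 2:]
cond_mean = mu[:2] + S12 @ np.linalg.inv(S22) @ (np.array([x2]) - mu[2:])
print(empirical_mean, cond_mean)  # the two vectors agree closely
```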


  1. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p185. ↩︎