
Conditional Mean and Variance of the Multivariate Normal Distribution 📂Probability Distribution


Formulas

Bivariate Normal Distribution

$$\left( X, Y \right) \sim N_{2} \left( \begin{bmatrix} \mu_{X} \\ \mu_{Y} \end{bmatrix} , \begin{bmatrix} \sigma_{X}^{2} & \rho \sigma_{X} \sigma_{Y} \\ \rho \sigma_{X} \sigma_{Y} & \sigma_{Y}^{2} \end{bmatrix} \right)$$

When a random vector $\left( X, Y \right)$ follows the bivariate normal distribution above, $X | Y$ follows a univariate normal distribution, and its conditional mean and variance are as follows.
$$\begin{align*} E \left( X | Y \right) =& \mu_{X} + \rho {{ \sigma_{X} } \over { \sigma_{Y} }} \left( Y - \mu_{Y} \right) \\ \operatorname{Var} \left( X | Y \right) =& \left( 1 - \rho^{2} \right) \sigma_{X}^{2} \end{align*}$$
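These formulas can be checked by simulation: since $E(X|Y)$ is linear in $Y$, the population regression slope of $X$ on $Y$ should approach $\rho \sigma_{X} / \sigma_{Y}$, and the residual variance should approach $\left( 1 - \rho^{2} \right) \sigma_{X}^{2}$. Below is a minimal pure-Python sketch with illustrative parameters (the numbers are hypothetical, not from the text).

```python
import math
import random

random.seed(0)

# illustrative parameters (hypothetical, not from the text)
mu_x, mu_y = 1.0, -2.0
s_x, s_y, rho = 2.0, 0.5, 0.6

# sample (X, Y) from the bivariate normal via two independent standard normals
N = 100_000
xs, ys = [], []
for _ in range(N):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    y = mu_y + s_y * z2
    x = mu_x + s_x * (rho * z2 + math.sqrt(1 - rho ** 2) * z1)  # Corr(X, Y) = rho
    xs.append(x)
    ys.append(y)

mx, my = sum(xs) / N, sum(ys) / N
sxx = sum((x - mx) ** 2 for x in xs) / N
syy = sum((y - my) ** 2 for y in ys) / N
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / N

slope = sxy / syy             # estimates rho * s_x / s_y = 2.4
resid = sxx - sxy ** 2 / syy  # estimates (1 - rho**2) * s_x**2 = 2.56
print(slope, resid)
```

With these parameters the slope estimate lands near $0.6 \cdot 2.0 / 0.5 = 2.4$ and the residual variance near $\left( 1 - 0.36 \right) \cdot 4 = 2.56$.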

Multivariate Normal Distribution [1]

$$\begin{align*} \mathbf{X} =& \begin{bmatrix} \mathbf{X}_{1} \\ \mathbf{X}_{2} \end{bmatrix} & : \Omega \to \mathbb{R}^{n} \\ \mu =& \begin{bmatrix} \mu_{1} \\ \mu_{2} \end{bmatrix} & \in \mathbb{R}^{n} \\ \Sigma =& \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} & \in \mathbb{R}^{n \times n} \end{align*}$$
Let a random vector $\mathbf{X} \sim N_{n} \left( \mu , \Sigma \right)$ following a multivariate normal distribution be partitioned in block form as $\mathbf{X}$, $\mu$, $\Sigma$ above, with $\mathbf{X}_{1} : \Omega \to \mathbb{R}^{m}$. Then the conditional random vector $\mathbf{X}_{1} | \mathbf{X}_{2} : \Omega \to \mathbb{R}^{m}$ still follows a multivariate normal distribution, with the following mean vector and covariance matrix.
$$\mathbf{X}_{1} | \mathbf{X}_{2} \sim N_{m} \left( \mu_{1} + \Sigma_{12} \Sigma_{22}^{-1} \left( \mathbf{X}_{2} - \mu_{2} \right) , \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)$$

Derivations

Simple, Complicated, and Special Derivations

Readers not familiar with matrix algebra may find even the bivariate normal distribution challenging. The following derivation is simple enough to follow with only high-school algebra, but the development itself is complicated, and it covers only the mean and variance of the bivariate normal distribution.

The probability density function of the bivariate normal distribution is as follows.
$$f(x,y) = {{ 1 } \over { 2 \pi \sigma_{X} \sigma_{Y} \sqrt{1 - \rho^{2}} }} \exp \left[ - {{ 1 } \over { 2 \left( 1 - \rho^{2} \right) }} \left[ \left( {{ x - \mu_{X} } \over { \sigma_{X} }} \right)^{2} + \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right)^{2} - 2 \rho {{ \left( x - \mu_{X} \right) \left( y - \mu_{Y} \right) } \over { \sigma_{X} \sigma_{Y} }} \right] \right]$$

Dividing the joint density by the marginal density $f_{Y}(y)$ of $Y \sim N \left( \mu_{Y} , \sigma_{Y}^{2} \right)$ and completing the square in the exponent:
$$\begin{align*} & f \left( x | y \right) \\ =& {{ f \left( x , y \right) } \over { f_{Y}(y) }} \\ =& {{ {{ 1 } \over { 2 \pi \sigma_{X} \sigma_{Y} \sqrt{1 - \rho^{2}} }} e^{\left[ - {{ 1 } \over { 2 \left( 1 - \rho^{2} \right) }} \left[ \left( {{ x - \mu_{X} } \over { \sigma_{X} }} \right)^{2} + \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right)^{2} - 2 \rho {{ \left( x - \mu_{X} \right) \left( y - \mu_{Y} \right) } \over { \sigma_{X} \sigma_{Y} }} \right] \right]} } \over { {{ 1 } \over { \sqrt{2 \pi} \sigma_{Y} }} e^{ \left[ - {{ 1 } \over { 2 }} \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right)^{2} \right] } }} \\ =& {{ 1 } \over { \sqrt{2 \pi} \sigma_{X} \sqrt{1 - \rho^{2}} }} {{ e^{\left[ - {{ 1 } \over { 2 \left( 1 - \rho^{2} \right) }} \left[ \left( {{ x - \mu_{X} } \over { \sigma_{X} }} \right)^{2} + \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right)^{2} - 2 \rho {{ \left( x - \mu_{X} \right) \left( y - \mu_{Y} \right) } \over { \sigma_{X} \sigma_{Y} }} \right] \right]} } \over { e^{\left[ - {{ 1 - \rho^{2} } \over { 2 \left( 1 - \rho^{2} \right) }} \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right)^{2} \right]} }} \\ =& {{ 1 } \over { \sqrt{2 \pi} \sigma_{X} \sqrt{1 - \rho^{2}} }} \exp \left[ - {{ 1 } \over { 2 \left( 1 - \rho^{2} \right) }} \left[ \left( {{ x - \mu_{X} } \over { \sigma_{X} }} \right)^{2} + \rho^{2} \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right)^{2} - 2 \rho {{ \left( x - \mu_{X} \right) \left( y - \mu_{Y} \right) } \over { \sigma_{X} \sigma_{Y} }} \right] \right] \\ =& {{ 1 } \over { \sqrt{2 \pi} \sigma_{X} \sqrt{1 - \rho^{2}} }} \exp \left[ - {{ 1 } \over { 2 \left( 1 - \rho^{2} \right) }} \left[ \left( {{ x - \mu_{X} } \over { \sigma_{X} }} \right) - \rho \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right) \right]^{2} \right] \\ =& {{ 1 } \over { \sqrt{2 \pi} \sigma_{X} \sqrt{1 - \rho^{2}} }} \exp \left[ - {{ 1 } \over { 2 \sigma_{X}^{2} \left( 1 - \rho^{2} \right) }} \left[ x - \mu_{X} - \rho \sigma_{X} \left( {{ y - \mu_{Y} } \over { \sigma_{Y} }} \right) \right]^{2} \right] \\ =& {{ 1 } \over { \sqrt{2 \pi} \sigma_{X} \sqrt{1 - \rho^{2}} }} \exp \left[ - {{ 1 } \over { 2 \sigma_{X}^{2} \left( 1 - \rho^{2} \right) }} \left[ x - \mu_{X} - \rho {{ \sigma_{X} } \over { \sigma_{Y} }} \left( y - \mu_{Y} \right) \right]^{2} \right] \end{align*}$$

This is exactly the probability density function of a univariate normal distribution with mean $\mu_{X} + \rho {{ \sigma_{X} } \over { \sigma_{Y} }} \left( Y - \mu_{Y} \right)$ and variance $\left( 1 - \rho^{2} \right) \sigma_{X}^{2}$.

Difficult, Simple, and General Derivations

Let $\mathbf{W} := \mathbf{X}_{1} - \Sigma_{12} \Sigma_{22}^{-1} \mathbf{X}_{2}$ and $p := (n - m) \in \mathbb{N}$. Then, for identity matrices $I_{k} \in \mathbb{R}^{k \times k}$ and the zero matrix $O \in \mathbb{R}^{p \times m}$, this can be written as follows.
$$\begin{bmatrix} \mathbf{W} \\ \mathbf{X}_{2} \end{bmatrix} = \begin{bmatrix} I_{m} & - \Sigma_{12} \Sigma_{22}^{-1} \\ O & I_{p} \end{bmatrix} \begin{bmatrix} \mathbf{X}_{1} \\ \mathbf{X}_{2} \end{bmatrix}$$

Normality of Linear Transformations: For a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $\mathbf{b} \in \mathbb{R}^{m}$, the linear transformation $\mathbf{Y} = A \mathbf{X} + \mathbf{b}$ of a random vector $\mathbf{X} \sim N_{n} \left( \mu , \Sigma \right)$ following a multivariate normal distribution still follows the multivariate normal distribution $N_{m} \left( A \mu + \mathbf{b} , A \Sigma A^{T} \right)$.

Since $\begin{bmatrix} \mathbf{W} \\ \mathbf{X}_{2} \end{bmatrix} = A \mathbf{X}$ is a linear transformation of a multivariate normal random vector with $A = \begin{bmatrix} I_{m} & - \Sigma_{12} \Sigma_{22}^{-1} \\ O & I_{p} \end{bmatrix}$, it is again multivariate normal; in particular, $\mathbf{W}$ has mean vector $E \mathbf{W} = \mu_{1} - \Sigma_{12} \Sigma_{22}^{-1} \mu_{2}$ and the following covariance matrix.
$$\begin{align*} \operatorname{Cov} \begin{bmatrix} \mathbf{W} \\ \mathbf{X}_{2} \end{bmatrix} =& A \Sigma A^{T} \\ =& \begin{bmatrix} I_{m} & - \Sigma_{12} \Sigma_{22}^{-1} \\ O & I_{p} \end{bmatrix} \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \begin{bmatrix} I_{m} & O^{T} \\ - \Sigma_{22}^{-1} \Sigma_{21} & I_{p} \end{bmatrix} \\ =& \begin{bmatrix} \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} & O^{T} \\ O & \Sigma_{22} \end{bmatrix} \\ \implies \operatorname{Cov} \mathbf{W} =& \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \end{align*}$$
Here, $A^{T}$ is the transpose of $A$.
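This block computation can be verified numerically: for any concrete partitioned covariance, $A \Sigma A^{T}$ should have the Schur complement $\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$ in the top-left block and zero off-diagonal blocks. A pure-Python sketch with an illustrative $3 \times 3$ covariance (the numbers are hypothetical), taking $m = 1$ and $p = 2$:

```python
def matmul(A, B):
    # naive matrix product for lists of lists
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

# partitioned covariance with m = 1, p = 2 (illustrative numbers)
S = [[4.0, 1.0, 0.5],
     [1.0, 2.0, 0.3],
     [0.5, 0.3, 1.0]]
S12 = [S[0][1:]]                  # 1x2 block
S21 = [[S[1][0]], [S[2][0]]]      # 2x1 block
S22 = [row[1:] for row in S[1:]]  # 2x2 block

# analytic inverse of the 2x2 block S22
det = S22[0][0] * S22[1][1] - S22[0][1] * S22[1][0]
S22inv = [[ S22[1][1] / det, -S22[0][1] / det],
          [-S22[1][0] / det,  S22[0][0] / det]]

B = matmul(S12, S22inv)           # 1x2 block  Sigma12 Sigma22^{-1}
A = [[1.0, -B[0][0], -B[0][1]],   # [[I, -Sigma12 Sigma22^{-1}], [O, I]]
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

C = matmul(matmul(A, S), transpose(A))  # A Sigma A^T

schur = S[0][0] - matmul(B, S21)[0][0]  # Sigma11 - Sigma12 Sigma22^{-1} Sigma21
print(C[0][1], C[0][2])  # off-diagonal block: both near 0
print(C[0][0], schur)    # top-left block equals the Schur complement
```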

Equivalence of Independence and Zero Covariance: Suppose $\left( \mathbf{X}_{1} , \mathbf{X}_{2} \right) \sim N_{n} \left( \mu , \Sigma \right)$ follows a multivariate normal distribution. Then
$$\mathbf{X}_{1} \perp \mathbf{X}_{2} \iff \Sigma_{12} = \Sigma_{21} = O$$

Since $\begin{bmatrix} \mathbf{W} \\ \mathbf{X}_{2} \end{bmatrix}$ follows a multivariate normal distribution, the zero covariance $\operatorname{Cov} \left( \mathbf{W} , \mathbf{X}_{2} \right) = O$ guarantees the independence of $\mathbf{W}$ and $\mathbf{X}_{2}$. Therefore $\mathbf{W} | \mathbf{X}_{2}$ is simply $\mathbf{W}$ itself, without any condition:
$$\mathbf{W} | \mathbf{X}_{2} \sim N_{m} \left( \mu_{1} - \Sigma_{12} \Sigma_{22}^{-1} \mu_{2} , \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)$$
Now recall that $\mathbf{W}$ was defined to satisfy $\mathbf{X}_{1} = \mathbf{W} + \Sigma_{12} \Sigma_{22}^{-1} \mathbf{X}_{2}$. Given $\mathbf{X}_{2}$, the term $\Sigma_{12} \Sigma_{22}^{-1} \mathbf{X}_{2}$ is a constant, so reversing the transformation shifts only the mean and yields the following.
$$\mathbf{X}_{1} | \mathbf{X}_{2} \sim N_{m} \left( \mu_{1} + \Sigma_{12} \Sigma_{22}^{-1} \left( \mathbf{X}_{2} - \mu_{2} \right) , \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)$$
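The key step, that $\mathbf{W} = \mathbf{X}_{1} - \Sigma_{12} \Sigma_{22}^{-1} \mathbf{X}_{2}$ is uncorrelated with $\mathbf{X}_{2}$ and has variance equal to the Schur complement, can also be checked by simulation. The following pure-Python sketch samples $\mathbf{X}$ with a hand-rolled Cholesky factorization, using an illustrative $3 \times 3$ covariance (hypothetical numbers) with $\mathbf{X}_{1}$ scalar ($m = 1$):

```python
import math
import random

random.seed(1)

# illustrative partitioned covariance and mean (hypothetical numbers),
# with X1 scalar (m = 1) and X2 two-dimensional (p = 2)
S = [[4.0, 1.0, 0.5],
     [1.0, 2.0, 0.3],
     [0.5, 0.3, 1.0]]
mu = [1.0, 0.0, -1.0]

# hand-rolled Cholesky factor L with S = L L^T, used for sampling
L = [[0.0] * 3 for _ in range(3)]
for i in range(3):
    for j in range(i + 1):
        s = sum(L[i][k] * L[j][k] for k in range(j))
        L[i][j] = math.sqrt(S[i][i] - s) if i == j else (S[i][j] - s) / L[j][j]

# Sigma12 Sigma22^{-1} via the analytic inverse of the 2x2 block Sigma22
det = S[1][1] * S[2][2] - S[1][2] * S[2][1]
b1 = (S[0][1] * S[2][2] - S[0][2] * S[2][1]) / det
b2 = (S[0][2] * S[1][1] - S[0][1] * S[1][2]) / det
schur = S[0][0] - (b1 * S[1][0] + b2 * S[2][0])  # Sigma11 - Sigma12 Sigma22^{-1} Sigma21

N = 100_000
ws, x2s = [], []
for _ in range(N):
    z = [random.gauss(0, 1) for _ in range(3)]
    x = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
    ws.append(x[0] - b1 * x[1] - b2 * x[2])  # W = X1 - Sigma12 Sigma22^{-1} X2
    x2s.append(x[1])

mw, m2 = sum(ws) / N, sum(x2s) / N
var_w = sum((w - mw) ** 2 for w in ws) / N
cov_w2 = sum((w - mw) * (v - m2) for w, v in zip(ws, x2s)) / N
print(var_w, schur)  # Var(W) matches the Schur complement
print(cov_w2)        # Cov(W, first component of X2) is near zero
```

The empirical variance of $\mathbf{W}$ should land near the Schur complement, and its covariance with each component of $\mathbf{X}_{2}$ near zero, up to Monte Carlo error.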


  1. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p185. ↩︎