
Proof of Normality of Regression Coefficients

Theorem

$$
\begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{p1} \\ 1 & x_{12} & \cdots & x_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & \cdots & x_{pn} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix}
$$

Given $p$ independent variables and $n$ data points, the linear multiple regression model can be represented by a design matrix as above, which we denote compactly as $Y = X \beta + \varepsilon$. The residuals are assumed to exhibit homoscedasticity, independence, and normality, meaning

$$
\varepsilon_{1} , \cdots , \varepsilon_{n} \overset{\text{iid}}{\sim} N \left( 0 , \sigma^{2} \right) \iff \varepsilon \sim N_{n} \left( \mathbf{0} , \sigma^{2} I_{n} \right)
$$

Under this assumption, the estimated regression coefficients

$$
\hat{\beta} = \left( \hat{\beta}_{0}, \hat{\beta}_{1}, \cdots, \hat{\beta}_{p} \right) = \left( X^{T} X \right)^{-1} X^{T} Y
$$

follow a multivariate normal distribution:

$$
\hat{\beta} \sim N_{1+p} \left( \beta , \sigma^{2} \left( X^{T} X \right)^{-1} \right)
$$

Moreover, among all linear unbiased estimators of $\beta$, $\hat{\beta}$ has the smallest variance, and is therefore called the Best Linear Unbiased Estimator (BLUE).
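As a minimal numerical sketch (assuming numpy; the sizes $n$, $p$ and the values of $\sigma$ and $\beta$ below are arbitrary illustration choices), $\hat{\beta} = \left( X^{T} X \right)^{-1} X^{T} Y$ can be computed directly from the design matrix:

```python
import numpy as np

rng = np.random.default_rng(42)

n, p, sigma = 200, 3, 0.5                   # illustration values
beta = np.array([1.0, 2.0, -1.0, 0.5])      # true (beta_0, beta_1, ..., beta_p)

# Design matrix: a column of ones for the intercept, then p variable columns
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
eps = rng.normal(0.0, sigma, size=n)        # iid N(0, sigma^2) residuals
Y = X @ beta + eps

# beta_hat = (X^T X)^{-1} X^T Y, via solve() rather than an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)                             # close to the true beta
```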


Description

The fact that the vector of regression coefficients follows a multivariate normal distribution is especially important for hypothesis testing on the regression coefficients; since the result rests on the residual assumptions, homoscedasticity, independence, and normality of the residuals must be diagnosed before such tests can be trusted.

Proof

Strategy: There is not really a strategy per se, as everything is derived easily once normality of the residuals is assumed. Keeping in mind that $X$ is not a random variable but data fixed in matrix form, i.e., a constant, and that all of the randomness in $Y = X \beta + \varepsilon$ enters through $\varepsilon$, it is all just matrix calculation.

Normality 1

$$
\begin{align*} \hat{\beta} =& \left( X^{T} X \right)^{-1} X^{T} Y \\ =& \left( X^{T} X \right)^{-1} X^{T} \left( X \beta + \varepsilon \right) \\ =& I_{1+p} \beta + \left( X^{T} X \right)^{-1} X^{T} \varepsilon \end{align*}
$$

Thus $\hat{\beta}$ is an affine transformation of $\varepsilon$, and since an affine transformation of a multivariate normal vector is again multivariate normal, the assumption that $\varepsilon$ follows a multivariate normal distribution implies that $\hat{\beta}$ does as well.
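The linearity can be checked concretely. In the small self-contained sketch below (numpy again, with arbitrary illustration values), the computed $\hat{\beta}$ coincides exactly with $\beta + \left( X^{T} X \right)^{-1} X^{T} \varepsilon$:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, sigma = 100, 2, 0.3                   # illustration values
beta = np.array([1.0, -2.0, 0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
eps = rng.normal(0.0, sigma, size=n)

A = np.linalg.solve(X.T @ X, X.T)           # A = (X^T X)^{-1} X^T
beta_hat = A @ (X @ beta + eps)             # i.e. A @ Y
# beta_hat is beta plus a fixed linear map applied to the normal vector eps
assert np.allclose(beta_hat, beta + A @ eps)
```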

Mean

$$
\begin{align*} E \hat{\beta} =& E \left[ I_{1+p} \beta + \left( X^{T} X \right)^{-1} X^{T} \varepsilon \right] \\ =& E I_{1+p} \beta + E \left[ \left( X^{T} X \right)^{-1} X^{T} \varepsilon \right] \\ =& E I_{1+p} \beta + \left( X^{T} X \right)^{-1} X^{T} E \varepsilon \\ =& E \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} + \mathbf{0} \\ =& \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} \end{align*}
$$

Consequently, $\hat{\beta}$ is also an unbiased estimator of $\beta$.
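Unbiasedness can be illustrated by redrawing $\varepsilon$ many times with $X$ held fixed and averaging the resulting estimates. A sketch under the same kind of arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma, reps = 100, 2, 0.3, 20_000     # illustration values
beta = np.array([1.0, -2.0, 0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
A = np.linalg.solve(X.T @ X, X.T)           # (X^T X)^{-1} X^T

# Redraw eps many times with X held fixed; each row of beta_hats is one estimate
eps = rng.normal(0.0, sigma, size=(reps, n))
beta_hats = beta + eps @ A.T                # beta + A @ eps for every replication
print(beta_hats.mean(axis=0))               # approximately the true beta
```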

Variance 2 3

$$
\begin{align*} \operatorname{Var} \hat{\beta} =& \operatorname{Var} \left[ I_{1+p} \beta + \left( X^{T} X \right)^{-1} X^{T} \varepsilon \right] \\ =& \operatorname{Var} \left[ \left( X^{T} X \right)^{-1} X^{T} \varepsilon \right] \\ =& \left( X^{T} X \right)^{-1} X^{T} \left( \operatorname{Var} \varepsilon \right) \left( \left( X^{T} X \right)^{-1} X^{T} \right)^{T} \\ =& \left( X^{T} X \right)^{-1} X^{T} \sigma^{2} I_{n} X \left( X^{T} X \right)^{-1} \\ =& \sigma^{2} \left( X^{T} X \right)^{-1} X^{T} X \left( X^{T} X \right)^{-1} \\ =& \sigma^{2} \left( X^{T} X \right)^{-1} \end{align*}
$$

Meanwhile, by the Gauss-Markov theorem, no linear unbiased estimator of $\beta$ has a smaller variance than the least squares estimator $\hat{\beta}$, so $\hat{\beta}$ is the best linear unbiased estimator.
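The covariance formula can be checked the same way: over many replications of $\varepsilon$, the empirical covariance of the simulated estimates should approach $\sigma^{2} \left( X^{T} X \right)^{-1}$. A sketch under the same arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma, reps = 100, 2, 0.3, 50_000     # illustration values
beta = np.array([1.0, -2.0, 0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
A = np.linalg.solve(X.T @ X, X.T)           # (X^T X)^{-1} X^T

eps = rng.normal(0.0, sigma, size=(reps, n))
beta_hats = beta + eps @ A.T                # one estimate per replication

emp_cov = np.cov(beta_hats, rowvar=False)       # empirical covariance of beta_hat
theo_cov = sigma**2 * np.linalg.inv(X.T @ X)    # sigma^2 (X^T X)^{-1}
print(np.abs(emp_cov - theo_cov).max())         # small when reps is large
```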