
Proof of Normality of Regression Coefficients

Theorem

$$ \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{p1} \\ 1 & x_{12} & \cdots & x_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & \cdots & x_{pn} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix} $$ Given $p$ independent variables and $n$ data points, the linear multiple regression model can be written with a design matrix as above, which we abbreviate as $Y = X \beta + \varepsilon$. Assume the residuals satisfy homoscedasticity, independence, and normality, that is, $$ \varepsilon_{1} , \cdots , \varepsilon_{n} \overset{\text{iid}}{\sim} N \left( 0 , \sigma^{2} \right) \iff \varepsilon \sim N_{n} \left( \mathbf{0} , \sigma^{2} I_{n} \right) $$ Under this assumption, the vector of estimated regression coefficients $$ \hat{\beta} = \left( \hat{\beta}_{0}, \hat{\beta}_{1}, \cdots, \hat{\beta}_{p} \right) = \left( X^{T} X \right)^{-1} X^{T} Y $$ follows a multivariate normal distribution: $$ \hat{\beta} \sim N_{1+p} \left( \beta , \sigma^{2} \left( X^{T} X \right)^{-1} \right) $$ Moreover, no linear unbiased estimator of $\beta$ has a smaller variance than $\hat{\beta}$, so it is called the Best Linear Unbiased Estimator (BLUE).


Description

The fact that the vector of regression coefficients follows a multivariate normal distribution is especially important for hypothesis testing on the regression coefficients; since the result rests on the residual assumptions, it is also why the homoscedasticity, independence, and normality of the residuals must be diagnosed.
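
As a quick numerical illustration (a sketch added here, not part of the original argument), one can simulate the model and compute $\hat{\beta} = \left( X^{T} X \right)^{-1} X^{T} Y$ directly; NumPy and all variable names below are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 2                       # n data points, p independent variables (example values)
sigma = 0.5                         # residual standard deviation sigma (assumed)
beta = np.array([1.0, 2.0, -3.0])   # true coefficients (beta_0, beta_1, beta_2)

# Design matrix X: a column of ones for the intercept plus p covariate columns.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
eps = rng.normal(0.0, sigma, size=n)   # iid N(0, sigma^2) residuals
Y = X @ beta + eps                     # Y = X beta + eps

# beta_hat = (X^T X)^{-1} X^T Y, computed via a linear solve rather than an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)   # close to the true beta
```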

Proof

Strategy: There is not much of a strategy per se, since everything follows easily from the normality of the residuals. Once one notes that the design matrix $X$ is not a random variable but data fixed in matrix form, i.e., a matrix of constants, the rest is just matrix computation.

Normality 1

$$ \begin{align*} \hat{\beta} =& \left( X^{T} X \right)^{-1} X^{T} Y \\ =& \left( X^{T} X \right)^{-1} X^{T} \left( X \beta + \varepsilon \right) \\ =& I_{1+p} \beta + \left( X^{T} X \right)^{-1} X^{T} \varepsilon \end{align*} $$ Thus $\hat{\beta}$ is an affine transformation of $\varepsilon$, and since $\varepsilon$ is assumed to follow a multivariate normal distribution, $\hat{\beta}$ also follows a multivariate normal distribution.
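
This decomposition can be checked numerically: the least squares solution $\left( X^{T} X \right)^{-1} X^{T} Y$ and the affine expression $\beta + \left( X^{T} X \right)^{-1} X^{T} \varepsilon$ coincide up to floating-point error. A minimal sketch, with all data simulated purely for the check:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # fixed design matrix
beta = np.array([1.0, 2.0, -3.0])                           # true coefficients
eps = rng.normal(0.0, 0.5, size=n)                          # N(0, sigma^2) residuals
Y = X @ beta + eps

lhs = np.linalg.solve(X.T @ X, X.T @ Y)            # (X^T X)^{-1} X^T Y
rhs = beta + np.linalg.solve(X.T @ X, X.T @ eps)   # I beta + (X^T X)^{-1} X^T eps
print(np.allclose(lhs, rhs))                       # True
```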

Mean

$$ \begin{align*} E \hat{\beta} =& E \left[ I_{1+p} \beta + \left( X^{T} X \right)^{-1} X^{T} \varepsilon \right] \\ =& E I_{1+p} \beta + E \left[ \left( X^{T} X \right)^{-1} X^{T} \varepsilon \right] \\ =& E I_{1+p} \beta + \left( X^{T} X \right)^{-1} X^{T} E \varepsilon \\ =& E \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} + \mathbf{0} \\ =& \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} \end{align*} $$ Consequently, $\hat{\beta}$ is also an unbiased estimator of $\beta$.
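
The unbiasedness can also be seen by Monte Carlo: holding $X$ fixed and redrawing only $\varepsilon$, the average of the estimates $\hat{\beta}$ approaches $\beta$. A sketch under assumed example values:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 100, 0.5
beta = np.array([1.0, 2.0, -3.0])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # fixed design matrix

# Only eps (and therefore Y) is random; X and beta stay fixed across replications.
betas = []
for _ in range(5000):
    eps = rng.normal(0.0, sigma, size=n)
    Y = X @ beta + eps
    betas.append(np.linalg.solve(X.T @ X, X.T @ Y))

print(np.mean(betas, axis=0))   # approximately beta, illustrating E beta_hat = beta
```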

Variance 2 3

$$ \begin{align*} \operatorname{Var} \hat{\beta} =& \operatorname{Var} \left[ I_{1+p} \beta + \left( X^{T} X \right)^{-1} X^{T} \varepsilon \right] \\ =& \operatorname{Var} \left[ \left( X^{T} X \right)^{-1} X^{T} \varepsilon \right] \\ =& \left( X^{T} X \right)^{-1} X^{T} \left( \operatorname{Var} \varepsilon \right) \left( \left( X^{T} X \right)^{-1} X^{T} \right)^{T} \\ =& \left( X^{T} X \right)^{-1} X^{T} \sigma^{2} I_{n} X \left( X^{T} X \right)^{-1} \\ =& \sigma^{2} \left( X^{T} X \right)^{-1} X^{T} X \left( X^{T} X \right)^{-1} \\ =& \sigma^{2} \left( X^{T} X \right)^{-1} \end{align*} $$ Meanwhile, since $\hat{\beta}$ is the least squares estimator, no linear unbiased estimator of $\beta$ has a smaller variance (the Gauss–Markov theorem), and thus $\hat{\beta}$ is the best linear unbiased estimator.
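
Likewise, the covariance formula can be checked empirically: the sample covariance of simulated $\hat{\beta}$'s should be close to $\sigma^{2} \left( X^{T} X \right)^{-1}$. A minimal sketch with assumed example values:

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 100, 0.5
beta = np.array([1.0, 2.0, -3.0])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # fixed design matrix

# Estimate beta repeatedly with freshly drawn residuals.
betas = np.array([
    np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(0.0, sigma, size=n)))
    for _ in range(20000)
])

empirical = np.cov(betas, rowvar=False)          # sample covariance of beta_hat
theoretical = sigma**2 * np.linalg.inv(X.T @ X)  # sigma^2 (X^T X)^{-1}
print(np.round(empirical, 4))
print(np.round(theoretical, 4))                  # the two matrices nearly coincide
```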