
The Definition of Regression Coefficients and Derivation of Estimator Formulas

Definition 1

$$ Y = \beta_{0} + \beta_{1} X_{1} + \cdots + \beta_{p} X_{p} + \varepsilon $$

In multiple regression analysis, when the above linear model is set up for $p$ given independent variables $X_{1}, \cdots, X_{p}$, the coefficients $\beta_{0}, \beta_{1}, \cdots, \beta_{p}$ are called the regression coefficients. $Y$ is the dependent variable, and $\varepsilon$ is a randomly distributed error.
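As a concrete illustration, the following is a minimal sketch that simulates data from this model; the particular coefficient values, sample size, and noise scale are hypothetical choices for the example, not from the source.

```python
import numpy as np

rng = np.random.default_rng(42)

n, p = 100, 3                                  # sample size and number of independent variables
beta_true = np.array([2.0, 1.5, -0.7, 3.2])    # hypothetical (beta_0, beta_1, ..., beta_p)

X_vars = rng.uniform(-1, 1, size=(n, p))       # observed values of X_1, ..., X_p
eps = rng.normal(0, 0.5, size=n)               # randomly distributed errors

# Y = beta_0 + beta_1 X_1 + ... + beta_p X_p + epsilon
Y = beta_true[0] + X_vars @ beta_true[1:] + eps
```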

Formula

$$ \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{p1} \\ 1 & x_{12} & \cdots & x_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & \cdots & x_{pn} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix} $$

Given $n$ data points with $p < n$, the linear multiple regression model can be written with a design matrix as above, and we abbreviate it as $Y = X \beta + \varepsilon$. The least squares estimator vector $\hat{\beta}$ of $\beta$ is
$$ \hat{\beta} = \begin{bmatrix} \hat{\beta}_{0} \\ \hat{\beta}_{1} \\ \vdots \\ \hat{\beta}_{p} \end{bmatrix} = \left( X^{T} X \right)^{-1} X^{T} Y $$
Moreover, since $\hat{\beta}$ is the best linear unbiased estimator of $\beta$, it is also known as the BLUE (Best Linear Unbiased Estimator).
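A minimal sketch of the estimator formula, assuming simulated data as in the previous example; the closed form $\left( X^{T} X \right)^{-1} X^{T} Y$ is cross-checked against NumPy's least squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
beta_true = np.array([2.0, 1.5, -0.7, 3.2])    # hypothetical true coefficients

# Design matrix: a column of ones followed by the observed X_1, ..., X_p
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, p))])
Y = X @ beta_true + rng.normal(0, 0.5, size=n)

# Closed-form least squares estimator: beta_hat = (X^T X)^{-1} X^T Y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y

# Cross-check against NumPy's least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)

print(beta_hat)    # close to beta_true for moderate noise
```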

Derivation 2 3

Our goal is to minimize
$$ \left\| \varepsilon \right\|_{2}^{2} = \sum_{k=1}^{n} \varepsilon_{k}^{2} = \begin{bmatrix} \varepsilon_{1} & \varepsilon_{2} & \cdots & \varepsilon_{n} \end{bmatrix} \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix} = \varepsilon^{T} \varepsilon $$
Since $\varepsilon = Y - X \beta$, this amounts to finding the $\beta$ that minimizes $\varepsilon^{T} \varepsilon = \left( Y - X \beta \right)^{T} \left( Y - X \beta \right)$.

Gradient of the residual sum of squares: Let
$$ f \left( \mathbf{s} \right) := \left( \mathbf{y} - X \mathbf{s} \right)^{T} R \left( \mathbf{y} - X \mathbf{s} \right) $$
If $R$ is an identity matrix, then the following is obtained.
$$ {{ \partial f \left( \mathbf{s} \right) } \over { \partial \mathbf{s} }} = - 2 X^{T} \left( \mathbf{y} - X \mathbf{s} \right) $$

Differentiating both sides with respect to $\beta$ gives
$$ \begin{align*} {{ \partial } \over { \partial \beta }} \varepsilon^{T} \varepsilon =& - 2 X^{T} \left( Y - X \beta \right) \\ =& - 2 X^{T} Y + 2 X^{T} X \beta \end{align*} $$
and setting this derivative equal to the zero vector $\mathbf{0}$ and solving for $\beta$ yields
$$ \hat{\beta} = \argmin_{\beta} \varepsilon^{T} \varepsilon = \left( X^{T} X \right)^{-1} X^{T} Y $$
Meanwhile, it can easily be shown that $\hat{\beta}$ is an unbiased estimator of $\beta$, and since it is obtained by the least squares method, no linear unbiased estimator of $\beta$ has smaller variance (the Gauss–Markov theorem), making it the best linear unbiased estimator.
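A quick numerical check of this first-order condition, on assumed simulated data as before: the gradient $-2 X^{T} \left( Y - X \hat{\beta} \right)$ evaluated at the closed-form $\hat{\beta}$ should vanish up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, p))])
Y = X @ np.array([2.0, 1.5, -0.7, 3.2]) + rng.normal(0, 0.5, size=n)

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y

# Gradient of the residual sum of squares at beta_hat: -2 X^T (Y - X beta_hat)
grad = -2 * X.T @ (Y - X @ beta_hat)
print(np.allclose(grad, 0))    # True: the first-order condition holds at beta_hat
```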


If differentiating with respect to $\beta$ in the derivation is not particularly appealing, an alternative approach using matrix algebra is available. By the least squares method in matrix algebra, $\hat{\beta}$ is a least squares solution exactly when it satisfies
$$ X^{\ast} Y = X^{\ast} X \hat{\beta} $$
Since the design matrix $X \in \mathbb{R}^{n \times (p+1)}$ is real, $X^{\ast} = X^{T}$, and consequently $\hat{\beta} = \left( X^{T} X \right)^{-1} X^{T} Y$ is derived.
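As a sketch of this route, again on assumed simulated data, the normal equations $X^{T} X \hat{\beta} = X^{T} Y$ can be solved directly; in practice, solving the linear system is preferred over forming the explicit inverse $\left( X^{T} X \right)^{-1}$ for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, p))])
Y = X @ np.array([2.0, 1.5, -0.7, 3.2]) + rng.normal(0, 0.5, size=n)

# Solve the normal equations  X^T X beta = X^T Y  without forming an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Same result as the closed-form expression with the explicit inverse
assert np.allclose(beta_hat, np.linalg.inv(X.T @ X) @ X.T @ Y)
```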

Corollary

If $\hat{\beta}$ is the best linear unbiased estimator, then the sum of the $y_{k}$ and the sum of the fitted values $\hat{y}_{k} = \hat{\beta}_{0} + \sum_{j=1}^{p} \hat{\beta}_{j} x_{jk}$ are equal:
$$ \sum_{k=1}^{n} y_{k} = \sum_{k=1}^{n} \hat{y}_{k} $$

Proof

That $\hat{\beta}$ is the best linear unbiased estimator, that is, the least squares estimator derived above, means that it satisfies
$$ \begin{align*} & \mathbf{0} = - 2 X^{T} Y + 2 X^{T} X \hat{\beta} \\ \implies & \mathbf{0} = X^{T} \left( Y - X \hat{\beta} \right) \\ \implies & \mathbf{0} = X^{T} \begin{bmatrix} y_{1} - \hat{y}_{1} \\ \vdots \\ y_{n} - \hat{y}_{n} \end{bmatrix} \end{align*} $$
Since $X$ is the design matrix, the first row of $X^{T}$ consists entirely of $1$s. Taking only the product of this first row with $Y - X \hat{\beta}$ yields the following.
$$ \begin{align*} & 0 = \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} y_{1} - \hat{y}_{1} \\ \vdots \\ y_{n} - \hat{y}_{n} \end{bmatrix} \\ \implies & 0 = \left( y_{1} - \hat{y}_{1} \right) + \cdots + \left( y_{n} - \hat{y}_{n} \right) \\ \implies & 0 = \sum_{k=1}^{n} y_{k} - \sum_{k=1}^{n} \hat{y}_{k} \end{align*} $$
Consequently, the following is obtained.
$$ \sum_{k=1}^{n} y_{k} = \sum_{k=1}^{n} \hat{y}_{k} $$
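The corollary is easy to confirm numerically as well, again on assumed simulated data: as long as the design matrix contains an intercept column of ones, the residuals sum to zero, so the sums of $y_{k}$ and $\hat{y}_{k}$ agree.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, p))])   # intercept column of ones
Y = X @ np.array([2.0, 1.5, -0.7, 3.2]) + rng.normal(0, 0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_fit = X @ beta_hat

print(np.isclose(Y.sum(), Y_fit.sum()))   # True: sum of observations equals sum of fits
```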

See Also


  1. Hadi. (2006). Regression Analysis by Example (4th Edition): p53. ↩︎

  2. Hadi. (2006). Regression Analysis by Example (4th Edition): p82~84. ↩︎

  3. https://www.stat.purdue.edu/~boli/stat512/lectures/topic3.pdf ↩︎