The Definition of Regression Coefficients and Derivation of Estimator Formulas

Definition 1

$$ Y = \beta_{0} + \beta_{1} X_{1} + \cdots + \beta_{p} X_{p} + \varepsilon $$ In multiple regression analysis, when a linear model of the form above is set up for $p$ given independent variables $X_{1} , \cdots , X_{p}$, the parameters $\beta_{0} , \beta_{1} , \cdots , \beta_{p}$ are called the regression coefficients. $Y$ is the dependent variable, and $\varepsilon$ is the random error term.

Formula

$$ \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{p1} \\ 1 & x_{12} & \cdots & x_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & \cdots & x_{pn} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix} $$ Given $n$ observations with $p < n$, the multiple linear regression model can be written with the design matrix $X$ as above, or briefly as $Y = X \beta + \varepsilon$. Provided that $X^{T} X$ is invertible, that is, the columns of $X$ are linearly independent, the least squares estimate vector $\hat{\beta}$ for $\beta$ is as follows. $$ \hat{\beta} = \begin{bmatrix} \hat{\beta}_{0} \\ \hat{\beta}_{1} \\ \vdots \\ \hat{\beta}_{p} \end{bmatrix} = \left( X^{T} X \right)^{-1} X^{T} Y $$ Furthermore, under the Gauss-Markov assumptions (errors with mean zero, equal variance, and no correlation), $\hat{\beta}$ has the smallest variance among all linear unbiased estimators of $\beta$, and is therefore called the Best Linear Unbiased Estimator, BLUE.
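As a rough illustration of the formula, the sketch below builds a small design matrix and solves $\left( X^{T} X \right) \hat{\beta} = X^{T} Y$ directly. The data, the helper names (`transpose`, `matmul`, `solve`), and the use of exact fractions are all assumptions made for this example and are not taken from the references. The responses are generated without noise from $\beta = (1, 2, 3)$, so the estimate should recover those values.

```python
from fractions import Fraction

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Return the matrix product a @ b for lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(a)
    # Augmented matrix [a | b] with exact rational entries.
    m = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Pick a row with a nonzero pivot; this fails if a is singular.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[-1] for row in m]

# Hypothetical data: n = 5 observations, p = 2 predictors.
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # noiseless, beta = (1, 2, 3)

X = [[1, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept column
Xt = transpose(X)
XtX = matmul(Xt, X)
XtY = [sum(v * yi for v, yi in zip(row, y)) for row in Xt]

beta_hat = solve(XtX, XtY)  # solves (X^T X) beta = X^T Y
print([float(b) for b in beta_hat])  # [1.0, 2.0, 3.0]
```

The sketch solves the linear system instead of forming $\left( X^{T} X \right)^{-1}$ explicitly; for real data one would normally rely on a numerical library's least squares routine rather than hand-written elimination.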

Derivation 2 3

Our goal is to minimize the following sum of squared errors. $$ \left\| \varepsilon \right\|_{2}^{2} = \sum_{k=1}^{n} \varepsilon_{k}^{2} = \begin{bmatrix} \varepsilon_{1} & \varepsilon_{2} & \cdots & \varepsilon_{n} \end{bmatrix} \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix} = \varepsilon^{T} \varepsilon $$ Since $\varepsilon = Y - X \beta$, it suffices to find $\beta$ that minimizes $\varepsilon^{T} \varepsilon = \left( Y - X \beta \right)^{T} \left( Y - X \beta \right)$. Differentiating with respect to $\beta$ gives $$ \begin{align*} {{ d } \over { d \beta }} \varepsilon^{T} \varepsilon =& - 2 X^{T} \left( Y - X \beta \right) \\ = & - 2 X^{T} Y + 2 X^{T} X \beta \end{align*} $$ and setting this to zero yields the normal equation $X^{T} X \beta = X^{T} Y$. Because $X^{T} X$ is positive semidefinite, the objective is convex and this stationary point is a minimum, so when $X^{T} X$ is invertible $\hat{\beta}$ takes the following form. $$ \hat{\beta} = \argmin_{\beta} \varepsilon^{T} \varepsilon = \left( X^{T} X \right)^{-1} X^{T} Y $$ Meanwhile, it is easy to see that $\hat{\beta}$ is an unbiased estimator for $\beta$: if $E \varepsilon = 0$, then $E \hat{\beta} = \left( X^{T} X \right)^{-1} X^{T} X \beta = \beta$. Moreover, by the Gauss-Markov theorem, when the errors are uncorrelated with equal variance there exists no linear unbiased estimator of $\beta$ with smaller variance, making $\hat{\beta}$ the best linear unbiased estimator.
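The derivative used in the derivation can be checked numerically. The following sketch compares the analytic gradient $- 2 X^{T} \left( Y - X \beta \right)$ with central finite differences of $\varepsilon^{T} \varepsilon$ at an arbitrary point. The data, the test point, and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical data for a simple regression (p = 1): rows of X are (1, x_k).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 2.9, 5.2, 6.8]

def residuals(beta):
    """Return epsilon = Y - X beta."""
    return [yi - sum(xij * bj for xij, bj in zip(row, beta)) for row, yi in zip(X, Y)]

def sse(beta):
    """Return epsilon^T epsilon, the sum of squared errors."""
    return sum(e * e for e in residuals(beta))

def gradient(beta):
    """Return the analytic gradient -2 X^T (Y - X beta)."""
    eps = residuals(beta)
    return [-2.0 * sum(row[j] * e for row, e in zip(X, eps)) for j in range(len(beta))]

def numeric_gradient(beta, h=1e-5):
    """Approximate the gradient of sse by central finite differences."""
    grad = []
    for j in range(len(beta)):
        up = list(beta)
        up[j] += h
        down = list(beta)
        down[j] -= h
        grad.append((sse(up) - sse(down)) / (2 * h))
    return grad

beta = [0.5, -1.0]  # arbitrary test point, not the minimizer
analytic = gradient(beta)
numeric = numeric_gradient(beta)
max_gap = max(abs(a - b) for a, b in zip(analytic, numeric))
print(analytic, numeric, max_gap)
```

Since $\varepsilon^{T} \varepsilon$ is quadratic in $\beta$, the two gradients should agree up to floating point rounding.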

If one prefers to avoid differentiating with respect to $\beta$, an alternative approach is through matrix algebra. In the matrix-algebra treatment of least squares, any $\hat{\beta}$ satisfying the normal equation $$ X^{\ast} Y = X^{\ast} X \hat{\beta} $$ is a least squares solution. Since $X \in \mathbb{R}^{n \times (p+1)}$ is a real matrix, $X^{\ast} = X^{T}$, and consequently, when $X^{T} X$ is invertible, we obtain $\hat{\beta} = \left( X^{T} X \right)^{-1} X^{T} Y$.
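The normal equation is equivalent to $X^{T} \left( Y - X \hat{\beta} \right) = 0$, meaning the residual vector is orthogonal to every column of $X$. A minimal sketch of this, assuming a made-up simple regression with one predictor and using exact fractions so the inner products come out as exact zeros:

```python
from fractions import Fraction

# Hypothetical data for a simple regression y = b0 + b1 * x + error.
x = [Fraction(v) for v in (0, 1, 2, 3, 4)]
y = [Fraction(v) for v in (1, 3, 2, 5, 4)]
n = len(x)

# Entries of X^T X and X^T Y for the design matrix with rows (1, x_k).
sx, sxx = sum(x), sum(v * v for v in x)
sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))

# Solve [[n, sx], [sx, sxx]] @ (b0, b1) = (sy, sxy) by Cramer's rule.
det = n * sxx - sx * sx  # nonzero unless all x_k are equal
b0 = (sy * sxx - sx * sxy) / det
b1 = (n * sxy - sx * sy) / det

# Residuals and their inner products with each column of X.
resid = [yk - (b0 + b1 * xk) for xk, yk in zip(x, y)]
xt_resid = [sum(resid), sum(xk * rk for xk, rk in zip(x, resid))]
print(b0, b1, xt_resid)
```

Both entries of `xt_resid` being zero is exactly the statement $X^{T} Y = X^{T} X \hat{\beta}$ for this data set.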

  1. Hadi. (2006). Regression Analysis by Example (4th Edition): p. 53. ↩︎

  2. Hadi. (2006). Regression Analysis by Example (4th Edition): pp. 82-84. ↩︎

  3. https://www.stat.purdue.edu/~boli/stat512/lectures/topic3.pdf ↩︎