

Regression Coefficient's t-test

Hypothesis Testing

$$ \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{p1} \\ 1 & x_{12} & \cdots & x_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & \cdots & x_{pn} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix} $$ Given $p$ independent variables and $n$ data points, the multiple linear regression model can be written with the design matrix as above; denote it simply as $Y = X \beta + \varepsilon$. Assuming the residuals satisfy the model-diagnostic conditions of linearity, homoscedasticity, independence, and normality, the hypothesis test for each regression coefficient in multiple regression analysis is as follows:

  • $H_{0}$: $\beta_{j} = 0$, that is, the $j$th independent variable has no correlation with the dependent variable.
  • $H_{1}$: $\beta_{j} \ne 0$, that is, the regression coefficient for the $j$th independent variable is significant.
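For concreteness, here is a minimal Python sketch (not from the source) with hypothetical data, $n = 50$ observations and $p = 2$ independent variables, showing how the design matrix $X$ and the least-squares estimate $\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$ are formed:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2

x = rng.normal(size=(n, p))                # p raw independent variables
X = np.column_stack([np.ones(n), x])       # design matrix: constant column first
beta_true = np.array([1.0, 2.0, 0.0])      # beta_2 = 0, so H0 holds for j = 2
eps = rng.normal(scale=1.5, size=n)        # i.i.d. normal errors
y = X @ beta_true + eps

# Least-squares estimate: beta_hat = (X^T X)^{-1} X^T y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
print(beta_hat)                            # roughly [1, 2, 0] up to noise
```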

Derivation 1

Let’s define $t_{j}$ for the estimate of regression coefficient $\hat{ \beta_{j} }$ and standard error $\text{se} \left( \hat{ \beta_{j} } \right)$ as follows: $$ t_{j} := {{\hat{ \beta_{j} }} \over {\text{se} \left( \hat{ \beta_{j} } \right)}} $$

The residual sum of squares divided by the variance, $\sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2} / \sigma^{2}$, is computed from $n$ data points after estimating $p$ regression coefficients and $1$ constant term, so only $(n-p-1)$ independent random variables remain and it follows a chi-squared distribution with $(n-p-1)$ degrees of freedom under the null hypothesis. Meanwhile, if the null hypothesis is true, then $\beta_{j} = 0$ gives $\hat{\beta}_{j} \sim N \left( 0 , \sigma^{2} \left[ \left( X^{T} X \right)^{-1} \right]_{jj} \right)$, and thus: $$ \begin{align*} t_{j} =& {{\hat{ \beta}_{j} } \over {\text{se} \left( \hat{ \beta}_{j} \right)}} \\ =& {{\hat{ \beta}_{j} - 0 } \over { \hat{\sigma} \sqrt{ \left[ \left( X^{T} X \right)^{-1} \right]_{jj} } }} \\ =& {{\hat{ \beta}_{j} - 0 } \over { \sqrt{ {{ \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2} } \over { n-p-1 }} \left[ \left( X^{T} X \right)^{-1} \right]_{jj} } }} \\ =& {{ {{ \hat{ \beta}_{j} - 0 } \over { \sqrt{ \left[ \left( X^{T} X \right)^{-1} \right]_{jj} } }} } \over { \sqrt{ {{ \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2} } \over { n-p-1 }} } }} \\ =& {{ {{ \hat{ \beta}_{j} - 0 } \over { \sigma \sqrt{ \left[ \left( X^{T} X \right)^{-1} \right]_{jj} } }} } \over { \sqrt{ {{ \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2} } \over { \sigma^{2} }} \Big/ (n-p-1) } }} \\ \sim & {{ N (0,1) } \over { \sqrt{\chi^{2} (n-p-1) / (n-p-1)} }} \end{align*} $$
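Continuing the sketch above, the statistic $t_{j}$ from the derivation can be computed directly from the residual sum of squares and the diagonal of $\left( X^{T} X \right)^{-1}$:

```python
y_hat = X @ beta_hat                          # fitted values
rss = np.sum((y - y_hat) ** 2)                # residual sum of squares
df = n - p - 1                                # degrees of freedom
sigma_hat = np.sqrt(rss / df)                 # estimate of sigma

se = sigma_hat * np.sqrt(np.diag(XtX_inv))    # se(beta_hat_j) for every j
t_stat = beta_hat / se                        # t_j = beta_hat_j / se(beta_hat_j)
print(t_stat)
```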

In conclusion, $t_{j}$ follows a t-distribution with $(n-p-1)$ degrees of freedom. Written as a formula once more: $$ t_{j} = {{\hat{ \beta}_{j} } \over {\text{se} \left( \hat{ \beta}_{j} \right)}} \sim t(n-p-1) $$ which is used for the hypothesis test. If $$ \left| t_j \right| \ge t_{(n-p-1 , {{\alpha} \over {2}})} $$ the null hypothesis is rejected. A value of $\left| t_j \right|$ this large means that $\hat{ \beta}_{j}$ is too far from $0$ to believe the null hypothesis is true.
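Continuing the sketch, the rejection rule compares $\left| t_{j} \right|$ with the $t(n-p-1)$ critical value; the corresponding two-sided p-values can be read off at the same time (scipy is assumed here):

```python
from scipy import stats

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)        # critical value t_{(n-p-1, alpha/2)}
p_values = 2 * stats.t.sf(np.abs(t_stat), df)  # two-sided p-values

for j in range(p + 1):
    reject = np.abs(t_stat[j]) >= t_crit
    print(f"beta_{j}: t = {t_stat[j]:.3f}, p = {p_values[j]:.4f}, reject H0: {reject}")
```

These are exactly the per-coefficient t-statistics and p-values reported in the summary table of standard regression routines, such as statsmodels' OLS.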

Explanation

For the $j$th variable and significance level $\alpha$, the confidence interval $\hat{ \beta}_{j} \pm t_{(n-p-1 , {{\alpha} \over {2}})} \text{se} ( \hat{ \beta}_{j} )$ can also be computed.
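Continuing the sketch, the confidence interval for each coefficient follows immediately from the same critical value and standard errors:

```python
ci_lower = beta_hat - t_crit * se              # lower bounds of the intervals
ci_upper = beta_hat + t_crit * se              # upper bounds
print(np.column_stack([ci_lower, ci_upper]))   # one row per coefficient
```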

Although the derivation is laid out matter-of-factly, it can be quite difficult for undergraduates first learning regression analysis to follow. Showing that $t_{j}$ follows a t-distribution is not very challenging in itself, but it requires the distribution of the regression coefficients as a corollary and a firm grasp of the model-diagnostic concepts.



  1. Hadi. (2006). Regression Analysis by Example (4th Edition): p0. ↩︎