F-test for Regression Coefficients

Hypothesis Testing

Assume that, per the model diagnostics of the multiple linear regression model, the residuals satisfy linearity, homoscedasticity, independence, and normality. The hypothesis test for a multiple regression analysis with $n$ observations and $p$ independent variables is as follows:

  • $H_{0}$: $\beta_{1} = \beta_{2} = \cdots = \beta_{p} = 0$, i.e., none of the independent variables is correlated with the dependent variable.
  • $H_{1}$: At least one of $\beta_{1} , \beta_{2} , \cdots , \beta_{p}$ is not $0$, i.e., at least one independent variable has a significant correlation with the dependent variable.

Derivation

SST, SSR, SSE:

  • TSS (Total Sum of Squares) or SST (Sum of Squares Total): $$ \text{TSS} = \text{SST} := \sum_{i=1}^{n} ( y_{i} - \overline{y} )^2 $$
  • ESS (Explained Sum of Squares) or SSR (Sum of Squares due to Regression): $$ \text{ESS} = \text{SSR} := \sum_{i=1}^{n} ( \hat{y}_{i} - \overline{y} )^2 $$
  • RSS (Residual Sum of Squares) or SSE (Sum of Squared Error): $$ \text{RSS} = \text{SSE} := \sum_{i=1}^{n} ( y_{i} - \hat{y}_{i} )^2 $$
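
As a quick illustration, here is a minimal sketch (assuming NumPy and a synthetic dataset; all variable names are illustrative) that computes these three sums of squares from an ordinary least-squares fit and confirms the decomposition used below.

```python
# A minimal sketch, assuming NumPy, that computes SST, SSR, and SSE
# for a small synthetic dataset; variable names are illustrative.
import numpy as np

rng = np.random.default_rng(42)
n, p = 50, 3                           # n observations, p independent variables
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([0.5, -0.2, 0.0]) + rng.normal(scale=1.0, size=n)

X1 = np.column_stack([np.ones(n), X])  # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta_hat

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
print(SST, SSR + SSE)                  # the decomposition SST = SSR + SSE holds
```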

Since SST is computed around a single mean, it has $(n-1)$ degrees of freedom, while SSE, computed from a regression on $p$ independent variables, involves $(p+1)$ regression coefficients including the intercept and therefore has $\left( n-(p+1) \right) = (n-p-1)$ degrees of freedom. By the homoscedasticity, independence, and normality of the residuals, each of these, when divided by $\sigma^{2}$, follows a chi-square distribution with its respective degrees of freedom. Meanwhile, since $$ SST = SSR + SSE \iff SSR = SST - SSE $$ SSR has $(n-1) - (n-p-1) = p$ degrees of freedom and follows the chi-square distribution $\chi^{2} (p)$.
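
These degrees-of-freedom claims can be checked numerically. The following is a hedged simulation sketch (assuming NumPy; `n`, `p`, and `reps` are arbitrary choices): under $H_{0}$ with $\sigma^{2} = 1$, the sample means of SSE and SSR should be close to $n-p-1$ and $p$, the means of the corresponding chi-square distributions.

```python
# A simulation check of the degrees of freedom: under H0 with
# beta_1 = ... = beta_p = 0 and sigma = 1, SSE should behave like a
# chi-square with n-p-1 degrees of freedom and SSR like chi-square(p).
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 40, 4, 2000
sse_samples, ssr_samples = [], []
for _ in range(reps):
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)             # H0: y depends on no predictor, sigma^2 = 1
    X1 = np.column_stack([np.ones(n), X])
    y_hat = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
    sse_samples.append(np.sum((y - y_hat) ** 2))
    ssr_samples.append(np.sum((y_hat - y.mean()) ** 2))

print(np.mean(sse_samples), n - p - 1)  # a chi-square's mean equals its df
print(np.mean(ssr_samples), p)
```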

Derivation of the F-distribution: If two random variables $U,V$ are independent and $U \sim \chi^{2} ( r_{1})$, $V \sim \chi^{2} ( r_{2})$, then $$ {{ U / r_{1} } \over { V / r_{2} }} \sim F \left( r_{1} , r_{2} \right) $$
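
This fact, too, can be sanity-checked by simulation. Below is a small sketch, assuming NumPy and SciPy, that compares an empirical quantile of the scaled ratio with the theoretical F quantile.

```python
# A sketch of the quoted fact: the scaled ratio of two independent
# chi-square variables follows an F-distribution (checked against SciPy).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
r1, r2, reps = 3, 30, 100_000
U = rng.chisquare(r1, size=reps)
V = rng.chisquare(r2, size=reps)
ratio = (U / r1) / (V / r2)

# Compare an empirical quantile with the theoretical F quantile
print(np.quantile(ratio, 0.95), stats.f.ppf(0.95, r1, r2))
```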

The test statistic $F$, defined as $$ F := {{ \text{SSR} / p } \over { \text{SSE} / (n-p-1) }} $$ therefore follows an F-distribution with $(p , n-p-1)$ degrees of freedom.

Writing it out once more, $\displaystyle F = {{ \text{SSR} / p } \over { \text{SSE} / (n-p-1) }} \sim F(p, n-p-1)$, and this is used for the hypothesis test: at significance level $\alpha$, $H_{0}$ is rejected if $F > F_{ (p, n-p-1 ; \alpha) }$.
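
Putting this together, here is a worked example of the overall F-test, given as a sketch assuming SciPy (the data are synthetic and the coefficient values are arbitrary):

```python
# A worked example of the overall F-test on synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p = 60, 2
X = rng.normal(size=(n, p))
y = 2.0 + 0.8 * X[:, 0] + rng.normal(size=n)   # only the first predictor matters

X1 = np.column_stack([np.ones(n), X])
y_hat = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)

F = (SSR / p) / (SSE / (n - p - 1))
p_value = stats.f.sf(F, p, n - p - 1)          # P( F(p, n-p-1) > F )
print(F, p_value)                              # small p-value -> reject H0
```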

Although this F-test may seem pointless when the t-test for each regression coefficient is available, its real essence lies in comparing models. It is about shedding the ‘subjectivity’ or ‘ambiguity’ that inevitably accompanies statistics in order to produce statistically meaningful results. (Of course, in actual analyses, more convenient and easier statistics are used to compare models.)

Hypothesis Testing for Reduced Model

For a multiple regression analysis with $n$ observations and $p$ independent variables, index the regression coefficients as $i=0,1,\cdots,p$. This regression model is called the Full Model (FM), and the model that keeps only $k$ of its $(p+1)$ regression coefficients, removing the rest, is called the Reduced Model (RM).

  • $H_{0}$: The RM is sufficient; that is, there is no need to use the FM with more variables.
  • $H_{1}$: The RM is insufficient; that is, it is better to use the FM with more variables.

$$ F = {{ \left[ \text{SSE} (RM) - \text{SSE} (FM) \right] / (p +1 - k) } \over { \text{SSE} ( FM ) / (n-p-1) }} $$ follows an F-distribution with $(p + 1 - k , n-p-1)$ degrees of freedom. If $F \le F_{ ( p+1-k , n-p-1 ; \alpha ) }$ at significance level $\alpha$, then $H_{0}$ is adopted, allowing the use of the simpler RM.
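
The following is a hedged sketch of this FM-versus-RM test under the conventions above (assuming NumPy and SciPy; the synthetic data and the choice $k = 2$ are purely illustrative):

```python
# A sketch of the FM-vs-RM test, following the formula above:
# the FM uses all p predictors, while the RM here keeps k parameters.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 80, 4
X = rng.normal(size=(n, p))
y = 1.0 + 0.7 * X[:, 0] + rng.normal(size=n)   # only x1 is truly relevant

def sse(design, y):
    """Residual sum of squares of an OLS fit."""
    y_hat = design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.sum((y - y_hat) ** 2)

FM = np.column_stack([np.ones(n), X])          # p + 1 parameters
k = 2
RM = FM[:, :k]                                 # keep intercept and x1: k parameters

F = ((sse(RM, y) - sse(FM, y)) / (p + 1 - k)) / (sse(FM, y) / (n - p - 1))
p_value = stats.f.sf(F, p + 1 - k, n - p - 1)
print(F, p_value)                              # large p-value -> RM is sufficient
```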
