

F-test for Regression Coefficients

Hypothesis Testing

Assume that, in the model diagnostics of the multiple linear regression model, the residuals satisfy linearity, homoscedasticity, independence, and normality. The hypothesis test for a multiple regression analysis with $n$ observations and $p$ independent variables is as follows:

  • $H_{0}$: $\beta_{1} = \beta_{2} = \cdots = \beta_{p} = 0$, i.e., none of the independent variables has a correlation with the dependent variable.
  • $H_{1}$: At least one of $\beta_{1}, \beta_{2}, \cdots, \beta_{p}$ is not $0$, i.e., at least one independent variable has a significant correlation with the dependent variable.

Derivation

SST, SSR, SSE:
  • TSS (Total Sum of Squares) or SST (Sum of Squares Total): $$\text{TSS} = \text{SST} := \sum_{i=1}^{n} \left( y_{i} - \overline{y} \right)^{2}$$
  • ESS (Explained Sum of Squares) or SSR (Sum of Squares due to Regression): $$\text{ESS} = \text{SSR} := \sum_{i=1}^{n} \left( \hat{y}_{i} - \overline{y} \right)^{2}$$
  • RSS (Residual Sum of Squares) or SSE (Sum of Squared Error): $$\text{RSS} = \text{SSE} := \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}$$

SST, since it uses only the single mean $\overline{y}$, has $(n-1)$ degrees of freedom, while SSE, computed from $p$ independent variables, uses $(p+1)$ regression coefficients including the constant term and thus has $\left( n - (p+1) \right)$ degrees of freedom. By the homoscedasticity, independence, and normality of the residuals, each of these, when divided by $\sigma^{2}$, follows a chi-square distribution with its respective degrees of freedom. Meanwhile, $$SST = SSR + SSE \iff SSR = SST - SSE$$ so SSR has $(n-1) - (n-p-1) = p$ degrees of freedom and, under $H_{0}$, $SSR/\sigma^{2}$ follows the chi-square distribution $\chi^{2}(p)$.
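The decomposition and the degrees of freedom can be checked numerically. Below is a minimal sketch in Python, assuming numpy is available; the simulated data, coefficient values, and seed are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 100, 3                          # n observations, p independent variables

# Design matrix with a constant term, plus normal, homoscedastic, independent errors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, 2.0, 0.0, -1.5])                      # true (beta_0, ..., beta_p)
y = X @ beta + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)            # least-squares fit
y_hat = X @ beta_hat

SST = np.sum((y - y.mean()) ** 2)      # df = n - 1
SSR = np.sum((y_hat - y.mean()) ** 2)  # df = p
SSE = np.sum((y - y_hat) ** 2)         # df = n - (p + 1)

print(SST, SSR + SSE)                  # the two agree: SST = SSR + SSE
print(n - 1, p + (n - p - 1))          # the degrees of freedom add up the same way
```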

Derivation of the F-distribution: If two random variables $U, V$ are independent and $U \sim \chi^{2}(r_{1})$, $V \sim \chi^{2}(r_{2})$, then $$\frac{U / r_{1}}{V / r_{2}} \sim F\left( r_{1}, r_{2} \right)$$
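This statement can also be checked empirically: the ratio of two independent chi-square variables, each divided by its degrees of freedom, should match the corresponding F-distribution. A small sketch, assuming numpy and scipy, with arbitrarily chosen degrees of freedom:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
r1, r2, N = 3, 96, 100_000

U = rng.chisquare(r1, size=N)          # U ~ chi^2(r1)
V = rng.chisquare(r2, size=N)          # V ~ chi^2(r2), independent of U
ratio = (U / r1) / (V / r2)

# Kolmogorov-Smirnov comparison of the empirical distribution with F(r1, r2)
D, p_value = stats.kstest(ratio, stats.f(r1, r2).cdf)
print(D, p_value)                      # small D, large p-value: consistent with F(r1, r2)
```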

Under $H_{0}$, since $SSR/\sigma^{2} \sim \chi^{2}(p)$ and $SSE/\sigma^{2} \sim \chi^{2}(n-p-1)$ are independent, the $\sigma^{2}$ cancels in their ratio, and the test statistic $$F := \frac{\text{SSR} / p}{\text{SSE} / (n-p-1)}$$ defined as above follows an F-distribution with $(p, n-p-1)$ degrees of freedom.

Expressed as a formula once more, $F = \dfrac{\text{SSR}/p}{\text{SSE}/(n-p-1)} \sim F(p, n-p-1)$, which is used for the hypothesis test: at significance level $\alpha$, $H_{0}$ is rejected when $F > F_{(p, n-p-1; \alpha)}$.
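The whole test then takes only a few lines. A minimal sketch in Python, assuming numpy and scipy and the same kind of simulated data as above (all numbers illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, alpha = 100, 3, 0.05

X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, 0.0, -1.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)

F = (SSR / p) / (SSE / (n - p - 1))              # test statistic
F_crit = stats.f.ppf(1 - alpha, p, n - p - 1)    # critical value F_(p, n-p-1; alpha)
p_value = stats.f.sf(F, p, n - p - 1)            # P( F(p, n-p-1) > F )

print(F, F_crit, p_value)                        # reject H0 if F > F_crit
```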

Although this F-test may seem redundant when the t-test for each regression coefficient is available, its real essence lies in comparing models. It is about shedding the 'subjectivity' or 'ambiguity' that inevitably accompanies statistics, so as to produce statistically meaningful results. (Of course, in actual analyses, more convenient and simpler statistics are used to compare models.)

Hypothesis Testing for the Reduced Model

For a multiple regression analysis with $n$ observations and $p$ independent variables, index the regression coefficients as $i = 0, 1, \cdots, p$. This regression model is called the Full Model (FM), and the model that keeps only the first $k$ coefficients $\beta_{0}, \beta_{1}, \cdots, \beta_{k-1}$, removing the rest from the FM, is called the Reduced Model (RM).

  • $H_{0}$: The RM is sufficient; that is, there is no need to use the FM with more variables.
  • $H_{1}$: The RM is insufficient; that is, it is better to use the FM with more variables.

$$F = \frac{\left[ \text{SSE}(RM) - \text{SSE}(FM) \right] / (p + 1 - k)}{\text{SSE}(FM) / (n - p - 1)}$$ follows an F-distribution with $(p+1-k, n-p-1)$ degrees of freedom. At significance level $\alpha$, if $F \le F_{(p+1-k, n-p-1; \alpha)}$, then $H_{0}$ cannot be rejected, allowing the use of the reduced model RM.
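The model comparison can be sketched the same way. In the sketch below, assuming numpy and scipy, the FM uses all $p$ independent variables while the RM keeps only the first $k$ coefficients; the data and true coefficients are made up so that the RM really is sufficient:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, k, alpha = 100, 3, 2, 0.05       # FM: p+1 coefficients; RM keeps the first k

X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(size=n)  # only beta_0, beta_1 matter

def sse(X_model, y):
    """Residual sum of squares of the least-squares fit."""
    beta_hat, *_ = np.linalg.lstsq(X_model, y, rcond=None)
    return np.sum((y - X_model @ beta_hat) ** 2)

SSE_FM = sse(X, y)                      # full model
SSE_RM = sse(X[:, :k], y)               # reduced model: first k columns only

F = ((SSE_RM - SSE_FM) / (p + 1 - k)) / (SSE_FM / (n - p - 1))
F_crit = stats.f.ppf(1 - alpha, p + 1 - k, n - p - 1)
print(F, F_crit)                        # F <= F_crit: the RM is considered sufficient
```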

See Also