Estimation of the Variance of Residuals and Standard Errors of Regression Coefficients in Multiple Regression Analysis
Theorem
$$
\begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{p1} \\ 1 & x_{12} & \cdots & x_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & \cdots & x_{pn} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix}
$$
When there are $p$ independent variables and $n$ data points, the linear multiple regression model can be represented by a design matrix as above; let us write it compactly as $Y = X \beta + \varepsilon$. Since the estimate of the regression coefficients is $\hat{\beta} = \left( X^{T} X \right)^{-1} X^{T} Y$, the vector $\hat{Y}$ of fitted values is known to be
$$
\hat{Y} = X \hat{\beta} = X \left( X^{T} X \right)^{-1} X^{T} Y
$$
is known. For convenience, let’s set it as P:=X(XTX)−1XT. Meanwhile, it is assumed that residuals have linearity, i.e., the population mean of ε1,⋯,εn is 0.
Expected Value of the Sum of Squared Residuals
[1]: If the residuals also have homoscedasticity, i.e., if for some constant $\sigma > 0$ it holds that $\varepsilon_{1}, \cdots, \varepsilon_{n} \sim (0, \sigma^{2})$, then the expected value of $\text{SSE}$ is as follows.
$$
E (\text{SSE}) = E \left[ \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2} \right] = n \sigma^{2} - \sum_{i,j} E \left( y_{i} y_{j} \right) P_{ij}
$$
Unbiased Estimator for the Variance of the Residuals
[2]: If the residuals also have independence, i.e., if $\varepsilon_{1}, \cdots, \varepsilon_{n} \overset{\text{iid}}{\sim} (0, \sigma^{2})$ holds, then the unbiased estimator of the residual variance $\sigma^{2}$ is as follows.
$$
E \hat{\sigma}^{2} = E \left[ \frac{1}{n-p-1} \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2} \right] = \sigma^{2}
$$
Standard Error of Regression Coefficients
[3]: If the residuals also have normality, i.e., if $\varepsilon_{1}, \cdots, \varepsilon_{n} \overset{\text{iid}}{\sim} N (0, \sigma^{2})$ holds, then the standard error of the regression coefficients is as follows.
$$
\text{s.e.} \left( \hat{\beta}_{k} \right) = \hat{\sigma} \sqrt{ \left[ \left( X^{T} X \right)^{-1} \right]_{kk} }
$$
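Continuing the numpy sketch above (same assumed simulated data and the variables `Y`, `Y_hat`, `XtX_inv`, `n`, `p` defined there), the quantities appearing in [1]–[3] can be computed directly from the formulas; this is only an illustrative sketch, not code from the referenced text.

```python
SSE = np.sum((Y - Y_hat) ** 2)                      # sum of squared residuals
sigma2_hat = SSE / (n - p - 1)                      # unbiased estimator of sigma^2 from [2]
se_beta = np.sqrt(sigma2_hat * np.diag(XtX_inv))    # s.e.(beta_hat_k) from [3], one entry per coefficient
print(sigma2_hat, se_beta)
```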
Explanation
Most statistics majors, when they first encounter regression analysis in school, are swamped by projects and other coursework, so they often gloss over this kind of mathematical-statistical development. Willingness and motivation aside, the material is genuinely hard to digest at the undergraduate sophomore or junior level, and forcing oneself through it can be inefficient. If the proofs don’t make sense from the get-go, it’s okay to turn away without getting frustrated.
However, if one continues to the master’s level or beyond and revisits the undergraduate courses, it is strongly recommended to work through the content organized here. The most important assumption in model diagnostics for multiple regression is linearity, followed by homoscedasticity, independence, and then normality. To derive the t-test and F-test in regression analysis, these assumptions must be added in exactly that order. This may not be something one can grasp intuitively or through experience alone, but fortunately it can be understood through theoretical study, no special genius required.
Strategy: It might not be the easiest. Setting mathematical statistics aside for a moment, one must at least be comfortable with matrix algebra. The matrix $P := X \left( X^{T} X \right)^{-1} X^{T}$ that appears in the statement of the theorem is idempotent, i.e., a projection operator:
$$
\begin{aligned}
P^{2} &= X \left( X^{T} X \right)^{-1} X^{T} \cdot X \left( X^{T} X \right)^{-1} X^{T}
\\ &= X \left( X^{T} X \right)^{-1} \left( X^{T} X \right) \left( X^{T} X \right)^{-1} X^{T}
\\ &= X \left( X^{T} X \right)^{-1} X^{T}
\\ &= P
\\ &= P^{T}
\end{aligned}
$$
hence $P^{2} = P = P^{T}$, and its complementary projector $(I - P)$ is also a projection operator, so $(I - P)^{2} = (I - P)$ holds. This fact will be used as a lemma. If this seems too difficult, it is recommended to study more and return to this proof in a few years.
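The lemma is easy to check numerically. The following self-contained sketch (with an arbitrary simulated design matrix) verifies $P^{2} = P = P^{T}$ and $(I - P)^{2} = I - P$ up to floating-point tolerance.

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.hstack([np.ones((30, 1)), rng.normal(size=(30, 3))])  # any full-rank design matrix
P = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(30)

print(np.allclose(P @ P, P), np.allclose(P, P.T))            # P^2 = P and P = P^T
print(np.allclose((I - P) @ (I - P), I - P))                 # (I - P)^2 = I - P
```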
[1]

For the Kronecker delta
$$
\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}
$$
the following holds:
$$
\begin{aligned}
E \left[ \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2} \right] &= E \left[ (Y - PY)^{T} (Y - PY) \right]
\\ &= E \left[ \left[ \left( I_{n} - P \right) Y \right]^{T} \left[ \left( I_{n} - P \right) Y \right] \right]
\\ &= E \left[ Y^{T} \left( I_{n} - P \right)^{T} \left( I_{n} - P \right) Y \right]
\\ &= E \left[ Y^{T} \left( I_{n} - P \right) \left( I_{n} - P \right) Y \right]
\\ &= E \left[ Y^{T} \left( I_{n} - P \right)^{2} Y \right]
\\ &= E \left[ Y^{T} \left( I_{n} - P \right) Y \right]
\\ &= E \left[ \sum_{i,j} y_{i} y_{j} \left( \delta_{ij} - P_{ij} \right) \right]
\\ &= \sum_{i,j} E \left[ y_{i} y_{j} \delta_{ij} \right] - \sum_{i,j} E \left[ y_{i} y_{j} \right] P_{ij}
\\ &= \sum_{i} E \left[ y_{i}^{2} \right] - \sum_{i,j} E \left[ y_{i} y_{j} \right] P_{ij}
\\ &= n \sigma^{2} - \sum_{i,j} E \left[ y_{i} y_{j} \right] P_{ij}
\end{aligned}
$$
■
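The first equality of the chain, $\text{SSE} = (Y - PY)^{T}(Y - PY) = Y^{T} \left( I_{n} - P \right) Y$, can also be checked on the simulated data from the sketches above (reusing `Y`, `P`, `n`, and `SSE` defined there); again, this is purely an illustration.

```python
I_n = np.eye(n)
# the quadratic form Y^T (I - P) Y agrees with the sum of squared residuals computed earlier
print(np.allclose(Y @ (I_n - P) @ Y, SSE))
```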
[2]
That the residuals are independent means that for $i \neq j$, $y_{i}$ and $y_{j}$ are also uncorrelated, so $E \left[ y_{i} y_{j} \right] = 0$ when $i \neq j$; when $i = j$, by the linearity and homoscedasticity of the residuals, $E \left[ y_{i} y_{j} \right] = \sigma^{2}$. Thus we obtain:
$$
\begin{aligned}
E \left[ \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2} \right] &= n \sigma^{2} - \sum_{i,j} E \left[ y_{i} y_{j} \right] P_{ij}
\\ &= n \sigma^{2} - \sum_{i} \sigma^{2} P_{ii}
\end{aligned}
$$
$\sum_{i} P_{ii}$ is the trace of $P$, $\operatorname{tr} P$, thus
$$
\begin{aligned}
E \left[ \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2} \right] &= n \sigma^{2} - \sigma^{2} \sum_{i} P_{ii}
\\ &= \sigma^{2} \left( n - \operatorname{tr} P \right)
\\ &= \sigma^{2} \left( n - \operatorname{tr} X \left( X^{T} X \right)^{-1} X^{T} \right)
\\ &= \sigma^{2} \left( n - \operatorname{tr} X^{T} X \left( X^{T} X \right)^{-1} \right)
\\ &= \sigma^{2} \left( n - \operatorname{tr} I_{1+p} \right)
\\ &= \sigma^{2} \left( n - (1+p) \right)
\end{aligned}
$$
is obtained, where the fourth equality uses the cyclic property of the trace. Dividing both sides by $(n-p-1)$,
$$
\frac{1}{n-p-1} E \left[ \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2} \right] = \sigma^{2}
$$
hence the unbiased estimator $\hat{\sigma}^{2} = \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2} / (n-p-1)$ for $\sigma^{2}$ is obtained.

■
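Unbiasedness can be illustrated by repeated sampling: averaging $\hat{\sigma}^{2}$ over many simulated error vectors should approach $\sigma^{2}$. The sketch below is self-contained, and the design, coefficients, error scale, and number of replications are all assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 40, 3, 0.7                                     # assumed dimensions and error standard deviation
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])    # fixed design matrix
beta = np.array([1.0, 2.0, -1.0, 0.5])                       # arbitrary true coefficients
P = X @ np.linalg.inv(X.T @ X) @ X.T
I_n = np.eye(n)

estimates = []
for _ in range(10000):                                       # repeated sampling approximates E[sigma2_hat]
    Y = X @ beta + rng.normal(scale=sigma, size=n)
    SSE = Y @ (I_n - P) @ Y                                  # SSE = Y^T (I - P) Y
    estimates.append(SSE / (n - p - 1))

print(np.mean(estimates), sigma**2)                          # the average should be close to sigma^2 = 0.49
```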
[3]

If the residuals iid follow a normal distribution, then $\hat{\beta} = \left( X^{T} X \right)^{-1} X^{T} Y$ follows a multivariate normal distribution with mean $\beta$ and covariance $\sigma^{2} \left( X^{T} X \right)^{-1}$, so the marginal probability distribution of the $k$th component $\hat{\beta}_{k}$ of $\hat{\beta} = \left( \hat{\beta}_{0}, \cdots, \hat{\beta}_{p} \right)$ is the following univariate normal distribution:
$$
\hat{\beta}_{k} \sim N \left( \beta_{k}, \sigma^{2} \left[ \left( X^{T} X \right)^{-1} \right]_{kk} \right)
$$
Since $\operatorname{Var} \hat{\beta}_{k} = \sigma^{2} \left[ \left( X^{T} X \right)^{-1} \right]_{kk}$, we obtain the following.
$$
\begin{aligned}
\text{s.e.} \left( \hat{\beta}_{k} \right) &= \sqrt{ \operatorname{Var} \left( \hat{\beta}_{k} \right) }
\\ &= \sqrt{ \sigma^{2} \left[ \left( X^{T} X \right)^{-1} \right]_{kk} }
\\ &= \sqrt{ \frac{1}{n-p-1} \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2} \left[ \left( X^{T} X \right)^{-1} \right]_{kk} }
\\ &= \hat{\sigma} \sqrt{ \left[ \left( X^{T} X \right)^{-1} \right]_{kk} }
\end{aligned}
$$
■
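As a final illustration (again with arbitrary simulated settings, not from the referenced text), the sampling standard deviation of $\hat{\beta}_{k}$ over many replications should match $\sigma \sqrt{ \left[ \left( X^{T} X \right)^{-1} \right]_{kk} }$, which is exactly what the standard error estimates.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma, k = 40, 3, 0.7, 1                               # assumed dimensions, error scale, coefficient index
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])    # fixed design matrix
beta = np.array([1.0, 2.0, -1.0, 0.5])                       # arbitrary true coefficients
XtX_inv = np.linalg.inv(X.T @ X)

beta_hat_k = []
for _ in range(10000):
    Y = X @ beta + rng.normal(scale=sigma, size=n)
    beta_hat_k.append((XtX_inv @ X.T @ Y)[k])                # k-th component of beta_hat

# empirical standard deviation of beta_hat_k versus sigma * sqrt([(X^T X)^{-1}]_{kk})
print(np.std(beta_hat_k), sigma * np.sqrt(XtX_inv[k, k]))
```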
Hadi. (2006). Regression Analysis by Example (4th Edition): p82~84.