

Proof of SST = SSR + SSE in Linear Regression Analysis

Summary

$$
\begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{p1} \\ 1 & x_{12} & \cdots & x_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & \cdots & x_{pn} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix}
$$

Given $n$ data points and $p < n$, the multiple linear regression model is written with a design matrix as above, and abbreviated as $Y = X \beta + \varepsilon$. The least squares estimator of $\beta$,
$$
\hat{\beta} = \left( \hat{\beta}_{0} , \hat{\beta}_{1} , \cdots , \hat{\beta}_{p} \right) = \left( X^{T} X \right)^{-1} X^{T} Y ,
$$
is called the best linear unbiased estimator. Now define the mean $\overline{y}$, the fitted values $\hat{y}_{k}$, and $\text{SST}$, $\text{SSR}$, $\text{SSE}$ as follows.
$$
\begin{align*} \overline{y} =& {{1} \over {n}} \sum_{k=1}^{n} y_{k} \\ \hat{y}_{k} =& \hat{\beta}_{0} + \hat{\beta}_{1} x_{1k} + \cdots + \hat{\beta}_{p} x_{pk} \\ \text{SST} =& \sum_{k=1}^{n} \left( y_{k} - \overline{y} \right)^{2} \\ \text{SSR} =& \sum_{k=1}^{n} \left( \hat{y}_{k} - \overline{y} \right)^{2} \\ \text{SSE} =& \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right)^{2} \end{align*}
$$
Then $\text{SST}$, $\text{SSR}$, and $\text{SSE}$ are related by
$$
\text{SST} = \text{SSR} + \text{SSE}
$$
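As a quick numerical sanity check of this identity (not a substitute for the proof below), the sketch below generates arbitrary data, computes $\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$, and compares $\text{SST}$ with $\text{SSR} + \text{SSE}$. All names here (`rng`, `X`, `y`, `beta_hat`, and so on) are made up for illustration.

```python
import numpy as np

# Arbitrary illustrative data: n observations, p predictors.
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept column
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(size=n)

# Least squares estimator: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat  # fitted values

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)

print(SST, SSR + SSE)               # the two values agree up to floating-point error
assert np.isclose(SST, SSR + SSE)
```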

Proof

In simple regression analysis this may not be difficult, but in multiple regression analysis the supporting theorems involve proofs with matrices and partial derivatives, so it is harder than it looks. In fact, even proofs found online often skip the step $\sum_{k=1}^{n} y_{k} = \sum_{k=1}^{n} \hat{y}_{k}$ and merely appeal to the intuition behind the equation. If you are just starting to learn regression analysis, it is fine to accept the result for now and come back later for a deeper understanding.


Part 1. $\text{SST} = \text{SSR} + \text{SSE} + \cdots$

$$
\begin{align*} & \text{SST} \\ =& \sum_{k=1}^{n} \left( y_{k} - \overline{y} \right)^2 \\ =& \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} + \hat{y}_{k} - \overline{y} \right)^2 \\ =& \sum_{k=1}^{n} \left( \hat{y}_{k} - \overline{y} \right)^{2} + \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right)^{2} + 2 \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \left( \hat{y}_{k} - \overline{y} \right) \\ =& \text{SSR} + \text{SSE} + 2 \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \left( \hat{y}_{k} - \overline{y} \right) \end{align*}
$$
Thus it suffices to show that the last term
$$
\begin{align*} & \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \left( \hat{y}_{k} - \overline{y} \right) \\ =& \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \hat{y}_{k} - \overline{y} \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \end{align*}
$$
equals $0$, which is done by showing that each of its two pieces is $0$ in Parts 2 and 3 below.
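As an aside, the cross term and its two pieces can also be inspected numerically, reusing the hypothetical `y` and `y_hat` from the snippet in the Summary; all of them come out as zero up to floating-point error.

```python
# Reuses y and y_hat from the Summary snippet (illustration only, not part of the proof).
cross = np.sum((y - y_hat) * (y_hat - y.mean()))
piece_part3 = np.sum((y - y_hat) * y_hat)       # shown to be 0 in Part 3
piece_part2 = y.mean() * np.sum(y - y_hat)      # shown to be 0 in Part 2
print(cross, piece_part3, piece_part2)          # all ~ 0
```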


Part 2. $\overline{y} \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) = 0$

Sum of fitted values in multiple regression analysis: If $\hat{\beta}$ is the best linear unbiased estimator, then the sum of the $y_{k}$ and the sum of the fitted values $\hat{y}_{k} = \hat{\beta}_{0} + \sum_{j=1}^{p} \hat{\beta}_{j} x_{jk}$ are equal:
$$
\sum_{k=1}^{n} y_{k} = \sum_{k=1}^{n} \hat{y}_{k}
$$

By the auxiliary theorem, $\sum_{k=1}^{n} y_{k} = \sum_{k=1}^{n} \hat{y}_{k}$, and therefore $\sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) = 0$, so the term $\overline{y} \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right)$ vanishes. Although this post appears to gloss over it by invoking the auxiliary theorem, this is actually the critical part of the argument, so make sure to work through the proof of the auxiliary theorem.
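For illustration, the auxiliary theorem can be checked on the same hypothetical data from the Summary snippet (the design matrix there includes an intercept column, which is what makes the residuals sum to zero):

```python
# Reuses y and y_hat from the Summary snippet (illustration only).
print(np.sum(y), np.sum(y_hat))     # equal up to floating-point error
print(np.sum(y - y_hat))            # ~ 0
```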


Part 3. $\sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \hat{y}_{k} = 0$

$$
\begin{bmatrix} \hat{y}_{1} \\ \vdots \\ \hat{y}_{n} \end{bmatrix} = X \hat{\beta}
$$

The vector of fitted values $\hat{y}_{1} , \cdots , \hat{y}_{n}$ can be written as the product of $X$ and $\hat{\beta}$ as above. Expanding with the identity matrix $E$ and the zero matrix $O$:
$$
\begin{align*} & \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \hat{y}_{k} \\ =& \begin{bmatrix} y_{1} - \hat{y}_{1} & \cdots & y_{n} - \hat{y}_{n} \end{bmatrix} \begin{bmatrix} \hat{y}_{1} \\ \vdots \\ \hat{y}_{n} \end{bmatrix} \\ =& \left( Y^{T} - \left( X \hat{\beta} \right)^{T} \right) \begin{bmatrix} \hat{y}_{1} \\ \vdots \\ \hat{y}_{n} \end{bmatrix} \\ =& \left( Y - X \hat{\beta} \right)^{T} X \hat{\beta} \\ =& \left( Y - X \left( X^{T} X \right)^{-1} X^{T} Y \right)^{T} X \hat{\beta} \\ =& \left( \left[ E - X \left( X^{T} X \right)^{-1} X^{T} \right] Y \right)^{T} X \hat{\beta} \\ =& Y^{T} \left( E - X \left( X^{T} X \right)^{-1} X^{T} \right)^{T} X \hat{\beta} \\ =& Y^{T} \left( X^{T} \left[ E - X \left( X^{T} X \right)^{-1} X^{T} \right] \right)^{T} \hat{\beta} \\ =& Y^{T} \left( X^{T} - X^{T} X \left( X^{T} X \right)^{-1} X^{T} \right)^{T} \hat{\beta} \\ =& Y^{T} \left( X^{T} - X^{T} \right)^{T} \hat{\beta} \\ =& Y^{T} O^{T} \hat{\beta} \\ =& 0 \end{align*}
$$

Consequently, we obtain the following equation.
$$
\text{SST} = \text{SSR} + \text{SSE}
$$
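The key matrix fact used above is that $\left[ E - X \left( X^{T} X \right)^{-1} X^{T} \right] X = O$, i.e. the residual vector is orthogonal to the fitted values. A small numerical check of this, again reusing the hypothetical `X`, `y`, `y_hat` from the Summary snippet:

```python
# Hat matrix H = X (X^T X)^{-1} X^T; E is the identity (illustration only).
H = X @ np.linalg.solve(X.T @ X, X.T)
E = np.eye(len(y))
print(np.max(np.abs((E - H) @ X)))   # ~ 0, i.e. (E - H) X = O
print((y - y_hat) @ y_hat)           # ~ 0, residuals orthogonal to fitted values
```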