Proof of SST = SSR + SSE in Linear Regression Analysis
Summary
$$ \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix} $$ Given $n$ data points and $p < n$ independent variables, the multiple linear regression model is written with a design matrix as above, and abbreviated as $Y = X \beta + \varepsilon$. The least squares estimator vector for $\beta$, $$ \hat{\beta} = \left( \hat{\beta}_{0} , \hat{\beta}_{1} , \cdots , \hat{\beta}_{p} \right)^{T} = \left( X^{T} X \right)^{-1} X^{T} Y , $$ is called the best linear unbiased estimator. Now, define the mean $\overline{y}$, the fitted values $\hat{y}_{k}$, and $\text{SST}$, $\text{SSR}$, $\text{SSE}$ as follows. $$ \begin{align*} \overline{y} =& {{1} \over {n}} \sum_{k=1}^{n} y_{k} \\ \hat{y}_{k} =& \hat{\beta}_{0} + \hat{\beta}_{1} x_{k1} + \cdots + \hat{\beta}_{p} x_{kp} \\ \text{SST} =& \sum_{k=1}^{n} \left( y_{k} - \overline{y} \right)^{2} \\ \text{SSR} =& \sum_{k=1}^{n} \left( \hat{y}_{k} - \overline{y} \right)^{2} \\ \text{SSE} =& \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right)^{2} \end{align*} $$ The relationship between $\text{SST}$, $\text{SSR}$, and $\text{SSE}$ is as follows. $$ \text{SST} = \text{SSR} + \text{SSE} $$
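As a sanity check, here is a minimal numerical sketch (not part of the original post; it assumes NumPy and synthetic data): it computes $\hat{\beta} = \left( X^{T} X \right)^{-1} X^{T} Y$ via the normal equations and verifies that $\text{SST} = \text{SSR} + \text{SSE}$ up to floating-point error.

```python
# Minimal sketch: fit OLS by the normal equations on synthetic data and
# check SST = SSR + SSE numerically. Data and dimensions are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                                                 # n observations, p predictors (p < n)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # design matrix with intercept column
beta_true = rng.normal(size=p + 1)
y = X @ beta_true + rng.normal(scale=0.5, size=n)            # Y = X beta + epsilon

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)                 # least squares estimator
y_hat = X @ beta_hat                                         # fitted values
y_bar = y.mean()

SST = np.sum((y - y_bar) ** 2)
SSR = np.sum((y_hat - y_bar) ** 2)
SSE = np.sum((y - y_hat) ** 2)

print(SST, SSR + SSE)                                        # the two numbers agree
assert np.isclose(SST, SSR + SSE)
```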
Proof
In the case of simple regression analysis the proof is not hard, but in multiple regression analysis the supporting lemmas involve matrix calculus and partial derivatives, so it is more challenging than it looks. In fact, many proofs found online skip the step marked $\sum_{k=1}^{n} y_{k} = \sum_{k=1}^{n} \hat{y}_{k}$ and merely appeal to the intuition behind the equation. If you are just starting to learn regression analysis, it is fine to accept the identity for now and come back later for a deeper understanding.
Part 1. $\text{SST} = \text{SSR} + \text{SSE} + \cdots$
$$ \begin{align*} & \text{SST} \\ =& \sum_{k=1}^{n} \left( y_{k} - \overline{y} \right)^2 \\ =& \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} + \hat{y}_{k} - \overline{y} \right)^2 \\ =& \sum_{k=1}^{n} \left( \hat{y}_{k} - \overline{y} \right)^{2} + \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right)^{2} + 2 \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \left( \hat{y}_{k} - \overline{y} \right) \\ =& \text{SSR} + \text{SSE} + 2 \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \left( \hat{y}_{k} - \overline{y} \right) \end{align*} $$ Therefore, it suffices to show that the cross term $$ \begin{align*} & \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \left( \hat{y}_{k} - \overline{y} \right) \\ =& \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \hat{y}_{k} - \overline{y} \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \end{align*} $$ equals $0$. The two pieces on the right-hand side are handled separately in Parts 2 and 3.
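Before proving it algebraically, the cross term can also be checked numerically. The sketch below (my own illustration under the same NumPy and synthetic-data assumptions as above) confirms it is zero up to rounding.

```python
# Sketch: the cross term 2 * sum((y_k - yhat_k) * (yhat_k - ybar)) vanishes for an OLS fit.
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # design matrix with intercept
y = X @ rng.normal(size=p + 1) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)                 # least squares estimator
y_hat = X @ beta_hat

cross_term = 2 * np.sum((y - y_hat) * (y_hat - y.mean()))
print(cross_term)                                            # ~0 up to floating-point rounding
assert abs(cross_term) < 1e-8
```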
Part 2. $\overline{y} \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) = 0$
Sum of fitted values in multiple regression analysis: If $\hat{\beta}$ is the best linear unbiased estimator, then the sum of $y_{k}$ and the sum of the fitted values $\hat{y}_{k} = \hat{\beta}_{0} + \sum_{j=1}^{p} \hat{\beta}_{j} x_{kj}$ are equal: $$ \sum_{k=1}^{n} y_{k} = \sum_{k=1}^{n} \hat{y}_{k} $$
By the lemma, $\sum_{k=1}^{n} y_{k} = \sum_{k=1}^{n} \hat{y}_{k}$, so $\sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) = 0$ and hence $\overline{y} \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) = 0$. Although this post appears to gloss over it by citing the lemma, this is in fact a critical step, so make sure to work through the lemma's proof thoroughly.
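The lemma hinges on the column of ones in $X$; a model fitted without an intercept need not satisfy it. A brief sketch of the check (again an illustration assuming NumPy and synthetic data, not the lemma's proof):

```python
# Sketch of the lemma: with an intercept column in X, residuals sum to zero,
# so sum(y) equals sum(y_hat).
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # first column of ones (intercept)
y = X @ rng.normal(size=p + 1) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)                 # least squares estimator
y_hat = X @ beta_hat

print(y.sum(), y_hat.sum())                                  # equal up to rounding
assert np.isclose(y.sum(), y_hat.sum())
```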
Part 3. $\sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \hat{y}_{k} = 0$
$$ \begin{bmatrix} \hat{y}_{1} \\ \vdots \\ \hat{y}_{n} \end{bmatrix} = X \hat{\beta} $$ The vector of fitted values $\hat{y}_{1} , \cdots , \hat{y}_{n}$ can be written as the product of $X$ and $\hat{\beta}$ as above. Expanding the expression, with $E$ denoting the identity matrix and $O$ the zero matrix, gives the following. $$ \begin{align*} & \sum_{k=1}^{n} \left( y_{k} - \hat{y}_{k} \right) \hat{y}_{k} \\ =& \begin{bmatrix} y_{1} - \hat{y}_{1} & \cdots & y_{n} - \hat{y}_{n} \end{bmatrix} \begin{bmatrix} \hat{y}_{1} \\ \vdots \\ \hat{y}_{n} \end{bmatrix} \\ =& \left( Y^{T} - \left( X \hat{\beta} \right)^{T} \right) \begin{bmatrix} \hat{y}_{1} \\ \vdots \\ \hat{y}_{n} \end{bmatrix} \\ =& \left( Y - X \hat{\beta} \right)^{T} X \hat{\beta} \\ =& \left( Y - X \left( X^{T} X \right)^{-1} X^{T} Y \right)^{T} X \hat{\beta} \\ =& \left( \left[ E - X \left( X^{T} X \right)^{-1} X^{T} \right] Y \right)^{T} X \hat{\beta} \\ =& Y^{T} \left( E - X \left( X^{T} X \right)^{-1} X^{T} \right)^{T} X \hat{\beta} \\ =& Y^{T} \left( X^{T} \left[ E - X \left( X^{T} X \right)^{-1} X^{T} \right] \right)^{T} \hat{\beta} \\ =& Y^{T} \left( X^{T} - X^{T} X \left( X^{T} X \right)^{-1} X^{T} \right)^{T} \hat{\beta} \\ =& Y^{T} \left( X^{T} - X^{T} \right)^{T} \hat{\beta} \\ =& Y^{T} O^{T} \hat{\beta} \\ =& 0 \end{align*} $$ Consequently, we obtain the following equation. $$ \text{SST} = \text{SSR} + \text{SSE} $$
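Part 3 says, in geometric terms, that the residual vector $\left[ E - X \left( X^{T} X \right)^{-1} X^{T} \right] Y$ is orthogonal to the fitted-value vector $X \hat{\beta}$. A short numerical sketch of this fact (again assuming NumPy and synthetic data, not part of the original proof):

```python
# Sketch of Part 3: residuals (E - X (X^T X)^{-1} X^T) Y are orthogonal to fitted values X beta_hat.
import numpy as np

rng = np.random.default_rng(3)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # design matrix with intercept
y = X @ rng.normal(size=p + 1) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)                        # X (X^T X)^{-1} X^T, the hat matrix
M = np.eye(n) - H                                            # E - X (X^T X)^{-1} X^T
residuals = M @ y                                            # Y - X beta_hat
fitted = H @ y                                               # X beta_hat

print(residuals @ fitted)                                    # ~0: residuals orthogonal to fitted values
assert abs(residuals @ fitted) < 1e-8
```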
■