Fitted Values, Predicted Values, Residuals, Errors
Definition 1
Let the regression equation obtained by regression analysis $Y \gets X_{1} + X_{2} + \cdots + X_{p}$ be denoted as $y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{p} x_{p}$, and represent the $i$-th of the $n$ data points as $(y_{i} , x_{i1} , x_{i2} , \cdots , x_{ip})$.
- Mean: $$ \displaystyle \overline{y} := {{1} \over {n}} \sum_{i=1}^{n} y_{i} $$
- Fitted Value: For the $i$-th data point $y_{i}$ $$ \hat{y}_{i} := \beta_{0} + \beta_{1} x_{i1} + \beta_{2} x_{i2} + \cdots + \beta_{p} x_{ip} $$
- Predicted Value: For a new data point $(x_{01} , x_{02} , \cdots , x_{0p})$ not used in the fit $$ \hat{y}_{0} := \beta_{0} + \beta_{1} x_{01} + \beta_{2} x_{02} + \cdots + \beta_{p} x_{0p} $$
- Deviation Due to Fit: $$ \hat{y}_{i} - \overline{y} $$
- Residual: $$ y_{i} - \hat{y}_{i} $$
- TSS (Total Sum of Squares) or SST (Sum of Squares Total): $$ \text{TSS} =\text{SST} := \sum_{i=1}^{n} ( y_{i} - \overline{y} )^2 $$
- ESS (Explained Sum of Squares) or SSR (Sum of Squares due to Regression): $$ \text{ESS} = \text{SSR} := \sum_{i=1}^{n} ( \hat{y}_{i} - \overline{y} )^2 $$
- RSS (Residual Sum of Squares) or SSE (Sum of squared Error): $$ \text{RSS} = \text{SSE} := \sum_{i=1}^{n} ( y_{i} - \hat{y}_{i} )^2 $$
- R-squared or Coefficient of Determination: $$ R^2 := {{ \text{ SSR } } \over { \text{ SST} }} $$
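To make the definitions above concrete, here is a minimal numerical sketch in Python. The data values, the single-predictor setup, and the new input `x_new` are all hypothetical, chosen only to illustrate how each quantity in the list is computed from a least-squares fit.

```python
import numpy as np

# Hypothetical toy data: one predictor, n = 6 observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

# Least-squares fit of y = beta0 + beta1 * x (design matrix with an intercept column)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

y_bar = y.mean()        # mean
y_hat = X @ beta        # fitted values for the observed data
residuals = y - y_hat   # residuals

# Predicted value for a new, unobserved input (hypothetical x_new)
x_new = 7.0
y_pred = beta[0] + beta[1] * x_new

SST = np.sum((y - y_bar) ** 2)      # TSS / SST: total sum of squares
SSR = np.sum((y_hat - y_bar) ** 2)  # ESS / SSR: explained sum of squares
SSE = np.sum(residuals ** 2)        # RSS / SSE: residual sum of squares
R2 = SSR / SST                      # coefficient of determination

print(beta, y_pred, SST, SSR, SSE, R2)
```

On data like this, $R^2$ computed as SSR / SST agrees with 1 - SSE / SST up to floating-point error, which anticipates the decomposition discussed below.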
Explanation
- Fitted values and predicted values are computed by the same formula; they differ only in whether the values plugged into the regression equation come from the data actually used in the fit or from new data. Computing $\hat{y}_{i}$ amounts to summarizing the information the data have already given us, and the difference between the fitted value and the observed value $y_{i}$, namely the residual, is an error we fully expect to see. Least squares regression minimizes the residual sum of squares, and once the regression line is obtained, examining the residuals to check whether the assumptions of regression analysis hold is called model diagnostics (a minimal numeric sketch of such a check appears after this list).
- The 'explained sum of squares' ESS is simply named in contrast to the 'unexplained sum of squares' RSS. The tricky part is that E and R are used ambiguously: E can stand for Explained (in ESS) or for Error (in SSE), and R for Regression (in SSR) or for Residual (in RSS).
$$ \text{TSS} = \text{SST} \\ \text{ESS} = \text{SSR} \\ \text{RSS} = \text{SSE} $$
It is not recommended to memorize these by whether $E$ and $R$ appear at the front or the back of the abbreviation. It is enough to choose one notation that makes sense to you, remember it as a formula, and keep in mind that other texts may use the abbreviations the opposite way.
- R-squared, also called the coefficient of determination, is a measure of how well the analysis explains the data. In linear regression with an intercept it can be proven that $\text{SST} = \text{SSR} + \text{SSE}$ (a sketch of the proof is given after this list), which implies that, for a fixed $\text{TSS}$, a larger $\text{ESS}$ means a smaller $\text{RSS}$, and that $0 \le R^{2} \le 1$. Intuitively, this makes it
$$ R^2 = {{ \text{SSR} } \over { \text{SST} }} = {{ \text{ESS} } \over { \text{TSS} }} = {{\text{explainable error}} \over {\text{total error}}} $$
understandable as the proportion of the total variation in the data that is explained by the analysis.
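As a sketch of why $\text{SST} = \text{SSR} + \text{SSE}$ holds, under the assumption that the coefficients come from least squares and the model contains an intercept:

$$ \sum_{i=1}^{n} ( y_{i} - \overline{y} )^{2} = \sum_{i=1}^{n} \left[ ( y_{i} - \hat{y}_{i} ) + ( \hat{y}_{i} - \overline{y} ) \right]^{2} = \sum_{i=1}^{n} ( y_{i} - \hat{y}_{i} )^{2} + \sum_{i=1}^{n} ( \hat{y}_{i} - \overline{y} )^{2} + 2 \sum_{i=1}^{n} ( y_{i} - \hat{y}_{i} ) ( \hat{y}_{i} - \overline{y} ) $$

The cross term vanishes because the least-squares residuals are orthogonal to the fitted values and, thanks to the intercept, sum to zero, leaving exactly $\text{SST} = \text{SSE} + \text{SSR}$.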
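The same facts can be checked numerically. The sketch below reuses the hypothetical toy data from the earlier code block and only prints two consequences of least squares with an intercept; an actual model diagnosis would go further, e.g. residual plots and tests of the regression assumptions.

```python
import numpy as np

# Same hypothetical toy data and fit as in the earlier sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
residuals = y - y_hat

# Residuals sum to (numerically) zero and are orthogonal to the fitted values;
# this is exactly why the cross term in the decomposition above vanishes.
print("sum of residuals:  ", residuals.sum())
print("residuals . fitted:", residuals @ y_hat)

# Direct numerical check of SST = SSR + SSE
y_bar = y.mean()
SST = np.sum((y - y_bar) ** 2)
SSR = np.sum((y_hat - y_bar) ** 2)
SSE = np.sum(residuals ** 2)
print("SST - (SSR + SSE): ", SST - (SSR + SSE))
```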
Hadi. (2006). Regression Analysis by Example (4th Edition): p40~42.