Fitted Values, Predicted Values, Residuals, Errors

Definition 1

The regression equation obtained through regression analysis $Y \gets X_{1} + X_{2} + \cdots + X_{p}$ is written as $y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{p} x_{p}$, and the $i$-th of the $n$ observations is denoted as $(y_{i} , x_{i1} , x_{i2} , \cdots , x_{ip})$.

  1. Mean: $$ \displaystyle \overline{y} := {{1} \over {n}} \sum_{i=1}^{n} y_{i} $$
  2. Fitted Value: For the $i$-th observation $y_{i}$ $$ \hat{y}_{i} := \beta_{0} + \beta_{1} x_{i1} + \beta_{2} x_{i2} + \cdots + \beta_{p} x_{ip} $$
  3. Predicted Value: For a new observation $y_{0}$ $$ \hat{y}_{0} := \beta_{0} + \beta_{1} x_{01} + \beta_{2} x_{02} + \cdots + \beta_{p} x_{0p} $$
  4. Deviation due to Fit: $$ \hat{y}_{i} - \overline{y} $$
  5. Residual: $$ y_{i} - \hat{y}_{i} $$
  6. TSS(Total Sum of Squares) or SST(Sum of Squares Total): $$ \text{TSS} =\text{SST} := \sum_{i=1}^{n} ( y_{i} - \overline{y} )^2 $$
  7. ESS(Explained Sum of Squares) or SSR(Sum of Squares due to Regression): $$ \text{ESS} = \text{SSR} := \sum_{i=1}^{n} ( \hat{y}_{i} - \overline{y} )^2 $$
  8. RSS(Residual Sum of Squares) or SSE(Sum of Squared Errors): $$ \text{RSS} = \text{SSE} := \sum_{i=1}^{n} ( y_{i} - \hat{y}_{i} )^2 $$
  9. R-squared or Coefficient of Determination: $$ R^2 := {{ \text{ SSR } } \over { \text{ SST} }} $$

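The quantities defined above can be computed in a few lines. The following is a minimal sketch assuming numpy; the sample data and variable names are made up for illustration and do not come from the cited text.

```python
import numpy as np

# Hypothetical sample data: response y and two predictors (values are illustrative).
y = np.array([3.0, 5.0, 7.0, 6.0, 9.0])
X = np.column_stack([
    np.ones(len(y)),            # intercept column for beta_0
    [1.0, 2.0, 3.0, 4.0, 5.0],  # x_{i1}
    [2.0, 1.0, 4.0, 3.0, 5.0],  # x_{i2}
])

# Least-squares estimates of (beta_0, beta_1, beta_2)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

y_bar = y.mean()    # 1. mean
y_hat = X @ beta    # 2. fitted values for the observed data
resid = y - y_hat   # 5. residuals

SST = np.sum((y - y_bar) ** 2)      # 6. total sum of squares
SSR = np.sum((y_hat - y_bar) ** 2)  # 7. explained sum of squares
SSE = np.sum(resid ** 2)            # 8. residual sum of squares

R2 = SSR / SST                      # 9. coefficient of determination
print(SST, SSR + SSE)               # equal (up to rounding) with an intercept
print(R2)

# 3. predicted value for a new point (x_01, x_02) = (2.5, 3.5)
y_pred = np.array([1.0, 2.5, 3.5]) @ beta
```
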
Description

  • Fitted values and predicted values are computed by the same formula; the difference is whether the values plugged into the regression equation come from the data used to fit the model or from new data. Obtaining $\hat{y}_{i}$ thus means calculating a value that reflects the given information. In this context, the residual of item 5 is the error that remains beyond our control, the error that should naturally be present. Regression analysis constructs the regression line by minimizing the sum of squared errors, then examines the residuals to check whether the assumptions of the analysis are satisfied, a process called model diagnostics.
  • The term “ESS” is merely the counterpart of “RSS”, the sum of squares that cannot be explained. The annoying part is that the letters overlap confusingly: E may stand for Explained or Error, and R for Regression or Residual. $$ \text{TSS} = \text{SST} \\ \text{ESS} = \text{SSR} \\ \text{RSS} = \text{SSE} $$ Rather than memorizing which letter comes first in which abbreviation, pick one notation you find comfortable, understand it mathematically, and simply keep in mind that when the notation is written the other way around, the abbreviations may be reversed as well.
  • The R-squared value, also called the coefficient of determination, indicates how well the analysis explains the data. Meanwhile, it can be shown that $\text{SST} = \text{SSR} + \text{SSE}$ when the model is fit by least squares with an intercept (see the derivation after this list); accordingly, the larger $\text{ESS}$ is, the smaller $\text{RSS}$ is, and $0 \le R^{2} \le 1$ holds. Intuitively, then, $$ R^2 = {{ \text{ SSR } } \over { \text{ SST} }} = {{ \text{ ESS } } \over { \text{ TSS } }} = {{\text{explainable error}} \over {\text{total error}}} $$ so it can be understood as the proportion of the data's variation that the analysis explains.
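
For reference, the identity $\text{SST} = \text{SSR} + \text{SSE}$ used above comes from a one-step expansion; the following sketch is the standard least-squares argument. $$ \begin{aligned} \text{SST} &= \sum_{i=1}^{n} (y_{i} - \overline{y})^{2} = \sum_{i=1}^{n} \left[ (y_{i} - \hat{y}_{i}) + (\hat{y}_{i} - \overline{y}) \right]^{2} \\ &= \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2} + \sum_{i=1}^{n} (\hat{y}_{i} - \overline{y})^{2} + 2 \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})(\hat{y}_{i} - \overline{y}) \\ &= \text{SSE} + \text{SSR} \end{aligned} $$ The cross term vanishes because, when the model includes an intercept, the least-squares residuals sum to zero and are orthogonal to the fitted values.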

  1. Hadi. (2006). Regression Analysis by Example (4th Edition): p40–42. ↩︎