
Fitted Values, Predicted Values, Residuals, Errors

Definition 1

Let the regression equation obtained by regression analysis $Y \gets X_{1} + X_{2} + \cdots + X_{n}$ be denoted as $y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{n} x_{n}$, and represent the $i$-th data point as $( y_{i} , x_{i1} , x_{i2} , \cdots , x_{in} )$.

  1. Mean: $\overline{y} := \dfrac{1}{n} \sum_{i=1}^{n} y_{i}$
  2. Fitted Value: For the $i$-th data $y_{i}$, $\hat{y}_{i} := \beta_{0} + \beta_{1} x_{i1} + \beta_{2} x_{i2} + \cdots + \beta_{n} x_{in}$
  3. Predicted Value: For new data $y_{0}$, $\hat{y}_{0} := \beta_{0} + \beta_{1} x_{01} + \beta_{2} x_{02} + \cdots + \beta_{n} x_{0n}$
  4. Deviation Due to Fit: $\hat{y}_{i} - \overline{y}$
  5. Residual: $y_{i} - \hat{y}_{i}$
  6. TSS (Total Sum of Squares) or SST (Sum of Squares Total): $\text{TSS} = \text{SST} := \sum_{i=1}^{n} ( y_{i} - \overline{y} )^{2}$
  7. ESS (Explained Sum of Squares) or SSR (Sum of Squares due to Regression): $\text{ESS} = \text{SSR} := \sum_{i=1}^{n} ( \hat{y}_{i} - \overline{y} )^{2}$
  8. RSS (Residual Sum of Squares) or SSE (Sum of Squared Errors): $\text{RSS} = \text{SSE} := \sum_{i=1}^{n} ( y_{i} - \hat{y}_{i} )^{2}$
  9. R-squared or Coefficient of Determination: $R^{2} := \dfrac{\text{SSR}}{\text{SST}}$
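These quantities can be checked numerically. The following is a minimal sketch using NumPy's least-squares solver; the toy data and every variable name are made up purely for illustration.

```python
# Minimal numerical sketch of the definitions above (toy data, illustrative names).
import numpy as np

# Toy data: n = 5 observations, two explanatory variables.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1])

# Prepend a column of ones so the intercept beta_0 is estimated as well.
X1 = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

y_bar = y.mean()          # 1. mean
y_hat = X1 @ beta         # 2. fitted values
residuals = y - y_hat     # 5. residuals

SST = np.sum((y - y_bar) ** 2)      # 6. TSS = SST
SSR = np.sum((y_hat - y_bar) ** 2)  # 7. ESS = SSR
SSE = np.sum((y - y_hat) ** 2)      # 8. RSS = SSE

print(SST, SSR + SSE)  # equal up to rounding when an intercept is included
print(SSR / SST)       # 9. coefficient of determination R^2
```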

Explanation

  • Fitted values and predicted values are mathematically identical; they differ only in whether the values plugged into the regression equation come from the data actually used in the analysis. Computing $\hat{y}_{i}$ means computing a value that reflects the information we were given, and the difference between the fitted value and the observed value $y_{i}$, namely the residual in 5, is the error that inevitably remains after fitting. Regression analysis obtains the regression line by minimizing the residual sum of squares, and afterwards examining the residuals to check whether the assumptions of regression analysis are satisfied is called model diagnosis (a minimal residual-plot sketch follows this list).
  • The 'explained sum of squares ESS' is simply named in contrast to the 'unexplained sum of squares RSS'. The tricky part is that E and R are each used ambiguously: E stands for Explained in ESS but for Error in SSE, while R stands for Regression in SSR but for Residual in RSS.
    $$ \text{TSS} = \text{SST} \\ \text{ESS} = \text{SSR} \\ \text{RSS} = \text{SSE} $$
    It is not recommended to memorize these by whether E and R appear at the front or the back of the abbreviation. It is enough to pick one notation that makes sense to you, remember it as a formula, and simply keep in mind that, depending on the context, the abbreviations may be used the other way around from what you know.
  • R-squared, also called the coefficient of determination, measures how well the analysis explains the data. In linear regression it can be proven mathematically that $\text{SST} = \text{SSR} + \text{SSE}$, which implies that as $\text{ESS}$ increases $\text{RSS}$ decreases, and that $0 \le R^{2} \le 1$. Intuitively,
    $$ R^{2} = \dfrac{\text{SSR}}{\text{SST}} = \dfrac{\text{ESS}}{\text{TSS}} = \dfrac{\text{explainable error}}{\text{total error}} $$
    so it can be understood as the proportion of the data that the analysis explains.
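For reference, here is a sketch of the usual argument for $\text{SST} = \text{SSR} + \text{SSE}$; it assumes an ordinary least squares fit that includes an intercept, which is what makes the cross term vanish.

$$
\begin{aligned}
\sum_{i=1}^{n} ( y_{i} - \overline{y} )^{2}
&= \sum_{i=1}^{n} \left[ ( \hat{y}_{i} - \overline{y} ) + ( y_{i} - \hat{y}_{i} ) \right]^{2} \\
&= \sum_{i=1}^{n} ( \hat{y}_{i} - \overline{y} )^{2} + \sum_{i=1}^{n} ( y_{i} - \hat{y}_{i} )^{2} + 2 \sum_{i=1}^{n} ( \hat{y}_{i} - \overline{y} ) ( y_{i} - \hat{y}_{i} )
\end{aligned}
$$

The last sum is zero because, with an intercept, the least squares residuals satisfy $\sum_{i} ( y_{i} - \hat{y}_{i} ) = 0$ and $\sum_{i} \hat{y}_{i} ( y_{i} - \hat{y}_{i} ) = 0$, leaving exactly $\text{SSR} + \text{SSE}$.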
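As mentioned above, model diagnosis typically begins with looking at the residuals. Below is a minimal, self-contained sketch assuming NumPy and matplotlib are installed; the data and all names are hypothetical and only meant to show the kind of plots usually examined.

```python
# Minimal sketch of residual-based model diagnosis on made-up data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=x.size)  # toy linear data with noise

# Simple least squares fit y = b0 + b1 * x (polyfit returns highest degree first).
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x
residuals = y - y_hat

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Residuals vs. fitted values: a patternless band around zero supports the
# linearity and constant-variance assumptions.
axes[0].scatter(y_hat, residuals)
axes[0].axhline(0.0, linestyle="--", color="gray")
axes[0].set_xlabel("fitted value")
axes[0].set_ylabel("residual")

# Histogram of residuals: a rough visual check of the normality assumption.
axes[1].hist(residuals, bins=10)
axes[1].set_xlabel("residual")

plt.tight_layout()
plt.show()
```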

  1. Hadi. (2006). Regression Analysis by Example (4th Edition): p40~42. ↩︎