
Simple Regression Analysis

Overview

Regression Analysis is a method for identifying relationships between variables, especially useful for elucidating linear relationships. Simple Linear Regression is the simplest among them, referring to regression analysis on one dependent (response) variable and one independent (explanatory) variable.

Model [1]

The statement that the independent variable $x_{i}$ and the dependent variable $y_{i}$ have a linear relationship means that, for some $a,b$, they can be expressed as $y_{i} = ax_{i} + b$. Of course, actual data will never fit such a line exactly, so more precisely an error term is included, giving $y_{i} = ax_{i} + b + \varepsilon_{i}$. Rewriting this in the form more commonly used in regression analysis, $$ y_{i} = \beta_{0} + \beta_{1} x_{i} + \varepsilon_{i} $$ Expressed with a design matrix, $$ \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{1} \\ 1 & x_{2} \\ \vdots & \vdots \\ 1 & x_{n} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix} $$ which can be written compactly as $Y = X \beta + \varepsilon$.
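To make the model concrete, here is a minimal Python sketch (using NumPy; the intercept, slope, and noise level are made-up illustration values) that builds the design matrix $X$ and simulates data from $Y = X \beta + \varepsilon$:

```python
import numpy as np

rng = np.random.default_rng(42)

n = 100
beta = np.array([2.0, 0.5])           # [beta_0, beta_1]: made-up "true" intercept and slope

x = rng.uniform(0, 10, size=n)        # one explanatory variable
X = np.column_stack([np.ones(n), x])  # design matrix: a column of ones, then the x column
eps = rng.normal(0, 1.0, size=n)      # error term epsilon_i

y = X @ beta + eps                    # Y = X beta + epsilon
```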

Optimization

This boils down to finding the $\beta = \begin{bmatrix} \beta_{0} \\ \beta_{1} \end{bmatrix}$ that minimizes $\| \varepsilon \|_{2} = \| Y - X \beta \|_{2}$ via the method of least squares. Since $\beta$ holds the intercept and slope of a line, finding $\beta$ amounts to finding the line that explains the data, in linear terms, with the least error. Of course, we cannot know exactly what relationship the two variables have, so what we actually obtain are the estimates of the regression coefficients, $\hat{\beta}_{0}$ and $\hat{\beta}_{1}$. Put simply: draw the line that looks the most like the data.
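Continuing the sketch above (with `X` and `y` as defined there), the least-squares estimate of $\beta$ can be obtained with a standard routine; NumPy's `lstsq` is one possible choice:

```python
# Least-squares estimate: the beta_hat minimizing || y - X @ beta ||_2
beta_hat, res_ss, rank, sv = np.linalg.lstsq(X, y, rcond=None)

print("estimated intercept:", beta_hat[0])
print("estimated slope:    ", beta_hat[1])
```

With the made-up values above, the estimates should land near the intercept 2.0 and slope 0.5 used to generate the data.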

These problems are typically solved using tools from numerical linear algebra, but simple regression analysis can also be solved with elementary calculus. Writing the squared error out componentwise, $$ \begin{align} \| \varepsilon \|_{2}^{2} = \sum_{i=1}^{n} ( y_{i} - \beta_{0} - \beta_{1} x_{i} )^2 \end{align} $$ the task is to find the $\beta_{0} = \hat{\beta}_{0}$ and $\beta_{1} = \hat{\beta}_{1}$ that minimize it. Taking the partial derivative of equation $(1)$ with respect to $\beta_{0}$, $$ {{ \partial \| \varepsilon \|_{2}^{2} } \over { \partial \beta_{0}}} = -2 \sum_{i=1}^{n} (y_{i} - \beta_{0} - \beta_{1} x_{i} ) $$ Setting this to $0$, as required for $\| \varepsilon \|_{2}^{2}$ to be minimized, $$ n \beta_{0} = \sum_{i=1}^{n} y_{i} - \beta_{1} \sum_{i=1}^{n} x_{i} $$ so $\| \varepsilon \|_{2}^{2}$ is minimized when $\beta_{0} = \overline{y} - \beta_{1} \overline{x}$. Taking the partial derivative of equation $(1)$ with respect to $\beta_{1}$, $$ {{ \partial \| \varepsilon \|_{2}^{2} } \over { \partial \beta_{1}}} = -2 \sum_{i=1}^{n} x_{i} (y_{i} - \beta_{0} - \beta_{1} x_{i} ) $$ Substituting $\beta_{0} = \overline{y} - \beta_{1} \overline{x}$ and setting this derivative to $0$ as well, $$ \sum_{i=1}^{n} x_{i} (y_{i} - \overline{y} + \beta_{1} \overline{x} - \beta_{1} x_{i} ) = 0 $$ In other words, $$ \beta_{1} \sum_{i=1}^{n} ( x_{i}^2 - \overline{x} x_{i} ) = \sum_{i=1}^{n} x_{i} y_{i} - \sum_{i=1}^{n} x_{i} \overline{y} $$ Solving for $\beta_{1}$, and noting that dividing the numerator and denominator by $n-1$ yields the sample covariance and variance, $$ \begin{align*} \beta_{1} =& {{\sum_{i=1}^{n} x_{i} y_{i} - \sum_{i=1}^{n} x_{i} \overline{y} } \over {\sum_{i=1}^{n} ( x_{i}^2 - \overline{x} x_{i} ) }} \\ =& {{ \sum_{i=1}^n ( x_{i} - \overline{x} ) ( y_{i} - \overline{y} ) } \over { \sum_{i=1}^{n} (x_{i} - \overline{x} )^2}} \\ =& {{ \text{Cov} (X,Y) } \over { \text{Var} ( X ) }} \\ =& \text{Cor} (X,Y) {{s_{y}} \over {s_{x}}} \end{align*} $$ For the actual calculation, $\hat{\beta}_{1}$ must be found before $\hat{\beta}_{0}$, since $\hat{\beta}_{0} = \overline{y} - \hat{\beta}_{1} \overline{x}$ depends on it.
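The closed-form expressions translate directly into a few lines of Python. Continuing the same sketch, this computes $\hat{\beta}_{1}$ from the sample covariance and variance and then $\hat{\beta}_{0} = \overline{y} - \hat{\beta}_{1} \overline{x}$, which should match the least-squares solution above:

```python
x_bar, y_bar = x.mean(), y.mean()

# Slope: sample covariance over sample variance (the 1/(n-1) factors cancel)
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# The intercept depends on the slope, so beta_1 is computed first
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)  # agrees with np.linalg.lstsq up to floating-point error
```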

See Also


  1. Hadi. (2006). Regression Analysis by Example (4th Edition): p24.