
Simple Regression Analysis

Overview

Regression Analysis is a method for identifying relationships between variables, especially useful for elucidating linear relationships. Simple Linear Regression is the simplest among them, referring to regression analysis on one dependent (response) variable and one independent (explanatory) variable.

Model 1


The statement that the independent variable $x_{i}$ and the dependent variable $y_{i}$ have a linear relationship means that for some $a, b$, the relationship can be expressed as $y_{i} = a x_{i} + b$. Of course, actual data inevitably involve errors, so more precisely an error term is included, giving $y_{i} = a x_{i} + b + \varepsilon_{i}$. Converting this expression into the form more commonly used in regression analysis,
$$ y_{i} = \beta_{0} + \beta_{1} x_{i} + \varepsilon_{i} $$
When represented with a design matrix,
$$ \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{1} \\ 1 & x_{2} \\ \vdots & \vdots \\ 1 & x_{n} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix} $$
which can be written compactly as $Y = X \beta + \varepsilon$.
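As a concrete sketch of the design matrix form $Y = X \beta + \varepsilon$ above (with made-up sample data, and assuming NumPy is available):

```python
import numpy as np

# hypothetical sample data (n = 5)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# design matrix X: a column of ones (for the intercept beta_0)
# next to the x values (for the slope beta_1)
X = np.column_stack([np.ones_like(x), x])
print(X.shape)  # (5, 2): n rows, one column per coefficient
```

Each row of $X$ corresponds to one observation, and the leading column of ones is what lets the intercept $\beta_{0}$ appear in the matrix product $X \beta$.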

Optimization

This boils down to finding the $\beta = \begin{bmatrix} \beta_{0} \\ \beta_{1} \end{bmatrix}$ that minimizes $\| \varepsilon \|_{2} = \| Y - X \beta \|_{2}$ through the method of least squares. Since $\beta$ contains the intercept and slope of the line, finding $\beta$ amounts to finding the line that explains the data in linear terms with the least error. Of course, since we cannot know exactly what relationship the two variables have, we find estimates of the regression coefficients, $\hat{\beta}_{0}$ and $\hat{\beta}_{1}$. In simple terms: draw the line that looks most like the data.
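The least-squares problem $\min_{\beta} \| Y - X \beta \|_{2}$ can be handed directly to a numerical linear algebra routine. A minimal sketch with NumPy's `np.linalg.lstsq` on made-up data (the data values are assumptions for illustration):

```python
import numpy as np

# hypothetical data: y is roughly 1 + 2x plus small noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.9, 9.1])

# design matrix: intercept column plus x
X = np.column_stack([np.ones_like(x), x])

# least squares: find beta minimizing ||y - X beta||_2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = beta_hat
print(b0, b1)  # ≈ 1.04, 2.0
```

For this small example the fitted slope comes out to exactly $2$ and the intercept to $1.04$, i.e. the line closest to the data in the least-squares sense.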

These problems are typically solved using tools from numerical linear algebra, but simple regression analysis can also be solved with simple calculus. Writing the squared error out componentwise,
$$ \| \varepsilon \|_{2}^{2} = \sum_{i=1}^{n} ( y_{i} - \beta_{0} - \beta_{1} x_{i} )^{2} \tag{1} $$
the problem is to find the $\beta_{0} = \hat{\beta}_{0}$ and $\beta_{1} = \hat{\beta}_{1}$ that minimize it. Taking the partial derivative of equation $(1)$ with respect to $\beta_{0}$,
$$ \frac{\partial \| \varepsilon \|_{2}^{2}}{\partial \beta_{0}} = -2 \sum_{i=1}^{n} ( y_{i} - \beta_{0} - \beta_{1} x_{i} ) $$
Setting this derivative to zero gives
$$ n \beta_{0} = \sum_{i=1}^{n} y_{i} - \beta_{1} \sum_{i=1}^{n} x_{i} $$
Thus $\| \varepsilon \|_{2}^{2}$ is minimized when $\beta_{0} = \overline{y} - \beta_{1} \overline{x}$. Taking the partial derivative of equation $(1)$ with respect to $\beta_{1}$,
$$ \frac{\partial \| \varepsilon \|_{2}^{2}}{\partial \beta_{1}} = -2 \sum_{i=1}^{n} x_{i} ( y_{i} - \beta_{0} - \beta_{1} x_{i} ) $$
Substituting $\beta_{0} = \overline{y} - \beta_{1} \overline{x}$ and setting this derivative to zero,
$$ \sum_{i=1}^{n} x_{i} ( y_{i} - \overline{y} + \beta_{1} \overline{x} - \beta_{1} x_{i} ) = 0 $$
In other words,
$$ \beta_{1} \sum_{i=1}^{n} ( x_{i}^{2} - \overline{x} x_{i} ) = \sum_{i=1}^{n} x_{i} y_{i} - \sum_{i=1}^{n} x_{i} \overline{y} $$
Summarizing,
$$ \begin{align*} \beta_{1} =& \frac{\sum_{i=1}^{n} x_{i} y_{i} - \sum_{i=1}^{n} x_{i} \overline{y}}{\sum_{i=1}^{n} ( x_{i}^{2} - \overline{x} x_{i} )} \\ =& \frac{\sum_{i=1}^{n} ( x_{i} - \overline{x} )( y_{i} - \overline{y} )}{\sum_{i=1}^{n} ( x_{i}^{2} - \overline{x}^{2} )} \\ =& \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} \\ =& \operatorname{Cor}(X, Y) \frac{s_{y}}{s_{x}} \end{align*} $$
For the actual calculation, $\hat{\beta}_{1}$ must be found before $\hat{\beta}_{0}$, since $\hat{\beta}_{0} = \overline{y} - \hat{\beta}_{1} \overline{x}$ depends on the slope estimate.
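The closed-form estimates above translate directly into a few lines of code. A minimal sketch, reusing made-up data (the data values are assumptions for illustration) and computing the slope first, then the intercept:

```python
import numpy as np

# hypothetical data: y is roughly linear in x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.9, 9.1])

x_bar, y_bar = x.mean(), y.mean()

# slope first: beta1_hat = Cov(x, y) / Var(x)
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# then the intercept: beta0_hat = y_bar - beta1_hat * x_bar
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)  # ≈ 1.04, 2.0
```

This agrees with what a general least-squares solver returns on the same data, which is a good sanity check that the calculus derivation and the linear-algebra formulation describe the same line.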

See Also


  1. Hadi. (2006). Regression Analysis by Example(4th Edition): p24. ↩︎