Multiple Regression Analysis

Overview

Regression analysis is a method for discovering relationships between variables, and it is particularly useful for identifying linear relationships. Multiple linear regression is the regression analysis that determines the effects of multiple independent variables (explanatory variables) on a single dependent variable (response variable).

Model 1

$$Y = \beta_{0} + \beta_{1} X_{1} + \cdots + \beta_{p} X_{p} + \varepsilon $$

We are interested in whether the variables have a linear relationship of the form shown above. The explanatory variables are assumed to be independent of one another, and each regression coefficient $\beta_{j}$ represents the change in $Y$ per unit change in $X_{j}$ when the other variables are held constant. Written with a design matrix for $n$ observations, the model looks like

$$ \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{p1} \\ 1 & x_{12} & \cdots & x_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1n} & \cdots & x_{pn} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix} $$

and, summarized, it is $Y = X \beta + \varepsilon$.
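To make the matrix form concrete, here is a minimal sketch that builds a design matrix with NumPy. The helper name `design_matrix`, the data values, and the coefficient vector are all made up for illustration; only the layout (a leading column of ones followed by one column per explanatory variable) comes from the model above.

```python
import numpy as np

def design_matrix(x):
    """Prepend a column of ones to an (n, p) array of predictor values."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        # A single predictor is treated as an (n, 1) column.
        x = x.reshape(-1, 1)
    ones = np.ones((x.shape[0], 1))
    return np.hstack([ones, x])

# Hypothetical data: n = 4 observations of p = 2 explanatory variables.
x = np.array([[2.0, 1.0],
              [3.0, 5.0],
              [4.0, 2.0],
              [6.0, 7.0]])
X = design_matrix(x)                # shape (n, p + 1)

# Assumed coefficients (beta_0, beta_1, beta_2), chosen arbitrarily.
beta = np.array([1.0, 0.5, -2.0])

# X @ beta is the systematic part of the model, i.e. Y without the error term.
y_mean = X @ beta
print(X.shape)
print(y_mean)
```

Each row of `X` corresponds to one observation, so `X @ beta` evaluates $\beta_{0} + \beta_{1} x_{1} + \cdots + \beta_{p} x_{p}$ for every observation at once.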

The computation itself uses the least squares method, just as in simple regression analysis, and fortunately the method works the same way regardless of the number of variables $p$. However, unlike simple regression analysis, once the model is generalized to $p$ dimensions it becomes difficult to check the fit with a graph when $p \ge 3$.
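As a rough illustration of why the dimension $p$ matters little for the computation, the sketch below fits simulated data by least squares. When $X^{T} X$ is invertible, the least squares estimate is $\hat{\beta} = (X^{T} X)^{-1} X^{T} Y$, and the same code applies for any $p$. The sample size, true coefficients, noise level, and random seed are assumptions chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, used only for reproducibility

n, p = 200, 3
beta_true = np.array([2.0, 1.0, -0.5, 3.0])  # hypothetical (beta_0, ..., beta_3)

# Simulate Y = X beta + eps with a leading column of ones for the intercept.
x = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), x])
eps = rng.normal(scale=0.1, size=n)
y = X @ beta_true + eps

# Least squares estimate: the b that minimizes ||y - X b||^2.
beta_hat, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# The same estimate obtained from the normal equations (X^T X) b = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)
print(np.allclose(beta_hat, beta_normal))
```

Changing `p` (and the length of `beta_true` to `p + 1`) is all that is needed to fit a model with more variables. `np.linalg.lstsq` is generally the safer choice numerically, while the normal equations are shown here only because they mirror the closed-form expression.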

Simply looking at the output is not enough to tell whether the analysis was done properly, so the analyst must justify the results through various diagnostics. Even if the model passes these diagnostics, issues such as interaction and multicollinearity remain, and choosing which variables to use is also an important problem.
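One common way to check for multicollinearity, though not necessarily the one this article has in mind, is the variance inflation factor (VIF): regress each explanatory variable on the others and compute $1 / (1 - R_{j}^{2})$. The sketch below is a minimal NumPy version under that assumption; the function name `vif` and the simulated data are hypothetical.

```python
import numpy as np

def vif(x):
    """Variance inflation factor of each column of an (n, p) predictor array."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    out = np.empty(p)
    for j in range(p):
        target = x[:, j]
        # Regress column j on an intercept plus all the other columns.
        others = np.column_stack([np.ones(n), np.delete(x, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        # R^2 of that auxiliary regression.
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated predictors: x3 is nearly a copy of x1, x2 is unrelated to both.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.05 * rng.normal(size=n)

factors = vif(np.column_stack([x1, x2, x3]))
print(factors)
```

In this setup the factors for `x1` and `x3` should come out very large while the factor for `x2` stays close to 1. Which threshold counts as problematic is a rule of thumb that varies by source, so treat any cutoff as a judgment call.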


  1. Hadi. (2006). Regression Analysis by Example (4th Edition): p53.