Linear Models for Regression in Machine Learning
Definition [^1]
Simple Model
Let’s define the target function $f : X \to Y$ between the data set $X = \left\{ \mathbf{x}_{i} \right\}$ and the label set $Y = \left\{ y_{i} \right\}$ as follows.
$$ y_{i} = f(\mathbf{x}_{i}) $$
In machine learning, linear regression refers to finding weights $\mathbf{w}$ such that a linear function $\hat{f}$ satisfies the following.
$$ y_{i} \approx \hat{y}_{i} = \hat{f}(\mathbf{x}_{i}, \mathbf{w}) = w_{0} + w_{1}x_{1} + \cdots + w_{n}x_{n} = w_{0} + \sum\limits_{j=1}^{n} w_{j}x_{j} $$
In this case, $\mathbf{x} \in \mathbb{R}^{n}, \mathbf{w} \in \mathbb{R}^{n+1}$.
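The simple model is just an intercept plus a weighted sum of the features. A minimal sketch in Python (the function name and the numbers are illustrative, not from the text):

```python
# Hypothetical example: evaluate y_hat = w0 + sum_j w_j * x_j.
import numpy as np

def predict_simple(x, w):
    """x: (n,) feature vector; w: (n+1,) weights, with w[0] the intercept w_0."""
    return w[0] + np.dot(w[1:], x)

x = np.array([1.0, 2.0])        # n = 2 features
w = np.array([0.5, 2.0, -1.0])  # w_0, w_1, w_2
y_hat = predict_simple(x, w)    # 0.5 + 2*1 + (-1)*2 = 0.5
```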
Extended Model
Suppose non-linear functions $\phi_{j}$ of the input data are given.
$$ y_{i} \approx \hat{y}_{i} = \hat{f}(\mathbf{x}_{i}, \mathbf{w}) = w_{0} + \sum\limits_{j} w_{j} \phi_{j}(\mathbf{x}_{i}) $$
In this case, the $\phi_{j}$ are called basis functions.
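The extended model only replaces the raw features $x_{j}$ with basis-function values $\phi_{j}(\mathbf{x})$. A short sketch using a polynomial basis $\phi_{1}(x) = x$, $\phi_{2}(x) = x^{2}$ (the basis choice here is an illustrative assumption):

```python
# Hypothetical example: y_hat = w0 + sum_j w_j * phi_j(x) with a polynomial basis.
import numpy as np

def predict_basis(x, w, basis):
    """w: (m+1,) weights; basis: list of m basis functions phi_1..phi_m."""
    return w[0] + sum(wj * phi(x) for wj, phi in zip(w[1:], basis))

basis = [lambda x: x, lambda x: x**2]  # phi_1(x) = x, phi_2(x) = x^2
w = np.array([1.0, 0.0, 3.0])          # w_0, w_1, w_2
y_hat = predict_basis(2.0, w, basis)   # 1 + 0*2 + 3*4 = 13
```

Even though $\phi_{2}$ is non-linear in $x$, the prediction is still linear in $\mathbf{w}$, which is the point made in the Explanation below.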
Explanation
Realistically, we can’t even be sure of the existence of the target function $f$. Therefore, the goal is to find $\hat{f}$ as similar as possible to $f$, assuming the simplest yet most explanatory linear function.
$$ f \approx \hat{f} $$
Even when introducing non-linear basis functions $\phi$, it is called a linear model because $\hat{f}$ is linear with respect to the weights $\mathbf{w}$.
Notation
If we set $x_{0} = 1$ and $\phi_{0} = 1$, we can write the two models more compactly.
$$ \hat{y} = \hat{f}(\mathbf{x}, \mathbf{w}) = \sum\limits_{j=0}^{n} w_{j}x_{j} = \mathbf{w}^{T}\mathbf{x} $$
In this case, $\mathbf{x} = \begin{bmatrix} x_{0} & \dots & x_{n} \end{bmatrix}^{T}$, $\mathbf{w} = \begin{bmatrix} w_{0} & \dots & w_{n} \end{bmatrix}^{T}$.
$$ \hat{y} = \hat{f}(\mathbf{x}, \mathbf{w}) = \sum\limits_{j=0}^{n} w_{j}\phi_{j}(\mathbf{x}) = \mathbf{w}^{T}\boldsymbol{\phi}(\mathbf{x}) $$
In this case, $\boldsymbol{\phi} = \begin{bmatrix} \phi_{0} & \dots & \phi_{n} \end{bmatrix}^{T}$.
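The vectorized form $\hat{y} = \mathbf{w}^{T}\boldsymbol{\phi}(\mathbf{x})$ can be sketched by prepending the constant $\phi_{0} = 1$ to the vector of basis values, so the intercept needs no special case (helper name and numbers are illustrative):

```python
# Hypothetical example: y_hat = w^T phi(x), with phi_0 = 1 absorbing the intercept.
import numpy as np

def design_vector(x, basis):
    """Evaluate [phi_0(x), phi_1(x), ..., phi_n(x)] with phi_0 = 1."""
    return np.array([1.0] + [phi(x) for phi in basis])

basis = [lambda x: x, lambda x: x**2]
w = np.array([1.0, 0.0, 3.0])
y_hat = w @ design_vector(2.0, basis)  # same value as writing out the sum: 13
```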
Linear Regression in Statistics
In statistics, $w_{i}$ is instead denoted $\beta_{i}$, and these are called regression coefficients; in machine learning, the $w_{i}$ are called weights.
| | Statistics | Machine Learning |
|---|---|---|
| $x$ | Independent Variable | Data |
| $y$ | Dependent Variable | Label |
| $w, \beta$ | Regression Coefficient | Weight |
In machine learning specifically, $b = w_{0}$ is called the bias.
Learning Methods
The methods to train a linear model include:
- [Gradient Descent]
- [Least Squares Method]
- Bayesian
- [Maximum Likelihood (ML)]
- [Maximum A Posteriori Probability (MAP)]
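As a concrete instance of the least squares method above, the weights can be found by minimizing $\lVert X\mathbf{w} - \mathbf{y} \rVert^{2}$ over a dataset. A minimal sketch with synthetic data (the data-generating coefficients here are an illustrative assumption):

```python
# Hypothetical example: fit w by ordinary least squares on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.uniform(-1, 1, size=(50, 1))                    # 50 samples, 1 feature
y = 2.0 + 3.0 * X_raw[:, 0] + 0.01 * rng.standard_normal(50)  # true w = [2, 3] + noise

X = np.hstack([np.ones((50, 1)), X_raw])  # prepend the x_0 = 1 column for w_0
w, *_ = np.linalg.lstsq(X, y, rcond=None)  # w recovers roughly [2, 3]
```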
[^1]: Christopher M. Bishop, *Pattern Recognition and Machine Learning* (2006), pp. 138–139. ↩︎