Linear Models for Regression in Machine Learning
Definition [^1]
Simple Model
Let’s define the target function $f : X \to Y$ between the data set $X = \left\{ \mathbf{x}_{i} \right\}$ and the label set $Y = \left\{ y_{i} \right\}$ as follows.
$$ y_{i} = f(\mathbf{x}_{i}) $$
In machine learning, linear regression refers to finding weights $\mathbf{w}$ such that a linear function $\hat{f}$ satisfies the following.
$$ y_{i} \approx \hat{y}_{i} = \hat{f}(\mathbf{x}_{i}, \mathbf{w}) = w_{0} + w_{1}x_{1} + \cdots + w_{n}x_{n} = w_{0} + \sum\limits_{j=1}^{n} w_{j}x_{j} $$
In this case, $\mathbf{x} \in \mathbb{R}^{n}, \mathbf{w} \in \mathbb{R}^{n+1}$.
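The simple model is just an intercept plus a weighted sum of the features. A minimal sketch in Python (the function name and the numbers are illustrative, not from the text):

```python
# Hypothetical example: evaluate y_hat = w0 + sum_j w_j * x_j.
import numpy as np

def predict_simple(x, w):
    """x: (n,) feature vector; w: (n+1,) weights, with w[0] the intercept w_0."""
    return w[0] + np.dot(w[1:], x)

x = np.array([1.0, 2.0])        # n = 2 features
w = np.array([0.5, 2.0, -1.0])  # w_0, w_1, w_2
y_hat = predict_simple(x, w)    # 0.5 + 2*1 + (-1)*2 = 0.5
```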
Extended Model
Suppose non-linear functions $\phi_{j}$ of the input data are given.
$$ y_{i} \approx \hat{y}_{i} = \hat{f}(\mathbf{x}_{i}, \mathbf{w}) = w_{0} + \sum\limits_{j} w_{j} \phi_{j}(\mathbf{x}_{i}) $$
In this case, the $\phi_{j}$ are called basis functions.
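The extended model only replaces the raw features $x_{j}$ with basis-function values $\phi_{j}(\mathbf{x})$. A short sketch using a polynomial basis $\phi_{1}(x) = x$, $\phi_{2}(x) = x^{2}$ (the basis choice here is an illustrative assumption):

```python
# Hypothetical example: y_hat = w0 + sum_j w_j * phi_j(x) with a polynomial basis.
import numpy as np

def predict_basis(x, w, basis):
    """w: (m+1,) weights; basis: list of m basis functions phi_1..phi_m."""
    return w[0] + sum(wj * phi(x) for wj, phi in zip(w[1:], basis))

basis = [lambda x: x, lambda x: x**2]  # phi_1(x) = x, phi_2(x) = x^2
w = np.array([1.0, 0.0, 3.0])          # w_0, w_1, w_2
y_hat = predict_basis(2.0, w, basis)   # 1 + 0*2 + 3*4 = 13
```

Even though $\phi_{2}$ is non-linear in $x$, the prediction is still linear in $\mathbf{w}$, which is the point made in the Explanation below.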
Explanation
Realistically, we can’t even be sure of the existence of the target function $f$. Therefore, the goal is to find $\hat{f}$ as similar as possible to $f$, assuming the simplest yet most explanatory linear function.
$$ f \approx \hat{f} $$
Even when introducing non-linear basis functions $\phi$, it is called a linear model because $\hat{f}$ is linear with respect to the weights $\mathbf{w}$.
Notation
If we set $x_{0} = 1$ and $\phi_{0} = 1$, we can write the two models more compactly.
$$ \hat{y} = \hat{f}(\mathbf{x}, \mathbf{w}) = \sum\limits_{j=0}^{n} w_{j}x_{j} = \mathbf{w}^{T}\mathbf{x} $$
In this case, $\mathbf{x} = \begin{bmatrix} x_{0} & \dots & x_{n} \end{bmatrix}^{T}$, $\mathbf{w} = \begin{bmatrix} w_{0} & \dots & w_{n} \end{bmatrix}^{T}$.
$$ \hat{y} = \hat{f}(\mathbf{x}, \mathbf{w}) = \sum\limits_{j=0}^{n} w_{j}\phi_{j}(\mathbf{x}) = \mathbf{w}^{T}\boldsymbol{\phi}(\mathbf{x}) $$
In this case, $\boldsymbol{\phi} = \begin{bmatrix} \phi_{0} & \dots & \phi_{n} \end{bmatrix}^{T}$.
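The vectorized form $\hat{y} = \mathbf{w}^{T}\boldsymbol{\phi}(\mathbf{x})$ can be sketched by prepending the constant $\phi_{0} = 1$ to the vector of basis values, so the intercept needs no special case (helper name and numbers are illustrative):

```python
# Hypothetical example: y_hat = w^T phi(x), with phi_0 = 1 absorbing the intercept.
import numpy as np

def design_vector(x, basis):
    """Evaluate [phi_0(x), phi_1(x), ..., phi_n(x)] with phi_0 = 1."""
    return np.array([1.0] + [phi(x) for phi in basis])

basis = [lambda x: x, lambda x: x**2]
w = np.array([1.0, 0.0, 3.0])
y_hat = w @ design_vector(2.0, basis)  # same value as writing out the sum: 13
```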
Linear Regression in Statistics
In statistics, $w_{i}$ is instead denoted $\beta_{i}$, and these are called regression coefficients; in machine learning, the $w_{i}$ are called weights.
| | Statistics | Machine Learning |
|---|---|---|
| $x$ | Independent Variable | Data |
| $y$ | Dependent Variable | Label |
| $w, \beta$ | Regression Coefficient | Weight |
In machine learning specifically, $b = w_{0}$ is called the bias.
Learning Methods
The methods to train a linear model include:
- [Gradient Descent]
- [Least Squares Method]
- Bayesian
- [Maximum Likelihood (ML)]
- [Maximum A Posteriori Probability (MAP)]
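As a concrete instance of the least squares method above, the weights can be found by minimizing $\lVert X\mathbf{w} - \mathbf{y} \rVert^{2}$ over a dataset. A minimal sketch with synthetic data (the data-generating coefficients here are an illustrative assumption):

```python
# Hypothetical example: fit w by ordinary least squares on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.uniform(-1, 1, size=(50, 1))                    # 50 samples, 1 feature
y = 2.0 + 3.0 * X_raw[:, 0] + 0.01 * rng.standard_normal(50)  # true w = [2, 3] + noise

X = np.hstack([np.ones((50, 1)), X_raw])  # prepend the x_0 = 1 column for w_0
w, *_ = np.linalg.lstsq(X, y, rcond=None)  # w recovers roughly [2, 3]
```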
[^1]: Christopher M. Bishop, *Pattern Recognition and Machine Learning* (2006), pp. 138–139. ↩︎