

Linear Models for Regression in Machine Learning

Definition¹

Simple Model

Let’s define the target function $f : X \to Y$ between the data set $X = \left\{ \mathbf{x}_{i} \right\}$ and the label set $Y = \left\{ y_{i} \right\}$ as follows.

$$ y_{i} = f(\mathbf{x}_{i}) $$

In machine learning, linear regression refers to finding a linear function $\hat{f}$ that satisfies the following equation for some $\mathbf{w}$.

$$ y_{i} \approx \hat{y}_{i} = \hat{f}(\mathbf{x}_{i}, \mathbf{w}) = w_{0} + w_{1}x_{1} + \cdots + w_{n}x_{n} = w_{0} + \sum\limits_{j} w_{j}x_{j} $$

In this case, $\mathbf{x} \in \mathbb{R}^{n}$ and $\mathbf{w} \in \mathbb{R}^{n+1}$.
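As a quick illustration (not from the text; the weights and input below are made up), the simple model is just an intercept plus a dot product:

```python
import numpy as np

# Simple linear model: y_hat = w0 + w1*x1 + ... + wn*xn,
# with w in R^(n+1) (w[0] is the intercept w0) and x in R^n.
def predict(w, x):
    return w[0] + np.dot(w[1:], x)

w = np.array([1.0, 2.0, -0.5])  # illustrative weights (w0, w1, w2)
x = np.array([3.0, 4.0])        # a single input point, n = 2
print(predict(w, x))            # 1 + 2*3 - 0.5*4 = 5.0
```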

Extended Model

Let’s say non-linear functions $\phi_{j}$ of the input data are given.

$$ y_{i} \approx \hat{y}_{i} = \hat{f}(\mathbf{x}_{i}, \mathbf{w}) = w_{0} + \sum\limits_{j} w_{j} \phi_{j}(\mathbf{x}_{i}) $$

In this case, the $\phi_{j}$ are called basis functions.
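For concreteness, here is a sketch of the extended model with polynomial basis functions; the choice of basis and the weights are assumptions for illustration, since the text leaves $\phi_{j}$ general:

```python
import numpy as np

# Extended linear model: y_hat = w0 + sum_j w_j * phi_j(x).
# The basis functions here are polynomials (an illustrative choice;
# Gaussians, sigmoids, etc. would work the same way).
def predict(w, x, basis):
    return w[0] + sum(wj * phi(x) for wj, phi in zip(w[1:], basis))

basis = [lambda x: x, lambda x: x**2, lambda x: x**3]
w = np.array([0.5, 1.0, -2.0, 0.25])  # illustrative weights
print(predict(w, 2.0, basis))         # 0.5 + 1*2 - 2*4 + 0.25*8 = -3.5
```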

Explanation

Realistically, we cannot even be sure that the target function $f$ exists. The goal is therefore to find an $\hat{f}$ as similar to $f$ as possible, under the simplest yet still explanatory assumption: that it is a linear function.

$$ f \approx \hat{f} $$

Even when introducing non-linear basis functions $\phi$, it is called a linear model because $\hat{f}$ is linear with respect to the weights $\mathbf{w}$.

Notation

If we set $x_{0} = 1$ and $\phi_{0} = 1$, we can represent the two models more simply.

$$ \hat{y} = \hat{f}(\mathbf{x}, \mathbf{w}) = \sum\limits_{j=0}^{n} w_{j}x_{j} = \mathbf{w}^{T}\mathbf{x} $$

In this case, $\mathbf{x} = \begin{bmatrix} x_{0} & \dots & x_{n} \end{bmatrix}^{T}$ and $\mathbf{w} = \begin{bmatrix} w_{0} & \dots & w_{n} \end{bmatrix}^{T}$.

$$ \hat{y} = \hat{f}(\mathbf{x}, \mathbf{w}) = \sum\limits_{j=0}^{n} w_{j}\phi_{j}(\mathbf{x}) = \mathbf{w}^{T}\boldsymbol{\phi}(\mathbf{x}) $$

In this case, $\boldsymbol{\phi} = \begin{bmatrix} \phi_{0} & \dots & \phi_{n} \end{bmatrix}^{T}$.
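The compact notation can be checked numerically: prepending $x_{0} = 1$ to the input absorbs the bias term into a single inner product (the values below are illustrative):

```python
import numpy as np

# With x0 = 1 prepended, y_hat = w^T x in one inner product.
w = np.array([1.0, 2.0, -0.5])      # [w0, w1, w2]
x = np.array([3.0, 4.0])            # original input in R^2
x_aug = np.concatenate(([1.0], x))  # [1, x1, x2]
print(w @ x_aug)                    # equals w0 + w1*x1 + w2*x2 = 5.0
```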

Linear Regression in Statistics

In statistics, $\beta_{i}$ is written instead of $w_{i}$, and these are called regression coefficients. In machine learning, the $w_{i}$ are called weights.

| | Statistics | Machine Learning |
| --- | --- | --- |
| $x$ | Independent Variable | Data |
| $y$ | Dependent Variable | Label |
| $w, \beta$ | Regression Coefficient | Weight |

In machine learning in particular, $b = w_{0}$ is called the bias.

Learning Methods

The methods to train a linear model include:

  • Gradient Descent
  • Least Squares Method
  • Bayesian
    • Maximum Likelihood (ML)
    • Maximum A Posteriori Probability (MAP)
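As a minimal sketch of the least squares route (the data and true weights below are synthetic assumptions, not from the text):

```python
import numpy as np

# Fit y = w0 + w1*x by least squares on synthetic data.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.1, size=50)  # true weights (2, 3) plus noise

X = np.column_stack([np.ones_like(x), x])  # design matrix: x0 = 1 column, then x
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # close to [2, 3]
```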

  1. Christopher M. Bishop, Pattern Recognition and Machine Learning (2006), pp. 138-139 ↩︎