Maximum Likelihood Estimation for Linear Regression Model in Machine Learning
Summary
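Assume the relationship between data $\mathbf{x}_i \in \mathbb{R}^n$ and its labels $y_i \in \mathbb{R}$ is described by the following linear model.
$$
y_i = \mathbf{w}^T \mathbf{x}_i + \epsilon_i, \qquad i = 1, \dots, K \tag{1}
$$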
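When $K > n$ and $\mathbf{X}^T \mathbf{X}$ is invertible, the parameter $\mathbf{w}_{\text{ML}}$ that maximizes the likelihood is as follows.
$$
\mathbf{w}_{\text{ML}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
$$
Here, $\mathbf{y} = \begin{bmatrix} y_1 & \cdots & y_K \end{bmatrix}^T$ and $\mathbf{X} = \begin{bmatrix} \mathbf{x}_1 & \cdots & \mathbf{x}_K \end{bmatrix}^T \in \mathbb{R}^{K \times n}$.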
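As a minimal sketch of the closed form above, the following NumPy snippet computes $\mathbf{w}_{\text{ML}}$ on synthetic data. The function name `fit_ml`, the data sizes, the true weights `w_true`, and the noise level `sigma` are all assumptions chosen for illustration. The result is compared with `np.linalg.lstsq`, which solves the same least squares problem.

```python
import numpy as np

def fit_ml(X, y):
    """Return w_ML = (X^T X)^{-1} X^T y by solving the normal equations."""
    # Solving X^T X w = X^T y avoids forming the explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data (assumed for illustration): K = 200 samples, n = 3 features.
rng = np.random.default_rng(0)
K, n, sigma = 200, 3, 0.1
w_true = np.array([1.5, -2.0, 0.5])
X = rng.normal(size=(K, n))                       # row i is x_i^T
y = X @ w_true + rng.normal(scale=sigma, size=K)  # y_i = w^T x_i + eps_i

w_ml = fit_ml(X, y)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_ml)                         # estimate, close to w_true for small sigma
print(np.allclose(w_ml, w_lstsq))   # both routes solve the same problem
```

Solving the linear system rather than inverting $\mathbf{X}^T \mathbf{X}$ is a common numerical choice. When $\mathbf{X}^T \mathbf{X}$ is ill-conditioned, a least squares solver such as `np.linalg.lstsq` is usually the safer option.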
Explanation
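In (1), $\mathbf{w} \in \mathbb{R}^n$ is the parameter vector and $\epsilon_i \sim N(0, \sigma^2)$ is Gaussian noise. Since $\mathbf{w}^T \mathbf{x}_i$ is a constant for given $\mathbf{w}$ and $\mathbf{x}_i$, $y_i = \mathbf{w}^T \mathbf{x}_i + \epsilon_i$ follows $N(\mathbf{w}^T \mathbf{x}_i, \sigma^2)$.
$$
y_i \sim N(\mathbf{w}^T \mathbf{x}_i, \sigma^2)
$$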
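Maximum likelihood estimation is finding the $\mathbf{w}_{\text{ML}}$ that satisfies the following.
$$
\mathbf{w}_{\text{ML}} = \operatorname*{argmax}_{\mathbf{w}} p(\mathbf{y} \mid \mathbf{w}, \mathbf{X})
$$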
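The likelihood functions of $\mathbf{w}$ for a single label $y_i$ and for the whole vector $\mathbf{y}$ are as follows. The product form assumes that the noise terms $\epsilon_i$ are independent.
$$
p(y_i \mid \mathbf{w}, \mathbf{x}_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y_i - \mathbf{w}^T \mathbf{x}_i)^2}{2\sigma^2} \right]
$$
$$
\begin{align*}
p(\mathbf{y} \mid \mathbf{w}, \mathbf{X})
&= \prod_{i=1}^{K} p(y_i \mid \mathbf{w}, \mathbf{x}_i) \\
&= \prod_{i=1}^{K} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y_i - \mathbf{w}^T \mathbf{x}_i)^2}{2\sigma^2} \right] \\
&= \frac{1}{(2\pi\sigma^2)^{K/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{K} (y_i - \mathbf{w}^T \mathbf{x}_i)^2 \right] \\
&= \frac{1}{(2\pi\sigma^2)^{K/2}} \exp\left[ -\frac{1}{2\sigma^2} \| \mathbf{y} - \mathbf{X}\mathbf{w} \|_2^2 \right]
\end{align*}
$$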
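The equality between the product of per-sample densities and the single vector expression can be checked numerically. The sketch below is illustrative only: the helper names `per_sample_density` and `joint_likelihood`, the data sizes, and the values of `w` and `sigma2` are assumptions, not part of the derivation.

```python
import numpy as np

def per_sample_density(w, X, y, sigma2):
    """p(y_i | w, x_i) for every i, as a length-K array."""
    r = y - X @ w  # residuals y_i - w^T x_i
    return np.exp(-r**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def joint_likelihood(w, X, y, sigma2):
    """p(y | w, X) written with the squared norm ||y - Xw||^2."""
    K = y.shape[0]
    r = y - X @ w
    return (2 * np.pi * sigma2) ** (-K / 2) * np.exp(-(r @ r) / (2 * sigma2))

# Small synthetic example (assumed values): K = 5 samples, n = 2 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))
w = np.array([0.3, -0.7])
sigma2 = 0.25
y = X @ w + rng.normal(scale=np.sqrt(sigma2), size=5)

product_form = np.prod(per_sample_density(w, X, y, sigma2))
vector_form = joint_likelihood(w, X, y, sigma2)
print(np.isclose(product_form, vector_form))
```

For large $K$ the product of densities underflows in floating point, which is one practical reason to work with the logarithm of the likelihood instead.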
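Since the likelihood is expressed as an exponential function, considering the log likelihood is convenient for computation. The logarithm is monotonically increasing, so it does not change the maximizer. The term $-\frac{K}{2} \log(2\pi\sigma^2)$ and the positive factor $\frac{1}{2\sigma^2}$ do not depend on $\mathbf{w}$, so they can be dropped.
$$
\begin{align*}
\mathbf{w}_{\text{ML}}
&= \operatorname*{argmax}_{\mathbf{w}} \log p(\mathbf{y} \mid \mathbf{w}, \mathbf{X}) \\
&= \operatorname*{argmax}_{\mathbf{w}} \left[ -\frac{K}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \| \mathbf{y} - \mathbf{X}\mathbf{w} \|_2^2 \right] \\
&= \operatorname*{argmax}_{\mathbf{w}} \left( -\| \mathbf{y} - \mathbf{X}\mathbf{w} \|_2^2 \right) \\
&= \operatorname*{argmin}_{\mathbf{w}} \| \mathbf{y} - \mathbf{X}\mathbf{w} \|_2^2
\end{align*}
$$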
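This is exactly the least squares problem. Setting the gradient of $\| \mathbf{y} - \mathbf{X}\mathbf{w} \|_2^2$ with respect to $\mathbf{w}$ to zero gives the normal equations $\mathbf{X}^T \mathbf{X} \mathbf{w} = \mathbf{X}^T \mathbf{y}$, so when $\mathbf{X}^T \mathbf{X}$ is invertible, $\mathbf{w}_{\text{ML}}$ is as follows.
$$
\mathbf{w}_{\text{ML}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
$$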
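A quick numerical sanity check of this result, under assumed synthetic data: at $\mathbf{w}_{\text{ML}}$ the gradient of the squared error, $-2\mathbf{X}^T(\mathbf{y} - \mathbf{X}\mathbf{w})$, should vanish, and moving away from $\mathbf{w}_{\text{ML}}$ should increase the squared error (equivalently, decrease the likelihood). The helper `sum_of_squares` and all data values below are hypothetical.

```python
import numpy as np

def sum_of_squares(w, X, y):
    """||y - Xw||_2^2, the quantity minimized by w_ML."""
    r = y - X @ w
    return r @ r

# Synthetic data (assumed for illustration): K = 50 samples, n = 4 features.
rng = np.random.default_rng(2)
K, n = 50, 4
X = rng.normal(size=(K, n))
y = X @ rng.normal(size=n) + rng.normal(scale=0.5, size=K)

# Closed-form estimate via the normal equations X^T X w = X^T y.
w_ml = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient of ||y - Xw||^2 with respect to w, evaluated at w_ML.
grad = -2 * X.T @ (y - X @ w_ml)
print(np.allclose(grad, 0.0, atol=1e-8))

# Random perturbations of w_ML should never give a smaller squared error.
base = sum_of_squares(w_ml, X, y)
perturbed = [sum_of_squares(w_ml + 0.1 * rng.normal(size=n), X, y)
             for _ in range(100)]
print(all(p > base for p in perturbed))
```

The perturbation check reflects the fact that the squared error is a convex quadratic in $\mathbf{w}$, so a point with zero gradient is a global minimizer.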
See Also