$$
Y = \beta_{0} + \beta_{1} X_{1} + \cdots + \beta_{p} X_{p} + \varepsilon
$$
In multiple regression analysis, when the above linear model is set up for $p$ given independent variables $X_{1}, \cdots, X_{p}$, the parameters $\beta_{0}, \beta_{1}, \cdots, \beta_{p}$ are called the regression coefficients. $Y$ is the dependent variable, and $\varepsilon$ is a random error term.
Formula
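$$
\begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & \cdots & x_{p1} \\
1 & x_{12} & \cdots & x_{p2} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{1n} & \cdots & x_{pn}
\end{bmatrix}
\begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix}
$$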
Given $n$ data points with $p < n$, the multiple linear regression model can be written with a design matrix as above, which we abbreviate as $Y = X \beta + \varepsilon$. The least squares estimator vector $\hat{\beta}$ for $\beta$ is as follows.
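$$
\hat{\beta}
=
\begin{bmatrix} \hat{\beta}_{0} \\ \hat{\beta}_{1} \\ \vdots \\ \hat{\beta}_{p} \end{bmatrix}
=
\left( X^{T} X \right)^{-1} X^{T} Y
$$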
Moreover, $\hat{\beta}$ has the smallest variance among all linear unbiased estimators of $\beta$, so it is also known as the Best Linear Unbiased Estimator (BLUE).
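The closed form above can be tried out numerically. The following is a minimal sketch on synthetic data, not part of the original derivation: the sample size, the coefficient vector `beta_true`, and the noise scale are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

n, p = 50, 3                                   # n data points, p independent variables (p < n)
features = rng.normal(size=(n, p))             # synthetic values for X_1, ..., X_p
X = np.column_stack([np.ones(n), features])    # design matrix: a column of ones, then the p variables
beta_true = np.array([1.0, 2.0, -0.5, 0.3])    # hypothetical beta_0, ..., beta_p
Y = X @ beta_true + rng.normal(scale=0.1, size=n)  # Y = X beta + epsilon

# Closed form from the text: beta_hat = (X^T X)^{-1} X^T Y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y

# Cross-check against NumPy's least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))
```

The explicit inverse is used here only to mirror the formula. In numerical work, `np.linalg.lstsq` or `np.linalg.solve` is usually preferred because it avoids forming $\left( X^{T} X \right)^{-1}$.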
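Our goal is to minimize
$$
\left\| \varepsilon \right\|_{2}^{2}
= \sum_{k=1}^{n} \varepsilon_{k}^{2}
= \begin{bmatrix} \varepsilon_{1} & \varepsilon_{2} & \cdots & \varepsilon_{n} \end{bmatrix}
\begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix}
= \varepsilon^{T} \varepsilon
$$
Since $\varepsilon = Y - X \beta$, this amounts to finding the $\beta$ that minimizes $\varepsilon^{T} \varepsilon = \left( Y - X \beta \right)^{T} \left( Y - X \beta \right)$.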
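Differentiating with respect to $\beta$ gives
$$
\begin{align*}
\frac{\partial \, \varepsilon^{T} \varepsilon}{\partial \beta} =& -2 X^{T} \left( Y - X \beta \right)
\\ =& -2 X^{T} Y + 2 X^{T} X \beta
\end{align*}
$$
and this derivative must be the zero vector $\mathbf{0}$ at $\beta = \hat{\beta}$, that is, $X^{T} X \hat{\beta} = X^{T} Y$. Provided $X^{T} X$ is invertible, this yields the following.
$$
\hat{\beta} = \operatorname*{arg\,min}_{\beta} \varepsilon^{T} \varepsilon = \left( X^{T} X \right)^{-1} X^{T} Y
$$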
Meanwhile, it can easily be shown that $\hat{\beta}$ is an unbiased estimator of $\beta$. Because it is derived by the least squares method, no linear unbiased estimator of $\beta$ has smaller variance, which makes it the best linear unbiased estimator.
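The unbiasedness claim can be spelled out in one line. This sketch assumes that the design matrix $X$ is fixed (non-random) and that the errors satisfy $E ( \varepsilon ) = \mathbf{0}$, two conditions the text does not state explicitly:
$$
E \left( \hat{\beta} \right)
= \left( X^{T} X \right)^{-1} X^{T} E ( Y )
= \left( X^{T} X \right)^{-1} X^{T} X \beta
= \beta
$$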
■
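The key step of the derivation, that the derivative $-2 X^{T} \left( Y - X \beta \right)$ vanishes at $\hat{\beta}$, can be checked numerically. The sketch below uses arbitrary synthetic data; the helper names `sse` and `gradient` are introduced here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with a column of ones
Y = rng.normal(size=n)  # arbitrary responses; the check does not depend on how Y was generated

def sse(beta):
    """Sum of squared errors (Y - X beta)^T (Y - X beta)."""
    residual = Y - X @ beta
    return residual @ residual

def gradient(beta):
    """Derivative of the sum of squared errors: -2 X^T (Y - X beta)."""
    return -2 * X.T @ (Y - X @ beta)

# Solve X^T X beta = X^T Y for beta_hat
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# The derivative should be (numerically) the zero vector at beta_hat
print(np.allclose(gradient(beta_hat), 0))

# Randomly perturbed coefficient vectors should not give a smaller error
perturbed = [beta_hat + rng.normal(scale=0.1, size=p + 1) for _ in range(5)]
print(all(sse(b) >= sse(beta_hat) for b in perturbed))
```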
If differentiating with respect to $\beta$ is not appealing, an alternative approach using matrix algebra is available. In the least squares method of matrix algebra, a least squares solution $\hat{\beta}$ satisfies the normal equation
$$
X^{\ast} Y = X^{\ast} X \hat{\beta}
$$
Since $X \in \mathbb{R}^{n \times (p+1)}$ is a real matrix, $X^{\ast} = X^{T}$, and consequently $\hat{\beta} = \left( X^{T} X \right)^{-1} X^{T} Y$ is obtained.
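As a small illustration of the matrix algebra route, the sketch below forms the conjugate transpose of a real design matrix, confirms that it coincides with the ordinary transpose, and solves the normal equation directly. The data are synthetic and the name `X_star` is a hypothetical label for $X^{\ast}$.

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # real n x (p+1) design matrix
Y = rng.normal(size=n)

X_star = X.conj().T                  # conjugate transpose of X
print(np.array_equal(X_star, X.T))   # for a real matrix this is just the transpose

# Solve the normal equation X* X beta = X* Y without forming an inverse
beta_hat = np.linalg.solve(X_star @ X, X_star @ Y)

# Compare with the closed form (X^T X)^{-1} X^T Y
beta_closed = np.linalg.inv(X.T @ X) @ X.T @ Y
print(np.allclose(beta_hat, beta_closed))
```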
Corollary
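If $\hat{\beta}$ is the best linear unbiased estimator, then the sum of the observations $y_{k}$ equals the sum of the fitted values $\hat{y}_{k} = \hat{\beta}_{0} + \sum_{j=1}^{p} \hat{\beta}_{j} x_{jk}$, $k = 1, \cdots, n$:
$$
\sum_{k=1}^{n} y_{k} = \sum_{k=1}^{n} \hat{y}_{k}
$$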
Proof
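As seen in the derivation of the formula, $\hat{\beta}$ being the best linear unbiased estimator means that the following holds.
$$
\begin{align*}
& \mathbf{0} = -2 X^{T} Y + 2 X^{T} X \hat{\beta}
\\ \implies & \mathbf{0} = X^{T} \left( Y - X \hat{\beta} \right)
\\ \implies & \mathbf{0} = X^{T} \begin{bmatrix} y_{1} - \hat{y}_{1} \\ \vdots \\ y_{n} - \hat{y}_{n} \end{bmatrix}
\end{align*}
$$
Since $X$ is the design matrix, its first column consists entirely of ones, so the first row of $X^{T}$ is a row vector whose entries are all $1$. Considering only the product of the first row of $X^{T}$ with $Y - X \hat{\beta}$ yields the following.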
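$$
\begin{align*}
& 0 = \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} y_{1} - \hat{y}_{1} \\ \vdots \\ y_{n} - \hat{y}_{n} \end{bmatrix}
\\ \implies & 0 = \left( y_{1} - \hat{y}_{1} \right) + \cdots + \left( y_{n} - \hat{y}_{n} \right)
\\ \implies & 0 = \sum_{k=1}^{n} y_{k} - \sum_{k=1}^{n} \hat{y}_{k}
\end{align*}
$$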
Consequently, the following is obtained.
$$
\sum_{k=1}^{n} y_{k} = \sum_{k=1}^{n} \hat{y}_{k}
$$
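The corollary is easy to observe numerically, and so is its dependence on the column of ones in the design matrix. The sketch below uses hypothetical synthetic data with a nonzero intercept. It fits the model once with the intercept column and once without it, a comparison that is not in the original text and is added here only to show where the proof uses the first row of $X^{T}$.

```python
import numpy as np

rng = np.random.default_rng(3)

n, p = 25, 2
features = rng.normal(size=(n, p))
# Hypothetical data: intercept 4.0, slopes 1.5 and -2.0, unit-variance noise
Y = 4.0 + features @ np.array([1.5, -2.0]) + rng.normal(size=n)

# With the column of ones, the first row of X^T is all ones
X = np.column_stack([np.ones(n), features])
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_fit = X @ beta_hat
print(np.isclose(Y.sum(), Y_fit.sum()))  # sums of observations and fits agree

# Without the column of ones, the argument in the proof no longer applies
beta_no_intercept = np.linalg.solve(features.T @ features, features.T @ Y)
Y_fit_no_intercept = features @ beta_no_intercept
print(Y.sum() - Y_fit_no_intercept.sum())  # gap between the two sums
```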