

Least Squares Method

Definition¹

Let’s say that a linear system $A\mathbf{x} = \mathbf{b}$ with a matrix $A \in \mathbb{C}^{m \times n}$ and a vector $\mathbf{b} \in \mathbb{C}^{m}$ is either overdetermined or underdetermined. In this case, the system either does not have a solution or has infinitely many. Here, consider the problem of minimizing the value of

$$ \left\| A \mathbf{x} - \mathbf{b} \right\|_{2} $$

This is called the Least Squares Problem (LSP). The solution $\mathbf{x}_{\ast}$ to this problem is referred to as the least squares solution.

$$ \mathbf{x}_{\ast} = \argmin_{\mathbf{x}} \left\| A \mathbf{x} - \mathbf{b} \right\|_{2} $$

$A \mathbf{x}_{\ast} - \mathbf{b}$ is called the least squares error vector, and $\left\| A \mathbf{x}_{\ast} - \mathbf{b} \right\|_{2}$ is called the least squares error.

Explanation

It’s unfortunate when a system of equations has no solution, but that doesn’t mean we must abandon the search for one altogether. In fact, the equations that are genuinely hard to solve, the ones awaiting answers from mathematicians at the forefront of research, are typically of this nature, so researching methods that solve such problems even approximately is undeniably valuable and practical. Among these methods, the Least Squares Method is one of the most prominent. It is actively used across the applied sciences, and in statistics it is the foundation of the theory supporting regression analysis.

Minimizing $\left\| \mathbf{b} - A \mathbf{x} \right\|_{2}$ means minimizing the distance, that is, the error, between $A \mathbf{x}$ and $\mathbf{b}$. Let $P : \mathbb{C}^{m} \to \mathcal{C} (A)$ be the orthogonal projection onto the column space of $A$. Then

$$ \mathbf{b} = P \mathbf{b} + (I -P) \mathbf{b} $$

$$ P \mathbf{b} \in \mathcal{C} (A) $$

thus, there exists some vector $\mathbf{x}_{\ast}$ satisfying $A \mathbf{x}_{\ast} = P \mathbf{b}$. Rewriting the error as

$$ \left\| A \mathbf{x} - \mathbf{b} \right\|_{2} = \left\| A \mathbf{x} - P \mathbf{b} + P \mathbf{b} - \mathbf{b} \right\|_{2} $$

we can see that $( A \mathbf{x} - P \mathbf{b} ) \in \mathcal{C} (A)$ and $P \mathbf{b} - \mathbf{b} = -(I - P)\mathbf{b} \in \mathcal{C} (A)^{\perp}$ are orthogonal to each other. By the Pythagorean theorem,

$$ \left\| \mathbf{b} - A \mathbf{x} \right\|_{2}^{2} = \left\| A \mathbf{x} - P \mathbf{b} \right\|_{2}^{2} + \left\| (I -P )\mathbf{b} \right\|_{2}^{2} $$

so $\left\| \mathbf{b} - A \mathbf{x} \right\|_{2}$ attains its smallest possible value exactly when the first term vanishes, that is, when $A \mathbf{x} = P \mathbf{b}$, i.e. when $\mathbf{x} = \mathbf{x}_{\ast}$.
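The Pythagorean decomposition above can be checked numerically. The sketch below (the matrix $A$, vector $\mathbf{b}$, and candidate $\mathbf{x}$ are made up for illustration) builds the orthogonal projection $P = A (A^{\ast} A)^{-1} A^{\ast}$ onto $\mathcal{C}(A)$ and verifies the identity for an arbitrary $\mathbf{x}$:

```python
import numpy as np

# Hypothetical overdetermined system: 3 equations, 2 unknowns
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 3.0, 2.0])

# Orthogonal projection onto the column space C(A)
P = A @ np.linalg.inv(A.T @ A) @ A.T

# An arbitrary candidate x (not the least squares solution)
x = np.array([0.5, -1.0])

# Pythagoras: ||b - A x||^2 = ||A x - P b||^2 + ||(I - P) b||^2
lhs = np.linalg.norm(b - A @ x) ** 2
rhs = (np.linalg.norm(A @ x - P @ b) ** 2
       + np.linalg.norm((np.eye(3) - P) @ b) ** 2)
print(np.isclose(lhs, rhs))  # True
```

Because the term $\left\| (I - P)\mathbf{b} \right\|_{2}^{2}$ does not depend on $\mathbf{x}$, the whole expression is minimized by driving $\left\| A\mathbf{x} - P\mathbf{b} \right\|_{2}^{2}$ to zero.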

Furthermore, since every column of $A$ lies in $\mathcal{C} (A)$ while, by the properties of projections, $(I - P) \mathbf{b} \in \mathcal{C} (A)^{\perp}$,

$$ A^{\ast} (I - P) \mathbf{b} = A^{\ast} ( \mathbf{b} - A \mathbf{x}_{\ast} ) = 0 $$

In summary, the Least Squares Method essentially consists of finding the solution $\mathbf{x}_{\ast}$ that satisfies the normal equation $A^{\ast} A \mathbf{x}_{\ast} = A^{\ast} \mathbf{b}$.
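As a concrete check of the normal equation, the following sketch (the particular $A$ and $\mathbf{b}$ are made up for illustration) solves $A^{\ast} A \mathbf{x}_{\ast} = A^{\ast} \mathbf{b}$ directly and compares the result against NumPy's built-in least squares routine:

```python
import numpy as np

# Hypothetical overdetermined system: 4 equations, 2 unknowns
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

# Least squares solution via the normal equation A* A x = A* b
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# np.linalg.lstsq minimizes ||A x - b||_2 directly
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# The error vector b - A x_star is orthogonal to the columns of A
residual = b - A @ x_star
print(np.allclose(x_star, x_lstsq))      # True
print(np.allclose(A.T @ residual, 0.0))  # True
```

In practice, `np.linalg.lstsq` (or a QR/SVD-based solver) is preferred over forming $A^{\ast} A$ explicitly, since the normal equation squares the condition number of $A$.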

For an intuitive understanding not through formulas but through figures, the following example might be helpful.

[Figure: points scattered on a plane]

Consider the problem of finding a line that goes through all the points laid out on a plane as shown above. Naturally, there is no line (solution) that exactly does this, but we can look for an approximate solution that does it as closely as possible.

[Figure: two candidate lines, green and red, with blue segments marking each point’s distance to the line]

Comparing the green and red lines, one can easily see that the one on the left fits the points better than the one on the right. The lengths of the blue segments represent the distances between the points and the line. For this problem, the least squares solution is the line for which the sum of the squares of these distances is minimized.
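The line-fitting picture can be reproduced numerically. Assuming some made-up points $(x_i, y_i)$, fitting the line $y = c_0 + c_1 x$ by least squares minimizes exactly the sum of squared vertical distances shown in blue:

```python
import numpy as np

# Hypothetical scattered points, in the spirit of the figure
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Design matrix for the line y = c0 + c1 * x
A = np.column_stack([np.ones_like(xs), xs])

# Least squares coefficients and the minimized squared error
coeffs, sq_error, *_ = np.linalg.lstsq(A, ys, rcond=None)
c0, c1 = coeffs

# The reported residual equals the sum of squared vertical distances
manual = np.sum((ys - (c0 + c1 * xs)) ** 2)
print(np.isclose(manual, sq_error[0]))  # True
```

This is the same normal-equation machinery as above, with $\mathbf{x}_{\ast} = (c_0, c_1)$ playing the role of the unknown vector.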


  1. Howard Anton, Elementary Linear Algebra: Applications Version (12th Edition, 2019), pp. 417-418 ↩︎