What is Compressive Sensing?
Definitions
$$ \mathbf{y} = \Theta \mathbf{s} $$

Given $\Theta \in \mathbb{R}^{n \times p}$ with $n \ll p$, the linear system above is underdetermined, so infinitely many solutions $\mathbf{s}$ satisfy it. The optimization problem that finds the sparsest solution under the constraint of satisfying this system is referred to as Compressed Sensing.

$$ \begin{matrix} \text{Minimize} & \left\| \mathbf{s} \right\|_{0} \\ \text{subject to} & \mathbf{y} = \Theta \mathbf{s} \end{matrix} $$

Here, $\left\| \cdot \right\|_{0}$ is the $0$-norm, defined as the number of non-zero elements of a vector.
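A minimal numerical sketch of this setup, assuming a random Gaussian $\Theta$ and an illustrative sparsity level `k` (all values and names here are hypothetical, chosen only for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, k = 20, 100, 3  # n << p makes the system underdetermined; k = sparsity
Theta = rng.standard_normal((n, p))  # measurement matrix (Gaussian, a common choice)

# Build a k-sparse signal: only k of the p entries are nonzero.
s_true = np.zeros(p)
support_true = rng.choice(p, size=k, replace=False)
s_true[support_true] = rng.standard_normal(k)

y = Theta @ s_true  # the n observed measurements

# With n < p, y = Theta @ s has infinitely many solutions;
# Compressed Sensing asks for the one with the fewest nonzeros.
print(f"rank = {np.linalg.matrix_rank(Theta)}, n = {n}, p = {p}")
```

The later snippets in this post continue from these variables.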
Description
The "Compressed" in Compressed Sensing also appears as "Compressive," so the technique is equally referred to as Compressive Sensing.
Motivation
The motivation behind Compressed Sensing is, of course, sparse regression itself, in which, unlike in ordinary regression analysis, the design matrix $\Theta$ is referred to as the Measurement Matrix, and the vector of regression coefficients $\mathbf{s}$ is referred to as the Signal.[^1]
The figure above visualizes solving an ordinary matrix equation versus Compressed Sensing, where the color of each pixel represents the magnitude of that component: the lighter the color, the closer it is to $0$, and white means it is exactly $0$. Any $\mathbf{s}$ satisfying the system would do, but unlike in the left figure, we want to find an $\mathbf{s}$ that contains as many zeros as possible, as shown in the right figure.
Considering how matrix multiplication works, a component of $\mathbf{s}$ being $0$ means that $\mathbf{y}$ can be described without the corresponding column of $\Theta$. The more zero elements $\mathbf{s}$ has, the more $\Theta$ is "compressed" horizontally, and identifying which columns are actually needed is the "sensing"; hence the term Compressed Sensing is apt.
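As a quick check of this "columns you can drop" reading, continuing the sketch above, the product $\Theta \mathbf{s}$ is unchanged when the columns matching zero entries of $\mathbf{s}$ are deleted:

```python
# Zero entries of s_true mean the matching columns of Theta are never used:
# y is reproduced from the support columns alone.
support = np.flatnonzero(s_true)
y_compressed = Theta[:, support] @ s_true[support]
assert np.allclose(y, y_compressed)  # identical to the full product
```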
Difficulty[^2]
The problem, as mentioned before, is not just satisfying $\mathbf{y} = \Theta \mathbf{s}$ but finding the optimal solution $\hat{\mathbf{s}}$ that minimizes $\left\| \mathbf{s} \right\|_{0}$. The most brute-force approach is to try every possible combination of non-zero positions, which means $O \left( 2^{p} \right)$ work; the problem is NP-hard.
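The brute-force search can be sketched directly for the toy instance above; every support of size $1, \dots, k$ is tested, and the number of candidates grows combinatorially, which is exactly why this approach only works at toy sizes:

```python
from itertools import combinations

def l0_brute_force(Theta, y, max_k=3, tol=1e-8):
    # Try every support of size 1..max_k and return the sparsest s
    # with Theta @ s = y (up to tol). Runtime is exponential in p.
    n, p = Theta.shape
    for size in range(1, max_k + 1):
        for support in combinations(range(p), size):
            cols = Theta[:, list(support)]
            coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
            if np.linalg.norm(cols @ coef - y) < tol:
                s = np.zeros(p)
                s[list(support)] = coef
                return s
    return None

s_hat = l0_brute_force(Theta, y)
print(np.flatnonzero(s_hat))  # should match support_true on this toy instance
```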
Obviously, it’s smarter to solve the problem in a wiser way than trying all possibilities, and one idea is to consider using a different norm than the $l_{0}$-norm $\left\| \cdot \right\|_{0}$. For starters, consider the simpler $l_{2}$-norm and $l_{1}$-norm.
Thinking geometrically about the Compressed Sensing problem, as in the figure above, we need to find the point on the [line](../2042) $\mathbf{y} = \Theta \mathbf{s}$ at which as many elements as possible are $0$. This resembles the geometric explanation of how [Lasso regression differs from Ridge regression](../2571). Between these two, at least, the $l_{1}$-norm looks more suitable for Compressed Sensing than the $l_{2}$-norm, but that alone does not justify replacing the $l_{0}$-norm with the $l_{1}$-norm.
Looking at visualizations of the $l_{q}$-norm for various $q \ge 0$, one immediately senses that for $q \le 1$, $\left\| \mathbf{s} \right\|_{q}$ will behave like $\left\| \mathbf{s} \right\|_{0}$. We don't need to consider every $0 \le q \le 1$, though; narrowing attention to the gap between the $l_{0}$-norm and the $l_{1}$-norm already makes the Compressed Sensing problem far more tractable.
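Minimizing the $l_1$-norm subject to $\mathbf{y} = \Theta \mathbf{s}$ (known as basis pursuit) is a linear program, so a generic LP solver suffices. A sketch using `scipy.optimize.linprog` with the standard split $\mathbf{s} = \mathbf{u} - \mathbf{v}$, $\mathbf{u}, \mathbf{v} \ge 0$ (function and variable names are illustrative):

```python
from scipy.optimize import linprog

def basis_pursuit(Theta, y):
    # min ||s||_1  s.t.  Theta @ s = y, rewritten as the LP
    # min 1^T u + 1^T v  s.t.  Theta @ u - Theta @ v = y,  u, v >= 0.
    n, p = Theta.shape
    c = np.ones(2 * p)
    A_eq = np.hstack([Theta, -Theta])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    u, v = res.x[:p], res.x[p:]
    return u - v

s_l1 = basis_pursuit(Theta, y)
print(np.flatnonzero(np.abs(s_l1) > 1e-6))  # should match support_true here
```

Unlike the $l_2$ solution, the $l_1$ solution lands on a vertex of the constraint set, which is exactly what produces the zeros.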
In fact, it has been proven that the Compressed Sensing problem can be solved far more easily when $\Theta$ satisfies the Restricted Isometry Property; on that foundation the technique has secured a unique position and, as of 2023, is widely applied across many fields.
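For reference, the Restricted Isometry Property of order $k$ requires a constant $\delta_k \in (0, 1)$ such that for every $k$-sparse vector $\mathbf{s}$,

$$ (1 - \delta_k) \left\| \mathbf{s} \right\|_{2}^{2} \le \left\| \Theta \mathbf{s} \right\|_{2}^{2} \le (1 + \delta_k) \left\| \mathbf{s} \right\|_{2}^{2} $$

that is, $\Theta$ behaves almost like an isometry on sparse vectors.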
[^1]: Needell, D., Vershynin, R. (2009). Uniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit. Foundations of Computational Mathematics 9, 317–334. https://doi.org/10.1007/s10208-008-9031-3
[^2]: Brunton. (2022). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control (2nd Edition): p110.