
Uniform Uncertainty Principle: Restricted Isometry Condition

Definition

For a vector $\mathbf{v} \in V$ in a vector space $V$, define the $l_{0}$-norm $\left\| \cdot \right\|_{0} : V \to \mathbb{Z}$ as the cardinality of the support, that is, the number of components that are not $0$: $$ \begin{align*} \left\| \mathbf{v} \right\|_{0} :=& \left| \operatorname{supp} \mathbf{v} \right| \\ =& \operatorname{card} \left\{ k : \left( \mathbf{v} \right) _{k} \ne 0 \right\} \end{align*} $$ $\left\| \cdot \right\|_{1}$ denotes the $l_{1}$-norm and $\left\| \cdot \right\|_{2}$ the $l_{2}$-norm.

Uncertainty Principle in DFT 1

Let’s denote the Discrete Fourier Transform of a finite vector $x \in \mathbb{C}^{N}$ of length $N \in \mathbb{N}$ as $\hat{x}$. The following statement is called the Uncertainty Principle for Discrete Fourier Transform: $$ \left\| x \right\|_{0} \left\| \hat{x} \right\|_{0} \ge N $$
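As a quick sanity check (not taken from the cited paper), the inequality can be verified numerically; the spike-train example below, written in NumPy, actually attains equality $\left\| x \right\|_{0} \left\| \hat{x} \right\|_{0} = N$.

```python
import numpy as np

# Minimal numerical check of the discrete uncertainty principle:
# for a nonzero vector x of length N, ||x||_0 * ||x_hat||_0 >= N.
N = 64
x = np.zeros(N)
x[::16] = 1.0          # a sparse "spike train": 4 nonzero entries

x_hat = np.fft.fft(x)  # discrete Fourier transform of x

l0 = np.count_nonzero(x)
l0_hat = np.count_nonzero(np.abs(x_hat) > 1e-10)  # threshold for numerical zeros

print(l0, l0_hat, l0 * l0_hat >= N)  # 4, 16, True (equality: 4 * 16 = 64 = N)
```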

Restricted Isometry Property 2

Denote by $\mathbf{s}_{T} \in \mathbb{R}^{\left| T \right|}$ and $\Theta_{T} \in \mathbb{R}^{n \times \left| T \right|}$ the components of a vector $\mathbf{s} \in \mathbb{R}^{p}$ and the columns of a matrix $\Theta \in \mathbb{R}^{n \times p}$ indexed by $T \subset \left\{ 1 , \cdots , p \right\}$, respectively. Given a natural number $S \in \mathbb{N}$, if there exists an $S$-restricted isometry constant $\delta_{S}$ such that $$ \left( 1 - \delta_{S} \right) \left\| \mathbf{s}_{T} \right\|_{2}^{2} \le \left\| \Theta_{T} \mathbf{s}_{T} \right\|_{2}^{2} \le \left( 1 + \delta_{S} \right) \left\| \mathbf{s}_{T} \right\|_{2}^{2} $$ holds for every $T$ with $\left| T \right| \le S$ and every $\mathbf{s} \in \mathbb{R}^{p}$, then $\Theta$ is said to satisfy the Restricted Isometry Hypothesis, or the Uniform Uncertainty Principle.
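For small dimensions the definition can be checked directly by brute force over all index sets $T$; the sketch below (a naive illustration with exponential cost, not how $\delta_{S}$ is estimated in practice) computes $\delta_{S}$ for a small random Gaussian matrix.

```python
import numpy as np
from itertools import combinations

def restricted_isometry_constant(Theta, S):
    """Brute-force estimate of the S-restricted isometry constant delta_S:
    the smallest delta such that every submatrix Theta_T with |T| <= S has all
    squared singular values in [1 - delta, 1 + delta]."""
    p = Theta.shape[1]
    delta = 0.0
    for size in range(1, S + 1):
        for T in combinations(range(p), size):
            sv = np.linalg.svd(Theta[:, T], compute_uv=False)
            delta = max(delta, sv.max() ** 2 - 1, 1 - sv.min() ** 2)
    return delta

rng = np.random.default_rng(0)
n, p, S = 20, 10, 3
Theta = rng.standard_normal((n, p)) / np.sqrt(n)  # columns have roughly unit norm
print(restricted_isometry_constant(Theta, S))
```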

Explanation

Compressed Sensing

Compressed sensing, well known as an optimization problem, is usually stated as follows: $$ \begin{matrix} \text{Minimize} & \left\| \mathbf{s} \right\|_{0} \\ \text{subject to} & \mathbf{y} = \Theta \mathbf{s} \end{matrix} $$ The Heisenberg inequality of Fourier analysis, best known in quantum mechanics as Heisenberg’s uncertainty principle, appears in a 1989 paper on signal recovery by Donoho, a pioneer of compressed sensing: its 5th theorem gives a condition for the uniqueness of the solution, and its 9th theorem shows that when $\mathbf{y} = \Theta \mathbf{s}$, $$ \argmin \left\| \mathbf{s} \right\|_{0} = \argmin \left\| \mathbf{s} \right\|_{1} $$ so that the problem can be relaxed to the following: $$ \begin{matrix} \text{Minimize} & \left\| \mathbf{s} \right\|_{1} \\ \text{subject to} & \mathbf{y} = \Theta \mathbf{s} \end{matrix} $$ Candès and Tao went further: under the condition that $\Theta$ satisfies the $S$-restricted isometry hypothesis and the solution $\mathbf{s}^{\ast}$ is sufficiently sparse, they showed that the optimal solution $\hat{\mathbf{s}}$ of $$ \begin{matrix} \text{Minimize} & \left\| \mathbf{s} \right\|_{1} \\ \text{subject to} & \left\| \mathbf{y} - \Theta \mathbf{s} \right\|_{2} \le \varepsilon \end{matrix} $$ satisfies the following inequality for some constant $C > 0$: $$ \left\| \hat{\mathbf{s}} - \mathbf{s}^{\ast} \right\|_{2} \le C \varepsilon $$ In essence, this gives a condition under which the NP-hard compressed sensing problem becomes tractable, and on the strength of these results the condition is now used widely across many fields.
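As an illustration of the relaxed problem, the equality-constrained $l_{1}$ minimization can be written as a linear program and solved with off-the-shelf tools; the sketch below uses SciPy’s `linprog` on a small random instance (the sizes and the Gaussian $\Theta$ are arbitrary choices for the example).

```python
import numpy as np
from scipy.optimize import linprog

# Basis pursuit sketch: recover a sparse s* from y = Theta @ s* by solving
#   min ||s||_1  subject to  y = Theta s
# as a linear program with the split s = u - v, u, v >= 0.
rng = np.random.default_rng(1)
n, p, k = 30, 60, 4                        # measurements, dimension, sparsity
Theta = rng.standard_normal((n, p)) / np.sqrt(n)
s_true = np.zeros(p)
s_true[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
y = Theta @ s_true

c = np.ones(2 * p)                         # objective: sum(u) + sum(v) = ||s||_1
A_eq = np.hstack([Theta, -Theta])          # Theta @ (u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p))
s_hat = res.x[:p] - res.x[p:]

print(np.max(np.abs(s_hat - s_true)))      # ~0 when recovery succeeds
```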

The difficulty is that few people are equally at home in Fourier analysis and in data science, which makes the relationship between the uncertainty principle and the restricted isometry property hard to grasp. Donoho’s paper in particular keeps leaning on the term uncertainty principle and is therefore rather hard to read, but Candès, Tao, and their coauthors explain the idea in terms of matrices and the restricted isometry condition in several papers in a mathematically accessible way, and papers citing them give similar explanations. 3 4

Restricted Isometry Property and Column-wise Normalization

Stepping back a bit from the practical aspect, let’s intuitively understand why $\Theta$ is normalized column-wise in sparse regression when a matrix equation like $\mathbf{y} = \Theta \mathbf{s}$ is given.

Looking back at the inequality presented in the restricted isometry condition, $$ \left( 1 - \delta_{S} \right) \left\| \mathbf{s}_{T} \right\|_{2}^{2} \le \left\| \Theta_{T} \mathbf{s}_{T} \right\|_{2}^{2} \le \left( 1 + \delta_{S} \right) \left\| \mathbf{s}_{T} \right\|_{2}^{2} $$ it’s similar to the formula that comes from equivalence of norms learned in linear algebra.

Equivalence of Norms: Two norms $\left\| \cdot \right\|_{\alpha}, \left\| \cdot \right\|_{\beta}$ defined on a vector space $V$ are said to be equivalent if there exist constants $c , C >0$ such that for every vector $\mathbf{v} \in V$, $$ c \left\| \mathbf{v} \right\|_{\alpha} \le \left\| \mathbf{v} \right\|_{\beta} \le C \left\| \mathbf{v} \right\|_{\alpha} $$

However, the results of Donoho, Candès, and Tao essentially say that a solution of $\argmin \left\| \cdot \right\|_{1}$ is also a solution of $\argmin \left\| \cdot \right\|_{0}$, and in that light the appearance of inequalities like the restricted isometry condition feels closely related, at least at an intuitive level. One could also check whether $\left\| \cdot \right\|_{?} = \left\| \Theta_{T} \cdot \right\|_{2}$ really is a norm, but for now let’s move on to tight frames in Hilbert space.

Frames in Hilbert Space: For a sequence $\left\{ \mathbf{v}_{k} \right\}_{k \in \mathbb{N}}$ in the Hilbert space $H$, if there exists $A,B > 0$ that satisfies the following, $\left\{ \mathbf{v}_{k} \right\}_{k \in \mathbb{N}}$ is called a frame, and especially when $A = B$, this frame is said to be tight. $$ A \left\| \mathbf{v} \right\|^{2} \le \sum_{k \in \mathbb{N}} \left| \left\langle \mathbf{v} , \mathbf{v}_{k} \right\rangle \right|^{2} \le B \left\| \mathbf{v} \right\|^{2} \qquad , \forall \mathbf{v} \in H $$ Especially if $\left\{ \mathbf{v}_{k} \right\}_{k \in \mathbb{N}}$ is an orthonormal basis of $H$, it’s equivalent to being a tight frame of $A=B=1$.

Equivalence Conditions for Orthonormal Basis: Assuming $H$ is a Hilbert space, for an orthonormal system $\left\{ \mathbf{e}_{k} \right\}_{k \in \mathbb{N}} \subset H$ in $H$, the following are all equivalent:

  • (i): $\left\{ \mathbf{e}_{k} \right\}_{k \in \mathbb{N}} \subset H$ is an orthonormal basis of $H$.
  • (iv): For all $\mathbf{x}\in H$, $$ \sum_{k \in \mathbb{N}} \left| \langle \mathbf{x}, \mathbf{e}_{k} \rangle \right|^{2} = \left\| \mathbf{x}\right\|^{2} $$
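In finite dimensions this equivalence is easy to verify numerically; the minimal check below takes an orthonormal basis from a QR factorization (an arbitrary choice for illustration) and confirms the Parseval identity, i.e. a tight frame with $A = B = 1$.

```python
import numpy as np

# Parseval check: for an orthonormal basis {e_k}, sum_k |<x, e_k>|^2 = ||x||^2,
# i.e. the basis is a tight frame with A = B = 1.  The columns of Q from a QR
# factorization play the role of the e_k here.
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # Q has orthonormal columns
x = rng.standard_normal(8)

frame_sum = np.sum((Q.T @ x) ** 2)   # sum of squared inner products <x, e_k>
print(np.isclose(frame_sum, np.sum(x ** 2)))  # True
```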

A frame with $A = B = 1$ means $\left\| \Theta_{T} \mathbf{s}_{T} \right\|_{2} \approx \left\| \mathbf{s}_{T} \right\|_{2}$, or, a little loosely, for some small $\delta > 0$, $$ 1 - \delta = A \approx 1 \approx B = 1 + \delta $$ so the columns of $\Theta_{T}$ form a ‘not quite perfect’ tight frame, and since being an orthonormal basis is equivalent to being a tight frame with $A = B = 1$, they can be thought of as ‘not quite perfectly’ orthonormal. This is why, in practice, the columns of the matrix are sometimes divided by their norms so that the sum of squares of each column equals $1$, with the papers of Candès and Tao cited at that point. 5
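Concretely, this normalization just divides each column of $\Theta$ by its $l_{2}$-norm; a minimal sketch, with the norms kept so that coefficients can be rescaled back afterwards:

```python
import numpy as np

# Column-wise normalization as used in practice: divide each column of Theta
# by its l2 norm so that every column has unit sum of squares, pushing the
# submatrices Theta_T toward the tight-frame regime A ≈ B ≈ 1 described above.
def normalize_columns(Theta):
    norms = np.linalg.norm(Theta, axis=0)
    return Theta / norms, norms    # keep the norms to rescale coefficients back

rng = np.random.default_rng(3)
Theta = rng.standard_normal((50, 10)) * rng.uniform(0.1, 10.0, size=10)
Theta_n, norms = normalize_columns(Theta)
print(np.allclose(np.sum(Theta_n ** 2, axis=0), 1.0))  # each column's sum of squares is 1
```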

Relationship with Lasso

$$ \begin{matrix} \text{Minimize} & \left\| \mathbf{s} \right\|_{1} \\ \text{subject to} & \left\| \mathbf{y} - \Theta \mathbf{s} \right\|_{2} \le \varepsilon \end{matrix} $$

Compressed sensing as stated above can, via the Lagrange multiplier method, be reformulated as the problem of finding the following optimal solution: $$ \argmin_{\mathbf{s}} \left( \left\| \mathbf{s} \right\|_{1} + \lambda \left\| \mathbf{y} - \Theta \mathbf{s} \right\|_{2} \right) $$

Meanwhile, Lasso regression is described as the following optimization problem:

Lasso Regression: The following optimization problem is called BPDN (basis pursuit denoising), and solving it is referred to as Lasso or Lasso Regression. $$ \argmin_{\beta} \left( {{ 1 } \over { 2 }} \left\| Y - X \beta \right\|_{2}^{2} + \lambda \left\| \beta \right\|_{1} \right) $$

Mathematically, then, the two problems are the same, and the difference comes down to whether the speaker is a statistician or a signal-processing researcher. 6 For practical purposes it is enough to know that they are essentially identical, but as discussed above, it is important to remember that there are conditions under which this equivalence actually holds. To summarize once more (a small numerical sketch follows the list):

  • Under the restricted isometry condition, compressed sensing can be performed through Lasso Regression.
    • However, Lasso regression is not the only method for performing compressed sensing.
  • Strictly speaking, they are different.
    • However, there’s rarely a need to distinguish them strictly.
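For completeness, here is a minimal Lasso sketch of the same recovery problem using scikit-learn’s solver; the sizes, noise level, and the penalty weight `alpha` (playing the role of $\lambda$ above) are arbitrary choices for illustration and would need tuning in practice.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Lasso view of the same sparse recovery problem: penalized least squares
# with an l1 penalty, solved by coordinate descent.
rng = np.random.default_rng(4)
n, p, k = 40, 100, 5
Theta = rng.standard_normal((n, p)) / np.sqrt(n)     # roughly unit-norm columns
s_true = np.zeros(p)
s_true[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
y = Theta @ s_true + 0.01 * rng.standard_normal(n)   # noisy measurements

lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000)
lasso.fit(Theta, y)
s_hat = lasso.coef_

print(np.count_nonzero(np.abs(s_hat) > 1e-3))        # close to k if recovery works
```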

  1. Y. Lyubarskii and R. Vershynin, “Uncertainty Principles and Vector Quantization,” in IEEE Transactions on Information Theory, vol. 56, no. 7, pp. 3491-3501, July 2010, doi: https://doi.org/10.1109/TIT.2010.2048458↩︎

  2. Candès, E.J., Romberg, J.K. and Tao, T. (2006), Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59: 1207-1223. https://doi.org/10.1002/cpa.20124 ↩︎

  3. E. J. Candes and T. Tao, “Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?,” in IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5406-5425, Dec. 2006, doi: https://doi.org/10.1109/TIT.2006.885507↩︎

  4. Needell, D., Vershynin, R. Uniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit. Found Comput Math 9, 317–334 (2009). https://doi.org/10.1007/s10208-008-9031-3 ↩︎

  5. Brunton, S.L., Proctor, J.L. and Kutz, J.N. (2016). Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. https://doi.org/10.1073/pnas.1517384113 ↩︎

  6. https://stats.stackexchange.com/a/328731/172321 ↩︎