Positive Definite Matrices and the Proof of the Extended Cauchy-Schwarz Inequality
Theorem 1
For any two vectors $\mathbf{b}, \mathbf{d} \in \mathbb{R}^{p}$ and a positive definite matrix $A \in \mathbb{R}^{p \times p}$, the following inequality holds. $$ \left( \mathbf{b}^{T} \mathbf{d} \right)^{2} \le \left( \mathbf{b}^{T} A \mathbf{b} \right) \left( \mathbf{d}^{T} A^{-1} \mathbf{d} \right) $$ Equality holds if and only if $\mathbf{b} = c A^{-1} \mathbf{d}$, or equivalently $\mathbf{d} = c A \mathbf{b}$, for some constant $c \in \mathbb{R}$.
- $X^{T}$ denotes the transpose of the matrix $X$.
Explanation
This inequality generalizes the Cauchy-Schwarz inequality and reduces to the original Cauchy-Schwarz inequality when $A$ is the identity matrix $I$. Because the right-hand side is a product of quadratic forms, the result is widely applicable in mathematical statistics.
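As a quick numerical sanity check of the statement, the following sketch (assuming NumPy; the vectors `b`, `d` and the matrix `A` are illustrative choices, not from the text) compares both sides for a random positive definite $A$ and also confirms the reduction to the classical inequality when $A = I$.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4

b = rng.standard_normal(p)
d = rng.standard_normal(p)

# Build a random positive definite matrix A = M M^T + I.
M = rng.standard_normal((p, p))
A = M @ M.T + np.eye(p)

lhs = (b @ d) ** 2
rhs = (b @ A @ b) * (d @ np.linalg.inv(A) @ d)
print(lhs <= rhs)                             # True: extended Cauchy-Schwarz

# With A = I the bound reduces to the classical Cauchy-Schwarz inequality.
print((b @ d) ** 2 <= (b @ b) * (d @ d))      # True
```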
Proof
Part 1. The Inequality
Inverse and Square Root of a Positive Definite Matrix: Let the eigenpairs $\left\{ \left( \lambda_{k} , e_{k} \right) \right\}_{k=1}^{n}$ of a positive definite matrix $A$ be ordered so that $\lambda_{1} \ge \cdots \ge \lambda_{n} > 0$. With respect to the orthogonal matrix $P = \begin{bmatrix} e_{1} & \cdots & e_{n} \end{bmatrix} \in \mathbb{R}^{n \times n}$ and the diagonal matrix $\Lambda = \operatorname{diag} \left( \lambda_{1} , \cdots , \lambda_{n} \right)$, its inverse $A^{-1}$ and square root $\sqrt{A}$ are given as follows. $$ \begin{align*} A^{-1} =& P \Lambda^{-1} P^{T} = \sum_{k=1}^{n} {{ 1 } \over { \lambda_{k} }} e_{k} e_{k}^{T} \\ \sqrt{A} =& P \sqrt{\Lambda} P^{T} = \sum_{k=1}^{n} \sqrt{\lambda_{k}} e_{k} e_{k}^{T} \end{align*} $$
If $A$ is a positive definite matrix, its square root matrix is $$ \sqrt{A} = P \sqrt{\Lambda} P^{T} = \sum_{k=1}^{n} \sqrt{\lambda_{k}} e_{k} e_{k}^{T} $$ which is symmetric, so $A^{1/2} = \left( A^{1/2} \right)^{T}$ holds; for the same reason, $A^{-1}$ and $A^{-1/2}$ are also symmetric.
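A minimal sketch of these spectral formulas, assuming NumPy (the matrix `A` below is an illustrative example): the inverse and square root are built directly from the eigendecomposition and checked for symmetry.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)          # a positive definite matrix

# eigh returns eigenvalues (ascending) and orthonormal eigenvectors as columns of P;
# the ordering does not affect the formulas below.
lam, P = np.linalg.eigh(A)
A_inv  = P @ np.diag(1.0 / lam) @ P.T        # A^{-1} = P Λ^{-1} P^T
A_sqrt = P @ np.diag(np.sqrt(lam)) @ P.T     # A^{1/2} = P Λ^{1/2} P^T

print(np.allclose(A_inv, np.linalg.inv(A)))  # True
print(np.allclose(A_sqrt @ A_sqrt, A))       # True: it really is a square root
# Both matrices are symmetric, as used in the proof.
print(np.allclose(A_sqrt, A_sqrt.T), np.allclose(A_inv, A_inv.T))   # True True
```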
Setting $\mathbf{x} := A^{1/2} \mathbf{b}$ and $\mathbf{y} := A^{-1/2} \mathbf{d}$, the original Cauchy-Schwarz inequality $\left( \mathbf{x}^{T} \mathbf{y} \right)^{2} \le \left( \mathbf{x}^{T} \mathbf{x} \right) \left( \mathbf{y}^{T} \mathbf{y} \right)$ gives $$ \begin{align*} \left( \mathbf{b}^{T} \mathbf{d} \right)^{2} =& \left( \mathbf{b}^{T} A^{1/2} A^{-1/2} \mathbf{d} \right)^{2} \\ =& \left( \mathbf{b}^{T} \left( A^{1/2} \right)^{T} A^{-1/2} \mathbf{d} \right)^{2} \\ =& \left( \left( A^{1/2} \mathbf{b} \right)^{T} A^{-1/2} \mathbf{d} \right)^{2} \\ =& \left( \mathbf{x}^{T} \mathbf{y} \right)^{2} \\ \le & \left( \mathbf{x}^{T} \mathbf{x} \right) \left( \mathbf{y}^{T} \mathbf{y} \right) \\ =& \left( \left( A^{1/2} \mathbf{b} \right)^{T} \left( A^{1/2} \mathbf{b} \right) \right) \left( \left( A^{-1/2} \mathbf{d} \right)^{T} \left( A^{-1/2} \mathbf{d} \right) \right) \\ =& \left( \mathbf{b}^{T} A \mathbf{b} \right) \left( \mathbf{d}^{T} A^{-1} \mathbf{d} \right) \end{align*} $$ which proves the inequality.
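The substitution step can also be checked numerically; the sketch below (assuming NumPy, with illustrative `b`, `d`, `A`) verifies that $\mathbf{x}^{T}\mathbf{y} = \mathbf{b}^{T}\mathbf{d}$ while $\mathbf{x}^{T}\mathbf{x}$ and $\mathbf{y}^{T}\mathbf{y}$ recover the two quadratic forms on the right.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 5
b, d = rng.standard_normal(p), rng.standard_normal(p)
M = rng.standard_normal((p, p))
A = M @ M.T + np.eye(p)                              # positive definite

lam, P = np.linalg.eigh(A)
A_half     = P @ np.diag(np.sqrt(lam)) @ P.T         # A^{1/2}
A_neg_half = P @ np.diag(1.0 / np.sqrt(lam)) @ P.T   # A^{-1/2}

x, y = A_half @ b, A_neg_half @ d
print(np.isclose(x @ y, b @ d))                      # True: x^T y = b^T d
print(np.isclose(x @ x, b @ A @ b))                  # True: x^T x = b^T A b
print(np.isclose(y @ y, d @ np.linalg.inv(A) @ d))   # True: y^T y = d^T A^{-1} d
print((x @ y) ** 2 <= (x @ x) * (y @ y))             # True: classical Cauchy-Schwarz
```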
Part 2. The Equality
If the constant is $c = 0$, so that $\mathbf{b} = \mathbf{0}$ or $\mathbf{d} = \mathbf{0}$, the equality holds trivially. Otherwise, neither vector is the zero vector, and since the two conditions $\mathbf{b} = c A^{-1} \mathbf{d}$ and $\mathbf{d} = c A \mathbf{b}$ are equivalent, we may assume $\mathbf{d} = c A \mathbf{b}$ for some $c \ne 0$. Then $$ \begin{align*} \mathbf{b}^{T} \mathbf{d} =& \mathbf{b}^{T} c A \mathbf{b} \\ =& c \mathbf{b}^{T} A \mathbf{b} \end{align*} $$ and likewise, since $\mathbf{b} = {{ 1 } \over { c }} A^{-1} \mathbf{d}$, $$ \begin{align*} \mathbf{b}^{T} \mathbf{d} =& \left( {{ 1 } \over { c }} A^{-1} \mathbf{d} \right)^{T} \mathbf{d} \\ =& {{ 1 } \over { c }} \mathbf{d}^{T} A^{-1} \mathbf{d} \end{align*} $$ Multiplying these two expressions for $\mathbf{b}^{T} \mathbf{d}$ together gives the following equality. $$ \left( \mathbf{b}^{T} \mathbf{d} \right)^{2} = \left( \mathbf{b}^{T} A \mathbf{b} \right) \left( \mathbf{d}^{T} A^{-1} \mathbf{d} \right) $$
■
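As a final numerical check of the equality condition in Part 2, the sketch below (assuming NumPy, with an illustrative `A` and `b`) sets $\mathbf{d} = c A \mathbf{b}$ and confirms that the two sides coincide up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
b = rng.standard_normal(p)
M = rng.standard_normal((p, p))
A = M @ M.T + np.eye(p)                 # positive definite

c = 2.5
d = c * (A @ b)                         # the equality condition d = c A b

lhs = (b @ d) ** 2
rhs = (b @ A @ b) * (d @ np.linalg.inv(A) @ d)
print(np.isclose(lhs, rhs))             # True: equality is attained
```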
Johnson. (2013). Applied Multivariate Statistical Analysis (6th Edition): p117. ↩︎