Positive Definite Matrices and the Proof of the Extended Cauchy-Schwarz Inequality
Theorem 1
For any two vectors $\mathbf{b}, \mathbf{d} \in \mathbb{R}^{p}$ and a positive definite matrix $A \in \mathbb{R}^{p \times p}$, the following inequality holds. $$ \left( \mathbf{b}^{T} \mathbf{d} \right)^{2} \le \left( \mathbf{b}^{T} A \mathbf{b} \right) \left( \mathbf{d}^{T} A^{-1} \mathbf{d} \right) $$ Equality holds if and only if $\mathbf{b} = c A^{-1} \mathbf{d}$, or equivalently $\mathbf{d} = c A \mathbf{b}$, for some constant $c \in \mathbb{R}$.
- $X^{T}$ denotes the transpose of the matrix $X$.
Explanation
This inequality generalizes the Cauchy-Schwarz inequality and reduces to the original Cauchy-Schwarz inequality when $A$ is the identity matrix $I$. Because the right-hand side is a product of quadratic forms, the result is widely applicable in mathematical statistics.
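As a quick numerical sanity check of the statement, the following sketch (assuming NumPy; the vectors `b`, `d` and the matrix `A` are illustrative choices, not from the text) compares both sides for a random positive definite $A$ and also confirms the reduction to the classical inequality when $A = I$.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4

b = rng.standard_normal(p)
d = rng.standard_normal(p)

# Build a random positive definite matrix A = M M^T + I.
M = rng.standard_normal((p, p))
A = M @ M.T + np.eye(p)

lhs = (b @ d) ** 2
rhs = (b @ A @ b) * (d @ np.linalg.inv(A) @ d)
print(lhs <= rhs)                             # True: extended Cauchy-Schwarz

# With A = I the bound reduces to the classical Cauchy-Schwarz inequality.
print((b @ d) ** 2 <= (b @ b) * (d @ d))      # True
```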
Proof
Part 1. The Inequality
Inverse and Square Root of a Positive Definite Matrix: Let the eigenpairs $\left\{ \left( \lambda_{k} , e_{k} \right) \right\}_{k=1}^{n}$ of a positive definite matrix $A$ be ordered so that $\lambda_{1} \ge \cdots \ge \lambda_{n} > 0$. With respect to the orthogonal matrix $P = \begin{bmatrix} e_{1} & \cdots & e_{n} \end{bmatrix} \in \mathbb{R}^{n \times n}$ and the diagonal matrix $\Lambda = \operatorname{diag} \left( \lambda_{1} , \cdots , \lambda_{n} \right)$, its inverse $A^{-1}$ and square root $\sqrt{A}$ are given as follows. $$ \begin{align*} A^{-1} =& P \Lambda^{-1} P^{T} = \sum_{k=1}^{n} {{ 1 } \over { \lambda_{k} }} e_{k} e_{k}^{T} \\ \sqrt{A} =& P \sqrt{\Lambda} P^{T} = \sum_{k=1}^{n} \sqrt{\lambda_{k}} e_{k} e_{k}^{T} \end{align*} $$
If $A$ is a positive definite matrix, its square root matrix is $$ \sqrt{A} = P \sqrt{\Lambda} P^{T} = \sum_{k=1}^{n} \sqrt{\lambda_{k}} e_{k} e_{k}^{T} $$ which is symmetric, so $A^{1/2} = \left( A^{1/2} \right)^{T}$ holds; for the same reason, $A^{-1}$ and $A^{-1/2}$ are also symmetric.
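A minimal sketch of these spectral formulas, assuming NumPy (the matrix `A` below is an illustrative example): the inverse and square root are built directly from the eigendecomposition and checked for symmetry.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)          # a positive definite matrix

# eigh returns eigenvalues (ascending) and orthonormal eigenvectors as columns of P;
# the ordering does not affect the formulas below.
lam, P = np.linalg.eigh(A)
A_inv  = P @ np.diag(1.0 / lam) @ P.T        # A^{-1} = P Λ^{-1} P^T
A_sqrt = P @ np.diag(np.sqrt(lam)) @ P.T     # A^{1/2} = P Λ^{1/2} P^T

print(np.allclose(A_inv, np.linalg.inv(A)))  # True
print(np.allclose(A_sqrt @ A_sqrt, A))       # True: it really is a square root
# Both matrices are symmetric, as used in the proof.
print(np.allclose(A_sqrt, A_sqrt.T), np.allclose(A_inv, A_inv.T))   # True True
```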
Setting $\mathbf{x} := A^{1/2} \mathbf{b}$ and $\mathbf{y} := A^{-1/2} \mathbf{d}$, the original Cauchy-Schwarz inequality $\left( \mathbf{x}^{T} \mathbf{y} \right)^{2} \le \left( \mathbf{x}^{T} \mathbf{x} \right) \left( \mathbf{y}^{T} \mathbf{y} \right)$ gives $$ \begin{align*} \left( \mathbf{b}^{T} \mathbf{d} \right)^{2} =& \left( \mathbf{b}^{T} A^{1/2} A^{-1/2} \mathbf{d} \right)^{2} \\ =& \left( \mathbf{b}^{T} \left( A^{1/2} \right)^{T} A^{-1/2} \mathbf{d} \right)^{2} \\ =& \left( \left( A^{1/2} \mathbf{b} \right)^{T} A^{-1/2} \mathbf{d} \right)^{2} \\ =& \left( \mathbf{x}^{T} \mathbf{y} \right)^{2} \\ \le & \left( \mathbf{x}^{T} \mathbf{x} \right) \left( \mathbf{y}^{T} \mathbf{y} \right) \\ =& \left( \left( A^{1/2} \mathbf{b} \right)^{T} \left( A^{1/2} \mathbf{b} \right) \right) \left( \left( A^{-1/2} \mathbf{d} \right)^{T} \left( A^{-1/2} \mathbf{d} \right) \right) \\ =& \left( \mathbf{b}^{T} A \mathbf{b} \right) \left( \mathbf{d}^{T} A^{-1} \mathbf{d} \right) \end{align*} $$ which proves the inequality.
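The substitution step can also be checked numerically; the sketch below (assuming NumPy, with illustrative `b`, `d`, `A`) verifies that $\mathbf{x}^{T}\mathbf{y} = \mathbf{b}^{T}\mathbf{d}$ while $\mathbf{x}^{T}\mathbf{x}$ and $\mathbf{y}^{T}\mathbf{y}$ recover the two quadratic forms on the right.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 5
b, d = rng.standard_normal(p), rng.standard_normal(p)
M = rng.standard_normal((p, p))
A = M @ M.T + np.eye(p)                              # positive definite

lam, P = np.linalg.eigh(A)
A_half     = P @ np.diag(np.sqrt(lam)) @ P.T         # A^{1/2}
A_neg_half = P @ np.diag(1.0 / np.sqrt(lam)) @ P.T   # A^{-1/2}

x, y = A_half @ b, A_neg_half @ d
print(np.isclose(x @ y, b @ d))                      # True: x^T y = b^T d
print(np.isclose(x @ x, b @ A @ b))                  # True: x^T x = b^T A b
print(np.isclose(y @ y, d @ np.linalg.inv(A) @ d))   # True: y^T y = d^T A^{-1} d
print((x @ y) ** 2 <= (x @ x) * (y @ y))             # True: classical Cauchy-Schwarz
```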
Part 2. The Equality
If the constant is $c = 0$, so that $\mathbf{b} = \mathbf{0}$ or $\mathbf{d} = \mathbf{0}$, the equality holds trivially. Otherwise, neither vector is the zero vector, and since the two conditions $\mathbf{b} = c A^{-1} \mathbf{d}$ and $\mathbf{d} = c A \mathbf{b}$ are equivalent, we may assume $\mathbf{d} = c A \mathbf{b}$ for some $c \ne 0$. Then $$ \begin{align*} \mathbf{b}^{T} \mathbf{d} =& \mathbf{b}^{T} c A \mathbf{b} \\ =& c \mathbf{b}^{T} A \mathbf{b} \end{align*} $$ and likewise, since $\mathbf{b} = {{ 1 } \over { c }} A^{-1} \mathbf{d}$, $$ \begin{align*} \mathbf{b}^{T} \mathbf{d} =& \left( {{ 1 } \over { c }} A^{-1} \mathbf{d} \right)^{T} \mathbf{d} \\ =& {{ 1 } \over { c }} \mathbf{d}^{T} A^{-1} \mathbf{d} \end{align*} $$ Multiplying these two expressions for $\mathbf{b}^{T} \mathbf{d}$ together gives the following equality. $$ \left( \mathbf{b}^{T} \mathbf{d} \right)^{2} = \left( \mathbf{b}^{T} A \mathbf{b} \right) \left( \mathbf{d}^{T} A^{-1} \mathbf{d} \right) $$
■
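As a final numerical check of the equality condition in Part 2, the sketch below (assuming NumPy, with an illustrative `A` and `b`) sets $\mathbf{d} = c A \mathbf{b}$ and confirms that the two sides coincide up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
b = rng.standard_normal(p)
M = rng.standard_normal((p, p))
A = M @ M.T + np.eye(p)                 # positive definite

c = 2.5
d = c * (A @ b)                         # the equality condition d = c A b

lhs = (b @ d) ** 2
rhs = (b @ A @ b) * (d @ np.linalg.inv(A) @ d)
print(np.isclose(lhs, rhs))             # True: equality is attained
```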
Johnson. (2013). Applied Multivariate Statistical Analysis (6th Edition): p117. ↩︎