

Positive Semidefinite Matrices and the Proof of the Extended Cauchy-Schwarz Inequality

Theorem 1

For any two vectors $\mathbf{b}, \mathbf{d} \in \mathbb{R}^{p}$ and a positive definite matrix $A \in \mathbb{R}^{p \times p}$, the following inequality holds.
$$ \left( \mathbf{b}^{T} \mathbf{d} \right)^{2} \le \left( \mathbf{b}^{T} A \mathbf{b} \right) \left( \mathbf{d}^{T} A^{-1} \mathbf{d} \right) $$
Equality holds if and only if $\mathbf{b} = c A^{-1} \mathbf{d}$ or, equivalently, $\mathbf{d} = c A \mathbf{b}$ for some constant $c \in \mathbb{R}$.


  • $X^{T}$ is the transpose of the matrix $X$.

Explanation

This inequality is a generalization of the Cauchy-Schwarz inequality and reduces to the original Cauchy-Schwarz inequality when $A$ is the identity matrix $I$. Because the right-hand side is expressed through quadratic forms, it is widely applicable in mathematical statistics.
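Since the statement is purely linear-algebraic, it can be sanity-checked numerically. Below is a minimal sketch, assuming NumPy and a randomly generated positive definite matrix of the form $M M^{T} + I$; all names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4

# A random positive definite matrix: A = M M^T + I
M = rng.standard_normal((p, p))
A = M @ M.T + np.eye(p)

b = rng.standard_normal(p)
d = rng.standard_normal(p)

lhs = (b @ d) ** 2
rhs = (b @ A @ b) * (d @ np.linalg.solve(A, d))
print(lhs <= rhs)  # True: (b^T d)^2 <= (b^T A b)(d^T A^{-1} d)
```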

Proof

Part 1. The Inequality

Inverse and Square Root of a Positive Definite Matrix: Assume the eigenpairs $\left\{ \left( \lambda_{k}, e_{k} \right) \right\}_{k=1}^{n}$ of a positive definite matrix $A$ are sorted as $\lambda_{1} > \cdots > \lambda_{n} > 0$. With respect to the orthogonal matrix $P = \begin{bmatrix} e_{1} & \cdots & e_{n} \end{bmatrix} \in \mathbb{R}^{n \times n}$ and the diagonal matrix $\Lambda = \operatorname{diag} \left( \lambda_{1}, \cdots, \lambda_{n} \right)$, the inverse $A^{-1}$ and the square root $\sqrt{A}$ are given as follows.
$$ \begin{align*} A^{-1} =& P \Lambda^{-1} P^{T} = \sum_{k=1}^{n} \frac{1}{\lambda_{k}} e_{k} e_{k}^{T} \\ \sqrt{A} =& P \sqrt{\Lambda} P^{T} = \sum_{k=1}^{n} \sqrt{\lambda_{k}} e_{k} e_{k}^{T} \end{align*} $$

If $A$ is a positive definite matrix, its square root matrix
$$ \sqrt{A} = P \sqrt{\Lambda} P^{T} = \sum_{k=1}^{n} \sqrt{\lambda_{k}} e_{k} e_{k}^{T} $$
is symmetric, so $A^{1/2} = \left( A^{1/2} \right)^{T}$ holds, and for the same reason $A^{-1/2}$ is symmetric as well.
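As an aside, the spectral construction above is easy to reproduce numerically. The following sketch, again assuming NumPy and a randomly generated positive definite matrix, builds $A^{1/2}$ and $A^{-1}$ from the eigenpairs and checks the symmetry claim.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
M = rng.standard_normal((p, p))
A = M @ M.T + np.eye(p)  # positive definite

# Spectral decomposition A = P Lambda P^T (eigh returns orthonormal eigenvectors)
lam, P = np.linalg.eigh(A)

A_sqrt = P @ np.diag(np.sqrt(lam)) @ P.T  # A^{1/2} = P sqrt(Lambda) P^T
A_inv = P @ np.diag(1.0 / lam) @ P.T      # A^{-1}  = P Lambda^{-1} P^T

print(np.allclose(A_sqrt @ A_sqrt, A))       # (A^{1/2})^2 = A
print(np.allclose(A_inv, np.linalg.inv(A)))  # spectral inverse matches inv(A)
print(np.allclose(A_sqrt, A_sqrt.T))         # A^{1/2} is symmetric
```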

Set $\mathbf{x} := A^{1/2} \mathbf{b}$ and $\mathbf{y} := A^{-1/2} \mathbf{d}$. By the original Cauchy-Schwarz inequality $\left( \mathbf{x}^{T} \mathbf{y} \right)^{2} \le \left( \mathbf{x}^{T} \mathbf{x} \right) \left( \mathbf{y}^{T} \mathbf{y} \right)$,
$$ \begin{align*} \left( \mathbf{b}^{T} \mathbf{d} \right)^{2} =& \left( \mathbf{b}^{T} A^{1/2} A^{-1/2} \mathbf{d} \right)^{2} \\ =& \left( \mathbf{b}^{T} \left( A^{1/2} \right)^{T} A^{-1/2} \mathbf{d} \right)^{2} \\ =& \left( \left( A^{1/2} \mathbf{b} \right)^{T} A^{-1/2} \mathbf{d} \right)^{2} \\ =& \left( \mathbf{x}^{T} \mathbf{y} \right)^{2} \\ \le & \left( \mathbf{x}^{T} \mathbf{x} \right) \left( \mathbf{y}^{T} \mathbf{y} \right) \\ =& \left( \left( A^{1/2} \mathbf{b} \right)^{T} \left( A^{1/2} \mathbf{b} \right) \right) \left( \left( A^{-1/2} \mathbf{d} \right)^{T} \left( A^{-1/2} \mathbf{d} \right) \right) \\ =& \left( \mathbf{b}^{T} A \mathbf{b} \right) \left( \mathbf{d}^{T} A^{-1} \mathbf{d} \right) \end{align*} $$
which proves the generalized inequality.
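The substitution $\mathbf{x} := A^{1/2} \mathbf{b}$, $\mathbf{y} := A^{-1/2} \mathbf{d}$ can also be checked directly. The sketch below (again assuming NumPy and a random positive definite $A$) confirms that $\mathbf{b}^{T} \mathbf{d} = \mathbf{x}^{T} \mathbf{y}$ and that the ordinary Cauchy-Schwarz bound $\left( \mathbf{x}^{T} \mathbf{x} \right) \left( \mathbf{y}^{T} \mathbf{y} \right)$ equals $\left( \mathbf{b}^{T} A \mathbf{b} \right) \left( \mathbf{d}^{T} A^{-1} \mathbf{d} \right)$.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4
M = rng.standard_normal((p, p))
A = M @ M.T + np.eye(p)

lam, P = np.linalg.eigh(A)
A_half = P @ np.diag(np.sqrt(lam)) @ P.T            # A^{1/2}
A_neg_half = P @ np.diag(1.0 / np.sqrt(lam)) @ P.T  # A^{-1/2}

b = rng.standard_normal(p)
d = rng.standard_normal(p)

x = A_half @ b      # x := A^{1/2} b
y = A_neg_half @ d  # y := A^{-1/2} d

print(np.isclose(b @ d, x @ y))  # b^T d = x^T y
print(np.isclose((x @ x) * (y @ y),
                 (b @ A @ b) * (d @ np.linalg.solve(A, d))))  # bounds coincide
```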


Part 2. The Equality

If the constant is $c = 0$, so that $\mathbf{b} = \mathbf{0}$ or $\mathbf{d} = \mathbf{0}$, the equality holds trivially. If neither is the zero vector, assume without loss of generality that $\mathbf{d} = c A \mathbf{b}$ for some $c \ne 0$. Then
$$ \begin{align*} \mathbf{b}^{T} \mathbf{d} =& \mathbf{b}^{T} c A \mathbf{b} \\ =& c \mathbf{b}^{T} A \mathbf{b} \end{align*} $$
and, since $\mathbf{b} = \frac{1}{c} A^{-1} \mathbf{d}$,
$$ \begin{align*} \mathbf{b}^{T} \mathbf{d} =& \left( \frac{1}{c} A^{-1} \mathbf{d} \right)^{T} \mathbf{d} \\ =& \frac{1}{c} \mathbf{d}^{T} A^{-1} \mathbf{d} \end{align*} $$
Multiplying these two equations side by side gives the desired equality.
$$ \left( \mathbf{b}^{T} \mathbf{d} \right)^{2} = \left( \mathbf{b}^{T} A \mathbf{b} \right) \left( \mathbf{d}^{T} A^{-1} \mathbf{d} \right) $$
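The equality case is just as easy to observe numerically: setting $\mathbf{d} = c A \mathbf{b}$ for any constant $c$ should make both sides coincide. A minimal sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
M = rng.standard_normal((p, p))
A = M @ M.T + np.eye(p)

b = rng.standard_normal(p)
c = 2.5
d = c * (A @ b)  # the equality case d = c A b

lhs = (b @ d) ** 2
rhs = (b @ A @ b) * (d @ np.linalg.solve(A, d))
print(np.isclose(lhs, rhs))  # True: equality is attained
```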


  1. Johnson. (2013). Applied Multivariate Statistical Analysis (6th Edition): p. 117. ↩︎