
Proof of the Cauchy-Schwarz Inequality in Mathematical Statistics

Theorem

For random variables $X, Y$, the following holds. $$ \left[ \operatorname{Cov} (X,Y) \right]^{2} \le \operatorname{Var} X \operatorname{Var} Y $$ The necessary and sufficient condition for the equality to hold is as follows1. $$ \exists a \ne 0 , b \in \mathbb{R} : a X + b = Y $$
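
Before the proof, here is a minimal numerical sanity check (not a proof), assuming numpy: on simulated data, the squared sample covariance never exceeds the product of the sample variances. The data-generating choices (seed, sample size, the coefficient `0.5`) are arbitrary illustrations.

```python
import numpy as np

# Sanity check: [Cov(X, Y)]^2 <= Var(X) * Var(Y) on simulated data.
rng = np.random.default_rng(42)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=10_000)  # Y correlated with X

cov_xy = np.cov(x, y, ddof=0)[0, 1]  # population-style sample covariance
var_x, var_y = np.var(x), np.var(y)  # np.var defaults to ddof=0

assert cov_xy**2 <= var_x * var_y
print(cov_xy**2, var_x * var_y)
```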

Proof

Let’s denote the population means of $X,Y$ as $\mu_{X}$ and $\mu_{Y}$, respectively, and define $$ \begin{align*} h(t) :=& E \left( \left[ \left( X - \mu_{X} \right) t + \left( Y - \mu_{Y} \right) \right]^{2} \right) \\ =& t^{2} E \left[ \left( X - \mu_{X} \right)^{2} \right] + 2t E \left[ \left( X - \mu_{X} \right) \left( Y - \mu_{Y} \right) \right] + E \left[ \left( Y - \mu_{Y} \right)^{2} \right] \\ =& \left( \operatorname{Var} X \right) t^{2} + 2 \operatorname{Cov} (X,Y) t + \operatorname{Var} Y \end{align*} $$ Since $h$ is the expectation of a square, $h(t) \ge 0$ for all $t \in \mathbb{R}$, so this quadratic in $t$ has at most one real root. Its discriminant must therefore be nonpositive. $$ \left( 2 \operatorname{Cov}(X,Y) \right)^{2} - 4 \operatorname{Var} X \cdot \operatorname{Var} Y \le 0 $$ Rearranging this gives the Cauchy-Schwarz inequality. $$ \left[ \operatorname{Cov} (X,Y) \right]^{2} \le \operatorname{Var} X \operatorname{Var} Y $$ Equality holds if and only if $h(t) = 0$ for some $t$, and setting $a := -t$ and $b := \mu_{X}t + \mu_{Y}$, this is equivalent to the following. $$ \begin{align*} & P \left( \left[ \left( X - \mu_{X} \right) t + \left( Y - \mu_{Y} \right) \right]^{2} = 0 \right) = 1 \\ \iff & P \left( \left( X - \mu_{X} \right) t + \left( Y - \mu_{Y} \right) = 0 \right) = 1 \\ \iff & P \left( Y = aX + b \right) = 1 \end{align*} $$
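
The equality case can also be checked numerically. The sketch below, assuming numpy, constructs $Y = aX + b$ exactly (the coefficients `a = -2.0`, `b = 3.0` are arbitrary illustrations with $a \ne 0$), which corresponds to the condition $P(Y = aX + b) = 1$ from the proof.

```python
import numpy as np

# Equality case: if Y = aX + b exactly, then
# [Cov(X, Y)]^2 = Var(X) * Var(Y) up to floating-point error.
rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
a, b = -2.0, 3.0  # arbitrary illustrative coefficients, a != 0
y = a * x + b

cov_xy = np.cov(x, y, ddof=0)[0, 1]
print(np.isclose(cov_xy**2, np.var(x) * np.var(y)))  # expected: True
```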

Explanation

At first glance, because the variance $\operatorname{Var}$ and covariance $\operatorname{Cov}$ appear, it may seem different from the commonly known Cauchy-Schwarz inequality, but upon closer inspection it is exactly a Cauchy-Schwarz inequality, with the covariance playing the role of the inner product. In applications throughout mathematical statistics, both the inequality itself and the necessary and sufficient condition for equality $$ \exists a \ne 0 , b \in \mathbb{R} : a X + b = Y $$ are very useful. For instance, the inequality immediately shows that the correlation coefficient $\rho = \operatorname{Cov}(X,Y) / \sqrt{\operatorname{Var} X \operatorname{Var} Y}$ always lies in $[-1, 1]$, with $|\rho| = 1$ exactly when $Y$ is an affine function of $X$.
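
As a small illustration of that consequence, assuming numpy, the sketch below checks that the sample correlation coefficient stays in $[-1, 1]$ even for an arbitrary nonlinear relationship between $X$ and $Y$.

```python
import numpy as np

# The correlation coefficient implied by the inequality is always
# in [-1, 1], even for a nonlinear dependence between X and Y.
rng = np.random.default_rng(7)
x = rng.normal(size=5_000)
y = np.sin(x) + rng.normal(size=5_000)  # arbitrary nonlinear dependence

rho = np.corrcoef(x, y)[0, 1]
assert -1.0 <= rho <= 1.0
print(rho)
```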


  1. Casella. (2001). Statistical Inference (2nd Edition): p172. ↩︎