

Proof of the Cauchy-Schwarz Inequality in Mathematical Statistics

Theorem

For random variables $X, Y$, the following holds.
$$ \operatorname{Cov} (X,Y)^{2} \le \operatorname{Var} X \operatorname{Var} Y $$
The necessary and sufficient condition for equality is as follows.[^1]
$$ \exists a \ne 0 , b \in \mathbb{R} : a X + b = Y $$
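The inequality also holds for sample variances and covariances, so it can be sanity-checked numerically. A minimal sketch with NumPy (the variable names and sample sizes are illustrative, not from the text):

```python
import numpy as np

# Draw two independent samples and check Cov(X,Y)^2 <= Var(X) * Var(Y).
rng = np.random.default_rng(0)
X = rng.normal(size=10_000)
Y = rng.normal(size=10_000)

cov_xy = np.cov(X, Y)[0, 1]       # sample covariance (ddof=1 by default)
var_x = np.var(X, ddof=1)         # sample variance of X
var_y = np.var(Y, ddof=1)         # sample variance of Y

print(cov_xy**2 <= var_x * var_y)  # True
```

The same `ddof=1` convention is used for both `np.cov` and `np.var` so that the two sides are computed with matching normalization.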

Proof

Let the population means of $X, Y$ be $\mu_{X}$ and $\mu_{Y}$, respectively, and define for $t \in \mathbb{R}$:
$$ \begin{align*} h(t) :=& E \left( \left[ \left( X - \mu_{X} \right) t + \left( Y - \mu_{Y} \right) \right]^{2} \right) \\ =& t^{2} E \left[ \left( X - \mu_{X} \right)^{2} \right] + 2t E \left[ \left( X - \mu_{X} \right) \left( Y - \mu_{Y} \right) \right] + E \left[ \left( Y - \mu_{Y} \right)^{2} \right] \\ =& \operatorname{Var} X \, t^{2} + 2 \operatorname{Cov} (X,Y) \, t + \operatorname{Var} Y \end{align*} $$
Since $h(t) \ge 0$ for all $t$ (it is the expectation of a square), the quadratic $h$ can have at most one real root, so its discriminant must satisfy
$$ \left( 2 \operatorname{Cov}(X,Y) \right)^{2} - 4 \operatorname{Var} X \cdot \operatorname{Var} Y \le 0 $$
Rearranging this gives the Cauchy-Schwarz inequality.
$$ \operatorname{Cov} (X,Y)^{2} \le \operatorname{Var} X \operatorname{Var} Y $$
Equality holds exactly when there exists some $t$ with $h(t) = 0$. Setting $a := -t$ and $b := \mu_{X} t + \mu_{Y}$, this is equivalent to the following.
$$ \begin{align*} & P \left( \left[ \left( X - \mu_{X} \right) t + \left( Y - \mu_{Y} \right) \right]^{2} = 0 \right) = 1 \\ \iff & P \left( \left( X - \mu_{X} \right) t + \left( Y - \mu_{Y} \right) = 0 \right) = 1 \\ \iff & P \left( Y = aX + b \right) = 1 \end{align*} $$
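The equality case can also be checked numerically: when $Y = aX + b$ exactly, the two sides coincide up to floating-point error. A small sketch, with arbitrary illustrative values for $a$ and $b$:

```python
import numpy as np

# Equality case: if Y = a*X + b exactly, then Cov(X,Y)^2 = Var(X) * Var(Y),
# since Cov(X, aX+b) = a*Var(X) and Var(aX+b) = a^2 * Var(X).
rng = np.random.default_rng(1)
X = rng.normal(size=1_000)
a, b = 2.0, -3.0                  # arbitrary illustrative coefficients
Y = a * X + b

cov_xy = np.cov(X, Y)[0, 1]
lhs = cov_xy**2
rhs = np.var(X, ddof=1) * np.var(Y, ddof=1)

print(np.isclose(lhs, rhs))       # True
```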

Explanation

At first glance, because the variance $\operatorname{Var}$ and covariance $\operatorname{Cov}$ appear, it may look different from the commonly known Cauchy-Schwarz inequality, but on closer inspection there is no reason not to call it by that name. In applications throughout mathematical statistics, both the inequality itself and the necessary and sufficient condition for equality,
$$ \exists a \ne 0 , b \in \mathbb{R} : a X + b = Y $$
are very useful.


[^1]: Casella, Berger. (2001). *Statistical Inference* (2nd Edition): p. 172.