logo

Proof of the Finite Form of Jensen's Inequality 📂Lemmas

Proof of the Finite Form of Jensen's Inequality

Theorem

In IRI \subset \mathbb{R}, for the convex functions f:IRf : I \to \mathbb{R} and k=1nλk=1,λk>0\displaystyle \sum_{k=1}^{n} \lambda_{k} = 1, \lambda_{k}>0

f(λ1x1+λ2x2++λnxn)λ1f(x1)+λ2f(x2)++λnf(xn)f(k=1nλkxk)k=1nλkf(xk) \begin{align*} f( \lambda_{1} x_{1} + \lambda_{2} x_{2} + \cdots + \lambda_{n} x_{n} ) & \le \lambda_{1} f( x_{1}) + \lambda_{2} f( x_{2}) + \cdots + \lambda_{n} f( x_{n} ) \\ f\left( \sum\limits_{k=1}^{n}\lambda_{k}x_{k} \right) &\le \sum\limits_{k=1}^{n} \lambda_{k} f(x_{k}) \end{align*}

If ff is a concave function, then the opposite inequality holds.

f(λ1x1+λ2x2++λnxn)λ1f(x1)+λ2f(x2)++λnf(xn)f(k=1nλkxk)k=1nλkf(xk) \begin{align*} f( \lambda_{1} x_{1} + \lambda_{2} x_{2} + \cdots + \lambda_{n} x_{n} ) & \ge \lambda_{1} f( x_{1}) + \lambda_{2} f( x_{2}) + \cdots + \lambda_{n} f( x_{n} ) \\ f\left( \sum\limits_{k=1}^{n}\lambda_{k}x_{k} \right) &\ge \sum\limits_{k=1}^{n} \lambda_{k} f(x_{k}) \end{align*}

Description

Jensen’s Inequality is widely used across various fields as a representative application of convex functions. Its finite form is a generalization concerning the number of points and weights from the original definition of convexity.

Proof

The proof method is the same, so we will only prove for convex functions.

Strategy: Apply mathematical induction by inserting terms inside and outside the function according to the definition of convex functions.

Definition of convex functions: For any two elements x1,x2x_{1} , x_{2} in the interval IRI \subset \mathbb{R} and functions f:IRf : I \to \mathbb{R} and 0t10 \le t \le 1, if f(tx1+(1t)x2)tf(x1)+(1t)f(x2)f( t x_{1} + (1-t) x_{2}) \le t f(x_{1}) + (1-t) f(x_{2}), then ff is called a convex function at II.


When n=1n=1, ff is convex, therefore

f(λx)λf(x) f( \lambda x ) \le \lambda f(x)

When n=2n=2, ff is convex, therefore

f(λ1x1+λ2x2)λ1f(x1)+λ2f(x2) f( \lambda_{1} x_{1} + \lambda_{2} x_{2} ) \le \lambda_{1} f( x_{1} ) + \lambda_{2} f( x_{2})

Assuming this holds for n=N3n = N \ge 3,

f(λ1x1+λ2x2++λNxN)λ1f(x1)+λ2f(x2)++λNf(xN) f( \lambda_{1} x_{1} + \lambda_{2} x_{2} + \cdots + \lambda_{N} x_{N} ) \le \lambda_{1} f( x_{1}) + \lambda_{2} f( x_{2}) + \cdots + \lambda_{N} f( x_{N} )

then proving it also holds for n=N+1n = N + 1 completes the proof.

Defining μi:=λi1λN+1\displaystyle \mu_{i} := {{\lambda_{i}} \over {1 - \lambda_{N+1}}} in k=1N+1λk=1\displaystyle \sum_{k=1}^{N+1} \lambda_{k} = 1 gives k=1Nμk=1\displaystyle \sum_{k=1}^{N} \mu_{k} = 1, therefore

f(μ1x1+μ2x2++μNxN)μ1f(x1)+μ2f(x2)++μNf(xN) f( \mu_{1} x_{1} + \mu_{2} x_{2} + \cdots + \mu_{N} x_{N} ) \le \mu_{1} f( x_{1}) + \mu_{2} f( x_{2}) + \cdots + \mu_{N} f( x_{N} )

Following the definition of μi\mu_{i},

f(λ11λN+1x1+λ21λN+1x2++λN1λN+1xN)λ11λN+1f(x1)+λ21λN+1f(x2)++λN1λN+1f(xN) f( {{\lambda_{1}} \over {1 - \lambda_{N+1}}} x_{1} + {{\lambda_{2}} \over {1 - \lambda_{N+1}}} x_{2} + \cdots + {{\lambda_{N}} \over {1 - \lambda_{N+1}}} x_{N} ) \\ \le {{\lambda_{1}} \over {1 - \lambda_{N+1}}} f( x_{1}) + {{\lambda_{2}} \over {1 - \lambda_{N+1}}} f( x_{2}) + \cdots + {{\lambda_{N}} \over {1 - \lambda_{N+1}}} f( x_{N} )

grouping them as 11λN+1\displaystyle {{1} \over {1 - \lambda_{N+1}}},

f(11λN+1(λ1x1+λ2x2++λNxN))11λN+1(λ1f(x1)+λ2f(x2)++λNf(xN)) f \left( {{1} \over {1 - \lambda_{N+1}}} \left( \lambda_{1} x_{1} + \lambda_{2} x_{2} + \cdots + \lambda_{N} x_{N} \right) \right) \\ \le {{1} \over {1 - \lambda_{N+1}}} \left( \lambda_{1} f( x_{1}) + \lambda_{2} f( x_{2}) + \cdots + \lambda_{N} f( x_{N} ) \right)

Multiplying both sides by 1λN+11 - \lambda_{N+1},

(1λN+1)f(11λN+1(λ1x1+λ2x2++λNxN))λ1f(x1)+λ2f(x2)++λNf(xN) (1 - \lambda_{N+1}) f \left( {{1} \over {1 - \lambda_{N+1}}} \left( \lambda_{1} x_{1} + \lambda_{2} x_{2} + \cdots + \lambda_{N} x_{N} \right) \right) \\ \le \lambda_{1} f( x_{1}) + \lambda_{2} f( x_{2}) + \cdots + \lambda_{N} f( x_{N} )

Adding λN+1f(xN+1)\lambda_{N+1} f( x_{N+1} ) to both sides,

(1λN+1)f(11λN+1(λ1x1+λ2x2++λNxN))+λN+1f(xN+1)λ1f(x1)+λ2f(x2)++λNf(xN)+λN+1f(xN+1) (1 - \lambda_{N+1}) f \left( {{1} \over {1 - \lambda_{N+1}}} \left( \lambda_{1} x_{1} + \lambda_{2} x_{2} + \cdots + \lambda_{N} x_{N} \right) \right) + \lambda_{N+1} f (x_{N+1}) \\ \le \lambda_{1} f( x_{1}) + \lambda_{2} f( x_{2}) + \cdots + \lambda_{N} f( x_{N} ) + \lambda_{N+1} f( x_{N+1} )

Since (1λN+1)+λN+1=1( 1 - \lambda_{N+1} ) + \lambda_{N+1} = 1 and ff is a convex function,

f(1λN+11λN+1(λ1x1+λ2x2++λNxN)+λN+1xN+1)(1λN+1)f(11λN+1(λ1x1+λ2x2++λNxN))+λN+1f(xN+1) f \left( {{1 - \lambda_{N+1}} \over {1 - \lambda_{N+1}}} \left( \lambda_{1} x_{1} + \lambda_{2} x_{2} + \cdots + \lambda_{N} x_{N} \right) + \lambda_{N+1} x_{N+1} \right) \\ \le (1 - \lambda_{N+1}) f \left( {{1} \over {1 - \lambda_{N+1}}} \left( \lambda_{1} x_{1} + \lambda_{2} x_{2} + \cdots + \lambda_{N} x_{N} \right) \right) + \lambda_{N+1} f (x_{N+1})

Continuing the inequality,

f(λ1x1+λ2x2++λNxN+λN+1xN+1)(1λN+1)f(11λN+1(λ1x1+λ2x2++λNxN))+λN+1f(xN+1)λ1f(x1)+λ2f(x2)++λNf(xN)+λN+1f(xN+1) f \left( \lambda_{1} x_{1} + \lambda_{2} x_{2} + \cdots + \lambda_{N} x_{N} + \lambda_{N+1} x_{N+1} \right) \\ \le (1 - \lambda_{N+1}) f \left( {{1} \over {1 - \lambda_{N+1}}} \left( \lambda_{1} x_{1} + \lambda_{2} x_{2} + \cdots + \lambda_{N} x_{N} \right) \right) + \lambda_{N+1} f (x_{N+1}) \\ \le \lambda_{1} f( x_{1}) + \lambda_{2} f( x_{2}) + \cdots + \lambda_{N} f( x_{N} ) + \lambda_{N+1} f( x_{N+1} )

In conclusion,

f(λ1x1+λ2x2++λN+1xN+1)λ1f(x1)+λ2f(x2)++λN+1f(xN+1) f( \lambda_{1} x_{1} + \lambda_{2} x_{2} + \cdots + \lambda_{N+1} x_{N+1} ) \le \lambda_{1} f( x_{1}) + \lambda_{2} f( x_{2}) + \cdots + \lambda_{N+1} f( x_{N+1} )

See Also