
Definition of a General Convex Function

Definition

A function $f : V \to \mathbb{R}$ defined on a vector space $V$ is called a convex function if it satisfies the following for all $\mathbf{x}, \mathbf{y} \in V$ and all $t \in [0,1]$. $$ f \left( (1-t) \mathbf{x} + t \mathbf{y} \right) \le (1-t) f \left( \mathbf{x} \right) + t f \left( \mathbf{y} \right) $$
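The defining inequality can be sanity-checked numerically. The sketch below (not part of the original text) samples random points and mixing weights for the assumed convex function $f(x,y) = x^{2} + y^{2}$ and verifies the inequality holds at each sample.

```python
import random

# Sample convex function on R^2 (an illustrative assumption): f(x, y) = x^2 + y^2
def f(v):
    x, y = v
    return x**2 + y**2

def combine(x, y, t):
    """Convex combination (1 - t) * x + t * y, componentwise."""
    return tuple((1 - t) * a + t * b for a, b in zip(x, y))

random.seed(0)
for _ in range(1000):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    t = random.random()
    # Convexity: f((1-t)x + ty) <= (1-t)f(x) + t f(y)  (tolerance for rounding)
    assert f(combine(x, y, t)) <= (1 - t) * f(x) + t * f(y) + 1e-12
print("convexity inequality held on all samples")
```

A random check of course proves nothing, but a single failing sample would immediately disprove convexity.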

Explanation

In fact, once one studies mathematics beyond the introductory level, one rarely speaks of convex and concave as a pair; the concave case is seldom treated on its own. Instead, concepts are unified through convexity, and one thinks only in terms of convex objects. The convex function introduced above is a generalization of convex functions defined on the Euclidean space $X = \mathbb{R}$.

Example

$f(x,y) := x^{2} + y^{2}$, whose graph is a paraboloid, is a convex function, and the point at which its value is minimized is $(x,y) = (0,0)$.
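As a quick illustration (the code and step size are assumptions, not from the text), plain gradient descent on $f(x,y) = x^{2} + y^{2}$, whose gradient is $(2x, 2y)$, converges to the minimizer $(0,0)$:

```python
# Gradient descent on f(x, y) = x^2 + y^2; the gradient is (2x, 2y).
# The step size 0.1 and starting point are illustrative choices.
def grad(v):
    x, y = v
    return (2 * x, 2 * y)

v = (3.0, -4.0)  # arbitrary starting point
for _ in range(200):
    g = grad(v)
    v = (v[0] - 0.1 * g[0], v[1] - 0.1 * g[1])

print(v)  # numerically indistinguishable from the minimizer (0, 0)
```

Each step multiplies both coordinates by $1 - 0.1 \cdot 2 = 0.8$, so the iterates shrink geometrically toward the origin.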

Proposition

Optimal solution of a convex function on a convex set

Let $f : \mathbb{R}^{n} \to \mathbb{R}$ be a differentiable convex function, and let the set $C \subset \mathbb{R}^{n}$ be a convex set. Then the following two statements are equivalent.

  • $\mathbf{x}^{\ast} \in C$ is a minimizer of $f$ on $C$, i.e., an optimal solution of the minimization problem.
  • For all $\mathbf{x} \in C$, $$ \nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \ge 0 $$
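The point of the second condition is that the gradient need not vanish when the minimizer sits on the boundary of $C$. The sketch below (an assumed setup, not from the text) takes $f(x,y) = x^{2} + y^{2}$ on the convex set $C = \{ (x,y) : x \ge 1, y \ge 1 \}$, whose minimizer is $\mathbf{x}^{\ast} = (1,1)$ with nonzero gradient $(2,2)$, and checks the variational inequality on random points of $C$.

```python
import random

# Assumed example: f(x, y) = x^2 + y^2 on C = {(x, y) : x >= 1, y >= 1}.
# The minimizer on C is x* = (1, 1), where grad f = (2, 2) is NOT zero,
# yet grad f(x*) . (x - x*) >= 0 holds for every x in C.
x_star = (1.0, 1.0)
grad_at_star = (2.0, 2.0)  # grad f(x, y) = (2x, 2y) evaluated at (1, 1)

random.seed(1)
for _ in range(1000):
    x = (1.0 + random.uniform(0, 5), 1.0 + random.uniform(0, 5))  # x in C
    inner = sum(g * (a - b) for g, a, b in zip(grad_at_star, x, x_star))
    assert inner >= 0
print("variational inequality held at x* for all sampled x in C")
```

Geometrically, the gradient at $(1,1)$ points into the quadrant that $C$ occupies, so moving toward any feasible point cannot decrease $f$ to first order.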

Proof 1

$(\implies)$

$$ \mathbf{x}(t) := \mathbf{x}^{\ast} + t \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \in C , \qquad t \in [0,1] $$ Since $C$ is assumed to be a convex set, every point $\mathbf{x}(t)$ on the line segment between an arbitrary $\mathbf{x} \in C$ and $\mathbf{x}^{\ast}$ can be represented as above and lies in $C$. On the other hand,

  • Because $\mathbf{x}^{\ast} = \argmin_{\mathbf{x} \in C} f$, we have $f \left( \mathbf{x}(t) \right) - f \left( \mathbf{x}^{\ast} \right) \ge 0$ for all $\mathbf{x} \in C$, and
  • the derivative of $f \left( \mathbf{x}(t) \right)$ with respect to $t$, evaluated at $t = 0$, is $\nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right)$ by the chain rule,

$$ \begin{align*} & \nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \\ =& \left[ {{ d } \over { d t }} f \left( \mathbf{x} (t) \right) \right]_{t = 0} \\ =& \lim_{t \to 0^{+}} {{ f \left( \mathbf{x} (t) \right) - f \left( \mathbf{x}^{\ast} \right) } \over { t }} \\ \ge & 0 \end{align*} $$ so we obtain $\nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \ge 0$, since each difference quotient is nonnegative.
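The difference quotient in this step can be watched converge numerically. The sketch below (an assumed example, not from the text) uses $f = \| \cdot \|^{2}$ with $\mathbf{x}^{\ast} = (1,1)$ minimizing $f$ over $C = \{ \mathbf{x} : \mathbf{x} \ge (1,1) \}$ and a feasible point $\mathbf{x} = (3,2)$:

```python
# Difference-quotient view of the step above (assumed example):
# f(v) = ||v||^2, x* = (1, 1), x = (3, 2) in C = {v : v >= (1, 1)}.
def f(v):
    return v[0]**2 + v[1]**2

x_star, x = (1.0, 1.0), (3.0, 2.0)

def quotient(t):
    xt = tuple(a + t * (b - a) for a, b in zip(x_star, x))  # x(t)
    return (f(xt) - f(x_star)) / t

for t in (0.1, 0.01, 0.001):
    # Decreases toward grad f(x*) . (x - x*) = 2*2 + 2*1 = 6 >= 0
    print(t, quotient(t))
```

Here the quotient works out to exactly $6 + 5t$, so the limit as $t \to 0^{+}$ is the directional derivative $6 \ge 0$, matching the displayed chain of (in)equalities.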


$(\impliedby)$

By the convexity of $f$, for any $\mathbf{y}, \mathbf{z} \in \mathbb{R}^{n}$ and $t \in (0,1]$, $$ f \left( (1-t) \mathbf{z} + t \mathbf{y} \right) \le (1-t) f \left( \mathbf{z} \right) + t f \left( \mathbf{y} \right) $$ Rearranging and dividing both sides by $t > 0$ gives $$ {{ f \left( \mathbf{z} + t \left( \mathbf{y} - \mathbf{z} \right) \right) - f \left( \mathbf{z} \right) } \over { t }} \le f \left( \mathbf{y} \right) - f \left( \mathbf{z} \right) $$ Letting $t \to 0^{+}$, the left-hand side converges to $\nabla f \left( \mathbf{z} \right) \left( \mathbf{y} - \mathbf{z} \right)$, so we obtain $$ f \left( \mathbf{z} \right) + \nabla f \left( \mathbf{z} \right) \left( \mathbf{y} - \mathbf{z} \right) \le f \left( \mathbf{y} \right) $$ Setting $\mathbf{y} = \mathbf{x}$ and $\mathbf{z} = \mathbf{x}^{\ast}$ and using the assumption $\nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \ge 0$, we obtain the following. $$ \begin{align*} & f \left( \mathbf{x}^{\ast} \right) + 0 \\ \le & f \left( \mathbf{x}^{\ast} \right) + \nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \\ \le & f \left( \mathbf{x} \right) \end{align*} $$ That is, $f \left( \mathbf{x}^{\ast} \right) \le f \left( \mathbf{x} \right)$ for all $\mathbf{x} \in C$, hence $\mathbf{x}^{\ast}$ is a minimizer of $f$ on $C$.
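The first-order inequality derived in this direction, $f(\mathbf{z}) + \nabla f(\mathbf{z}) (\mathbf{y} - \mathbf{z}) \le f(\mathbf{y})$, says a convex function lies above its tangent planes. A numerical sanity check for the assumed example $f(x,y) = x^{2} + y^{2}$ (not part of the original proof):

```python
import random

# First-order characterization: f(z) + grad f(z) . (y - z) <= f(y),
# checked for the sample convex function f(x, y) = x^2 + y^2.
def f(v):
    return v[0]**2 + v[1]**2

def grad(v):
    return (2 * v[0], 2 * v[1])

random.seed(2)
for _ in range(1000):
    z = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    tangent = f(z) + sum(g * (a - b) for g, a, b in zip(grad(z), y, z))
    assert tangent <= f(y) + 1e-12  # tolerance for rounding
print("tangent-plane inequality held on all samples")
```

For this particular $f$ the gap is $f(\mathbf{y}) - \text{tangent} = \| \mathbf{y} - \mathbf{z} \|^{2} \ge 0$, so the inequality is exact algebra, not just a sampled observation.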


  1. Matousek. (2007). Understanding and Using Linear Programming: p186. ↩︎