logo

Definition of a General Convex Function 📂Functions

Definition of a General Convex Function

Definitions

A function $f : V \to \mathbb{R}$ defined in a vector space $V$ is called a convex function if it satisfies the following for all $\mathbf{x}, \mathbf{y} \in V$ and all $t \in [0,1]$. $$ f \left( (1-t) \mathbf{x} + t \mathbf{y} \right) \le (1-t) f \left( \mathbf{x} \right) + t f \left( \mathbf{y} \right) $$

Explanation

In fact, once you study mathematics beyond the undergraduate level, you stop using terms like concave or convex, except the term convex is used almost exclusively. The convex function introduced above is a generalization of a convex function defined in Euclidean space $X = \mathbb{R}$.

Example

The set of roots forming a parabola $f(x,y) := x^{2} + y^{2}$ is a convex function, and the point where its value is minimized is $(x,y) = (0,0)$.

Theorem

Optimal Solution of a Convex Function in a Convex Set

If $f : \mathbb{R}^{n} \to \mathbb{R}$ is a differentiable convex function, and the convex set $C \subset \mathbb{R}^{n}$ is a convex set, then the following two are equivalent:

  • $\mathbf{x}^{\ast} \in C$ is the optimal solution that minimizes $f$ in $C$.
  • For all $\mathbf{x} \in C$ $$ \nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \ge 0 $$

Proof 1

$(\implies)$

$$ \mathbf{x}(t) := \mathbf{x}^{\ast} + t \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \in C \qquad , t \in [0,1] $$ Since $C$ is assumed to be a convex set, any convex combination $\mathbf{x}(t)$ of $\mathbf{x} \in C$ and $\mathbf{x}^{\ast}$ can be represented as above. Meanwhile,

  • Since $\mathbf{x}^{\ast} = \argmin_{\mathbf{x} \in C} f$, for all $\mathbf{x} \in C$ $f \left( \mathbf{x}(t) \right) - f \left( \mathbf{x}^{\ast} \right) \ge 0$, and
  • The partial derivative at $t = 0$ of $f \left( \mathbf{x}(t) \right)$ according to the chain rule is $\nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right)$,

$$ \begin{align*} & \nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \\ =& \left[ {{ \partial } \over { \partial t }} f \left( \mathbf{x} (t) \right) \right]_{t = 0} \\ =& \lim_{t \to 0} {{ f \left( \mathbf{x} (t) \right) - f \left( \mathbf{x}^{\ast} \right) } \over { t }} \\ \ge & 0 \end{align*} $$ which results in $\nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \ge 0$.


$(\impliedby)$

$$ f \left( (1-t) \mathbf{z} + t \mathbf{y} \right) \le (1-t) f \left( \mathbf{z} \right) + t f \left( \mathbf{y} \right) $$ Differentiating both sides with respect to $t$ yields $$ \left( - \mathbf{z} + \mathbf{y} \right) \nabla f \left( (1-t) \mathbf{z} + t \mathbf{y} \right) \le - f \left( \mathbf{z} \right) + f \left( \mathbf{y} \right) $$ thus, when $t = 0$, the following is obtained: $$ f \left( \mathbf{z} \right) + \nabla f \left( \mathbf{z} \right) \left( \mathbf{y} - \mathbf{z} \right) \le f \left( \mathbf{y} \right) $$ From the assumption, since $\nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \ge 0$, the following is obtained for $\mathbf{y} = \mathbf{x}$ and $\mathbf{z} = \mathbf{x}^{\ast}$: $$ \begin{align*} & f \left( \mathbf{x}^{\ast} \right) + 0 \\ \le & f \left( \mathbf{x}^{\ast} \right) + \nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \\ \le & f \left( \mathbf{x} \right) \end{align*} $$ This means $f \left( \mathbf{x}^{\ast} \right) \le f \left( \mathbf{x} \right)$, which holds for all $\mathbf{x} \in C$, so $\mathbf{x}^{\ast}$ becomes the minimal solution of $f$ in $C$.


  1. Matousek. (2007). Understanding and Using Linear Programming: p186. ↩︎