Definition of a General Convex Function
Definition
A function $f : V \to \mathbb{R}$ defined on a vector space $V$ is called a convex function if it satisfies the following for all $\mathbf{x}, \mathbf{y} \in V$ and all $t \in [0,1]$. $$ f \left( (1-t) \mathbf{x} + t \mathbf{y} \right) \le (1-t) f \left( \mathbf{x} \right) + t f \left( \mathbf{y} \right) $$
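As a quick numerical illustration (not from the source; the choice $f(\mathbf{x}) = \left\| \mathbf{x} \right\|^{2}$ is just a sample convex function), the following sketch checks the defining inequality at randomly sampled points and weights:

```python
import numpy as np

# Minimal sketch: check the convexity inequality
#   f((1-t)x + ty) <= (1-t) f(x) + t f(y)
# for the sample convex function f(v) = ||v||^2 at random x, y, t.
rng = np.random.default_rng(0)

def f(v):
    return np.dot(v, v)  # ||v||^2

for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    t = rng.uniform()
    lhs = f((1 - t) * x + t * y)
    rhs = (1 - t) * f(x) + t * f(y)
    assert lhs <= rhs + 1e-12  # small tolerance for floating-point error

print("convexity inequality held on all sampled points")
```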
Explanation
In fact, once you study mathematics beyond the undergraduate level, the term concave largely drops out of use, and the term convex is used almost exclusively. The convex function introduced above is a generalization of a convex function defined on the Euclidean space $X = \mathbb{R}$.
Example
The function $f(x,y) := x^{2} + y^{2}$, whose graph is a paraboloid, is a convex function, and the point where its value is minimized is $(x,y) = (0,0)$.
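A minimal sketch of this example, assuming nothing beyond the formula $f(x,y) = x^{2} + y^{2}$: no sampled point produces a value below $f(0,0) = 0$.

```python
import numpy as np

# Sketch: f(x, y) = x^2 + y^2 is never smaller than its value at (0, 0).
rng = np.random.default_rng(1)

def f(x, y):
    return x**2 + y**2

samples = rng.uniform(-10, 10, size=(1000, 2))
assert all(f(0.0, 0.0) <= f(x, y) for x, y in samples)
print("no sampled point beat f(0, 0) =", f(0.0, 0.0))
```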
Theorem
Optimal Solution of a Convex Function in a Convex Set
If $f : \mathbb{R}^{n} \to \mathbb{R}$ is a differentiable convex function and $C \subset \mathbb{R}^{n}$ is a convex set, then the following two statements are equivalent:
- $\mathbf{x}^{\ast} \in C$ is the optimal solution that minimizes $f$ in $C$.
- For all $\mathbf{x} \in C$ $$ \nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \ge 0 $$
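Before the proof, here is a numerical sketch of the second statement. The choices $f(\mathbf{x}) = \left\| \mathbf{x} - \mathbf{a} \right\|^{2}$ and the box $C = [0,1]^{2}$ are illustrative assumptions, not from the text; the minimizer of this $f$ over the box is the coordinatewise clipping of $\mathbf{a}$ onto it, and the variational inequality should then hold at that point for every $\mathbf{x} \in C$.

```python
import numpy as np

# Sketch: for f(x) = ||x - a||^2 on the box C = [0, 1]^2, the minimizer x* is
# the clipping of a onto the box, and grad f(x*) . (x - x*) >= 0 for all x in C.
rng = np.random.default_rng(2)

a = np.array([2.0, -0.5])           # illustrative point lying outside the box
x_star = np.clip(a, 0.0, 1.0)       # minimizer of ||x - a||^2 over [0, 1]^2
grad = 2 * (x_star - a)             # gradient of f at x*

for _ in range(1000):
    x = rng.uniform(0.0, 1.0, size=2)        # a point of C
    assert grad @ (x - x_star) >= -1e-12     # variational inequality

print("variational inequality held at x* =", x_star)
```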
Proof 1
$(\implies)$
$$ \mathbf{x}(t) := \mathbf{x}^{\ast} + t \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \in C , \qquad t \in [0,1] $$ Since $C$ is assumed to be a convex set, every convex combination $\mathbf{x}(t)$ of $\mathbf{x} \in C$ and $\mathbf{x}^{\ast} \in C$ can be written in this form and remains in $C$. Meanwhile,
- Since $\mathbf{x}^{\ast} = \argmin_{\mathbf{z} \in C} f \left( \mathbf{z} \right)$, we have $f \left( \mathbf{x}(t) \right) - f \left( \mathbf{x}^{\ast} \right) \ge 0$ for all $\mathbf{x} \in C$ and $t \in [0,1]$, and
- By the chain rule, the derivative of $f \left( \mathbf{x}(t) \right)$ with respect to $t$ at $t = 0$ is $\nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right)$,
$$ \begin{align*} & \nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \\ =& \left[ {{ d } \over { d t }} f \left( \mathbf{x} (t) \right) \right]_{t = 0} \\ =& \lim_{t \to 0^{+}} {{ f \left( \mathbf{x} (t) \right) - f \left( \mathbf{x}^{\ast} \right) } \over { t }} \\ \ge & 0 \end{align*} $$ where the last inequality holds because the numerator is nonnegative and $t > 0$. This gives $\nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \ge 0$.
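The one-sided difference quotient used above can also be watched numerically. In the sketch below, $f(\mathbf{x}) = \left\| \mathbf{x} - \mathbf{a} \right\|^{2}$ and $C = [0,1]^{2}$ are the same illustrative choices as before; the quotient stays nonnegative and approaches $\nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right)$ as $t \to 0^{+}$.

```python
import numpy as np

# Sketch: the quotient (f(x* + t(x - x*)) - f(x*)) / t for small t > 0,
# with f(x) = ||x - a||^2 on the box [0, 1]^2 (illustrative choices).
a = np.array([2.0, -0.5])
x_star = np.clip(a, 0.0, 1.0)            # minimizer of f over the box
f = lambda v: np.dot(v - a, v - a)
grad = 2 * (x_star - a)                  # gradient of f at x*

x = np.array([0.3, 0.7])                 # an arbitrary point of C
for t in [1e-1, 1e-3, 1e-5]:
    q = (f(x_star + t * (x - x_star)) - f(x_star)) / t
    print(f"t = {t:.0e}: quotient = {q:.6f} (limit {grad @ (x - x_star):.6f})")
```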
$(\impliedby)$
For any $\mathbf{y}, \mathbf{z}$ and $t \in (0,1]$, convexity gives $$ f \left( \mathbf{z} + t \left( \mathbf{y} - \mathbf{z} \right) \right) = f \left( (1-t) \mathbf{z} + t \mathbf{y} \right) \le (1-t) f \left( \mathbf{z} \right) + t f \left( \mathbf{y} \right) $$ Subtracting $f \left( \mathbf{z} \right)$ from both sides and dividing by $t > 0$ yields $$ {{ f \left( \mathbf{z} + t \left( \mathbf{y} - \mathbf{z} \right) \right) - f \left( \mathbf{z} \right) } \over { t }} \le f \left( \mathbf{y} \right) - f \left( \mathbf{z} \right) $$ Letting $t \to 0^{+}$, the left-hand side converges to $\nabla f \left( \mathbf{z} \right) \left( \mathbf{y} - \mathbf{z} \right)$, so the following is obtained (a numerical sketch of this inequality follows the proof): $$ f \left( \mathbf{z} \right) + \nabla f \left( \mathbf{z} \right) \left( \mathbf{y} - \mathbf{z} \right) \le f \left( \mathbf{y} \right) $$ From the assumption, since $\nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \ge 0$, the following is obtained for $\mathbf{y} = \mathbf{x}$ and $\mathbf{z} = \mathbf{x}^{\ast}$: $$ \begin{align*} & f \left( \mathbf{x}^{\ast} \right) + 0 \\ \le & f \left( \mathbf{x}^{\ast} \right) + \nabla f \left( \mathbf{x}^{\ast} \right) \left( \mathbf{x} - \mathbf{x}^{\ast} \right) \\ \le & f \left( \mathbf{x} \right) \end{align*} $$ This means $f \left( \mathbf{x}^{\ast} \right) \le f \left( \mathbf{x} \right)$, which holds for all $\mathbf{x} \in C$, so $\mathbf{x}^{\ast}$ is the minimal solution of $f$ in $C$.
■
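As mentioned in the proof, the first-order inequality $f \left( \mathbf{z} \right) + \nabla f \left( \mathbf{z} \right) \left( \mathbf{y} - \mathbf{z} \right) \le f \left( \mathbf{y} \right)$ can be checked numerically. The sketch below uses the sample convex function $f(\mathbf{v}) = \left\| \mathbf{v} \right\|^{2}$, an illustrative choice.

```python
import numpy as np

# Sketch: first-order characterization of convexity,
#   f(z) + grad f(z) . (y - z) <= f(y),
# checked at random z, y for f(v) = ||v||^2.
rng = np.random.default_rng(3)

f = lambda v: np.dot(v, v)
grad_f = lambda v: 2 * v

for _ in range(1000):
    z, y = rng.normal(size=3), rng.normal(size=3)
    assert f(z) + grad_f(z) @ (y - z) <= f(y) + 1e-12

print("first-order inequality held on all sampled pairs")
```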
Matousek. (2007). Understanding and Using Linear Programming: p186.