
Optimization Theory: Method of Lagrange Multipliers

Description

In nonlinear optimization problems, where the objective function is nonlinear, the Lagrange multiplier method incorporates the constraints into the objective function by multiplying them by quantities called Lagrange multipliers.

$$ \begin{matrix} \text{Maximize} & f(x) \\ \text{subject to} & g(x) = 0 \end{matrix} $$

For instance, suppose an optimization problem for differentiable scalar functions $f, g : \mathbb{R}^{p} \to \mathbb{R}$ is given as above. The goal is to find $x^{\ast} \in F$ that maximizes $f \left( x^{\ast} \right)$ over the feasible set $F = \left\{ x \in \mathbb{R}^{p} : g(x) = 0 \right\}$. The Lagrange multiplier method is one way to solve it: multiply $g(x)$ by the Lagrange multiplier $-\lambda$, add the result to $f$, and take the resulting Lagrangian function
$$ L \left( x, \lambda \right) := f(x) - \lambda g(x) $$
as the new objective function.
$$ \begin{matrix} \text{Maximize} & L \left( x, \lambda \right) \end{matrix} $$

This new optimization problem no longer has constraints, and this is not magic but mathematics. In the course of optimizing $L$, any point at which the rate of change of $L$ with respect to $\lambda$ is $0$ satisfies
$$ {{ \partial L } \over { \partial \lambda }} = 0 \implies -g(x) = 0 $$
that is, the original constraint is recovered automatically, which lets us apply optimization techniques that assume no constraints or only weaker ones.
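To see the mechanics on a concrete case, here is a minimal sketch in Python with sympy, using a made-up problem not taken from the text: maximize $f(x, y) = xy$ subject to $x + y - 1 = 0$. Solving the stationarity conditions of the Lagrangian recovers both the constraint and the maximizer.

```python
# Minimal sketch (illustrative problem, not from the cited reference):
# maximize f(x, y) = x*y subject to g(x, y) = x + y - 1 = 0.
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)

f = x * y                 # objective function
g = x + y - 1             # constraint written as g(x, y) = 0
L = f - lam * g           # Lagrangian L(x, y, lambda) = f - lambda * g

# Stationarity: every partial derivative of L must vanish.
# Note that dL/dlambda = 0 is exactly the equation -g(x, y) = 0,
# i.e. the original constraint reappears inside the system.
equations = [sp.diff(L, v) for v in (x, y, lam)]
solutions = sp.solve(equations, (x, y, lam), dict=True)
print(solutions)          # x = y = 1/2, lambda = 1/2, so f = 1/4 on the constraint
```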

Examples

The distribution that maximizes entropy is the uniform distribution [1]

Let us verify through the Lagrange multiplier method that Shannon entropy is maximized by the uniform distribution. Suppose the probability distribution of a random variable $X$ is given in terms of $\left( p_{1} , \cdots , p_{n} \right)$ as follows.
$$ P\left( X = k \right) = p_{k} $$

Then, the optimization problem of finding $\left( p_{1} , \cdots , p_{n} \right)$ that maximizes the entropy can be written as follows.

$$ \begin{matrix} \text{Maximize} & - \sum_{k=1}^{n} p_{k} \log p_{k} \\ \text{subject to} & \sum_{k=1}^{n} p_{k} - 1 = 0 \\ & p_{k} \ge 0, \quad k = 1, \cdots, n \end{matrix} $$

The constraint $\displaystyle \sum_{k=1}^{n} p_{k} = 1$ holds because the probabilities must sum to $1$; it is this constraint that prevents the original objective function $f \left( p_{1} , \cdots , p_{n} \right) = - \sum_{k=1}^{n} p_{k} \log p_{k}$ from growing without bound. The Lagrangian function for this problem is
$$ L \left( p_{1} , \cdots , p_{n} , \lambda \right) = - \sum_{k=1}^{n} p_{k} \log p_{k} - \lambda \left( \sum_{k=1}^{n} p_{k} - 1 \right) $$
Setting the partial derivative of $L$ with respect to $p_{k}$ to zero for each $k = 1, \cdots , n$ gives
$$ {{ \partial L } \over { \partial p_{k} }} = - \log p_{k} - 1 - \lambda = 0 \implies \lambda = - \log p_{k} - 1 $$
so that
$$ \begin{align*} & \lambda \\ =& - \log p_{1} - 1 \\ & \vdots \\ =& - \log p_{n} - 1 \end{align*} $$
and therefore
$$ p_{1} = \cdots = p_{n} $$
Meanwhile, the partial derivative with respect to $\lambda$ gives
$$ {{ \partial L } \over { \partial \lambda }} = 0 \implies \sum_{k=1}^{n} p_{k} = 1 $$
Hence, the probability distribution that maximizes the entropy is exactly
$$ p_{1} = \cdots = p_{n} = {{ 1 } \over { n }} $$
In other words, the uniform distribution maximizes the entropy.
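The same conclusion can be checked numerically. Below is a small sketch of mine, not from the cited reference, that feeds the negative entropy and the constraint $\sum_{k} p_{k} = 1$ to scipy's constrained minimizer with $n = 5$; the optimizer should return approximately $p_{k} = 1/n = 0.2$ for every $k$.

```python
# Numerical sanity check (my own sketch, not from the reference):
# maximize Shannon entropy of (p_1, ..., p_n) under sum(p) = 1 and p >= 0.
import numpy as np
from scipy.optimize import minimize

n = 5

def neg_entropy(p):
    # Minimizing -H(p) = sum_k p_k log p_k is the same as maximizing entropy.
    p = np.clip(p, 1e-12, None)            # guard against log(0)
    return np.sum(p * np.log(p))

constraints = [{"type": "eq", "fun": lambda p: np.sum(p) - 1}]
bounds = [(0.0, 1.0)] * n                  # enforces p_k >= 0 (and <= 1)
p0 = np.random.dirichlet(np.ones(n))       # random feasible starting point

result = minimize(neg_entropy, p0, bounds=bounds, constraints=constraints)
print(np.round(result.x, 4))               # approximately [0.2 0.2 0.2 0.2 0.2]
```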

Support Vector Machine

In machine learning, the support vector machine emerges as a technique for finding separating hyperplanes via the Lagrange multiplier method. Refer to the corresponding article.


[1] Luenberger. (2021). Linear and Nonlinear Programming (5th Edition): pp. 370–371.