Optimization Theory: Method of Lagrange Multipliers

Description

In nonlinear optimization problems, that is, problems with a nonlinear objective function, the method of Lagrange multipliers handles constraints by multiplying each of them by a quantity called a Lagrange multiplier and absorbing them into the objective function.

$$ \begin{matrix} \text{Maximize} & f(x) \\ \text{subject to} & g(x) = 0 \end{matrix} $$

For instance, suppose an optimization problem for differentiable scalar functions $f, g : \mathbb{R}^{n} \to \mathbb{R}$ is given as above. Our goal is to find $x^{\ast} \in F$ that maximizes $f \left( x^{\ast} \right)$ over the feasible set $F = \left\{ x \in \mathbb{R}^{n} : g(x) = 0 \right\}$. The method of Lagrange multipliers is one way to solve this: multiply $g(x)$ by a Lagrange multiplier $-\lambda$, add it to $f$, and take the resulting Lagrangian function $$ L \left( x, \lambda \right) := f(x) - \lambda g(x) $$ as the new objective, which turns the problem into the following optimization problem. $$ \begin{matrix} \text{Maximize} & L \left( x, \lambda \right) \end{matrix} $$

This newly obtained optimization problem no longer has constraints, and this is not magic but math: in the process of optimizing $L$, any point where the rate of change of $L$ with respect to $\lambda$ vanishes, $$ {{ \partial L } \over { \partial \lambda }} = 0 \implies -g(x) = 0 $$ automatically satisfies the original constraint, which allows us to use optimization techniques that are constraint-free or have weaker constraints.
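As a quick sanity check, the sketch below applies the method to a small made-up problem that is not from the text, maximizing $f(x, y) = xy$ subject to $g(x, y) = x + y - 10 = 0$, and uses SymPy to solve the stationarity conditions of $L$; setting every partial derivative of $L$ to zero recovers both the candidate maximizer and the original constraint.

```python
# A minimal sketch of the method of Lagrange multipliers with SymPy.
# The problem (maximize x*y subject to x + y - 10 = 0) is a made-up
# illustration, not an example from the text above.
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)

f = x * y          # objective f(x, y)
g = x + y - 10     # constraint g(x, y) = 0
L = f - lam * g    # Lagrangian L = f - lambda * g

# Set all partial derivatives of L to zero; note that dL/dlambda = 0
# is exactly the original constraint -g = 0.
stationary = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(stationary)              # [{lambda: 5, x: 5, y: 5}]
print(f.subs(stationary[0]))   # 25, the constrained maximum
```

The same pattern extends to several equality constraints by attaching one multiplier to each constraint.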

Examples

The distribution that maximizes entropy is the uniform distribution 1

Let’s verify through the method of Lagrange multipliers that Shannon entropy is maximized by the uniform distribution. Suppose the probability distribution of a random variable $X$ is given by $\left( p_{1} , \cdots , p_{n} \right)$ as follows. $$ P\left( X = k \right) = p_{k} $$

Then, the optimization problem to find $\left( p_{1} , \cdots , p_{n} \right)$ that maximizes entropy can be written as follows.

$$ \begin{matrix} \text{Maximize} & - \sum_{k=1}^{n} p_{k} \log p_{k} \\ \text{subject to} & \sum_{k=1}^{n} p_{k} - 1 = 0 \\ & p_{k} \ge 0 , \quad k = 1, \cdots, n \end{matrix} $$

The constraint $\displaystyle \sum_{k=1}^{n} p_{k} = 1$ holds because the sum of all probabilities must be $1$, and because of this constraint the original objective function $f \left( p_{1} , \cdots , p_{n} \right) = - \sum_{k=1}^{n} p_{k} \log p_{k}$ cannot grow indefinitely. The Lagrangian function for this problem is as follows. $$ L \left( p_{1} , \cdots , p_{n} , \lambda \right) = - \sum_{k=1}^{n} p_{k} \log p_{k} - \lambda \left( \sum_{k=1}^{n} p_{k} - 1 \right) $$ Partially differentiating $L$ with respect to each $p_{k}$, $k = 1, \cdots , n$, and setting the result to $0$ gives $$ {{ \partial L } \over { \partial p_{k} }} = - \log p_{k} - 1 - \lambda = 0 \implies \lambda = - \log p_{k} - 1 $$ which implies $$ \begin{align*} & \lambda \\ =& - \log p_{1} - 1 \\ & \vdots \\ =& - \log p_{n} - 1 \end{align*} $$ and therefore $$ p_{1} = \cdots = p_{n} $$ Meanwhile, the partial derivative with respect to $\lambda$ gives $$ {{ \partial L } \over { \partial \lambda }} = 0 \implies \sum_{k=1}^{n} p_{k} = 1 $$ hence the probability distribution that maximizes entropy is precisely $$ p_{1} = \cdots = p_{n} = {{ 1 } \over { n }} $$ In other words, the uniform distribution maximizes entropy.
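The derivation above can also be checked symbolically for a concrete case. The sketch below assumes natural logarithms and fixes $n = 4$ as an arbitrary choice for illustration; it mirrors the argument by solving each $\partial L / \partial p_{k} = 0$ for $\lambda$, equating the results, and restoring the constraint, which returns the uniform distribution $p_{k} = 1/4$.

```python
# A sketch checking the entropy example for a concrete case with SymPy.
# Assumptions: natural logarithm, and n = 4 chosen arbitrarily.
import sympy as sp

n = 4
p = sp.symbols(f"p1:{n + 1}", positive=True)   # p1, ..., p4 with p_k > 0
lam = sp.symbols("lambda", real=True)

entropy = -sum(pk * sp.log(pk) for pk in p)    # objective: Shannon entropy
L = entropy - lam * (sum(p) - 1)               # Lagrangian

# Each dL/dp_k = 0 gives lambda = -log(p_k) - 1, as in the derivation
# above, so every p_k must coincide.
lam_exprs = [sp.solve(sp.diff(L, pk), lam)[0] for pk in p]
equalities = [sp.Eq(lam_exprs[0], e) for e in lam_exprs[1:]]

# dL/dlambda = 0 restores the constraint sum p_k = 1.
constraint = sp.Eq(sum(p), 1)

sol = sp.solve(equalities + [constraint], list(p), dict=True)
print(sol)   # expected: [{p1: 1/4, p2: 1/4, p3: 1/4, p4: 1/4}]
```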

Support Vector Machine

In machine learning, the Support Vector Machine emerges as a technique that finds a separating hyperplane through the method of Lagrange multipliers. Refer to the related post.


  1. Luenberger. (2021). Linear and Nonlinear Programming (5th Edition): p370~371. ↩︎