
Optimization Theory: Method of Lagrange Multipliers

Description

In nonlinear optimization problems, where the objective function is nonlinear, the Lagrange multiplier method incorporates the constraints into the objective function by multiplying them by quantities called Lagrange multipliers.

$$ \begin{matrix} \text{Maximize} & f(x) \\ \text{subject to} & g(x) = 0 \end{matrix} $$

For instance, suppose an optimization problem for differentiable scalar functions $f, g : \mathbb{R}^{p} \to \mathbb{R}$ is given as above. The goal is to find $x^{\ast} \in F$ that maximizes $f \left( x^{\ast} \right)$ over the feasible set $F = \left\{ x \in \mathbb{R}^{p} : g(x) = 0 \right\}$. The Lagrange multiplier method is one way to solve it: multiply $g(x)$ by the Lagrange multiplier $-\lambda$, add the result to $f$, and take the resulting Lagrangian function
$$ L \left( x, \lambda \right) := f(x) - \lambda g(x) $$
as the new objective function.
$$ \begin{matrix} \text{Maximize} & L \left( x, \lambda \right) \end{matrix} $$

This new optimization problem no longer has constraints, and this is not magic but mathematics. In the course of optimizing $L$, any point at which the rate of change of $L$ with respect to $\lambda$ is $0$ satisfies
$$ {{ \partial L } \over { \partial \lambda }} = 0 \implies -g(x) = 0 $$
that is, the original constraint is recovered automatically, which lets us apply optimization techniques that assume no constraints or only weaker ones.
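To see the mechanics on a concrete case, here is a minimal sketch in Python with sympy, using a made-up problem not taken from the text: maximize $f(x, y) = xy$ subject to $x + y - 1 = 0$. Solving the stationarity conditions of the Lagrangian recovers both the constraint and the maximizer.

```python
# Minimal sketch (illustrative problem, not from the cited reference):
# maximize f(x, y) = x*y subject to g(x, y) = x + y - 1 = 0.
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)

f = x * y                 # objective function
g = x + y - 1             # constraint written as g(x, y) = 0
L = f - lam * g           # Lagrangian L(x, y, lambda) = f - lambda * g

# Stationarity: every partial derivative of L must vanish.
# Note that dL/dlambda = 0 is exactly the equation -g(x, y) = 0,
# i.e. the original constraint reappears inside the system.
equations = [sp.diff(L, v) for v in (x, y, lam)]
solutions = sp.solve(equations, (x, y, lam), dict=True)
print(solutions)          # x = y = 1/2, lambda = 1/2, so f = 1/4 on the constraint
```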

Examples

The distribution that maximizes entropy is the uniform distribution [1]

Let us verify through the Lagrange multiplier method that Shannon entropy is maximized by the uniform distribution. Suppose the probability distribution of a random variable $X$ is given in terms of $\left( p_{1} , \cdots , p_{n} \right)$ as follows.
$$ P\left( X = k \right) = p_{k} $$

Then, the optimization problem of finding $\left( p_{1} , \cdots , p_{n} \right)$ that maximizes the entropy can be written as follows.

$$ \begin{matrix} \text{Maximize} & - \sum_{k=1}^{n} p_{k} \log p_{k} \\ \text{subject to} & \sum_{k=1}^{n} p_{k} - 1 = 0 \\ & p_{k} \ge 0, \quad k = 1, \cdots, n \end{matrix} $$

The constraint $\displaystyle \sum_{k=1}^{n} p_{k} = 1$ holds because the probabilities must sum to $1$; it is this constraint that prevents the original objective function $f \left( p_{1} , \cdots , p_{n} \right) = - \sum_{k=1}^{n} p_{k} \log p_{k}$ from growing without bound. The Lagrangian function for this problem is
$$ L \left( p_{1} , \cdots , p_{n} , \lambda \right) = - \sum_{k=1}^{n} p_{k} \log p_{k} - \lambda \left( \sum_{k=1}^{n} p_{k} - 1 \right) $$
Setting the partial derivative of $L$ with respect to $p_{k}$ to zero for each $k = 1, \cdots , n$ gives
$$ {{ \partial L } \over { \partial p_{k} }} = - \log p_{k} - 1 - \lambda = 0 \implies \lambda = - \log p_{k} - 1 $$
so that
$$ \begin{align*} & \lambda \\ =& - \log p_{1} - 1 \\ & \vdots \\ =& - \log p_{n} - 1 \end{align*} $$
and therefore
$$ p_{1} = \cdots = p_{n} $$
Meanwhile, the partial derivative with respect to $\lambda$ gives
$$ {{ \partial L } \over { \partial \lambda }} = 0 \implies \sum_{k=1}^{n} p_{k} = 1 $$
Hence, the probability distribution that maximizes the entropy is exactly
$$ p_{1} = \cdots = p_{n} = {{ 1 } \over { n }} $$
In other words, the uniform distribution maximizes the entropy.
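The same conclusion can be checked numerically. Below is a small sketch of mine, not from the cited reference, that feeds the negative entropy and the constraint $\sum_{k} p_{k} = 1$ to scipy's constrained minimizer with $n = 5$; the optimizer should return approximately $p_{k} = 1/n = 0.2$ for every $k$.

```python
# Numerical sanity check (my own sketch, not from the reference):
# maximize Shannon entropy of (p_1, ..., p_n) under sum(p) = 1 and p >= 0.
import numpy as np
from scipy.optimize import minimize

n = 5

def neg_entropy(p):
    # Minimizing -H(p) = sum_k p_k log p_k is the same as maximizing entropy.
    p = np.clip(p, 1e-12, None)            # guard against log(0)
    return np.sum(p * np.log(p))

constraints = [{"type": "eq", "fun": lambda p: np.sum(p) - 1}]
bounds = [(0.0, 1.0)] * n                  # enforces p_k >= 0 (and <= 1)
p0 = np.random.dirichlet(np.ones(n))       # random feasible starting point

result = minimize(neg_entropy, p0, bounds=bounds, constraints=constraints)
print(np.round(result.x, 4))               # approximately [0.2 0.2 0.2 0.2 0.2]
```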

Support Vector Machine

In machine learning, the support vector machine emerges as a technique for finding separating hyperplanes via the Lagrange multiplier method. Refer to the corresponding article.


[1] Luenberger. (2021). Linear and Nonlinear Programming (5th Edition): pp. 370–371.