Proximal Alternating Linearized Minimization Algorithm (PALM) 📂Optimization

Overview

Jérôme Bolte, Shoham Sabach, and Marc Teboulle introduced an optimization technique called the Proximal Alternating Linearized Minimization (PALM) algorithm in their paper "Proximal alternating linearized minimization for nonconvex and nonsmooth problems."

Algorithm

The method for solving optimization problems of the form $(1)$ below is called the Proximal Alternating Linearized Minimization (PALM) algorithm.

This, in simple terms, means performing alternating optimization for two variables using the proximal gradient method.

Description¹ ²

According to the abstract of the paper, the PALM algorithm is introduced as a method to solve optimization problems that are nonconvex and nonsmooth. Consider the following optimization problem.

$$
\operatornamewithlimits{minimize}_{x, y} \Psi (x, y) = f(x) + g(y) + H(x, y) \tag{1}
$$

Here $x \in \mathbb{R}^{n}$, $y \in \mathbb{R}^{m}$, and $H : \mathbb{R}^{n} \times \mathbb{R}^{m} \to \mathbb{R}$ is a $C^{1}$ function. Also, $f$, $g$, and $H$ are not assumed to be convex.
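
To make the setup concrete, the sketch below builds a small, hypothetical instance of $(1)$ (not taken from the paper): a rank-one sparse factorization in which $f$ and $g$ are nonsmooth $\ell_{1}$ penalties and $H$ is a $C^{1}$ coupling term that is nonconvex in $(x, y)$ jointly. All names, dimensions, and weights are illustrative assumptions and are reused in the later sketches.

```python
import numpy as np

# Hypothetical instance of problem (1): rank-one sparse factorization.
# f and g are nonsmooth (l1 penalties); H is C^1 but nonconvex in (x, y) jointly.
rng = np.random.default_rng(0)
n, m = 50, 40
M = rng.standard_normal((n, m))   # data matrix to factor (illustrative)
lam, mu = 0.1, 0.1                # sparsity weights (illustrative)

def f(x):                         # nonsmooth term in x
    return lam * np.abs(x).sum()

def g(y):                         # nonsmooth term in y
    return mu * np.abs(y).sum()

def H(x, y):                      # smooth coupling term: H(x, y) = (1/2) * ||M - x y^T||_F^2
    return 0.5 * np.linalg.norm(M - np.outer(x, y)) ** 2

def Psi(x, y):                    # full objective of (1)
    return f(x) + g(y) + H(x, y)
```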

Linearized

The "Linearized" in the algorithm’s name means that $H$ is approximated linearly. Applying Taylor’s theorem separately in the two variables $x$ and $y$, we can approximate $H$ as follows.

$$
\begin{align*}
H(x, y) &\approx H(x^{(k)}, y) + (x - x^{(k)})^{T} \nabla_{x}H(x^{(k)},y) \\
&= H(x^{(k)}, y) + \langle x - x^{(k)}, \nabla_{x}H(x^{(k)},y) \rangle \tag{2}
\end{align*}
$$

$$
\begin{align*}
H(x, y) &\approx H(x, y^{(k)}) + (y - y^{(k)})^{T} \nabla_{y}H(x,y^{(k)}) \\
&= H(x, y^{(k)}) + \langle y - y^{(k)}, \nabla_{y}H(x,y^{(k)}) \rangle \tag{3}
\end{align*}
$$

Now let $\widehat{\Psi}$ denote $\Psi$ with $H$ replaced by the approximation $(2)$, and let $\Phi$ denote $\Psi$ with $H$ replaced by the approximation $(3)$.

$$
\begin{align*}
\widehat{\Psi}(x,y) &= f(x) + g(y) + H(x^{(k)}, y) + \langle x - x^{(k)}, \nabla_{x}H(x^{(k)},y) \rangle \\
\Phi (x,y) &= f(x) + g(y) + H(x, y^{(k)}) + \langle y - y^{(k)}, \nabla_{y}H(x,y^{(k)}) \rangle
\end{align*}
$$

As seen below, the result would be the same if we instead defined them as follows, since the dropped terms are constant in the variable being optimized; for now, proceed with the definitions above.

$$
\begin{align*}
\widehat{\Psi}(x,y) &= f(x) + \langle x - x^{(k)}, \nabla_{x}H(x^{(k)},y) \rangle \\
\Phi (x,y) &= g(y) + \langle y - y^{(k)}, \nabla_{y}H(x,y^{(k)}) \rangle
\end{align*}
$$
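
As a quick sanity check of the linearization $(2)$, the following sketch reuses the hypothetical $H$ above: with $y$ fixed, it compares $H(x, y)$ against the first-order model around $x^{(k)}$ for shrinking perturbations. The gap shrinks quadratically (here $H$ is quadratic in $x$, so the gap is exactly the second-order term).

```python
# Check the linearization (2) of the hypothetical H around x_k, with y held fixed.
def grad_H_x(x, y):               # ∇_x H(x, y) = -(M - x y^T) y
    return -(M - np.outer(x, y)) @ y

x_k = rng.standard_normal(n)
y_k = rng.standard_normal(m)
for eps in (1e-1, 1e-2, 1e-3):
    x = x_k + eps * rng.standard_normal(n)
    model = H(x_k, y_k) + (x - x_k) @ grad_H_x(x_k, y_k)   # right-hand side of (2)
    print(eps, abs(H(x, y_k) - model))                     # error decays like eps**2
```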

Alternating

Now, applying the linear approximations and optimizing the two variables of $\Psi$ alternately with the alternating algorithm, problem $(1)$ splits into the following two subproblems.

$$
\begin{cases}
x^{(k+1)} &= \argmin\limits_{x} \widehat{\Psi}(x, y^{(k)}) \\
&= \argmin\limits_{x} f(x) + g(y^{(k)}) + H(x^{(k)}, y^{(k)}) + \langle x - x^{(k)}, \nabla_{x}H(x^{(k)},y^{(k)}) \rangle \\
\\
y^{(k+1)} &= \argmin\limits_{y} \Phi (x^{(k+1)}, y) \\
&= \argmin\limits_{y} f(x^{(k+1)}) + g(y) + H(x^{(k+1)}, y^{(k)}) + \langle y - y^{(k)}, \nabla_{y}H(x^{(k+1)},y^{(k)}) \rangle
\end{cases}
$$

In the first equation, the second and third terms do not affect the optimization, so they can be removed. Similarly, in the second equation, removing the first and third terms leaves the problem unchanged. Thus, the algorithm can be rewritten as follows.

$$
\begin{cases}
x^{(k+1)} = \argmin\limits_{x} f(x) + \langle x - x^{(k)}, \nabla_{x}H(x^{(k)},y^{(k)}) \rangle \\
y^{(k+1)} = \argmin\limits_{y} g(y) + \langle y - y^{(k)}, \nabla_{y}H(x^{(k+1)},y^{(k)}) \rangle
\end{cases} \tag{4}
$$

Proximal

Now, if we add a proximal term (a quadratic penalty on the step) to $(4)$, it becomes the following.

$$
\begin{cases}
x^{(k+1)} = \argmin\limits_{x} \left\{ f(x) + \langle x - x^{(k)}, \nabla_{x}H(x^{(k)},y^{(k)}) \rangle + \dfrac{c}{2}\left\| x - x^{(k)} \right\|_{2}^{2} \right\} \\
y^{(k+1)} = \argmin\limits_{y} \left\{ g(y) + \langle y - y^{(k)}, \nabla_{y}H(x^{(k+1)},y^{(k)}) \rangle + \dfrac{d}{2}\left\| y - y^{(k)} \right\|_{2}^{2} \right\}
\end{cases} \tag{5}
$$

  • Explanation 1

    Let’s combine the second and third terms in the first equation as follows.

    $$
    \psi (x) = \langle x - x^{(k)}, \nabla_{x}H(x^{(k)},y^{(k)}) \rangle + \dfrac{c}{2}\left\| x - x^{(k)} \right\|_{2}^{2}
    $$

    If we find the point where $\psi$ is minimized,

    $$
    \begin{align*}
    \nabla_{x}\psi (x) = 0 &\implies \nabla_{x}H(x^{(k)}, y^{(k)}) + c(x - x^{(k)}) = 0 \\
    &\implies x = x^{(k)} - \dfrac{1}{c}\nabla_{x}H(x^{(k)}, y^{(k)})
    \end{align*}
    $$

    A point that stays close to this point while also keeping $f$ small is exactly what the proximal operator provides. Therefore, $x^{(k+1)}$ is updated as follows.

    $$
    x^{(k+1)} \in \operatornamewithlimits{prox}_{\frac{1}{c}f} \left( x^{(k)} - \dfrac{1}{c}\nabla_{x}H(x^{(k)}, y^{(k)}) \right)
    $$

    This is the same as the proximal gradient method.

  • Explanation 2

    The first subproblem of $(5)$ can be shown to coincide with $\operatornamewithlimits{prox}_{\frac{1}{c}f} \left( x^{(k)} - \dfrac{1}{c}\nabla_{x}H(x^{(k)}, y^{(k)}) \right)$ as follows (a numerical check of this identity is sketched just after this list).

    $$
    \begin{align*}
    & \operatornamewithlimits{prox}_{\frac{1}{c}f} \left( x^{(k)} - \dfrac{1}{c}\nabla_{x}H(x^{(k)}, y^{(k)}) \right) \\
    &= \argmin\limits_{x} \left\{ f(x) + \dfrac{c}{2} \left\| x - \left( x^{(k)} - \dfrac{1}{c}\nabla_{x}H(x^{(k)}, y^{(k)}) \right) \right\|_{2}^{2} \right\} \\
    &= \argmin\limits_{x} \left\{ f(x) + \dfrac{c}{2} \left\| (x - x^{(k)}) + \dfrac{1}{c}\nabla_{x}H(x^{(k)}, y^{(k)}) \right\|_{2}^{2} \right\} \\
    &= \argmin\limits_{x} \left\{ f(x) + \dfrac{c}{2} \left\| x - x^{(k)} \right\|_{2}^{2} + (x - x^{(k)})^{T}\nabla_{x}H(x^{(k)}, y^{(k)}) + \dfrac{1}{2c}\left\| \nabla_{x}H(x^{(k)}, y^{(k)}) \right\|_{2}^{2} \right\} \\
    &= \argmin\limits_{x} \left\{ f(x) + \dfrac{c}{2} \left\| x - x^{(k)} \right\|_{2}^{2} + \langle x - x^{(k)}, \nabla_{x}H(x^{(k)}, y^{(k)}) \rangle \right\}
    \end{align*}
    $$

    In the last step, the term $\dfrac{1}{2c}\left\| \nabla_{x}H(x^{(k)}, y^{(k)}) \right\|_{2}^{2}$ is constant in $x$, so dropping it does not change the minimizer.
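
The identity above can also be checked numerically. The sketch below uses the hypothetical choice $f(x) = \lambda \| x \|_{1}$ from earlier (whose prox is soft-thresholding) and compares the prox of the gradient step against a brute-force, coordinate-wise minimization of the first subproblem of $(5)$; the two agree up to the grid resolution.

```python
# Numerical check of Explanation 2 for the hypothetical f(x) = lam * ||x||_1.
def soft_threshold(v, t):         # prox of t * ||.||_1, applied elementwise
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

c = 50.0                          # any positive constant works for this identity
grad = grad_H_x(x_k, y_k)
prox_point = soft_threshold(x_k - grad / c, lam / c)

# The subproblem in (5) is separable in the coordinates of x, so brute-force each one.
v = x_k - grad / c
lo, hi = min(v.min(), 0.0) - 1.0, max(v.max(), 0.0) + 1.0
grid = np.linspace(lo, hi, 400001)
brute = np.empty(n)
for i in range(n):
    obj = lam * np.abs(grid) + (grid - x_k[i]) * grad[i] + 0.5 * c * (grid - x_k[i]) ** 2
    brute[i] = grid[np.argmin(obj)]

print(np.max(np.abs(prox_point - brute)))   # small; limited only by the grid spacing
```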

Following the explanations above, we finally obtain the following iterative algorithm.

$$
\begin{cases}
x^{(k+1)} \in \operatornamewithlimits{prox}_{\frac{1}{c}f} \left( x^{(k)} - \dfrac{1}{c}\nabla_{x}H(x^{(k)}, y^{(k)}) \right) \\
y^{(k+1)} \in \operatornamewithlimits{prox}_{\frac{1}{d}g} \left( y^{(k)} - \dfrac{1}{d}\nabla_{y}H(x^{(k+1)}, y^{(k)}) \right)
\end{cases}
$$
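
Putting the pieces together, here is a minimal sketch of the full iteration for the hypothetical sparse factorization instance above (a sketch under those assumptions, not the paper’s reference implementation). Following the step-size rule in the paper, $c$ and $d$ are the Lipschitz constants of $\nabla_{x}H(\cdot, y)$ and $\nabla_{y}H(x, \cdot)$, here $\|y\|^{2}$ and $\|x\|^{2}$, times a safety factor $\gamma > 1$; `soft_threshold` and `grad_H_x` are reused from the sketches above.

```python
# Minimal PALM loop for the hypothetical rank-one sparse factorization instance.
def grad_H_y(x, y):               # ∇_y H(x, y) = -(M - x y^T)^T x
    return -(M - np.outer(x, y)).T @ x

x, y = rng.standard_normal(n), rng.standard_normal(m)
gamma = 1.1                       # safety factor > 1
for _ in range(200):
    c = gamma * max(y @ y, 1e-8)  # Lipschitz constant of ∇_x H(., y) is ||y||^2
    x = soft_threshold(x - grad_H_x(x, y) / c, lam / c)
    d = gamma * max(x @ x, 1e-8)  # Lipschitz constant of ∇_y H(x, .) is ||x||^2
    y = soft_threshold(y - grad_H_y(x, y) / d, mu / d)

print(Psi(x, y))                  # the objective is non-increasing along the iterates
```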


  1. Bolte, Jérôme, Shoham Sabach, and Marc Teboulle. “Proximal alternating linearized minimization for nonconvex and nonsmooth problems.” Mathematical Programming 146.1-2 (2014): 459-494.

  2. https://angms.science/doc/NCVX/PALM0.pdf