
Proximal Gradient Method

Definition 1

Let’s say the non-differentiable objective function $H(\mathbf{x}) : \mathbb{R}^{n} \to \mathbb{R}$ is decomposed into a differentiable function $f$ and a non-differentiable function $g$.

$$ H(\mathbf{x}) = f(\mathbf{x}) + g(\mathbf{x}) $$
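A typical example is the LASSO objective $H(\mathbf{x}) = \dfrac{1}{2} \left\| A\mathbf{x} - \mathbf{b} \right\|_{2}^{2} + \mu \left\| \mathbf{x} \right\|_{1}$, where the differentiable squared-error term plays the role of $f$ and the non-differentiable $\ell_{1}$ penalty plays the role of $g$.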

The method of solving the optimization problem for $H$ with the following iterative algorithm is called the proximal gradient method.

$$ \mathbf{x}^{(k+1)} = \operatorname{prox}_{\lambda g} \left( \mathbf{x}^{(k)} - \lambda \nabla f (\mathbf{x}^{(k)}) \right) $$

Explanation

It is called the proximal gradient method because it uses both gradient descent and proximal optimization.
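In code, the iteration is simply these two steps composed. Below is a minimal sketch in Python, assuming the caller supplies a gradient oracle `grad_f`, a proximal operator `prox_g`, and a step size `lam`; all of these names are illustrative, not from any particular library.

```python
import numpy as np

def proximal_gradient(grad_f, prox_g, x0, lam=0.1, num_iters=100):
    """Minimize H(x) = f(x) + g(x) by alternating a gradient step on f
    with a proximal step on g."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x_f = x - lam * grad_f(x)  # gradient descent step on the differentiable part f
        x = prox_g(x_f, lam)       # proximal step on the non-differentiable part g
    return x
```

A concrete instance for the LASSO example above appears after the derivation below.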

Derivation

※ The derivation that approximates $f(\mathbf{x})$ with Taylor’s theorem is not presented here, because the resulting algebraic development is not smooth and makes the algorithm’s formula harder to understand.

The function we want to optimize is $H$, and we think of splitting it into a differentiable part $f$ and a non-differentiable part $g$; in other words, we want to optimize $f$ and $g$ at the same time. Since $f$ is differentiable, the next step’s point can be obtained through gradient descent.

$$ \mathbf{x}_{f} = \mathbf{x} - \lambda \nabla f(\mathbf{x}) $$

A point that optimizes $g$ while not straying far from $\mathbf{x}_{f}$ can then be obtained through proximal optimization.

$$ \mathbf{x}_{\ast} = \operatorname{prox}_{\lambda g}(\mathbf{x}_{f}) = \argmin\limits_{\mathbf{v}} \left\{ g(\mathbf{v}) + \dfrac{1}{2\lambda}\left\| \mathbf{v} - \mathbf{x}_{f} \right\|_{2}^{2} : \mathbf{v} \in \mathbb{R}^{n} \right\} $$
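For instance, when $g(\mathbf{v}) = \left\| \mathbf{v} \right\|_{1}$, this minimization has a well-known componentwise closed form, the soft-thresholding operator:

$$ \left[ \operatorname{prox}_{\lambda \left\| \cdot \right\|_{1}} (\mathbf{x}_{f}) \right]_{i} = \operatorname{sign} \left( [\mathbf{x}_{f}]_{i} \right) \max \left( \left| [\mathbf{x}_{f}]_{i} \right| - \lambda, 0 \right) $$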

Therefore, we obtain the following update formula for the solution.

$$
\begin{align*}
\mathbf{x}^{(k+1)} &= \operatorname{prox}_{\lambda g} \left( \mathbf{x}^{(k)} - \lambda \nabla f(\mathbf{x}^{(k)}) \right) \\
&= \argmin\limits_{\mathbf{v}} \left\{ g(\mathbf{v}) + \dfrac{1}{2\lambda}\left\| \mathbf{v} - \left( \mathbf{x}^{(k)} - \lambda \nabla f(\mathbf{x}^{(k)}) \right) \right\|_{2}^{2} : \mathbf{v} \in \mathbb{R}^{n} \right\}
\end{align*}
$$
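Putting the two steps together for the LASSO example mentioned earlier gives an iterative soft-thresholding scheme. The following sketch is illustrative only: the data `A`, `b`, the regularization weight `mu`, and the step size `lam` are assumed inputs, and `lam` is taken small enough (here $1 / \left\| A \right\|_{2}^{2}$) for the gradient step to be stable.

```python
import numpy as np

def lasso_proximal_gradient(A, b, mu, lam, num_iters=500):
    """Minimize H(x) = 0.5*||Ax - b||_2^2 + mu*||x||_1 with the update
    x <- prox_{lam*g}(x - lam * grad f(x))."""
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad_f = A.T @ (A @ x - b)  # gradient of the differentiable part f
        x_f = x - lam * grad_f      # gradient descent step
        # proximal step: soft-thresholding, the prox of lam*mu*||.||_1
        x = np.sign(x_f) * np.maximum(np.abs(x_f) - lam * mu, 0.0)
    return x

# Usage with hypothetical random data.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
b = rng.normal(size=50)
x_hat = lasso_proximal_gradient(A, b, mu=0.1, lam=1.0 / np.linalg.norm(A, 2) ** 2)
```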


  1. Parikh, Neal, and Stephen Boyd. “Proximal Algorithms.” Foundations and Trends® in Optimization 1.3 (2014), Ch. 4.2 Proximal Gradient Method. ↩︎