Proximal Gradient Method
Definition 1
Let’s say the non-differentiable objective function $F = f + g$ is decomposed into a differentiable function $f$ and a non-differentiable function $g$. The method of solving the optimization problem for $F$ using the following iterative algorithm is called the proximal gradient method.

$$ x^{(k+1)} = \operatorname{prox}_{\alpha g} \left( x^{(k)} - \alpha \nabla f \left( x^{(k)} \right) \right) $$

Here $\alpha > 0$ is the step size and $\operatorname{prox}_{\alpha g}$ is the proximal operator of $\alpha g$.
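For concreteness, here is a minimal Python sketch of this iteration (not from the cited text): it assumes the caller supplies a hypothetical gradient function `grad_f` for $f$ and a proximal operator `prox_g` for $g$, along with a fixed step size `alpha`.

```python
import numpy as np

def proximal_gradient(grad_f, prox_g, x0, alpha=0.01, num_iters=1000):
    """Generic proximal gradient iteration (illustrative sketch).

    grad_f : callable, gradient of the differentiable part f (assumed)
    prox_g : callable, prox_g(v, alpha) evaluates prox_{alpha g}(v) (assumed)
    x0     : initial point as a NumPy array
    alpha  : fixed step size, assumed small enough for convergence
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(num_iters):
        v = x - alpha * grad_f(x)   # gradient step on the differentiable part f
        x = prox_g(v, alpha)        # proximal step on the non-differentiable part g
    return x
```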
Explanation
It is called the proximal gradient method because it uses both gradient descent and proximal optimization.
Derivation
※ The derivation that approximates $f$ with Taylor’s theorem is not presented here, because its algebraic development is less smooth and makes the algorithm’s formula harder to understand.
The function we want to optimize is $F = f + g$, which we think of as divided into a differentiable part $f$ and a non-differentiable part $g$. In other words, we want to optimize both $f$ and $g$ at the same time. However, since $f$ is differentiable, the next step’s optimal point can be obtained through gradient descent.

$$ x^{(k)} - \alpha \nabla f \left( x^{(k)} \right) $$
A point that optimizes $g$ while not being far from $x^{(k)} - \alpha \nabla f \left( x^{(k)} \right)$ can be obtained through proximal optimization.

$$ \operatorname{prox}_{\alpha g} (v) = \underset{x}{\operatorname{argmin}} \left( g(x) + \frac{1}{2\alpha} \left\| x - v \right\|_{2}^{2} \right) $$
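For example (a standard special case, not worked out in the original text), when $g(x) = \lambda \left\| x \right\|_{1}$ the proximal operator has an elementwise closed form known as soft thresholding:

$$ \left[ \operatorname{prox}_{\alpha g} (v) \right]_{i} = \operatorname{sign}(v_{i}) \max \left( |v_{i}| - \alpha \lambda ,\ 0 \right) $$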
Therefore, we obtain the following formula for updating the optimal solution.

$$ x^{(k+1)} = \operatorname{prox}_{\alpha g} \left( x^{(k)} - \alpha \nabla f \left( x^{(k)} \right) \right) $$
■
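To see the update formula end to end, here is a sketch applied to the lasso problem $\min_{x} \frac{1}{2} \left\| Ax - b \right\|_{2}^{2} + \lambda \left\| x \right\|_{1}$; this instance (often called ISTA) is a standard application, not part of the original derivation. The gradient of the smooth part is $A^{\top}(Ax - b)$, and the prox of the $\ell_{1}$ part is the soft thresholding shown above.

```python
import numpy as np

def soft_threshold(v, t):
    # Elementwise prox of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_proximal_gradient(A, b, lam, alpha=None, num_iters=500):
    """Proximal gradient (ISTA) sketch for 0.5*||Ax - b||^2 + lam*||x||_1."""
    if alpha is None:
        # Step size 1/L with L = ||A||_2^2, the Lipschitz constant of the gradient
        alpha = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                            # gradient step on f
        x = soft_threshold(x - alpha * grad, alpha * lam)   # prox step on g
    return x

# Tiny usage example with synthetic data (all values illustrative)
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true + 0.01 * rng.standard_normal(50)
print(np.round(lasso_proximal_gradient(A, b, lam=0.1)[:5], 3))
```

With a step size of $1/L$, each iteration does not increase the objective, which is why the spectral norm of $A$ is used for the default `alpha`.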
Parikh, Neal, and Stephen Boyd. “Proximal Algorithms.” Foundations and Trends® in Optimization 1.3 (2014), Ch. 4.2 Proximal gradient method. ↩︎