logo

Proximal Operator 📂Optimization

Proximal Operator

Definition1

Let X\left\| \cdot \right\|_{X} be the norm of the vector space XX. The proximal operator proxλf:X2X\operatorname{prox}_{\lambda f} : X \to 2^{X} for a function f:XRf : X \to \mathbb{R} is defined as follows, for λ>0\lambda > 0,

proxλf(x):=arg minv{λf(v)+12vxX2:vX}=arg minv{f(v)+12λvxX2:vX} \begin{align} \operatorname{prox}_{\lambda f} (\mathbf{x}) &:= \argmin\limits_{\mathbf{v}} \left\{ \lambda f(\mathbf{v}) + \dfrac{1}{2}\left\| \mathbf{v} - \mathbf{x} \right\|_{X}^{2} : \mathbf{v} \in X \right\} \\ &= \argmin\limits_{\mathbf{v}} \left\{ f(\mathbf{v}) + \dfrac{1}{2\lambda}\left\| \mathbf{v} - \mathbf{x} \right\|_{X}^{2} : \mathbf{v} \in X \right\} \nonumber \end{align}

arg min\argmin represents the optimal solution.

Simplified Definition

If the above definition is difficult, it can be understood as follows. The proximal operator proxλf:RnRn\operatorname{prox}_{\lambda f} : \mathbb{R}^{n} \to \mathbb{R}^{n} for a function f:RnRf : \mathbb{R}^{n} \to \mathbb{R} is defined as:

proxλf(x):=arg minv{λf(v)+12vx22:vRn} \operatorname{prox}_{\lambda f} (\mathbf{x}) := \argmin\limits_{\mathbf{v}} \left\{ \lambda f(\mathbf{v}) + \dfrac{1}{2}\left\| \mathbf{v} - \mathbf{x} \right\|_{2}^{2} : \mathbf{v} \in \mathbb{R}^{n} \right\}

Explanation 2

According to definition (1)(1), proxλf(x)\operatorname{prox}_{\lambda f} (\mathbf{x}) means compromising between minimizing the function value of ff and getting closer to x\mathbf{x}. In other words, it can be said that it is a point that makes the value of ff the smallest in the vicinity of x\mathbf{x}. It can be seen as adding a kind of regularization to the problem of optimizing ff. Depending on the value of λ\lambda, it is determined which term is given more value in reduction. Point proxλf(x)\operatorname{prox}_{\lambda f} (\mathbf{x}) is called the proximal point for ff with respect to x\mathbf{x}.

Proximal means ‘close’ or ’nearby’, and proxλf(x)\operatorname{prox}_{\lambda f} (\mathbf{x}) seems to be named proximal because it finds the optimal solution of ff among the points close to x\mathbf{x}.

Functions ff that are easy to calculate with proxλf()\operatorname{prox}_{\lambda f} (\cdot) are called prox friendly.

In the case of X=RnX = \mathbb{R}^{n}, if ff is a convex function, then proxλf(x)\operatorname{prox}_{\lambda f} (\mathbf{x}) is unique.

Algorithms that solve optimization problems using the proximal operator are called proximal algorithms. These include:

Properties3

  • Separable sum: If the function ff is separable as f(u,v)=ϕ(u)+ψ(v)f(\mathbf{u}, \mathbf{v}) = \phi (\mathbf{u}) + \psi (\mathbf{v}), then the following holds: proxλf(x,y)=(proxλϕ(x),proxλψ(y)) \operatorname{prox}_{\lambda f} (\mathbf{x}, \mathbf{y}) = (\operatorname{prox}_{\lambda \phi} (\mathbf{x}), \operatorname{prox}_{\lambda \psi} (\mathbf{y})) If it is completely separable for more than one variable as f(x)=infi(xi)f(x) = \sum_{i}^{n}f_{i}(x_{i}), then, (proxλf(x))i=proxλfi(xi) (\operatorname{prox}_{\lambda f} (x))_{i} = \operatorname{prox}_{\lambda f_{i}} (x_{i})
  • Postcomposition: For f(x)=αϕ(x)+bf(\mathbf{x}) = \alpha \phi (\mathbf{x}) + b and α>0\alpha \gt 0, the following holds: proxλf(x)=proxαλϕ(x) \operatorname{prox}_{\lambda f} (\mathbf{x}) = \operatorname{prox}_{\alpha \lambda \phi} (\mathbf{x})
  • Precomposition: For f(x)=ϕ(αx+b)f(\mathbf{x}) = \phi (\alpha \mathbf{x} + b) and α0\alpha \ne 0, the following holds: proxλf(x)=1α(proxα2λϕ(αx+b)b) \operatorname{prox}_{\lambda f} (\mathbf{x}) = \dfrac{1}{\alpha}\left( \operatorname{prox}_{\alpha^{2} \lambda \phi} (\alpha \mathbf{x} + b) - b \right) If f(x)=ϕ(Qx)f(\mathbf{x}) = \phi (Q \mathbf{x}) and QQ is orthogonal, then the following holds: proxλf(x)=QTproxλϕ(Qx) \operatorname{prox}_{\lambda f} (\mathbf{x}) = Q^{T}\operatorname{prox}_{\lambda \phi} (Q\mathbf{x})
  • Affine addition: For f(x)=ϕ(x)+aTx+bf(\mathbf{x}) = \phi (\mathbf{x}) + \mathbf{a}^T\mathbf{x} + b, the following holds: proxλf(x)=proxλϕ(xλa) \operatorname{prox}_{\lambda f} (\mathbf{x}) = \operatorname{prox}_{\lambda \phi} (\mathbf{x} - \lambda \mathbf{a})
  • Regularization: For f(x)=ϕ(x)+ρ2xa22f(\mathbf{x}) = \phi (\mathbf{x}) + \frac{\rho}{2} \left\| \mathbf{x} - \mathbf{a} \right\|_{2}^{2}, the following holds: proxλf(x)=proxλ~ϕ((λ~/λ)x+(ρλ~)a) \operatorname{prox}_{\lambda f} (\mathbf{x}) = \operatorname{prox}_{\tilde{\lambda} \phi} ((\tilde{\lambda}/\lambda)\mathbf{x} + (\rho \tilde{\lambda}) \mathbf{a}) Here, λ~=λ/(1+λρ)\tilde{\lambda} = \lambda / (1+\lambda \rho).

  1. Parikh, Neal, and Stephen Boyd. “Proximal algorithms.” Foundations and trends® in Optimization 1.3 (2014), p3. ↩︎

  2. https://web.eecs.umich.edu/~fessler/course/598/l/n-05-prox.pdf ↩︎

  3. https://web.stanford.edu/~boyd/papers/pdf/prox_algs.pdf Ch2 Properties ↩︎