Proximal Operator

Definition¹

Let $\left\| \cdot \right\|_{X}$ be the norm of the vector space $X$ . The proximal operator $\operatorname{prox}_{\lambda f} : X \to 2^{X}$ for a function $f : X \to \mathbb{R}$ is defined as follows, for $\lambda > 0$ ,

$\begin{align} \operatorname{prox}_{\lambda f} (\mathbf{x}) &:= \argmin\limits_{\mathbf{v}} \left\{ \lambda f(\mathbf{v}) + \dfrac{1}{2}\left\| \mathbf{v} - \mathbf{x} \right\|_{X}^{2} : \mathbf{v} \in X \right\} \\ &= \argmin\limits_{\mathbf{v}} \left\{ f(\mathbf{v}) + \dfrac{1}{2\lambda}\left\| \mathbf{v} - \mathbf{x} \right\|_{X}^{2} : \mathbf{v} \in X \right\} \nonumber \end{align}$

$\argmin$ represents the optimal solution.

Simplified Definition

If the above definition is difficult, it can be understood as follows. The proximal operator $\operatorname{prox}_{\lambda f} : \mathbb{R}^{n} \to \mathbb{R}^{n}$ for a function $f : \mathbb{R}^{n} \to \mathbb{R}$ is defined as:

$\operatorname{prox}_{\lambda f} (\mathbf{x}) := \argmin\limits_{\mathbf{v}} \left\{ \lambda f(\mathbf{v}) + \dfrac{1}{2}\left\| \mathbf{v} - \mathbf{x} \right\|_{2}^{2} : \mathbf{v} \in \mathbb{R}^{n} \right\}$

Explanation ²

According to definition $(1)$ , $\operatorname{prox}_{\lambda f} (\mathbf{x})$ means compromising between minimizing the function value of $f$ and getting closer to $\mathbf{x}$ . In other words, it can be said that it is a point that makes the value of $f$ the smallest in the vicinity of $\mathbf{x}$ . It can be seen as adding a kind of regularization to the problem of optimizing $f$ . Depending on the value of $\lambda$ , it is determined which term is given more value in reduction. Point $\operatorname{prox}_{\lambda f} (\mathbf{x})$ is called the proximal point for $f$ with respect to $\mathbf{x}$ .

Proximal means ‘close’ or ’nearby’, and $\operatorname{prox}_{\lambda f} (\mathbf{x})$ seems to be named proximal because it finds the optimal solution of $f$ among the points close to $\mathbf{x}$ .

Functions $f$ that are easy to calculate with $\operatorname{prox}_{\lambda f} (\cdot)$ are called prox friendly.

In the case of $X = \mathbb{R}^{n}$ , if $f$ is a convex function, then $\operatorname{prox}_{\lambda f} (\mathbf{x})$ is unique.

Algorithms that solve optimization problems using the proximal operator are called proximal algorithms. These include:

Properties³

Separable sum: If the function $f$ is separable as $f(\mathbf{u}, \mathbf{v}) = \phi (\mathbf{u}) + \psi (\mathbf{v})$ , then the following holds: $\operatorname{prox}_{\lambda f} (\mathbf{x}, \mathbf{y}) = (\operatorname{prox}_{\lambda \phi} (\mathbf{x}), \operatorname{prox}_{\lambda \psi} (\mathbf{y}))$ If it is completely separable for more than one variable as $f(x) = \sum_{i}^{n}f_{i}(x_{i})$ , then, $(\operatorname{prox}_{\lambda f} (x))_{i} = \operatorname{prox}_{\lambda f_{i}} (x_{i})$
Postcomposition: For $f(\mathbf{x}) = \alpha \phi (\mathbf{x}) + b$ and $\alpha \gt 0$ , the following holds: $\operatorname{prox}_{\lambda f} (\mathbf{x}) = \operatorname{prox}_{\alpha \lambda \phi} (\mathbf{x})$
Precomposition: For $f(\mathbf{x}) = \phi (\alpha \mathbf{x} + b)$ and $\alpha \ne 0$ , the following holds: $\operatorname{prox}_{\lambda f} (\mathbf{x}) = \dfrac{1}{\alpha}\left( \operatorname{prox}_{\alpha^{2} \lambda \phi} (\alpha \mathbf{x} + b) - b \right)$ If $f(\mathbf{x}) = \phi (Q \mathbf{x})$ and $Q$ is orthogonal, then the following holds: $\operatorname{prox}_{\lambda f} (\mathbf{x}) = Q^{T}\operatorname{prox}_{\lambda \phi} (Q\mathbf{x})$
Affine addition: For $f(\mathbf{x}) = \phi (\mathbf{x}) + \mathbf{a}^T\mathbf{x} + b$ , the following holds: $\operatorname{prox}_{\lambda f} (\mathbf{x}) = \operatorname{prox}_{\lambda \phi} (\mathbf{x} - \lambda \mathbf{a})$
Regularization: For $f(\mathbf{x}) = \phi (\mathbf{x}) + \frac{\rho}{2} \left\| \mathbf{x} - \mathbf{a} \right\|_{2}^{2}$ , the following holds: $\operatorname{prox}_{\lambda f} (\mathbf{x}) = \operatorname{prox}_{\tilde{\lambda} \phi} ((\tilde{\lambda}/\lambda)\mathbf{x} + (\rho \tilde{\lambda}) \mathbf{a})$ Here, $\tilde{\lambda} = \lambda / (1+\lambda \rho)$ .

Parikh, Neal, and Stephen Boyd. “Proximal algorithms.” Foundations and trends® in Optimization 1.3 (2014), p3. ↩︎
https://web.eecs.umich.edu/~fessler/course/598/l/n-05-prox.pdf ↩︎
https://web.stanford.edu/~boyd/papers/pdf/prox_algs.pdf Ch2 Properties ↩︎