
Automatic Differentiation and Dual Numbers

Overview

Dual numbers are numbers that can be expressed in the following form for two real numbers $a, b \in \mathbb{R}$.

$$ a + b\epsilon, \quad (\epsilon^{2} = 0,\ \epsilon \neq 0) $$

The addition and multiplication rules of dual numbers are useful for implementing forward-mode automatic differentiation.

Description [1]

In automatic differentiation, and in the forward mode in particular, the derivative is computed at the same time as the function value. For example, to differentiate $y(x) = \ln (x^{2} + \sin x)$, we can evaluate the intermediate variables $w_{i}$ and their derivatives in order, as in the following table, where $\dot{w} = \dfrac{dw}{dx}$.

$$ \begin{array}{|l|l|} \hline \textbf{Forward calculations} & \textbf{Derivatives} \\ \hline w_{1} = x & \dot{w}_{1} = 1 \\[0.5em] w_{2} = w_{1}^{2} & \dot{w}_{2} = 2w_{1} = 2x \\[0.5em] w_{3} = \sin w_{1} & \dot{w}_{3} = \cos w_{1} = \cos x \\[0.5em] w_{4} = w_{2} + w_{3} & \dot{w}_{4} = \dot{w}_{2} + \dot{w}_{3} = 2x + \cos x \\[0.5em] w_{5} = \ln (w_{4}) & \dot{w}_{5} = \dfrac{\dot{w}_{4}}{w_{4}} = \dfrac{2x + \cos x}{x^{2} + \sin x} \\[1em] \hline \end{array} $$
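The table can be followed line by line in code. The sketch below is an illustration only: the function name `forward_trace` and the variable names `w1`, `dw1`, and so on are assumptions chosen to mirror $w_{i}$ and $\dot{w}_{i}$, and Python is used purely for demonstration.

```python
import math

def forward_trace(x):
    """Evaluate y = ln(x^2 + sin x) and dy/dx together, one table row at a time."""
    w1, dw1 = x, 1.0                            # w1 = x
    w2, dw2 = w1 ** 2, 2 * w1 * dw1             # w2 = w1^2
    w3, dw3 = math.sin(w1), math.cos(w1) * dw1  # w3 = sin(w1)
    w4, dw4 = w2 + w3, dw2 + dw3                # w4 = w2 + w3
    w5, dw5 = math.log(w4), dw4 / w4            # w5 = ln(w4)
    return w5, dw5

y_value, y_derivative = forward_trace(1.0)
```

Each row uses only values already computed in earlier rows, so the function value and the derivative are available after a single pass.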

With dual-number arithmetic, the function value and the derivative can be computed together in a natural way. Let us write the dual number $a + b\epsilon$ as the ordered pair $(a, b)$.

Addition of Dual Numbers

$$ (a, b) + (c, d) = (a + c, b + d) $$

Multiplication of Dual Numbers

$$ (a, b)(c, d) = (ac, ad + bc) $$
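Expanding $(a + b\epsilon)(c + d\epsilon)$ and using $\epsilon^{2} = 0$ leaves $ac + (ad + bc)\epsilon$. A minimal sketch of both operations follows; the class name `Dual` and its field names `a` and `b` are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    a: float  # real part
    b: float  # coefficient of epsilon

    def __add__(self, other):
        # (a, b) + (c, d) = (a + c, b + d)
        return Dual(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps^2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

p = Dual(2.0, 3.0)
q = Dual(5.0, 7.0)
total = p + q              # Dual(a=7.0, b=10.0)
product = p * q            # Dual(a=10.0, b=29.0)

eps = Dual(0.0, 1.0)
eps_squared = eps * eps    # Dual(a=0.0, b=0.0), i.e. eps^2 = 0
```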

Differentiable Functions Defined on Dual Numbers

For a differentiable function $f : \mathbb{R} \to \mathbb{R}$, $$ f(a + b\epsilon) := f(a) + f^{\prime}(a)b\epsilon = \big( f(a), b f^{\prime}(a) \big) $$

Composition of Functions Defined on Dual Numbers

For $f, g : \mathbb{R} \to \mathbb{R}$, $$ (f \circ g)(a + b\epsilon) := f(g(a)) + f^{\prime}(g(a))g^{\prime}(a)b\epsilon = \big( f(g(a)), bf^{\prime}(g(a))g^{\prime}(a) \big) $$
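The two definitions above are consistent with each other: applying the rule for $g$ and then the rule for $f$ reproduces the composition formula, which is the chain rule. The sketch below checks this numerically for $f = \exp$ and $g = \sin$; the helper `lift` and the tuple representation of $(a, b)$ are assumptions made for illustration.

```python
import math

def lift(f, df):
    """Extend f to pairs (a, b) ~ a + b*eps, given its derivative df."""
    def lifted(pair):
        a, b = pair
        return (f(a), b * df(a))
    return lifted

dual_sin = lift(math.sin, math.cos)  # g = sin, g' = cos
dual_exp = lift(math.exp, math.exp)  # f = exp, f' = exp

a, b = 0.5, 2.0
composed = dual_exp(dual_sin((a, b)))

# Right-hand side of the composition formula: (f(g(a)), b f'(g(a)) g'(a))
expected = (math.exp(math.sin(a)),
            b * math.exp(math.sin(a)) * math.cos(a))
```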

Differentiation

Represent the variable of differentiation $x$ as the dual number $(x, 1)$ and a constant $\alpha$ as $(\alpha, 0)$. Then, in any dual-number computation, the first component carries the function value and the second component carries the derivative. For example, consider the function $x \mapsto x + \alpha$ (adding a constant). Its value at $x = x_{0}$ is $x_{0} + \alpha$, and its derivative there is $\left. \dfrac{d(x + \alpha)}{dx}\right|_{x = x_{0}} = 1$. In dual numbers, this reads as follows:

$$ (x, 1) + (\alpha, 0) = (x + \alpha, 1) $$

The first component is the function value $x + \alpha$, and the second component is the derivative $1$. Of course, this also holds for $x + x$.

$$ \dfrac{d(x+x)}{dx} = 2, \qquad (x, 1) + (x, 1) = (2x, 2) $$

Now, consider the function $x \mapsto \alpha x$ (multiplication by a constant). At $x = x_{0}$, the function value is $\alpha x_{0}$, and the derivative is $\left. \dfrac{d(\alpha x)}{dx} \right|_{x = x_{0}} = \alpha$. In dual numbers, this reads as follows:

$$ (x, 1)(\alpha, 0) = (\alpha x, x\cdot0 + 1\cdot\alpha) = (\alpha x, \alpha) $$

Again, the first component is the function value, and the second component is the derivative. The same holds for squaring, $x \mapsto x^{2}$, whose derivative is $2x$.

$$ (x, 1)(x, 1) = (x^{2}, 2x) $$
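These identities can be checked at a concrete point. The sketch below uses plain tuples for dual numbers; the helper names `add` and `mul` and the sample values $x_{0} = 3$, $\alpha = 5$ are assumptions chosen for illustration.

```python
def add(p, q):
    # (a, b) + (c, d) = (a + c, b + d)
    return (p[0] + q[0], p[1] + q[1])

def mul(p, q):
    # (a, b)(c, d) = (ac, ad + bc)
    return (p[0] * q[0], p[0] * q[1] + p[1] * q[0])

x0, alpha = 3.0, 5.0
x = (x0, 1.0)      # variable: second component 1
c = (alpha, 0.0)   # constant: second component 0

shifted = add(x, c)   # (8.0, 1.0):  value x0 + alpha, derivative 1
doubled = add(x, x)   # (6.0, 2.0):  value 2*x0,       derivative 2
scaled = mul(x, c)    # (15.0, 5.0): value alpha*x0,   derivative alpha
squared = mul(x, x)   # (9.0, 6.0):  value x0^2,       derivative 2*x0
```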

The same holds when $(x, 1)$ is passed to a differentiable function $f$ or to a composite function $f \circ g$: the second component of the result is the derivative.

$$ f(x, 1) = \big( f(x), f^{\prime}(x) \big), \qquad (f \circ g)(x, 1) = \big( f(g(x)), f^{\prime}(g(x))g^{\prime}(x) \big) $$

Now, let’s revisit the example $y(x) = \ln (x^{2} + \sin x)$ mentioned earlier. If we substitute the dual number $(x, 1)$ instead of the real number $x$, the calculation is as follows:

$$ \begin{align*} (x, 1)^{2} &= (x^{2}, 2x) \\ \sin(x, 1) &= (\sin x, \cos x) \\ (x, 1)^{2} + \sin(x, 1) &= (x^{2} + \sin x, 2x + \cos x) \\ \ln( (x, 1)^{2} + \sin(x, 1) ) &= \ln(x^{2} + \sin x, 2x + \cos x) \\ &= \Big( \ln(x^{2} + \sin x), (2x + \cos x) \dfrac{1}{x^{2} + \sin x} \Big) \\ &= \Big( \ln(x^{2} + \sin x), \dfrac{2x + \cos x}{x^{2} + \sin x} \Big) \\ \end{align*} $$

Differentiating $y$ directly gives the following, which matches the second component of the dual number.

$$ \dfrac{dy}{dx} = \dfrac{d}{dx} \ln (x^{2} + \sin x) = \dfrac{2x + \cos x}{x^{2} + \sin x} $$
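The whole example can be reproduced with a small amount of operator overloading. The sketch below is one possible arrangement, not a reference implementation: the class `Dual`, its attributes `val` and `der`, and the dual-aware helpers `sin` and `log` are all assumptions introduced for illustration, and Python is used only for demonstration.

```python
import math

class Dual:
    """Dual number val + der*eps with eps^2 = 0."""
    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

def sin(u):
    # f(a + b*eps) = f(a) + f'(a)*b*eps with f = sin
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)

def log(u):
    # same rule with f = ln, f'(a) = 1/a
    return Dual(math.log(u.val), u.der / u.val)

def y(x):
    return log(x * x + sin(x))

x0 = 1.5
result = y(Dual(x0, 1.0))  # seed the variable with derivative 1

# Closed-form derivative for comparison
analytic = (2 * x0 + math.cos(x0)) / (x0 ** 2 + math.sin(x0))
```

Here `result.val` holds $y(x_{0})$ and `result.der` holds $y^{\prime}(x_{0})$, which should agree with `analytic` up to floating-point rounding.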

Refer to the guide below for implementing automatic differentiation in Julia.

See Also


  1. Mykel J. Kochenderfer, Algorithms for Optimization (2019), pp. 27-32