Automatic Differentiation and Dual Numbers

Overview

Dual numbers are numbers that can be written in the following form for two real numbers $a, b \in \mathbb{R}$.

$$ a + b\epsilon, \quad (\epsilon^{2} = 0,\ \epsilon \neq 0) $$

The addition and multiplication system of dual numbers is useful for implementing the forward mode of automatic differentiation.

Description [1]

In automatic differentiation, especially in forward mode, the derivative is computed at the same time as the function value $f$. For example, to differentiate $y(x) = \ln (x^{2} + \sin x)$, we can proceed step by step as in the following table, where $\dot{w} = \dfrac{dw}{dx}$ denotes the derivative of an intermediate value $w$ with respect to $x$.

$$
\begin{array}{|l|l|}
\hline
\textbf{Forward calculations} & \textbf{Derivatives} \\
\hline
w_{1} = x & \dot{w}_{1} = 1 \\[0.5em]
w_{2} = w_{1}^{2} & \dot{w}_{2} = 2w_{1} = 2x \\[0.5em]
w_{3} = \sin w_{1} & \dot{w}_{3} = \cos w_{1} = \cos x \\[0.5em]
w_{4} = w_{2} + w_{3} & \dot{w}_{4} = \dot{w}_{2} + \dot{w}_{3} = 2x + \cos x \\[0.5em]
w_{5} = \ln (w_{4}) & \dot{w}_{5} = \dfrac{\dot{w}_{4}}{w_{4}} = \dfrac{2x + \cos x}{x^{2} + \sin x} \\[1em]
\hline
\end{array}
$$
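The rows of the table can be traced directly in code. The following is a minimal Python sketch that carries each intermediate value together with its derivative; the function name `forward_trace` and the evaluation point `1.0` are illustrative choices, not part of the source.

```python
import math

def forward_trace(x):
    """Evaluate y = ln(x^2 + sin x) and dy/dx together, one table row per line."""
    w1, dw1 = x, 1.0                              # w1 = x
    w2, dw2 = w1 ** 2, 2 * w1 * dw1               # w2 = w1^2
    w3, dw3 = math.sin(w1), math.cos(w1) * dw1    # w3 = sin(w1)
    w4, dw4 = w2 + w3, dw2 + dw3                  # w4 = w2 + w3
    w5, dw5 = math.log(w4), dw4 / w4              # w5 = ln(w4)
    return w5, dw5

y, dy = forward_trace(1.0)
print(y, dy)
```

Each line updates a value and its derivative in lockstep, which is the bookkeeping that dual numbers automate.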

Dual-number arithmetic makes it possible to compute the function value and the derivative simultaneously and naturally. Let us write the dual number $a + b\epsilon$ as an ordered pair $(a, b)$.

Addition of Dual Numbers

$$ (a, b) + (c, d) = (a + c, b + d) $$

Multiplication of Dual Numbers

$$ (a, b)(c, d) = (ac, ad+bc) $$
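The multiplication rule follows from expanding the product in the $a + b\epsilon$ form and applying $\epsilon^{2} = 0$:

$$
\begin{align*}
(a + b\epsilon)(c + d\epsilon) &= ac + ad\epsilon + bc\epsilon + bd\epsilon^{2} \\
&= ac + (ad + bc)\epsilon
\end{align*}
$$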

Differentiable Functions Defined on Dual Numbers

For a differentiable function $f : \mathbb{R} \to \mathbb{R}$,

$$ f(a + b\epsilon) := f(a) + f^{\prime}(a)b\epsilon = \big( f(a), b f^{\prime}(a) \big) $$
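One way to motivate this definition is a formal Taylor expansion around $a$. This assumes $f$ is smooth enough to have such an expansion, which is a stronger assumption than the differentiability required above. Every term of order two or higher contains $\epsilon^{2} = 0$ and vanishes:

$$
\begin{align*}
f(a + b\epsilon) &= f(a) + f^{\prime}(a) b\epsilon + \dfrac{f^{\prime\prime}(a)}{2!} b^{2}\epsilon^{2} + \cdots \\
&= f(a) + f^{\prime}(a) b\epsilon
\end{align*}
$$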

Composition of Functions Defined on Dual Numbers

For differentiable functions $f, g : \mathbb{R} \to \mathbb{R}$,

$$ (f \circ g)(a + b\epsilon) := f(g(a)) + f^{\prime}(g(a))g^{\prime}(a)b\epsilon = \big( f(g(a)), bf^{\prime}(g(a))g^{\prime}(a) \big) $$
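The two definitions above can be sketched in Python as follows. The `Dual` class and the `lift` helper are hypothetical names introduced for illustration. Lifting each function separately and then nesting the calls reproduces the composition rule, because the second component picks up one chain-rule factor per call.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    a: float  # real part: the function value
    b: float  # epsilon coefficient: the derivative

def lift(f, df):
    """Extend a real function f, with known derivative df, to dual numbers."""
    def lifted(z):
        # f(a + b*eps) := f(a) + f'(a) * b * eps
        return Dual(f(z.a), z.b * df(z.a))
    return lifted

dual_sin = lift(math.sin, math.cos)
dual_exp = lift(math.exp, math.exp)

# (f o g)(a + b*eps) with f = exp, g = sin, a = 0.5, b = 1
result = dual_exp(dual_sin(Dual(0.5, 1.0)))
print(result)
```

Here `result.b` should agree with $f^{\prime}(g(a))g^{\prime}(a) = e^{\sin 0.5}\cos 0.5$ up to floating-point rounding.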

Differentiation

Represent the variable of differentiation $x$ as the dual number $(x, 1)$ and a constant $\alpha$ as $(\alpha, 0)$. Then dual-number arithmetic by itself yields the function value in the first component and the derivative in the second. For example, consider the function $x \mapsto x + \alpha$ (addition of a constant). Its value at $x = x_{0}$ is $x_{0} + \alpha$, and its derivative is $\left. \dfrac{d(x + \alpha)}{dx}\right|_{x = x_{0}} = 1$. In dual numbers, this reads as follows:

$$ (x, 1) + (\alpha, 0) = (x + \alpha, 1) $$

The first component is the function value $x + \alpha$, and the second component is the derivative $1$. Of course, this also holds for $x + x$.

$$ \dfrac{d(x+x)}{dx} = 2, \qquad (x, 1) + (x, 1) = (2x, 2) $$

Now, consider the function $x \mapsto \alpha x$ (multiplication by a constant). At $x = x_{0}$, the function value is $\alpha x_{0}$, and the derivative is $\left. \dfrac{d(\alpha x)}{dx} \right|_{x = x_{0}} = \alpha$. In dual numbers, this reads as follows:

$$ (x, 1)(\alpha, 0) = (\alpha x, x\cdot0 + 1\cdot\alpha) = (\alpha x, \alpha) $$

Again, the first component is the function value, and the second component is the derivative. This also holds for squaring, $x \mapsto x^{2}$.

$$ (x, 1)(x, 1) = (x^{2}, 2x) $$
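The four identities above can be checked numerically. The following is a small Python sketch under stated assumptions: the `Dual` class and the helpers `variable` and `constant` are illustrative names, and the values $x = 3$, $\alpha = 5$ are arbitrary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    a: float  # function value
    b: float  # derivative

    def __add__(self, other):
        # (a, b) + (c, d) = (a + c, b + d)
        return Dual(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a, b)(c, d) = (ac, ad + bc)
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

def variable(x):
    return Dual(x, 1.0)   # seed: dx/dx = 1

def constant(c):
    return Dual(c, 0.0)   # seed: dc/dx = 0

x = variable(3.0)
alpha = constant(5.0)

print(x + alpha)  # Dual(a=8.0, b=1.0)
print(x + x)      # Dual(a=6.0, b=2.0)
print(x * alpha)  # Dual(a=15.0, b=5.0)
print(x * x)      # Dual(a=9.0, b=6.0)
```

The only difference between a variable and a constant is the second component of the seed, which is what selects the quantity being differentiated.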

The same holds when $(x, 1)$ is passed to a differentiable function $f$ or to a composite function $f \circ g$: the second component is still the derivative.

$$ f(x, 1) = \big( f(x), f^{\prime}(x) \big), \qquad (f \circ g)(x, 1) = \big( f(g(x)), f^{\prime}(g(x))g^{\prime}(x) \big) $$

Now, let's revisit the earlier example $y(x) = \ln (x^{2} + \sin x)$. If we substitute the dual number $(x, 1)$ for the real number $x$, the calculation proceeds as follows:

\begin{align*}
(x, 1)^{2} &= (x^{2}, 2x) \\
\sin(x, 1) &= (\sin x, \cos x) \\
(x, 1)^{2} + \sin(x, 1) &= (x^{2} + \sin x, 2x + \cos x) \\
\ln( (x, 1)^{2} + \sin(x, 1) ) &= \ln(x^{2} + \sin x, 2x + \cos x) \\
&= \Big( \ln(x^{2} + \sin x), (2x + \cos x) \dfrac{1}{x^{2} + \sin x} \Big) \\
&= \Big( \ln(x^{2} + \sin x), \dfrac{2x + \cos x}{x^{2} + \sin x} \Big)
\end{align*}

Computing the derivative of $y$ directly gives the following, confirming that it matches the second component of the dual number.

$$ \dfrac{dy}{dx} = \dfrac{d}{dx} \ln (x^{2} + \sin x) = \dfrac{2x + \cos x}{x^{2} + \sin x} $$
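The agreement can also be checked numerically. The Python sketch below evaluates $y$ on the dual number $(x_{0}, 1)$ and compares the second component with the closed-form derivative. The `Dual` class, the helpers `dual_sin` and `dual_ln`, and the point $x_{0} = 2$ are assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    a: float  # function value
    b: float  # derivative

    def __add__(self, other):
        return Dual(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

def dual_sin(z):
    # sin(a, b) = (sin a, b cos a)
    return Dual(math.sin(z.a), z.b * math.cos(z.a))

def dual_ln(z):
    # ln(a, b) = (ln a, b / a), valid for a > 0
    return Dual(math.log(z.a), z.b / z.a)

def y(z):
    # y(x) = ln(x^2 + sin x), written once and evaluated on dual numbers
    return dual_ln(z * z + dual_sin(z))

x0 = 2.0
out = y(Dual(x0, 1.0))
exact = (2 * x0 + math.cos(x0)) / (x0 ** 2 + math.sin(x0))
print(out.b, exact)
```

The function `y` is written only once, in terms of ordinary-looking arithmetic; the derivative comes out of the same evaluation, with no separate differentiation step.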

Refer to the guide below for implementing automatic differentiation in Julia.

See Also


  1. Mykel J. Kochenderfer and Tim A. Wheeler, Algorithms for Optimization (2019), pp. 27-32