Automatic differentiation
Definition
Automatic differentiation refers to a method for obtaining the derivative of a function defined by computer programming code. It is also abbreviated as AD or autodiff.
Explanation
Automatic differentiation involves using the chain rule to compute the derivative of a composite of functions whose derivatives are already known. In simple terms, it is the chain rule itself: implementing the chain rule in programming code constitutes automatic differentiation. Therefore, unlike methods that numerically approximate derivatives, automatic differentiation computes derivative values exactly (up to floating-point error). It is used in deep learning to optimize neural networks through backpropagation. As you will see below, automatic differentiation does not actually compute the derivative itself; it simply sequentially adds, multiplies, or composes derivatives that are already known.
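For instance, to differentiate $\sin(x^{2})$ it suffices to combine the already-known derivatives of $\sin$ and of squaring. The following minimal Python sketch illustrates this idea of propagating a value together with its derivative; the function names are chosen only for illustration.

```python
import math

def square(x, dx):
    # value of x**2 together with its derivative, using the known rule (x^2)' = 2x
    return x * x, 2 * x * dx

def sine(u, du):
    # value of sin(u) together with its derivative, using the chain rule (sin u)' = cos(u) * u'
    return math.sin(u), math.cos(u) * du

x = 1.5
u, du = square(x, 1.0)   # inner function: x^2, seeded with dx/dx = 1
y, dy = sine(u, du)      # outer function: sin(x^2)
print(y, dy)             # dy equals cos(x**2) * 2 * x, with no numerical approximation
```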
Depending on the method of computation, it can be broadly classified into forward mode and reverse mode.
About $f : \mathbb{R}^{n} \to \mathbb{R}$
Let us consider a bivariate function f.
$$ f(x_{1}, x_{2}) = x_{1} x_{2} + \sin(x_{1}) + 2 (x_{1})^{2} $$
The partial derivative $\dfrac{\partial f}{\partial x_{1}}$ is computed as follows:
$$ \frac{\partial f}{\partial x_{1}} = \frac{\partial (x_{1} x_{2})}{\partial x_{1}} + \frac{\partial \sin(x_{1})}{\partial x_{1}} + \frac{\partial 2 (x_{1})^{2}}{\partial x_{1}} $$
Each term here is already known as shown below:
$$ \frac{\partial (x_{1} x_{2})}{\partial x_{1}} = x_{2}, \qquad \frac{\partial \sin(x_{1})}{\partial x_{1}} = \cos(x_{1}), \qquad \frac{\partial 2 (x_{1})^{2}}{\partial x_{1}} = 4 x_{1} $$
In other words, once the values of $x_{1}$ and $x_{2}$ are determined, the derivative value is immediately determined as well. Implementing this in programming code would look something like the following.
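A minimal sketch in Python, assembled from the known term-by-term derivatives above; the function name `df_dx1` is only illustrative.

```python
import math

def df_dx1(x1, x2):
    # partial derivative of f(x1, x2) = x1*x2 + sin(x1) + 2*x1**2 with respect to x1,
    # built from the known derivatives of each term: x2, cos(x1), and 4*x1
    return x2 + math.cos(x1) + 4 * x1

print(df_dx1(2, 5))  # 5 + cos(2) + 8 = 13 + cos(2)
```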
Forward Mode
Let's set $w_{1} = x_{1}$, $w_{2} = x_{2}$, and consider the case $(x_{1}, x_{2}) = (2, 5)$ to compute the derivative value $\dfrac{\partial f}{\partial x_{1}}(2, 5)$ of $f$. Each time we add, multiply, or apply any function, we record the derivative. Let us denote this as $\dot{w} = \dfrac{\partial w}{\partial x_{1}}$.
| Forward calculations | Derivatives |
| --- | --- |
| $w_{1} = x_{1} = 2$ | $\dot{w}_{1} = 1$ |
| $w_{2} = x_{2} = 5$ | $\dot{w}_{2} = 0$ |
| $w_{3} = w_{1} w_{2} = 10$ | $\dot{w}_{3} = w_{2} = 5$ |
| $w_{4} = \sin(w_{1}) = \sin 2$ | $\dot{w}_{4} = \cos 2$ |
| $w_{5} = (w_{1})^{2} = 4$ | $\dot{w}_{5} = 2 w_{1} = 4$ |
| $w_{6} = 2 w_{5} = 8$ | $\dot{w}_{6} = 2 \dot{w}_{5} = 8$ |
| $w_{7} = w_{3} + w_{4} + w_{6} = 10 + \sin 2 + 8 = y$ | $\dot{w}_{7} = \dot{w}_{3} + \dot{w}_{4} + \dot{w}_{6} = 5 + \cos 2 + 8 = \dot{y}$ |
Therefore, we have $\dfrac{\partial f}{\partial x_{1}}(2, 5) = \dot{y} = \cos 2 + 13$. If the derivative with respect to $x_{2}$ had also been recorded during the computation of $w_{7}$, we would obtain the gradient $\nabla f = \begin{bmatrix} \partial_{x_{1}} f & \partial_{x_{2}} f \end{bmatrix}^{T}$.
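A forward-mode sweep following the trace above can be sketched in Python as follows: one pass with the seed $(\dot{x}_{1}, \dot{x}_{2}) = (1, 0)$ gives $\partial f / \partial x_{1}$, and a second pass with $(0, 1)$ gives $\partial f / \partial x_{2}$. The function name `forward_mode` is only illustrative.

```python
import math

def forward_mode(x1, x2, dx1, dx2):
    # One forward-mode sweep over f(x1, x2) = x1*x2 + sin(x1) + 2*x1**2.
    # dx1, dx2 are the seed derivatives of the inputs; each intermediate w_i
    # carries its derivative w_i_dot alongside its value.
    w1, w1_dot = x1, dx1
    w2, w2_dot = x2, dx2
    w3, w3_dot = w1 * w2, w1_dot * w2 + w1 * w2_dot      # product rule
    w4, w4_dot = math.sin(w1), math.cos(w1) * w1_dot     # chain rule for sin
    w5, w5_dot = w1 ** 2, 2 * w1 * w1_dot                # chain rule for squaring
    w6, w6_dot = 2 * w5, 2 * w5_dot                      # constant multiple
    w7, w7_dot = w3 + w4 + w6, w3_dot + w4_dot + w6_dot  # sum rule
    return w7, w7_dot

y, grad_x1 = forward_mode(2, 5, 1, 0)  # seed along x1: gives 13 + cos(2)
_, grad_x2 = forward_mode(2, 5, 0, 1)  # seed along x2: gives x1 = 2
print(y, grad_x1, grad_x2)
```

Since one such sweep is needed per input variable, reverse mode is generally preferred when the number of inputs is large.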
See Also