
How to Perform Automatic Differentiation in PyTorch

Explanation

We introduce how to perform automatic differentiation in PyTorch. Automatic differentiation is provided by the torch.autograd.grad function.

torch.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=None, is_grads_batched=False, materialize_grads=False)

  • outputs: The function value for which the gradient (automatic differentiation) is calculated.
  • inputs: The point at which the gradient is calculated.
  • grad_outputs: The element to be multiplied with the computed gradient (a vector–Jacobian product). Usually it is set to torch.ones_like(outputs). For instance, if grad_outputs = 2*torch.ones_like(outputs), then $2\nabla f$ is computed; see the short sketch after this list.
  • retain_graph: Determines whether to keep the graph used for the gradient computation. If not specified, it defaults to the value of create_graph (i.e. False), so the graph is freed after the call and the gradients of the same outputs cannot be recomputed. Check the example below for more details.
  • create_graph: The default is False, and this option decides whether the computed gradients will be differentiable again. For example, to compute the second derivative, set it to True.
  • allow_unused: Set it to True if some of the tensors in inputs were not used when computing outputs; otherwise unused inputs raise an error.
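
As a minimal sketch of the grad_outputs behaviour (the function f and the scaling factor here are illustrative assumptions, not from the original post): scaling grad_outputs simply scales the returned gradient.

import torch
import torch.autograd as autograd

x = torch.linspace(0, 1, 5, requires_grad=True)
f = x**2  # f(x) = x^2, so df/dx = 2x

# grad_outputs = ones returns the plain gradient 2x
g1 = autograd.grad(f, x, grad_outputs=torch.ones_like(f), retain_graph=True)[0]
# grad_outputs = 2*ones returns 2 * df/dx = 4x
g2 = autograd.grad(f, x, grad_outputs=2*torch.ones_like(f))[0]
print(torch.allclose(g2, 2*g1))  # expected: True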

Single Variable Function

Let’s define the polynomial $y = x^{3} + 2x^{2} + 3x + 1$ and find its derivative using automatic differentiation. Based on the above explanation, we write the following code. Note that the input x must have the requires_grad=True option so that the automatic differentiation graph is generated. We also define $z = x^{2} + 2x + 3$, which is used further below.

import torch
import torch.autograd as autograd

# requires_grad=True so that PyTorch builds the computation graph for x
x = torch.linspace(0, 1, 100, requires_grad=True)
y = x**3 + 2*x**2 + 3*x + 1
z = x**2 + 2*x + 3

y_autograd = autograd.grad(y, x, grad_outputs=torch.ones_like(x))
(tensor([ 3.0000,  3.0407,  3.0820,  3.1240,  3.1665,  3.2097,  3.2534,  3.2978,
         3.3428,  3.3884,  3.4346,  3.4815,  3.5289,  3.5770,  3.6257,  3.6749,
         3.7248,  3.7753,  3.8264,  3.8782,  3.9305,  3.9835,  4.0370,  4.0912,
         4.1460,  4.2014,  4.2574,  4.3140,  4.3713,  4.4291,  4.4876,  4.5467,
         4.6064,  4.6667,  4.7276,  4.7891,  4.8512,  4.9140,  4.9773,  5.0413,
         5.1059,  5.1711,  5.2369,  5.3033,  5.3704,  5.4380,  5.5063,  5.5751,
         5.6446,  5.7147,  5.7854,  5.8567,  5.9287,  6.0012,  6.0744,  6.1481,
         6.2225,  6.2975,  6.3731,  6.4493,  6.5262,  6.6036,  6.6817,  6.7603,
         6.8396,  6.9195,  7.0000,  7.0811,  7.1628,  7.2452,  7.3281,  7.4117,
         7.4959,  7.5807,  7.6661,  7.7521,  7.8387,  7.9259,  8.0138,  8.1022,
         8.1913,  8.2810,  8.3713,  8.4622,  8.5537,  8.6459,  8.7386,  8.8320,
         8.9259,  9.0205,  9.1157,  9.2115,  9.3079,  9.4050,  9.5026,  9.6009,
         9.6997,  9.7992,  9.8993, 10.0000]),)

Because retain_graph was not set to True, the graph for y has been freed and its gradient cannot be computed again. The code below shows that the derivative of z can still be computed, while recomputing the derivative of y raises an error. If you set retain_graph=True, the graph of y is kept and its derivative can be computed repeatedly; see the sketch after the outputs below.

>>> y_autograd = autograd.grad(y, x, grad_outputs=torch.ones_like(x))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\rydbr\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\autograd\__init__.py", line 303, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

>>> autograd.grad(z, x, grad_outputs=torch.ones_like(x))
(tensor([2.0000, 2.0202, 2.0404, 2.0606, 2.0808, 2.1010, 2.1212, 2.1414, 2.1616,
        2.1818, 2.2020, 2.2222, 2.2424, 2.2626, 2.2828, 2.3030, 2.3232, 2.3434,
        2.3636, 2.3838, 2.4040, 2.4242, 2.4444, 2.4646, 2.4848, 2.5051, 2.5253,
        2.5455, 2.5657, 2.5859, 2.6061, 2.6263, 2.6465, 2.6667, 2.6869, 2.7071,
        2.7273, 2.7475, 2.7677, 2.7879, 2.8081, 2.8283, 2.8485, 2.8687, 2.8889,
        2.9091, 2.9293, 2.9495, 2.9697, 2.9899, 3.0101, 3.0303, 3.0505, 3.0707,
        3.0909, 3.1111, 3.1313, 3.1515, 3.1717, 3.1919, 3.2121, 3.2323, 3.2525,
        3.2727, 3.2929, 3.3131, 3.3333, 3.3535, 3.3737, 3.3939, 3.4141, 3.4343,
        3.4545, 3.4747, 3.4949, 3.5152, 3.5354, 3.5556, 3.5758, 3.5960, 3.6162,
        3.6364, 3.6566, 3.6768, 3.6970, 3.7172, 3.7374, 3.7576, 3.7778, 3.7980,
        3.8182, 3.8384, 3.8586, 3.8788, 3.8990, 3.9192, 3.9394, 3.9596, 3.9798,
        4.0000]),)
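
With retain_graph=True the graph survives the call, so the same gradient can be computed more than once. A minimal sketch (y is rebuilt first, since its original graph has already been freed):

# Rebuild y so a fresh graph is available, then keep it with retain_graph=True
y = x**3 + 2*x**2 + 3*x + 1
y_autograd = autograd.grad(y, x, grad_outputs=torch.ones_like(x), retain_graph=True)
y_autograd = autograd.grad(y, x, grad_outputs=torch.ones_like(x))  # no error: the graph was retained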

Meanwhile, because create_graph=True was not set when computing y_autograd, the result is not itself differentiable, so the derivative of y_autograd cannot be calculated. To obtain the second derivative, set this option to True, as in the sketch after the traceback below.

>>> y_autograd2 = autograd.grad(y_autograd, x, grad_outputs=torch.ones_like(x))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\rydbr\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\autograd\__init__.py", line 303, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
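
A minimal sketch of computing the second derivative with create_graph=True (again rebuilding y; analytically $y'' = 6x + 4$):

y = x**3 + 2*x**2 + 3*x + 1
# create_graph=True makes the first derivative itself differentiable
dy = autograd.grad(y, x, grad_outputs=torch.ones_like(y), create_graph=True)[0]
d2y = autograd.grad(dy, x, grad_outputs=torch.ones_like(dy))[0]
print(torch.allclose(d2y, 6*x + 4))  # expected: True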

Let’s check if the derivative was calculated correctly.

import matplotlib.pyplot as plt
y_prime = 3*x**2 + 4*x + 3  # analytic derivative for comparison
plt.plot(x.detach().numpy(), y_prime.detach().numpy(), label='y_prime', linewidth=5, c='blue')
plt.plot(x.detach().numpy(), y_autograd[0].detach().numpy(), label='y_autograd', linestyle='--', c='red', linewidth=3)
plt.legend()
plt.show()
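
Besides the visual comparison, the two can also be compared numerically (a small check reusing y_prime and y_autograd from above):

print(torch.allclose(y_autograd[0], y_prime))  # expected: True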

Multivariable Function

Let’s define a bivariate function $z = x^{2}y + 2xy^{3}$ over the domain $[0, 1] \times [1, 2]$.

x = torch.linspace(0, 1, 100, requires_grad=True)
y = torch.linspace(2, 1, 100, requires_grad=True)
# With indexing='xy' the rows of the grid correspond to y and the columns to x
X, Y = torch.meshgrid(x, y, indexing='xy')
X, Y = X.reshape(-1), Y.reshape(-1)

z = (X**2)*Y + 2*X*(Y**3)

The partial derivative $\dfrac{\partial z(0, y)}{\partial x}$ of this function can be obtained as follows. First, obtain $\partial_{x} z$ using z_x_auto = autograd.grad(z, X, grad_outputs=torch.ones_like(z), retain_graph=True), then reshape it into a matrix of the form $100 \times 100$ and select the first column, which corresponds to $x = 0$.

z_x_auto = autograd.grad(z, X, grad_outputs=torch.ones_like(z), retain_graph=True)
z_x_auto[0].reshape(100,100)[:, 0]
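
Since $\partial_{x} z = 2xy + 2y^{3}$ reduces to $2y^{3}$ at $x = 0$, the selected column can be checked against the analytic value (a small sketch reusing the tensors defined above):

col = z_x_auto[0].reshape(100, 100)[:, 0]
print(torch.allclose(col, 2*y**3))  # expected: True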

If we compare the values of $\partial_{x} z$ over the entire range, we get the following.


z_x = 2*X*Y + 2*(Y**3)  # analytic dz/dx for comparison

plt.subplot(1,3,1)
plt.imshow(z_x.reshape(100,100).detach().numpy(),
           extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_x')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.subplot(1,3,2)
plt.imshow(z_x_auto[0].reshape(100,100).detach().numpy(),
           extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_x_auto')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.subplot(1,3,3)
plt.imshow((z_x - z_x_auto[0]).reshape(100,100).detach().numpy(),
           extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_x - z_x_auto')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.show()

To obtain the gradient instead of the partial derivative, use both X and Y as the inputs parameter. The first component of the output becomes $\dfrac{\partial z}{\partial x}$ and the second component becomes $\dfrac{\partial z}{\partial y}$.

>>> z_xy_auto = autograd.grad(z, [X, Y], grad_outputs=torch.ones_like(z), retain_graph=True)
>>> z_xy_auto[0]
tensor([16.0000, 16.0404, 16.0808,  ...,  3.9596,  3.9798,  4.0000])
>>> z_xy_auto[1]
tensor([0.0000, 0.2425, 0.4853,  ..., 6.8388, 6.9193, 7.0000])
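
The second component can likewise be checked against the analytic partial derivative $\partial_{y} z = x^{2} + 6xy^{2}$ (a small sketch reusing X, Y, and z_xy_auto from above):

z_y = X**2 + 6*X*(Y**2)  # analytic dz/dy
print(torch.allclose(z_xy_auto[1], z_y))  # expected: True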

To confirm that the $\partial_{x} z$ obtained here is the same as the one computed above, we plot the two side by side:

plt.subplot(1,2,1)
plt.imshow(z_x_auto[0].reshape(100,100).detach().numpy(),
           extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_x_auto')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.subplot(1,2,2)
plt.imshow(z_xy_auto[0].reshape(100,100).detach().numpy(),
           extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_xy_auto[0]')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.show()

Environment

  • OS: Windows 11
  • Version: Python 3.11.5, torch==2.0.1+cu118, matplotlib==3.8.0