How to Perform Automatic Differentiation in PyTorch
Explanation
We introduce how to perform automatic differentiation in PyTorch, where it is carried out through the torch.autograd.grad function.
torch.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=None, is_grads_batched=False, materialize_grads=False)
- outputs: The function value(s) whose gradient (automatic differentiation) is computed.
- inputs: The point(s) at which the gradient is evaluated.
- grad_outputs: The vector that the computed gradient is multiplied with (the vector in the vector-Jacobian product). Usually it is set to torch.ones_like(outputs). For instance, if grad_outputs = 2*torch.ones_like(outputs), then $2\,\partial\,\mathrm{outputs}/\partial\,\mathrm{inputs}$ will be computed. See the short sketch after this list.
- retain_graph: Determines whether to retain the graph used for the gradient computation. The default is False, meaning the graph is freed after the call, so the gradient of the same output cannot be recomputed. Check the example below for more details.
- create_graph: The default is False; this option decides whether the computed gradients will themselves be differentiable. For example, to compute the second derivative, set it to True.
- allow_unused: Set it to True if some of the tensors passed as the second parameter inputs were not used to compute outputs.
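As a quick sketch of how grad_outputs and allow_unused behave (the tensors u, t, and v below are made up purely for illustration):
import torch
import torch.autograd as autograd

u = torch.linspace(0, 1, 5, requires_grad=True)
t = torch.linspace(0, 1, 5, requires_grad=True)   # never used to compute v
v = u**2                                          # dv/du = 2u

# grad_outputs scales the vector-Jacobian product: 2 * dv/du is returned here.
g = autograd.grad(v, u, grad_outputs=2*torch.ones_like(v), retain_graph=True)
print(g[0])   # tensor([0., 1., 2., 3., 4.]), i.e. 2 * 2u

# t does not appear in the graph of v; without allow_unused=True this raises an
# error, with it the corresponding entry of the result is simply None.
gu, gt = autograd.grad(v, [u, t], grad_outputs=torch.ones_like(v), allow_unused=True)
print(gt)     # None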
Single Variable Function
Let’s define the polynomial $y = x^3 + 2x^2 + 3x + 1$ and find its derivative using automatic differentiation. Based on the explanation above, we write the following code; a second polynomial $z = x^2 + 2x + 3$ is defined as well and will be used later to illustrate retain_graph. Note that the input x must be created with requires_grad=True so that the automatic differentiation graph is generated.
import torch
import torch.autograd as autograd

x = torch.linspace(0, 1, 100, requires_grad=True)   # 100 points on [0, 1]
y = x**3 + 2*x**2 + 3*x + 1
z = x**2 + 2*x + 3                                   # used below to illustrate retain_graph
y_autograd = autograd.grad(y, x, grad_outputs=torch.ones_like(x))
print(y_autograd)
(tensor([ 3.0000, 3.0407, 3.0820, 3.1240, 3.1665, 3.2097, 3.2534, 3.2978,
3.3428, 3.3884, 3.4346, 3.4815, 3.5289, 3.5770, 3.6257, 3.6749,
3.7248, 3.7753, 3.8264, 3.8782, 3.9305, 3.9835, 4.0370, 4.0912,
4.1460, 4.2014, 4.2574, 4.3140, 4.3713, 4.4291, 4.4876, 4.5467,
4.6064, 4.6667, 4.7276, 4.7891, 4.8512, 4.9140, 4.9773, 5.0413,
5.1059, 5.1711, 5.2369, 5.3033, 5.3704, 5.4380, 5.5063, 5.5751,
5.6446, 5.7147, 5.7854, 5.8567, 5.9287, 6.0012, 6.0744, 6.1481,
6.2225, 6.2975, 6.3731, 6.4493, 6.5262, 6.6036, 6.6817, 6.7603,
6.8396, 6.9195, 7.0000, 7.0811, 7.1628, 7.2452, 7.3281, 7.4117,
7.4959, 7.5807, 7.6661, 7.7521, 7.8387, 7.9259, 8.0138, 8.1022,
8.1913, 8.2810, 8.3713, 8.4622, 8.5537, 8.6459, 8.7386, 8.8320,
8.9259, 9.0205, 9.1157, 9.2115, 9.3079, 9.4050, 9.5026, 9.6009,
9.6997, 9.7992, 9.8993, 10.0000]),)
Because retain_graph was not set to True, the graph for y is freed and its gradient cannot be recomputed. The code below shows that while the derivative of z can still be computed, the derivative of y cannot be computed a second time. If you set retain_graph=True, the derivative of y can be obtained without losing the graph of y.
>>> y_autograd = autograd.grad(y, x, grad_outputs=torch.ones_like(x))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\rydbr\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\autograd\__init__.py", line 303, in grad
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
>>> autograd.grad(z, x, grad_outputs=torch.ones_like(x))
(tensor([2.0000, 2.0202, 2.0404, 2.0606, 2.0808, 2.1010, 2.1212, 2.1414, 2.1616,
2.1818, 2.2020, 2.2222, 2.2424, 2.2626, 2.2828, 2.3030, 2.3232, 2.3434,
2.3636, 2.3838, 2.4040, 2.4242, 2.4444, 2.4646, 2.4848, 2.5051, 2.5253,
2.5455, 2.5657, 2.5859, 2.6061, 2.6263, 2.6465, 2.6667, 2.6869, 2.7071,
2.7273, 2.7475, 2.7677, 2.7879, 2.8081, 2.8283, 2.8485, 2.8687, 2.8889,
2.9091, 2.9293, 2.9495, 2.9697, 2.9899, 3.0101, 3.0303, 3.0505, 3.0707,
3.0909, 3.1111, 3.1313, 3.1515, 3.1717, 3.1919, 3.2121, 3.2323, 3.2525,
3.2727, 3.2929, 3.3131, 3.3333, 3.3535, 3.3737, 3.3939, 3.4141, 3.4343,
3.4545, 3.4747, 3.4949, 3.5152, 3.5354, 3.5556, 3.5758, 3.5960, 3.6162,
3.6364, 3.6566, 3.6768, 3.6970, 3.7172, 3.7374, 3.7576, 3.7778, 3.7980,
3.8182, 3.8384, 3.8586, 3.8788, 3.8990, 3.9192, 3.9394, 3.9596, 3.9798,
4.0000]),)
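For reference, here is a minimal sketch of the retain_graph=True case mentioned above, reusing the x defined earlier: the gradient of y can then be computed more than once.
y = x**3 + 2*x**2 + 3*x + 1                       # rebuild the graph for y
first = autograd.grad(y, x, grad_outputs=torch.ones_like(x), retain_graph=True)
second = autograd.grad(y, x, grad_outputs=torch.ones_like(x))   # works: graph was retained
print(torch.allclose(first[0], second[0]))        # expected: True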
Meanwhile, because create_graph=True was not set when calculating y_autograd, the derivative of y_autograd cannot be calculated. To get the second derivative, you must set create_graph to True.
>>> y_autograd2 = autograd.grad(y_autograd, x, grad_outputs=torch.ones_like(x))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\rydbr\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\autograd\__init__.py", line 303, in grad
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
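A minimal sketch of the second-derivative case: recompute the first derivative with create_graph=True so that it is itself differentiable, then differentiate it again (the second derivative of y is $6x + 4$).
y = x**3 + 2*x**2 + 3*x + 1
dy = autograd.grad(y, x, grad_outputs=torch.ones_like(x), create_graph=True)[0]
d2y = autograd.grad(dy, x, grad_outputs=torch.ones_like(x))[0]
print(torch.allclose(d2y, 6*x + 4))   # expected: True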
Let’s check whether the derivative was computed correctly by plotting the analytic derivative $y' = 3x^2 + 4x + 3$ against the autograd result.
import matplotlib.pyplot as plt
y_prime = 3*x**2 + 4*x + 3
plt.plot(x.detach().numpy(), y_prime.detach().numpy(), label='y_prime', linewidth=5, c='blue')
plt.plot(x.detach().numpy(), y_autograd[0].detach().numpy(), label='y_autograd', linestyle='--', c='red', linewidth=3)
plt.legend()
plt.show()
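Besides the visual check, the two can also be compared numerically, for example:
print(torch.allclose(y_autograd[0], y_prime))   # expected: True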
Multi Variable Function
Let’s define the bivariate function $z = x^2 y + 2xy^3$ over the domain $[0, 1] \times [1, 2]$.
x = torch.linspace(0, 1, 100, requires_grad=True)
y = torch.linspace(2, 1, 100, requires_grad=True)    # runs from 2 down to 1
X, Y = torch.meshgrid(x, y, indexing='xy')           # rows index y, columns index x
X, Y = X.reshape(-1), Y.reshape(-1)                  # flatten to 1-D inputs
z = (X**2)*Y + 2*X*(Y**3)
The partial derivative $\partial z/\partial x$ of this function can be obtained as follows. First, compute it with z_x_auto = autograd.grad(z, X, grad_outputs=torch.ones_like(z)), then reshape the result into a $100 \times 100$ matrix and select, for example, the first column, which corresponds to the slice at $x = 0$.
z_x_auto = autograd.grad(z, X, grad_outputs=torch.ones_like(z), retain_graph=True)
z_x_auto[0].reshape(100,100)[:, 0]
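Since the first column corresponds to $x = 0$, where $\partial z/\partial x = 2y^3$, the selected slice can be checked directly (a small sketch):
col0 = z_x_auto[0].reshape(100, 100)[:, 0]   # slice at x = 0
print(torch.allclose(col0, 2*y**3))          # expected: True, since z_x = 2*y**3 at x = 0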
If we compare the values of $\partial z/\partial x$ over the entire domain, we get the following.
z_x = 2*X*Y + 2*(Y**3)
plt.subplot(1,3,1)
plt.imshow(z_x.reshape(100,100).detach().numpy(),
extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_x')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.subplot(1,3,2)
plt.imshow(z_x_auto[0].reshape(100,100).detach().numpy(),
extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_x_auto')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.subplot(1,3,3)
plt.imshow((z_x - z_x_auto[0]).reshape(100,100).detach().numpy(),
extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_x - z_x_auto')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.show()
To obtain the full gradient instead of a single partial derivative, pass both X and Y as the inputs parameter. The first component of the output is $\partial z/\partial x$ and the second component is $\partial z/\partial y$.
>>> z_xy_auto = autograd.grad(z, [X, Y], grad_outputs=torch.ones_like(z), retain_graph=True)
>>> z_xy_auto[0]
tensor([16.0000, 16.0404, 16.0808, ..., 3.9596, 3.9798, 4.0000])
>>> z_xy_auto[1]
tensor([0.0000, 0.2425, 0.4853, ..., 6.8388, 6.9193, 7.0000])
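The second component can be checked against the analytic partial derivative $\partial z/\partial y = x^2 + 6xy^2$ in the same way (a small sketch):
z_y = X**2 + 6*X*(Y**2)                    # analytic partial derivative with respect to y
print(torch.allclose(z_xy_auto[1], z_y))   # expected: True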
To confirm that the $\partial z/\partial x$ obtained this way is the same as the one computed above, we plot both:
plt.subplot(1,2,1)
plt.imshow(z_x_auto[0].reshape(100,100).detach().numpy(),
extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_x_auto')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.subplot(1,2,2)
plt.imshow(z_xy_auto[0].reshape(100,100).detach().numpy(),
extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_xy_auto[0]')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.show()
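The same agreement can also be confirmed numerically:
print(torch.allclose(z_x_auto[0], z_xy_auto[0]))   # expected: True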
Environment
- OS: Windows 11
- Version: Python 3.11.5, torch==2.0.1+cu118, matplotlib==3.8.0