How to Perform Automatic Differentiation in PyTorch

Explanation

This post shows how to perform automatic differentiation in PyTorch. PyTorch provides automatic differentiation through the torch.autograd.grad function.

torch.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=None, is_grads_batched=False, materialize_grads=False)

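outputs: The values of the function whose gradient (automatic derivative) is computed.
inputs: The point at which the gradient is computed, i.e. the tensor to differentiate with respect to.
grad_outputs: The factor multiplied into the computed gradient (the vector in the vector-Jacobian product). Unless there is a special reason, set it to torch.ones_like(outputs). For example, grad_outputs = 2*torch.ones_like(outputs) computes $2\nabla f$.
retain_graph: Whether to keep the graph used to compute the gradient. It defaults to the value of create_graph, which is False. In that case the graph is freed once the call finishes, so the gradient of the same output cannot be computed again. The example below shows what this means in practice.
create_graph: Defaults to False. Decides whether the computed gradient can itself be differentiated again. To obtain a second derivative, set it to True.
allow_unused: Must be set to True if the second argument, inputs, contains a tensor that was not used to compute outputs. The gradient returned for such a tensor is None.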
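
The effect of grad_outputs and allow_unused can be seen on a tiny example. The following is a minimal sketch, not part of the original post: the tensors x and w and the function $y = x^{2}$ are made up for illustration.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w = torch.tensor([5.0], requires_grad=True)  # hypothetical tensor, never used to compute y
y = x**2                                     # dy/dx = 2x

# grad_outputs = ones gives the plain derivative 2x
(g1,) = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y), retain_graph=True)

# grad_outputs = 2 * ones scales the result: 2 * (2x)
(g2,) = torch.autograd.grad(y, x, grad_outputs=2 * torch.ones_like(y), retain_graph=True)

# w is not part of y's graph; with allow_unused=True its gradient comes back as None
g_x, g_w = torch.autograd.grad(y, [x, w], grad_outputs=torch.ones_like(y), allow_unused=True)

print(g1)   # values 2, 4, 6
print(g2)   # values 4, 8, 12
print(g_w)  # None
```

Without allow_unused=True, the last call would raise an error because w does not appear in the graph of y.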

Single-variable functions

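Let's define the polynomial $y = x^{3} + 2x^{2} + 3x + 1$ and compute its derivative with automatic differentiation. Following the description above, we write the code below. Note that the input x must be created with requires_grad=True, otherwise no autograd graph is built.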

import torch
import torch.autograd as autograd 

x = torch.linspace(0, 1, 100, requires_grad=True)
y = x**3 + 2*x**2 + 3*x + 1
z = x**2 + 2*x + 3

y_autograd = autograd.grad(y, x, grad_outputs=torch.ones_like(x))
(tensor([ 3.0000,  3.0407,  3.0820,  3.1240,  3.1665,  3.2097,  3.2534,  3.2978,
         3.3428,  3.3884,  3.4346,  3.4815,  3.5289,  3.5770,  3.6257,  3.6749,
         3.7248,  3.7753,  3.8264,  3.8782,  3.9305,  3.9835,  4.0370,  4.0912,
         4.1460,  4.2014,  4.2574,  4.3140,  4.3713,  4.4291,  4.4876,  4.5467,
         4.6064,  4.6667,  4.7276,  4.7891,  4.8512,  4.9140,  4.9773,  5.0413,
         5.1059,  5.1711,  5.2369,  5.3033,  5.3704,  5.4380,  5.5063,  5.5751,
         5.6446,  5.7147,  5.7854,  5.8567,  5.9287,  6.0012,  6.0744,  6.1481,
         6.2225,  6.2975,  6.3731,  6.4493,  6.5262,  6.6036,  6.6817,  6.7603,
         6.8396,  6.9195,  7.0000,  7.0811,  7.1628,  7.2452,  7.3281,  7.4117,
         7.4959,  7.5807,  7.6661,  7.7521,  7.8387,  7.9259,  8.0138,  8.1022,
         8.1913,  8.2810,  8.3713,  8.4622,  8.5537,  8.6459,  8.7386,  8.8320,
         8.9259,  9.0205,  9.1157,  9.2115,  9.3079,  9.4050,  9.5026,  9.6009,
         9.6997,  9.7992,  9.8993, 10.0000]),)

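Because retain_graph was not set to True, the graph for y has been freed and its gradient cannot be computed again. The code below shows that the derivative of z can still be computed, while the derivative of y cannot be computed a second time. With retain_graph=True, the graph of y is kept even after its derivative has been computed.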

>>> y_autograd = autograd.grad(y, x, grad_outputs=torch.ones_like(x))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\rydbr\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\autograd\__init__.py", line 303, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

>>> autograd.grad(z, x, grad_outputs=torch.ones_like(x))
(tensor([2.0000, 2.0202, 2.0404, 2.0606, 2.0808, 2.1010, 2.1212, 2.1414, 2.1616,
        2.1818, 2.2020, 2.2222, 2.2424, 2.2626, 2.2828, 2.3030, 2.3232, 2.3434,
        2.3636, 2.3838, 2.4040, 2.4242, 2.4444, 2.4646, 2.4848, 2.5051, 2.5253,
        2.5455, 2.5657, 2.5859, 2.6061, 2.6263, 2.6465, 2.6667, 2.6869, 2.7071,
        2.7273, 2.7475, 2.7677, 2.7879, 2.8081, 2.8283, 2.8485, 2.8687, 2.8889,
        2.9091, 2.9293, 2.9495, 2.9697, 2.9899, 3.0101, 3.0303, 3.0505, 3.0707,
        3.0909, 3.1111, 3.1313, 3.1515, 3.1717, 3.1919, 3.2121, 3.2323, 3.2525,
        3.2727, 3.2929, 3.3131, 3.3333, 3.3535, 3.3737, 3.3939, 3.4141, 3.4343,
        3.4545, 3.4747, 3.4949, 3.5152, 3.5354, 3.5556, 3.5758, 3.5960, 3.6162,
        3.6364, 3.6566, 3.6768, 3.6970, 3.7172, 3.7374, 3.7576, 3.7778, 3.7980,
        3.8182, 3.8384, 3.8586, 3.8788, 3.8990, 3.9192, 3.9394, 3.9596, 3.9798,
        4.0000]),)
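
For comparison, here is a minimal sketch of the retain_graph=True case described above. It rebuilds x and y from scratch so that it runs on its own; the names first and second are only illustrative.

```python
import torch
import torch.autograd as autograd

x = torch.linspace(0, 1, 100, requires_grad=True)
y = x**3 + 2*x**2 + 3*x + 1

# Keep the graph alive after the first call so that it can be traversed again
first = autograd.grad(y, x, grad_outputs=torch.ones_like(y), retain_graph=True)[0]

# The second call no longer raises a RuntimeError
second = autograd.grad(y, x, grad_outputs=torch.ones_like(y))[0]

print(torch.allclose(first, second))
```

Only the first call needs retain_graph=True. After the second call the graph is freed as usual.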

Likewise, because create_graph=True was not set when y_autograd was computed, the derivative of y_autograd cannot be computed. To obtain a second derivative, be sure to set this option to True.

>>> y_autograd2 = autograd.grad(y_autograd, x, grad_outputs=torch.ones_like(x))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\rydbr\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\autograd\__init__.py", line 303, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
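
A minimal sketch of the working version, assuming the same polynomial as above: with create_graph=True the first derivative carries its own graph and can be differentiated once more. The names dy and d2y are illustrative.

```python
import torch
import torch.autograd as autograd

x = torch.linspace(0, 1, 100, requires_grad=True)
y = x**3 + 2*x**2 + 3*x + 1

# create_graph=True records the backward pass, so dy/dx is itself differentiable
(dy,) = autograd.grad(y, x, grad_outputs=torch.ones_like(y), create_graph=True)
(d2y,) = autograd.grad(dy, x, grad_outputs=torch.ones_like(dy))

# Compare with the analytic second derivative y'' = 6x + 4
print(torch.allclose(d2y, 6 * x.detach() + 4))
```

Note that the tensor dy is passed here, not the tuple returned by autograd.grad.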

Let's check that the derivative was computed correctly.

import matplotlib.pyplot as plt
y_prime = 3*x**2 + 4*x + 3
plt.plot(x.detach().numpy(), y_prime.detach().numpy(), label='y_prime', linewidth=5, c='blue')
plt.plot(x.detach().numpy(), y_autograd[0].detach().numpy(), label='y_autograd', linestyle='--', c='red', linewidth=3)
plt.legend()
plt.show()
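
Besides the visual comparison, the agreement can also be checked numerically. The following is a small sketch that recomputes everything so it runs on its own; max_abs_error is an illustrative name, and the exact error value depends on floating-point rounding.

```python
import torch
import torch.autograd as autograd

x = torch.linspace(0, 1, 100, requires_grad=True)
y = x**3 + 2*x**2 + 3*x + 1
(y_autograd,) = autograd.grad(y, x, grad_outputs=torch.ones_like(y))

# Analytic derivative y' = 3x^2 + 4x + 3, evaluated without tracking gradients
y_prime = 3 * x.detach()**2 + 4 * x.detach() + 3

# Largest pointwise deviation between the analytic and automatic derivatives
max_abs_error = (y_prime - y_autograd).abs().max().item()
print(max_abs_error)
```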

Multivariable functions

Define the two-variable function $z = x^{2}y + 2xy^{3}$ on the domain $[0, 1] \times [1, 2]$.

x = torch.linspace(0, 1, 100, requires_grad=True)
y = torch.linspace(2, 1, 100, requires_grad=True)
X, Y = torch.meshgrid(x,y, indexing='xy')
X, Y = X.reshape(-1), Y.reshape(-1)

z = (X**2)*Y + 2*X*(Y**3)
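
To see how the grid is laid out, it helps to look at a smaller version. The sketch below uses 3 points per axis instead of 100 (an assumption made only to keep the output readable) and shows which index follows which variable.

```python
import torch

x = torch.linspace(0, 1, 3)
y = torch.linspace(2, 1, 3)
X, Y = torch.meshgrid(x, y, indexing='xy')

# With indexing='xy', rows follow y and columns follow x:
# X[i, j] == x[j] and Y[i, j] == y[i]
print(X)
print(Y)

# After flattening, element k corresponds to row k // 3 and column k % 3,
# so reshaping back to (3, 3) recovers the grid
X_flat, Y_flat = X.reshape(-1), Y.reshape(-1)
print(X_flat.reshape(3, 3).equal(X))
```

Since y runs from 2 down to 1, the first row of the grid corresponds to $y = 2$ and the last row to $y = 1$.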

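The partial derivative $\dfrac{\partial z(x, 1)}{\partial x}$ of this function can be obtained as follows. First compute $\partial_{x} z$ with z_x_auto = autograd.grad(z, X, grad_outputs=torch.ones_like(z)), reshape it into a $100 \times 100$ matrix, and then select the last row. Each row holds a fixed value of y, and since y runs from 2 down to 1, the last row corresponds to $y = 1$.

z_x_auto = autograd.grad(z, X, grad_outputs=torch.ones_like(z), retain_graph=True)
z_x_auto[0].reshape(100,100)[-1, :]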

Comparing the values of $\partial_{x} z$ over the entire domain with the analytic derivative gives the following.


z_x = 2*X*Y + 2*(Y**3)

plt.subplot(1,3,1)
plt.imshow(z_x.reshape(100,100).detach().numpy(),
           extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_x')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.subplot(1,3,2)
plt.imshow(z_x_auto[0].reshape(100,100).detach().numpy(),
           extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_x_auto')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.subplot(1,3,3)
plt.imshow((z_x - z_x_auto[0]).reshape(100,100).detach().numpy(),
           extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_x - z_x_auto')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.show()
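
The same comparison can be made without a plot. The sketch below rebuilds the grid so that it runs on its own and reports the largest deviation from the analytic derivative $2xy + 2y^{3}$. It also checks one row against $2x + 2$, the value of $\partial_{x} z$ at $y = 1$. The names max_abs_error and last_row are illustrative.

```python
import torch
import torch.autograd as autograd

x = torch.linspace(0, 1, 100, requires_grad=True)
y = torch.linspace(2, 1, 100, requires_grad=True)
X, Y = torch.meshgrid(x, y, indexing='xy')
X, Y = X.reshape(-1), Y.reshape(-1)
z = (X**2)*Y + 2*X*(Y**3)

(z_x_auto,) = autograd.grad(z, X, grad_outputs=torch.ones_like(z))
z_x = (2*X*Y + 2*(Y**3)).detach()  # analytic partial derivative

# Largest pointwise deviation over the whole grid
max_abs_error = (z_x - z_x_auto).abs().max().item()
print(max_abs_error)

# The last row of the grid is y = 1, where the derivative reduces to 2x + 2
last_row = z_x_auto.reshape(100, 100)[-1, :]
print(torch.allclose(last_row, 2 * x.detach() + 2))
```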

To obtain the full gradient rather than a single partial derivative, pass both X and Y as inputs. The first component of the output is $\dfrac{\partial z}{\partial x}$ and the second is $\dfrac{\partial z}{\partial y}$.

>>> z_xy_auto = autograd.grad(z, [X, Y], grad_outputs=torch.ones_like(z), retain_graph=True)
>>> z_xy_auto[0]
tensor([16.0000, 16.0404, 16.0808,  ...,  3.9596,  3.9798,  4.0000])
>>> z_xy_auto[1]
tensor([0.0000, 0.2425, 0.4853,  ..., 6.8388, 6.9193, 7.0000])
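
If one gradient vector per grid point is more convenient than a tuple of two tensors, the components can be stacked. This is a sketch under the same setup as above; grad_z and grad_exact are illustrative names, and the small atol is an assumption to absorb floating-point rounding.

```python
import torch
import torch.autograd as autograd

x = torch.linspace(0, 1, 100, requires_grad=True)
y = torch.linspace(2, 1, 100, requires_grad=True)
X, Y = torch.meshgrid(x, y, indexing='xy')
X, Y = X.reshape(-1), Y.reshape(-1)
z = (X**2)*Y + 2*X*(Y**3)

# Passing a list of inputs returns one gradient tensor per input, in the same order
z_x_auto, z_y_auto = autograd.grad(z, [X, Y], grad_outputs=torch.ones_like(z))

# One gradient vector (dz/dx, dz/dy) per grid point: shape (10000, 2)
grad_z = torch.stack([z_x_auto, z_y_auto], dim=-1)

# Analytic gradient (2xy + 2y^3, x^2 + 6xy^2) for comparison
grad_exact = torch.stack([2*X*Y + 2*Y**3, X**2 + 6*X*Y**2], dim=-1).detach()
print(grad_z.shape, torch.allclose(grad_z, grad_exact, atol=1e-6))
```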

Checking whether the $\partial_{x}z$ obtained this way matches the one obtained earlier:

plt.subplot(1,2,1)
plt.imshow(z_x_auto[0].reshape(100,100).detach().numpy(),
           extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_x_auto')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.subplot(1,2,2)
plt.imshow(z_xy_auto[0].reshape(100,100).detach().numpy(),
           extent=(x.min().item(), x.max().item(), y.min().item(), y.max().item()))
plt.title('z_xy_auto[0]')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()
plt.show()

Environment

OS: Windows11
Version: Python 3.11.5, torch==2.0.1+cu118, matplotlib==3.8.0