A table of gradients of vectors and matrices commonly used in machine learning


Gradients of vectors and matrices in machine learning

Gradient of a scalar function:

$$ \frac{ \partial f(\mathbf{w})}{ \partial \mathbf{w} } :=\nabla f(\mathbf{w})=\begin{bmatrix} \dfrac{ \partial f(\mathbf{w})}{ \partial w_{1} } & \dfrac{ \partial f(\mathbf{w})}{ \partial w_{2} } & \cdots & \dfrac{ \partial f(\mathbf{w})}{ \partial w_{n} } \end{bmatrix}^{T} $$
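The component-wise definition above is easy to check numerically. Below is a minimal numpy sketch (the `numerical_gradient` helper and the example function are my own illustrations, not part of the table) that approximates each partial derivative with a central difference:

```python
import numpy as np

def numerical_gradient(f, w, h=1e-6):
    """Approximate each partial derivative df/dw_i with a central difference."""
    grad = np.zeros_like(w, dtype=float)
    for i in range(w.size):
        e = np.zeros_like(w, dtype=float)
        e[i] = h
        grad[i] = (f(w + e) - f(w - e)) / (2 * h)
    return grad

# Example scalar function f(w) = w_1^2 + 3*w_2, whose gradient is [2*w_1, 3].
f = lambda w: w[0] ** 2 + 3 * w[1]
print(numerical_gradient(f, np.array([1.0, 2.0])))  # ≈ [2. 3.]
```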

Gradient of an inner product

Let $f(\mathbf{w})=\mathbf{w}^{T}\mathbf{x}$, where $\mathbf{x}$ does not depend on $\mathbf{w}$. Then

$$ \begin{align*} \frac{ \partial f(\mathbf{w})}{ \partial \mathbf{w} } =\frac{ \partial (\mathbf{w}^{T}\mathbf{x})}{ \partial \mathbf{w} } &=\begin{bmatrix} \dfrac{ \partial \left( \sum _{i=1} ^{n} w_{i}x_{i}\right)}{ \partial w_{1} } & \dfrac{ \partial \left( \sum _{i=1} ^{n} w_{i}x_{i}\right)}{ \partial w_{2} } & \cdots & \dfrac{ \partial \left( \sum _{i=1} ^{n} w_{i}x_{i}\right)}{ \partial w_{n} } \end{bmatrix}^{T} \\ &= \begin{bmatrix} x_{1} & x_{2} & \cdots & x_{n} \end{bmatrix}^{T} \\ &= \mathbf{x} \end{align*} $$

Moreover, since $\mathbf{w}^{T}\mathbf{x} = \mathbf{x}^{T}\mathbf{w}$,

$$ \frac{ \partial \mathbf{x}^{T}\mathbf{w}}{ \partial \mathbf{w} } = \mathbf{x} $$
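The identity $\partial(\mathbf{w}^{T}\mathbf{x})/\partial\mathbf{w}=\mathbf{x}$ can be verified numerically. This is a sketch with numpy; the finite-difference helper is my own illustration:

```python
import numpy as np

def numerical_gradient(f, w, h=1e-6):
    # Central-difference approximation of the gradient of f at w.
    grad = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        grad[i] = (f(w + e) - f(w - e)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
w = rng.standard_normal(5)
x = rng.standard_normal(5)

# The gradient of the inner product w^T x with respect to w is x itself.
grad = numerical_gradient(lambda v: v @ x, w)
print(np.allclose(grad, x, atol=1e-5))  # True
```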

Gradient of a norm

Since $\left\| \mathbf{w} \right\|^{2} = \mathbf{w}^{T} \mathbf{w}$, and $\mathbf{w}$ appears in both factors of the product (each component $w_{k}$ contributes twice when differentiating $\sum_{i} w_{i}^{2}$),

$$ \dfrac{\partial \left\| \mathbf{w} \right\|^{2}}{\partial \mathbf{w}} = \dfrac{\partial (\mathbf{w}^{T}\mathbf{w})}{\partial \mathbf{w}} = 2\mathbf{w} $$
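The factor of two distinguishes this case from the inner product with a constant $\mathbf{x}$; a numerical check (a numpy sketch of mine, not from the table) confirms it:

```python
import numpy as np

def numerical_gradient(f, w, h=1e-6):
    # Central-difference approximation of the gradient of f at w.
    grad = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        grad[i] = (f(w + e) - f(w - e)) / (2 * h)
    return grad

rng = np.random.default_rng(1)
w = rng.standard_normal(5)

# Gradient of ||w||^2 = w^T w is 2w, since w appears in both factors.
grad = numerical_gradient(lambda v: v @ v, w)
print(np.allclose(grad, 2 * w, atol=1e-5))  # True
```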

Gradient of a quadratic form

To make the differentiation easier, first expand the expression, isolating the terms that contain $w_{k}$. For an arbitrary $k\in\left\{1,\dots,n\right\}$ we obtain the following.

$$ \begin{align*} f(\mathbf{w}) &= \mathbf{w}^{T}\mathbf{R}\mathbf{w} =\sum \limits _{j=1} ^{n} w_{j} \sum \limits _{i=1} ^{n}r_{ji}w_{i} \\ & = w_{k} \sum \limits _{i=1} ^{n}r_{ki}w_{i} +\sum \limits _{j\ne k}w_{j} \sum \limits _{i=1} ^{n}r_{ji}w_{i} \\ &= w_{k}\left(r_{kk}w_{k} + \sum \limits _{i\ne k} r_{ki}w_{i}\right) +\sum \limits _{j\ne k}w_{j} \left(r_{jk}w_{k}+ \sum \limits _{i \ne k} r_{ji}w_{i} \right) \\ &= w_{k}^{2}r_{kk} + w_{k}\sum \limits _{i\ne k} r_{ki}w_{i} + \sum \limits _{j\ne k}w_{j}r_{jk}w_{k}+ \sum \limits _{j\ne k}\sum \limits _{i \ne k}w_{j} r_{ji}w_{i} \end{align*} $$

Therefore, computing $\dfrac{ \partial f(\mathbf{w})}{ \partial w_{k}}$ gives the following.

$$ \begin{align*} \frac{ \partial f(\mathbf{w})}{ \partial w_{k} }&=\frac{ \partial }{ \partial w_{k} } \left( w_{k}^{2}r_{kk} + w_{k}\sum \limits _{i\ne k} r_{ki}w_{i} + \sum \limits _{j\ne k}w_{j}r_{jk}w_{k}+ \sum \limits _{j\ne k}\sum \limits _{i \ne k}w_{j} r_{ji}w_{i} \right) \\ &=2w_{k}r_{kk} + \sum \limits _{i\ne k} r_{ki}w_{i} + \sum \limits _{j\ne k}w_{j}r_{jk} \\ &=\sum \limits _{i=1}^{n} r_{ki}w_{i} + \sum \limits _{j=1}^{n}r_{jk}w_{j} \end{align*} $$

Therefore, the gradient of $f$ is as follows.

$$ \begin{align*} \frac{ \partial f(\mathbf{w})}{ \partial \mathbf{w} } = \frac{ \partial (\mathbf{w}^{T}\mathbf{R}\mathbf{w})}{ \partial \mathbf{w} } &= \begin{bmatrix} \dfrac{ \partial f (\mathbf{w})}{ \partial w_{1} } & \dfrac{ \partial f (\mathbf{w})}{ \partial w_{2} } & \dots & \dfrac{ \partial f (\mathbf{w})}{ \partial w_{n} } \end{bmatrix}^{T} \\[1em] &= \begin{bmatrix} \sum _{i=1} ^{n} r_{1i}w_{i} + \sum _{j=1}^{n} r_{j1}w_{j} \\[0.5em] \sum _{i=1} ^{n} r_{2i}w_{i} + \sum _{j=1}^{n} r_{j2}w_{j} \\[0.5em] \vdots \\[0.5em] \sum _{i=1} ^{n} r_{ni}w_{i} + \sum _{j=1}^{n} r_{jn}w_{j} \end{bmatrix} \\[3.5em] &= \begin{bmatrix} \sum _{i=1} ^{n} r_{1i}w_{i} \\[0.5em] \sum _{i=1} ^{n} r_{2i}w_{i} \\[0.5em] \vdots \\[0.5em] \sum _{i=1} ^{n} r_{ni}w_{i} \end{bmatrix} + \begin{bmatrix} \sum _{j=1}^{n} r_{j1}w_{j} \\[0.5em] \sum _{j=1}^{n} r_{j2}w_{j} \\[0.5em] \vdots \\[0.5em] \sum _{j=1}^{n} r_{jn}w_{j} \end{bmatrix} \\[3.5em] &=\mathbf{R} \mathbf{w} + \mathbf{R}^{T}\mathbf{w}=\left( \mathbf{R}+\mathbf{R}^{T} \right)\mathbf{w} \end{align*} $$

The last equality holds because the $k$-th entries of the two column vectors are exactly $(\mathbf{R}\mathbf{w})_{k}$ and $(\mathbf{R}^{T}\mathbf{w})_{k}$. If $\mathbf{R}$ is a symmetric matrix, then

$$ \frac{ \partial }{ \partial \mathbf{w} }\left( \mathbf{w}^{T}\mathbf{R}\mathbf{w} \right)=2\mathbf{R}\mathbf{w} $$
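Both the general case $(\mathbf{R}+\mathbf{R}^{T})\mathbf{w}$ and the symmetric case $2\mathbf{R}\mathbf{w}$ can be verified numerically. This is a numpy sketch of mine, not part of the table:

```python
import numpy as np

def numerical_gradient(f, w, h=1e-6):
    # Central-difference approximation of the gradient of f at w.
    grad = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        grad[i] = (f(w + e) - f(w - e)) / (2 * h)
    return grad

rng = np.random.default_rng(2)
n = 4
R = rng.standard_normal((n, n))   # general (non-symmetric) matrix
w = rng.standard_normal(n)

# General case: gradient of w^T R w is (R + R^T) w.
grad = numerical_gradient(lambda v: v @ R @ v, w)
print(np.allclose(grad, (R + R.T) @ w, atol=1e-5))  # True

# Symmetric case: if R = R^T, the gradient reduces to 2 R w.
S = (R + R.T) / 2
grad_s = numerical_gradient(lambda v: v @ S @ v, w)
print(np.allclose(grad_s, 2 * S @ w, atol=1e-5))  # True
```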
