Derivatives of Vectors and Matrices
Gradient of a Scalar Function
The gradient of a scalar function $f : \mathbb{R}^{n} \to \mathbb{R}$ is defined as follows.
$$
\frac{ \partial f(\mathbf{x})}{ \partial \mathbf{x} } := \nabla f(\mathbf{x}) = \begin{bmatrix}
\dfrac{ \partial f(\mathbf{x})}{ \partial x_{1} } & \dfrac{ \partial f(\mathbf{x})}{ \partial x_{2} } & \cdots & \dfrac{ \partial f(\mathbf{x})}{ \partial x_{n} }
\end{bmatrix}^{T}
$$
Here, $\dfrac{ \partial f(\mathbf{x})}{ \partial x_{i} }$ is the partial derivative of $f$ with respect to $x_{i}$.
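The definition can be checked numerically. The following is a minimal sketch, not part of the original text, that approximates each partial derivative with a central difference. The names `numerical_gradient` and `example_f`, the step size `h`, and the test function are assumptions chosen only for illustration.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x, one partial derivative at a time."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h   # perturb only the i-th coordinate
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def example_f(x):
    # Hypothetical test function: f(x) = x_1^2 + 3 x_2, so the gradient is [2 x_1, 3].
    return x[0] ** 2 + 3 * x[1]


approx = numerical_gradient(example_f, [1.0, 2.0])
print(approx)  # close to [2.0, 3.0], up to floating-point error
```

The result is a list with one entry per coordinate, mirroring the column vector in the definition.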
Inner Product
For a fixed $\mathbf{w} \in \mathbb{R}^{n}$, let $f(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x}$. Then
$$
\frac{ \partial f(\mathbf{x})}{ \partial \mathbf{x} } = \frac{ \partial (\mathbf{w}^{T}\mathbf{x})}{ \partial \mathbf{x} } = \frac{ \partial (\mathbf{x}^{T}\mathbf{w})}{ \partial \mathbf{x} } = \mathbf{w}
$$
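As a small worked instance (the vector $\mathbf{w}$ below is an arbitrary choice for illustration), take $n = 3$ and $\mathbf{w} = \begin{bmatrix} 2 & -1 & 3 \end{bmatrix}^{T}$. Each partial derivative picks out the coefficient of the corresponding $x_{i}$:

$$
f(\mathbf{x}) = 2x_{1} - x_{2} + 3x_{3}, \qquad
\frac{ \partial f(\mathbf{x})}{ \partial \mathbf{x} } = \begin{bmatrix}
2 & -1 & 3
\end{bmatrix}^{T} = \mathbf{w}
$$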
Norm
Let $f(\mathbf{x}) = \left\| \mathbf{x} \right\|^{2}$. Then
$$
\nabla f(\mathbf{x}) = \dfrac{\partial \left\| \mathbf{x} \right\|^{2}}{\partial \mathbf{x}} = 2\mathbf{x}
$$
Quadratic Form
For an $n \times n$ matrix $\mathbf{R}$, let $f(\mathbf{x}) = \mathbf{x}^{T}\mathbf{R}\mathbf{x}$. Then
$$
\dfrac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = \dfrac{\partial (\mathbf{x}^{T}\mathbf{R}\mathbf{x})}{\partial \mathbf{x}} = (\mathbf{R} + \mathbf{R}^{T})\mathbf{x}
$$
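To see the formula at work on a concrete case, consider a $2 \times 2$ example with a non-symmetric matrix (the entries are chosen arbitrarily for illustration). Expanding $\mathbf{x}^{T}\mathbf{R}\mathbf{x}$ and differentiating term by term gives the same vector as $(\mathbf{R} + \mathbf{R}^{T})\mathbf{x}$:

$$
\begin{align*}
\mathbf{R} &= \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad
\mathbf{x}^{T}\mathbf{R}\mathbf{x} = x_{1}^{2} + 5x_{1}x_{2} + 4x_{2}^{2}
\\ \frac{ \partial (\mathbf{x}^{T}\mathbf{R}\mathbf{x})}{ \partial \mathbf{x} } &= \begin{bmatrix} 2x_{1} + 5x_{2} \\ 5x_{1} + 8x_{2} \end{bmatrix}
= \begin{bmatrix} 2 & 5 \\ 5 & 8 \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}
= (\mathbf{R} + \mathbf{R}^{T})\mathbf{x}
\end{align*}
$$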
Proof
Inner Product
$$
\begin{align*}
\frac{ \partial f(\mathbf{x})}{ \partial \mathbf{x} } =\frac{ \partial (\mathbf{w}^{T}\mathbf{x})}{ \partial \mathbf{x} } &=\begin{bmatrix}
\dfrac{ \partial \left( \sum _{i=1} ^{n} w_{i}x_{i}\right)}{ \partial x_{1} } & \dfrac{ \partial \left( \sum _{i=1} ^{n} w_{i}x_{i}\right)}{ \partial x_{2} } & \cdots & \dfrac{ \partial \left( \sum _{i=1} ^{n} w_{i}x_{i}\right)}{ \partial x_{n} }
\end{bmatrix}^{T}
\\ &= \begin{bmatrix}
w_{1} & w_{2} & \cdots & w_{n}
\end{bmatrix}^{T}
\\ &= \mathbf{w}
\end{align*}
$$
Also, since $\mathbf{w}^{T}\mathbf{x} = \mathbf{x}^{T}\mathbf{w}$,
$$
\frac{ \partial \mathbf{x}^{T}\mathbf{w}}{ \partial \mathbf{x} } = \mathbf{w}
$$
■
Norm
Since $\left\| \mathbf{x} \right\|^{2} = \mathbf{x}^{T}\mathbf{x}$, the computation proceeds as in the proof for the inner product.
$$
\begin{align*}
\frac{ \partial f(\mathbf{x})}{ \partial \mathbf{x} } = \frac{ \partial (\mathbf{x}^{T}\mathbf{x})}{ \partial \mathbf{x} } &=\begin{bmatrix}
\dfrac{ \partial \left( \sum _{i=1} ^{n} x_{i}x_{i}\right)}{ \partial x_{1} } & \dfrac{ \partial \left( \sum _{i=1} ^{n} x_{i}x_{i}\right)}{ \partial x_{2} } & \cdots & \dfrac{ \partial \left( \sum _{i=1} ^{n} x_{i}x_{i}\right)}{ \partial x_{n} }
\end{bmatrix}^{T} \\
&=\begin{bmatrix}
\dfrac{ \partial \left( \sum _{i=1} ^{n} x_{i}^{2}\right)}{ \partial x_{1} } & \dfrac{ \partial \left( \sum _{i=1} ^{n} x_{i}^{2}\right)}{ \partial x_{2} } & \cdots & \dfrac{ \partial \left( \sum _{i=1} ^{n} x_{i}^{2}\right)}{ \partial x_{n} }
\end{bmatrix}^{T} \\
&= \begin{bmatrix}
2x_{1} & 2x_{2} & \cdots & 2x_{n}
\end{bmatrix}^{T} \\
&= 2\mathbf{x}
\end{align*}
$$
■
Quadratic Form
To simplify the differentiation, first separate out the terms of $f$ that contain $x_{k}$. Writing $r_{ji}$ for the $(j,i)$ entry of $\mathbf{R}$, the following holds for any $k\in\left\{1,\dots,n\right\}$.
$$
\begin{align*}
f(\mathbf{x}) &= \mathbf{x}^{T}\mathbf{R}\mathbf{x} =\sum \limits _{j=1} ^{n} x_{j} \sum \limits _{i=1} ^{n}r_{ji}x_{i}
\\ & = x_{k} \sum \limits _{i=1} ^{n}r_{ki}x_{i} +\sum \limits _{j\ne k}x_{j} \sum \limits _{i=1} ^{n}r_{ji}x_{i}
\\ &= x_{k}\left(r_{kk}x_{k} + \sum \limits _{i\ne k} r_{ki}x_{i}\right) +\sum \limits _{j\ne k}x_{j} \left(r_{jk}x_{k}+ \sum \limits _{i \ne k} r_{ji}x_{i} \right)
\\ &= x_{k}^{2}r_{kk} + x_{k}\sum \limits _{i\ne k} r_{ki}x_{i} + \sum \limits _{j\ne k}x_{j}r_{jk}x_{k}+ \sum \limits _{j\ne k}\sum \limits _{i \ne k}x_{j} r_{ji}x_{i}
\end{align*}
$$
Differentiating with respect to $x_{k}$, the double sum over $j \ne k$, $i \ne k$ does not contain $x_{k}$ and vanishes. In the last step, $2x_{k}r_{kk}$ supplies the missing $i = k$ and $j = k$ terms of the two sums.
$$
\begin{align*}
\frac{ \partial f(\mathbf{x})}{ \partial x_{k} }&=\frac{ \partial }{ \partial x_{k} } \left( x_{k}^{2}r_{kk} + x_{k}\sum \limits _{i\ne k} r_{ki}x_{i} + \sum \limits _{j\ne k}x_{j}r_{jk}x_{k}+ \sum \limits _{j\ne k}\sum \limits _{i \ne k}x_{j} r_{ji}x_{i} \right)
\\ &=2x_{k}r_{kk} + \sum \limits _{i\ne k} r_{ki}x_{i} + \sum \limits _{j\ne k}x_{j}r_{jk}
\\ &=\sum \limits _{i=1}^{n} r_{ki}x_{i} + \sum \limits _{j=1}^{n}r_{jk}x_{j}
\end{align*}
$$
Therefore, the gradient of $f$ is as follows.
$$
\begin{align*}
\frac{ \partial f(\mathbf{x})}{ \partial \mathbf{x} } = \dfrac{\partial \left( \mathbf{x}^{T}\mathbf{R}\mathbf{x} \right)}{\partial \mathbf{x}} &= \begin{bmatrix}
\dfrac{ \partial f (\mathbf{x})}{ \partial x_{1} } & \dfrac{ \partial f (\mathbf{x})}{ \partial x_{2} } & \dots & \dfrac{ \partial f (\mathbf{x})}{ \partial x_{n} }
\end{bmatrix}^{T}
\\[1em] &= \begin{bmatrix}
\sum _{i=1} ^{n} r_{1i}x_{i} + \sum _{j=1}^{n} r_{j1}x_{j}
\\[0.5em] \sum _{i=1} ^{n} r_{2i}x_{i} + \sum _{j=1}^{n} r_{j2}x_{j}
\\[0.5em] \vdots
\\[0.5em] \sum _{i=1} ^{n} r_{ni}x_{i} + \sum _{j=1}^{n} r_{jn}x_{j}
\end{bmatrix}
\\[3.5em] &= \begin{bmatrix}
\sum _{i=1} ^{n} r_{1i}x_{i}
\\[0.5em] \sum _{i=1} ^{n} r_{2i}x_{i}
\\[0.5em] \vdots
\\[0.5em] \sum _{i=1} ^{n} r_{ni}x_{i}
\end{bmatrix} + \begin{bmatrix}
\sum _{j=1}^{n} r_{j1}x_{j}
\\[0.5em] \sum _{j=1}^{n} r_{j2}x_{j}
\\[0.5em] \vdots
\\[0.5em] \sum _{j=1}^{n} r_{jn}x_{j}
\end{bmatrix}
\\[3.5em] &=\mathbf{R} \mathbf{x} + \mathbf{R}^{T}\mathbf{x}=\left( \mathbf{R}+\mathbf{R}^{T} \right)\mathbf{x}
\end{align*}
$$
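The component formula from the proof can be checked on a small case. The sketch below is an illustration under stated assumptions, not part of the original derivation: the function names, the matrix `R`, the point `x`, and the step size `h` are all chosen here for the example. It evaluates $\sum_{i} r_{ki}x_{i} + \sum_{j} r_{jk}x_{j}$ for each $k$ and compares it with a central-difference estimate of $\partial f / \partial x_{k}$.

```python
def quadratic_form(R, x):
    """f(x) = x^T R x written as the double sum over r_ji."""
    n = len(x)
    return sum(x[j] * R[j][i] * x[i] for j in range(n) for i in range(n))


def quadratic_form_gradient(R, x):
    """k-th component: sum_i r_ki x_i + sum_j r_jk x_j, i.e. ((R + R^T) x)_k."""
    n = len(x)
    return [
        sum(R[k][i] * x[i] for i in range(n)) + sum(R[j][k] * x[j] for j in range(n))
        for k in range(n)
    ]


# Hypothetical non-symmetric matrix and evaluation point, for illustration only.
R = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, -1.0]
grad = quadratic_form_gradient(R, x)

# Central-difference estimate of each partial derivative, for comparison.
h = 1e-6
numeric = []
for k in range(len(x)):
    plus = list(x)
    minus = list(x)
    plus[k] += h
    minus[k] -= h
    numeric.append((quadratic_form(R, plus) - quadratic_form(R, minus)) / (2 * h))

print(grad)     # [-3.0, -3.0]
print(numeric)  # agrees with grad up to floating-point error
```

The row sum `R[k][i] * x[i]` corresponds to $\mathbf{R}\mathbf{x}$ and the column sum `R[j][k] * x[j]` to $\mathbf{R}^{T}\mathbf{x}$, which is why the two terms combine into $(\mathbf{R} + \mathbf{R}^{T})\mathbf{x}$.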
Matrix-Vector Multiplication
The last step above uses the component form of a matrix-vector product. For $\mathbf{X} \in M_{n\times n}$ and $\mathbf{y} \in M_{n \times 1}$,
$$
\mathbf{X}\mathbf{y} =\begin{bmatrix}
\sum _{i=1} ^{n}x_{1i}y_{i} \\
\sum _{i=1} ^{n}x_{2i}y_{i} \\
\vdots \\
\sum _{i=1} ^{n}x_{ni}y_{i}
\end{bmatrix},\qquad
\mathbf{X}^{T}\mathbf{y} =\begin{bmatrix}
\sum _{i=1} ^{n}x_{i1}y_{i} \\
\sum _{i=1} ^{n}x_{i2}y_{i} \\
\vdots \\
\sum _{i=1} ^{n}x_{in}y_{i}
\end{bmatrix}
$$
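The only difference between the two products is which index of $\mathbf{X}$ is summed over. A minimal sketch of the two sums follows; the function names and the sample values are assumptions made for illustration.

```python
def matvec(X, y):
    """(X y)_k = sum_i x_ki y_i: row k of X against y."""
    n = len(y)
    return [sum(X[k][i] * y[i] for i in range(n)) for k in range(n)]


def matvec_transposed(X, y):
    """(X^T y)_k = sum_i x_ik y_i: column k of X against y."""
    n = len(y)
    return [sum(X[i][k] * y[i] for i in range(n)) for k in range(n)]


# Hypothetical 2 x 2 example.
X = [[1, 2], [3, 4]]
y = [5, 6]
Xy = matvec(X, y)
XTy = matvec_transposed(X, y)
print(Xy)   # [17, 39]
print(XTy)  # [23, 34]
```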
If $\mathbf{R}$ is a symmetric matrix, then $\mathbf{R}^{T} = \mathbf{R}$, so
$$
\frac{ \partial }{ \partial \mathbf{x} }\left( \mathbf{x}^{T}\mathbf{R}\mathbf{x} \right)=2\mathbf{R}\mathbf{x}
$$
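As a quick check of the symmetric case, take the following $2 \times 2$ symmetric matrix (its entries are an arbitrary choice for illustration):

$$
\begin{align*}
\mathbf{R} &= \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}, \qquad
\mathbf{x}^{T}\mathbf{R}\mathbf{x} = 2x_{1}^{2} + 2x_{1}x_{2} + 3x_{2}^{2}
\\ \frac{ \partial }{ \partial \mathbf{x} }\left( \mathbf{x}^{T}\mathbf{R}\mathbf{x} \right) &= \begin{bmatrix} 4x_{1} + 2x_{2} \\ 2x_{1} + 6x_{2} \end{bmatrix}
= 2\begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}
= 2\mathbf{R}\mathbf{x}
\end{align*}
$$

Taking $\mathbf{R}$ to be the identity matrix gives $\mathbf{x}^{T}\mathbf{x} = \left\| \mathbf{x} \right\|^{2}$ with gradient $2\mathbf{x}$, which agrees with the norm result.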