Matrix Calculus of Trace
Let $\mathbf{X}$ be an $n \times n$ matrix, and write $\dfrac{\partial }{\partial \mathbf{X}} = \nabla_{\mathbf{X}}$ for the matrix gradient. Then the following formulas hold:
$$
\dfrac{\partial \Tr(\mathbf{X})}{\partial \mathbf{X}} = I, \qquad \dfrac{\partial \Tr(a\mathbf{X})}{\partial \mathbf{X}} = aI \tag{1}
$$
Here, $a \in \mathbb{R}$ is a constant (scalar), and $I$ is the identity matrix.
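Formula $(1)$ can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper `numerical_gradient`, the seed, and the values of `n` and `a` are illustrative choices that do not come from the original text. It estimates each entry $\partial f / \partial x_{ij}$ by a central difference and compares the result with $I$ and $aI$.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Hypothetical helper: central-difference estimate of [df/dx_ij]."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps  # perturb only the (i, j) entry
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)  # arbitrary seed
n, a = 4, 2.5                   # illustrative size and scalar
X = rng.standard_normal((n, n))

grad_trace = numerical_gradient(np.trace, X)
grad_scaled = numerical_gradient(lambda M: np.trace(a * M), X)

print(np.allclose(grad_trace, np.eye(n)))       # compares with I
print(np.allclose(grad_scaled, a * np.eye(n)))  # compares with a * I
```

Because the trace is linear in the entries, the finite-difference estimate is exact up to rounding, so the comparison does not depend on the particular random matrix.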
Suppose $\mathbf{A} \in \mathbb{R}^{n \times p}$ and $\mathbf{X} \in \mathbb{R}^{p \times n}$. Then the following holds:
$$
\dfrac{\partial \Tr(\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \dfrac{\partial \Tr( \mathbf{X}\mathbf{A})}{\partial \mathbf{X}} = \mathbf{A}^{\mathsf{T}} \tag{2}
$$
If instead $\mathbf{A}$ has the same shape as $\mathbf{X}$, so that the products $\mathbf{A}\mathbf{X}^{\mathsf{T}}$ and $\mathbf{X}^{\mathsf{T}}\mathbf{A}$ are defined, then:
$$
\dfrac{\partial \Tr(\mathbf{A}\mathbf{X}^{\mathsf{T}})}{\partial \mathbf{X}} = \dfrac{\partial \Tr( \mathbf{X}^{\mathsf{T}}\mathbf{A})}{\partial \mathbf{X}} = \mathbf{A}
$$
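A quick numerical check of the linear case, offered as a sketch rather than a proof. It assumes NumPy; `numerical_gradient` is a hypothetical helper, and the sizes `n`, `p` are arbitrary. For the transposed identity the sketch uses a separate constant matrix `C` with the same shape as `X`, which is an assumption made so that the product with `X.T` is defined.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Hypothetical helper: central-difference estimate of [df/dx_ij]."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n, p = 3, 5                       # illustrative sizes
A = rng.standard_normal((n, p))
X = rng.standard_normal((p, n))
C = rng.standard_normal((p, n))   # same shape as X (assumption, see lead-in)

grad_AX = numerical_gradient(lambda M: np.trace(A @ M), X)
grad_XA = numerical_gradient(lambda M: np.trace(M @ A), X)
grad_CXt = numerical_gradient(lambda M: np.trace(C @ M.T), X)
grad_XtC = numerical_gradient(lambda M: np.trace(M.T @ C), X)

print(np.allclose(grad_AX, A.T), np.allclose(grad_XA, A.T))
print(np.allclose(grad_CXt, C), np.allclose(grad_XtC, C))
```

Note that the gradient always has the shape of `X`, which is why the transpose appears in the first pair and not in the second.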
As a corollary, the following holds for $\mathbf{A} \in \mathbb{R}^{n \times p}$, $\mathbf{X} \in \mathbb{R}^{p \times q}$, $\mathbf{B} \in \mathbb{R}^{q \times n}$:
$$
\dfrac{\partial \Tr(\mathbf{A}\mathbf{X}\mathbf{B})}{\partial \mathbf{X}} = \mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}} \tag{3}
$$
Let $\mathbf{A} \in \mathbb{R}^{n \times n}$ and $\mathbf{X} \in \mathbb{R}^{m \times n}$. Then the following holds:
$$
\dfrac{\partial \Tr(\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X})}{\partial \mathbf{X}} = \dfrac{\partial \Tr(\mathbf{X}^{\mathsf{T}}\mathbf{X}\mathbf{A})}{\partial \mathbf{X}} = \dfrac{\partial \Tr(\mathbf{X}\mathbf{A}\mathbf{X}^{\mathsf{T}})}{\partial \mathbf{X}} = \mathbf{X}(\mathbf{A}^{\mathsf{T}} + \mathbf{A}) \tag{4}
$$
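The three expressions in $(4)$ can be compared against the closed form in one pass. This is a minimal sketch assuming NumPy; the helper `numerical_gradient`, the dictionary `forms`, and the sizes `m`, `n` are illustrative names and values introduced here.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Hypothetical helper: central-difference estimate of [df/dx_ij]."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
m, n = 3, 4                       # illustrative sizes
A = rng.standard_normal((n, n))   # deliberately not symmetric
X = rng.standard_normal((m, n))

expected = X @ (A.T + A)          # right-hand side of formula (4)

forms = {
    "Tr(A X^T X)": lambda M: np.trace(A @ M.T @ M),
    "Tr(X^T X A)": lambda M: np.trace(M.T @ M @ A),
    "Tr(X A X^T)": lambda M: np.trace(M @ A @ M.T),
}
results = {
    name: bool(np.allclose(numerical_gradient(f, X), expected, atol=1e-6))
    for name, f in forms.items()
}
print(results)
```

Using a non-symmetric `A` matters here: with a symmetric `A` the check could not distinguish $\mathbf{X}(\mathbf{A}^{\mathsf{T}} + \mathbf{A})$ from $2\mathbf{X}\mathbf{A}$.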
For $\mathbf{A}, \mathbf{X} \in \mathbb{R}^{n \times n}$, the following holds:
$$
\dfrac{\partial \Tr(\mathbf{A} \mathbf{X}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}} + \mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}} \tag{5}
$$
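Formula $(5)$ can also be checked through a directional derivative: if $G$ is the gradient, then the derivative of $f$ at $\mathbf{X}$ along a direction $\mathbf{V}$ should equal $\Tr(G^{\mathsf{T}}\mathbf{V}) = \sum_{i,j} G_{ij}V_{ij}$. The sketch below assumes NumPy; the direction `V`, the step `h`, and the seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
V = rng.standard_normal((n, n))   # arbitrary direction

def f(M):
    return np.trace(A @ M @ M)

G = X.T @ A.T + A.T @ X.T         # claimed gradient from formula (5)

h = 1e-6
directional_fd = (f(X + h * V) - f(X - h * V)) / (2 * h)
directional_formula = np.sum(G * V)   # equals Tr(G^T V)

print(np.isclose(directional_fd, directional_formula, atol=1e-6))
```

Agreement along one random direction is evidence rather than proof, but a wrong transpose or a swapped factor in `G` would almost surely break it.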
For $\mathbf{A}, \mathbf{B}, \mathbf{X} \in \mathbb{R}^{n \times n}$, the following holds:
$$
\nabla_{\mathbf{X}} \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X}) = \dfrac{\partial \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X})}{\partial \mathbf{X}} = \mathbf{B} \mathbf{X} \mathbf{A} + \mathbf{B}^{\mathsf{T}} \mathbf{X} \mathbf{A}^{\mathsf{T}} \tag{6}
$$
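As a sketch of how formula $(6)$ might be tested, the snippet below compares finite-difference directional derivatives with $\sum_{i,j} G_{ij}V_{ij}$ for several random directions. It assumes NumPy; the number of directions, the step `h`, and the seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A, B, X = (rng.standard_normal((n, n)) for _ in range(3))

def f(M):
    return np.trace(A @ M.T @ B @ M)

G = B @ X @ A + B.T @ X @ A.T     # claimed gradient from formula (6)

h = 1e-6
checks = []
for _ in range(5):                # a handful of random directions
    V = rng.standard_normal((n, n))
    fd = (f(X + h * V) - f(X - h * V)) / (2 * h)
    checks.append(bool(np.isclose(fd, np.sum(G * V), atol=1e-6)))

print(all(checks))
```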
For $\mathbf{X} \in \mathbb{R}^{n \times n}$ and a natural number $n$, the following holds (the exponent need not equal the matrix size; the symbol $n$ is simply reused, and the formula is valid for any square $\mathbf{X}$):
$$
\dfrac{\partial \Tr(\mathbf{X}^{n})}{\partial \mathbf{X}} = n(\mathbf{X}^{n-1})^{\mathsf{T}} \tag{7}
$$
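A numerical check of $(7)$, written so that the matrix size and the exponent are chosen independently. This is a sketch assuming NumPy; the names `size` and `power`, the step `h`, and the seed are illustrative and not part of the original.

```python
import numpy as np

rng = np.random.default_rng(3)
size, power = 4, 3                # matrix size and exponent, chosen independently
X = rng.standard_normal((size, size))
V = rng.standard_normal((size, size))   # arbitrary direction

def f(M):
    return np.trace(np.linalg.matrix_power(M, power))

# Claimed gradient from formula (7): power * (X^(power-1))^T
G = power * np.linalg.matrix_power(X, power - 1).T

h = 1e-5
fd = (f(X + h * V) - f(X - h * V)) / (2 * h)

print(np.isclose(fd, np.sum(G * V), atol=1e-5))
```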
Explanation
The trace $\Tr A$ of a matrix $A$ is the sum of all its diagonal elements. While it can be viewed simply as a value, it can also be regarded as a function that maps a matrix to a scalar. In this sense, the trace is a function defined as follows:
$$
\Tr : \mathbb{R}^{n \times n} \to \mathbb{R}
$$
Once the trace is viewed as a function, it is natural to ask about its derivative. Since $\Tr(A) = \sum\limits_{i=1}^{n} a_{ii}$ is a linear function of the entries of $A$, differentiability is not in question. What may be less intuitive is that the variable is a matrix; for details, refer to the article on the matrix gradient.
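The linearity mentioned above is easy to see in code. The following is a small sketch assuming NumPy; the matrices and the scalars `a`, `b` are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
a, b = 2.0, -3.0                  # arbitrary scalars

# The trace as the sum of the diagonal entries
diagonal_sum = sum(A[i, i] for i in range(n))

# Linearity: Tr(aA + bB) = a Tr(A) + b Tr(B)
lhs = np.trace(a * A + b * B)
rhs = a * np.trace(A) + b * np.trace(B)

print(np.isclose(np.trace(A), diagonal_sum))
print(np.isclose(lhs, rhs))
```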
The results above closely resemble scalar differentiation, which is why the trace serves as a tool for handling matrix differentiation intuitively. In particular, when $\mathbf{A}, \mathbf{B}, \mathbf{X}$ are all symmetric, so that transposes, a feature specific to matrices, need not be tracked, the results line up closely with the differentiation of polynomial functions.
$(1)$: The identity matrix $I$ is the identity element of matrix multiplication, so this corresponds to the scalar derivative $\dfrac{d x}{d x} = 1$.
$(2)$, $(3)$: These give intuitive results corresponding to the differentiation of linear functions.
$(4)$, $(5)$: These correspond to the differentiation of quadratic functions.
$(7)$: This corresponds to the differentiation of polynomial functions.
While the proofs below use direct calculation, the method known as the trace trick simplifies the calculations. For an arbitrary expression in $\mathbf{X}$, or one containing multiple instances of $\mathbf{X}$, direct calculation is practically infeasible, and the trace trick becomes necessary.
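To illustrate why such a shortcut helps, here is a sketch for formula $(5)$. It assumes that the trace trick refers to the usual differential argument: write $d\,\Tr(f(\mathbf{X}))$ in the form $\Tr(\mathbf{M}\,d\mathbf{X})$ and read off $\mathbf{M}^{\mathsf{T}}$ as the gradient. That reading is an assumption, since the term is not defined in this text.

$$
\begin{align*}
d\,\Tr(\mathbf{A}\mathbf{X}\mathbf{X})
&= \Tr(\mathbf{A}\,d\mathbf{X}\,\mathbf{X}) + \Tr(\mathbf{A}\mathbf{X}\,d\mathbf{X}) \\
&= \Tr(\mathbf{X}\mathbf{A}\,d\mathbf{X}) + \Tr(\mathbf{A}\mathbf{X}\,d\mathbf{X}) \\
&= \Tr\big( (\mathbf{X}\mathbf{A} + \mathbf{A}\mathbf{X})\,d\mathbf{X} \big) \\
\implies \dfrac{\partial \Tr(\mathbf{A}\mathbf{X}\mathbf{X})}{\partial \mathbf{X}}
&= (\mathbf{X}\mathbf{A} + \mathbf{A}\mathbf{X})^{\mathsf{T}}
= \mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}} + \mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}}
\end{align*}
$$

The second line uses the cyclic property of the trace, and the last step uses $\Tr(\mathbf{M}\,d\mathbf{X}) = \sum_{i,j} [\mathbf{M}]_{ji}\,dx_{ij}$, so the coefficient of $dx_{ij}$ is $[\mathbf{M}^{\mathsf{T}}]_{ij}$.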
More formulas can be found in the Table of Scalar Function Matrix Differentiation.
Proof
$(1)$
Since $\Tr (\mathbf{X}) = \sum\limits_{i=1}^{n} x_{ii}$, the following holds:
$$
\dfrac{\partial \Tr (\mathbf{X})}{\partial x_{ij}} = \begin{cases}
1, & i=j \\
0, & i \neq j
\end{cases}
$$
Thus, the following result is obtained:
$$
\dfrac{\partial \Tr (\mathbf{X})}{\partial \mathbf{X}} =
\begin{bmatrix}
\dfrac{\partial \Tr (\mathbf{X})}{\partial x_{11}} & \cdots & \dfrac{\partial \Tr (\mathbf{X})}{\partial x_{1n}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial \Tr (\mathbf{X})}{\partial x_{n1}} & \cdots & \dfrac{\partial \Tr (\mathbf{X})}{\partial x_{nn}}
\end{bmatrix} =
\begin{bmatrix}
1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1
\end{bmatrix} = I
$$
■
$(2)$
Let $\mathbf{A} \in \mathbb{R}^{n \times p}$ and $\mathbf{X} \in \mathbb{R}^{p \times n}$. First, by the cyclic property of the trace, $\Tr (\mathbf{A}\mathbf{X}) = \Tr(\mathbf{X}\mathbf{A})$.
Cyclic property of trace
$$
\Tr(AB) = \Tr(BA)
$$
The $ij$ element of the matrix $\mathbf{A} \mathbf{X}$ is $\sum\limits_{k=1}^{p} a_{ik} x_{kj}$, so $\Tr (\mathbf{A} \mathbf{X}) = \sum\limits_{i=1}^{n}\sum\limits_{k=1}^{p} a_{ik} x_{ki}$. The only term of this sum containing $x_{ij}$ is $a_{ji}x_{ij}$, hence $\dfrac{\partial \Tr (\mathbf{A}\mathbf{X})}{\partial x_{ij}} = a_{ji}$, and the following holds:
$$
\begin{align*}
\dfrac{\partial \Tr (\mathbf{A}\mathbf{X})}{\partial \mathbf{X}}
&=\begin{bmatrix}
\dfrac{\partial \Tr (\mathbf{A} \mathbf{X})}{\partial x_{11}} & \cdots & \dfrac{\partial \Tr (\mathbf{A} \mathbf{X})}{\partial x_{1n}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial \Tr (\mathbf{A} \mathbf{X})}{\partial x_{p1}} & \cdots & \dfrac{\partial \Tr (\mathbf{A} \mathbf{X})}{\partial x_{pn}}
\end{bmatrix} \\
&= \begin{bmatrix}
a_{11} & a_{21} & \cdots & a_{n1} \\
a_{12} & a_{22} & \cdots & a_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1p} & a_{2p} & \cdots & a_{np}
\end{bmatrix} \\
&= \mathbf{A}^{\mathsf{T}}
\end{align*}
$$
Combining this result with the cyclic property of the trace yields $(3)$ immediately.
■
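For completeness, the step from $(2)$ to $(3)$ can be written out as follows. With $\mathbf{A} \in \mathbb{R}^{n \times p}$, $\mathbf{X} \in \mathbb{R}^{p \times q}$, $\mathbf{B} \in \mathbb{R}^{q \times n}$, the product $\mathbf{B}\mathbf{A} \in \mathbb{R}^{q \times p}$ plays the role of the constant matrix in $(2)$:

$$
\begin{align*}
\Tr(\mathbf{A}\mathbf{X}\mathbf{B}) &= \Tr\big( (\mathbf{B}\mathbf{A})\mathbf{X} \big) \\
\implies \dfrac{\partial \Tr(\mathbf{A}\mathbf{X}\mathbf{B})}{\partial \mathbf{X}}
&= (\mathbf{B}\mathbf{A})^{\mathsf{T}} = \mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}
\end{align*}
$$

The result is $p \times q$, the same shape as $\mathbf{X}$, as a gradient with respect to $\mathbf{X}$ should be.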
$(4)$
By the definition of the matrix product, the $ij$ component, trace, and partial derivative of $\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X}$ are as follows:
$$
[\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X}]_{ij} = \sum\limits_{k=1}^{n}\sum\limits_{\ell=1}^{m} a_{ik} x_{\ell k}x_{\ell j}, \quad \Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X}) = \sum\limits_{s=1}^{n}\sum\limits_{k=1}^{n}\sum\limits_{\ell=1}^{m} a_{sk} x_{\ell k}x_{\ell s}
$$
$$
\begin{align*}
\dfrac{\partial \Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X})}{\partial x_{ij}}
&= \sum\limits_{k=1}^{n}a_{jk}x_{ik} + \sum\limits_{s=1}^{n}a_{sj}x_{is} \\
&= \sum\limits_{k=1}^{n}x_{ik}a_{jk} + \sum\limits_{k=1}^{n}x_{ik}a_{kj} \\
&= [\mathbf{X}\mathbf{A}^{\mathsf{T}}]_{ij} + [\mathbf{X}\mathbf{A}]_{ij}
\end{align*}
$$
Hence,
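$$
\dfrac{\partial \Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X})}{\partial \mathbf{X}} =
\mathbf{X}\mathbf{A}^{\mathsf{T}} + \mathbf{X}\mathbf{A} = \mathbf{X}(\mathbf{A}^{\mathsf{T}} + \mathbf{A})
$$
The other two expressions in $(4)$ have the same gradient, since the cyclic property gives $\Tr(\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X}) = \Tr(\mathbf{X}^{\mathsf{T}}\mathbf{X}\mathbf{A}) = \Tr(\mathbf{X}\mathbf{A}\mathbf{X}^{\mathsf{T}})$.
■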
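The index expressions used in this proof translate directly into `numpy.einsum` subscripts, which gives a way to check the bookkeeping. This is a sketch assuming NumPy; the sizes and seed are illustrative, and the subscript letters mirror the summation indices $s, k, \ell$ (written `l`) and the free indices $i, j$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4                       # illustrative sizes
A = rng.standard_normal((n, n))
X = rng.standard_normal((m, n))

# Tr(A X^T X) as the triple sum  sum_{s,k,l} a_{sk} x_{lk} x_{ls}
trace_by_sum = np.einsum("sk,lk,ls->", A, X, X)
trace_direct = np.trace(A @ X.T @ X)

# Entrywise derivative  sum_k a_{jk} x_{ik} + sum_s a_{sj} x_{is}
grad_by_sum = np.einsum("jk,ik->ij", A, X) + np.einsum("sj,is->ij", A, X)
grad_closed = X @ (A.T + A)       # closed form X(A^T + A)

print(np.isclose(trace_by_sum, trace_direct))
print(np.allclose(grad_by_sum, grad_closed))
```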
$(5)$
Since $[\mathbf{A} \mathbf{X} \mathbf{X}]_{ij} = \sum\limits_{k, s=1}^{n} a_{ik}x_{ks}x_{sj}$,
$$
\Tr (\mathbf{A}\mathbf{X} \mathbf{X}) = \sum\limits_{\ell= 1}^{n} \sum\limits_{k, s=1}^{n} a_{\ell k}x_{ks}x_{s\ell}
$$
Therefore, the partial derivative is as follows:
$$
\dfrac{\partial \Tr (\mathbf{A}\mathbf{X} \mathbf{X})}{\partial x_{ij}} = \sum\limits_{\ell=1}^{n} a_{\ell i}x_{j\ell} + \sum\limits_{k=1}^{n} a_{jk}x_{ki}
$$
Thus, the following is obtained:
$$
\left[ \dfrac{\partial \Tr (\mathbf{A}\mathbf{X} \mathbf{X})}{\partial \mathbf{X}} \right]_{ij} = \sum\limits_{\ell=1}^{n} a_{\ell i}x_{j\ell} + \sum\limits_{k=1}^{n} a_{jk}x_{ki} = [\mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}]_{ij} + [\mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}}]_{ij}
$$
$$
\implies
\dfrac{\partial \Tr (\mathbf{A}\mathbf{X} \mathbf{X})}{\partial \mathbf{X}} = \mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}} + \mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}}
$$
■
$(6)$
The computation follows the same pattern as the proofs above, so only a brief outline is given:
$$
\left[ \mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X} \right]_{ij} = \sum_{k,s,\ell = 1}^{n} a_{ik} x_{sk} b_{s\ell} x_{\ell j}
$$
$$
\implies \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X}) = \sum\limits_{r=1}^{n}\sum_{k,s,\ell = 1}^{n} a_{rk} x_{sk} b_{s\ell} x_{\ell r}
$$
$$
\begin{align*}
\implies \left[ \nabla_{\mathbf{X}} \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X}) \right]_{ij}
&= \sum_{r,\ell} a_{rj}b_{i\ell}x_{\ell r} + \sum_{k,s} a_{jk}x_{sk}b_{si} \\
&= \sum_{r,\ell} b_{i\ell}x_{\ell r}a_{rj} + \sum_{k,s} b_{si}x_{sk}a_{jk} \\
&= [\mathbf{B} \mathbf{X} \mathbf{A}]_{ij} + [\mathbf{B}^{\mathsf{T}} \mathbf{X} \mathbf{A}^{\mathsf{T}}]_{ij}
\end{align*}
$$
$$
\implies \nabla_{\mathbf{X}} \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X}) = \mathbf{B} \mathbf{X} \mathbf{A} + \mathbf{B}^{\mathsf{T}} \mathbf{X} \mathbf{A}^{\mathsf{T}}
$$
■
$(7)$
$$
[\mathbf{X}^{n}]_{ij} = \sum\limits_{k_{(2)}, \dots, k_{(n)}=1}^{n} x_{ik_{(2)}} x_{k_{(2)}k_{(3)}} \cdots x_{k_{(n)}j}
$$
$$
\implies \Tr(\mathbf{X}^{n}) = \sum\limits_{\ell=1}^{n} \sum\limits_{k_{(2)}, \dots, k_{(n)}=1}^{n} x_{\ell k_{(2)}} x_{k_{(2)}k_{(3)}} \cdots x_{k_{(n)}\ell}
$$
$$
\implies [\nabla_{\mathbf{X}} \Tr(\mathbf{X}^{n})]_{ij} = n \sum\limits_{k_{(3)}, \dots, k_{(n)}=1}^{n} x_{j k_{(3)}} x_{k_{(3)}k_{(4)}} \cdots x_{k_{(n)}i} = n[(\mathbf{X}^{n-1})^{\mathsf{T}}]_{ij}
$$
$$
\implies \nabla_{\mathbf{X}} \Tr(\mathbf{X}^{n}) = n(\mathbf{X}^{n-1})^{\mathsf{T}}
$$
■
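The factor $n$ in $(7)$ comes from the product rule: each of the $n$ factors in the summand can be the one that is differentiated, and after relabeling the summation indices every choice contributes the same term. The smallest nontrivial case, with exponent $2$, shows the mechanism (all indices run over the rows of $\mathbf{X}$):

$$
\Tr(\mathbf{X}^{2}) = \sum_{\ell, k} x_{\ell k}\, x_{k \ell}
\implies
\dfrac{\partial \Tr(\mathbf{X}^{2})}{\partial x_{ij}}
= \underbrace{x_{ji}}_{(\ell, k) = (i, j)} + \underbrace{x_{ji}}_{(k, \ell) = (i, j)}
= 2 x_{ji} = 2 [\mathbf{X}^{\mathsf{T}}]_{ij}
$$

This agrees with $(7)$ for exponent $2$, namely $2(\mathbf{X}^{1})^{\mathsf{T}}$.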