Matrix Calculus of Trace

Formula

Let $\mathbf{X}$ be an $n \times n$ matrix, and write $\dfrac{\partial }{\partial \mathbf{X}} = \nabla_{\mathbf{X}}$ for the matrix gradient. Then the following formulas hold:
$\dfrac{\partial \Tr(\mathbf{X})}{\partial \mathbf{X}} = I, \qquad \dfrac{\partial \Tr(a\mathbf{X})}{\partial \mathbf{X}} = aI \tag{1}$
Here, $a \in \mathbb{R}$ is a constant (scalar), and $I$ is the $n \times n$ identity matrix.
Suppose $\mathbf{A} \in \mathbb{R}^{n \times p}$ and $\mathbf{X} \in \mathbb{R}^{p \times n}$. Then the following holds:
$\dfrac{\partial \Tr(\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \dfrac{\partial \Tr( \mathbf{X}\mathbf{A})}{\partial \mathbf{X}} = \mathbf{A}^{\mathsf{T}} \tag{2}$
If instead $\mathbf{A}$ and $\mathbf{X}$ have the same size, $\mathbf{A}, \mathbf{X} \in \mathbb{R}^{p \times n}$, so that the products $\mathbf{A}\mathbf{X}^{\mathsf{T}}$ and $\mathbf{X}^{\mathsf{T}}\mathbf{A}$ are defined, the following holds:
$\dfrac{\partial \Tr(\mathbf{A}\mathbf{X}^{\mathsf{T}})}{\partial \mathbf{X}} = \dfrac{\partial \Tr( \mathbf{X}^{\mathsf{T}}\mathbf{A})}{\partial \mathbf{X}} = \mathbf{A}$
- As a corollary, the following holds for $\mathbf{A} \in \mathbb{R}^{n \times p}$, $\mathbf{X} \in \mathbb{R}^{p \times q}$, $\mathbf{B} \in \mathbb{R}^{q \times n}$: $\dfrac{\partial \Tr(\mathbf{A}\mathbf{X}\mathbf{B})}{\partial \mathbf{X}} = \mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}} \tag{3}$
Let $\mathbf{A} \in \mathbb{R}^{n \times n}$ and $\mathbf{X} \in \mathbb{R}^{m \times n}$. Then the following holds:
$\dfrac{\partial \Tr(\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X})}{\partial \mathbf{X}} = \dfrac{\partial \Tr(\mathbf{X}^{\mathsf{T}}\mathbf{X}\mathbf{A})}{\partial \mathbf{X}} = \dfrac{\partial \Tr(\mathbf{X}\mathbf{A}\mathbf{X}^{\mathsf{T}})}{\partial \mathbf{X}} = \mathbf{X}(\mathbf{A}^{\mathsf{T}} + \mathbf{A}) \tag{4}$
For $\mathbf{A}, \mathbf{X} \in \mathbb{R}^{n \times n}$, the following holds:
$\dfrac{\partial \Tr(\mathbf{A} \mathbf{X}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}} + \mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}} \tag{5}$
For $\mathbf{A}, \mathbf{B}, \mathbf{X} \in \mathbb{R}^{n \times n}$, the following holds:
$\nabla_{\mathbf{X}} \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X}) = \dfrac{\partial \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X})}{\partial \mathbf{X}} = \mathbf{B} \mathbf{X} \mathbf{A} + \mathbf{B}^{\mathsf{T}} \mathbf{X} \mathbf{A}^{\mathsf{T}} \tag{6}$
For a square matrix $\mathbf{X}$ (of any size) and a natural number $n$, the following holds:
$\dfrac{\partial \Tr(\mathbf{X}^{n})}{\partial \mathbf{X}} = n(\mathbf{X}^{n-1})^{\mathsf{T}} \tag{7}$

Explanation

The trace $\Tr A$ of a square matrix $A$ is the sum of its diagonal elements. It can be viewed simply as a number, but it can also be regarded as a function that maps a matrix to a scalar. In this sense, the trace is the following function:

$\Tr : \mathbb{R}^{n \times n} \to \mathbb{R}$

Once the trace is viewed as a function, it is natural to ask about its derivative. Since $\Tr(A) = \sum\limits_{i=1}^{n} a_{ii}$ is a linear function of the entries of $A$, differentiability is not a concern. What may be less intuitive is that the variable is a matrix: the derivative is taken with respect to each entry, and the results are arranged in a matrix of the same shape as the variable. For more details, refer to the gradient matrix article.
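
To make the idea of differentiating with respect to a matrix concrete, here is a minimal sketch that approximates the matrix gradient numerically and compares it with formula $(1)$. It assumes the matrix gradient is the matrix of entrywise partial derivatives, as in the proofs of this article. The helper names `trace` and `numerical_gradient`, the test matrix `X`, and the scalar `a` are hypothetical choices made for this illustration, and only the Python standard library is used.

```python
def trace(M):
    """Sum of the diagonal entries of a square matrix stored as a list of rows."""
    return sum(M[i][i] for i in range(len(M)))


def numerical_gradient(f, X, h=1e-6):
    """Approximate the matrix [df/dx_ij] with central differences."""
    rows, cols = len(X), len(X[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            plus = [row[:] for row in X]    # copy of X with x_ij + h
            minus = [row[:] for row in X]   # copy of X with x_ij - h
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


# Arbitrary 3 x 3 test matrix and scalar (illustrative values only).
X = [[2.0, -1.0, 0.5],
     [4.0, 3.0, -2.0],
     [1.0, 0.0, 7.0]]
a = 2.5

# Formula (1): the gradient of Tr(X) should be I, and that of Tr(aX) should be aI.
grad_trace = numerical_gradient(trace, X)
grad_scaled = numerical_gradient(
    lambda M: trace([[a * v for v in row] for row in M]), X)

for row in grad_trace:
    print([round(v, 6) for v in row])
for row in grad_scaled:
    print([round(v, 6) for v in row])
```

Because the trace is linear in the entries of $\mathbf{X}$, the central difference is exact up to rounding error, so the printed matrices should agree with $I$ and $aI$ to the displayed precision.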

Examining the results above reveals similarities with scalar differentiation, so the trace can be regarded as a tool that makes matrix differentiation intuitive. In particular, when $\mathbf{A}, \mathbf{B}, \mathbf{X}$ are all symmetric, so that the transposes drop out, the results closely resemble the differentiation of polynomial functions.

$(1)$: The identity matrix $I$ is the identity element of matrix multiplication, so this corresponds to the scalar derivative $\dfrac{d x}{d x} = 1$.
$(2)$, $(3)$: These correspond to the differentiation of linear functions.
$(4)$, $(5)$, $(6)$: These correspond to the differentiation of quadratic functions.
$(7)$: This corresponds to the differentiation of the power function $x^{n}$.
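
A short worked comparison may make this correspondence concrete. The sketch below evaluates the formulas at symmetric $\mathbf{A}$ and $\mathbf{X}$ (an assumption made only for this illustration; the entries of $\mathbf{X}$ are still differentiated as independent variables) and places each next to its scalar counterpart:

$\begin{align*} (2):\quad & \mathbf{A}^{\mathsf{T}} = \mathbf{A} && \longleftrightarrow \quad \dfrac{d (ax)}{dx} = a \\ (4):\quad & \mathbf{X}(\mathbf{A}^{\mathsf{T}} + \mathbf{A}) = 2\mathbf{X}\mathbf{A} && \longleftrightarrow \quad \dfrac{d (ax^{2})}{dx} = 2ax \\ (5):\quad & \mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}} + \mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}} = \mathbf{X}\mathbf{A} + \mathbf{A}\mathbf{X} && \longleftrightarrow \quad \dfrac{d (ax^{2})}{dx} = 2ax \\ (7):\quad & n(\mathbf{X}^{n-1})^{\mathsf{T}} = n\mathbf{X}^{n-1} && \longleftrightarrow \quad \dfrac{d x^{n}}{dx} = nx^{n-1} \end{align*}$

In $(5)$ the two terms merge into $2\mathbf{A}\mathbf{X}$ only if $\mathbf{A}$ and $\mathbf{X}$ also commute, which is the one place where the matrix case still differs from the scalar one.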

The proofs below use direct calculation, but the method known as the trace trick simplifies the work considerably. For more general expressions in $\mathbf{X}$, or for expressions in which $\mathbf{X}$ appears many times, direct calculation quickly becomes impractical, and the trace trick is the method of choice.
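
As an illustration, the following sketch derives $(6)$ without index sums. It assumes that the trace trick refers to the usual differential method: use $d \Tr(\cdot) = \Tr(d\,\cdot)$, then $\Tr(\mathbf{M}) = \Tr(\mathbf{M}^{\mathsf{T}})$ and the cyclic property to bring every term into the form $\Tr(\mathbf{G}^{\mathsf{T}} d\mathbf{X})$, and read off $\mathbf{G}$ as the gradient. The symbols $d\mathbf{X}$ and $\mathbf{G}$ are introduced here only for this sketch.

$\begin{align*} d \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X}) &= \Tr (\mathbf{A}\, d\mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X}) + \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B}\, d\mathbf{X}) \\ &= \Tr (\mathbf{X}^{\mathsf{T}} \mathbf{B}^{\mathsf{T}}\, d\mathbf{X}\, \mathbf{A}^{\mathsf{T}}) + \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B}\, d\mathbf{X}) \\ &= \Tr (\mathbf{A}^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \mathbf{B}^{\mathsf{T}}\, d\mathbf{X}) + \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B}\, d\mathbf{X}) \\ &= \Tr \left( (\mathbf{B} \mathbf{X} \mathbf{A} + \mathbf{B}^{\mathsf{T}} \mathbf{X} \mathbf{A}^{\mathsf{T}})^{\mathsf{T}} d\mathbf{X} \right) \end{align*}$

Reading off the factor in front of $d\mathbf{X}$ gives $\mathbf{G} = \mathbf{B} \mathbf{X} \mathbf{A} + \mathbf{B}^{\mathsf{T}} \mathbf{X} \mathbf{A}^{\mathsf{T}}$, in agreement with $(6)$. The second line transposes the first term, and the third line applies the cyclic property to it.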

More formulas can be found in Table of Scalar Function Matrix Differentiation.

Proof

$(1)$

Since $\Tr (\mathbf{X}) = \sum\limits_{i=1}^{n} x_{ii}$, the following holds:

$\dfrac{\partial \Tr (\mathbf{X})}{\partial x_{ij}} = \begin{cases} 1 &, i=j \\ 0 &, i \neq j \end{cases}$

Thus, the following result is obtained:

$\dfrac{\partial \Tr (\mathbf{X})}{\partial \mathbf{X}} = \begin{bmatrix} \dfrac{\partial \Tr (\mathbf{X})}{\partial x_{11}} & \cdots & \dfrac{\partial \Tr (\mathbf{X})}{\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial \Tr (\mathbf{X})}{\partial x_{n1}} & \cdots & \dfrac{\partial \Tr (\mathbf{X})}{\partial x_{nn}} \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I$

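Since $\Tr(a\mathbf{X}) = a \Tr(\mathbf{X})$, multiplying this result by $a$ gives the second formula.

■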

$(2)$

Let $\mathbf{A} \in \mathbb{R}^{n \times p}$ and $\mathbf{X} \in \mathbb{R}^{p \times n}$. First, by the cyclic property of the trace, $\Tr (\mathbf{A}\mathbf{X}) = \Tr(\mathbf{X}\mathbf{A})$, so it suffices to differentiate $\Tr (\mathbf{A}\mathbf{X})$.

Cyclic property of trace
$\Tr(AB) = \Tr(BA)$

The $(i, j)$ entry of the matrix $\mathbf{A} \mathbf{X}$ is $\sum\limits_{k=1}^{p} a_{ik} x_{kj}$, so $\Tr (\mathbf{A} \mathbf{X}) = \sum\limits_{i=1}^{n}\sum\limits_{k=1}^{p} a_{ik} x_{ki}$. The variable $x_{ij}$ appears in exactly one term of this sum, with coefficient $a_{ji}$. Thus $\dfrac{\partial \Tr (\mathbf{A}\mathbf{X})}{\partial x_{ij}} = a_{ji}$, and the following holds:

$\begin{align*} \dfrac{\partial \Tr (\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} &=\begin{bmatrix} \dfrac{\partial \Tr (\mathbf{A} \mathbf{X})}{\partial x_{11}} & \cdots & \dfrac{\partial \Tr (\mathbf{A} \mathbf{X})}{\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial \Tr (\mathbf{A} \mathbf{X})}{\partial x_{p1}} & \cdots & \dfrac{\partial \Tr (\mathbf{A} \mathbf{X})}{\partial x_{pn}} \end{bmatrix} \\ &= \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1p} & a_{2p} & \cdots & a_{np} \end{bmatrix} \\ &= \mathbf{A}^{\mathsf{T}} \end{align*}$

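The remaining formulas follow from this result. Since $\Tr(\mathbf{A}\mathbf{X}^{\mathsf{T}}) = \Tr(\mathbf{X}^{\mathsf{T}}\mathbf{A}) = \Tr(\mathbf{X}\mathbf{A}^{\mathsf{T}})$, applying $(2)$ with $\mathbf{A}^{\mathsf{T}}$ in place of $\mathbf{A}$ gives the gradient $(\mathbf{A}^{\mathsf{T}})^{\mathsf{T}} = \mathbf{A}$. Similarly, by the cyclic property of the trace, $\Tr(\mathbf{A}\mathbf{X}\mathbf{B}) = \Tr(\mathbf{B}\mathbf{A}\mathbf{X})$, so $(2)$ gives the gradient $(\mathbf{B}\mathbf{A})^{\mathsf{T}} = \mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}$, which is $(3)$.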

■
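
The linear formulas can also be checked numerically. The following is a minimal sketch, not part of the proof, that compares central-difference gradients with $(2)$, the $\mathbf{X}^{\mathsf{T}}$ variant, and $(3)$. All helper names and matrix values are hypothetical choices for this illustration, and only the Python standard library is used.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


def numerical_gradient(f, X, h=1e-6):
    """Central-difference approximation of the matrix [df/dx_ij]."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


def max_abs_diff(P, Q):
    """Largest entrywise difference between two equally sized matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


A = [[1.0, 2.0, -1.0],
     [0.5, 3.0, 4.0]]      # n x p with n = 2, p = 3
X = [[2.0, -1.0],
     [0.0, 1.5],
     [3.0, 1.0]]           # p x n for (2), p x q with q = 2 for (3)
B = [[1.0, -2.0],
     [4.0, 0.5]]           # q x n
C = [[1.0, 0.0],
     [-2.0, 3.0],
     [0.5, 1.0]]           # same shape as X, for the X^T variant

# Formula (2): both orderings should give A^T.
err_ax = max_abs_diff(
    numerical_gradient(lambda M: trace(matmul(A, M)), X), transpose(A))
err_xa = max_abs_diff(
    numerical_gradient(lambda M: trace(matmul(M, A)), X), transpose(A))

# Variant with X^T: the gradient of Tr(C X^T) should be C itself.
err_cxt = max_abs_diff(
    numerical_gradient(lambda M: trace(matmul(C, transpose(M))), X), C)

# Formula (3): the gradient of Tr(A X B) should be A^T B^T.
err_axb = max_abs_diff(
    numerical_gradient(lambda M: trace(matmul(matmul(A, M), B)), X),
    matmul(transpose(A), transpose(B)))

print(err_ax, err_xa, err_cxt, err_axb)
```

All four functions are linear in the entries of $\mathbf{X}$, so the reported differences should be at the level of floating-point rounding.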

$(4)$

By the formula for the entries of a matrix product, the $(i, j)$ component, the trace, and the partial derivative of $\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X}$ are as follows:

$[\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X}]_{ij} = \sum\limits_{k=1}^{n}\sum\limits_{\ell=1}^{m} a_{ik} x_{\ell k}x_{\ell j}, \quad \Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X}) = \sum\limits_{s=1}^{n}\sum\limits_{k=1}^{n}\sum\limits_{\ell=1}^{m} a_{sk} x_{\ell k}x_{\ell s}$

$\begin{align*} \dfrac{\partial \Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X})}{\partial x_{ij}} &= \sum\limits_{k=1}^{n}a_{jk}x_{ik} + \sum\limits_{s=1}^{n}a_{sj}x_{is} \\ &= \sum\limits_{k=1}^{n}x_{ik}a_{jk} + \sum\limits_{k=1}^{n}x_{ik}a_{kj} \\ &= [\mathbf{X}\mathbf{A}^{\mathsf{T}}]_{ij} + [\mathbf{X}\mathbf{A}]_{ij} \end{align*}$

Hence,

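$\dfrac{\partial \Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{X}\mathbf{A}^{\mathsf{T}} + \mathbf{X}\mathbf{A} = \mathbf{X}(\mathbf{A}^{\mathsf{T}} + \mathbf{A})$

By the cyclic property of the trace, $\Tr(\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X}) = \Tr(\mathbf{X}^{\mathsf{T}}\mathbf{X}\mathbf{A}) = \Tr(\mathbf{X}\mathbf{A}\mathbf{X}^{\mathsf{T}})$, so the other two expressions in $(4)$ have the same gradient.

■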
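
As a worked special case (added for illustration, with the column vector $\mathbf{x} \in \mathbb{R}^{n}$ introduced only here), take $m = 1$, so that $\mathbf{X} = \mathbf{x}^{\mathsf{T}}$ is a single row. Then $\mathbf{X}\mathbf{A}\mathbf{X}^{\mathsf{T}}$ is a $1 \times 1$ matrix and its trace is the quadratic form $\mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x}$:

$\Tr(\mathbf{X}\mathbf{A}\mathbf{X}^{\mathsf{T}}) = \mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x}, \qquad \dfrac{\partial \Tr(\mathbf{X}\mathbf{A}\mathbf{X}^{\mathsf{T}})}{\partial \mathbf{X}} = \mathbf{x}^{\mathsf{T}}(\mathbf{A}^{\mathsf{T}} + \mathbf{A}) = \left( (\mathbf{A} + \mathbf{A}^{\mathsf{T}})\mathbf{x} \right)^{\mathsf{T}}$

This is the row form of the familiar vector gradient $\nabla_{\mathbf{x}} (\mathbf{x}^{\mathsf{T}}\mathbf{A}\mathbf{x}) = (\mathbf{A} + \mathbf{A}^{\mathsf{T}})\mathbf{x}$.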

$(5)$

Since $[\mathbf{A} \mathbf{X} \mathbf{X}]_{ij} = \sum\limits_{k, s=1}^{n} a_{ik}x_{ks}x_{sj}$, the trace is

$\Tr (\mathbf{A}\mathbf{X} \mathbf{X}) = \sum\limits_{\ell= 1}^{n} \sum\limits_{k, s=1}^{n} a_{\ell k}x_{ks}x_{s\ell}$

Therefore, the partial derivative is as follows:

$\dfrac{\partial \Tr (\mathbf{A}\mathbf{X} \mathbf{X})}{\partial x_{ij}} = \sum\limits_{\ell=1}^{n} a_{\ell i}x_{j\ell} + \sum\limits_{k=1}^{n} a_{jk}x_{ki}$

Thus, the following is obtained:

$\left[ \dfrac{\partial \Tr (\mathbf{A}\mathbf{X} \mathbf{X})}{\partial \mathbf{X}} \right]_{ij} = \sum\limits_{\ell=1}^{n} a_{\ell i}x_{j\ell} + \sum\limits_{k=1}^{n} a_{jk}x_{ki} = [\mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}]_{ij} + [\mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}}]_{ij}$

$\implies \dfrac{\partial \Tr (\mathbf{A}\mathbf{X} \mathbf{X})}{\partial \mathbf{X}} = \mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}} + \mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}}$

■
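
The quadratic formulas $(4)$ and $(5)$ can be sanity-checked in the same numerical spirit. The sketch below is not part of the proof. It uses a deliberately non-symmetric $\mathbf{A}$ so that the transposes in the formulas matter. Helper names and matrix values are hypothetical, and only the Python standard library is used.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


def numerical_gradient(f, X, h=1e-5):
    """Central-difference approximation of the matrix [df/dx_ij]."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


def max_abs_diff(P, Q):
    """Largest entrywise difference between two equally sized matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


A = [[1.0, 2.0],
     [-3.0, 0.5]]          # n x n with n = 2, not symmetric
X = [[2.0, -1.0],
     [0.0, 1.5],
     [3.0, 1.0]]           # m x n with m = 3, used for (4)
S = [[1.0, 4.0],
     [-2.0, 0.5]]          # square matrix, used for (5)

# Formula (4): all three orderings should give X (A^T + A).
expected_4 = matmul(X, add(transpose(A), A))
forms_4 = {
    "Tr(A X^T X)": lambda M: trace(matmul(A, matmul(transpose(M), M))),
    "Tr(X^T X A)": lambda M: trace(matmul(matmul(transpose(M), M), A)),
    "Tr(X A X^T)": lambda M: trace(matmul(matmul(M, A), transpose(M))),
}
errors_4 = {name: max_abs_diff(numerical_gradient(f, X), expected_4)
            for name, f in forms_4.items()}

# Formula (5): the gradient of Tr(A X X) should be X^T A^T + A^T X^T.
expected_5 = add(matmul(transpose(S), transpose(A)),
                 matmul(transpose(A), transpose(S)))
error_5 = max_abs_diff(
    numerical_gradient(lambda M: trace(matmul(A, matmul(M, M))), S),
    expected_5)

print(errors_4, error_5)
```

Central differences have no truncation error for quadratic functions, so any discrepancy reported here comes from floating-point rounding only.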

$(6)$

The argument is the same as in the proofs above, so only the key steps are given:

$\left[ \mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X} \right]_{ij} = \sum_{k,s,\ell = 1}^{n} a_{ik} x_{sk} b_{s\ell} x_{\ell j}$

$\implies \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X}) = \sum\limits_{r=1}^{n}\sum_{k,s,\ell = 1}^{n} a_{rk} x_{sk} b_{s\ell} x_{\ell r}$

$\begin{align*} \implies \left[ \nabla_{\mathbf{X}} \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X}) \right]_{ij} &= \sum_{r,\ell} a_{rj}b_{i\ell}x_{\ell r} + \sum_{k,s} a_{jk}x_{sk}b_{si} \\ &= \sum_{r,\ell} b_{i\ell}x_{\ell r}a_{rj} + \sum_{k,s} b_{si}x_{sk}a_{jk} \\ &= [\mathbf{B} \mathbf{X} \mathbf{A}]_{ij} + [\mathbf{B}^{\mathsf{T}} \mathbf{X} \mathbf{A}^{\mathsf{T}}]_{ij} \end{align*}$

$\implies \nabla_{\mathbf{X}} \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X}) = \mathbf{B} \mathbf{X} \mathbf{A} + \mathbf{B}^{\mathsf{T}} \mathbf{X} \mathbf{A}^{\mathsf{T}}$

■
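
As a quick consistency check (a worked special case added for illustration, not part of the proof), setting $\mathbf{B} = I$ in $(6)$ should reproduce $(4)$ for square $\mathbf{X}$:

$\nabla_{\mathbf{X}} \Tr (\mathbf{A} \mathbf{X}^{\mathsf{T}} I \mathbf{X}) = I \mathbf{X} \mathbf{A} + I^{\mathsf{T}} \mathbf{X} \mathbf{A}^{\mathsf{T}} = \mathbf{X}(\mathbf{A} + \mathbf{A}^{\mathsf{T}})$

which agrees with $(4)$. Setting $\mathbf{A} = I$ instead gives

$\nabla_{\mathbf{X}} \Tr (\mathbf{X}^{\mathsf{T}} \mathbf{B} \mathbf{X}) = \mathbf{B} \mathbf{X} + \mathbf{B}^{\mathsf{T}} \mathbf{X} = (\mathbf{B} + \mathbf{B}^{\mathsf{T}})\mathbf{X}$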

$(7)$

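To keep the matrix size and the exponent apart, write $m$ for the size of $\mathbf{X}$, so that every index below runs from $1$ to $m$ while $n$ is the exponent.

$[\mathbf{X}^{n}]_{ij} = \sum\limits_{k_{(2)}, \dots, k_{(n)}=1}^{m} x_{ik_{(2)}} x_{k_{(2)}k_{(3)}} \cdots x_{k_{(n)}j}$

$\implies \Tr(\mathbf{X}^{n}) = \sum\limits_{\ell=1}^{m} \sum\limits_{k_{(2)}, \dots, k_{(n)}=1}^{m} x_{\ell k_{(2)}} x_{k_{(2)}k_{(3)}} \cdots x_{k_{(n)}\ell}$

Each summand is a cyclic product of $n$ factors, and $x_{ij}$ can occupy any of the $n$ positions. By the product rule, each position contributes the same sum after relabeling the indices, which produces the factor $n$:

$\implies [\nabla_{\mathbf{X}} \Tr(\mathbf{X}^{n})]_{ij} = n \sum\limits_{k_{(3)}, \dots, k_{(n)}=1}^{m} x_{j k_{(3)}} x_{k_{(3)}k_{(4)}} \cdots x_{k_{(n)}i} = n[(\mathbf{X}^{n-1})^{\mathsf{T}}]_{ij}$

$\implies \nabla_{\mathbf{X}} \Tr(\mathbf{X}^{n}) = n(\mathbf{X}^{n-1})^{\mathsf{T}}$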

■
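
The last two formulas, $(6)$ and $(7)$, can be checked numerically as well. The following is a minimal sketch under the same assumptions as before: hypothetical helper names, arbitrary test matrices, and only the Python standard library. The exponent is called `power` in the code to keep it apart from the matrix size.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def scale(c, P):
    return [[c * v for v in row] for row in P]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


def matpow(P, power):
    """P multiplied by itself `power` times; power = 0 gives the identity."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = matmul(result, P)
    return result


def numerical_gradient(f, X, h=1e-5):
    """Central-difference approximation of the matrix [df/dx_ij]."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


def max_abs_diff(P, Q):
    """Largest entrywise difference between two equally sized matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


A = [[1.0, 2.0], [-3.0, 0.5]]
B = [[0.5, -1.0], [2.0, 3.0]]
X = [[1.0, 2.0], [-0.5, 1.5]]

# Formula (6): the gradient of Tr(A X^T B X) should be B X A + B^T X A^T.
expected_6 = add(matmul(matmul(B, X), A),
                 matmul(matmul(transpose(B), X), transpose(A)))
error_6 = max_abs_diff(
    numerical_gradient(
        lambda M: trace(matmul(matmul(A, transpose(M)), matmul(B, M))), X),
    expected_6)

# Formula (7): the gradient of Tr(X^power) should be power * (X^(power-1))^T.
errors_7 = {}
for power in range(1, 5):
    expected = scale(power, transpose(matpow(X, power - 1)))
    numeric = numerical_gradient(lambda M: trace(matpow(M, power)), X)
    errors_7[power] = max_abs_diff(numeric, expected)

print(error_6, errors_7)
```

For exponents above 2 the central difference is no longer exact, so a small truncation error of order $h^{2}$ is expected on top of rounding.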
