Matrix Differentiation Table for Scalar Functions
Explanation
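The matrix-calculus formulas for scalar-valued functions are summarized in the tables below. Here $\nabla_{\mathbf{X}} f$ denotes the matrix of the same size as $\mathbf{X}$ whose $(i, j)$ entry is $\dfrac{\partial f}{\partial X_{ij}}$. The following notation is used throughout: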
- $\mathbf{a}, \mathbf{b} \in \mathbb{R}^{n}$: constant vectors independent of $\mathbf{x}$ or $\mathbf{X}$
- $a, a_{i} \in \mathbb{R}$: constant scalars
- $\mathbf{A}, \mathbf{B}, \mathbf{C} \in \mathbb{R}^{n \times n}$: constant matrices independent of $\mathbf{x}$ or $\mathbf{X}$
- $\mathbf{x} \in \mathbb{R}^{n}$: A variable vector
- $\mathbf{X} \in \mathbb{R}^{n \times n}$: A variable matrix
Because $\Tr (\mathbf{X}) = \Tr (\mathbf{X}^{\mathsf{T}})$, we have $\nabla_{\mathbf{X}} \Tr (\mathbf{X}) = I = \nabla_{\mathbf{X}} \Tr (\mathbf{X}^{\mathsf{T}})$. As soon as another matrix multiplies $\mathbf{X}$ on the left or right, however, the transpose matters, and expressions containing $\mathbf{X}$ and those containing $\mathbf{X}^{\mathsf{T}}$ generally have different gradients: for example, $\nabla_{\mathbf{X}} \Tr (\mathbf{A}\mathbf{X}) = \mathbf{A}^{\mathsf{T}}$, whereas $\nabla_{\mathbf{X}} \Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}) = \mathbf{A}$.
Expression $f(\mathbf{X})$ | Derivative $\nabla_{\mathbf{X}} f$ | Proof |
---|---|---|
$\Tr (\mathbf{X})$, $\Tr (\mathbf{X}^{\mathsf{T}})$ | $I$ | Link |
$\Tr (a\mathbf{X})$, $\Tr (a\mathbf{X}^{\mathsf{T}})$ | $aI$ | Link |
$\Tr (\mathbf{A}\mathbf{X})$, $\Tr (\mathbf{X}\mathbf{A})$ | $\mathbf{A}^{\mathsf{T}}$ | Link |
$\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}})$, $\Tr (\mathbf{X}\mathbf{A}^{\mathsf{T}})$ | $\mathbf{A}$ | Link |
$\Tr (\mathbf{A}\mathbf{X}\mathbf{B}), \Tr (\mathbf{B}\mathbf{A}\mathbf{X}), \Tr (\mathbf{X}\mathbf{B}\mathbf{A})$ | $\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}$ | Link |
$\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{B}), \Tr (\mathbf{B}\mathbf{A}\mathbf{X}^{\mathsf{T}}), \Tr (\mathbf{X}^{\mathsf{T}}\mathbf{B}\mathbf{A})$ | $\mathbf{B}\mathbf{A}$ | Link |
$\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X})$ | $\mathbf{X}(\mathbf{A} + \mathbf{A}^{\mathsf{T}})$ | Link |
$\Tr (\mathbf{A}\mathbf{X}\mathbf{X})$ | $\mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}} + \mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}}$ | Link |
$\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X}\mathbf{B})$ | $\mathbf{X}(\mathbf{B}\mathbf{A} + \mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}})$ | Link |
$\Tr (\mathbf{A}\mathbf{X}\mathbf{X}\mathbf{B})$ | $\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}} + \mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}$ | Link |
$\Tr (\mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{B}\mathbf{X})$ | $\mathbf{B}\mathbf{X}\mathbf{A}^{\mathsf{T}} + \mathbf{B}^{\mathsf{T}}\mathbf{X}\mathbf{A}$ | Link |
$\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{B})$ | $\mathbf{C}\mathbf{X}\mathbf{B}\mathbf{A} + \mathbf{C}^{\mathsf{T}}\mathbf{X}\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}$ | Link |
$\Tr (\mathbf{A}\mathbf{X}\mathbf{C}\mathbf{X}\mathbf{B})$ | $\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}^{\mathsf{T}} + \mathbf{C}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}$ | Link |
$\Tr (\mathbf{X}^{n})$ | $n(\mathbf{X}^{\mathsf{T}})^{n-1}$ | Link |
$\Tr (\mathbf{A}\mathbf{X}^{n})$ | $\sum\limits_{i=0}^{n-1} \left[ \mathbf{X}^{n-1-i}\mathbf{A}\mathbf{X}^{i} \right]^{\mathsf{T}}$ | Link |
$\Tr (p(\mathbf{X}))$ (see the note below) | $(p^{\prime}(\mathbf{X}))^{\mathsf{T}}$ | Link |
- Note: let $p(x)$ be a polynomial and $p^{\prime}(x)$ its derivative. Then $p(\mathbf{X})$ and $p^{\prime}(\mathbf{X})$ denote the matrix polynomials obtained by substituting $\mathbf{X}$ for $x$, with $\mathbf{X}^{0} = I$: $$ \begin{aligned} p(x) &= \sum\limits_{i=0}^{m} a_{i}x^{i} \\ p^{\prime}(x) &= \sum\limits_{i=1}^{m} i a_{i}x^{i-1} \end{aligned} \implies \begin{aligned} p(\mathbf{X}) &= \sum\limits_{i=0}^{m} a_{i}\mathbf{X}^{i} \\ p^{\prime}(\mathbf{X}) &= \sum\limits_{i=1}^{m} i a_{i}\mathbf{X}^{i-1} \end{aligned} $$
Expression $f(\mathbf{X})$ | Derivative $\nabla_{\mathbf{X}} f$ | Proof |
---|---|---|
$\mathbf{a}^{\mathsf{T}}\mathbf{X}\mathbf{b}$ | $\mathbf{a}\mathbf{b}^{\mathsf{T}}$ | Link |
$\mathbf{a}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{X}\mathbf{b}$ | $\mathbf{X}(\mathbf{a}\mathbf{b}^{\mathsf{T}} + \mathbf{b}\mathbf{a}^{\mathsf{T}})$ | Link |
$\mathbf{a}^{\mathsf{T}}\mathbf{X}\mathbf{X}\mathbf{b}$ | $\mathbf{a}\mathbf{b}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}} + \mathbf{X}^{\mathsf{T}}\mathbf{a}\mathbf{b}^{\mathsf{T}}$ | Link |
$\mathbf{a}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{b}$ | $\mathbf{C}\mathbf{X}\mathbf{b}\mathbf{a}^{\mathsf{T}} + \mathbf{C}^{\mathsf{T}}\mathbf{X}\mathbf{a}\mathbf{b}^{\mathsf{T}}$ | Link |
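The first two rows of this table do not get a written proof below. One way to gain confidence in them is a finite-difference check. The following is a minimal pure-Python sketch under our own assumptions: matrices are lists of rows, vectors are stored as $n \times 1$ column matrices, and the helper names (`rand`, `matmul`, `T`, `add`, `num_grad`, `err`) are hypothetical, not from any library. It compares each tabulated gradient with a central-difference approximation of $\partial f / \partial X_{ij}$.

```python
import random

rng = random.Random(0)  # fixed seed so the check is reproducible

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(*Ms):
    # Multiply conformable matrices from left to right.
    R = Ms[0]
    for M in Ms[1:]:
        R = [[sum(row[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
             for row in R]
    return R

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def add(*Ms):
    # Entrywise sum of same-shape matrices.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]

def num_grad(f, X, h=1e-6):
    # Central finite differences: G[i][j] approximates df/dX_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def err(P, Q):
    # Largest entrywise absolute difference between two matrices.
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

n = 3
X = rand(n, n)
a, b = rand(n, 1), rand(n, 1)  # column vectors stored as n x 1 matrices

def f1(X):
    # a^T X b is a 1 x 1 matrix; return its single entry.
    return matmul(T(a), X, b)[0][0]

def f2(X):
    # a^T X^T X b
    return matmul(T(a), T(X), X, b)[0][0]

err1 = err(num_grad(f1, X), matmul(a, T(b)))
err2 = err(num_grad(f2, X), matmul(X, add(matmul(a, T(b)), matmul(b, T(a)))))
print(err1, err2)
```

Central differences are exact for linear and quadratic functions up to floating-point rounding, so for these two rows a mismatch much larger than about $10^{-6}$ would point to a wrong formula rather than to the step size.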
Proof
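The proofs below use the trace trick. If you are not familiar with it, review how that computation works before reading them. The key facts are that a scalar equals its own trace, $f = \Tr (f)$, and that for any matrix $\mathbf{G}$ of the same size as $\mathbf{X}$, $\Tr (\mathbf{G}^{\mathsf{T}}\mathrm{d}\mathbf{X}) = \sum_{i,j} G_{ij}\,\mathrm{d}X_{ij}$. Hence, once the differential has been written in the form $\mathrm{d}f = \Tr (\mathbf{G}^{\mathsf{T}}\mathrm{d}\mathbf{X})$, the gradient can be read off as $\nabla_{\mathbf{X}} f = \mathbf{G}$.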
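The read-off step of the trace trick can be checked directly. The pure-Python sketch below (hypothetical helper names, matrices as lists of rows) verifies for random matrices $\mathbf{G}$ and $\mathbf{E}$ that $\Tr (\mathbf{G}^{\mathsf{T}}\mathbf{E})$ equals the entrywise sum $\sum_{i,j} G_{ij}E_{ij}$. It then takes the linear function $f(\mathbf{X}) = \Tr (\mathbf{A}\mathbf{X})$ and shows that a perturbation $\mathbf{E}$ (standing in for $\mathrm{d}\mathbf{X}$) changes $f$ by exactly $\sum_{i,j} (\mathbf{A}^{\mathsf{T}})_{ij}E_{ij}$, so the matrix paired with $\mathbf{E}$ entrywise is $\mathbf{A}^{\mathsf{T}}$.

```python
import random

rng = random.Random(1)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(P, Q):
    # Product of two conformable matrices.
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def tr(M):
    # Trace of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

n = 4
G, E = rand(n, n), rand(n, n)

# Tr(G^T E) is the entrywise (Frobenius) inner product of G and E.
lhs = tr(matmul(T(G), E))
rhs = sum(G[i][j] * E[i][j] for i in range(n) for j in range(n))

# f(X) = Tr(AX) is linear, so f(X + E) - f(X) = Tr(AE) exactly,
# and Tr(AE) = Tr((A^T)^T E) pairs E entrywise with A^T.
A, X = rand(n, n), rand(n, n)
X_plus_E = [[X[i][j] + E[i][j] for j in range(n)] for i in range(n)]
change = tr(matmul(A, X_plus_E)) - tr(matmul(A, X))
AT = T(A)
read_off = sum(AT[i][j] * E[i][j] for i in range(n) for j in range(n))
print(abs(lhs - rhs), abs(change - read_off))  # both at rounding level
```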
Properties of the Matrix Differential and the Trace
For variable matrices $\mathbf{X}, \mathbf{Y}, \mathbf{Z} \in \mathbb{R}^{n \times n}$, a scalar $\alpha \in \mathbb{R}$, and a constant matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$, the following hold:
- $\mathrm{d}(\alpha \mathbf{X}) = \alpha \mathrm{d}\mathbf{X}$
- $\mathrm{d}(\mathbf{X}^{\mathsf{T}}) = (\mathrm{d}\mathbf{X})^{\mathsf{T}}$
- $\mathrm{d}(\mathbf{A}\mathbf{X}) = \mathbf{A} \mathrm{d}\mathbf{X}$ and $\mathrm{d}(\mathbf{X}\mathbf{A}) = (\mathrm{d}\mathbf{X}) \mathbf{A}$
- $\mathrm{d}(\mathbf{X} + \mathbf{Y}) = \mathrm{d}\mathbf{X} + \mathrm{d}\mathbf{Y}$
- $\mathrm{d}(\mathbf{X}\mathbf{Y}) = (\mathrm{d}\mathbf{X})\mathbf{Y} + \mathbf{X} \mathrm{d}\mathbf{Y}$
- $\mathrm{d}\Tr (\mathbf{X}) = \Tr (\mathrm{d}\mathbf{X})$
- $\Tr (\alpha \mathbf{X}) = \alpha \Tr (\mathbf{X})$
- Linearity: $\Tr (\mathbf{X} + \mathbf{Y}) = \Tr (\mathbf{X}) + \Tr (\mathbf{Y})$
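- Cyclic Property: $\Tr (\mathbf{X}\mathbf{Y}\mathbf{Z}) = \Tr (\mathbf{Y}\mathbf{Z}\mathbf{X}) = \Tr (\mathbf{Z}\mathbf{X}\mathbf{Y})$ (this also holds for rectangular matrices as long as all the products are defined, which is what allows a vector to be moved to the front, as in $\Tr (\mathbf{a}^{\mathsf{T}}\mathbf{X}\mathbf{X}\mathbf{b}) = \Tr (\mathbf{b}\mathbf{a}^{\mathsf{T}}\mathbf{X}\mathbf{X})$)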
- Invariance under Transposition: $\Tr (\mathbf{X}^{\mathsf{T}}) = \Tr (\mathbf{X})$
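The cyclic property is the step the proofs use most often, and it also applies to rectangular factors whenever every product is defined. A minimal pure-Python check (hypothetical helper names) uses a $2 \times 3$, a $3 \times 4$, and a $4 \times 2$ matrix, so the three cyclic products are $2 \times 2$, $3 \times 3$, and $4 \times 4$ yet share the same trace; it also confirms invariance under transposition for a square matrix.

```python
import random

rng = random.Random(2)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(P, Q):
    # Product of two conformable matrices.
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def tr(M):
    # Trace of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

X, Y, Z = rand(2, 3), rand(3, 4), rand(4, 2)
t1 = tr(matmul(matmul(X, Y), Z))  # XYZ is 2 x 2
t2 = tr(matmul(matmul(Y, Z), X))  # YZX is 3 x 3
t3 = tr(matmul(matmul(Z, X), Y))  # ZXY is 4 x 4

S = rand(3, 3)
print(t1, t2, t3, tr(S) - tr(T(S)))
```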
$f(\mathbf{X}) = \Tr (a\mathbf{X})$
$$ \begin{align*} \mathrm{d}f &= \mathrm{d} \Tr (f) \\ &= \mathrm{d} \Tr (a\mathbf{X}) \\ &= \Tr \mathrm{d}(a\mathbf{X}) \\ &= \Tr (a\mathrm{d}\mathbf{X}) \\ &= \Tr (aI\mathrm{d}\mathbf{X}) \\ &= \Tr ((aI)^{\mathsf{T}}\mathrm{d}\mathbf{X}) \end{align*} $$
$$ \implies \nabla_{\mathbf{X}} (\Tr (a\mathbf{X})) = aI $$
If $a = 1$,
$$ \implies \nabla_{\mathbf{X}} (\Tr (\mathbf{X})) = I $$
Also, since $\Tr (a\mathbf{X}^{\mathsf{T}}) = \Tr (a\mathbf{X})$,
$$ \nabla_{\mathbf{X}} (\Tr (a\mathbf{X}^{\mathsf{T}})) = aI $$
$$ \implies \nabla_{\mathbf{X}} (\Tr (\mathbf{X}^{\mathsf{T}})) = I $$
■
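As a numerical spot check (a sketch, not part of the proof), the pure-Python snippet below approximates each partial derivative $\partial f / \partial X_{ij}$ by central differences and compares the result with $aI$ for both $\Tr (a\mathbf{X})$ and $\Tr (a\mathbf{X}^{\mathsf{T}})$. The helper names, the value $a = 2.5$, and the $3 \times 3$ size are arbitrary choices of ours.

```python
import random

rng = random.Random(3)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def tr(M):
    # Trace of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

def num_grad(f, X, h=1e-6):
    # Central finite differences: G[i][j] approximates df/dX_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def err(P, Q):
    # Largest entrywise absolute difference between two matrices.
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

n, a = 3, 2.5
X = rand(n, n)
aI = [[a if i == j else 0.0 for j in range(n)] for i in range(n)]

def f(X):
    # Tr(aX)
    return tr([[a * x for x in row] for row in X])

def g(X):
    # Tr(aX^T)
    return tr([[a * x for x in row] for row in T(X)])

err_f = err(num_grad(f, X), aI)
err_g = err(num_grad(g, X), aI)
print(err_f, err_g)
```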
$f(\mathbf{X}) = \Tr (\mathbf{A}\mathbf{X})$
$$ \begin{align*} \mathrm{d}f &= \mathrm{d} \Tr (f) \\ &= \mathrm{d} \Tr (\mathbf{A}\mathbf{X}) \\ &= \Tr \mathrm{d}(\mathbf{A}\mathbf{X}) \\ &= \Tr (\mathbf{A} \mathrm{d}\mathbf{X}) \\ &= \Tr ((\mathbf{A}^{\mathsf{T}})^{\mathsf{T}}\mathrm{d}\mathbf{X}) \end{align*} $$
$$ \implies \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X})) = \mathbf{A}^{\mathsf{T}} $$
■
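A finite-difference check of $\nabla_{\mathbf{X}} \Tr (\mathbf{A}\mathbf{X}) = \mathbf{A}^{\mathsf{T}}$, and of $\Tr (\mathbf{X}\mathbf{A})$, which has the same gradient, again as a pure-Python sketch with hypothetical helpers. For a non-symmetric $\mathbf{A}$ the gradient is $\mathbf{A}^{\mathsf{T}}$ and not $\mathbf{A}$; the last comparison confirms that $\mathbf{A}$ itself does not match.

```python
import random

rng = random.Random(4)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(*Ms):
    # Multiply conformable matrices from left to right.
    R = Ms[0]
    for M in Ms[1:]:
        R = [[sum(row[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
             for row in R]
    return R

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def tr(M):
    # Trace of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

def num_grad(f, X, h=1e-6):
    # Central finite differences: G[i][j] approximates df/dX_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def err(P, Q):
    # Largest entrywise absolute difference between two matrices.
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

n = 3
A, X = rand(n, n), rand(n, n)
err_ax = err(num_grad(lambda X: tr(matmul(A, X)), X), T(A))
err_xa = err(num_grad(lambda X: tr(matmul(X, A)), X), T(A))
err_not_A = err(num_grad(lambda X: tr(matmul(A, X)), X), A)  # large for non-symmetric A
print(err_ax, err_xa, err_not_A)
```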
$f(\mathbf{X}) = \Tr (\mathbf{A}\mathbf{X}\mathbf{B})$
$$ \begin{align*} \mathrm{d}f &= \mathrm{d} \Tr (f) \\ &= \mathrm{d} \Tr (\mathbf{A}\mathbf{X}\mathbf{B}) \\ &= \Tr \mathrm{d}(\mathbf{A}\mathbf{X}\mathbf{B}) \\ &= \Tr (\mathbf{A} (\mathrm{d}\mathbf{X})\mathbf{B}) \\ &= \Tr (\mathbf{B}\mathbf{A} (\mathrm{d}\mathbf{X})) \\ &= \Tr ((\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}})^{\mathsf{T}}\mathrm{d}\mathbf{X}) \end{align*} $$
$$ \implies \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}\mathbf{B})) = \mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}} $$
Alternatively, since $\Tr (\mathbf{A}\mathbf{X}\mathbf{B}) = \Tr (\mathbf{B}\mathbf{A}\mathbf{X})$ and $\nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X})) = \mathbf{A}^{\mathsf{T}}$,
$$ \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}\mathbf{B})) = \mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}} $$
■
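Because the three cyclic forms $\Tr (\mathbf{A}\mathbf{X}\mathbf{B})$, $\Tr (\mathbf{B}\mathbf{A}\mathbf{X})$, and $\Tr (\mathbf{X}\mathbf{B}\mathbf{A})$ are the same function of $\mathbf{X}$, all three should have gradient $\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}$. A minimal pure-Python finite-difference sketch (hypothetical helpers, random $3 \times 3$ matrices) checks this:

```python
import random

rng = random.Random(5)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(*Ms):
    # Multiply conformable matrices from left to right.
    R = Ms[0]
    for M in Ms[1:]:
        R = [[sum(row[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
             for row in R]
    return R

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def tr(M):
    # Trace of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

def num_grad(f, X, h=1e-6):
    # Central finite differences: G[i][j] approximates df/dX_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def err(P, Q):
    # Largest entrywise absolute difference between two matrices.
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

n = 3
A, B, X = rand(n, n), rand(n, n), rand(n, n)
expected = matmul(T(A), T(B))  # A^T B^T = (BA)^T
errs = [
    err(num_grad(lambda X: tr(matmul(A, X, B)), X), expected),
    err(num_grad(lambda X: tr(matmul(B, A, X)), X), expected),
    err(num_grad(lambda X: tr(matmul(X, B, A)), X), expected),
]
print(errs)
```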
$f(\mathbf{X}) = \Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{B})$
$$ \begin{align*} \mathrm{d}f &= \mathrm{d} \Tr (f) \\ &= \mathrm{d} \Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{B}) \\ &= \Tr \mathrm{d}(\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{B}) \\ &= \Tr (\mathbf{A} (\mathrm{d}\mathbf{X})^{\mathsf{T}}\mathbf{B}) \\ &= \Tr ((\mathrm{d}\mathbf{X})^{\mathsf{T}}\mathbf{B}\mathbf{A}) \\ &= \Tr ((\mathbf{B}\mathbf{A})^{\mathsf{T}}\mathrm{d}\mathbf{X}) \end{align*} $$
$$ \implies \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{B})) = \mathbf{B}\mathbf{A} $$
Alternatively, since $\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{B}) = \Tr (\mathbf{B}^{\mathsf{T}}\mathbf{X}\mathbf{A}^{\mathsf{T}})=\Tr (\mathbf{A}^{\mathsf{T}} \mathbf{B}^{\mathsf{T}} \mathbf{X})$ and $\nabla_{\mathbf{X}} \Tr (\mathbf{A}\mathbf{X}) = \mathbf{A}^{\mathsf{T}}$,
$$ \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{B})) = \mathbf{B}\mathbf{A} $$
If $\mathbf{B} = I$,
$$ \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}})) = \mathbf{A} $$
■
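The transposed case can be spot-checked the same way. This pure-Python sketch (hypothetical helpers, random $3 \times 3$ matrices) compares finite-difference gradients of $\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{B})$ with $\mathbf{B}\mathbf{A}$ and of the special case $\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}})$ with $\mathbf{A}$.

```python
import random

rng = random.Random(6)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(*Ms):
    # Multiply conformable matrices from left to right.
    R = Ms[0]
    for M in Ms[1:]:
        R = [[sum(row[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
             for row in R]
    return R

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def tr(M):
    # Trace of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

def num_grad(f, X, h=1e-6):
    # Central finite differences: G[i][j] approximates df/dX_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def err(P, Q):
    # Largest entrywise absolute difference between two matrices.
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

n = 3
A, B, X = rand(n, n), rand(n, n), rand(n, n)
err_axtb = err(num_grad(lambda X: tr(matmul(A, T(X), B)), X), matmul(B, A))
err_axt = err(num_grad(lambda X: tr(matmul(A, T(X))), X), A)  # the B = I case
print(err_axtb, err_axt)
```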
$f(\mathbf{X}) = \Tr (\mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{B}\mathbf{X})$
$$ \begin{align*} \mathrm{d}f &= \mathrm{d} \Tr (f) \\ &= \mathrm{d} \Tr (\mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{B}\mathbf{X}) \\ &= \Tr \mathrm{d}(\mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{B}\mathbf{X}) \\ &= \Tr (\mathbf{A}^{\mathsf{T}} (\mathrm{d}\mathbf{X})^{\mathsf{T}}\mathbf{B}\mathbf{X} + \mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{B}(\mathrm{d}\mathbf{X})) \\ &= \Tr ((\mathrm{d}\mathbf{X})^{\mathsf{T}} \mathbf{B}\mathbf{X}\mathbf{A}^{\mathsf{T}} + \mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{B} \mathrm{d}\mathbf{X}) \\ &= \Tr ((\mathbf{B}\mathbf{X}\mathbf{A}^{\mathsf{T}})^{\mathsf{T}}\mathrm{d}\mathbf{X} + (\mathbf{B}^{\mathsf{T}}\mathbf{X}\mathbf{A})^{\mathsf{T}}\mathrm{d}\mathbf{X}) \\ &= \Tr ((\mathbf{B}\mathbf{X}\mathbf{A}^{\mathsf{T}} + \mathbf{B}^{\mathsf{T}}\mathbf{X}\mathbf{A})^{\mathsf{T}}\mathrm{d}\mathbf{X}) \\ \end{align*} $$
$$ \implies \nabla_{\mathbf{X}} (\Tr (\mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{B}\mathbf{X})) = \mathbf{B}\mathbf{X}\mathbf{A}^{\mathsf{T}} + \mathbf{B}^{\mathsf{T}}\mathbf{X}\mathbf{A} $$
■
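For this quadratic form, a central-difference gradient is exact up to rounding, so a direct numerical comparison is a good sanity check. The sketch below is pure Python with hypothetical helper names and random $3 \times 3$ matrices; it compares the numerical gradient of $\Tr (\mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{B}\mathbf{X})$ with $\mathbf{B}\mathbf{X}\mathbf{A}^{\mathsf{T}} + \mathbf{B}^{\mathsf{T}}\mathbf{X}\mathbf{A}$.

```python
import random

rng = random.Random(7)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(*Ms):
    # Multiply conformable matrices from left to right.
    R = Ms[0]
    for M in Ms[1:]:
        R = [[sum(row[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
             for row in R]
    return R

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def tr(M):
    # Trace of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

def add(*Ms):
    # Entrywise sum of same-shape matrices.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]

def num_grad(f, X, h=1e-6):
    # Central finite differences: G[i][j] approximates df/dX_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def err(P, Q):
    # Largest entrywise absolute difference between two matrices.
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

n = 3
A, B, X = rand(n, n), rand(n, n), rand(n, n)

def f(X):
    # Tr(A^T X^T B X)
    return tr(matmul(T(A), T(X), B, X))

expected = add(matmul(B, X, T(A)), matmul(T(B), X, A))
err_quad = err(num_grad(f, X), expected)
print(err_quad)
```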
$f(\mathbf{X}) = \Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{B})$
$$ \begin{align*} \mathrm{d}f &= \mathrm{d} \Tr (f) \\ &= \mathrm{d} \Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{B}) \\ &= \Tr \mathrm{d}(\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{B}) \\ &= \Tr (\mathbf{A} (\mathrm{d}\mathbf{X})^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{B} + \mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{C}(\mathrm{d}\mathbf{X})\mathbf{B}) \\ &= \Tr ((\mathrm{d}\mathbf{X})^{\mathsf{T}} \mathbf{C}\mathbf{X}\mathbf{B}\mathbf{A} + \mathbf{B}\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{C} \mathrm{d}\mathbf{X}) \\ &= \Tr ((\mathbf{C}\mathbf{X}\mathbf{B}\mathbf{A})^{\mathsf{T}}\mathrm{d}\mathbf{X} + (\mathbf{C}^{\mathsf{T}}\mathbf{X}\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}})^{\mathsf{T}}\mathrm{d}\mathbf{X}) \\ &= \Tr ((\mathbf{C}\mathbf{X}\mathbf{B}\mathbf{A} + \mathbf{C}^{\mathsf{T}}\mathbf{X}\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}})^{\mathsf{T}}\mathrm{d}\mathbf{X}) \\ \end{align*} $$
$$ \implies \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{B})) = \mathbf{C}\mathbf{X}\mathbf{B}\mathbf{A} + \mathbf{C}^{\mathsf{T}}\mathbf{X}\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}} $$
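Alternatively, note that $\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{B}) = \Tr (\mathbf{B}\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X})$. Replacing $\mathbf{A}$ by $\mathbf{A}^{\mathsf{T}}$ in the previous result gives $\nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{B}\mathbf{X})) = \mathbf{B}\mathbf{X}\mathbf{A} + \mathbf{B}^{\mathsf{T}}\mathbf{X}\mathbf{A}^{\mathsf{T}}$, and substituting $\mathbf{B}\mathbf{A}$ for $\mathbf{A}$ and $\mathbf{C}$ for $\mathbf{B}$ then yields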
$$ \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{B})) = \mathbf{C}\mathbf{X}\mathbf{B}\mathbf{A} + \mathbf{C}^{\mathsf{T}}\mathbf{X}\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}} $$
If $\mathbf{C} = I$,
$$ \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X}\mathbf{B})) = \mathbf{X}\mathbf{B}\mathbf{A} + \mathbf{X}\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}} = \mathbf{X}(\mathbf{B}\mathbf{A} + \mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}) $$
If $\mathbf{B} = \mathbf{C} = I$,
$$ \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{X})) = \mathbf{X}\mathbf{A} + \mathbf{X}\mathbf{A}^{\mathsf{T}} = \mathbf{X}(\mathbf{A} + \mathbf{A}^{\mathsf{T}}) $$
■
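A pure-Python finite-difference sketch (hypothetical helpers, random $3 \times 3$ matrices) checking both the general result for $\Tr (\mathbf{A}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{B})$ and the special case $\mathbf{C} = I$:

```python
import random

rng = random.Random(8)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(*Ms):
    # Multiply conformable matrices from left to right.
    R = Ms[0]
    for M in Ms[1:]:
        R = [[sum(row[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
             for row in R]
    return R

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def tr(M):
    # Trace of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

def add(*Ms):
    # Entrywise sum of same-shape matrices.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]

def num_grad(f, X, h=1e-6):
    # Central finite differences: G[i][j] approximates df/dX_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def err(P, Q):
    # Largest entrywise absolute difference between two matrices.
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

n = 3
A, B, C, X = rand(n, n), rand(n, n), rand(n, n), rand(n, n)

def f(X):
    # Tr(A X^T C X B)
    return tr(matmul(A, T(X), C, X, B))

def g(X):
    # Special case C = I: Tr(A X^T X B)
    return tr(matmul(A, T(X), X, B))

err_f = err(num_grad(f, X), add(matmul(C, X, B, A), matmul(T(C), X, T(A), T(B))))
err_g = err(num_grad(g, X), matmul(X, add(matmul(B, A), matmul(T(A), T(B)))))
print(err_f, err_g)
```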
$f(\mathbf{X}) = \Tr (\mathbf{A}\mathbf{X}\mathbf{C}\mathbf{X}\mathbf{B})$
$$ \begin{align*} \mathrm{d}f &= \mathrm{d} \Tr (f) \\ &= \mathrm{d} \Tr (\mathbf{A}\mathbf{X}\mathbf{C}\mathbf{X}\mathbf{B}) \\ &= \Tr \mathrm{d} (\mathbf{A}\mathbf{X}\mathbf{C}\mathbf{X}\mathbf{B}) \\ &= \Tr (\mathbf{A}(\mathrm{d}\mathbf{X})\mathbf{C}\mathbf{X}\mathbf{B} + \mathbf{A}\mathbf{X}\mathbf{C}(\mathrm{d}\mathbf{X})\mathbf{B}) \\ &= \Tr (\mathbf{C}\mathbf{X}\mathbf{B}\mathbf{A}\mathrm{d}\mathbf{X} + \mathbf{B}\mathbf{A}\mathbf{X}\mathbf{C}\mathrm{d}\mathbf{X}) \\ &= \Tr ((\mathbf{C}\mathbf{X}\mathbf{B}\mathbf{A} + \mathbf{B}\mathbf{A}\mathbf{X}\mathbf{C})\mathrm{d}\mathbf{X}) \\ &= \Tr ((\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}^{\mathsf{T}} + \mathbf{C}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}})^{\mathsf{T}}\mathrm{d}\mathbf{X}) \\ \end{align*} $$
$$ \implies \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}\mathbf{C}\mathbf{X}\mathbf{B})) = (\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}^{\mathsf{T}} + \mathbf{C}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}) $$
If $\mathbf{C} = I$,
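$$ \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}\mathbf{X}\mathbf{B})) = \mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}} + \mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}}\mathbf{B}^{\mathsf{T}} $$
If $\mathbf{B} = \mathbf{C} = I$,
$$ \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}\mathbf{X})) = \mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}} + \mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}} $$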
■
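The same kind of check applies when $\mathbf{X}$ appears twice without a transpose. This pure-Python sketch (hypothetical helpers, random $3 \times 3$ matrices) compares finite-difference gradients of $\Tr (\mathbf{A}\mathbf{X}\mathbf{C}\mathbf{X}\mathbf{B})$ and of the $\mathbf{C} = I$ case $\Tr (\mathbf{A}\mathbf{X}\mathbf{X}\mathbf{B})$ with the formulas above.

```python
import random

rng = random.Random(10)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(*Ms):
    # Multiply conformable matrices from left to right.
    R = Ms[0]
    for M in Ms[1:]:
        R = [[sum(row[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
             for row in R]
    return R

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def tr(M):
    # Trace of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

def add(*Ms):
    # Entrywise sum of same-shape matrices.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]

def num_grad(f, X, h=1e-6):
    # Central finite differences: G[i][j] approximates df/dX_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def err(P, Q):
    # Largest entrywise absolute difference between two matrices.
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

n = 3
A, B, C, X = rand(n, n), rand(n, n), rand(n, n), rand(n, n)

def f(X):
    # Tr(A X C X B)
    return tr(matmul(A, X, C, X, B))

def g(X):
    # Special case C = I: Tr(A X X B)
    return tr(matmul(A, X, X, B))

err_f = err(num_grad(f, X),
            add(matmul(T(A), T(B), T(X), T(C)), matmul(T(C), T(X), T(A), T(B))))
err_g = err(num_grad(g, X),
            add(matmul(T(A), T(B), T(X)), matmul(T(X), T(A), T(B))))
print(err_f, err_g)
```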
$f(\mathbf{X}) = \Tr (\mathbf{X}^{n})$
$$ \begin{align*} \mathrm{d}f &= \mathrm{d} \Tr (f) \\ &= \mathrm{d} \Tr (\mathbf{X}^{n}) \\ &= \Tr \mathrm{d}(\mathbf{X}^{n}) \\ &= \Tr \left( (\mathrm{d}\mathbf{X}) \mathbf{X}^{n-1} + \mathbf{X} (\mathrm{d}\mathbf{X}) \mathbf{X}^{n-2} + \cdots + \mathbf{X}^{n-2}(\mathrm{d}\mathbf{X})\mathbf{X} + \mathbf{X}^{n-1}\mathrm{d}\mathbf{X} \right) \\ &= \Tr \left( \overbrace{\mathbf{X}^{n-1} \mathrm{d}\mathbf{X} + \cdots + \mathbf{X}^{n-1} \mathrm{d}\mathbf{X}}^{n} \right) \\ &= \Tr \left( n\mathbf{X}^{n-1} \mathrm{d}\mathbf{X} \right) \\ &= \Tr \left( ((n\mathbf{X}^{n-1})^{\mathsf{T}})^{\mathsf{T}} \mathrm{d}\mathbf{X} \right) \\ \end{align*} $$
$$ \implies \nabla_{\mathbf{X}} (\Tr (\mathbf{X}^{n})) = n(\mathbf{X}^{\mathsf{T}})^{n-1} $$
■
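A minimal pure-Python sketch checking $\nabla_{\mathbf{X}} \Tr (\mathbf{X}^{n}) = n(\mathbf{X}^{\mathsf{T}})^{n-1}$ numerically for $n = 4$ on a random $3 \times 3$ matrix. The helper names, including `mpow` for the matrix power, are hypothetical. Here $f$ is quartic, so central differences carry an $O(h^{2})$ truncation error, which is still far below the tolerance used.

```python
import random

rng = random.Random(11)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(*Ms):
    # Multiply conformable matrices from left to right.
    R = Ms[0]
    for M in Ms[1:]:
        R = [[sum(row[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
             for row in R]
    return R

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def tr(M):
    # Trace of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

def mpow(X, k):
    # X^k by repeated multiplication, with X^0 = I.
    dim = len(X)
    R = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for _ in range(k):
        R = matmul(R, X)
    return R

def num_grad(f, X, h=1e-6):
    # Central finite differences: G[i][j] approximates df/dX_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def err(P, Q):
    # Largest entrywise absolute difference between two matrices.
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

size, power = 3, 4
X = rand(size, size)
expected = [[power * v for v in row] for row in mpow(T(X), power - 1)]  # n (X^T)^(n-1)
err_pow = err(num_grad(lambda X: tr(mpow(X, power)), X), expected)
print(err_pow)
```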
$f(\mathbf{X}) = \Tr (\mathbf{A}\mathbf{X}^{n})$
$$ \begin{align*} \mathrm{d}f &= \mathrm{d} \Tr (f) \\ &= \mathrm{d} \Tr (\mathbf{A}\mathbf{X}^{n}) \\ &= \Tr \mathrm{d}(\mathbf{A}\mathbf{X}^{n}) \\ &= \Tr \left[ \mathbf{A}(\mathrm{d}\mathbf{X})\mathbf{X}^{n-1} + \mathbf{A}\mathbf{X}(\mathrm{d}\mathbf{X})\mathbf{X}^{n-2} + \cdots + \mathbf{A}\mathbf{X}^{n-2}(\mathrm{d}\mathbf{X})\mathbf{X} + \mathbf{A}\mathbf{X}^{n-1}(\mathrm{d}\mathbf{X}) \right] \\ &= \Tr \left[ \mathbf{X}^{n-1}\mathbf{A} \mathrm{d}\mathbf{X} + \mathbf{X}^{n-2}\mathbf{A}\mathbf{X} \mathrm{d}\mathbf{X} + \cdots + \mathbf{X}\mathbf{A}\mathbf{X}^{n-2}\mathrm{d}\mathbf{X} + \mathbf{A}\mathbf{X}^{n-1}\mathrm{d}\mathbf{X} \right] \\ &= \Tr \left( \left[ \sum\limits_{i=0}^{n-1} \mathbf{X}^{n-1-i}\mathbf{A}\mathbf{X}^{i} \right]\mathrm{d}\mathbf{X} \right) \\ &= \Tr \left( \left( \left[ \sum\limits_{i=0}^{n-1} \mathbf{X}^{n-1-i}\mathbf{A}\mathbf{X}^{i} \right]^{\mathsf{T}} \right)^{\mathsf{T}} \mathrm{d}\mathbf{X} \right) \\ &= \Tr \left( \left( \sum\limits_{i=0}^{n-1} \left[ \mathbf{X}^{n-1-i}\mathbf{A}\mathbf{X}^{i} \right]^{\mathsf{T}} \right)^{\mathsf{T}} \mathrm{d}\mathbf{X} \right) \\ \end{align*} $$
$$ \implies \nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}^{n})) = \sum\limits_{i=0}^{n-1} \left[ \mathbf{X}^{n-1-i}\mathbf{A}\mathbf{X}^{i} \right]^{\mathsf{T}} $$
■
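A numerical sketch of the sum formula for $n = 3$, in pure Python with hypothetical helpers (`mpow` computes the matrix power with $\mathbf{X}^{0} = I$). It builds $\sum_{i=0}^{n-1} [\mathbf{X}^{n-1-i}\mathbf{A}\mathbf{X}^{i}]^{\mathsf{T}}$ term by term and compares it with a central-difference gradient of $\Tr (\mathbf{A}\mathbf{X}^{n})$.

```python
import random

rng = random.Random(12)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(*Ms):
    # Multiply conformable matrices from left to right.
    R = Ms[0]
    for M in Ms[1:]:
        R = [[sum(row[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
             for row in R]
    return R

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def tr(M):
    # Trace of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

def add(*Ms):
    # Entrywise sum of same-shape matrices.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]

def mpow(X, k):
    # X^k by repeated multiplication, with X^0 = I.
    dim = len(X)
    R = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for _ in range(k):
        R = matmul(R, X)
    return R

def num_grad(f, X, h=1e-6):
    # Central finite differences: G[i][j] approximates df/dX_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def err(P, Q):
    # Largest entrywise absolute difference between two matrices.
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

size, power = 3, 3
A, X = rand(size, size), rand(size, size)
# Sum over i = 0..n-1 of (X^(n-1-i) A X^i)^T
expected = add(*[T(matmul(mpow(X, power - 1 - i), A, mpow(X, i))) for i in range(power)])
err_apow = err(num_grad(lambda X: tr(matmul(A, mpow(X, power))), X), expected)
print(err_apow)
```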
$f(\mathbf{X}) = \Tr (a_{m}\mathbf{X}^{m} + \dots + a_{1}\mathbf{X} + a_{0}I)$
Since $\nabla_{\mathbf{X}} (\Tr (\mathbf{X}^{n})) = n(\mathbf{X}^{\mathsf{T}})^{n-1}$, and both the trace and the gradient $\nabla_{\mathbf{X}}$ are linear (the constant term $a_{0}I$ contributes nothing, and $(\mathbf{X}^{\mathsf{T}})^{0} = I$),
$$ \begin{align*} \nabla_{\mathbf{X}} \Tr (a_{m}\mathbf{X}^{m} + \dots + a_{1}\mathbf{X} + a_{0}I) &= ma_{m}(\mathbf{X}^{\mathsf{T}})^{m-1} + \cdots + 2a_{2}\mathbf{X}^{\mathsf{T}} + a_{1}I \\ &= \sum\limits_{i=1}^{m} i a_{i}(\mathbf{X}^{\mathsf{T}})^{i-1} \\ &= (p^{\prime}(\mathbf{X}))^{\mathsf{T}} \end{align*} $$
■
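A pure-Python sketch of the polynomial rule, using the arbitrary example $p(x) = 2 + 3x - x^{2} + 0.5x^{3}$, so $p^{\prime}(x) = 3 - 2x + 1.5x^{2}$. The helpers (`mpow`, `scale`, `poly`, and the others) are hypothetical; `poly` evaluates a matrix polynomial with the constant term multiplied by $I$, and the check compares a finite-difference gradient of $\Tr (p(\mathbf{X}))$ with $(p^{\prime}(\mathbf{X}))^{\mathsf{T}}$.

```python
import random

rng = random.Random(13)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(*Ms):
    # Multiply conformable matrices from left to right.
    R = Ms[0]
    for M in Ms[1:]:
        R = [[sum(row[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
             for row in R]
    return R

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def tr(M):
    # Trace of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

def add(*Ms):
    # Entrywise sum of same-shape matrices.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]

def scale(c, M):
    # Scalar multiple c * M.
    return [[c * v for v in row] for row in M]

def mpow(X, k):
    # X^k by repeated multiplication, with X^0 = I.
    dim = len(X)
    R = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for _ in range(k):
        R = matmul(R, X)
    return R

def poly(X, coeffs):
    # coeffs[0] I + coeffs[1] X + coeffs[2] X^2 + ...
    return add(*[scale(c, mpow(X, i)) for i, c in enumerate(coeffs)])

def num_grad(f, X, h=1e-6):
    # Central finite differences: G[i][j] approximates df/dX_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def err(P, Q):
    # Largest entrywise absolute difference between two matrices.
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

X = rand(3, 3)
coeffs = [2.0, 3.0, -1.0, 0.5]                       # p(x) = 2 + 3x - x^2 + 0.5x^3
dcoeffs = [i * c for i, c in enumerate(coeffs)][1:]  # p'(x) = 3 - 2x + 1.5x^2
err_poly = err(num_grad(lambda X: tr(poly(X, coeffs)), X), T(poly(X, dcoeffs)))
print(dcoeffs, err_poly)
```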
$f(\mathbf{X}) = \mathbf{a}^{\mathsf{T}}\mathbf{X}\mathbf{X}\mathbf{b}$
Since $f$ is a scalar, it equals its own trace, and the cyclic property gives
$$ \nabla_{\mathbf{X}} ( \mathbf{a}^{\mathsf{T}}\mathbf{X}\mathbf{X}\mathbf{b} ) = \nabla_{\mathbf{X}} \Tr( \mathbf{a}^{\mathsf{T}}\mathbf{X}\mathbf{X}\mathbf{b} ) = \nabla_{\mathbf{X}} \Tr( \mathbf{b}\mathbf{a}^{\mathsf{T}}\mathbf{X}\mathbf{X} ) $$
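Applying $\nabla_{\mathbf{X}} (\Tr (\mathbf{A}\mathbf{X}\mathbf{X})) = \mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}} + \mathbf{X}^{\mathsf{T}}\mathbf{A}^{\mathsf{T}}$ with $\mathbf{A} = \mathbf{b}\mathbf{a}^{\mathsf{T}}$ (so that $\mathbf{A}^{\mathsf{T}} = \mathbf{a}\mathbf{b}^{\mathsf{T}}$) gives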
$$ \nabla_{\mathbf{X}} (\mathbf{a}^{\mathsf{T}}\mathbf{X}\mathbf{X}\mathbf{b}) = \mathbf{a}\mathbf{b}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}} + \mathbf{X}^{\mathsf{T}}\mathbf{a}\mathbf{b}^{\mathsf{T}} $$
■
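A finite-difference sketch of this vector form in pure Python. The helper names are hypothetical, and $\mathbf{a}$, $\mathbf{b}$ are stored as $3 \times 1$ column matrices so that $\mathbf{a}^{\mathsf{T}}\mathbf{X}\mathbf{X}\mathbf{b}$ comes out as a $1 \times 1$ matrix whose single entry is the scalar $f$.

```python
import random

rng = random.Random(14)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(*Ms):
    # Multiply conformable matrices from left to right.
    R = Ms[0]
    for M in Ms[1:]:
        R = [[sum(row[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
             for row in R]
    return R

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def add(*Ms):
    # Entrywise sum of same-shape matrices.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]

def num_grad(f, X, h=1e-6):
    # Central finite differences: G[i][j] approximates df/dX_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def err(P, Q):
    # Largest entrywise absolute difference between two matrices.
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

n = 3
X = rand(n, n)
a, b = rand(n, 1), rand(n, 1)  # column vectors as n x 1 matrices

def f(X):
    # a^T X X b, returned as a plain scalar
    return matmul(T(a), X, X, b)[0][0]

abT = matmul(a, T(b))
expected = add(matmul(abT, T(X)), matmul(T(X), abT))  # a b^T X^T + X^T a b^T
err_axxb = err(num_grad(f, X), expected)
print(err_axxb)
```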
$f(\mathbf{X}) = \mathbf{a}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{b}$
$$ \begin{align*} \mathrm{d}f &= \mathrm{d} \Tr (f) \\ &= \mathrm{d} \Tr (\mathbf{a}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{b}) \\ &= \mathrm{d} \Tr (\mathbf{b}\mathbf{a}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}) \\ &= \Tr \mathrm{d}(\mathbf{b}\mathbf{a}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}) \\ &= \Tr \left( \mathbf{b}\mathbf{a}^{\mathsf{T}}(\mathrm{d}\mathbf{X})^{\mathsf{T}}\mathbf{C}\mathbf{X} + \mathbf{b}\mathbf{a}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathrm{d}\mathbf{X} \right) \\ &= \Tr \left( (\mathrm{d}\mathbf{X})^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{b}\mathbf{a}^{\mathsf{T}} + (\mathbf{C}^{\mathsf{T}}\mathbf{X}\mathbf{a}\mathbf{b}^{\mathsf{T}})^{\mathsf{T}}\mathrm{d}\mathbf{X} \right) \\ &= \Tr \left( (\mathbf{C}\mathbf{X}\mathbf{b}\mathbf{a}^{\mathsf{T}})^{\mathsf{T}}\mathrm{d}\mathbf{X} + (\mathbf{C}^{\mathsf{T}}\mathbf{X}\mathbf{a}\mathbf{b}^{\mathsf{T}})^{\mathsf{T}}\mathrm{d}\mathbf{X} \right) \\ &= \Tr \left( \left[ \mathbf{C}\mathbf{X}\mathbf{b}\mathbf{a}^{\mathsf{T}} + \mathbf{C}^{\mathsf{T}}\mathbf{X}\mathbf{a}\mathbf{b}^{\mathsf{T}} \right]^{\mathsf{T}}\mathrm{d}\mathbf{X} \right) \\ \end{align*} $$
$$ \implies \nabla_{\mathbf{X}} (\mathbf{a}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{b}) = \mathbf{C}\mathbf{X}\mathbf{b}\mathbf{a}^{\mathsf{T}} + \mathbf{C}^{\mathsf{T}}\mathbf{X}\mathbf{a}\mathbf{b}^{\mathsf{T}} $$
Alternatively, since $\mathbf{a}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{b} = \Tr (\mathbf{a}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{b}) = \Tr (\mathbf{b}\mathbf{a}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X})$, we can apply $\nabla_{\mathbf{X}} (\Tr (\mathbf{A}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{B}\mathbf{X})) = \mathbf{B}\mathbf{X}\mathbf{A}^{\mathsf{T}} + \mathbf{B}^{\mathsf{T}}\mathbf{X}\mathbf{A}$ with $\mathbf{A} = \mathbf{a}\mathbf{b}^{\mathsf{T}}$ (so that $\mathbf{A}^{\mathsf{T}} = \mathbf{b}\mathbf{a}^{\mathsf{T}}$) and $\mathbf{B} = \mathbf{C}$, which gives
$$ \nabla_{\mathbf{X}} (\mathbf{a}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{b}) = \mathbf{C}\mathbf{X}\mathbf{b}\mathbf{a}^{\mathsf{T}} + \mathbf{C}^{\mathsf{T}}\mathbf{X}\mathbf{a}\mathbf{b}^{\mathsf{T}} $$
■
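A final pure-Python finite-difference sketch (hypothetical helpers, $\mathbf{a}$ and $\mathbf{b}$ stored as $3 \times 1$ column matrices) checks $\nabla_{\mathbf{X}} (\mathbf{a}^{\mathsf{T}}\mathbf{X}^{\mathsf{T}}\mathbf{C}\mathbf{X}\mathbf{b}) = \mathbf{C}\mathbf{X}\mathbf{b}\mathbf{a}^{\mathsf{T}} + \mathbf{C}^{\mathsf{T}}\mathbf{X}\mathbf{a}\mathbf{b}^{\mathsf{T}}$. For contrast, it also shows that writing the second term with $\mathbf{X}$ and $\mathbf{C}^{\mathsf{T}}$ in the other order, $\mathbf{X}\mathbf{C}^{\mathsf{T}}\mathbf{a}\mathbf{b}^{\mathsf{T}}$, does not match for a generic $\mathbf{C}$.

```python
import random

rng = random.Random(15)

def rand(r, c):
    # Random r x c matrix (a list of rows) with entries in [-1, 1].
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

def matmul(*Ms):
    # Multiply conformable matrices from left to right.
    R = Ms[0]
    for M in Ms[1:]:
        R = [[sum(row[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]
             for row in R]
    return R

def T(M):
    # Transpose.
    return [list(col) for col in zip(*M)]

def add(*Ms):
    # Entrywise sum of same-shape matrices.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]

def num_grad(f, X, h=1e-6):
    # Central finite differences: G[i][j] approximates df/dX_ij.
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def err(P, Q):
    # Largest entrywise absolute difference between two matrices.
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

n = 3
X, C = rand(n, n), rand(n, n)
a, b = rand(n, 1), rand(n, 1)  # column vectors as n x 1 matrices

def f(X):
    # a^T X^T C X b, returned as a plain scalar
    return matmul(T(a), T(X), C, X, b)[0][0]

grad = num_grad(f, X)
expected = add(matmul(C, X, b, T(a)), matmul(T(C), X, a, T(b)))
swapped = add(matmul(C, X, b, T(a)), matmul(X, T(C), a, T(b)))  # X and C^T reordered
err_ok = err(grad, expected)
err_swapped = err(grad, swapped)
print(err_ok, err_swapped)
```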