Matrix Differential Dissection
Definition
The differential of an $n \times n$ matrix $\mathbf{X} = [x_{ij}]$ is defined as follows.
$$ \mathrm{d} \mathbf{X} = \begin{bmatrix} \mathrm{d} x_{11} & \mathrm{d} x_{12} & \cdots & \mathrm{d} x_{1n} \\ \mathrm{d} x_{21} & \mathrm{d} x_{22} & \cdots & \mathrm{d} x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{d} x_{n1} & \mathrm{d} x_{n2} & \cdots & \mathrm{d} x_{nn} \end{bmatrix} $$
Explanation
Just as the differential $\mathrm{d}x$ of a univariate function generalizes to $\mathrm{d}\mathbf{x}$ for vectors, it generalizes further to $\mathrm{d}\mathbf{X}$ for matrices.
The total differential of a univariate function $f: \mathbb{R} \to \mathbb{R}$:
$$ \mathrm{d}f = \dfrac{\mathrm{d}f}{\mathrm{d}x} \mathrm{d}x $$
The total differential of a multivariable function $f: \mathbb{R}^{n} \to \mathbb{R}$: $$ \mathrm{d}f = \sum_{i} \dfrac{\partial f}{\partial x_{i}} \mathrm{d}x_{i} = \Braket{\nabla_{\mathbf{x}}f, \mathrm{d}\mathbf{x}} = (\nabla_{\mathbf{x}}f)^{\mathsf{T}} \mathrm{d}\mathbf{x} = \begin{bmatrix} \dfrac{\partial f}{\partial x_{1}} & \cdots & \dfrac{\partial f}{\partial x_{n}} \end{bmatrix} \begin{bmatrix} \mathrm{d}x_{1} \\ \vdots \\ \mathrm{d}x_{n} \end{bmatrix} $$
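This inner-product form can be checked numerically: for a small perturbation $\mathrm{d}\mathbf{x}$, the actual change in $f$ agrees with $(\nabla_{\mathbf{x}}f)^{\mathsf{T}}\mathrm{d}\mathbf{x}$ up to second-order terms. A minimal sketch, using the illustrative choice $f(\mathbf{x}) = \mathbf{x}^{\mathsf{T}}\mathbf{x}$ (not from the text), whose gradient is $2\mathbf{x}$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
dx = 1e-6 * rng.standard_normal(5)   # small perturbation playing the role of dx

f = lambda v: v @ v                  # f(x) = x^T x (illustrative example)
grad = 2 * x                         # its gradient, 2x

df_exact = f(x + dx) - f(x)          # actual change in f
df_linear = grad @ dx                # (grad f)^T dx, the total differential

# The discrepancy is dx^T dx, which is O(||dx||^2).
print(abs(df_exact - df_linear))
```

The residual here is exactly $\mathrm{d}\mathbf{x}^{\mathsf{T}}\mathrm{d}\mathbf{x}$, confirming that the total differential captures the first-order change.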
The total differential of a function of a matrix $f: \mathbb{R}^{n \times n} \to \mathbb{R}$: $$ \mathrm{d}f = \sum\limits_{i,j} \dfrac{\partial f}{\partial x_{ij}} \mathrm{d}x_{ij} = \Braket{\nabla_{\mathbf{X}}f, \mathrm{d}\mathbf{X}} = \Tr \left( \left( \nabla_{\mathbf{X}}f \right)^{\mathsf{T}} \mathrm{d}\mathbf{X} \right) $$
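The trace form can be verified the same way. A sketch with the illustrative choice $f(\mathbf{X}) = \operatorname{tr}(\mathbf{X}^{\mathsf{T}}\mathbf{X}) = \sum_{i,j} x_{ij}^{2}$ (an assumption for the example, not from the text), whose gradient is $2\mathbf{X}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))  # small matrix perturbation

f = lambda M: np.trace(M.T @ M)      # f(X) = tr(X^T X) = sum of squared entries
grad = 2 * X                         # gradient of f with respect to X

df_exact = f(X + dX) - f(X)          # actual change in f
df_trace = np.trace(grad.T @ dX)     # tr((grad f)^T dX)

# Discrepancy is tr(dX^T dX), second order in ||dX||.
print(abs(df_exact - df_trace))
```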
The properties of the differential of a scalar $\mathrm{d}x$ remain unchanged.
$$ \mathrm{d}(ax) = a \mathrm{d}x \qquad \mathrm{d}(x + y) = \mathrm{d}x + \mathrm{d}y \qquad \mathrm{d}(xy) = y\mathrm{d}x + x\mathrm{d}y $$
Properties
For variable matrices $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^{n \times n}$, a scalar $\alpha \in \mathbb{R}$, and a constant matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$, the following hold:
1. $\mathrm{d}(\alpha \mathbf{X}) = \alpha \mathrm{d}\mathbf{X}$
2. $\mathrm{d}(\mathbf{X}^{\mathsf{T}}) = (\mathrm{d}\mathbf{X})^{\mathsf{T}}$
3. $\mathrm{d}(\mathbf{A}\mathbf{X}) = \mathbf{A} \mathrm{d}\mathbf{X}$ and $\mathrm{d}(\mathbf{X}\mathbf{A}) = (\mathrm{d}\mathbf{X}) \mathbf{A}$
4. $\mathrm{d}(\mathbf{X} + \mathbf{Y}) = \mathrm{d}\mathbf{X} + \mathrm{d}\mathbf{Y}$
5. $\mathrm{d}(\mathbf{X}\mathbf{Y}) = (\mathrm{d}\mathbf{X})\mathbf{Y} + \mathbf{X} \mathrm{d}\mathbf{Y}$
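The product rule is the one property that differs visibly from the scalar case, since matrix multiplication does not commute. A minimal first-order check: perturbing both factors, the actual change in $\mathbf{X}\mathbf{Y}$ matches $(\mathrm{d}\mathbf{X})\mathbf{Y} + \mathbf{X}\,\mathrm{d}\mathbf{Y}$ up to the second-order term $\mathrm{d}\mathbf{X}\,\mathrm{d}\mathbf{Y}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))
dY = 1e-6 * rng.standard_normal((n, n))

lhs = (X + dX) @ (Y + dY) - X @ Y    # actual change in the product XY
rhs = dX @ Y + X @ dY                # (dX)Y + X dY from the product rule

residual = np.abs(lhs - rhs).max()   # leftover term is exactly dX @ dY
print(residual)
```

Note that the order of the factors matters: writing $\mathbf{Y}\,\mathrm{d}\mathbf{X}$ instead of $(\mathrm{d}\mathbf{X})\mathbf{Y}$ would give a different (wrong) matrix in general.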
Proof
3.
Since $[\mathbf{A}\mathbf{X}]_{ij} = \sum_{k=1}^{n} a_{ik} x_{kj}$,
$$ [\mathrm{d}(\mathbf{A}\mathbf{X})]_{ij} = \mathrm{d}\left( \sum_{k=1}^{n} a_{ik} x_{kj} \right) = \sum_{k=1}^{n} a_{ik}(\mathrm{d}x_{kj}) $$
$$ \implies \mathrm{d}(\mathbf{A}\mathbf{X}) = \mathbf{A}\mathrm{d}\mathbf{X} $$
■
5.
Since $[\mathbf{X}\mathbf{Y}]_{ij} = \sum_{k=1}^{n} x_{ik} y_{kj}$,
$$ [\mathrm{d}(\mathbf{X}\mathbf{Y})]_{ij} = \mathrm{d}\left( \sum_{k=1}^{n} x_{ik} y_{kj} \right) = \sum_{k=1}^{n} (\mathrm{d}x_{ik}) y_{kj} + \sum_{k=1}^{n} x_{ik} \mathrm{d}y_{kj} $$
Therefore, we obtain the following.
$$ [\mathrm{d}(\mathbf{X}\mathbf{Y})]_{ij} = \sum_{k=1}^{n} (\mathrm{d}x_{ik}) y_{kj} + \sum_{k=1}^{n} x_{ik} \mathrm{d}y_{kj} = [(\mathrm{d}\mathbf{X})\mathbf{Y}]_{ij} + [\mathbf{X}\mathrm{d}\mathbf{Y}]_{ij} \\[1em] \implies \mathrm{d}(\mathbf{X}\mathbf{Y}) = (\mathrm{d}\mathbf{X})\mathbf{Y} + \mathbf{X}\mathrm{d}\mathbf{Y} $$
■