
Matrix Differential Dissection

Definition

The differential of an $n \times n$ matrix $\mathbf{X} = [x_{ij}]$ is defined as follows.

$$ \mathrm{d} \mathbf{X} = \begin{bmatrix} \mathrm{d} x_{11} & \mathrm{d} x_{12} & \cdots & \mathrm{d} x_{1n} \\ \mathrm{d} x_{21} & \mathrm{d} x_{22} & \cdots & \mathrm{d} x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{d} x_{n1} & \mathrm{d} x_{n2} & \cdots & \mathrm{d} x_{nn} \end{bmatrix} $$

Explanation

If the generalization of the differential $\mathrm{d}x$ of a univariate function to a vector is $\mathrm{d}\mathbf{x}$, then its generalization to a matrix is $\mathrm{d}\mathbf{X}$.

The total differential of a univariate function $f: \mathbb{R} \to \mathbb{R}$:

$$ \mathrm{d}f = \dfrac{\mathrm{d}f}{\mathrm{d}x} \mathrm{d}x $$

The total differential of a multivariable function $f: \mathbb{R}^{n} \to \mathbb{R}$:

$$ \mathrm{d}f = \sum_{i} \dfrac{\partial f}{\partial x_{i}} \mathrm{d}x_{i} = \left\langle \nabla_{\mathbf{x}}f, \mathrm{d}\mathbf{x} \right\rangle = (\nabla_{\mathbf{x}}f)^{\mathsf{T}} \mathrm{d}\mathbf{x} = \begin{bmatrix} \dfrac{\partial f}{\partial x_{1}} & \cdots & \dfrac{\partial f}{\partial x_{n}} \end{bmatrix} \begin{bmatrix} \mathrm{d}x_{1} \\ \vdots \\ \mathrm{d}x_{n} \end{bmatrix} $$
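
This first-order relation can be checked numerically. The sketch below uses NumPy with a hypothetical example function $f(\mathbf{x}) = \sum_i x_i^2$ (not from the text), whose gradient is $2\mathbf{x}$; the actual change $f(\mathbf{x} + \mathrm{d}\mathbf{x}) - f(\mathbf{x})$ should agree with $(\nabla_{\mathbf{x}}f)^{\mathsf{T}} \mathrm{d}\mathbf{x}$ up to terms of second order in $\mathrm{d}\mathbf{x}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # hypothetical example: f(x) = sum_i x_i^2
    return np.sum(x**2)

x = rng.standard_normal(5)
dx = 1e-6 * rng.standard_normal(5)   # a small perturbation

grad = 2 * x                         # gradient of f(x) = sum x_i^2
df_exact = f(x + dx) - f(x)          # actual change in f
df_linear = grad @ dx                # first-order prediction (grad)^T dx

# discrepancy is second order in dx (here: sum of dx_i^2)
print(abs(df_exact - df_linear))
```

The leftover discrepancy is exactly $\sum_i (\mathrm{d}x_i)^2$ here, which is why shrinking the perturbation shrinks the error quadratically.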

The total differential of a function of a matrix $f: \mathbb{R}^{n \times n} \to \mathbb{R}$:

$$ \mathrm{d}f = \sum\limits_{i,j} \dfrac{\partial f}{\partial x_{ij}} \mathrm{d}x_{ij} = \left\langle \nabla_{\mathbf{X}}f, \mathrm{d}\mathbf{X} \right\rangle = \operatorname{Tr} \left( \left( \nabla_{\mathbf{X}}f \right)^{\mathsf{T}} \mathrm{d}\mathbf{X} \right) $$
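
The trace form can likewise be verified numerically. Below is a sketch, assuming the hypothetical example $f(\mathbf{X}) = \operatorname{Tr}(\mathbf{X}^{\mathsf{T}}\mathbf{X}) = \sum_{ij} x_{ij}^2$ (not from the text), whose gradient is $\nabla_{\mathbf{X}} f = 2\mathbf{X}$:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(X):
    # hypothetical example: f(X) = Tr(X^T X) = sum of squared entries
    return np.trace(X.T @ X)

X = rng.standard_normal((4, 4))
dX = 1e-6 * rng.standard_normal((4, 4))

grad = 2 * X                             # gradient of Tr(X^T X)
df_exact = f(X + dX) - f(X)              # actual change in f
df_linear = np.trace(grad.T @ dX)        # <grad, dX> = Tr(grad^T dX)

# agreement up to second order in dX
print(abs(df_exact - df_linear))
```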

The familiar rules for the differential of a scalar $\mathrm{d}x$ carry over unchanged.

$$ \mathrm{d}(ax) = a \mathrm{d}x \qquad \mathrm{d}(x + y) = \mathrm{d}x + \mathrm{d}y \qquad \mathrm{d}(xy) = y\mathrm{d}x + x\mathrm{d}y $$

Properties

For variable matrices $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^{n \times n}$, a scalar $\alpha \in \mathbb{R}$, and a constant matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$, the following hold:

  1. $\mathrm{d}(\alpha \mathbf{X}) = \alpha \mathrm{d}\mathbf{X}$
  2. $\mathrm{d}(\mathbf{X}^{\mathsf{T}}) = (\mathrm{d}\mathbf{X})^{\mathsf{T}}$
  3. $\mathrm{d}(\mathbf{A}\mathbf{X}) = \mathbf{A} \mathrm{d}\mathbf{X}$ and $\mathrm{d}(\mathbf{X}\mathbf{A}) = (\mathrm{d}\mathbf{X}) \mathbf{A}$
  4. $\mathrm{d}(\mathbf{X} + \mathbf{Y}) = \mathrm{d}\mathbf{X} + \mathrm{d}\mathbf{Y}$
  5. $\mathrm{d}(\mathbf{X}\mathbf{Y}) = (\mathrm{d}\mathbf{X})\mathbf{Y} + \mathbf{X} \mathrm{d}\mathbf{Y}$
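
These properties can be spot-checked numerically. In the sketch below (NumPy, with random matrices as an assumed setup), the transpose rule holds exactly, while the product rule holds up to the second-order term $(\mathrm{d}\mathbf{X})(\mathrm{d}\mathbf{Y})$, which is negligible for small perturbations:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))
dY = 1e-6 * rng.standard_normal((n, n))

# Property 2: d(X^T) = (dX)^T  -- holds exactly
assert np.allclose((X + dX).T - X.T, dX.T)

# Property 5: d(XY) = (dX)Y + X dY
lhs = (X + dX) @ (Y + dY) - X @ Y    # actual change in the product
rhs = dX @ Y + X @ dY                # first-order prediction

# the gap is exactly dX @ dY, second order in the perturbations
print(np.max(np.abs(lhs - rhs)))
```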

Proof

3.

Since $[\mathbf{A}\mathbf{X}]_{ij} = \sum_{k=1}^{n} a_{ik} x_{kj}$,

$$ [\mathrm{d}(\mathbf{A}\mathbf{X})]_{ij} = \mathrm{d}\left( \sum_{k=1}^{n} a_{ik} x_{kj} \right) = \sum_{k=1}^{n} a_{ik}(\mathrm{d}x_{kj}) $$

$$ \implies \mathrm{d}(\mathbf{A}\mathbf{X}) = \mathbf{A}\mathrm{d}\mathbf{X} $$

The identity $\mathrm{d}(\mathbf{X}\mathbf{A}) = (\mathrm{d}\mathbf{X})\mathbf{A}$ follows in the same way.

5.

Since $[\mathbf{X}\mathbf{Y}]_{ij} = \sum_{k=1}^{n} x_{ik} y_{kj}$,

$$ [\mathrm{d}(\mathbf{X}\mathbf{Y})]_{ij} = \mathrm{d}\left( \sum_{k=1}^{n} x_{ik} y_{kj} \right) = \sum_{k=1}^{n} (\mathrm{d}x_{ik}) y_{kj} + \sum_{k=1}^{n} x_{ik} \mathrm{d}y_{kj} $$

Therefore, we obtain the following.

$$ [\mathrm{d}(\mathbf{X}\mathbf{Y})]_{ij} = \sum_{k=1}^{n} (\mathrm{d}x_{ik}) y_{kj} + \sum_{k=1}^{n} x_{ik} \mathrm{d}y_{kj} = [(\mathrm{d}\mathbf{X})\mathbf{Y}]_{ij} + [\mathbf{X}\mathrm{d}\mathbf{Y}]_{ij} \\[1em] \implies \mathrm{d}(\mathbf{X}\mathbf{Y}) = (\mathrm{d}\mathbf{X})\mathbf{Y} + \mathbf{X}\mathrm{d}\mathbf{Y} $$