Differentiation of Matrix-valued Functions

Overview

We define the derivative of a matrix-valued function. The definition is the same as for vector-valued functions: the derivative of scalar functions is applied to each component.

Let $\mathbf{A} : I \subset \mathbb{R} \to \mathbb{R}^{n \times m}$ be a matrix-valued function.

$$ \mathbf{A}(t) = \begin{bmatrix} a_{11}(t) & \cdots & a_{1m}(t) \\ \vdots & \ddots & \vdots \\ a_{n1}(t) & \cdots & a_{nm}(t) \end{bmatrix} $$

If the following limit exists, we say that $\mathbf{A}$ is differentiable at $t$, and its value is called the derivative of $\mathbf{A}$ at $t$.

$$ \dfrac{d}{dt} \mathbf{A}(t) = \mathbf{A}^{\prime}(t) := \lim_{h \to 0} \dfrac{\mathbf{A}(t+h) - \mathbf{A}(t)}{h} $$

If $\mathbf{A}^{\prime}(t)$ exists for every $t \in I$, then $\mathbf{A}$ is said to be differentiable on $I$, and the function $\mathbf{A}^{\prime}$ is called the derivative of $\mathbf{A}$.

Explanation

The derivative of a matrix-valued function is essentially identical to the vector-valued case, and is naturally understood by identifying a matrix with a single vector via $\mathbb{R}^{n \times m} \cong \mathbb{R}^{nm}$. Since the limit in the definition is taken entrywise, $\mathbf{A}$ is differentiable exactly when each component $a_{ij}$ is differentiable. Thus the derivative of $\mathbf{A}$ is as follows.

$$ \begin{bmatrix} \dfrac{d \mathbf{A}}{dt} \end{bmatrix}_{ij} = \dfrac{d a_{ij}}{dt} $$
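As a minimal sketch of the componentwise formula, the following compares a difference quotient of a matrix-valued function with the matrix of componentwise derivatives. The function `A`, the point `t0`, and the step `h` are illustrative choices, not taken from the text, and a central difference is used in place of the one-sided quotient from the definition because it is numerically more accurate.

```python
import math

def A(t):
    """Illustrative matrix-valued function A : R -> R^{2x2} (an assumption)."""
    return [[t**2, math.sin(t)],
            [math.exp(t), 1.0]]

def dA_componentwise(t):
    """Derivative obtained by differentiating each entry a_ij(t) by hand."""
    return [[2 * t, math.cos(t)],
            [math.exp(t), 0.0]]

def difference_quotient(f, t, h=1e-6):
    """Central difference (f(t+h) - f(t-h)) / (2h), applied entrywise."""
    fp, fm = f(t + h), f(t - h)
    return [[(p - m) / (2 * h) for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(fp, fm)]

def max_abs_diff(X, Y):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

t0 = 0.7
err = max_abs_diff(difference_quotient(A, t0), dA_componentwise(t0))
print(err)  # small: limited only by truncation and rounding error
```

The two matrices agree up to the error of the finite-difference approximation, which is what the entrywise definition predicts.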

The differences from the vector-valued case concern operations defined only for matrices, such as matrix multiplication, transpose, trace, determinant, and matrix inverse.

Properties

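If $\mathbf{A}$ and $\mathbf{B}$ are differentiable, with sizes compatible for each operation, and $c$ is a constant, then the following functions are also differentiable, and their derivatives are given by the formulas below. In (e) and (f), $\mathbf{A}$ is square; in (f) and (g), $\mathbf{A}(t)$ is assumed to be invertible.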

(a) $\dfrac{d}{dt}(\mathbf{A} + \mathbf{B}) = \dfrac{d\mathbf{A}}{dt} + \dfrac{d\mathbf{B}}{dt}$

(b) $\dfrac{d}{dt}(c\mathbf{A}) = c\dfrac{d\mathbf{A}}{dt}$

(c) $\dfrac{d}{dt}(\mathbf{A}\mathbf{B}) = \dfrac{d\mathbf{A}}{dt}\mathbf{B} + \mathbf{A}\dfrac{d\mathbf{B}}{dt}$

(d) $\dfrac{d}{dt}(\mathbf{A}^{\mathsf{T}}) = \left(\dfrac{d\mathbf{A}}{dt}\right)^{\mathsf{T}}$

(e) $\dfrac{d}{dt}(\tr\mathbf{A}) = \tr\left(\dfrac{d\mathbf{A}}{dt}\right)$

(f) $\dfrac{d}{dt}(\det\mathbf{A}) = \det\mathbf{A} \cdot \tr\left(\mathbf{A}^{-1}\dfrac{d\mathbf{A}}{dt}\right)$

(g) $\dfrac{d}{dt}(\mathbf{A}^{-1}) = -\mathbf{A}^{-1}\dfrac{d\mathbf{A}}{dt}\mathbf{A}^{-1}$

(h) $\dfrac{d}{dt} \Braket{\mathbf{A}, \mathbf{B}} = \Braket{\dfrac{d\mathbf{A}}{dt}, \mathbf{B}} + \Braket{\mathbf{A}, \dfrac{d\mathbf{B}}{dt}}$

Proof

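If we expand formulas (a) through (e) and (h) componentwise, both sides are expressed as sums and products of differentiable scalar functions. Since sums and products of differentiable functions are differentiable, these properties follow from the sum and product rules for scalar functions. In (c), the order of the factors must be preserved, because matrix multiplication is not commutative.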
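The product rule (c) can be checked numerically with a short sketch. The functions `A` and `B`, their hand-computed derivatives `dA` and `dB`, and the point `t0` are illustrative assumptions. The sketch also evaluates the expression with the first product written in the wrong order, to show that the order of factors matters for matrices.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(X, Y):
    """Entrywise sum of two 2x2 matrices."""
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def max_abs_diff(X, Y):
    """Largest entrywise absolute difference."""
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))

# Illustrative functions and their componentwise derivatives (assumptions).
def A(t):
    return [[t, 1.0], [0.0, t**2]]

def dA(t):
    return [[1.0, 0.0], [0.0, 2 * t]]

def B(t):
    return [[math.sin(t), t], [1.0, math.cos(t)]]

def dB(t):
    return [[math.cos(t), 1.0], [0.0, -math.sin(t)]]

def difference_quotient(f, t, h=1e-6):
    """Central difference approximation of f'(t), applied entrywise."""
    fp, fm = f(t + h), f(t - h)
    return [[(fp[i][j] - fm[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]

t0 = 1.0
lhs = difference_quotient(lambda t: matmul(A(t), B(t)), t0)

# Product rule with the factors in the correct order: A'B + AB'.
rhs = add(matmul(dA(t0), B(t0)), matmul(A(t0), dB(t0)))

# Same expression with the first product reversed: BA' + AB'.
swapped = add(matmul(B(t0), dA(t0)), matmul(A(t0), dB(t0)))

err_rule = max_abs_diff(lhs, rhs)
err_swapped = max_abs_diff(lhs, swapped)
print(err_rule, err_swapped)
```

With these choices `err_rule` is at the level of finite-difference error, while `err_swapped` is of order one, since $\mathbf{A}^{\prime}\mathbf{B} \ne \mathbf{B}\mathbf{A}^{\prime}$ here.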

(f)

This is called the Jacobi formula.

$$ \dfrac{d}{dt} (\det \mathbf{A}(t)) = \det \mathbf{A}(t) \cdot \tr\left( \mathbf{A}(t)^{-1} \dfrac{d\mathbf{A}(t)}{dt} \right) $$
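A numerical sanity check of this formula, as a sketch under stated assumptions: the $2 \times 2$ function `A`, its derivative `dA`, and the point `t0` are illustrative, and `A(t0)` is invertible for this choice. The left-hand side is approximated by a central difference of the determinant.

```python
import math

def A(t):
    """Illustrative 2x2 matrix-valued function, invertible near t0 (an assumption)."""
    return [[2.0 + t, math.sin(t)],
            [t**2, 3.0 + math.cos(t)]]

def dA(t):
    """Componentwise derivative of A, computed by hand."""
    return [[1.0, math.cos(t)],
            [2 * t, -math.sin(t)]]

def det(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv(M):
    """Inverse of a 2x2 matrix via the adjugate; requires det(M) != 0."""
    d = det(M)
    return [[M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d, M[0][0] / d]]

def trace_of_product(X, Y):
    """tr(XY) = sum over i, k of X[i][k] * Y[k][i], without forming XY."""
    return sum(X[i][k] * Y[k][i] for i in range(2) for k in range(2))

t0, h = 0.4, 1e-6

# Left-hand side: central difference of the scalar function t -> det A(t).
numeric = (det(A(t0 + h)) - det(A(t0 - h))) / (2 * h)

# Right-hand side: det A * tr(A^{-1} A').
jacobi = det(A(t0)) * trace_of_product(inv(A(t0)), dA(t0))

err = abs(numeric - jacobi)
print(err)
```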

(g)

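Since the map sending an invertible matrix to its inverse is differentiable, $\mathbf{A}^{-1}$ is differentiable wherever $\mathbf{A}$ is differentiable and invertible. Differentiating both sides of $\mathbf{A} \mathbf{A}^{-1} = I$, where $I$ denotes the identity matrix, and applying the product rule (c) yields the following.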

$$ \dfrac{d}{dt}(\mathbf{A} \mathbf{A}^{-1}) = \dfrac{d \mathbf{A}}{dt} \mathbf{A}^{-1} + \mathbf{A}\dfrac{d \mathbf{A}^{-1}}{dt} = \dfrac{d I}{dt} = O $$

Moving the first term to the right-hand side and multiplying both sides on the left by $\mathbf{A}^{-1}$ gives the following.

$$ \dfrac{d \mathbf{A}^{-1}}{dt} = -\mathbf{A}^{-1}\dfrac{d \mathbf{A}}{dt}\mathbf{A}^{-1} $$

If we consider the case of a $1 \times 1$ matrix (i.e., the scalar case), we recover the familiar formula. For example, for $x^{-1}$, we have $\dfrac{d x^{-1}}{dx} = - x^{-1} \dfrac{dx}{dx} x^{-1} = -x^{-2}$.
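The matrix version of the inverse formula can be checked in the same spirit. This is a sketch with an illustrative $2 \times 2$ function `A` and point `t0` (both assumptions), comparing a central difference of $\mathbf{A}^{-1}$ with $-\mathbf{A}^{-1}\dfrac{d\mathbf{A}}{dt}\mathbf{A}^{-1}$.

```python
import math

def A(t):
    """Illustrative 2x2 matrix-valued function, invertible near t0 (an assumption)."""
    return [[2.0 + t, math.sin(t)],
            [t**2, 3.0 + math.cos(t)]]

def dA(t):
    """Componentwise derivative of A, computed by hand."""
    return [[1.0, math.cos(t)],
            [2 * t, -math.sin(t)]]

def inv(M):
    """Inverse of a 2x2 matrix via the adjugate; requires a nonzero determinant."""
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d, M[0][0] / d]]

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

t0, h = 0.4, 1e-6

# Central difference of t -> A(t)^{-1}, applied entrywise.
inv_p, inv_m = inv(A(t0 + h)), inv(A(t0 - h))
numeric = [[(inv_p[i][j] - inv_m[i][j]) / (2 * h) for j in range(2)]
           for i in range(2)]

# Formula (g): -A^{-1} A' A^{-1}, keeping the order of the factors.
Ainv = inv(A(t0))
product = matmul(matmul(Ainv, dA(t0)), Ainv)
formula = [[-product[i][j] for j in range(2)] for i in range(2)]

err = max(abs(numeric[i][j] - formula[i][j])
          for i in range(2) for j in range(2))
print(err)
```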