Jacobian of Composite Functions
Theorem
Let $f : \mathbb{R}^{n} \to \mathbb{R}^{m}$ and $g : \mathbb{R}^{m} \to \mathbb{R}^{k}$ be differentiable functions, and denote the Jacobian of $f$ by $J(f)$. Then the following holds, where $J(g)$ is evaluated at $f(\mathbf{x})$ and $J(f)$ at $\mathbf{x}$.
$$ J(g \circ f) = J(g) J(f) $$
Explanation
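Since the Jacobian generalizes the derivative to vector-valued functions of several variables, the above theorem is a generalization of the chain rule. For $n = m = k = 1$ it reduces to the familiar $(g \circ f)^{\prime}(x) = g^{\prime}(f(x)) f^{\prime}(x)$.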
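As a small illustration, take $n = m = 2$, $k = 1$ with $f(x_{1}, x_{2}) = (x_{1} x_{2}, x_{1} + x_{2})$ and $g(u_{1}, u_{2}) = u_{1}^{2} + u_{2}$. These functions are chosen only for this example and are not part of the theorem. Here $J(g) = \begin{bmatrix} 2 u_{1} & 1 \end{bmatrix}$ is evaluated at $\mathbf{u} = f(\mathbf{x})$, that is, at $u_{1} = x_{1} x_{2}$, and both sides give the same $1 \times 2$ matrix:
$$ \begin{align*} (g \circ f)(x_{1}, x_{2}) =&\ x_{1}^{2} x_{2}^{2} + x_{1} + x_{2} \\ J(g \circ f) =&\ \begin{bmatrix} 2 x_{1} x_{2}^{2} + 1 & 2 x_{1}^{2} x_{2} + 1 \end{bmatrix} \\ J(g) J(f) =&\ \begin{bmatrix} 2 x_{1} x_{2} & 1 \end{bmatrix} \begin{bmatrix} x_{2} & x_{1} \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 x_{1} x_{2}^{2} + 1 & 2 x_{1}^{2} x_{2} + 1 \end{bmatrix} \end{align*} $$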
Proof
By definition of the Jacobian,
$$ J(g \circ f) = \begin{bmatrix} \dfrac{\partial (g \circ f)_{1}}{\partial x_{1}} & \cdots & \dfrac{\partial (g \circ f)_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial (g \circ f)_{k}}{\partial x_{1}} & \cdots & \dfrac{\partial (g \circ f)_{k}}{\partial x_{n}} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial g_{1}}{\partial x_{1}} & \cdots & \dfrac{\partial g_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial g_{k}}{\partial x_{1}} & \cdots & \dfrac{\partial g_{k}}{\partial x_{n}} \end{bmatrix} $$
Since $g_{i} = g_{i}(f_{1}(\mathbf{x}), \dots, f_{m}(\mathbf{x}))$, the chain rule for multivariable functions gives
$$ \dfrac{\partial g_{i}}{\partial x_{j}} = \sum \limits_{\ell=1}^{m} \dfrac{\partial g_{i}}{\partial f_{\ell}} \dfrac{\partial f_{\ell}}{\partial x_{j}} $$
Therefore,
$$ \begin{align*} J(g \circ f) =&\ \begin{bmatrix} \sum \limits_{\ell=1}^{m} \dfrac{\partial g_{1}}{\partial f_{\ell}} \dfrac{\partial f_{\ell}}{\partial x_{1}} & \cdots & \sum \limits_{\ell=1}^{m} \dfrac{\partial g_{1}}{\partial f_{\ell}} \dfrac{\partial f_{\ell}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \sum \limits_{\ell=1}^{m} \dfrac{\partial g_{k}}{\partial f_{\ell}} \dfrac{\partial f_{\ell}}{\partial x_{1}} & \cdots & \sum \limits_{\ell=1}^{m} \dfrac{\partial g_{k}}{\partial f_{\ell}} \dfrac{\partial f_{\ell}}{\partial x_{n}} \end{bmatrix} \\ =&\ \begin{bmatrix} \dfrac{\partial g_{1}}{\partial f_{1}} & \cdots & \dfrac{\partial g_{1}}{\partial f_{m}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial g_{k}}{\partial f_{1}}& \cdots & \dfrac{\partial g_{k}}{\partial f_{m}} \end{bmatrix} \begin{bmatrix} \dfrac{\partial f_{1}}{\partial x_{1}} & \cdots & \dfrac{\partial f_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_{m}}{\partial x_{1}} & \cdots & \dfrac{\partial f_{m}}{\partial x_{n}} \end{bmatrix} \\ =&\ J(g) J(f) \end{align*} $$
■
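The identity can also be checked numerically. The following is a minimal sketch in plain Python, assuming two example functions $f : \mathbb{R}^{2} \to \mathbb{R}^{3}$ and $g : \mathbb{R}^{3} \to \mathbb{R}^{2}$ chosen only for illustration. The helper names `matmul`, `numerical_jacobian`, and `chain_rule_gap` are introduced here and do not come from any library. It compares a finite-difference Jacobian of $g \circ f$ with the product of the hand-derived Jacobians, where $J(g)$ is evaluated at $f(\mathbf{x})$.

```python
import math

def f(x):
    # Illustrative f : R^2 -> R^3 (an assumption made for this example)
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x1 + x2 ** 2]

def jac_f(x):
    # Hand-derived 3x2 Jacobian of f at x
    x1, x2 = x
    return [[x2, x1],
            [math.cos(x1), 0.0],
            [1.0, 2.0 * x2]]

def g(u):
    # Illustrative g : R^3 -> R^2 (an assumption made for this example)
    u1, u2, u3 = u
    return [u1 * u3, u2 + u3 ** 2]

def jac_g(u):
    # Hand-derived 2x3 Jacobian of g at u
    u1, u2, u3 = u
    return [[u3, 0.0, u1],
            [0.0, 1.0, 2.0 * u3]]

def matmul(a, b):
    # Plain matrix product: entry (i, j) is sum over l of a[i][l] * b[l][j]
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def numerical_jacobian(h, x, eps=1e-6):
    # Central-difference approximation of the Jacobian of h at x
    rows = len(h(x))
    jac = [[0.0] * len(x) for _ in range(rows)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        h_plus, h_minus = h(x_plus), h(x_minus)
        for i in range(rows):
            jac[i][j] = (h_plus[i] - h_minus[i]) / (2 * eps)
    return jac

def chain_rule_gap(x):
    # Largest entrywise difference between J(g o f) and J(g) J(f) at x
    lhs = numerical_jacobian(lambda t: g(f(t)), x)
    rhs = matmul(jac_g(f(x)), jac_f(x))  # J(g) is evaluated at f(x)
    return max(abs(lhs[i][j] - rhs[i][j])
               for i in range(len(rhs)) for j in range(len(rhs[0])))

if __name__ == "__main__":
    # The gap should be close to zero, up to finite-difference error
    print(chain_rule_gap([0.7, -1.3]))
```

The product `matmul(jac_g(f(x)), jac_f(x))` has shape $k \times n$ (here $2 \times 2$), matching the shape of $J(g \circ f)$. A finite-difference check like this is only a sanity test at sampled points, not a substitute for the proof.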