Chain Rule for Multivariable Vector Functions
Theorem
Suppose the two functions $\mathbf{g} : D \subset \mathbb{R}^{m} \to \mathbb{R}^{k}$ and $\mathbf{f} : \mathbf{g}(D) \subset \mathbb{R}^{k} \to \mathbb{R}^{n}$ are differentiable. Then their composition $\mathbf{F} = \mathbf{f} \circ \mathbf{g} : D \to \mathbb{R}^{n}$ is also differentiable, and the (total) derivative of $\mathbf{F}$ satisfies the following.
$$ \mathbf{F}^{\prime}(\mathbf{x}) = \mathbf{f}^{\prime}\left( \mathbf{g}(\mathbf{x}) \right) \mathbf{g}^{\prime}(\mathbf{x}) $$
Explanation
This is called the chain rule.
Writing $\mathbf{x} = (x_{1}, \dots, x_{m})$, $\mathbf{g}(\mathbf{x}) = (g_{1}, \dots, g_{k})$, $\mathbf{f}(g_{1}, \dots, g_{k}) = (f_{1}, \dots, f_{n})$, the definition of the total derivative turns the formula into the product of an $n \times k$ matrix and a $k \times m$ matrix, which is the following $n \times m$ matrix.
$$ \begin{align*} \mathbf{F}^{\prime} (\mathbf{x}) =&\ \begin{bmatrix} \dfrac{\partial f_{1}(\mathbf{g}(\mathbf{x}))}{\partial g_{1}} & \dfrac{\partial f_{1}}{\partial g_{2}} & \cdots & \dfrac{\partial f_{1}}{\partial g_{k}} \\[1em] \dfrac{\partial f_{2}}{\partial g_{1}} & \dfrac{\partial f_{2}}{\partial g_{2}} & \cdots & \dfrac{\partial f_{2}}{\partial g_{k}} \\[1em] \vdots & \vdots & \ddots & \vdots \\[1em] \dfrac{\partial f_{n}}{\partial g_{1}} & \dfrac{\partial f_{n}}{\partial g_{2}} & \cdots & \dfrac{\partial f_{n}}{\partial g_{k}} \end{bmatrix} \begin{bmatrix} \dfrac{\partial g_{1}(\mathbf{x})}{\partial x_{1}} & \dfrac{\partial g_{1}}{\partial x_{2}} & \cdots & \dfrac{\partial g_{1}}{\partial x_{m}} \\[1em] \dfrac{\partial g_{2}}{\partial x_{1}} & \dfrac{\partial g_{2}}{\partial x_{2}} & \cdots & \dfrac{\partial g_{2}}{\partial x_{m}} \\[1em] \vdots & \vdots & \ddots & \vdots \\[1em] \dfrac{\partial g_{k}}{\partial x_{1}} & \dfrac{\partial g_{k}}{\partial x_{2}} & \cdots & \dfrac{\partial g_{k}}{\partial x_{m}} \end{bmatrix} \\[1em] =&\ \begin{bmatrix} \dfrac{\partial f_{1}}{\partial g_{1}} \dfrac{\partial g_{1}}{\partial x_{1}}+\dfrac{\partial f_{1}}{\partial g_{2}}\dfrac{\partial g_{2}}{\partial x_{1}} + \cdots + \dfrac{\partial f_{1}}{\partial g_{k}} \dfrac{\partial g_{k}}{\partial x_{1}} & \dots & \dfrac{\partial f_{1}}{\partial g_{1}} \dfrac{\partial g_{1}}{\partial x_{m}}+\dfrac{\partial f_{1}}{\partial g_{2}}\dfrac{\partial g_{2}}{\partial x_{m}} + \cdots + \dfrac{\partial f_{1}}{\partial g_{k}} \dfrac{\partial g_{k}}{\partial x_{m}} \\[1em] \vdots & \ddots & \vdots \\[1em] \dfrac{\partial f_{n}}{\partial g_{1}} \dfrac{\partial g_{1}}{\partial x_{1}}+\dfrac{\partial f_{n}}{\partial g_{2}}\dfrac{\partial g_{2}}{\partial x_{1}} + \cdots + \dfrac{\partial f_{n}}{\partial g_{k}} \dfrac{\partial g_{k}}{\partial x_{1}} & \dots & \dfrac{\partial f_{n}}{\partial g_{1}} \dfrac{\partial g_{1}}{\partial x_{m}}+\dfrac{\partial f_{n}}{\partial g_{2}}\dfrac{\partial g_{2}}{\partial x_{m}} + \cdots + \dfrac{\partial f_{n}}{\partial g_{k}} \dfrac{\partial g_{k}}{\partial x_{m}} \end{bmatrix} \\[1em] =&\ \begin{bmatrix} \displaystyle \sum\limits_{\ell =1}^{k} \dfrac{\partial f_{1}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{1}} & \dots & \displaystyle \sum\limits_{\ell=1}^{k} \dfrac{\partial f_{1}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{m}} \\[1em] \vdots & \ddots & \vdots \\[1em] \displaystyle \sum\limits_{\ell=1}^{k} \dfrac{\partial f_{n}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{1}} & \dots & \displaystyle \sum\limits_{\ell=1}^{k} \dfrac{\partial f_{n}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{m}} \end{bmatrix} \end{align*} $$
In Einstein notation, where summation over the repeated index $\ell$ from $1$ to $k$ is implied, for $1 \le i \le n$, $1 \le j \le m$
$$ \mathbf{F}^{\prime} = \left[ F_{ij}^{\prime} \right] = \begin{bmatrix} \dfrac{\partial f_{1}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{1}} & \dots & \dfrac{\partial f_{1}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{m}} \\[1em] \vdots & \ddots & \vdots \\[1em] \dfrac{\partial f_{n}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{1}} & \dots & \dfrac{\partial f_{n}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{m}} \end{bmatrix} $$
$$ F_{ij}^{\prime} = \dfrac{\partial f_{i}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{j}} $$
Since this is the most general form, the specific formulas below follow by specializing $k, m, n$.
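The entrywise formula can be checked numerically. The following is a minimal sketch, not a proof: the functions `g` and `f`, the evaluation point `x0`, and the step size `h` are hypothetical choices made for illustration (with $m = 2$, $k = 3$, $n = 2$). It builds $F_{ij}^{\prime}$ from hand-written Jacobians by summing over $\ell$ and compares the result with a central-difference approximation of the derivative of $\mathbf{f} \circ \mathbf{g}$.

```python
import math

def g(x):
    """Hypothetical g: R^2 -> R^3 (m = 2, k = 3)."""
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def f(u):
    """Hypothetical f: R^3 -> R^2 (k = 3, n = 2)."""
    u1, u2, u3 = u
    return [u1 + u2 * u3, math.exp(u1) - u3]

def dg(x):
    """Analytic k x m Jacobian of g: entry [l][j] is dg_l/dx_j."""
    x1, x2 = x
    return [[x2, x1],
            [math.cos(x1), 0.0],
            [0.0, 2.0 * x2]]

def df(u):
    """Analytic n x k Jacobian of f: entry [i][l] is df_i/dg_l."""
    u1, u2, u3 = u
    return [[1.0, u3, u2],
            [math.exp(u1), 0.0, -1.0]]

def chain_rule_jacobian(x):
    """F'_ij = sum over l of (df_i/dg_l)(g(x)) * (dg_l/dx_j)(x)."""
    A, B = df(g(x)), dg(x)
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][l] * B[l][j] for l in range(k)) for j in range(m)]
            for i in range(n)]

def numeric_jacobian(func, x, h=1e-6):
    """Central-difference approximation of the Jacobian of func at x."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = func(xp), func(xm)
        cols.append([(a - b) / (2.0 * h) for a, b in zip(fp, fm)])
    # cols[j][i] holds dF_i/dx_j; transpose so rows are indexed by i.
    return [list(row) for row in zip(*cols)]

x0 = [0.7, -1.3]  # arbitrary evaluation point
J_chain = chain_rule_jacobian(x0)
J_numeric = numeric_jacobian(lambda x: f(g(x)), x0)
max_err = max(abs(a - b)
              for row_a, row_b in zip(J_chain, J_numeric)
              for a, b in zip(row_a, row_b))
print("max abs difference:", max_err)
```

The two matrices should agree up to finite-difference error. Changing the lengths of the lists returned by `g` and `f` (and the matching Jacobians) is one way to experiment with other values of $k, m, n$.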
Formulas
Case 1. $g : \mathbb{R} \to \mathbb{R}$, $f : \mathbb{R} \to \mathbb{R}$, $F = f \circ g : \mathbb{R} \to \mathbb{R}$
When $x \in \mathbb{R}$, $g = g(x)$, $f = f(g(x))$,
$$ F^{\prime} = \dfrac{d F}{d x} = \dfrac{d f}{d g} \dfrac{d g}{d x} $$
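As a quick illustration (the functions are chosen here for the example, not taken from the text), let $g(x) = x^{2}$ and $f(g) = \sin g$, so that $F(x) = \sin (x^{2})$. Then
$$ F^{\prime}(x) = \dfrac{d f}{d g} \dfrac{d g}{d x} = \cos (x^{2}) \cdot 2x $$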
Case 2. $\mathbf{g} : \mathbb{R} \to \mathbb{R}^{k}$, $f : \mathbb{R}^{k} \to \mathbb{R}$, $F = f \circ \mathbf{g} : \mathbb{R} \to \mathbb{R}$
When $x \in \mathbb{R}$, $\mathbf{g}(x) = (g_{1}, \dots, g_{k})$, $f = f(g_{1}, \dots ,g_{k})$,
$$ F^{\prime} = \dfrac{d F}{d x} = \sum \limits_{\ell=1}^{k}\dfrac{\partial f}{\partial g_{\ell}} \dfrac{d g_{\ell}}{d x} $$
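As a quick illustration (the functions are chosen here for the example), let $k = 2$, $\mathbf{g}(x) = (x^{2}, x^{3})$ and $f(g_{1}, g_{2}) = g_{1} g_{2}$, so that $F(x) = x^{5}$. The sum reproduces the derivative obtained directly from $F$:
$$ F^{\prime}(x) = \dfrac{\partial f}{\partial g_{1}} \dfrac{d g_{1}}{d x} + \dfrac{\partial f}{\partial g_{2}} \dfrac{d g_{2}}{d x} = x^{3} \cdot 2x + x^{2} \cdot 3x^{2} = 5x^{4} $$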
Case 3. $g : \mathbb{R}^{m} \to \mathbb{R}$, $f : \mathbb{R} \to \mathbb{R}$, $F = f \circ g : \mathbb{R}^{m} \to \mathbb{R}$
When $\mathbf{x} = (x_{1}, \dots, x_{m}) \in \mathbb{R}^{m}$, $g = g(\mathbf{x})$, $f = f(g(\mathbf{x}))$,
$$ F^{\prime} = \dfrac{d F}{d \mathbf{x}} = \begin{bmatrix} \dfrac{d f}{d g} \dfrac{\partial g}{\partial x_{1}} & \dots & \dfrac{d f}{d g} \dfrac{\partial g}{\partial x_{m}} \end{bmatrix} $$
$$ F_{j}^{\prime} = \dfrac{d f}{d g} \dfrac{\partial g}{\partial x_{j}},\quad 1 \le j \le m $$
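As a quick illustration (the functions are chosen here for the example), let $m = 2$, $g(x_{1}, x_{2}) = x_{1}^{2} + x_{2}^{2}$ and $f(g) = e^{g}$. Every entry shares the common factor $\dfrac{d f}{d g} = e^{g}$:
$$ F^{\prime}(x_{1}, x_{2}) = \begin{bmatrix} 2 x_{1} e^{x_{1}^{2} + x_{2}^{2}} & 2 x_{2} e^{x_{1}^{2} + x_{2}^{2}} \end{bmatrix} $$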
Case 4. $\mathbf{g} : \mathbb{R}^{m} \to \mathbb{R}^{k}$, $f : \mathbb{R}^{k} \to \mathbb{R}$, $F = f \circ \mathbf{g} : \mathbb{R}^{m} \to \mathbb{R}$
When $\mathbf{x} = (x_{1}, \dots, x_{m}) \in \mathbb{R}^{m}$, $\mathbf{g}(\mathbf{x}) = (g_{1}, \dots, g_{k})$, $f = f(g_{1}, \dots, g_{k})$,
$$ F^{\prime} = \dfrac{d F}{d \mathbf{x}} = \begin{bmatrix} \sum \limits_{\ell=1}^{k} \dfrac{\partial f}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{1}} & \dots & \sum \limits_{\ell=1}^{k} \dfrac{\partial f}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{m}} \end{bmatrix} $$
$$ F_{j}^{\prime} = \sum \limits_{\ell=1}^{k} \dfrac{\partial f}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{j}},\quad 1 \le j \le m $$
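A familiar instance is the change to polar coordinates. As an illustration with $m = k = 2$, write $\mathbf{x} = (r, \theta)$ and $\mathbf{g}(r, \theta) = (g_{1}, g_{2}) = (r \cos \theta, r \sin \theta)$, leaving $f$ as an arbitrary differentiable function. The formula then gives
$$ \dfrac{\partial F}{\partial r} = \dfrac{\partial f}{\partial g_{1}} \cos \theta + \dfrac{\partial f}{\partial g_{2}} \sin \theta, \qquad \dfrac{\partial F}{\partial \theta} = - \dfrac{\partial f}{\partial g_{1}} r \sin \theta + \dfrac{\partial f}{\partial g_{2}} r \cos \theta $$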
Case 5. $g : \mathbb{R} \to \mathbb{R}$, $\mathbf{f} : \mathbb{R} \to \mathbb{R}^{n}$, $\mathbf{F} = \mathbf{f} \circ g : \mathbb{R} \to \mathbb{R}^{n}$
When $x \in \mathbb{R}$, $g = g(x)$, $\mathbf{f}(g(x)) = (f_{1}, \dots, f_{n})$,
$$ \mathbf{F}^{\prime} = \dfrac{d \mathbf{F}}{d x} = \begin{bmatrix} \dfrac{d f_{1}}{d g} \dfrac{d g}{d x} \\[1em] \vdots \\[1em] \dfrac{d f_{n}}{d g} \dfrac{d g}{d x} \end{bmatrix} $$
$$ F_{i}^{\prime} = \dfrac{d f_{i}}{d g} \dfrac{d g}{d x},\quad 1\le i \le n $$
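As a quick illustration (the functions are chosen here for the example), let $n = 2$, $g(x) = x^{2}$ and $\mathbf{f}(g) = (\cos g, \sin g)$. Then
$$ \mathbf{F}^{\prime}(x) = \begin{bmatrix} - 2x \sin (x^{2}) \\[1em] 2x \cos (x^{2}) \end{bmatrix} $$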
Case 6. $\mathbf{g} : \mathbb{R} \to \mathbb{R}^{k}$, $\mathbf{f} : \mathbb{R}^{k} \to \mathbb{R}^{n}$, $\mathbf{F} = \mathbf{f} \circ \mathbf{g} : \mathbb{R} \to \mathbb{R}^{n}$
When $x \in \mathbb{R}$, $\mathbf{g}(x) = (g_{1}, \dots, g_{k})$, $\mathbf{f}(g_{1}, \dots ,g_{k}) = (f_{1}, \dots, f_{n})$,
$$ \mathbf{F}^{\prime} = \dfrac{d \mathbf{F}}{d x} = \begin{bmatrix} \sum \limits_{\ell=1}^{k} \dfrac{\partial f_{1}}{\partial g_{\ell}} \dfrac{d g_{\ell}}{d x} \\[1em] \vdots \\[1em] \sum \limits_{\ell=1}^{k} \dfrac{\partial f_{n}}{\partial g_{\ell}} \dfrac{d g_{\ell}}{d x} \end{bmatrix} $$
$$ F_{i}^{\prime} = \sum \limits_{\ell=1}^{k} \dfrac{\partial f_{i}}{\partial g_{\ell}} \dfrac{d g_{\ell}}{d x},\quad 1\le i \le n $$
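As a quick illustration (the functions are chosen here for the example), let $k = n = 2$, $\mathbf{g}(x) = (\cos x, \sin x)$ and $\mathbf{f}(g_{1}, g_{2}) = (g_{1} g_{2},\ g_{1}^{2} - g_{2}^{2})$. Then
$$ \mathbf{F}^{\prime}(x) = \begin{bmatrix} g_{2} (- \sin x) + g_{1} \cos x \\[1em] 2 g_{1} (- \sin x) - 2 g_{2} \cos x \end{bmatrix} = \begin{bmatrix} \cos 2x \\[1em] - 2 \sin 2x \end{bmatrix} $$
which agrees with differentiating $\mathbf{F}(x) = \left( \tfrac{1}{2} \sin 2x,\ \cos 2x \right)$ directly.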
Case 7. $g : \mathbb{R}^{m} \to \mathbb{R}$, $\mathbf{f} : \mathbb{R} \to \mathbb{R}^{n}$, $\mathbf{F} = \mathbf{f} \circ g : \mathbb{R}^{m} \to \mathbb{R}^{n}$
When $\mathbf{x} = (x_{1}, \dots, x_{m}) \in \mathbb{R}^{m}$, $g = g(\mathbf{x})$, $\mathbf{f}(g(\mathbf{x})) = (f_{1}, \dots, f_{n})$,
$$ \mathbf{F}^{\prime} = \dfrac{d \mathbf{F}}{d \mathbf{x}} = \begin{bmatrix} \dfrac{d f_{1}}{d g} \dfrac{\partial g}{\partial x_{1}} & \dots & \dfrac{d f_{1}}{d g} \dfrac{\partial g}{\partial x_{m}} \\[1em] \vdots & \ddots & \vdots \\[1em] \dfrac{d f_{n}}{d g} \dfrac{\partial g}{\partial x_{1}} & \dots & \dfrac{d f_{n}}{d g} \dfrac{\partial g}{\partial x_{m}} \end{bmatrix} $$
$$ F_{ij}^{\prime} = \dfrac{d f_{i}}{d g} \dfrac{\partial g}{\partial x_{j}},\quad 1\le i \le n, 1 \le j \le m $$
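Read against the theorem with $k = 1$, this matrix is the product of the $n \times 1$ column $\mathbf{f}^{\prime}(g(\mathbf{x}))$ and the $1 \times m$ row $g^{\prime}(\mathbf{x})$, so it can also be written as an outer product:
$$ \mathbf{F}^{\prime}(\mathbf{x}) = \begin{bmatrix} \dfrac{d f_{1}}{d g} \\[1em] \vdots \\[1em] \dfrac{d f_{n}}{d g} \end{bmatrix} \begin{bmatrix} \dfrac{\partial g}{\partial x_{1}} & \dots & \dfrac{\partial g}{\partial x_{m}} \end{bmatrix} $$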
Case 8. $\mathbf{g} : \mathbb{R}^{m} \to \mathbb{R}^{k}$, $\mathbf{f} : \mathbb{R}^{k} \to \mathbb{R}^{n}$, $\mathbf{F} = \mathbf{f} \circ \mathbf{g} : \mathbb{R}^{m} \to \mathbb{R}^{n}$
When $\mathbf{x} = (x_{1}, \dots, x_{m}) \in \mathbb{R}^{m}$, $\mathbf{g}(\mathbf{x}) = (g_{1}, \dots, g_{k})$, $\mathbf{f}(g_{1}, \dots, g_{k}) = (f_{1}, \dots, f_{n})$,
$$ \mathbf{F}^{\prime} = \dfrac{d \mathbf{F}}{d \mathbf{x}} = \begin{bmatrix} \sum \limits_{\ell=1}^{k} \dfrac{\partial f_{1}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{1}} & \dots & \sum \limits_{\ell=1}^{k} \dfrac{\partial f_{1}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{m}} \\[1em] \vdots & \ddots & \vdots \\[1em] \sum \limits_{\ell=1}^{k} \dfrac{\partial f_{n}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{1}} & \dots & \sum \limits_{\ell=1}^{k} \dfrac{\partial f_{n}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{m}} \end{bmatrix} $$
$$ F_{ij}^{\prime} = \sum \limits_{\ell=1}^{k} \dfrac{\partial f_{i}}{\partial g_{\ell}} \dfrac{\partial g_{\ell}}{\partial x_{j}},\quad 1\le i \le n, 1 \le j \le m $$
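A simple sanity check is the linear case. As an illustration, suppose $\mathbf{g}(\mathbf{x}) = B \mathbf{x}$ and $\mathbf{f}(\mathbf{u}) = A \mathbf{u}$ for constant matrices $A \in \mathbb{R}^{n \times k}$ and $B \in \mathbb{R}^{k \times m}$ (symbols introduced here for the example). Then $\mathbf{f}^{\prime} = A$, $\mathbf{g}^{\prime} = B$, $\mathbf{F}(\mathbf{x}) = AB\mathbf{x}$, and the formula reduces to ordinary matrix multiplication:
$$ F_{ij}^{\prime} = \sum \limits_{\ell=1}^{k} A_{i \ell} B_{\ell j} = (AB)_{ij} $$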
Proof
Refer to the generalized proof.
■