The Chain Rule of Differentiation in Analysis
Theorem¹
Let $f :[a,b] \to \mathbb{R}$ be continuous and differentiable at a point $x\in [a,b]$, let $g : f([a,b])\to \mathbb{R}$ be differentiable at $f (x)\in f([a,b])$, and define $h : [a,b] \to \mathbb{R}$ by
$$ h(t)=g\left( f(t) \right)\quad (a\le t \le b) $$
Then $h$ is differentiable at $x$, and its derivative is
$$ h^{\prime}(x)=g^{\prime}(f(x))f^{\prime}(x) $$
In terms of the composition $g \circ f$, this reads
$$ ( g \circ f)^{\prime}(x)=g^{\prime}(f(x))f^{\prime}(x) $$
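As an illustration (the functions here are chosen only as an example and are not taken from the reference), let $f(t)=t^{2}+1$ on $[a,b]$ and $g(y)=\sqrt{y}$. Since $f(t)\ge 1$, $g$ is differentiable at every point of $f([a,b])$, and $h(t)=\sqrt{t^{2}+1}$. The theorem then gives
$$ f^{\prime}(x)=2x,\quad g^{\prime}(y)=\frac{1}{2\sqrt{y}} \implies h^{\prime}(x)=g^{\prime}(f(x))f^{\prime}(x)=\frac{1}{2\sqrt{x^{2}+1}}\cdot 2x=\frac{x}{\sqrt{x^{2}+1}} $$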
Explanation
This result is commonly referred to as the chain rule.
Here, $f^{\prime}(x)$ is also called the inner derivative. Writing $y=f(x)$ and $z=g(y)$, the rule takes the following form in Leibniz’s notation.
$$ \frac{dz}{dx}=\frac{dz}{dy}\frac{dy}{dx} $$
Leibniz’s notation is convenient because the right side of the above equation looks as though the $dy$ terms cancel to give the left side. $\dfrac{dy}{dx}$ is not the fraction “$dy$ over $dx$” but the derivative of $y$ with respect to $x$, yet in this formula treating it like a fraction leads to the correct result.
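The formula can also be checked numerically. The sketch below is only an illustration: the functions $f(t)=t^{2}$ and $g(y)=\sin y$, the point $x=0.7$, the step size, and all helper names are assumptions made for this example. It compares a central-difference estimate of $h^{\prime}(x)$ with the product $g^{\prime}(f(x))f^{\prime}(x)$.

```python
import math


def f(t):
    # Inner function (assumed for illustration): f(t) = t^2
    return t * t


def f_prime(t):
    # Derivative of the inner function: f'(t) = 2t
    return 2.0 * t


def g(y):
    # Outer function (assumed for illustration): g(y) = sin(y)
    return math.sin(y)


def g_prime(y):
    # Derivative of the outer function: g'(y) = cos(y)
    return math.cos(y)


def h(t):
    # Composition h = g o f
    return g(f(t))


def central_difference(func, x, step=1e-6):
    # Symmetric difference quotient approximating func'(x)
    return (func(x + step) - func(x - step)) / (2.0 * step)


def chain_rule_derivative(x):
    # Right-hand side of the chain rule: g'(f(x)) * f'(x)
    return g_prime(f(x)) * f_prime(x)


x = 0.7
numeric = central_difference(h, x)
exact = chain_rule_derivative(x)
print(f"numeric estimate: {numeric:.8f}")
print(f"chain rule value: {exact:.8f}")
```

The two printed values should agree to several decimal places. A finite-difference check of this kind is only a sanity test, not a substitute for the proof.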
Proof
First, let’s define the function $G$ as follows.
$$ G(f(t)) :=\begin{cases} \frac{g(f(x))-g(f(t))}{f(x)-f(t)} -g^{\prime}(f(x)) & f(t) \ne f(x) \\ 0 & f(t)=f(x)\end{cases},\quad (t\in[a,b]) $$
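Since $g$ is differentiable at $f(x)$, the difference quotient in the first case tends to $g^{\prime}(f(x))$ as $f(s) \to f(x)$, so the following holds.
$$ \lim \limits_{ f(s) \to f(x) } G(f(s))=0=G(f(x)) $$
That is, $G$ is continuous at $f(x)$. Continuity at other points is not guaranteed, since $g$ is only assumed to be differentiable at $f(x)$, but it is not needed. Furthermore, the following holds for every $t\in[a,b]$, because both sides vanish when $f(t)=f(x)$.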
$$ h(x)-h(t) = g(f(x))-g(f(t))=\Big( f(x)-f(t) \Big) \Big( g^{\prime}(f(x))+G(f(t)) \Big) $$
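Since $f$ is continuous at $x$, $f(t) \to f(x)$ as $t \to x$, and hence $G(f(t)) \to G(f(x))=0$. Dividing the identity above by $x-t$ and applying the limit laws gives the following.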
$$ \begin{align*} h^{\prime}(x) =&\ \lim \limits_{t \to x} \frac{ h(x)-h(t)}{x-t} \\ =&\ \lim \limits_{t \to x} \frac{ \Big( f(x)-f(t) \Big) \Big( g^{\prime}(f(x))+G(f(t)) \Big)}{x-t} \\ =&\ \lim \limits_{t \to x} \left[ g^{\prime}(f(x))\frac{ f(x)-f(t) }{x-t}+G(f(t))\frac{f(x)-f(t) }{x-t} \right] \\ =&\ \lim \limits_{t \to x} \left[ g^{\prime}(f(x))\frac{ f(x)-f(t) }{x-t}\right]+\lim \limits_{t \to x} \left[G(f(t))\frac{ f(x)-f(t) }{x-t} \right] \\ =&\ \lim \limits_{t \to x} g^{\prime}(f(x))\lim \limits_{t \to x}\frac{ f(x)-f(t) }{x-t}+\lim \limits_{t \to x}G(f(t))\lim \limits_{t \to x}\frac{ f(x)-f(t) }{x-t} \\ =&\ g^{\prime}(f(x))f^{\prime}(x)+0\cdot f^{\prime}(x) \\ =&\ g^{\prime}(f(x))f^{\prime}(x) \end{align*} $$
■
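To see why the second case in the definition of $G$ cannot simply be dropped, consider the following standard example (it is not taken from the reference and serves only as an illustration), on an interval $[a,b]$ containing $0$.
$$ f(t)=\begin{cases} t^{2}\sin\dfrac{1}{t} & t \ne 0 \\ 0 & t=0 \end{cases},\qquad x=0 $$
Here $f$ is differentiable at $0$ with $f^{\prime}(0)=0$, because $\left| \dfrac{f(t)-f(0)}{t} \right| \le |t|$, yet $f\left( \dfrac{1}{n\pi} \right)=0=f(0)$ for every positive integer $n$. So $f(t)=f(x)$ at points $t$ arbitrarily close to $x$, where the quotient $\dfrac{g(f(x))-g(f(t))}{f(x)-f(t)}$ is undefined. Setting $G(f(t))=0$ at such points keeps the identity for $h(x)-h(t)$ valid for all $t\in[a,b]$.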
Walter Rudin, Principles of Mathematical Analysis (3rd Edition, 1976), p. 105