Chain Rule for Fréchet Derivatives
Theorem

Let $(X, \left\| \cdot \right\|_{X})$, $(Y, \left\| \cdot \right\|_{Y})$, $(Z, \left\| \cdot \right\|_{Z})$ be Banach spaces, and let $\Omega \subset X$ and $U \subset Y$ be open sets. Let $F : \Omega \to Y$ and $G : U \to Z$ be functions with $F(\Omega) \subset U$. Suppose $F$ is Fréchet differentiable at $x\in\Omega$ and $G$ is Fréchet differentiable at $z=F(x)\in U$. Then $H:=G \circ F$ is also Fréchet differentiable at $x\in \Omega$, and the following equation holds, where the product denotes the composition of the bounded linear operators $DF(x) : X \to Y$ and $DG(z) : Y \to Z$:
$$
DH(x) = DG(z)DF(x)=DG\big( F(x) \big)\cdot DF(x)
$$
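Explanation

The chain rule of ordinary calculus carries over to Fréchet derivatives: the derivative of a composition is the composition of the derivatives, first $DF(x)$ and then $DG\big(F(x)\big)$.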
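Two small worked examples may make the formula concrete. Both are illustrations chosen here, not part of the statement above, and the specific maps are assumptions made only for the example. In finite dimensions, take $X=Y=\mathbb{R}^{2}$, $Z=\mathbb{R}$, $F(x_{1},x_{2})=(x_{1}x_{2},\ x_{1}+x_{2})$ and $G(u,v)=u^{2}+v$. The Fréchet derivatives are the Jacobian matrices, and the chain rule is a matrix product:

$$
\begin{align*}
DF(x) =&\ \begin{bmatrix} x_{2} & x_{1} \\ 1 & 1 \end{bmatrix}, \qquad DG(u,v) = \begin{bmatrix} 2u & 1 \end{bmatrix}
\\ DG\big( F(x) \big) DF(x) =&\ \begin{bmatrix} 2x_{1}x_{2} & 1 \end{bmatrix} \begin{bmatrix} x_{2} & x_{1} \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2x_{1}x_{2}^{2}+1 & 2x_{1}^{2}x_{2}+1 \end{bmatrix}
\end{align*}
$$

This agrees with differentiating $H(x)=x_{1}^{2}x_{2}^{2}+x_{1}+x_{2}$ directly. For an infinite-dimensional case, take $X=Y=C[0,1]$ with the supremum norm, $Z=\mathbb{R}$, $F(u)=u^{2}$ (pointwise) and $G(v)=\int_{0}^{1} v(t)\,dt$. Then $DF(u)h=2uh$, and since $G$ is bounded linear, $DG(v)k=\int_{0}^{1}k(t)\,dt$, so the chain rule gives

$$
DH(u)h = DG\big( F(u) \big) DF(u)h = \int_{0}^{1} 2u(t)h(t)\,dt
$$

which matches the direct computation $H(u+h)-H(u)=\int_{0}^{1}2uh\,dt+\int_{0}^{1}h^{2}\,dt$, where the last term is bounded by $\|h\|_{\infty}^{2}$.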
Proof

First, define the remainders $R$ and $R_{1}$ as follows.
\begin{equation}
R(x,y)=F(x+y)-F(x)-DF(x)y,\quad \forall y\in X,\ x+y\in \Omega
\end{equation}

\begin{equation}
R_{1}(z,w)=G(z+w)-G(z)-DG(z)w,\quad \forall w\in Y,\ z+w\in U
\end{equation}
Then, by assumption, since $F$ is differentiable at $x$ and $G$ is differentiable at $z$,

\begin{equation}
\lim \limits_{\|y\|_{X} \to 0} \frac{\| R(x,y)\|_{Y}}{\|y\|_{X}}=0= \lim \limits_{\|w\|_{Y} \to 0} \frac{\| R_{1}(z,w)\|_{Z}}{\|w\|_{Y}}
\end{equation}
Moreover, by $(1)$, for every $y\in X$ with $x+y\in \Omega$,

\begin{align*}
H(x+y) =&\ G\big( F(x+y) \big)
\\ =&\ G\big( F(x)+DF(x)y+R(x,y) \big)
\end{align*}
Set $W^{\prime}=DF(x)y+R(x,y)$, so that $z+W^{\prime}=F(x+y)\in U$, where $z=F(x)$. Then, by $(2)$ and the linearity of $DG(z)$,

\begin{align*}
H(x+y) =&\ G(z+W^{\prime})
\\ =&\ G(z)+DG(z)W^{\prime}+R_{1}(z,W^{\prime})
\\ =&\ G(z)+DG(z)\big( DF(x)y+ R(x,y) \big) + R_{1}\big(z, DF(x)y+R(x,y) \big)
\\ =&\ H(x)+DG(z)DF(x)y+DG(z)R(x,y)+ R_{1}\big(z,DF(x)y+R(x,y) \big) \tag{4}
\end{align*}
Denote the sum of the last two terms by $R_2(x,y)$, and define $f$ as follows.

$$
R_2(x,y)=DG(z)R(x,y)+R_{1}\big( z, DF(x)y +R(x,y) \big) \in Z
$$

$$
f(w) = \begin{cases} \dfrac{ \| R_{1}(z,w) \|_{Z}}{\|w\|_{Y}} \quad & w \in Y,\ z+w\in U,\ w \ne 0
\\ 0 & w=0 \end{cases}
$$
We now show that $\lim \limits_{\| y\|_{X} \to 0} \dfrac{\|R_2(x,y)\|_{Z}}{\|y \|_{X}}=0$. By the triangle inequality for the norm (red) and the bound $\|L x\|\le \|L\| \|x\|$ for a bounded linear operator $L$ (green),

\begin{align*}
\frac{\| R_2(x,y) \|_{Z}}{\|y \|_{X}} {\color{red}\le}&\ \frac{\| DG(z)R(x,y) \|_{Z} }{\| y\|_{X}} +\frac{\|R_{1} \big( z, DF(x)y+R(x,y) \big)\|_{Z}}{\|y\|_{X}}
\\[1em] {\color{green}\le}&\ \|DG(z)\| \frac{\| R(x,y)\|_{Y}}{\| y\|_{X}} +\frac{\|R_{1} \big(z, DF(x)y+R(x,y) \big)\|_{Z}}{\|y\|_{X}}
\end{align*}
Also, since $R_{1}(z,0)=0$, the definition of $f$ gives $\|R_{1}(z,w)\|_{Z}=f(w)\|w\|_{Y}$ for every $w\in Y$ with $z+w\in U$, including $w=0$, so no division by $\|w\|_{Y}$ is needed. Applying this with $w=DF(x)y+R(x,y)$ (magenta), then the triangle inequality (red) and the operator bound $\|Lx\|\le\|L\|\|x\|$ (green),

\begin{align*}
&\ \|DG(z)\| \dfrac{\| R(x,y)\|_{Y}}{\| y\|_{X}} +\dfrac{\|R_{1} \big(z, DF(x)y+R(x,y) \big)\|_{Z}}{\|y\|_{X}}
\\[1.5em] {\color{magenta}=}&\ \|DG(z)\| \dfrac{\| R(x,y)\|_{Y}}{\| y\|_{X}} +f\big( DF(x)y +R(x,y) \big)\dfrac{\|DF(x)y +R(x,y)\|_{Y}}{\|y\|_{X}}
\\[1.5em] {\color{red}\le}&\ \|DG(z)\| \dfrac{\| R(x,y)\|_{Y}}{\| y\|_{X}} +f\big( DF(x)y +R(x,y) \big)\Bigg[\dfrac{\|DF(x)y\|_{Y}}{\|y\|_{X}} +\dfrac{\|R(x,y)\|_{Y}}{\|y\|_{X}} \Bigg]
\\[1.5em] {\color{green}\le}&\ \|DG(z)\| \dfrac{\| R(x,y)\|_{Y}}{\| y\|_{X}} +f\big( DF(x)y +R(x,y) \big)\Bigg[\|DF(x)\|\dfrac{\|y\|_{X}}{\|y\|_{X}} +\dfrac{\|R(x,y)\|_{Y}}{\|y\|_{X}} \Bigg]
\end{align*}
First, since $\lim \limits_{\| y\|_{X} \to 0} \dfrac{\| R(x,y)\|_{Y}}{\| y\|_{X}}=0$ by $(3)$, the first term tends to $0$ as $\| y\|_{X} \to 0$, and so does the last term inside the bracket. Next, since $DF(x)$ is a bounded linear operator, $DF(x)y \to 0$ as $\|y\|_{X} \to 0$, and $(3)$ also gives $R(x,y) \to 0$, so $DF(x)y+R(x,y) \to 0$. By $(3)$ and the definition of $f$, $f(w) \to 0$ as $w \to 0$, hence $f\big( DF(x)y+R(x,y) \big) \to 0$. Therefore,

$$
\lim \limits_{\| y\|_{X} \to 0} \frac{\| R_2(x,y) \|_{Z}}{\|y \|_{X}}\le \|DG(z) \| \cdot 0 + 0\cdot \Big[ \|DF(x)\| + 0 \Big] =0
$$
Applying this result to $(4)$,

$$
H(x+y)-H(x)-DG(z)DF(x)y=R_2(x,y)
$$

$$
\implies \frac{\left\|H(x+y)-H(x)-DG(z)DF(x)y\right\|_{Z}}{\|y\|_{X}}=\frac{\left\| R_2(x,y)\right\|_{Z} }{\|y\|_{X}}
$$

$$
\implies \lim \limits_{\|y\|_{X} \to 0}\frac{\left\|H(x+y)-H(x)-DG(z)DF(x)y\right\|_{Z}}{\|y\|_{X}}=\lim \limits_{\|y\|_{X} \to 0}\frac{\left\| R_2(x,y)\right\|_{Z} }{\|y\|_{X}}=0
$$
Since $DG(z)DF(x)$ is a bounded linear operator from $X$ to $Z$, being a composition of bounded linear operators, the definition of Fréchet differentiability shows that $H$ is differentiable at $x\in \Omega$, and the derivative of $H$ is

$$
DH(x)=DG(z)DF(x)
$$
■
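The limit established in the proof can be checked numerically on a small finite-dimensional instance. The following is a minimal sketch, assuming the illustrative maps $F(x_{1},x_{2})=(x_{1}x_{2},\ x_{1}+x_{2})$ and $G(u,v)=u^{2}+v$ on $\mathbb{R}^{2}$ with the Euclidean norm; the maps, the base point, the direction, and all function names are hypothetical choices made for this example only. It evaluates the ratio $\|H(x+y)-H(x)-DG(z)DF(x)y\| / \|y\|$ for shrinking $y$.

```python
import math

def F(x):
    """Illustrative F : R^2 -> R^2, F(x1, x2) = (x1 * x2, x1 + x2)."""
    x1, x2 = x
    return (x1 * x2, x1 + x2)

def G(w):
    """Illustrative G : R^2 -> R, G(u, v) = u^2 + v."""
    u, v = w
    return u * u + v

def DF(x):
    """Jacobian of F at x as a 2x2 matrix (list of rows)."""
    x1, x2 = x
    return [[x2, x1], [1.0, 1.0]]

def DG(w):
    """Jacobian of G at w as a 1x2 row vector."""
    u, _ = w
    return [2.0 * u, 1.0]

def chain_rule_derivative(x):
    """Row vector DG(F(x)) DF(x), the candidate for DH(x)."""
    row = DG(F(x))
    jac = DF(x)
    return [row[0] * jac[0][j] + row[1] * jac[1][j] for j in range(2)]

def remainder_ratio(x, y):
    """|H(x+y) - H(x) - DG(F(x)) DF(x) y| / ||y|| for H = G o F."""
    a = chain_rule_derivative(x)
    shifted = (x[0] + y[0], x[1] + y[1])
    linear_part = a[0] * y[0] + a[1] * y[1]
    return abs(G(F(shifted)) - G(F(x)) - linear_part) / math.hypot(y[0], y[1])

x = (1.0, 2.0)            # base point (arbitrary choice)
direction = (0.6, -0.8)   # unit vector (arbitrary choice)
ratios = [remainder_ratio(x, (t * direction[0], t * direction[1]))
          for t in (1e-1, 1e-2, 1e-3)]
print(ratios)  # the ratios should shrink roughly in proportion to t
```

A shrinking ratio along one direction is only a sanity check, not a proof: Fréchet differentiability requires the ratio to vanish uniformly over all directions, which is what the argument above establishes.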