Definition of Directional Derivative

Buildup

Suppose a multivariable function $f : \mathbb{R}^{n} \to \mathbb{R}$ is given. Unlike the single-variable case, differentiating $f$ requires specifying the direction in which the rate of change is measured. A familiar example is the partial derivative, which measures the rate of change with respect to only one variable. For instance, the partial derivative $\dfrac{\partial f}{\partial y}$ of $f=f(x,y,z)$ with respect to the variable $y$ accounts for the change in the value of $f$ only in the direction of $(0,1,0)$.

The directional derivative extends this idea: it measures the rate of change in an arbitrary direction, rather than only along the coordinate directions of the individual variables.

Definition¹

Let a multivariable function $f : \mathbb{R}^{n} \to \mathbb{R}$, a point $\mathbf{x} \in \mathbb{R}^{n}$, and a unit vector $\mathbf{u} \in \mathbb{R}^{n}$ be given. If the following limit exists, it is called the directional derivative of $f$ in the direction $\mathbf{u}$ at $\mathbf{x}$ and is denoted by $\nabla_{\mathbf{u}}f(\mathbf{x})$.

$$ \nabla _{\mathbf{u}} f(\mathbf{x}) := \lim \limits _{t \to 0} \dfrac{f (\mathbf{x} + t \mathbf{u}) - f(\mathbf{x})}{t} $$
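The limit in this definition can be explored numerically by evaluating the difference quotient for shrinking values of $t$. The following is only a minimal sketch in Python: the helper name `difference_quotient` and the sample function $f(x,y) = x^{2}y$ are assumptions chosen for illustration, not part of the definition.

```python
import math


def difference_quotient(f, x, u, t):
    """Return (f(x + t*u) - f(x)) / t for a point x, a unit vector u, and t != 0."""
    shifted = [xi + t * ui for xi, ui in zip(x, u)]
    return (f(shifted) - f(x)) / t


# Hypothetical sample function f(x, y) = x^2 * y, chosen only for illustration.
def f(p):
    x, y = p
    return x * x * y


x0 = [1.0, 2.0]   # the point at which the derivative is taken
u = [0.6, 0.8]    # a unit vector, since 0.36 + 0.64 = 1
assert math.isclose(math.hypot(*u), 1.0)

# Shrinking t brings the quotient closer to the directional derivative.
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    print(t, difference_quotient(f, x0, u, t))
```

For this particular $f$, expanding the quotient by hand gives $3.2 + 1.68t + 0.288t^{2}$, so the printed values approach $3.2$ as $t$ shrinks.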

Explanation

Partial Differentiation
$$ \dfrac{\partial f}{\partial x_{i}}(\mathbf{x}) = \lim \limits _{t \to 0} \dfrac{f (\mathbf{x} + t \mathbf{e}_{i}) - f(\mathbf{x})}{t} $$

The definition of the directional derivative differs from that of the partial derivative only in that the standard basis vector $\mathbf{e}_{i}$, which points along the $i$-th variable, is replaced by an arbitrary unit vector $\mathbf{u}$. In other words, the partial derivative is the special case $\mathbf{u} = \mathbf{e}_{i}$ of the directional derivative.
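As a concrete check of this special case, take the function $f(x,y,z) = xy^{2} + z$ (a hypothetical example chosen only for illustration) and the direction $\mathbf{e}_{2} = (0,1,0)$. The limit in the definition then reduces to the usual partial derivative with respect to $y$.

$$ \nabla _{\mathbf{e}_{2}} f(x,y,z) = \lim \limits _{t \to 0} \dfrac{x(y+t)^{2} + z - (xy^{2} + z)}{t} = \lim \limits _{t \to 0} (2xy + xt) = 2xy = \dfrac{\partial f}{\partial y}(x,y,z) $$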

The following notations are used.

$$ \nabla _{\mathbf{u}} f(\mathbf{x}) = f_{\mathbf{u}}^{\prime}(\mathbf{x}) = D _{\mathbf{u}} f(\mathbf{x}) = \partial_{\mathbf{u}}f(\mathbf{x}) = \dfrac{\partial f}{\partial \mathbf{u}}(\mathbf{x}) $$

Fix a unit vector $\mathbf{u}$. Then each function $f$ determines $\nabla _{\mathbf{u}}f$, so the vector $\mathbf{u}$ itself can be regarded as an operator acting on functions. For this reason, notations such as $\mathbf{u}f$ or $\mathbf{u}[f]$ are also used. In differential geometry in particular, tangent vectors are treated as operators, in the spirit of “tangent vector = differentiation”. Refer to See Also for more information.

The theorem introduced below shows that the directional derivative can be expressed in terms of partial derivatives.

Furthermore, it can be shown that, when $\nabla f \ne \mathbf{0}$, the value of the directional derivative is greatest when $\mathbf{u}$ points in the same direction as the gradient $\nabla f$. That is, $\nabla f$ points in the direction in which the rate of change of $f$ is highest. One way to read the notation, then, is that the gradient carries no subscript on $\nabla$ because it corresponds to the directional derivative in ‘the direction of the highest rate of change’.
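A short sketch of why this holds combines the formula $\nabla _{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$ from the theorem below with the geometric form of the dot product. Here $\theta$ is a symbol introduced only for this remark: it denotes the angle between $\nabla f(\mathbf{x})$ and $\mathbf{u}$, and $\nabla f(\mathbf{x}) \ne \mathbf{0}$ is assumed.

$$ \nabla _{\mathbf{u}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{u} = \left| \nabla f(\mathbf{x}) \right| \left| \mathbf{u} \right| \cos \theta = \left| \nabla f(\mathbf{x}) \right| \cos \theta \le \left| \nabla f(\mathbf{x}) \right| $$

Equality holds exactly when $\cos \theta = 1$, that is, when $\mathbf{u} = \nabla f(\mathbf{x}) / \left| \nabla f(\mathbf{x}) \right|$, and the maximum value is $\left| \nabla f(\mathbf{x}) \right|$.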

Theorem

Suppose $f$ is differentiable at $\mathbf{x}$. Then the following equation holds between the directional derivative $\nabla _{\mathbf{u}} f$ of $f$ and its gradient $\nabla f$.

$$ \nabla _{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = \dfrac{\partial f}{\partial x_{1}} u_{1} + \dfrac{\partial f}{\partial x_{2}} u_{2} + \dots + \dfrac{\partial f}{\partial x_{n}} u_{n} $$
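The identity can be sanity-checked numerically by comparing $\nabla f \cdot \mathbf{u}$ with a finite-difference estimate of the limit. The sketch below assumes a hypothetical function $f(x,y,z) = xy + z^{2}$ with a hand-written gradient, and it uses a central difference as the estimator, which is a numerical convenience rather than the definition itself. The names `dot`, `grad_f`, and `central_difference` are illustrative.

```python
def dot(a, b):
    """Euclidean dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical sample function f(x, y, z) = x*y + z^2.
def f(p):
    x, y, z = p
    return x * y + z * z


# Its gradient, computed by hand: (y, x, 2z).
def grad_f(p):
    x, y, z = p
    return [y, x, 2.0 * z]


def central_difference(f, x, u, t=1e-5):
    """Estimate the directional derivative as (f(x + t*u) - f(x - t*u)) / (2t)."""
    forward = f([xi + t * ui for xi, ui in zip(x, u)])
    backward = f([xi - t * ui for xi, ui in zip(x, u)])
    return (forward - backward) / (2.0 * t)


x0 = [1.0, 2.0, 3.0]
u = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]  # unit vector: (1 + 4 + 4) / 9 = 1

via_gradient = dot(grad_f(x0), u)         # right-hand side of the theorem
via_limit = central_difference(f, x0, u)  # numerical estimate of the left-hand side
print(via_gradient, via_limit)
```

At this point the gradient is $(2, 1, 6)$, so the right-hand side equals $16/3$, and the finite-difference estimate should agree with it up to rounding error.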

Proof

Let $g (t) = f (\mathbf{x} + t \mathbf{u})$. The total derivative of a scalar-valued function is its gradient, and the derivative of $\mathbf{x} + t \mathbf{u}$ with respect to $t$ is $\mathbf{u}$, so differentiating $g$ by the chain rule gives

$$ g^{\prime} (t) = f ^{\prime} (\mathbf{x} + t \mathbf{u}) \cdot \mathbf{u} = \nabla f (\mathbf{x} + t \mathbf{u}) \cdot \mathbf{u} $$

Setting $t = 0$, we obtain the following.

$$ g^{\prime} (0) = \nabla f (\mathbf{x}) \cdot \mathbf{u} $$

On the other hand, since $g(0) = f(\mathbf{x})$, the definition of the directional derivative gives the following.

$$ \nabla _{\mathbf{u}} f = \lim \limits _{t \to 0} \dfrac{f (\mathbf{x} + t \mathbf{u}) - f(\mathbf{x})}{t} = \lim \limits _{t \to 0} \dfrac{ g(t) - g(0)}{t} = g^{\prime}(0) = \nabla f (\mathbf{x}) \cdot \mathbf{u} $$

■