Newton-Raphson Method
Theorem¹
Suppose $f, f', f''$ are continuous near $\alpha$, and that $f(\alpha) = 0$ and $f'(\alpha) \ne 0$.
For an initial value $x_{0}$ close enough to $\alpha$, the sequence $\left\{ x_{n} \right\}$ defined by $$ x_{n+1} := x_{n} - {{ f ( x_{n} ) } \over { f ' ( x_{n} ) }} $$ converges quadratically to $\alpha$ as $n \to \infty$.
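For instance, for $f(x) = x^{2} - 2$ the iteration reads $$ x_{n+1} = x_{n} - {{ x_{n}^{2} - 2 } \over { 2 x_{n} }} = {{1} \over {2}} \left( x_{n} + {{2} \over {x_{n}}} \right) $$ and from $x_{0} = 1$ it produces $1.5$, $1.41\overline{6}$, $1.4142157\ldots$, converging rapidly to $\sqrt{2} \approx 1.4142136$.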
Explanation
The Newton-Raphson method, also known simply as Newton's method, is useful for finding an approximate solution to the equation $f(x) = 0$ thanks to its simplicity and fast convergence rate, despite requiring conditions such as differentiability and continuity.
To converge quadratically means that the error shrinks in proportion to the square of the previous error; this is the property that makes the method, in most cases, demonstrably faster than the bisection method.
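Formally, quadratic convergence means there is a constant $C > 0$ such that $$ | \alpha - x_{n+1} | \le C | \alpha - x_{n} |^{2} $$ so once the error is small, it is essentially squared at every step: an error of $10^{-3}$ becomes one on the order of $10^{-6}$, then $10^{-12}$, roughly doubling the number of correct digits per iteration.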
However, as phrases like 'near $\alpha$' and 'a sufficiently close initial value' suggest, this is not a method that can be applied carelessly. An unlucky initial value can lead to division by $0$ or drive the sequence far from the solution. Code implementations therefore need error handling, such as stopping the calculation if a solution isn't found within a set number of iterations.
Such divergence failures, despite the mathematical proof of convergence, remain a significant issue in practice, mainly because users cannot always determine what counts as 'near $\alpha$' or 'a sufficiently close initial value'. Compared to the bisection method, which is slower but guaranteed to find a solution, this is a clear disadvantage.
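A classic example of such divergence is $f(x) = x^{1/3}$, whose root is $\alpha = 0$: the iteration becomes $$ x_{n+1} = x_{n} - {{ x_{n}^{1/3} } \over { {{1} \over {3}} x_{n}^{-2/3} }} = x_{n} - 3 x_{n} = -2 x_{n} $$ so every nonzero initial value is doubled in magnitude at each step and the sequence diverges; here $f'$ blows up at $\alpha$, violating the theorem's hypotheses.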
See Also
- Generalized Newton-Raphson method for higher dimensions
- Newton-Fourier method: Suppose $f, f', f''$ are continuous on the closed interval $[a,b]$. If $f$ is strictly increasing or decreasing on $[a,b]$ and has a solution $\alpha$ there, then the sequence $\left\{ x_{n} \right\}$ defined by $\displaystyle x_{n+1} = x_{n} - {{ f ( x_{n} ) } \over { f ' ( x_{n} ) }}$ converges quadratically to $\alpha$ as $n \to \infty$.
By assuming stronger conditions, the Newton-Fourier method converges quadratically while also guaranteeing convergence. However, the method essentially takes the closed interval $[a,b]$ to play the role of the 'neighborhood of $\alpha$' in the Newton-Raphson method. Hence, once all the conditions have been checked, it reduces to the Newton-Raphson method, so no separate code is needed.
Proof
Strategy: Derive the second-order term via Taylor expansion, then set up a recursive inequality to show convergence.
Part 1. Taylor Expansion
Expanding $f$ around $x_{n}$ gives, for some $\xi_{n}$ between $x$ and $x_{n}$, $$ f(x) = f(x_{n} ) + (x - x_{n}) f '(x_{n}) + {{(x - x_{n})^2 } \over {2}} f '' (\xi_{n}) $$ Since $f( \alpha ) = 0$, substituting $x = \alpha$ gives $$ 0 = f(x_{n} ) + ( \alpha - x_{n}) f '(x_{n}) + {{( \alpha - x_{n})^2 } \over {2}} f '' (\xi_{n}) $$ Dividing both sides by $f ' (x_{n})$ yields $$ 0 = {{f(x_{n} )} \over {f ' (x_{n})}} + \alpha - x_{n} + {{( \alpha - x_{n})^2 } \over {2}} {{f '' (\xi_{n})} \over { f '(x_{n}) }} $$ By the definition of the sequence, $\displaystyle {{ f ( x_{n} ) } \over { f ' ( x_{n} ) }} - x_{n} = - x_{n+1}$, so rearranging gives $$ 0 = \alpha - x_{n+1} + {{( \alpha - x_{n})^2 } \over {2}} {{f '' (\xi_{n})} \over { f '(x_{n}) }} $$ Collecting terms, we get $$ \begin{equation} \alpha - x_{n+1} = - {{f '' (\xi_{n})} \over { 2 f '(x_{n}) }} ( \alpha - x_{n})^2 \end{equation} $$
Part 2. Convergence
Since $f '$ is continuous near $\alpha$ and $f ' (\alpha) \ne 0$, there exists an interval $I_{\delta} : = [ \alpha - \delta , \alpha + \delta ]$ such that $f ' (x) \ne 0$ for every $x \in I_{\delta}$.
Set $\displaystyle M( \delta ) := {{ \max_{x \in I_{\delta} } | f '' (x) | } \over { 2 \min_{x \in I_{\delta} } | f '(x) | }}$. As $\delta$ decreases, $\displaystyle \max_{x \in I_{\delta} } | f '' (x) |$ does not increase and $\displaystyle \min_{x \in I_{\delta} } | f '(x) |$ does not decrease, so $M ( \delta )$ does not increase. We can therefore choose a sufficiently small $\delta = \delta_{0}$ such that $M( \delta_{0} ) | \alpha - x_{0} | < 1$ for every $x_{0} \in I_{\delta_{0}}$; such an $x_{0}$ is what 'sufficiently close to $\alpha$' means. Write $M := M ( \delta_{0} )$.
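To make the choice of $\delta_{0}$ concrete, take $f(x) = x^{2} - 2$, $\alpha = \sqrt{2}$, and $\delta = 0.5$. Then $f''(x) = 2$ everywhere and $\displaystyle \min_{x \in I_{\delta}} | f'(x) | = 2 ( \sqrt{2} - 0.5 )$, so $$ M ( \delta ) = {{ 2 } \over { 2 \cdot 2 ( \sqrt{2} - 0.5 ) }} \approx 0.55 $$ and $M ( \delta ) | \alpha - x_{0} | < 1$ holds for every $x_{0} \in I_{\delta}$, since $| \alpha - x_{0} | \le 0.5$.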
Taking absolute values of both sides of $(1)$ gives $$ | \alpha - x_{n+1}| \le M | \alpha - x_{n} |^2 $$ (this requires $x_{n} \in I_{\delta_{0}}$, which holds by induction: if $M | \alpha - x_{n} | < 1$, then $| \alpha - x_{n+1} | \le M | \alpha - x_{n} |^{2} < | \alpha - x_{n} | \le \delta_{0}$). Multiplying both sides by $M$ yields $$ M | \alpha - x_{n+1}| \le M^2 | \alpha - x_{n} |^2 $$ and writing the right-hand side as a square, $$ M | \alpha - x_{n+1}| \le ( M | \alpha - x_{n} | )^2 $$ Repeating this all the way down to $x_{0}$, $$ M | \alpha - x_{n+1}| \le ( M | \alpha - x_{n} | )^2 \le ( M | \alpha - x_{n-1} | )^4 \le \cdots \le ( M | \alpha - x_{0} | )^{2^{n+1}} $$ Dividing by $M$, $$ | \alpha - x_{n} | \le {{1 } \over {M}} ( M | \alpha - x_{0} | )^{2^{n}} $$ Since $M | \alpha - x_{0} | < 1$, it follows that $x_{n} \to \alpha$ as $n \to \infty$.
Part 3. Quadratic Convergence
From $(1)$, $\displaystyle {{ \alpha - x_{n+1} } \over { ( \alpha - x_{n})^2 }} = - {{f '' (\xi_{n})} \over { 2 f '(x_{n}) }}$. Since $\xi_{n}$ lies between $\alpha$ and $x_{n}$ and $x_{n} \to \alpha$, we also have $\xi_{n} \to \alpha$, so $$ \lim_{n \to \infty} {{\alpha - x_{n+1} } \over { ( \alpha - x_{n} )^2 }} = - \lim_{n \to \infty} {{f '' ( \xi_{n} )} \over { 2 f ' ( x_{n} ) }} = -{{f '' (\alpha)} \over { 2 f ' ( \alpha ) }} $$
■
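The limit in Part 3 can also be observed numerically. Below is a minimal R sketch, assuming the standard iteration with $f(x) = x^{2} - 2$ and $\alpha = \sqrt{2}$: it prints the ratios $( \alpha - x_{n+1} ) / ( \alpha - x_{n} )^{2}$, which should approach $-f''(\alpha) / (2 f'(\alpha)) = -1/(2\sqrt{2}) \approx -0.354$.

```r
# Observe the quadratic-convergence constant for f(x) = x^2 - 2;
# the printed ratios should tend to -1/(2*sqrt(2)) ~ -0.3536
f  <- function(x) x^2 - 2
df <- function(x) 2*x
alpha <- sqrt(2)
x <- 1
for (n in 1:4)
{
  x_next <- x - f(x)/df(x)                 # one Newton step
  print((alpha - x_next)/(alpha - x)^2)    # error ratio from equation (1)
  x <- x_next
}
```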
As mentioned in the explanation, convergence is not always guaranteed; in practice, the crux is whether one can actually choose a sufficiently small interval $I_{\delta}$ around the solution.
Implementation
Below is an implementation in R. If `df` is provided as the derivative, the calculation uses it directly; otherwise a numerical derivative is computed. The `itmax` option sets the maximum number of iterations: by default, if a sufficiently accurate solution hasn't been found after $1000$ iterations, the calculation is abandoned.
```r
d<-function(f,x,tol=10^(-8))
{
  # Forward-difference approximation of f'(x): halve the step h until two
  # successive estimates agree within the tolerance. The guard h>10^(-8)
  # stops the refinement at the round-off floor, below which forward
  # differences are dominated by cancellation error and stop improving.
  h<-1
  d1<-1
  d2<-0
  while(abs(d1-d2)>tol && h>10^(-8))
  {
    d1<-d2
    d2<-(f(x+h)-f(x))/h
    h<-h/2
  }
  return(d2)
}
```
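As a quick sanity check of the numerical derivative (an illustration, not part of the implementation proper), it can be compared against a known value:

```r
d(function(x) x^3, 2)   # expect roughly 12, since (x^3)' = 3x^2 = 12 at x = 2
```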
```r
Newton<-function(f,x0,df=FALSE,itmax=10^3,tol=10^(-8))
{
  if(f(x0)==0){return(x0)}
  x1<-x0
  for(i in 1:itmax)
  {
    # Use the supplied derivative df if given, otherwise differentiate numerically
    if(is.function(df)){
      denom<-df(x1)
    }else{
      denom<-d(f,x1)
    }
    # An unlucky iterate may land where the derivative vanishes
    if(denom==0){stop('Zero Derivative')}
    x2<-x1-f(x1)/denom
    # Stop once successive iterates agree within the tolerance
    if(abs(x2-x1)<tol){
      return(x2)
    }else{
      x1<-x2
    }
  }
  stop('Maybe Wrong Initial Point')
}
```
```r
f<-function(x) {x^3 + 8}
Newton(f,7)   # converges to -2
g<-function(x) {x^6 - x - 1}
Newton(g,3)   # converges to about 1.134724
h<-function(x) {exp(x)-1}
Newton(h,-2)  # converges to 0
```
Atkinson. (1989). An Introduction to Numerical Analysis(2nd Edition): p58. ↩︎