Polynomial Interpolation
Definition 1
For distinct $x_{0} , \cdots , x_{n}$ and data $(x_{0}, y_{0}) , \cdots , (x_{n} , y_{n})$, a polynomial function $p$ that satisfies $p (x_{i} ) = y_{i}$ and $\deg p \le n$ is called the polynomial interpolation of the data.
Theorem
Existence and Uniqueness
- [1]: For the given data, there exists a unique $p$.
Lagrange’s Formula
- [2]: $$p_{n} (x) = \sum_{i=0}^{n} y_{i} l_{i} (x) , \qquad l_{i} (x) := \prod_{\substack{ j = 0 \\ j \ne i }}^{n} {{ x - x_{j} } \over { x_{i} - x_{j} }}$$
Newton’s Divided Difference Formula
- [3]: $$p_{n} (x) = f(x_{0}) + \sum_{i=1}^{n} f [ x_{0} , \cdots , x_{i} ] \prod_{j=0}^{i-1} (x - x_{j} )$$
Error Analysis
- [4]: For an $(n+1)$ times differentiable function $f : \mathbb{R} \to \mathbb{R}$ and any $t \in \mathbb{R}$, the polynomial interpolation $p_{n}$ of $f$ satisfies the following for some $\xi \in \mathscr{H} \left\{ x_{0} , \cdots , x_{n} , t \right\}$. $$\displaystyle f(t) - p_{n} (t) = {{ (t - x_{0}) \cdots (t - x_{n}) } \over { (n+1)! }} f^{(n+1)} ( \xi )$$
- $\mathscr{H} \left\{ a,b,c, \cdots \right\}$ represents the smallest interval that includes $a,b,c, \cdots$.
Explanation
In Korean, polynomial interpolation is called 다항식 보간법.
Condition $\deg p \le n$
The condition $\deg p \le n$ reflects the fact that a $p$ with $\deg p = n$ is not always guaranteed to exist. For instance, when $n=2$, if the three points happen to lie on a straight line, no genuine quadratic $p_{2} (x) = a_{2} x^{2} + a_{1} x + a_{0}$ with $a_{2} \ne 0$ passes through the $(n+1)=3$ points, while the lower-degree $p_{1} (x) = a_{1} x + a_{0}$ does. Put differently, $p_{2} (x) = a_{2} x^{2} + a_{1} x + a_{0}$ is still found, only with $a_{2} = 0$, as the sketch below illustrates.
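As a quick numerical illustration (a minimal sketch using NumPy's `polyfit`; the specific points are made up for this example), fitting a degree-$2$ polynomial to three collinear points simply returns $a_{2} \approx 0$:

```python
import numpy as np

# Three collinear points on y = 2x + 1: a genuine quadratic through them cannot exist.
x = np.array([0.0, 1.0, 2.0])
y = 2.0 * x + 1.0

# polyfit returns coefficients from the highest degree down: [a2, a1, a0].
a2, a1, a0 = np.polyfit(x, y, deg=2)

print(a2)      # ~0.0 (up to floating-point error)
print(a1, a0)  # ~2.0, ~1.0
```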
Lagrange’s Formula and Newton’s Divided Difference Formula are the same
Though formulas [2] and [3] look different, by [1] they are the same polynomial; the two formulas differ only in how $p_{n}$ is represented. Practically, the significant difference is that Newton's divided difference formula is easier to update when new data are added, as the sketch below illustrates.
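A minimal sketch (plain Python, with NumPy only for the checks; none of the helper names come from the original text) that evaluates the same data with both formulas and verifies that appending a data point merely appends one divided-difference coefficient to Newton's form:

```python
import numpy as np

def lagrange_eval(xs, ys, t):
    """Evaluate the interpolating polynomial at t via Lagrange's formula [2]."""
    total = 0.0
    for i in range(len(xs)):
        li = 1.0                      # Lagrange basis polynomial l_i(t)
        for j in range(len(xs)):
            if j != i:
                li *= (t - xs[j]) / (xs[i] - xs[j])
        total += ys[i] * li
    return total

def divided_differences(xs, ys):
    """Newton coefficients f[x0], f[x0,x1], ..., f[x0,...,xn] via the usual recursion."""
    coef = list(ys)
    for k in range(1, len(xs)):
        for i in range(len(xs) - 1, k - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - k])
    return coef

def newton_eval(xs, coef, t):
    """Evaluate Newton's form [3] by nested multiplication."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (t - xs[k]) + coef[k]
    return result

xs = [0.0, 1.0, 2.0, 3.0]
ys = [float(np.exp(x)) for x in xs]
coef = divided_differences(xs, ys)

t = 1.7
print(np.isclose(lagrange_eval(xs, ys, t), newton_eval(xs, coef, t)))  # True: same polynomial

# Updating with a new data point: the old Newton coefficients are untouched,
# only f[x0, ..., x_{n+1}] is appended.
coef_new = divided_differences(xs + [4.0], ys + [float(np.exp(4.0))])
print(np.allclose(coef, coef_new[:-1]))  # True
```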
Error with the Actual Function
Theorem [4] shows how far the interpolation $p_{n}$ can be from the actual function $f$. Since $(n+1)!$ in the denominator grows very quickly, one expects the accuracy of $p_{n}$ to improve as more data are used. However, the formula by itself says nothing about convergence. As a simple example, consider a $t$ very far from $\mathscr{H} \left\{ x_{0} , \cdots , x_{n} \right\}$: the product $(t - x_{0}) \cdots (t - x_{n})$ then becomes enormous. Moreover, if the derivatives of $f$ grow with each differentiation fast enough to outpace $(n+1)!$, the error can actually get worse as $n$ increases.
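A rough numerical check of [4] (a sketch that assumes $f = \exp$, whose every derivative is again $\exp$, and uses NumPy's `polyfit` for the interpolant; neither choice comes from the original text):

```python
import math
import numpy as np

f = np.exp
n = 4
xs = np.linspace(0.0, 2.0, n + 1)            # nodes x_0, ..., x_n
p = np.poly1d(np.polyfit(xs, f(xs), deg=n))  # interpolating polynomial of degree <= n

t = 1.3                                      # a point inside H{x_0, ..., x_n}
actual_error = abs(f(t) - p(t))

# Right-hand side of [4] with f^{(n+1)}(xi) replaced by its maximum exp(2) on the interval.
bound = np.exp(2.0) / math.factorial(n + 1) * abs(np.prod(t - xs))

print(actual_error <= bound)                 # True
print(actual_error, bound)
```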
Proof
[1]
Strategy: Show that the coefficients $a_{0} , a_{1} , \cdots , a_{n}$ of $p_{n}$ satisfying $y_{i} = p_{n} (x_{i}) = a_{0} + a_{1} x_{i} + \cdots + a_{n} x_{i}^{n}$ for all $i = 0, \cdots , n$ exist and are unique, by writing the system of $(n+1)$ equations in matrix form and appealing to the existence and uniqueness of the inverse matrix.
Define $$ \mathbf{y} := \begin{bmatrix} y_{0} \\ y_{1} \\ \vdots \\ y_{n} \end{bmatrix}, X := \begin{bmatrix} 1 & x_{0} & \cdots & x_{0}^{n} \\ 1 & x_{1} & \cdots & x_{1}^{n} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n} & \cdots & x_{n}^{n} \end{bmatrix} , \mathbf{a} := \begin{bmatrix} a_{0} \\ a_{1} \\ \vdots \\ a_{n} \end{bmatrix} $$ and represent the simultaneous equations in matrix form, and then $$ \begin{bmatrix} y_{0} \\ y_{1} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{0} & \cdots & x_{0}^{n} \\ 1 & x_{1} & \cdots & x_{1}^{n} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n} & \cdots & x_{n}^{n} \end{bmatrix} \begin{bmatrix} a_{0} \\ a_{1} \\ \vdots \\ a_{n} \end{bmatrix} $$ Now, the problem has changed to finding a solution $\mathbf{a}$ that satisfies $\mathbf{y} = X \mathbf{a}$.
Vandermonde matrix determinant: The determinant of $X$ is $\displaystyle \det X = \prod_{0 \le i < j \le n } (x_{j} - x_{i})$
Since $x_{0} , \cdots , x_{n}$ are distinct by assumption, $\det X \ne 0$ and $X^{-1}$ exists, so $\mathbf{a} = X^{-1} \mathbf{y}$ exists as well. Meanwhile, since the solution of an invertible linear system is unique, $\mathbf{a}$ is also unique.
■
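The construction in this proof can be carried out numerically as-is; a minimal NumPy sketch (solving the linear system instead of forming $X^{-1}$ explicitly, as is customary, with made-up data):

```python
import numpy as np

# Data (x_i, y_i) with distinct x_i.
xs = np.array([0.0, 1.0, 2.0, 4.0])
ys = np.array([1.0, 3.0, 2.0, 5.0])

# Vandermonde matrix with rows [1, x_i, x_i^2, ..., x_i^n].
X = np.vander(xs, increasing=True)

# The x_i are distinct, so det X != 0 and y = X a has exactly one solution a.
a = np.linalg.solve(X, ys)

# Sanity check: the resulting polynomial passes through every data point.
p = np.polynomial.Polynomial(a)   # coefficients a_0, a_1, ..., a_n in increasing degree
print(np.allclose(p(xs), ys))     # True
```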
[2]
It follows from the Kronecker delta property of the Lagrange basis: since $l_{i} (x_{j}) = \delta_{ij}$, substituting $x = x_{j}$ gives $p_{n} (x_{j}) = \sum_{i=0}^{n} y_{i} l_{i} (x_{j}) = y_{j}$.
■
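The Kronecker delta property used above can be checked directly in code (a small sketch with made-up nodes, not from the source):

```python
import numpy as np

xs = [0.0, 1.0, 2.5, 4.0]

def l(i, t):
    """Lagrange basis polynomial l_i evaluated at t."""
    out = 1.0
    for j in range(len(xs)):
        if j != i:
            out *= (t - xs[j]) / (xs[i] - xs[j])
    return out

# l_i(x_j) = delta_ij, hence p_n(x_j) = sum_i y_i l_i(x_j) = y_j.
L = np.array([[l(i, xj) for xj in xs] for i in range(len(xs))])
print(np.allclose(L, np.eye(len(xs))))  # True
```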
[3]
It follows by direct calculation using the recursive definition of the divided differences.
■
[4]
Strategy: Define new auxiliary functions and avoid direct calculation by exploiting their differentiability. Since the setup is involved, it may actually be easier to read the proof from the end backwards.
Claim: For $E (x) := f(x) - p_{n} (x)$, the following holds. $$ E(t) = {{ (t - x_{0}) \cdots (t - x_{n}) } \over { (n+1)! }} f^{(n+1)} ( \xi ) $$
Part 1.
Firstly, if $t$ equals one of the nodes $x_{0} , \cdots , x_{n}$, the claim holds trivially, so assume $t$ differs from all of them. $$ \begin{align*} \Psi (x) :=& (x - x_{0} ) (x - x_{1}) \cdots (x - x_{n}) \\ G (x) :=& E(x) - {{ \Psi (x) } \over { \Psi (t) }} E(t) \end{align*} $$ Upon defining the new functions as above, since $f$, $p_{n}$, and $\Psi$ are $(n+1)$ times differentiable, $G$ is also $(n+1)$ times differentiable.
Part 2.
Substituting $x = x_{i}$ into $G$ leads to $E(x_{i}) = f(x_{i}) - p_{n} (x_{i}) = 0$ and $\Psi (x_{i} ) = 0$, thus $$ G(x_{i}) = E(x_{i} ) - {{ \Psi (x_{i}) } \over { \Psi (t) }} E(t) = 0 $$ Substituting $x = t$ into $G$ leads to $\displaystyle {{ \Psi (t) } \over { \Psi (t) }} = 1$, thus $$ G(t) = E(t) - E(t) =0 $$ Therefore, $G$ has $(n+2)$ distinct zeros $x_{0} , \cdots , x_{n} , t$.
Part 3.
For convenience, let’s say $x_{n+1} :=t$.
Rolle’s theorem: If function $f(x)$ is continuous on $[a,b]$, differentiable on $(a,b)$, and $f(a)=f(b)$, then there exists at least one $c$ in $(a,b)$ that satisfies $f ' (c)=0$.
For all $i=0, \cdots , n$, $$ G(x_{i}) = 0 = G(x_{i+1}) $$ hence, by Rolle's theorem, there exists $x'_{i} \in \mathscr{H} \left\{ x_{i} , x_{i+1} \right\}$ satisfying $G ' ( x'_{i}) = 0$, and similarly for all $i=0, \cdots , (n-1)$, $$ G ' (x'_{i}) = 0 = G ' (x'_{i+1}) $$ hence, by Rolle's theorem, there exists $x''_{i} \in \mathscr{H} \left\{ x'_{i} , x'_{i+1} \right\}$ satisfying $G''( x''_{i}) = 0$. By applying Rolle's theorem inductively $(n+1)$ times, the existence of a $\xi \in \mathscr{H} \left\{ x_{0} , \cdots , x_{n+1} \right\}$ satisfying $$ G^{(n+1)} ( \xi ) = 0 $$ is guaranteed.
Meanwhile, since $p_{n}$ is a polynomial of degree at most $n$, $p_{n}^{(n+1)} \equiv 0$ and $$ E^{(n+1)} (x) = f^{(n+1)} ( x) $$ The highest degree term of $\Psi$ is $x^{n+1}$, so $$ \Psi^{(n+1)} (x) = (n+1)! $$ Differentiating both sides of $\displaystyle G (x) = E(x) - {{ \Psi (x) } \over { \Psi (t) }} E(t)$ $(n+1)$ times with respect to $x$ gives $$ G^{(n+1)} (x) = f^{(n+1)} ( x) - {{ (n+1)! } \over { \Psi (t) }} E(t) $$ Substituting $x=\xi$ then yields $$ 0 = f^{(n+1)} ( \xi ) - {{ (n+1)! } \over { \Psi (t) }} E(t) $$ Since $E (t) = f(t) - p_{n} (t)$, it follows that $$ f(t) - \sum_{j=0}^{n} f( x_{j} ) l_{j} (t) = {{ (t - x_{0}) \cdots (t - x_{n}) } \over { (n+1)! }} f^{(n+1)} ( \xi ) $$
■
Atkinson. (1989). An Introduction to Numerical Analysis (2nd Edition): p131. ↩︎