Kriging in Spatial Data Analysis
Model
Ordinary Kriging
In Spatial Data Analysis, consider a Random Field $\mathbf{Y} = \left( Y \left( s_{1} \right) , \cdots , Y \left( s_{n} \right) \right)$ following a Multivariate Normal Distribution with mean vector $\mu \mathbf{1}$, $\mu \in \mathbb{R}$, and Covariance Matrix $\Sigma \in \mathbb{R}^{n \times n}$. The value estimated at a new site $s_{0}$ using the model $$ \mathbf{Y} = \mu \mathbf{1} + \varepsilon , \qquad \varepsilon \sim N_{n} \left( \mathbf{0} , \Sigma \right) $$ is called the Ordinary Kriging Estimate, and the act of estimating with this model is also referred to simply as Kriging.
- $\mathbf{1} = (1 , \cdots , 1)$ denotes the vector whose elements are all $1$.
- $N_{n} \left( \mathbf{0} , \Sigma \right)$ denotes the Multivariate Normal Distribution.
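To make the model concrete, here is a minimal simulation sketch of a single draw from $\mathbf{Y} = \mu \mathbf{1} + \varepsilon$ with $\varepsilon \sim N_{n} \left( \mathbf{0} , \Sigma \right)$; the three sites, the value of $\mu$, and the hand-picked covariance matrix are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: three sites with a hand-picked (valid) covariance matrix.
# In practice Sigma comes from a semivariogram model, as discussed below.
mu = 3.0
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])

# One draw from the ordinary kriging model Y = mu * 1 + eps, eps ~ N_3(0, Sigma).
Y = mu * np.ones(3) + rng.multivariate_normal(np.zeros(3), Sigma)
print(Y)
```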
Explanation
Etymology
Kriging is named after the pioneer Danie G. Krige, and the term has since become a general verb. In statistics, it overlaps with prediction, forecasting, fitting, and estimation, as well as with the idea of filling in "empty spaces" as in Interpolation; the verb "to krige" sums up all of these concepts.
A distinguishing feature of Kriging, compared with interpolation techniques from applied mathematics, computer algorithms, and machine learning, is its statistical treatment of not only the mean (the point estimate) but also the variance. For example, the variance of the Kriging estimate is high in regions surrounded by points with high variance and low between points with low variance. This can be applied when choosing locations for observation sites: in measuring concentrations of particulate matter, for instance, the interest may be not in how to measure but in where to measure, and a natural choice is wherever the variance of the Kriging estimate is highest.
Dependence
The formula $$ \mathbf{Y} = \mu \mathbf{1} + \varepsilon , \qquad \text{where } \varepsilon \sim N_{n} \left( \mathbf{0}, \Sigma \right) $$ shows that in this model, unlike in regression or time series analysis, our main interest is in $\varepsilon$. If $\Sigma$ is a Diagonal Matrix, implying no dependence between observations, there is essentially no spatial structure to analyze and no need for Kriging. In practice, $\Sigma$ is determined through a Semivariogram Model as follows. $$ \Sigma = \sigma^{2} H \left( \phi \right) + \tau^{2} I $$ Here, $\sigma^{2}$ is the partial sill, $H \left( \phi \right)$ is the correlation matrix implied by the semivariogram model with decay parameter $\phi$, $\tau^{2}$ is the "Nugget Effect Variance" (the variance that appears in real data even at arbitrarily small distances), and $I$ is the Identity Matrix.
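As a sketch of how $\Sigma$ might be assembled in practice, the snippet below assumes an exponential semivariogram model, so that $H(\phi)$ has entries $\exp \left( - \| s_{i} - s_{j} \| / \phi \right)$; the sites and parameter values are hypothetical, and other semivariogram families would only change how $H(\phi)$ is computed.

```python
import numpy as np

def exponential_Sigma(sites, sigma2, phi, tau2):
    """Sigma = sigma^2 * H(phi) + tau^2 * I for an assumed exponential model,
    where H(phi) has entries exp(-||s_i - s_j|| / phi)."""
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    H = np.exp(-d / phi)                      # correlation decaying with distance
    return sigma2 * H + tau2 * np.eye(len(sites))

# Hypothetical parameters: partial sill sigma^2, range phi, nugget tau^2.
rng = np.random.default_rng(1)
sites = rng.uniform(0, 1, size=(30, 2))
Sigma = exponential_Sigma(sites, sigma2=1.0, phi=0.5, tau2=0.1)
```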
Generalization
Kriging with a model generalized to include other independent variables is called Universal Kriging. $$ \mathbf{Y} = \mathbf{X} \beta + \varepsilon $$
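As an illustrative sketch (with hypothetical covariates, coefficients, and covariance), the universal kriging model can be simulated in the same way once a design matrix $\mathbf{X}$ is chosen; here $\mathbf{X}$ consists of an intercept and the site coordinates.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical design matrix: an intercept plus the site coordinates as covariates.
n = 20
sites = rng.uniform(0, 1, size=(n, 2))
X = np.column_stack([np.ones(n), sites])           # n x 3 design matrix
beta = np.array([3.0, 1.5, -0.5])                  # assumed trend coefficients

# Spatially correlated errors, exactly as in the ordinary kriging model.
d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
Sigma = np.exp(-d / 0.5)
Y = X @ beta + rng.multivariate_normal(np.zeros(n), Sigma)
```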
Formula
Given a Random Field $\left\{ Y (s_{k}) \right\}_{k=1}^{n}$ assumed to be an Intrinsically Stationary Spatial Process, and a new point to predict $s_{0}$, define the Matrix $\Gamma \in \mathbb{R}^{n \times n}$ with respect to the Variogram $2 \gamma$ as $\left( \Gamma \right)_{ij} := \gamma \left( s_{i} - s_{j} \right)$, and the Vector $\gamma_{0} \in \mathbb{R}^{n}$ as $\left( \gamma_{0} \right)_{i} := \gamma \left( s_{0} - s_{i} \right)$. The Best Linear Unbiased Predictor (BLUP) of $Y \left( s_{0} \right)$ is the Inner Product of $l$ and $\mathbf{Y}$, given by $$ l^{T} \mathbf{Y} = \begin{bmatrix} l_{1} & \cdots & l_{n} \end{bmatrix} \begin{bmatrix} Y \left( s_{1} \right)\\ Y \left( s_{2} \right)\\ \vdots\\ Y \left( s_{n} \right) \end{bmatrix} = \sum_{k=1}^{n} l_{k} Y \left( s_{k} \right) $$ where the vector $l$ is specifically determined as follows. $$ l = \Gamma^{-1} \left( \gamma_{0} + {{ 1 - \mathbf{1}^T \Gamma^{-1} \gamma_{0} } \over { \mathbf{1}^T \Gamma^{-1} \mathbf{1} }} \mathbf{1} \right) $$
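Below is a minimal sketch of this weight formula, assuming a hypothetical exponential semivariogram $\gamma(h) = 1 - \exp(-h/0.5)$ and randomly scattered sites; the helper name `kriging_weights` is illustrative, not a library function.

```python
import numpy as np

def kriging_weights(Gamma, gamma0):
    """Closed-form BLUP weights
    l = Gamma^{-1} (gamma0 + (1 - 1' Gamma^{-1} gamma0) / (1' Gamma^{-1} 1) * 1)."""
    ones = np.ones(len(gamma0))
    Ginv = np.linalg.inv(Gamma)
    shift = (1.0 - ones @ Ginv @ gamma0) / (ones @ Ginv @ ones)
    return Ginv @ (gamma0 + shift * ones)

# Hypothetical semivariogram gamma(h) = 1 - exp(-h / 0.5) and random sites.
rng = np.random.default_rng(2)
sites = rng.uniform(0, 1, size=(10, 2))
s0 = np.array([0.5, 0.5])
gamma = lambda h: 1.0 - np.exp(-h / 0.5)

Gamma = gamma(np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1))
gamma0 = gamma(np.linalg.norm(sites - s0, axis=1))

l = kriging_weights(Gamma, gamma0)
print(l.sum())    # 1.0 up to rounding: the weights satisfy the unbiasedness constraint
# Given observed values Y at the sites, the prediction at s0 is l @ Y.
```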
Derivation 1
This section elaborates on the mathematical process of Kriging. Adding the assumption of a Gaussian Process to these formulas results in Ordinary Kriging.
Part 1. Optimization Problem
For some constants $l_{1} , \cdots , l_{n} , \delta_{0} \in \mathbb{R}$, we aim to predict a new $Y \left( s_{0} \right)$ as a Linear Combination of existing data $$ \hat{y} \left( s_{0} \right) = l_{1} y_{1} + \cdots + l_{n} y_{n} + \delta_{0} $$ which means finding the Optimal Solution $l_{1} , \cdots , l_{n} , \delta_{0}$ that Minimizes the Objective Function $$ E \left[ Y \left( s_{0} \right) - \left( \sum_{k} l_{k} Y \left( s_{k} \right) + \delta_{0} \right) \right]^{2} $$
Definition of Intrinsic Stationarity: Consider a Spatial Process $\left\{ Y(s) \right\}_{s \in D}$ in a fixed subset $D \subset \mathbb{R}^{r}$ of the Euclidean Space, comprising a set of Random Variables $Y(s) : \Omega \to \mathbb{R}^{1}$. Specifically, let's denote $n \in \mathbb{N}$ Sites as $\left\{ s_{1} , \cdots , s_{n} \right\} \subset D$, assuming that $Y(s)$ has a Variance for all $s \in D$. If the mean of $Y \left( s + \mathbf{h} \right) - Y(s)$ is $0$ and its variance depends only on $\mathbf{h}$, then $\left\{ Y \left( s_{k} \right) \right\}$ is said to possess Intrinsic Stationarity. $$ \begin{align*} E \left[ Y \left( s + \mathbf{h} \right) - Y(s) \right] =& 0 \\ \operatorname{Var} \left[ Y \left( s + \mathbf{h} \right) - Y(s) \right] =& 2 \gamma ( \mathbf{h} ) \end{align*} $$
By imposing the constraint $\sum_{k} l_{k} = 1$, if $\left\{ Y \left( s_{k} \right) \right\}_{k=1}^{n}$ is intrinsically stationary, we can guarantee unbiasedness: $$ \begin{align*} & E \left[ Y \left( s_{0} \right) - \left( \sum_{k} l_{k} Y \left( s_{k} \right) \right) \right] \\ =& E \left[ \sum_{k} l_{k} Y \left( s_{0} \right) - \left( \sum_{k} l_{k} Y \left( s_{k} \right) \right) \right] \\ =& \sum_{k} l_{k} E \left[ Y \left( s_{0} \right) - Y \left( s_{k} \right) \right] \\ =& 0 \end{align*} $$ The Objective Function to be minimized then becomes $$ E \left[ Y \left( s_{0} \right) - \sum_{k} l_{k} Y \left( s_{k} \right) \right]^{2} + \delta_{0}^{2} $$ where $\delta_{0}^{2}$ contributes nothing to the prediction. If the model is $\mathbf{Y} = \mu \mathbf{1} + \varepsilon$, then $\delta_{0}$ corresponds to $\mu$, which the constraint $\sum_{k} l_{k} = 1$ already accounts for, so it is reasonable to set $\delta_{0} = 0$ and minimize $$ E \left[ Y \left( s_{0} \right) - \sum_{k} l_{k} Y \left( s_{k} \right) \right]^{2} $$ Setting $a_{0} = 1$ and $a_{k} = - l_{k}$ gives $$ E \left[ Y \left( s_{0} \right) - \sum_{k=1}^{n} l_{k} Y \left( s_{k} \right) \right]^{2} = E \left[ \sum_{k=0}^{n} a_{k} Y \left( s_{k} \right) \right]^{2} $$ which leads to the following optimization problem, solved below using the Lagrange Multiplier Method. $$ \begin{matrix} \text{Minimize} & \displaystyle E \left[ \sum_{k=0}^{n} a_{k} Y \left( s_{k} \right) \right]^{2} \\ \text{subject to} & \displaystyle \sum_{k=0}^{n} a_{k} = 0 \end{matrix} $$
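As a quick Monte Carlo check of this unbiasedness argument (under an assumed stationary Gaussian field with exponential covariance and hypothetical sites and weights), the average of $Y \left( s_{0} \right) - \sum_{k} l_{k} Y \left( s_{k} \right)$ over many simulated realizations should be near $0$ whenever $\sum_{k} l_{k} = 1$.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical stationary Gaussian field: exponential covariance, constant mean mu.
n, reps = 5, 20000
sites = rng.uniform(0, 1, size=(n + 1, 2))     # row 0 plays the role of s_0
d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
C = np.exp(-d / 0.5)
mu = 3.0

l = rng.dirichlet(np.ones(n))                  # arbitrary weights with sum(l) = 1
Y = mu + rng.multivariate_normal(np.zeros(n + 1), C, size=reps)

errors = Y[:, 0] - Y[:, 1:] @ l                # Y(s_0) - sum_k l_k Y(s_k) per replicate
print(errors.mean())                           # close to 0: the predictor is unbiased
```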
Part 2. Semivariogram $\gamma$
Let’s express $E \left[ \sum_{k=0}^{n} a_{k} Y \left( s_{k} \right) \right]^{2}$ in terms of a semivariogram assumed to be computable from the data.
Definition of Semivariogram: Consider a Spatial Process $\left\{ Y(s) \right\}_{s \in D}$ in a fixed subset $D \subset \mathbb{R}^{r}$ of the Euclidean Space, comprising a set of Random Variables $Y(s) : \Omega \to \mathbb{R}^{1}$. Specifically, let's denote $n \in \mathbb{N}$ sites as $\left\{ s_{1} , \cdots , s_{n} \right\} \subset D$, assuming that $Y(s)$ has a Variance for all $s \in D$. The function $2 \gamma ( \mathbf{h} )$ defined below is called the Variogram. $$ 2 \gamma ( \mathbf{h} ) := E \left[ Y \left( s + \mathbf{h} \right) - Y(s) \right]^{2} $$ Half of the Variogram, $\gamma ( \mathbf{h} )$, is known as the Semivariogram.
Let $\gamma \left( s_{i} - s_{j} \right)$ denote the semivariogram between two sites $s_{i}, s_{j}$. For any set $\left\{ a_{k} : k = 1 , \cdots , n \right\} \subset \mathbb{R}$ satisfying $\sum_{k=1}^{n} a_{k} = 0$, the following holds true. $$ \sum_{i} \sum_{j} a_{i} a_{j} \gamma \left( s_{i} - s_{j} \right) = - E \left[ \sum_{i} a_{i} Y \left( s_{i} \right) \right]^{2} $$ This is evident from the following derivation: $$ \begin{align*} & \sum_{i} \sum_{j} a_{i} a_{j} \gamma \left( s_{i} - s_{j} \right) \\ =& {{ 1 } \over { 2 }} \sum_{i} \sum_{j} a_{i} a_{j} \operatorname{Var} \left[ Y \left( s_{i} \right) - Y \left( s_{j} \right) \right] \\ =& {{ 1 } \over { 2 }} \sum_{i} \sum_{j} a_{i} a_{j} E \left[ Y \left( s_{i} \right) - Y \left( s_{j} \right) \right]^{2} & \because E \left[ Y \left( s_{i} \right) - Y \left( s_{j} \right) \right] = 0 \\ =& {{ 1 } \over { 2 }} \sum_{i} \sum_{j} a_{i} a_{j} E \left( \left[ Y \left( s_{i} \right) \right]^{2} - 2 Y \left( s_{i} \right) Y \left( s_{j} \right) + \left[ Y \left( s_{j} \right) \right]^{2} \right) \\ =& - \sum_{i} \sum_{j} a_{i} a_{j} E \left( Y \left( s_{i} \right) Y \left( s_{j} \right) \right) & \because \sum_{k} a_{k} = 0 \text{ eliminates the squared terms} \\ =& - E \sum_{i} \sum_{j} a_{i} a_{j} Y \left( s_{i} \right) Y \left( s_{j} \right) \\ =& - E \sum_{i} a_{i} Y \left( s_{i} \right) \sum_{j} a_{j} Y \left( s_{j} \right) \\ =& - E \left[ \sum_{i} a_{i} Y \left( s_{i} \right) \right]^{2} \end{align*} $$ Setting $\gamma_{ij} = \gamma \left( s_{i} - s_{j} \right)$ and $\gamma_{0j} = \gamma \left( s_{0} - s_{j} \right)$, and expanding with $a_{0} = 1$, $a_{k} = - l_{k}$, our objective function becomes $$ \begin{align*} & E \left[ \sum_{i} a_{i} Y \left( s_{i} \right) \right]^{2} \\ =& - \sum_{i} \sum_{j} a_{i} a_{j} \gamma \left( s_{i} - s_{j} \right) \\ =& - \sum_{i} \sum_{j} l_{i} l_{j} \gamma_{ij} + 2 \sum_{i} l_{i} \gamma_{0i} \end{align*} $$ Applying the Lagrange Multiplier Method with the constraint $\sum_{i} l_{i} = 1$ gives the Lagrangian $$ \ - \sum_{i} \sum_{j} l_{i} l_{j} \gamma_{ij} + 2 \sum_{i} l_{i} \gamma_{0i} - \lambda \left( \sum_{i} l_{i} - 1 \right) $$ Differentiating with respect to $l_{i}$ and absorbing the factor of $2$ into $\lambda$ yields the first-order condition for the constrained minimum: $$ \ - \sum_{j} l_{j} \gamma_{ij} + \gamma_{0i} - \lambda = 0 $$
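The identity above can be checked numerically: for a stationary field with covariance $C(\mathbf{h})$, the semivariogram is $\gamma(\mathbf{h}) = C(\mathbf{0}) - C(\mathbf{h})$ and $E \left[ \sum_{i} a_{i} Y \left( s_{i} \right) \right]^{2} = a^{T} C a$ whenever $\sum_{i} a_{i} = 0$, so both sides can be computed exactly. The sketch below uses a hypothetical exponential covariance.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stationary setup: covariance C(h) = exp(-|h|/phi), so the
# semivariogram is gamma(h) = C(0) - C(h) = 1 - exp(-|h|/phi).
phi = 0.5
sites = rng.uniform(0, 1, size=(6, 2))
d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
C = np.exp(-d / phi)
Gamma = 1.0 - C

a = rng.normal(size=6)
a -= a.mean()                      # coefficients summing to zero

lhs = a @ Gamma @ a                # sum_i sum_j a_i a_j gamma(s_i - s_j)
rhs = -(a @ C @ a)                 # -E[(sum_i a_i Y(s_i))^2], since sum(a) = 0
print(np.isclose(lhs, rhs))        # True
```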
Part 3. Optimal Solution
Now, we derive the specific form of the optimal solution. Denote the element of matrix $\Gamma \in \mathbb{R}^{n \times n}$ as $\gamma_{ij}$, meaning $\left( \Gamma \right)_{ij} := \gamma_{ij}$, and define vector $\gamma_{0} \in \mathbb{R}^{n}$ as $\left( \gamma_{0} \right)_{i} := \gamma_{0i}$. If we set the vector of coefficients as $l := \left( l_{1} , \cdots , l_{n} \right) \in \mathbb{R}^{n}$, then from Part 2, the derived equation can be expressed in the following matrix/vector form: $$ \begin{align*} & - \sum_{j} l_{j} \gamma_{ij} + \gamma_{0i} - \lambda = 0 \\ \implies & - \Gamma l + \gamma_{0} - \lambda \mathbf{1} = 0 \\ \implies & \Gamma l + \lambda \mathbf{1} = \gamma_{0} \end{align*} $$ Meanwhile, the constraint can be written as $$ \sum_{i} l_{i} = 1 \iff \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} l_{1} \\ \vdots \\ l_{n} \end{bmatrix} = 1 \iff \mathbf{1}^{T} l = 1 $$ where $\mathbf{x}^{T}$ denotes the Transpose of $\mathbf{x}$. First, solving for $\lambda$, we have $$ \begin{align*} 1 =& \mathbf{1}^T l \\ =& \mathbf{1}^T \Gamma^{-1} \Gamma l \\ =& \mathbf{1}^T \Gamma^{-1} \left( \gamma_{0} - \lambda \mathbf{1} \right) \\ =& \mathbf{1}^T \Gamma^{-1} \gamma_{0} - \lambda \mathbf{1}^T \Gamma^{-1} \mathbf{1} \end{align*} $$ which simplifies to $$ \ - \lambda = {{ 1 - \mathbf{1}^T \Gamma^{-1} \gamma_{0} } \over { \mathbf{1}^T \Gamma^{-1} \mathbf{1} }} $$ With $\lambda$ in hand, $l$ is fully determined. $$ \begin{align*} & \Gamma l + \lambda \mathbf{1} = \gamma_{0} \\ \implies & \Gamma l = \gamma_{0} - \lambda \mathbf{1} \\ \implies & \Gamma l = \gamma_{0} + {{ 1 - \mathbf{1}^T \Gamma^{-1} \gamma_{0} } \over { \mathbf{1}^T \Gamma^{-1} \mathbf{1} }} \mathbf{1} \\ \implies & l = \Gamma^{-1} \left( \gamma_{0} + {{ 1 - \mathbf{1}^T \Gamma^{-1} \gamma_{0} } \over { \mathbf{1}^T \Gamma^{-1} \mathbf{1} }} \mathbf{1} \right) \end{align*} $$
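As a sanity check on this derivation (reusing the hypothetical exponential semivariogram from the earlier sketch), the snippet below computes $-\lambda$ and $l$ from the closed forms and confirms that they satisfy both $\Gamma l + \lambda \mathbf{1} = \gamma_{0}$ and $\mathbf{1}^{T} l = 1$.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical exponential semivariogram, as in the earlier sketch.
gamma = lambda h: 1.0 - np.exp(-h / 0.5)
sites = rng.uniform(0, 1, size=(8, 2))
s0 = np.array([0.3, 0.7])
Gamma = gamma(np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1))
gamma0 = gamma(np.linalg.norm(sites - s0, axis=1))

ones = np.ones(len(sites))
Ginv = np.linalg.inv(Gamma)
neg_lambda = (1.0 - ones @ Ginv @ gamma0) / (ones @ Ginv @ ones)   # equals -lambda
l = Ginv @ (gamma0 + neg_lambda * ones)

# The closed form satisfies the stationarity condition and the constraint.
print(np.allclose(Gamma @ l - neg_lambda * ones, gamma0))   # Gamma l + lambda 1 = gamma_0
print(np.allclose(ones @ l, 1.0))                           # 1' l = 1
```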
This completes the derivation of the Kriging weights: the estimate is obtained by minimizing the objective function under the unbiasedness constraint, with the semivariogram playing the central role.
■
Banerjee. (2015). Hierarchical Modeling and Analysis for Spatial Data (2nd Edition): p. 25, 40–41.