

Kriging in Spatial Data Analysis

Model

Ordinary Kriging

In Spatial Data Analysis, let $\mathbf{Y} = \left( Y \left( s_{1} \right) , \cdots , Y \left( s_{n} \right) \right)$ be a Random Field following a Multivariate Normal Distribution with Mean $\mu \in \mathbb{R}$ and Covariance Matrix $\Sigma \in \mathbb{R}^{n \times n}$. The value estimated for a new site $s_{0}$ under the model
$$ \mathbf{Y} = \mu \mathbf{1} + \varepsilon $$
is called the Ordinary Kriging Estimate, and the act of estimating under this model is also referred to as Kriging.


  • $\mathbf{1} = (1 , \cdots , 1)$ is the one vector, whose elements are all $1$.
  • $N_{n} \left( \mathbf{0} , \Sigma \right)$ denotes the $n$-variate Multivariate Normal Distribution.
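As a minimal illustrative sketch of this model, the following simulates one realization of $\mathbf{Y} = \mu \mathbf{1} + \varepsilon$ at a handful of sites, assuming an exponential covariance for $\Sigma$; the site coordinates and parameter values are hypothetical choices made only for the example.

```python
import numpy as np

# Hypothetical 1D site coordinates (made up for illustration)
sites = np.array([0.0, 1.0, 2.5, 4.0, 6.0])
n = len(sites)

# Assumed parameters: common mean, sill, and range of an exponential covariance
mu, sigma2, phi = 10.0, 2.0, 1.5

# Pairwise distances and covariance matrix Sigma
dist = np.abs(sites[:, None] - sites[None, :])
Sigma = sigma2 * np.exp(-dist / phi)

# Draw one realization of Y = mu * 1 + eps, with eps ~ N_n(0, Sigma)
rng = np.random.default_rng(0)
Y = rng.multivariate_normal(mean=mu * np.ones(n), cov=Sigma)
print(Y)
```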

Explanation

Etymology

Kriging is named after its pioneer Danie G. Krige, and the term has since become a verb in its own right. In statistics, notions such as prediction, forecasting, fitting, and estimation, as well as the idea of filling in ’empty spaces’ as in Interpolation, are all used to describe what it does, and the verb “to krige” sums up all of these concepts.

A distinguishing feature of Kriging, compared with interpolation techniques from applied mathematics, numerical algorithms, and machine learning, is its statistical approach of considering not just the mean (a point estimate) but also the variance. For example, the variance of the Kriging estimate is high in regions where the surrounding points have high variance, and low between points of low variance. This can be applied in choosing locations for data observation sites: in measuring concentrations of particulate matter, for instance, the interest may lie not in how to measure but in where to measure, and a natural answer is wherever the prediction variance is highest.

Dependence

The formula
$$ \mathbf{Y} = \mu \mathbf{1} + \varepsilon \qquad , \text{where } \varepsilon \sim N_{n} \left( \mathbf{0}, \Sigma \right) $$
reveals that, unlike in regression or time series analysis, our main interest in this model lies in $\varepsilon$. If $\Sigma$ is a Diagonal Matrix, so that there is no dependence between observations, then there is essentially no spatial structure to analyze and no need for Kriging. In practice, $\Sigma$ is determined through the Semivariogram Model as follows.
$$ \Sigma = \sigma^{2} H \left( \phi \right) + \tau^{2} I $$
Here, $\tau^{2}$ is the Nugget Effect variance (the variability that appears in real data even at arbitrarily small distances), and $I$ is the Identity Matrix.
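As a sketch of how this decomposition might be assembled, assuming an exponential correlation function for $H(\phi)$ and hypothetical site coordinates and parameter values:

```python
import numpy as np

# Hypothetical 2D site coordinates (made up for illustration)
sites = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0], [3.5, 1.0]])

# Assumed semivariogram parameters: partial sill, range, and nugget
sigma2, phi, tau2 = 1.5, 2.0, 0.3

# Pairwise Euclidean distances between sites
diff = sites[:, None, :] - sites[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Correlation matrix H(phi) under an exponential model, then Sigma
H = np.exp(-dist / phi)
Sigma = sigma2 * H + tau2 * np.eye(len(sites))
print(Sigma)
```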

Generalization

Kriging with a model generalized to include other independent variables is called Universal Kriging.
$$ \mathbf{Y} = \mathbf{X} \beta + \varepsilon $$
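For instance (an illustrative choice, not prescribed by the source), the design matrix $\mathbf{X}$ might encode a linear trend in the site coordinates $s_{i} = \left( x_{i}, y_{i} \right)$,
$$ \mathbf{X} = \begin{bmatrix} 1 & x_{1} & y_{1} \\ \vdots & \vdots & \vdots \\ 1 & x_{n} & y_{n} \end{bmatrix} , \qquad \beta = \left( \beta_{0}, \beta_{1}, \beta_{2} \right) $$
so that Ordinary Kriging is recovered when $\mathbf{X} = \mathbf{1}$ and $\beta = \mu$.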

Formula

Given a Random Field $\left\{ Y (s_{k}) \right\}_{k=1}^{n}$ assumed to be an Intrinsically Stationary Spatial Process and a new point $s_{0}$ to predict, define the Matrix $\Gamma \in \mathbb{R}^{n \times n}$ with respect to the Variogram $2 \gamma$ by $\left( \Gamma \right)_{ij} := \gamma \left( s_{i} - s_{j} \right)$, and the Vector $\gamma_{0} \in \mathbb{R}^{n}$ by $\left( \gamma_{0} \right)_{i} := \gamma \left( s_{0} - s_{i} \right)$. The Best Linear Unbiased Predictor (BLUP) of $Y \left( s_{0} \right)$ is the Inner Product of $l$ and $\mathbf{Y}$,
$$ l^{T} \mathbf{Y} = \begin{bmatrix} l_{1} & \cdots & l_{n} \end{bmatrix} \begin{bmatrix} Y \left( s_{1} \right) \\ Y \left( s_{2} \right) \\ \vdots \\ Y \left( s_{n} \right) \end{bmatrix} = \sum_{k=1}^{n} l_{k} Y \left( s_{k} \right) $$
where the vector $l$ is specifically given by
$$ l = \Gamma^{-1} \left( \gamma_{0} + {{ 1 - \mathbf{1}^T \Gamma^{-1} \gamma_{0} } \over { \mathbf{1}^T \Gamma^{-1} \mathbf{1} }} \mathbf{1} \right) $$
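The quantities above can be computed directly; the following is a minimal sketch, assuming an exponential semivariogram $\gamma(h) = \sigma^{2} \left( 1 - e^{- \| h \| / \phi} \right)$ and hypothetical sites, observations, and parameter values.

```python
import numpy as np

# Assumed exponential semivariogram (hypothetical parameter values)
sigma2, phi = 1.5, 2.0
def semivariogram(h):
    return sigma2 * (1.0 - np.exp(-h / phi))

# Hypothetical observed sites, observations, and prediction site s0
sites = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0], [3.5, 1.0]])
y = np.array([10.2, 9.8, 11.1, 10.5])
s0 = np.array([1.5, 1.0])

# Gamma_ij = gamma(s_i - s_j) and (gamma_0)_i = gamma(s_0 - s_i)
dist = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
Gamma = semivariogram(dist)
gamma0 = semivariogram(np.linalg.norm(s0 - sites, axis=-1))

# Closed-form ordinary kriging weights
ones = np.ones(len(sites))
Gi = np.linalg.inv(Gamma)
lam_term = (1.0 - ones @ Gi @ gamma0) / (ones @ Gi @ ones)
l = Gi @ (gamma0 + lam_term * ones)

print("weights sum to", l.sum())    # should be 1
print("kriging prediction", l @ y)  # l^T Y
```

The printed weights sum to $1$, matching the constraint $\sum_{k} l_{k} = 1$ that appears in the derivation below.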

Derivation 1

This section elaborates on the mathematical process of Kriging. Adding the assumption of a Gaussian Process to these formulas results in Ordinary Kriging.


Part 1. Optimization Problem

For some constants $l_{1} , \cdots , l_{n} , \delta_{0} \in \mathbb{R}$, we aim to predict the new value $Y \left( s_{0} \right)$ as a Linear Combination of the existing data,
$$ \hat{y} \left( s_{0} \right) = l_{1} y_{1} + \cdots + l_{n} y_{n} + \delta_{0} $$
which means finding the Optimal Solution $l_{1} , \cdots , l_{n} , \delta_{0}$ that Minimizes the Objective Function
$$ E \left[ Y \left( s_{0} \right) - \left( \sum_{k} l_{k} Y \left( s_{k} \right) + \delta_{0} \right) \right]^{2} $$

Definition of Intrinsic Stationarity: Consider a Spatial Process $\left\{ Y(s) \right\}_{s \in D}$ on a fixed subset $D \subset \mathbb{R}^{r}$ of Euclidean Space, comprising a set of Random Variables $Y(s) : \Omega \to \mathbb{R}^{1}$. Denote $n \in \mathbb{N}$ Sites by $\left\{ s_{1} , \cdots , s_{n} \right\} \subset D$, and assume that $Y(s)$ has a Variance for all $s \in D$. If $\left[ Y \left( s + \mathbf{h} \right) - Y(s) \right]$ has mean $0$ and its variance depends only on $\mathbf{h}$, then $\left\{ Y \left( s_{k} \right) \right\}$ is said to possess Intrinsic Stationarity.
$$ \begin{align*} E \left[ Y \left( s + \mathbf{h} \right) - Y(s) \right] =& 0 \\ \operatorname{Var} \left[ Y \left( s + \mathbf{h} \right) - Y(s) \right] =& 2 \gamma ( \mathbf{h} ) \end{align*} $$

By imposing the constraint $\sum_{k} l_{k} = 1$, if $\left\{ Y \left( s_{k} \right) \right\}_{k=1}^{n}$ is intrinsically stationary, we can ensure that
$$ \begin{align*} & E \left[ Y \left( s_{0} \right) - \left( \sum_{k} l_{k} Y \left( s_{k} \right) \right) \right] \\ =& E \left[ \sum_{k} l_{k} Y \left( s_{0} \right) - \left( \sum_{k} l_{k} Y \left( s_{k} \right) \right) \right] \\ =& \sum_{k} l_{k} E \left[ Y \left( s_{0} \right) - Y \left( s_{k} \right) \right] \\ =& 0 \end{align*} $$
so the predictor is unbiased. The Objective Function to be minimized then becomes
$$ E \left[ Y \left( s_{0} \right) - \sum_{k} l_{k} Y \left( s_{k} \right) \right]^{2} + \delta_{0}^{2} $$
where $\delta_{0}^{2}$ contributes nothing to the prediction. If the model is $\mathbf{Y} = \mu \mathbf{1} + \varepsilon$, then $\delta_{0}$ corresponds to $\mu$, and it is reasonable to set $\delta_{0} = 0$, leaving
$$ E \left[ Y \left( s_{0} \right) - \sum_{k} l_{k} Y \left( s_{k} \right) \right]^{2} $$
Setting $a_{0} = 1$ and $a_{k} = - l_{k}$,
$$ E \left[ Y \left( s_{0} \right) - \sum_{k=1}^{n} l_{k} Y \left( s_{k} \right) \right]^{2} = E \left[ \sum_{k=0}^{n} a_{k} Y \left( s_{k} \right) \right]^{2} $$
leads to the following optimization problem, which is solved with the Lagrange Multiplier Method.
$$ \begin{matrix} \text{Minimize} & \displaystyle E \left[ \sum_{k=0}^{n} a_{k} Y \left( s_{k} \right) \right]^{2} \\ \text{subject to} & \displaystyle \sum_{k=0}^{n} a_{k} = 0 \end{matrix} $$
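To spell out the step where $\delta_{0}^{2}$ separates (implicit in the argument above): writing $D = Y \left( s_{0} \right) - \sum_{k} l_{k} Y \left( s_{k} \right)$, the unbiasedness just established gives $E \left[ D \right] = 0$, so the cross term vanishes when the square is expanded.
$$ E \left[ D - \delta_{0} \right]^{2} = E \left[ D^{2} \right] - 2 \delta_{0} E \left[ D \right] + \delta_{0}^{2} = E \left[ D^{2} \right] + \delta_{0}^{2} $$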


Part 2. Semivariogram $\gamma$

Let’s express $E \left[ \sum_{k=0}^{n} a_{k} Y \left( s_{k} \right) \right]^{2}$ in terms of a semivariogram, which is assumed to be computable from the data.

Definition of Semivariogram: Consider a Spatial Process $\left\{ Y(s) \right\}_{s \in D}$ on a fixed subset $D \subset \mathbb{R}^{r}$ of Euclidean Space, comprising a set of Random Variables $Y(s) : \Omega \to \mathbb{R}^{1}$. Denote $n \in \mathbb{N}$ sites by $\left\{ s_{1} , \cdots , s_{n} \right\} \subset D$, and assume that $Y(s)$ has a Variance for all $s \in D$. The function $2 \gamma ( \mathbf{h} )$ defined below is called the Variogram.
$$ 2 \gamma ( \mathbf{h} ) := E \left[ Y \left( s + \mathbf{h} \right) - Y(s) \right]^{2} $$
Half of the Variogram, $\gamma ( \mathbf{h} )$, is known as the Semivariogram.

Let $\gamma \left( s_{i} - s_{j} \right)$ denote the semivariogram between two sites $s_{i}, s_{j}$. For any set $\left\{ a_{k} : k = 0, 1, \cdots , n \right\} \subset \mathbb{R}$ satisfying $\sum_{k=0}^{n} a_{k} = 0$, the following holds true.
$$ \sum_{i} \sum_{j} a_{i} a_{j} \gamma \left( s_{i} - s_{j} \right) = - E \left[ \sum_{i} a_{i} Y \left( s_{i} \right) \right]^{2} $$
This is evident from the following derivation.
$$ \begin{align*} & \sum_{i} \sum_{j} a_{i} a_{j} \gamma \left( s_{i} - s_{j} \right) \\ =& {{ 1 } \over { 2 }} \sum_{i} \sum_{j} a_{i} a_{j} \operatorname{Var} \left[ Y \left( s_{i} \right) - Y \left( s_{j} \right) \right] \\ =& {{ 1 } \over { 2 }} \sum_{i} \sum_{j} a_{i} a_{j} E \left[ Y \left( s_{i} \right) - Y \left( s_{j} \right) \right]^{2} \\ =& {{ 1 } \over { 2 }} \sum_{i} \sum_{j} a_{i} a_{j} E \left( \left[ Y \left( s_{i} \right) \right]^{2} - 2 Y \left( s_{i} \right) Y \left( s_{j} \right) + \left[ Y \left( s_{j} \right) \right]^{2} \right) \\ =& - \sum_{i} \sum_{j} a_{i} a_{j} E \left( Y \left( s_{i} \right) Y \left( s_{j} \right) \right) & \because \sum_{k} a_{k} = 0 \\ =& - E \sum_{i} \sum_{j} a_{i} a_{j} Y \left( s_{i} \right) Y \left( s_{j} \right) \\ =& - E \sum_{i} a_{i} Y \left( s_{i} \right) \sum_{j} a_{j} Y \left( s_{j} \right) \\ =& - E \left[ \sum_{i} a_{i} Y \left( s_{i} \right) \right]^{2} \end{align*} $$
(The terms involving $\left[ Y \left( s_{i} \right) \right]^{2}$ and $\left[ Y \left( s_{j} \right) \right]^{2}$ vanish because the coefficients sum to $0$.) Setting $\gamma_{ij} = \gamma \left( s_{i} - s_{j} \right)$ and $\gamma_{0j} = \gamma \left( s_{0} - s_{j} \right)$, and using $\gamma (0) = 0$, our objective function becomes
$$ \begin{align*} & E \left[ \sum_{i} a_{i} Y \left( s_{i} \right) \right]^{2} \\ =& - \sum_{i} \sum_{j} a_{i} a_{j} \gamma \left( s_{i} - s_{j} \right) \\ =& - \sum_{i} \sum_{j} l_{i} l_{j} \gamma_{ij} + 2 \sum_{i} l_{i} \gamma_{0i} \end{align*} $$
Applying the Lagrange Multiplier Method with the constraint $\sum_{i} l_{i} = 1$ gives the Lagrangian
$$ - \sum_{i} \sum_{j} l_{i} l_{j} \gamma_{ij} + 2 \sum_{i} l_{i} \gamma_{0i} - 2 \lambda \left( \sum_{i} l_{i} - 1 \right) $$
Differentiating with respect to $l_{i}$ and setting the result to $0$ yields
$$ - \sum_{j} l_{j} \gamma_{ij} + \gamma_{0i} - \lambda = 0 $$
which characterizes the minimizer of the objective function.
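As a quick numerical sanity check of the identity above, assume additionally a second-order stationary process with exponential covariance $C(h)$, so that $\gamma(h) = C(0) - C(h)$; the sites, parameter values, and covariance model below are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sites and an exponential covariance C(h) = sigma2 * exp(-|h|/phi)
sites = rng.uniform(0, 10, size=(6, 2))
sigma2, phi = 2.0, 3.0
dist = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
C = sigma2 * np.exp(-dist / phi)   # covariance matrix of Y
Gamma = sigma2 - C                 # gamma(h) = C(0) - C(h), so the diagonal is 0

# Random coefficients constrained to sum to zero
a = rng.normal(size=len(sites))
a -= a.mean()

# Left side: sum_ij a_i a_j gamma_ij; right side: -Var(sum_i a_i Y(s_i)) = -a' C a
lhs = a @ Gamma @ a
rhs = -(a @ C @ a)
print(np.isclose(lhs, rhs))        # True
```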


Part 3. Optimal Solution

Now, we derive the specific form of the optimal solution. Write $\gamma_{ij}$ for the entries of the matrix $\Gamma \in \mathbb{R}^{n \times n}$, meaning $\left( \Gamma \right)_{ij} := \gamma_{ij}$, and define the vector $\gamma_{0} \in \mathbb{R}^{n}$ by $\left( \gamma_{0} \right)_{i} := \gamma_{0i}$. If the vector of coefficients is $l := \left( l_{1} , \cdots , l_{n} \right) \in \mathbb{R}^{n}$, then the equation derived in Part 2 can be expressed in the following matrix/vector form.
$$ \begin{align*} & - \sum_{j} l_{j} \gamma_{ij} + \gamma_{0i} - \lambda = 0 \\ \implies & - \Gamma l + \gamma_{0} - \lambda \mathbf{1} = 0 \\ \implies & \Gamma l + \lambda \mathbf{1} = \gamma_{0} \end{align*} $$
Meanwhile, the constraint gives
$$ \sum_{i} l_{i} = 1 \iff \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} l_{1} \\ \vdots \\ l_{n} \end{bmatrix} = 1 \iff \mathbf{1}^{T} l = 1 $$
where $\mathbf{x}^{T}$ denotes the Transpose of $\mathbf{x}$. Considering $\lambda$ first,
$$ \begin{align*} 1 =& \mathbf{1}^T l \\ =& \mathbf{1}^T \Gamma^{-1} \Gamma l \\ =& \mathbf{1}^T \Gamma^{-1} \left( \gamma_{0} - \lambda \mathbf{1} \right) \\ =& \mathbf{1}^T \Gamma^{-1} \gamma_{0} - \lambda \mathbf{1}^T \Gamma^{-1} \mathbf{1} \end{align*} $$
which simplifies to
$$ - \lambda = {{ 1 - \mathbf{1}^T \Gamma^{-1} \gamma_{0} } \over { \mathbf{1}^T \Gamma^{-1} \mathbf{1} }} $$
At this point, $l$ is completely determined.
$$ \begin{align*} & \Gamma l + \lambda \mathbf{1} = \gamma_{0} \\ \implies & \Gamma l = \gamma_{0} - \lambda \mathbf{1} \\ \implies & \Gamma l = \gamma_{0} + {{ 1 - \mathbf{1}^T \Gamma^{-1} \gamma_{0} } \over { \mathbf{1}^T \Gamma^{-1} \mathbf{1} }} \mathbf{1} \\ \implies & l = \Gamma^{-1} \left( \gamma_{0} + {{ 1 - \mathbf{1}^T \Gamma^{-1} \gamma_{0} } \over { \mathbf{1}^T \Gamma^{-1} \mathbf{1} }} \mathbf{1} \right) \end{align*} $$
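For reference, the two conditions $\Gamma l + \lambda \mathbf{1} = \gamma_{0}$ and $\mathbf{1}^{T} l = 1$ can also be collected into a single $(n+1) \times (n+1)$ linear system; this is not spelled out in the source, but follows directly from the displays above, and in practice the weights are often obtained by solving this system rather than by explicitly inverting $\Gamma$.
$$ \begin{bmatrix} \Gamma & \mathbf{1} \\ \mathbf{1}^{T} & 0 \end{bmatrix} \begin{bmatrix} l \\ \lambda \end{bmatrix} = \begin{bmatrix} \gamma_{0} \\ 1 \end{bmatrix} $$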

This concludes the derivation of the optimal solution in Kriging: the estimate is obtained by minimizing the objective function under the given constraint, with the semivariogram playing the central role in the estimation.


  1. Banerjee. (2015). Hierarchical Modeling and Analysis for Spatial Data (2nd Edition): p. 25, 40–41.