Kriging is named after the pioneering mining engineer Danie G. Krige, and the term has since come into use as a verb. In statistics, terms like prediction, forecasting, fitting, and estimation, along with the idea of filling in 'empty spaces' as in Interpolation, are all used to describe it; the verb "to krige" sums up all of these concepts.
A distinguishing feature of Kriging, compared with techniques from applied mathematics, computer algorithms, and machine learning, is its statistical approach: it considers not just the mean (the point estimate) but also the variance. For example, the variance of the Kriging estimate is high in regions where the individual points have high variance, and low between points of low variance. This can guide the choice of locations for observation sites; in measuring concentrations of particulate matter, for instance, the interest may lie not in how to measure but in where to measure, focusing on the locations where the variance of the measurements is highest.
Dependence
The formula $Y = \mu \mathbf{1} + \varepsilon$, where $\varepsilon \sim N_n (\mathbf{0}, \Sigma)$, reveals that in this model our main interest lies in $\varepsilon$, unlike in regression or time series analysis. If $\Sigma$ is a Diagonal Matrix, i.e., there is no dependence between observations, then there is essentially no spatial structure to analyze and no need for Kriging. In practice, $\Sigma$ is determined through the Semivariogram Model as follows.
$$\Sigma = \sigma^2 H(\phi) + \tau^2 I$$
Here, $\tau^2$ represents the "Nugget Effect Variance" (the variance that appears in real data regardless of how small the distance is), and $I$ is the Identity Matrix.
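For concreteness, here is a minimal sketch of how $\Sigma$ might be assembled in Python. The exponential correlation $H(\phi)_{ij} = \exp(-\| s_i - s_j \| / \phi)$ and all parameter values are illustrative assumptions, not the only valid choice.

```python
import numpy as np

def covariance_matrix(sites, sigma2, phi, tau2):
    """Sigma = sigma^2 * H(phi) + tau^2 * I, with an (assumed) exponential
    correlation H(phi)_ij = exp(-||s_i - s_j|| / phi)."""
    # pairwise Euclidean distances between all sites
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi) + tau2 * np.eye(len(sites))

# hypothetical 2D sites and parameter values
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
Sigma = covariance_matrix(sites, sigma2=1.0, phi=1.5, tau2=0.1)
```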
Generalization
Generalizing the model so that it incorporates other independent variables is called Universal Kriging.
$$Y = X \beta + \varepsilon$$
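As a sketch, the design matrix $X$ might consist of an intercept plus the coordinates themselves as linear trend covariates (a hypothetical choice; any explanatory variables may be used):

```python
import numpy as np

sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
# intercept column plus the coordinates as (assumed) trend covariates
X = np.column_stack([np.ones(len(sites)), sites])  # shape (n, 3)
```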
Formula
Given a Random Field $\{ Y(s_k) \}_{k=1}^{n}$ assumed to be an Intrinsic Stationary Spatial Process and a new point $s_0$ to predict, define the Matrix $\Gamma \in \mathbb{R}^{n \times n}$ with respect to the Variogram $2\gamma$ as $(\Gamma)_{ij} := \gamma (s_i - s_j)$, and the Vector $\gamma_0 \in \mathbb{R}^{n}$ as $(\gamma_0)_i := \gamma (s_0 - s_i)$. The Best Linear Unbiased Predictor (BLUP) of $Y(s_0)$ is the Inner Product of $l$ and $Y$, given by
$$l^T Y = \begin{bmatrix} l_1 & \cdots & l_n \end{bmatrix} \begin{bmatrix} Y(s_1) \\ Y(s_2) \\ \vdots \\ Y(s_n) \end{bmatrix} = \sum_{k=1}^{n} l_k Y(s_k)$$
where the vector $l$ is specifically determined as follows.
$$l = \Gamma^{-1} \left( \gamma_0 + \frac{1 - \mathbf{1}^T \Gamma^{-1} \gamma_0}{\mathbf{1}^T \Gamma^{-1} \mathbf{1}} \mathbf{1} \right)$$
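This closed form is a few lines of linear algebra in code. Below is a minimal sketch, assuming $\Gamma$ and $\gamma_0$ have already been computed from a semivariogram; `np.linalg.solve` is used instead of an explicit inverse purely for numerical stability.

```python
import numpy as np

def kriging_weights(Gamma, gamma0):
    """l = Gamma^{-1} (gamma0 + (1 - 1^T Gamma^{-1} gamma0) / (1^T Gamma^{-1} 1) * 1)."""
    ones = np.ones(len(gamma0))
    Gi_g0 = np.linalg.solve(Gamma, gamma0)  # Gamma^{-1} gamma0
    Gi_1 = np.linalg.solve(Gamma, ones)     # Gamma^{-1} 1
    lam = (1.0 - ones @ Gi_g0) / (ones @ Gi_1)
    return Gi_g0 + lam * Gi_1               # = Gamma^{-1} (gamma0 + lam * 1)

# the prediction at s0 is then the inner product:
# y_hat = kriging_weights(Gamma, gamma0) @ y
```

By construction the returned weights satisfy $\mathbf{1}^T l = 1$, which is exactly the unbiasedness constraint derived below.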
This section elaborates on the mathematical process of Kriging. Adding the assumption of a Gaussian Process to these formulas results in Ordinary Kriging.
Part 1. Optimization Problem
For some constants $l_1, \cdots, l_n, \delta_0 \in \mathbb{R}$, we aim to predict a new $Y(s_0)$ as a Linear Combination of the existing data
$$\hat{y} (s_0) = l_1 y_1 + \cdots + l_n y_n + \delta_0$$
which means finding the Optimal Solution $l_1, \cdots, l_n, \delta_0$ that Minimizes the Objective Function
$$E \left[ Y(s_0) - \left( \sum_{k} l_k Y(s_k) + \delta_0 \right) \right]^2$$
Definition of Intrinsic Stationarity: Consider a Spatial Process $\{ Y(s) \}_{s \in D}$ in a fixed subset $D \subset \mathbb{R}^r$ of Euclidean Space, comprising a set of Random Variables $Y(s) : \Omega \to \mathbb{R}^1$. Specifically, denote $n \in \mathbb{N}$ Sites as $\{ s_1, \cdots, s_n \} \subset D$, and assume that $Y(s)$ has a Variance for all $s \in D$. If the mean of $[Y(s + h) - Y(s)]$ is $0$ and its variance depends only on $h$, then $\{ Y(s_k) \}$ is said to possess Intrinsic Stationarity.
$$\begin{aligned} E \left[ Y(s + h) - Y(s) \right] &= 0 \\ \operatorname{Var} \left[ Y(s + h) - Y(s) \right] &= 2 \gamma (h) \end{aligned}$$
By imposing the constraint $\sum_{k} l_k = 1$, if $\{ Y(s_k) \}_{k=1}^{n}$ is intrinsically stationary, we can guarantee that
$$\begin{aligned} E \left[ Y(s_0) - \left( \sum_{k} l_k Y(s_k) \right) \right] &= E \left[ \sum_{k} l_k Y(s_0) - \left( \sum_{k} l_k Y(s_k) \right) \right] \\ &= \sum_{k} l_k E \left[ Y(s_0) - Y(s_k) \right] \\ &= 0 \end{aligned}$$
holds, i.e., the predictor is unbiased. The Objective Function to be minimized then becomes
$$E \left[ Y(s_0) - \sum_{k} l_k Y(s_k) \right]^2 + \delta_0^2$$
where the term $\delta_0^2$ does not involve the $l_k$ and is irrelevant to the prediction. If the model is $Y = \mu \mathbf{1} + \varepsilon$, then $\delta_0$ corresponds to $\mu$, and it is reasonable to set $\delta_0 = 0$, reducing the objective to
$$E \left[ Y(s_0) - \sum_{k} l_k Y(s_k) \right]^2$$
Setting $a_0 = 1$ and $a_k = -l_k$,
$$E \left[ Y(s_0) - \sum_{k=1}^{n} l_k Y(s_k) \right]^2 = E \left[ \sum_{k=0}^{n} a_k Y(s_k) \right]^2$$
leads to solving the following optimization problem using the Lagrange Multiplier Method.
$$\begin{aligned} \text{Minimize} \quad & E \left[ \sum_{k=0}^{n} a_k Y(s_k) \right]^2 \\ \text{subject to} \quad & \sum_{k=0}^{n} a_k = 0 \end{aligned}$$
Part 2. Semivariogram γ
Let's express $E \left[ \sum_{k=0}^{n} a_k Y(s_k) \right]^2$ in terms of a semivariogram, which is assumed to be computable from the data.
Definition of Semivariogram: Consider a Spatial Process $\{ Y(s) \}_{s \in D}$ in a fixed subset $D \subset \mathbb{R}^r$ of Euclidean Space, comprising a set of Random Variables $Y(s) : \Omega \to \mathbb{R}^1$. Specifically, denote $n \in \mathbb{N}$ sites as $\{ s_1, \cdots, s_n \} \subset D$, and assume that $Y(s)$ has a Variance for all $s \in D$. The function $2 \gamma (h)$ defined as follows is called the Variogram.
$$2 \gamma (h) := E \left[ Y(s + h) - Y(s) \right]^2$$
Specifically, half of the Variogram, $\gamma (h)$, is known as the Semivariogram.
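Before any of this can be used, $\gamma$ must be estimated from data. Below is a minimal sketch of the classical method-of-moments estimator $\hat{\gamma}(h) = \frac{1}{2 |N(h)|} \sum_{(i,j) \in N(h)} (y_i - y_j)^2$, with site pairs binned by distance; the bin edges and the choice of estimator are illustrative assumptions.

```python
import numpy as np

def empirical_semivariogram(sites, y, bin_edges):
    """Average (y_i - y_j)^2 / 2 over site pairs whose distance falls in each bin."""
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    half_sq = 0.5 * (y[:, None] - y[None, :]) ** 2
    iu = np.triu_indices(len(y), k=1)          # consider each pair only once
    which = np.digitize(d[iu], bin_edges)      # bin index of each pair distance
    return np.array([half_sq[iu][which == b].mean() if np.any(which == b) else np.nan
                     for b in range(1, len(bin_edges))])
```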
Writing $\gamma (s_i - s_j)$ for the semivariogram between two sites $s_i, s_j$: for any set $\{ a_k : k = 0, 1, \cdots, n \} \subset \mathbb{R}$ satisfying $\sum_{k=0}^{n} a_k = 0$, the following holds true.
$$\sum_{i} \sum_{j} a_i a_j \gamma (s_i - s_j) = - E \left[ \sum_{i} a_i Y(s_i) \right]^2$$
This is evident from the following derivation:
$$\begin{aligned} \sum_{i} \sum_{j} a_i a_j \gamma (s_i - s_j) &= \frac{1}{2} \sum_{i} \sum_{j} a_i a_j \operatorname{Var} \left[ Y(s_i) - Y(s_j) \right] \\ &= \frac{1}{2} \sum_{i} \sum_{j} a_i a_j E \left[ Y(s_i) - Y(s_j) \right]^2 \\ &= \frac{1}{2} \sum_{i} \sum_{j} a_i a_j E \left( \left[ Y(s_i) \right]^2 - 2 Y(s_i) Y(s_j) + \left[ Y(s_j) \right]^2 \right) \\ &= - \sum_{i} \sum_{j} a_i a_j E \left( Y(s_i) Y(s_j) \right) \qquad \because \text{the squared terms vanish since } \sum_{k} a_k = 0 \\ &= - E \sum_{i} \sum_{j} a_i a_j Y(s_i) Y(s_j) \\ &= - E \sum_{i} a_i Y(s_i) \sum_{j} a_j Y(s_j) \\ &= - E \left[ \sum_{i} a_i Y(s_i) \right]^2 \end{aligned}$$
Note that the cases $i = j$ cause no trouble, since $\gamma (\mathbf{0}) = 0$ on both sides.
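This identity can also be checked numerically. The sketch below simulates a standard Brownian motion, an intrinsically stationary process with $2 \gamma (h) = |h|$ and $\operatorname{Cov}(Y(s), Y(t)) = \min(s, t)$, at a few hypothetical sites and compares both sides by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
s = np.array([0.3, 0.7, 1.1, 2.0])       # sites on the half-line
a = np.array([1.0, -0.2, -0.3, -0.5])    # coefficients summing to zero
gamma = lambda h: 0.5 * np.abs(h)        # semivariogram of Brownian motion

# simulate Y at the sites: Cov(Y(s_i), Y(s_j)) = min(s_i, s_j)
Y = rng.multivariate_normal(np.zeros(len(s)), np.minimum.outer(s, s), size=200_000)

lhs = sum(a[i] * a[j] * gamma(s[i] - s[j]) for i in range(len(s)) for j in range(len(s)))
rhs = -np.mean((Y @ a) ** 2)
print(lhs, rhs)  # the two values should agree up to Monte Carlo error
```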
Setting $\gamma_{ij} = \gamma (s_i - s_j)$ and $\gamma_{0j} = \gamma (s_0 - s_j)$, and recalling that $a_0 = 1$, $a_k = -l_k$, and $\gamma (\mathbf{0}) = 0$, our objective function becomes
$$\begin{aligned} E \left[ \sum_{i} a_i Y(s_i) \right]^2 &= - \sum_{i} \sum_{j} a_i a_j \gamma (s_i - s_j) \\ &= - \sum_{i} \sum_{j} l_i l_j \gamma_{ij} + 2 \sum_{i} l_i \gamma_{0i} \end{aligned}$$
Applying the Lagrange Multiplier Method with the constraint $\sum_{i} l_i = 1$ leads to the Lagrangian
$$- \sum_{i} \sum_{j} l_i l_j \gamma_{ij} + 2 \sum_{i} l_i \gamma_{0i} - \lambda \sum_{i} l_i$$
Hence, differentiating with respect to $l_i$ and setting the derivative to zero (absorbing a factor of $2$ into $\lambda$) yields
$$- \sum_{j} l_j \gamma_{ij} + \gamma_{0i} - \lambda = 0$$
which minimizes the objective function.
Part 3. Optimal Solution
Now we derive the specific form of the optimal solution. Denote the elements of the matrix $\Gamma \in \mathbb{R}^{n \times n}$ by $\gamma_{ij}$, i.e., $(\Gamma)_{ij} := \gamma_{ij}$, and define the vector $\gamma_0 \in \mathbb{R}^{n}$ by $(\gamma_0)_i := \gamma_{0i}$. Writing the vector of coefficients as $l := (l_1, \cdots, l_n) \in \mathbb{R}^{n}$, the equation derived in Part 2 can be expressed in the following matrix/vector form:
$$\begin{aligned} & - \sum_{j} l_j \gamma_{ij} + \gamma_{0i} - \lambda = 0 \\ \implies \quad & - \Gamma l + \gamma_0 - \lambda \mathbf{1} = \mathbf{0} \\ \implies \quad & \Gamma l + \lambda \mathbf{1} = \gamma_0 \end{aligned}$$
Meanwhile, from the constraint, we have
$$\sum_{i} l_i = 1 \iff \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} l_1 \\ \vdots \\ l_n \end{bmatrix} = 1 \iff \mathbf{1}^T l = 1$$
where $x^T$ denotes the Transpose of $x$. First, solving for $\lambda$ alone, we have
$$\begin{aligned} 1 &= \mathbf{1}^T l \\ &= \mathbf{1}^T \Gamma^{-1} \Gamma l \\ &= \mathbf{1}^T \Gamma^{-1} \left( \gamma_0 - \lambda \mathbf{1} \right) \\ &= \mathbf{1}^T \Gamma^{-1} \gamma_0 - \lambda \mathbf{1}^T \Gamma^{-1} \mathbf{1} \end{aligned}$$
which simplifies to
$$- \lambda = \frac{1 - \mathbf{1}^T \Gamma^{-1} \gamma_0}{\mathbf{1}^T \Gamma^{-1} \mathbf{1}}$$
At this point, $l$ is essentially determined.
$$\begin{aligned} & \Gamma l + \lambda \mathbf{1} = \gamma_0 \\ \implies \quad & \Gamma l = \gamma_0 - \lambda \mathbf{1} \\ \implies \quad & \Gamma l = \gamma_0 + \frac{1 - \mathbf{1}^T \Gamma^{-1} \gamma_0}{\mathbf{1}^T \Gamma^{-1} \mathbf{1}} \mathbf{1} \\ \implies \quad & l = \Gamma^{-1} \left( \gamma_0 + \frac{1 - \mathbf{1}^T \Gamma^{-1} \gamma_0}{\mathbf{1}^T \Gamma^{-1} \mathbf{1}} \mathbf{1} \right) \end{aligned}$$
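As a sanity check, the derived $\lambda$ and $l$ can be verified numerically against the two conditions $\Gamma l + \lambda \mathbf{1} = \gamma_0$ and $\mathbf{1}^T l = 1$. A toy sketch, using the (illustrative) intrinsically stationary variogram $\gamma (h) = \| h \|$ and hypothetical sites:

```python
import numpy as np

sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [2.0, 1.0]])
s0 = np.array([0.5, 0.5])
gamma = lambda h: np.linalg.norm(h, axis=-1)  # illustrative choice of semivariogram

Gamma = gamma(sites[:, None, :] - sites[None, :, :])  # (Gamma)_ij = gamma(s_i - s_j)
gamma0 = gamma(sites - s0)                            # (gamma0)_i = gamma(s_0 - s_i)
ones = np.ones(len(sites))

lam = -(1.0 - ones @ np.linalg.solve(Gamma, gamma0)) / (ones @ np.linalg.solve(Gamma, ones))
l = np.linalg.solve(Gamma, gamma0 - lam * ones)

assert np.allclose(Gamma @ l + lam * ones, gamma0)  # first-order condition
assert np.isclose(ones @ l, 1.0)                    # unbiasedness constraint
```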
This concludes the derivation of the optimal solution in Kriging, showing how the estimate is obtained by optimizing the objective function under the given constraint, and how the semivariogram plays a central role in this estimation process.
■
Banerjee. (2015). Hierarchical Modeling and Analysis for Spatial Data (2nd Edition): pp. 25, 40–41. ↩︎