
Universal Kriging

Model

Ordinary Kriging

In Spatial Data Analysis, consider a Random Field $\mathbf{Y} = \left( Y \left( s_{1} \right) , \cdots , Y \left( s_{n} \right) \right)$ with Mean $\mu \in \mathbb{R}$ and Covariance Matrix $\Sigma \in \mathbb{R}^{n \times n}$, where the error $\varepsilon \sim N_{n} \left( \mathbf{0} , \Sigma \right)$ follows a Multivariate Normal Distribution. The model
$$ \mathbf{Y} = \mu \mathbf{1} + \varepsilon $$
is used to estimate $Y \left( s_{0} \right)$ at a new Site $s_{0}$. This estimated value is called the Ordinary Kriging Estimate, and setting up such a model for prediction is also referred to as Kriging.

Universal Kriging

For independent variables $X_{1} , \cdots , X_{p}$, define the Design Matrix
$$ \mathbf{X} := \begin{bmatrix} 1 & X_{1} \left( s_{1} \right) & \cdots & X_{p} \left( s_{1} \right) \\ 1 & X_{1} \left( s_{2} \right) & \cdots & X_{p} \left( s_{2} \right) \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{1} \left( s_{n} \right) & \cdots & X_{p} \left( s_{n} \right) \end{bmatrix} $$
With the Vector of Regression Coefficients $\beta = \left( \beta_{0} , \beta_{1} , \cdots , \beta_{p} \right)$ and $\varepsilon \sim N_{n} \left( \mathbf{0} , \Sigma \right)$ following a Multivariate Normal Distribution, the model
$$ \mathbf{Y} = \mathbf{X} \beta + \varepsilon $$
is used to estimate $Y \left( s_{0} \right)$ from the covariate values $X_{1} \left( s_{0} \right) , \cdots , X_{p} \left( s_{0} \right)$ at a new Site $s_{0}$. This process is called Universal Kriging.


  • $\mathbf{1} = (1 , \cdots , 1)$ is the one-vector whose elements are all $1$.
  • $N_{n} \left( \mathbf{0} , \Sigma \right)$ denotes the $n$-dimensional Multivariate Normal Distribution.
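
As a quick illustration, the design matrix $\mathbf{X}$ above can be assembled from observed covariate values as in the sketch below. The covariate names and numbers are made-up placeholders, not data from any real analysis.

```python
import numpy as np

# Hypothetical covariate values observed at n = 4 sites (p = 2 covariates).
X1 = np.array([0.8, 1.3, 0.5, 2.1])   # e.g. elevation (placeholder values)
X2 = np.array([10., 12., 9., 15.])    # e.g. temperature (placeholder values)

# Design matrix X: a leading column of ones (intercept) followed by the covariates.
X = np.column_stack([np.ones(len(X1)), X1, X2])
print(X.shape)   # (4, 3) = (n, 1 + p)
```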

Explanation

Universal Kriging, simply put, is Kriging combined with Multiple Regression Analysis. The $\mu$ in Ordinary Kriging plays the same role as $\beta_{0}$ in Universal Kriging. Compared to Time Series Analysis, it can be seen as a Dynamic Regression Model: we consider not only the spatial structure but also other variables that can explain the variable of interest $Y$.

If we look at the equations, the regression part might seem a bit different, but if we move $\mathbf{X} \beta$ to the left-hand side to get
$$ \left( \mathbf{Y} - \mathbf{X} \beta \right) = \mathbf{0} + \varepsilon $$
it is essentially the same as Ordinary Kriging with the zero vector $\mathbf{0}$ as mean vector.

Formulas

Let’s say the Random Field $\left\{ Y (s_{k}) \right\}_{k=1}^{n}$ is an Intrinsic Stationary Spatial Process, and we want to predict at a new point denoted $s_{0}$. Define the vector $\gamma \in \mathbb{R}^{n}$, obtained from the Variogram $2 \gamma$, whose $k$th component is the covariance $\gamma_{k} := \operatorname{Cov} \left( Y \left( s_{0} \right) , Y \left( s_{k} \right) \right)$ between the new site and each observed site, and let $\mathbf{x}_{0} = \left( 1 , X_{1} \left( s_{0} \right) , \cdots , X_{p} \left( s_{0} \right) \right)$ be the covariate vector at $s_{0}$. For some vector $\lambda = \left( \lambda_{1} , \cdots , \lambda_{n} \right)$, the Best Linear Unbiased Predictor (BLUP) of $Y \left( s_{0} \right)$ is
$$ \lambda^{T} \mathbf{Y} = \begin{bmatrix} \lambda_{1} & \cdots & \lambda_{n} \end{bmatrix} \begin{bmatrix} Y \left( s_{1} \right) \\ Y \left( s_{2} \right) \\ \vdots \\ Y \left( s_{n} \right) \end{bmatrix} = \sum_{k=1}^{n} \lambda_{k} Y \left( s_{k} \right) $$
and the vector $\lambda$ is given explicitly by
$$ \lambda = \Sigma^{-1} \gamma + \Sigma^{-1} \mathbf{X} \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \left( \mathbf{x}_{0} - \mathbf{X}^{T} \Sigma^{-1} \gamma \right) $$
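
As a sketch of how this closed form is used, the following code computes $\lambda$ and the predictor $\lambda^{T} \mathbf{Y}$ from given $\Sigma$, $\gamma$, $\mathbf{X}$, $\mathbf{x}_{0}$, and $\mathbf{Y}$. All inputs here are synthetic placeholders; in practice $\Sigma$ and $\gamma$ would come from a fitted semivariogram model.

```python
import numpy as np

def universal_kriging_weights(Sigma, gamma, X, x0):
    """lambda = Sigma^{-1} gamma + Sigma^{-1} X (X^T Sigma^{-1} X)^{-1} (x0 - X^T Sigma^{-1} gamma)."""
    Si_gamma = np.linalg.solve(Sigma, gamma)   # Sigma^{-1} gamma
    Si_X = np.linalg.solve(Sigma, X)           # Sigma^{-1} X
    A = X.T @ Si_X                             # X^T Sigma^{-1} X
    return Si_gamma + Si_X @ np.linalg.solve(A, x0 - X.T @ Si_gamma)

# Synthetic example (placeholder numbers, not real data).
rng = np.random.default_rng(1)
n, p = 5, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # n x (1 + p) design matrix
x0 = np.concatenate([[1.0], rng.normal(size=p)])             # covariate vector at the new site s0
B = rng.normal(size=(n, n))
Sigma = B @ B.T + n * np.eye(n)                              # a positive definite covariance matrix
gamma = rng.normal(size=n)                                    # cross-covariances with Y(s0)
Y = rng.normal(size=n)

lam = universal_kriging_weights(Sigma, gamma, X, x0)
print(lam @ Y)    # the BLUP  lambda^T Y
```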

Derivation 1

Part 1. Conditional Expectation

$$ E \left[ \left. \left( Y \left( s_{0} \right) - h \left( \mathbf{y} \right) \right)^{2} \right| \mathbf{y} \right] $$
Given the data $\mathbf{y} = \left( y \left( s_{1} \right) , \cdots , y \left( s_{n} \right) \right)$, our goal is to find the predictor $h \left( \mathbf{y} \right)$ that minimizes the above error.
$$ \begin{align*} & E \left[ \left. \left( Y \left( s_{0} \right) - h \left( \mathbf{y} \right) \right)^{2} \right| \mathbf{y} \right] \\ =& E \left[ \left. \left( Y \left( s_{0} \right) - E \left( Y \left( s_{0} \right) | \mathbf{y} \right) \right)^{2} \right| \mathbf{y} \right] + \left[ E \left( Y \left( s_{0} \right) | \mathbf{y} \right) - h \left( \mathbf{y} \right) \right]^{2} \\ \ge & E \left[ \left. \left( Y \left( s_{0} \right) - E \left( Y \left( s_{0} \right) | \mathbf{y} \right) \right)^{2} \right| \mathbf{y} \right] \end{align*} $$
The first equality holds because the cross term vanishes: $E \left[ \left. Y \left( s_{0} \right) - E \left( Y \left( s_{0} \right) | \mathbf{y} \right) \right| \mathbf{y} \right] = 0$. Equality in the last line holds if and only if $h \left( \mathbf{y} \right) = E \left( Y \left( s_{0} \right) | \mathbf{y} \right)$.


Part 2. Conditional Multivariate Normal Distribution

Assume that the given data $\mathbf{Y}$ follow a Multivariate Normal Distribution $N_{n} \left( \mu , \Sigma \right)$ with mean $\mu = \mathbf{X} \beta$, and that, together with a new random variable $Y_{0} := Y \left( s_{0} \right) \sim N \left( \mu_{0} , \sigma_{0}^{2} \right)$, the joint vector still follows a $(1+n)$-dimensional multivariate normal distribution. In block matrix form:
$$ \begin{bmatrix} Y_{0} \\ Y_{1} \\ \vdots \\ Y_{n} \end{bmatrix} = \begin{bmatrix} Y_{0} \\ \mathbf{Y} \end{bmatrix} \sim N_{1+n} \left( \begin{bmatrix} \mu_{0} \\ \mu \end{bmatrix} , \begin{bmatrix} \sigma_{0}^{2} & \gamma^{T} \\ \gamma & \Sigma \end{bmatrix} \right) $$

Conditional Mean and Variance of the Multivariate Normal Distribution:
$$ \begin{align*} \mathbf{X} =& \begin{bmatrix} \mathbf{X}_{1} \\ \mathbf{X}_{2} \end{bmatrix} & : \Omega \to \mathbb{R}^{n} \\ \mu =& \begin{bmatrix} \mu_{1} \\ \mu_{2} \end{bmatrix} & \in \mathbb{R}^{n} \\ \Sigma =& \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} & \in \mathbb{R}^{n \times n} \end{align*} $$
Let the Random Vector $\mathbf{X} \sim N_{n} \left( \mu , \Sigma \right)$ following a Multivariate Normal Distribution be partitioned into blocks $\mathbf{X}$, $\mu$, $\Sigma$ as above, with $\mathbf{X}_{1} : \Omega \to \mathbb{R}^{m}$. Then the conditional random vector $\mathbf{X}_{1} | \mathbf{X}_{2}$ still follows a multivariate normal distribution, with the following Population Mean Vector and Population Covariance Matrix.
$$ \mathbf{X}_{1} | \mathbf{X}_{2} \sim N_{m} \left( \mu_{1} + \Sigma_{12} \Sigma_{22}^{-1} \left( \mathbf{X}_{2} - \mu_{2} \right) , \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right) $$
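
This theorem can be sanity-checked numerically. The sketch below, with made-up dimensions and a random positive definite covariance, draws samples from a joint normal and verifies that the residual $\mathbf{X}_{1} - E \left( \mathbf{X}_{1} | \mathbf{X}_{2} \right)$ is approximately uncorrelated with $\mathbf{X}_{2}$ and has covariance close to $\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, q = 2, 3                                # dim(X1), dim(X2); total dimension n = m + q
A = rng.normal(size=(m + q, m + q))
Sigma = A @ A.T + (m + q) * np.eye(m + q)  # random positive definite covariance
mu = rng.normal(size=m + q)

S11, S12 = Sigma[:m, :m], Sigma[:m, m:]
S21, S22 = Sigma[m:, :m], Sigma[m:, m:]

X = rng.multivariate_normal(mu, Sigma, size=200_000)
X1, X2 = X[:, :m], X[:, m:]

# Residual after subtracting the conditional mean mu1 + Sigma12 Sigma22^{-1} (X2 - mu2).
resid = X1 - (mu[:m] + (X2 - mu[m:]) @ np.linalg.solve(S22, S21))

# Cross-covariance with X2 should be ~ 0; covariance of the residual should match the formula.
print(np.allclose(np.cov(resid.T, X2.T)[:m, m:], 0, atol=0.1))
print(np.allclose(np.cov(resid.T), S11 - S12 @ np.linalg.solve(S22, S21), atol=0.1))
```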

Therefore, given the data $\mathbf{Y}$ and the covariate vector $\mathbf{x}_{0} = \left( 1 , X_{1} \left( s_{0} \right) , \cdots , X_{p} \left( s_{0} \right) \right) \in \mathbb{R}^{1+p}$ at $s_{0}$, the expected value of $Y_{0}$ is
$$ \begin{align*} & E \left( Y \left( s_{0} \right) | \mathbf{Y} \right) \\ =& \mu_{0} + \gamma^{T} \Sigma^{-1} \left( \mathbf{Y} - \mathbf{X} \beta \right) \\ =& \mathbf{x}_{0}^{T} \beta + \gamma^{T} \Sigma^{-1} \left( \mathbf{Y} - \mathbf{X} \beta \right) \end{align*} $$
since the mean at $s_{0}$ follows the same regression, $\mu_{0} = \mathbf{x}_{0}^{T} \beta$. Meanwhile, the Vector of Regression Coefficients $\beta$ cannot simply be estimated by OLS [Ordinary Least Squares] $\hat{\beta} = \left( \mathbf{X}^{T} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \mathbf{Y}$, because the components of $\varepsilon$, which follows a multivariate normal distribution with covariance $\Sigma$, are not independent. Instead WLS [Weighted Least Squares] $\hat{\beta} = \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \Sigma^{-1} \mathbf{Y}$ is required. Substituting this,
$$ \begin{align*} & E \left( Y \left( s_{0} \right) | \mathbf{Y} \right) \\ =& \mathbf{x}_{0}^{T} \beta + \gamma^{T} \Sigma^{-1} \mathbf{Y} - \gamma^{T} \Sigma^{-1} \mathbf{X} \beta \\ =& \gamma^{T} \Sigma^{-1} \mathbf{Y} + \mathbf{x}_{0}^{T} \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \Sigma^{-1} \mathbf{Y} - \gamma^{T} \Sigma^{-1} \mathbf{X} \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \Sigma^{-1} \mathbf{Y} \end{align*} $$
Here, $\Sigma$ being a Symmetric Matrix implies $\left( \Sigma^{-1} \right)^{T} = \Sigma^{-1}$ (and $\left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1}$ is symmetric as well), so collecting the terms acting on $\mathbf{Y}$ gives the form $\lambda^{T} \mathbf{Y}$.
$$ \begin{align*} & \gamma^{T} \Sigma^{-1} \mathbf{Y} + \mathbf{x}_{0}^{T} \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \Sigma^{-1} \mathbf{Y} - \gamma^{T} \Sigma^{-1} \mathbf{X} \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \Sigma^{-1} \mathbf{Y} \\ =& \left[ \Sigma^{-1} \gamma + \Sigma^{-1} \mathbf{X} \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \left( \mathbf{x}_{0} - \mathbf{X}^{T} \Sigma^{-1} \gamma \right) \right]^{T} \mathbf{Y} \\ =& \lambda^{T} \mathbf{Y} \end{align*} $$
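
The last identity can also be checked numerically: plugging the WLS estimate $\hat{\beta}$ into $\mathbf{x}_{0}^{T} \beta + \gamma^{T} \Sigma^{-1} \left( \mathbf{Y} - \mathbf{X} \beta \right)$ gives the same number as $\lambda^{T} \mathbf{Y}$ with $\lambda$ from the closed form above. The inputs in the sketch are synthetic placeholders of the same kind as before.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
x0 = np.concatenate([[1.0], rng.normal(size=p)])
B = rng.normal(size=(n, n))
Sigma = B @ B.T + n * np.eye(n)
gamma = rng.normal(size=n)
Y = rng.normal(size=n)

Si = np.linalg.inv(Sigma)
# WLS estimate: beta_hat = (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} Y
beta_hat = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ Y)

# Conditional mean with beta_hat substituted in.
pred_conditional = x0 @ beta_hat + gamma @ Si @ (Y - X @ beta_hat)

# lambda^T Y with the closed-form kriging weights.
lam = Si @ gamma + Si @ X @ np.linalg.solve(X.T @ Si @ X, x0 - X.T @ Si @ gamma)
pred_weights = lam @ Y

print(np.isclose(pred_conditional, pred_weights))   # True
```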

In actual analysis, estimates for $\sigma_{0}^{2}$ and $\Sigma$ are obtained through a Semivariogram model as follows.
$$ \begin{align*} \sigma_{0}^{2} =& \sigma^{2} + \tau^{2} \\ \Sigma =& \sigma^{2} H \left( \phi \right) + \tau^{2} I \end{align*} $$
Here, $\tau^{2}$ is the nugget effect (the variance that remains in real data no matter how small the distance), and $I$ is the Identity Matrix.
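
As a sketch, assuming an exponential correlation model $H \left( \phi \right)_{ij} = \exp \left( - \| s_{i} - s_{j} \| / \phi \right)$ (one common choice; the text above does not fix a specific model), $\Sigma$ and $\sigma_{0}^{2}$ can be built from fitted values of $\sigma^{2}$, $\tau^{2}$, $\phi$ as follows. The site coordinates and parameter values are placeholders.

```python
import numpy as np

def exponential_cov(coords, sigma2, tau2, phi):
    """Sigma = sigma^2 H(phi) + tau^2 I with H(phi)_ij = exp(-||s_i - s_j|| / phi)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)  # pairwise distances
    H = np.exp(-d / phi)
    return sigma2 * H + tau2 * np.eye(len(coords))

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])  # placeholder site coordinates
sigma2, tau2, phi = 1.5, 0.2, 0.7                                    # placeholder parameter estimates
Sigma = exponential_cov(coords, sigma2, tau2, phi)
sigma0_sq = sigma2 + tau2          # marginal variance at the new site s0
print(Sigma.round(3))
```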


  1. Banerjee. (2015). Hierarchical Modeling and Analysis for Spatial Data (2nd Edition): p41~42.