
Universal Kriging

Model

Ordinary Kriging

In Spatial Data Analysis, consider a Random Field $\mathbf{Y} = \left( Y \left( s_{1} \right) , \cdots , Y \left( s_{n} \right) \right)$ with Mean $\mu \in \mathbb{R}$ and Covariance Matrix $\Sigma \in \mathbb{R}^{n \times n}$, where the error $\varepsilon \sim N_{n} \left( \mathbf{0} , \Sigma \right)$ follows a Multivariate Normal Distribution. The model
$$ \mathbf{Y} = \mu \mathbf{1} + \varepsilon $$
is used to estimate $Y \left( s_{0} \right)$ at a new Site $s_{0}$. This estimated value is called the Ordinary Kriging Estimate, and setting up such a model for prediction is also referred to as Kriging.

Universal Kriging

For independent variables $X_{1} , \cdots , X_{p}$, define the Design Matrix
$$ \mathbf{X} := \begin{bmatrix} 1 & X_{1} \left( s_{1} \right) & \cdots & X_{p} \left( s_{1} \right) \\ 1 & X_{1} \left( s_{2} \right) & \cdots & X_{p} \left( s_{2} \right) \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{1} \left( s_{n} \right) & \cdots & X_{p} \left( s_{n} \right) \end{bmatrix} $$
With the Vector of Regression Coefficients $\beta = \left( \beta_{0} , \beta_{1} , \cdots , \beta_{p} \right)$ and $\varepsilon \sim N_{n} \left( \mathbf{0} , \Sigma \right)$ following a Multivariate Normal Distribution, the model
$$ \mathbf{Y} = \mathbf{X} \beta + \varepsilon $$
is used to estimate $Y \left( s_{0} \right)$ from the covariate values $X_{1} \left( s_{0} \right) , \cdots , X_{p} \left( s_{0} \right)$ at a new Site $s_{0}$. This process is called Universal Kriging.


  • $\mathbf{1} = (1 , \cdots , 1)$ is the one-vector whose elements are all $1$.
  • $N_{n} \left( \mathbf{0} , \Sigma \right)$ denotes the $n$-dimensional Multivariate Normal Distribution.
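
As a quick illustration, the design matrix $\mathbf{X}$ above can be assembled from observed covariate values as in the sketch below. The covariate names and numbers are made-up placeholders, not data from any real analysis.

```python
import numpy as np

# Hypothetical covariate values observed at n = 4 sites (p = 2 covariates).
X1 = np.array([0.8, 1.3, 0.5, 2.1])   # e.g. elevation (placeholder values)
X2 = np.array([10., 12., 9., 15.])    # e.g. temperature (placeholder values)

# Design matrix X: a leading column of ones (intercept) followed by the covariates.
X = np.column_stack([np.ones(len(X1)), X1, X2])
print(X.shape)   # (4, 3) = (n, 1 + p)
```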

Explanation

Universal Kriging, simply put, is Kriging combined with Multiple Regression Analysis. The $\mu$ in Ordinary Kriging plays the same role as $\beta_{0}$ in Universal Kriging. Compared to Time Series Analysis, it can be seen as a Dynamic Regression Model: we consider not only the spatial structure but also other variables that can explain the variable of interest $Y$.

If we look at the equations, the regression part might seem a bit different, but if we move $\mathbf{X} \beta$ to the left-hand side to get
$$ \left( \mathbf{Y} - \mathbf{X} \beta \right) = \mathbf{0} + \varepsilon $$
it is essentially the same as Ordinary Kriging with the zero vector $\mathbf{0}$ as mean vector.

Formulas

Let’s say the Random Field $\left\{ Y (s_{k}) \right\}_{k=1}^{n}$ is an Intrinsic Stationary Spatial Process, and we want to predict at a new point denoted $s_{0}$. Define the vector $\gamma \in \mathbb{R}^{n}$, obtained from the Variogram $2 \gamma$, whose $k$th component is the covariance $\gamma_{k} := \operatorname{Cov} \left( Y \left( s_{0} \right) , Y \left( s_{k} \right) \right)$ between the new site and each observed site, and let $\mathbf{x}_{0} = \left( 1 , X_{1} \left( s_{0} \right) , \cdots , X_{p} \left( s_{0} \right) \right)$ be the covariate vector at $s_{0}$. For some vector $\lambda = \left( \lambda_{1} , \cdots , \lambda_{n} \right)$, the Best Linear Unbiased Predictor (BLUP) of $Y \left( s_{0} \right)$ is
$$ \lambda^{T} \mathbf{Y} = \begin{bmatrix} \lambda_{1} & \cdots & \lambda_{n} \end{bmatrix} \begin{bmatrix} Y \left( s_{1} \right) \\ Y \left( s_{2} \right) \\ \vdots \\ Y \left( s_{n} \right) \end{bmatrix} = \sum_{k=1}^{n} \lambda_{k} Y \left( s_{k} \right) $$
and the vector $\lambda$ is given explicitly by
$$ \lambda = \Sigma^{-1} \gamma + \Sigma^{-1} \mathbf{X} \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \left( \mathbf{x}_{0} - \mathbf{X}^{T} \Sigma^{-1} \gamma \right) $$
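
As a sketch of how this closed form is used, the following code computes $\lambda$ and the predictor $\lambda^{T} \mathbf{Y}$ from given $\Sigma$, $\gamma$, $\mathbf{X}$, $\mathbf{x}_{0}$, and $\mathbf{Y}$. All inputs here are synthetic placeholders; in practice $\Sigma$ and $\gamma$ would come from a fitted semivariogram model.

```python
import numpy as np

def universal_kriging_weights(Sigma, gamma, X, x0):
    """lambda = Sigma^{-1} gamma + Sigma^{-1} X (X^T Sigma^{-1} X)^{-1} (x0 - X^T Sigma^{-1} gamma)."""
    Si_gamma = np.linalg.solve(Sigma, gamma)   # Sigma^{-1} gamma
    Si_X = np.linalg.solve(Sigma, X)           # Sigma^{-1} X
    A = X.T @ Si_X                             # X^T Sigma^{-1} X
    return Si_gamma + Si_X @ np.linalg.solve(A, x0 - X.T @ Si_gamma)

# Synthetic example (placeholder numbers, not real data).
rng = np.random.default_rng(1)
n, p = 5, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # n x (1 + p) design matrix
x0 = np.concatenate([[1.0], rng.normal(size=p)])             # covariate vector at the new site s0
B = rng.normal(size=(n, n))
Sigma = B @ B.T + n * np.eye(n)                              # a positive definite covariance matrix
gamma = rng.normal(size=n)                                    # cross-covariances with Y(s0)
Y = rng.normal(size=n)

lam = universal_kriging_weights(Sigma, gamma, X, x0)
print(lam @ Y)    # the BLUP  lambda^T Y
```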

Derivation 1

Part 1. Conditional Expectation

$$ E \left[ \left. \left( Y \left( s_{0} \right) - h \left( \mathbf{y} \right) \right)^{2} \right| \mathbf{y} \right] $$
Given the data $\mathbf{y} = \left( y \left( s_{1} \right) , \cdots , y \left( s_{n} \right) \right)$, our goal is to find the predictor $h \left( \mathbf{y} \right)$ that minimizes the above error.
$$ \begin{align*} & E \left[ \left. \left( Y \left( s_{0} \right) - h \left( \mathbf{y} \right) \right)^{2} \right| \mathbf{y} \right] \\ =& E \left[ \left. \left( Y \left( s_{0} \right) - E \left( Y \left( s_{0} \right) | \mathbf{y} \right) \right)^{2} \right| \mathbf{y} \right] + \left[ E \left( Y \left( s_{0} \right) | \mathbf{y} \right) - h \left( \mathbf{y} \right) \right]^{2} \\ \ge & E \left[ \left. \left( Y \left( s_{0} \right) - E \left( Y \left( s_{0} \right) | \mathbf{y} \right) \right)^{2} \right| \mathbf{y} \right] \end{align*} $$
The first equality holds because the cross term vanishes: $E \left[ \left. Y \left( s_{0} \right) - E \left( Y \left( s_{0} \right) | \mathbf{y} \right) \right| \mathbf{y} \right] = 0$. Equality in the last line holds if and only if $h \left( \mathbf{y} \right) = E \left( Y \left( s_{0} \right) | \mathbf{y} \right)$.


Part 2. Conditional Multivariate Normal Distribution

Assume that the given data $\mathbf{Y}$ follow a Multivariate Normal Distribution $N_{n} \left( \mu , \Sigma \right)$ with mean $\mu = \mathbf{X} \beta$, and that, together with a new random variable $Y_{0} := Y \left( s_{0} \right) \sim N \left( \mu_{0} , \sigma_{0}^{2} \right)$, the joint vector still follows a $(1+n)$-dimensional multivariate normal distribution. In block matrix form:
$$ \begin{bmatrix} Y_{0} \\ Y_{1} \\ \vdots \\ Y_{n} \end{bmatrix} = \begin{bmatrix} Y_{0} \\ \mathbf{Y} \end{bmatrix} \sim N_{1+n} \left( \begin{bmatrix} \mu_{0} \\ \mu \end{bmatrix} , \begin{bmatrix} \sigma_{0}^{2} & \gamma^{T} \\ \gamma & \Sigma \end{bmatrix} \right) $$

Conditional Mean and Variance of the Multivariate Normal Distribution:
$$ \begin{align*} \mathbf{X} =& \begin{bmatrix} \mathbf{X}_{1} \\ \mathbf{X}_{2} \end{bmatrix} & : \Omega \to \mathbb{R}^{n} \\ \mu =& \begin{bmatrix} \mu_{1} \\ \mu_{2} \end{bmatrix} & \in \mathbb{R}^{n} \\ \Sigma =& \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} & \in \mathbb{R}^{n \times n} \end{align*} $$
Let the Random Vector $\mathbf{X} \sim N_{n} \left( \mu , \Sigma \right)$ following a Multivariate Normal Distribution be partitioned into blocks $\mathbf{X}$, $\mu$, $\Sigma$ as above, with $\mathbf{X}_{1} : \Omega \to \mathbb{R}^{m}$. Then the conditional random vector $\mathbf{X}_{1} | \mathbf{X}_{2}$ still follows a multivariate normal distribution, with the following Population Mean Vector and Population Covariance Matrix.
$$ \mathbf{X}_{1} | \mathbf{X}_{2} \sim N_{m} \left( \mu_{1} + \Sigma_{12} \Sigma_{22}^{-1} \left( \mathbf{X}_{2} - \mu_{2} \right) , \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right) $$
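
This theorem can be sanity-checked numerically. The sketch below, with made-up dimensions and a random positive definite covariance, draws samples from a joint normal and verifies that the residual $\mathbf{X}_{1} - E \left( \mathbf{X}_{1} | \mathbf{X}_{2} \right)$ is approximately uncorrelated with $\mathbf{X}_{2}$ and has covariance close to $\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, q = 2, 3                                # dim(X1), dim(X2); total dimension n = m + q
A = rng.normal(size=(m + q, m + q))
Sigma = A @ A.T + (m + q) * np.eye(m + q)  # random positive definite covariance
mu = rng.normal(size=m + q)

S11, S12 = Sigma[:m, :m], Sigma[:m, m:]
S21, S22 = Sigma[m:, :m], Sigma[m:, m:]

X = rng.multivariate_normal(mu, Sigma, size=200_000)
X1, X2 = X[:, :m], X[:, m:]

# Residual after subtracting the conditional mean mu1 + Sigma12 Sigma22^{-1} (X2 - mu2).
resid = X1 - (mu[:m] + (X2 - mu[m:]) @ np.linalg.solve(S22, S21))

# Cross-covariance with X2 should be ~ 0; covariance of the residual should match the formula.
print(np.allclose(np.cov(resid.T, X2.T)[:m, m:], 0, atol=0.1))
print(np.allclose(np.cov(resid.T), S11 - S12 @ np.linalg.solve(S22, S21), atol=0.1))
```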

Therefore, given the data $\mathbf{Y}$ and the covariate vector $\mathbf{x}_{0} = \left( 1 , X_{1} \left( s_{0} \right) , \cdots , X_{p} \left( s_{0} \right) \right) \in \mathbb{R}^{1+p}$ at $s_{0}$, the expected value of $Y_{0}$ is
$$ \begin{align*} & E \left( Y \left( s_{0} \right) | \mathbf{Y} \right) \\ =& \mu_{0} + \gamma^{T} \Sigma^{-1} \left( \mathbf{Y} - \mathbf{X} \beta \right) \\ =& \mathbf{x}_{0}^{T} \beta + \gamma^{T} \Sigma^{-1} \left( \mathbf{Y} - \mathbf{X} \beta \right) \end{align*} $$
since the mean at $s_{0}$ follows the same regression, $\mu_{0} = \mathbf{x}_{0}^{T} \beta$. Meanwhile, the Vector of Regression Coefficients $\beta$ cannot simply be estimated by OLS [Ordinary Least Squares] $\hat{\beta} = \left( \mathbf{X}^{T} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \mathbf{Y}$, because the components of $\varepsilon$, which follows a multivariate normal distribution with covariance $\Sigma$, are not independent. Instead WLS [Weighted Least Squares] $\hat{\beta} = \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \Sigma^{-1} \mathbf{Y}$ is required. Substituting this,
$$ \begin{align*} & E \left( Y \left( s_{0} \right) | \mathbf{Y} \right) \\ =& \mathbf{x}_{0}^{T} \beta + \gamma^{T} \Sigma^{-1} \mathbf{Y} - \gamma^{T} \Sigma^{-1} \mathbf{X} \beta \\ =& \gamma^{T} \Sigma^{-1} \mathbf{Y} + \mathbf{x}_{0}^{T} \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \Sigma^{-1} \mathbf{Y} - \gamma^{T} \Sigma^{-1} \mathbf{X} \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \Sigma^{-1} \mathbf{Y} \end{align*} $$
Here, $\Sigma$ being a Symmetric Matrix implies $\left( \Sigma^{-1} \right)^{T} = \Sigma^{-1}$ (and $\left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1}$ is symmetric as well), so collecting the terms acting on $\mathbf{Y}$ gives the form $\lambda^{T} \mathbf{Y}$.
$$ \begin{align*} & \gamma^{T} \Sigma^{-1} \mathbf{Y} + \mathbf{x}_{0}^{T} \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \Sigma^{-1} \mathbf{Y} - \gamma^{T} \Sigma^{-1} \mathbf{X} \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{T} \Sigma^{-1} \mathbf{Y} \\ =& \left[ \Sigma^{-1} \gamma + \Sigma^{-1} \mathbf{X} \left( \mathbf{X}^{T} \Sigma^{-1} \mathbf{X} \right)^{-1} \left( \mathbf{x}_{0} - \mathbf{X}^{T} \Sigma^{-1} \gamma \right) \right]^{T} \mathbf{Y} \\ =& \lambda^{T} \mathbf{Y} \end{align*} $$
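
The last identity can also be checked numerically: plugging the WLS estimate $\hat{\beta}$ into $\mathbf{x}_{0}^{T} \beta + \gamma^{T} \Sigma^{-1} \left( \mathbf{Y} - \mathbf{X} \beta \right)$ gives the same number as $\lambda^{T} \mathbf{Y}$ with $\lambda$ from the closed form above. The inputs in the sketch are synthetic placeholders of the same kind as before.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
x0 = np.concatenate([[1.0], rng.normal(size=p)])
B = rng.normal(size=(n, n))
Sigma = B @ B.T + n * np.eye(n)
gamma = rng.normal(size=n)
Y = rng.normal(size=n)

Si = np.linalg.inv(Sigma)
# WLS estimate: beta_hat = (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} Y
beta_hat = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ Y)

# Conditional mean with beta_hat substituted in.
pred_conditional = x0 @ beta_hat + gamma @ Si @ (Y - X @ beta_hat)

# lambda^T Y with the closed-form kriging weights.
lam = Si @ gamma + Si @ X @ np.linalg.solve(X.T @ Si @ X, x0 - X.T @ Si @ gamma)
pred_weights = lam @ Y

print(np.isclose(pred_conditional, pred_weights))   # True
```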

In actual analysis, estimates for $\sigma_{0}^{2}$ and $\Sigma$ are obtained through a Semivariogram model as follows.
$$ \begin{align*} \sigma_{0}^{2} =& \sigma^{2} + \tau^{2} \\ \Sigma =& \sigma^{2} H \left( \phi \right) + \tau^{2} I \end{align*} $$
Here, $\tau^{2}$ is the nugget effect (the variance that remains in real data no matter how small the distance), and $I$ is the Identity Matrix.
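
As a sketch, assuming an exponential correlation model $H \left( \phi \right)_{ij} = \exp \left( - \| s_{i} - s_{j} \| / \phi \right)$ (one common choice; the text above does not fix a specific model), $\Sigma$ and $\sigma_{0}^{2}$ can be built from fitted values of $\sigma^{2}$, $\tau^{2}$, $\phi$ as follows. The site coordinates and parameter values are placeholders.

```python
import numpy as np

def exponential_cov(coords, sigma2, tau2, phi):
    """Sigma = sigma^2 H(phi) + tau^2 I with H(phi)_ij = exp(-||s_i - s_j|| / phi)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)  # pairwise distances
    H = np.exp(-d / phi)
    return sigma2 * H + tau2 * np.eye(len(coords))

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])  # placeholder site coordinates
sigma2, tau2, phi = 1.5, 0.2, 0.7                                    # placeholder parameter estimates
Sigma = exponential_cov(coords, sigma2, tau2, phi)
sigma0_sq = sigma2 + tau2          # marginal variance at the new site s0
print(Sigma.round(3))
```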


  1. Banerjee. (2015). Hierarchical Modeling and Analysis for Spatial Data (2nd Edition): p41~42.