Empirical Variogram

Buildup

Definition of the Variogram: Consider a spatial process $\left\{ Y(s) \right\}_{s \in D}$, a collection of random variables $Y(s) : \Omega \to \mathbb{R}^{1}$ indexed by a fixed subset $D \subset \mathbb{R}^{r}$ of Euclidean space, and let $\mathbf{h} \in \mathbb{R}^{r}$ denote a direction (separation) vector. Specifically, denote $n \in \mathbb{N}$ sites by $\left\{ s_{1} , \cdots , s_{n} \right\} \subset D$ and assume that $Y(s)$ has a finite variance for all $s \in D$ . The quantity $2 \gamma ( \mathbf{h} )$ defined as follows is called the Variogram. $2 \gamma ( \mathbf{h} ) := E \left[ Y \left( s + \mathbf{h} \right) - Y(s) \right]^{2}$ In particular, half of the variogram, $\gamma ( \mathbf{h} )$ , is called the Semivariogram.

In spatial data analysis the variogram is extremely important, but with finitely many observations it cannot be computed for every $\mathbf{h}$ . In practice, the separations between data points are grouped into suitable intervals and a numerical estimate is computed for each group.

Definitions

Bin $B_{ij}$ ¹

Given $N$ data points in $D \subset \mathbb{R}^{2}$ , form all $_{N} C_{2} = N(N-1)/2$ pairs and their separation vectors $s_{k} - s_{l}$ . Fix a unit of length $h_{x}$ along the horizontal axis and a unit of width $h_{y}$ along the vertical axis, and divide the plane of separation vectors into bins $B_{ij}$ along both axes. The following $\gamma_{ij}^{\ast}$ is called the Empirical Semivariogram. $\gamma_{ij}^{\ast} = {{ 1 } \over { 2 \left| B_{ij} \right| }} \sum_{ \left\{ (k,l) : \left( s_{k} - s_{l} \right) \in B_{ij} \right\} } \left[ Y \left( s_{k} \right) - Y \left( s_{l} \right) \right]^{2}$ The heatmap or surface that assigns the value $\gamma_{ij}^{\ast}$ to the position $\left( x_{i}, y_{j} \right)$ of each bin $(i,j)$ is called the Empirical Semivariogram Contour (ESC).
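The binning step can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions, not the procedure of the cited reference: the function name `binned_semivariogram`, the convention that bin $(i,j)$ is centred at $(i h_{x}, j h_{y})$, and the choice to count both $s_{k} - s_{l}$ and $s_{l} - s_{k}$ for every pair are all hypothetical choices made for this example. The toy data place the sites on a $3 \times 3$ unit grid with $Y$ equal to the horizontal coordinate.

```python
import math
from collections import defaultdict
from itertools import combinations, product


def binned_semivariogram(sites, values, hx, hy):
    """Return {(i, j): gamma*_ij} over rectangular bins of separation vectors.

    Assumption: bin (i, j) is centred at (i * hx, j * hy) and is half-open,
    covering [(i - 0.5) * hx, (i + 0.5) * hx) horizontally (same for hy).
    """
    sums = defaultdict(float)   # running sum of squared differences per bin
    counts = defaultdict(int)   # number of separation vectors per bin
    for k, l in combinations(range(len(sites)), 2):
        dx = sites[k][0] - sites[l][0]
        dy = sites[k][1] - sites[l][1]
        sq_diff = (values[k] - values[l]) ** 2
        # Assumption: each pair contributes both s_k - s_l and s_l - s_k,
        # which makes the resulting surface symmetric about the origin.
        for sx, sy in ((dx, dy), (-dx, -dy)):
            key = (math.floor(sx / hx + 0.5), math.floor(sy / hy + 0.5))
            sums[key] += sq_diff
            counts[key] += 1
    return {key: sums[key] / (2 * counts[key]) for key in sums}


# Toy data: a 3 x 3 unit grid where Y(s) is the horizontal coordinate of s.
sites = list(product(range(3), repeat=2))
values = [float(x) for x, _ in sites]
gamma = binned_semivariogram(sites, values, hx=1.0, hy=1.0)
print(gamma[(1, 0)], gamma[(0, 1)], gamma[(2, 0)])  # 0.5 0.0 2.0
```

Because $Y$ changes only horizontally in this toy data, the bins along the vertical axis stay at zero while the bins along the horizontal axis grow with the separation.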

Distance $N(h)$ ²

For the set $N \left( h \right) := \left\{ \left( s_{k} , s_{l} \right) : \left\| s_{k} - s_{l} \right\| \approx h \right\}$ of pairs of sites whose distance is approximately $h$ , the following $\hat{\gamma} \left( h \right)$ is called the Empirical Semivariogram. $\hat{\gamma} \left( h \right) = {{ 1 } \over { 2 \left| N \left( h \right) \right| }} \sum_{ \left( s_{k} , s_{l} \right) \in N \left( h \right) } \left[ Y \left( s_{k} \right) - Y \left( s_{l} \right) \right]^{2}$ The plot with $h$ on the horizontal axis and $\hat{\gamma} \left( h \right)$ on the vertical axis is itself also referred to as a Semivariogram.

The absolute value symbol $\left| X \right|$ applied to a set $X$ denotes the cardinality of the set.
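The distance-based estimator $\hat{\gamma}(h)$ can be sketched in the same way. The definition leaves open what $\approx h$ means, so this example assumes a pair belongs to $N(h)$ when its distance is within a tolerance `tol` of $h$; the function name `empirical_semivariogram`, the parameter `tol`, and the toy data are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def empirical_semivariogram(sites, values, lags, tol):
    """Return {h: gamma_hat(h)} for each lag distance h in `lags`.

    Assumption: a pair (s_k, s_l) belongs to N(h) when
    | ||s_k - s_l|| - h | <= tol.
    """
    result = {}
    for h in lags:
        total, count = 0.0, 0
        for k, l in combinations(range(len(sites)), 2):
            if abs(math.dist(sites[k], sites[l]) - h) <= tol:
                total += (values[k] - values[l]) ** 2
                count += 1
        # An empty N(h) has no estimate; mark it as NaN instead of dividing by 0.
        result[h] = total / (2 * count) if count else float("nan")
    return result


# Toy data: four sites on a line, one unit apart.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 3.0, 2.0, 5.0]
gamma_hat = empirical_semivariogram(sites, values, lags=[1.0, 2.0, 3.0], tol=0.25)
print(gamma_hat[2.0], gamma_hat[3.0])  # 1.25 8.0
```

At $h = 2$ the set $N(h)$ holds two pairs with squared differences $1$ and $4$, giving $(1 + 4) / (2 \cdot 2) = 1.25$ . Plotting the lags against the returned values gives the semivariogram plot described above.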

Explanation

Essentially, both definitions are the same; the first is written more concretely, while the second is written more generally. Unless you intend to implement it yourself, it makes little difference which definition you look at, and either way the purpose is the same: visualization.

$\gamma_{ij}^{\ast} \to$ ESC

The ESC image is a visualization of the empirical semivariogram laid out on a map, helpful in exploratory data analysis, for example in detecting anisotropy. If the contour lines appear roughly round, this suggests isotropy; if they appear elliptical, this suggests anisotropy.
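As a rough numerical counterpart to reading the contours, one can compare the empirical semivariogram for a step along the horizontal axis with that for a step along the vertical axis. The sketch below is a toy under stated assumptions: the helper `directional_semivariogram` is hypothetical, the sites lie on an integer grid so that separation vectors match exactly, and the field is a deterministic function constructed to vary faster horizontally than vertically rather than real data.

```python
import math
from itertools import product


def directional_semivariogram(field, step):
    """Empirical semivariogram for one exact separation vector on a grid.

    field: dict mapping an integer grid site (x, y) to its observed value.
    step:  separation vector (dx, dy) in grid units.
    """
    dx, dy = step
    sq_diffs = [
        (field[(x + dx, y + dy)] - value) ** 2
        for (x, y), value in field.items()
        if (x + dx, y + dy) in field
    ]
    return sum(sq_diffs) / (2 * len(sq_diffs)) if sq_diffs else float("nan")


# Hypothetical field: oscillates quickly along x and slowly along y.
field = {
    (x, y): math.sin(x) + math.sin(y / 4)
    for x, y in product(range(20), repeat=2)
}
gamma_x = directional_semivariogram(field, (1, 0))
gamma_y = directional_semivariogram(field, (0, 1))
print(gamma_x > gamma_y)  # True
```

A one-unit step along the horizontal axis produces a clearly larger value than the same step along the vertical axis, which is the kind of direction dependence that elongated contours on an ESC reflect.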

$\hat{\gamma} (h) \to$ Semivariogram

The suffix -gram in semivariogram itself means a diagram, and the name comes from the plot of $\hat{\gamma} (h)$ against $h$ . More detailed information will be covered in the post about Models of Semivariograms.

¹ Banerjee. (2015). Hierarchical Modeling and Analysis for Spatial Data (2nd Edition): p39.
² https://juliaearth.github.io/GeoStats.jl/stable/variography/empirical.html