Empirical Variogram

Buildup

Definition of the Variogram: Consider a spatial process $\left\{ Y(s) \right\}_{s \in D}$, which is a set of random variables $Y(s) : \Omega \to \mathbb{R}^{1}$ in a fixed subset $D \subset \mathbb{R}^{r}$ of the Euclidean space, with direction vectors $\mathbf{h} \in \mathbb{R}^{r}$. Specifically, represent $n \in \mathbb{N}$ sites as $\left\{ s_{1} , \cdots , s_{n} \right\} \subset D$ and assume that $Y(s)$ has a variance for all $s \in D$. The quantity $2 \gamma ( \mathbf{h} )$ defined as follows is called a Variogram.

$$ 2 \gamma ( \mathbf{h} ) := E \left[ Y \left( s + \mathbf{h} \right) - Y(s) \right]^{2} $$

In particular, half of the variogram, $\gamma ( \mathbf{h} )$, is called a Semivariogram.

In Spatial Data Analysis, the variogram is extremely important, but in practice it cannot be computed for every $\mathbf{h}$, so we have no choice but to group the data at suitable intervals and work with the resulting numerical estimates.

Definitions

Bin $B_{ij}$ [1]

Given $N$ data points in $D \subset \mathbb{R}^{2}$, form all ${}_{N} C_{2} = N(N-1)/2$ pairs of sites, choose a unit length $h_{x}$ and a unit width $h_{y}$, and divide the separations $s_{k} - s_{l}$ into bins along the horizontal and vertical axes to obtain the sets $B_{ij}$. The following $\gamma_{ij}^{\ast}$ is called an Empirical Semivariogram.

$$ \gamma_{ij}^{\ast} = {{ 1 } \over { 2 \left| B_{ij} \right| }} \sum_{ \left\{ (k,l) : \left( s_{k} - s_{l} \right) \in B_{ij} \right\} } \left[ Y \left( s_{k} \right) - Y \left( s_{l} \right) \right]^{2} $$

On $D$, the heatmap or surface that assigns $\gamma_{ij}^{\ast}$ to the position $\left( x_{i}, y_{j} \right)$ of each bin $(i,j)$ is called the Empirical Semivariogram Contour (ESC).
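The bin-based definition can be sketched in plain Python as follows. This is a minimal illustration, not the procedure of the cited reference: the function name `binned_semivariogram`, the bin indexing $i = \lfloor \Delta x / h_{x} \rfloor$, $j = \lfloor \Delta y / h_{y} \rfloor$, and the convention of orienting each unordered pair so that $\Delta y \ge 0$ are all assumptions made here.

```python
import math
from collections import defaultdict
from itertools import combinations


def binned_semivariogram(sites, values, hx, hy):
    """Return {(i, j): gamma*_ij} over rectangular bins of separations.

    sites  : list of (x, y) coordinates
    values : list of observations Y(s_k), same order as sites
    hx, hy : bin width along the horizontal and vertical axes
    """
    sums = defaultdict(float)   # sum of squared differences per bin
    counts = defaultdict(int)   # |B_ij|, the number of pairs per bin
    for k, l in combinations(range(len(sites)), 2):
        dx = sites[k][0] - sites[l][0]
        dy = sites[k][1] - sites[l][1]
        # Assumption: the order within a pair is arbitrary, so flip the
        # separation when needed to make dy >= 0 (and dx >= 0 when dy == 0).
        if dy < 0 or (dy == 0 and dx < 0):
            dx, dy = -dx, -dy
        key = (math.floor(dx / hx), math.floor(dy / hy))
        sums[key] += (values[k] - values[l]) ** 2
        counts[key] += 1
    return {key: sums[key] / (2 * counts[key]) for key in sums}


# Toy data (hypothetical): a 3 x 3 grid with Y(x, y) = x + y
sites = [(float(x), float(y)) for x in range(3) for y in range(3)]
values = [x + y for x, y in sites]
g = binned_semivariogram(sites, values, hx=1.0, hy=1.0)
print(g[(1, 0)])   # 0.5
print(g[(1, 1)])   # 2.0
print(g[(-1, 1)])  # 0.0
```

The returned dictionary is exactly what an ESC plot would draw: one value per bin $(i,j)$. Empty bins are simply absent from the dictionary rather than being reported as zero.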

Distance $N(h)$ [2]

For the set $N \left( h \right) := \left\{ \left( s_{k} , s_{l} \right) : \left\| s_{k} - s_{l} \right\| \approx h \right\}$, which depends on the distance $h$, the following is called an Empirical Semivariogram.

$$ \hat{\gamma} \left( h \right) = {{ 1 } \over { 2 \left| N \left( h \right) \right| }} \sum_{ \left( s_{k} , s_{l} \right) \in N \left( h \right) } \left[ Y \left( s_{k} \right) - Y \left( s_{l} \right) \right]^{2} $$

The graph itself, plotted with $h$ on the horizontal axis and $\hat{\gamma} \left( h \right)$ on the vertical axis, is also referred to as a Semivariogram.


  • For a set $X$, the notation $\left| X \right|$ denotes the cardinality of $X$, not an absolute value.
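The distance-based definition leaves open what $\approx h$ means. The sketch below takes one possible reading: pairs are grouped into distance classes of equal width, and each class is represented by the average distance of its pairs. The function name `empirical_semivariogram` and the parameters `width` and `max_dist` are assumptions for illustration, not part of the definition or of any library API.

```python
import math
from itertools import combinations


def empirical_semivariogram(sites, values, width, max_dist):
    """Return a list of (h, gamma_hat(h), |N(h)|), one per non-empty class.

    Assumption: a pair belongs to class m when its distance d satisfies
    m * width <= d < (m + 1) * width; pairs with d >= max_dist are ignored.
    """
    n_bins = math.ceil(max_dist / width)
    sq_sums = [0.0] * n_bins    # sum of squared differences per class
    dist_sums = [0.0] * n_bins  # sum of pair distances per class
    counts = [0] * n_bins       # |N(h)| per class
    for k, l in combinations(range(len(sites)), 2):
        d = math.dist(sites[k], sites[l])
        m = int(d // width)
        if m < n_bins:
            sq_sums[m] += (values[k] - values[l]) ** 2
            dist_sums[m] += d
            counts[m] += 1
    return [
        (dist_sums[m] / counts[m], sq_sums[m] / (2 * counts[m]), counts[m])
        for m in range(n_bins)
        if counts[m] > 0
    ]


# Toy data (hypothetical): five sites on a line with Y(s) equal to the x coordinate
sites = [(float(t), 0.0) for t in range(5)]
values = [float(t) for t in range(5)]
for h, gamma, n_pairs in empirical_semivariogram(sites, values, width=1.0, max_dist=3.5):
    print(h, gamma, n_pairs)
# 1.0 0.5 4
# 2.0 2.0 3
# 3.0 4.5 2
```

Plotting the first column against the second gives the semivariogram graph described above. The third column is worth keeping, because classes with very few pairs give unstable estimates.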

Explanation

Essentially, the two definitions are the same: the first, by bin, is stated more precisely, while the second, by distance, is stated more generally. Which one you read makes no difference unless you intend to write the code yourself, and the purpose of the visualization does not differ either.

$\gamma_{ij}^{\ast} \to$ ESC

The ESC image is a visualization of the empirical variogram mapped to geography and is helpful in exploratory data analysis, for example for detecting anisotropy. If the contour lines look roughly circular, the process appears isotropic; if they look elliptical, this indicates anisotropy.
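To make the round versus elliptical reading concrete, the sketch below compares ESC values the same number of bins away from the origin along the two axes. The helper `axis_ratio` and both surfaces are hypothetical: the surfaces are generated from formulas with circular and elliptical level sets rather than computed from data, and a ratio near 1 is only a crude hint of isotropy because it ignores the off-axis bins.

```python
def axis_ratio(esc, lag):
    """Ratio of the ESC value `lag` bins up the vertical axis to the
    value `lag` bins along the horizontal axis."""
    return esc[(0, lag)] / esc[(lag, 0)]


# Toy ESC surfaces, not real data: {(i, j): value} with j >= 0
round_esc = {(i, j): float(i * i + j * j)
             for i in range(-3, 4) for j in range(0, 4)}      # circular level sets
elliptic_esc = {(i, j): float(i * i + 4 * j * j)
                for i in range(-3, 4) for j in range(0, 4)}   # elliptical level sets

print([axis_ratio(round_esc, d) for d in (1, 2, 3)])     # [1.0, 1.0, 1.0]
print([axis_ratio(elliptic_esc, d) for d in (1, 2, 3)])  # [4.0, 4.0, 4.0]
```

With real data the ratios would fluctuate from bin to bin, so this kind of check supplements looking at the contour plot and does not replace it.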

$\hat{\gamma} (h) \to$ Semivariogram

The suffix -gram in semivariogram originally means a diagram, and the name derives from a plot like the figure above. More detailed information will be covered in the post about Models of Semivariograms.


  1. Banerjee. (2015). Hierarchical Modeling and Analysis for Spatial Data (2nd Edition): p39.

  2. https://juliaearth.github.io/GeoStats.jl/stable/variography/empirical.html