Empirical Orthogonal Function Analysis: EOF

Definition

Principal component analysis applied to time-series data is called empirical orthogonal function (EOF) analysis.

Explanation ¹ ²

EOF analysis is a statistical technique that summarizes most of the patterns in the original data with a relatively small number of mutually uncorrelated variables that account for most of its variability. In spirit it lies somewhere between principal component analysis and the Fourier transform, and its uses, strengths, and weaknesses are similar to those of both methods.

In practice it is essentially PCA, since implementations typically rely on singular value decomposition; it is called EOF rather than PCA mainly because of a difference in field. EOF analysis is common in ocean-related work, where it is applied to multivariate time-series data spanning large oceanic regions.
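The link between PCA and SVD mentioned above can be checked numerically. The following is a minimal sketch on random stand-in data, not the SST dataset; the matrix size and the column-centering step are assumptions made for illustration (centering is the usual PCA convention). It compares the eigenvalues of the sample covariance matrix with the squared singular values of the centered data matrix.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(0)
X = randn(60, 5)                 # hypothetical data: 60 time steps, 5 locations
Xc = X .- mean(X, dims = 1)      # center each column, as PCA conventionally does

# Route 1: PCA via eigendecomposition of the sample covariance matrix
C = Symmetric(cov(Xc))           # 5×5 covariance between locations
eigvals_desc = sort(eigvals(C), rev = true)

# Route 2: SVD of the centered data matrix
S = svdvals(Xc)
var_from_svd = S .^ 2 ./ (size(Xc, 1) - 1)

# The two routes should agree up to floating-point error
agree = isapprox(eigvals_desc, var_from_svd; rtol = 1e-8)
println(agree)
```

If the data matrix is not centered, the SVD still exists, but its singular values no longer correspond to covariance eigenvalues in this way.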

From the viewpoint of Fourier analysis, EOF replaces the trigonometric functions that serve as a basis with data-driven empirical functions $f_{k} : \mathbb{R} \to \mathbb{R}$ and represents the data as linear combinations of them. This may seem unfamiliar to mathematicians used to the general (functional-analytic) notion of a function, but it helps to remember that in the geosciences a function is often understood at the high-school level, as a mapping that takes one number to another.
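One way to read this analogy is to put the two expansions side by side. In the sketch below, $x_{j}(t_{i})$ denotes the value observed at location $j$ and time step $t_{i}$; the symbols $a_{jk}, b_{jk}, \omega_{k}, c_{jk}$ are notation introduced here for illustration and do not appear in the original text.

```latex
% Fourier-type expansion: the basis functions are fixed in advance
x_{j}(t) \approx \sum_{k} \left( a_{jk} \cos (\omega_{k} t) + b_{jk} \sin (\omega_{k} t) \right)

% EOF-type expansion: the basis functions f_k are estimated from the data
x_{j}(t_{i}) = \sum_{k} c_{jk} f_{k}(t_{i})
```

In the first line only the coefficients depend on the data, whereas in the second both the coefficients $c_{jk}$ and the functions $f_{k}$ are determined by the data.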

The design matrix $X \in \mathbb{R}^{m \times n}$ is viewed as a panel dataset covering $n$ locations and $m$ time steps $i = 1 , \cdots , m$. The singular value decomposition $X = U \Sigma V^{T}$ of this matrix yields the matrices $U, \Sigma, V$, and the $j$-th column vector of $U$ is the $j$-th EOF. As in PCA, the $j$-th singular value is interpreted as a measure of the importance of the $j$-th EOF.
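The decomposition described above can be sketched in Julia as follows. The data are synthetic and all sizes, names (`amp`, `tilt`, `share`), and the noise level are assumptions chosen for illustration. The "importance" of each EOF is quantified here as its share of the total squared singular values, which is one common convention rather than something the text prescribes.

```julia
using LinearAlgebra, Random

Random.seed!(1)
m, n = 66, 150                        # hypothetical: 66 time steps, 150 locations
t = collect(range(0, 11π, length = m))

# Synthetic panel: a shared oscillation, a weaker trend, and small noise
season = sin.(t)                      # temporal pattern shared by all locations
trend  = t ./ maximum(t)              # slow linear drift over the record
amp    = 1 .+ rand(n)                 # per-location oscillation amplitude
tilt   = randn(n)                     # per-location trend strength
X = season * amp' + 0.1 * (trend * tilt') + 0.01 * randn(m, n)

F = svd(X)                            # X ≈ F.U * Diagonal(F.S) * F.Vt
eof1 = F.U[:, 1]                      # first EOF: one value per time step

# Share of the total squared singular values carried by each EOF
share = F.S .^ 2 ./ sum(F.S .^ 2)
println(round.(share[1:3], digits = 4))
```

With rows as time steps, each column of `F.U` has length `m`, so an EOF in this layout is a function of time, and the matching column of `F.V` holds one weight per location.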

Example

This dataset is sea surface temperature (SST) data collected at 150 locations in the East Sea over five and a half years, obtainable from 하이컴.

[Figure: the original SST data]

If the original data look like the figure above, almost the entire dataset can be explained by the first two EOFs, as shown below.

[Figure: the first two EOFs]

To obtain the $U$ that produces such a result, simply apply singular value decomposition to $X \in \mathbb{R}^{m \times 150}$. The following code is written in Julia, but the language is not important.

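using LinearAlgebra, Plots

# `data` is assumed to hold the SST table: one row per time step, one column per location
U, S, V = svd(Matrix(data))

# Plot the first two EOFs (the first two columns of U) as two subplots
plot(
    plot(U[:, 1]),
    plot(U[:, 2]),
)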
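To check a statement such as "the first two EOFs explain almost the entire dataset", one option is to rebuild the matrix from only the leading singular triplets and measure what is lost. The sketch below uses synthetic stand-in data, since the SST table is not reproduced here; the helper `rank_k_approx`, the panel size, and the noise level are hypothetical choices for illustration.

```julia
using LinearAlgebra, Random

# Rank-k reconstruction of X from its leading k singular triplets
function rank_k_approx(X::AbstractMatrix, k::Integer)
    F = svd(X)
    return F.U[:, 1:k] * Diagonal(F.S[1:k]) * F.Vt[1:k, :]
end

Random.seed!(2)
m, n = 66, 150                              # hypothetical panel size
t = collect(range(0, 11π, length = m))

# Stand-in data built from two temporal patterns plus small noise
X = sin.(t) * rand(n)' + cos.(t) * rand(n)' + 0.01 * randn(m, n)

X2 = rank_k_approx(X, 2)
rel_err = norm(X - X2) / norm(X)            # relative Frobenius-norm error
println(rel_err)
```

A small `rel_err` means the first two EOFs capture nearly all of the matrix; on real data the same check would be run with `Matrix(data)` in place of the synthetic `X`.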