Cross-Correlation Function

Definition 1

Let $\left\{ X_{t} \right\}_{t=1}^{n}$ and $\left\{ Y_{t} \right\}_{t=1}^{n}$ be stochastic processes.

  1. The quantity $\rho_{k}$ defined below is called the cross-correlation function (CCF) at lag $k$. $$ \rho_{k} (X,Y) := \text{cor} \left( X_{t} , Y_{t-k} \right) = \text{cor} \left( X_{t+k} , Y_{t} \right) $$
  2. The quantity $r_{k}$ defined below is called the sample cross-correlation function (sample CCF) at lag $k$. $$ r_{k} := {{ \sum \left( X_{t} - \overline{X} \right) \left( Y_{t-k} - \overline{Y} \right) } \over { \sqrt{ \sum \left( X_{t} - \overline{X} \right)^2 } \sqrt{ \sum \left( Y_{t} - \overline{Y} \right)^2 } }} $$
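The sample CCF can be computed directly from its definition. The following is a minimal sketch, assuming NumPy is available; the function name `sample_ccf` and the toy data are illustrative and not taken from the reference. The numerator only uses the index pairs $(t, t-k)$ that exist in the sample, while the denominator uses the full-series sums of squares.

```python
import numpy as np

def sample_ccf(x, y, k):
    """Sample cross-correlation r_k, pairing x_t with y_{t-k}."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx = x - x.mean()
    dy = y - y.mean()
    # Keep only the time indices t for which both x_t and y_{t-k} exist.
    if k >= 0:
        num = np.sum(dx[k:] * dy[:n - k])
    else:
        num = np.sum(dx[:n + k] * dy[-k:])
    # The denominator uses the sums of squares over the whole series.
    den = np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2))
    return num / den

# Toy data (an assumption for illustration): y_t = x_{t+3}, so y_{t-3} = x_t.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.roll(x, -3)

r3 = sample_ccf(x, y, 3)   # close to 1, since x_t and y_{t-3} coincide
r0 = sample_ccf(x, y, 0)   # close to 0 for white noise
```

With these toy series, the sample CCF should be large at lag $k = 3$ and small at the other lags, since `x` is white noise.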

Explanation

The cross-correlation function measures the correlation between two time series at a given lag. Apart from being applied to time series with a lag, it is essentially the Pearson correlation coefficient.

The sample CCF $r_{k}$ is an estimate of the CCF $\rho_{k}$. If $\left\{ X_{t} \right\}_{t=1}^{n}$ and $\left\{ Y_{t} \right\}_{t=1}^{n}$ are stationary and independent of each other, $r_{k}$ approximately follows the normal distribution $$ r_{k} \sim N \left( 0 , {{ 1 } \over { n}} \left[ 1 + 2 \sum_{j=1}^{\infty} \rho_{j} ( X ) \rho_{j} ( Y ) \right] \right) $$ where $\rho_{j} (X)$ and $\rho_{j} (Y)$ denote the autocorrelations of $X$ and $Y$ at lag $j$. This can be used for hypothesis testing, much as in regression analysis.
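A quick simulation can make the distributional claim concrete in the simplest case. The sketch below assumes two independent Gaussian white-noise series, for which the variance of $r_{k}$ should be roughly $1/n$; the sample size, number of replications, lag, and the helper name `r_k` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps, k = 200, 5000, 1  # illustrative settings, not from the reference

def r_k(x, y, k):
    # Sample CCF at a non-negative lag k: pairs x_t with y_{t-k}.
    dx, dy = x - x.mean(), y - y.mean()
    num = np.sum(dx[k:] * dy[:len(x) - k])
    return num / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Each replication draws a fresh pair of independent white-noise series.
draws = np.array(
    [r_k(rng.normal(size=n), rng.normal(size=n), k) for _ in range(reps)]
)

empirical_var = draws.var()
theoretical_var = 1.0 / n
print(empirical_var, theoretical_var)
```

The two printed values should be close to each other. For autocorrelated series the variance can differ noticeably from $1/n$, which is why the general formula carries a correction term.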

Test

Suppose $\displaystyle Y_{t} = e_{t} + \sum_{k=0}^{m} \beta_{k} X_{t-k}$, where $e_{t}$ is an error term.

  • $H_{0}$: $\beta_{k} = 0$, meaning that $X_{t-k}$ and $Y_{t}$ are uncorrelated.
  • $H_{1}$: $\beta_{k} \ne 0$, meaning that $X_{t-k}$ and $Y_{t}$ are correlated.

Interpretation

Under the null hypothesis $\rho_{k} ( X , Y) = 0$, and when the variance above reduces to $\displaystyle {{ 1 } \over { n }}$ so that $r_{k}$ is approximately $\displaystyle N \left( 0 , {{ 1 } \over { n }} \right)$, the standard error is $\displaystyle {{1} \over {\sqrt{n}}}$. To test at significance level $\alpha$, check whether $| r_{k} |$ exceeds the critical value $\displaystyle {{z_{1- \alpha/2}} \over {\sqrt{n} }}$, the upper limit of the confidence band. If it does, lag $k$ is a candidate for a significant lag; if not, the null hypothesis of no correlation at that lag is not rejected.
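The decision rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the reference's implementation: the helper names `ccf_threshold` and `sample_ccf`, the simulated model $Y_{t} = 0.8 X_{t-2} + e_{t}$, and all numeric settings are hypothetical. Because $r_{k}$ pairs $X_{t}$ with $Y_{t-k}$, a dependence of $Y_{t}$ on $X_{t-2}$ appears at lag $k = -2$.

```python
import numpy as np
from statistics import NormalDist

def ccf_threshold(n, alpha=0.05):
    """Critical value z_{1-alpha/2} / sqrt(n) for |r_k| under the null."""
    return NormalDist().inv_cdf(1 - alpha / 2) / np.sqrt(n)

def sample_ccf(x, y, k):
    """Sample cross-correlation r_k, pairing x_t with y_{t-k}."""
    n = len(x)
    dx, dy = x - x.mean(), y - y.mean()
    if k >= 0:
        num = np.sum(dx[k:] * dy[:n - k])
    else:
        num = np.sum(dx[:n + k] * dy[-k:])
    return num / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Hypothetical data: y_t = 0.8 * x_{t-2} + e_t with white-noise x and e.
rng = np.random.default_rng(1)
n = 400
x = rng.normal(size=n)
y = rng.normal(scale=0.5, size=n)   # start from the noise term e_t
y[2:] += 0.8 * x[:-2]               # add the lagged effect of x

threshold = ccf_threshold(n)        # alpha = 0.05 by default
significant = [k for k in range(-5, 6)
               if abs(sample_ccf(x, y, k)) > threshold]
print(threshold, significant)
```

Lag $-2$ should appear among the flagged lags. Since each lag is tested separately at level $\alpha$, a few other lags may also cross the threshold by chance, which is why flagged lags are best read as candidates.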


  1. Cryer. (2008). Time Series Analysis: With Applications in R (2nd Edition): p261-262.