Cross-Correlation Function

Definition [1]

Let $\left\{ X_{t} \right\}_{t=1}^{n}$ and $\left\{ Y_{t} \right\}_{t=1}^{n}$ be stochastic processes.

  1. The cross-correlation function (CCF) at lag $k$ is defined as $$\rho_{k}(X,Y) := \text{cor}\left( X_{t}, Y_{t-k} \right) = \text{cor}\left( X_{t+k}, Y_{t} \right)$$ Here the two processes are assumed to be jointly stationary, so the correlation depends only on the lag $k$ and not on $t$.
  2. The sample cross-correlation function at lag $k$ is defined as $$r_{k} := \frac{\sum \left( X_{t} - \overline{X} \right)\left( Y_{t-k} - \overline{Y} \right)}{\sqrt{\sum \left( X_{t} - \overline{X} \right)^{2}} \sqrt{\sum \left( Y_{t} - \overline{Y} \right)^{2}}}$$ where the sum in the numerator runs over every $t$ for which both $X_{t}$ and $Y_{t-k}$ are observed, and the sums in the denominator run over the full series.
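A minimal Python sketch of the sample CCF, computed directly from the definition. The function name `sample_ccf` is hypothetical, indices are 0-based, and $k$ may be negative; the numerator pairs $X_{t}$ with $Y_{t-k}$ wherever both exist, while the denominator uses sums of squares over the full series.

```python
from math import sqrt
from statistics import fmean


def sample_ccf(x, y, k):
    """Sample cross-correlation r_k of two equal-length series at lag k.

    The numerator pairs X_t with Y_{t-k} over every index t at which both
    exist; the denominator uses sums of squares over the full series.
    Indices are 0-based, and k may be negative.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    x_bar, y_bar = fmean(x), fmean(y)
    # Cross-products of deviations for the overlapping pairs (X_t, Y_{t-k}).
    num = sum(
        (x[t] - x_bar) * (y[t - k] - y_bar)
        for t in range(n)
        if 0 <= t - k < n
    )
    # Sums of squared deviations over the whole of each series.
    sxx = sum((v - x_bar) ** 2 for v in x)
    syy = sum((v - y_bar) ** 2 for v in y)
    return num / sqrt(sxx * syy)


print(sample_ccf([1, 2, 3, 4], [4, 3, 2, 1], 0))  # -1.0
print(sample_ccf([1, 2, 3, 4], [1, 2, 3, 4], 1))  # 0.25
```

The first call gives $-1$ because the second series decreases exactly linearly as the first increases. Under this convention, a positive lag $k$ looks at how $X$ relates to earlier values of $Y$; R's `ccf(x, y)`, as far as its documentation describes, uses the same orientation (lag $k$ estimates the correlation between `x[t+k]` and `y[t]`).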

Explanation

The cross-correlation function measures the correlation between two time series as a function of the lag between them. Apart from being applied to lagged pairs of observations from time series, it is essentially the Pearson correlation coefficient.

The sample CCF $r_{k}$ is an estimator of the CCF $\rho_{k}$. If $\left\{ X_{t} \right\}_{t=1}^{n}$ and $\left\{ Y_{t} \right\}_{t=1}^{n}$ are independent stationary processes, then for large $n$, $r_{k}$ is approximately normally distributed: $$r_{k} \sim N\left( 0, \frac{1}{n}\left[ 1 + 2 \sum_{j=1}^{\infty} \rho_{j}(X)\rho_{j}(Y) \right] \right)$$ where $\rho_{j}(X)$ and $\rho_{j}(Y)$ are the autocorrelation functions of $\left\{ X_{t} \right\}$ and $\left\{ Y_{t} \right\}$ at lag $j$. This result can be used for hypothesis testing, much as in regression analysis.
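Bartlett's approximation for independent stationary series, $\text{Var}(r_{k}) \approx \frac{1}{n}\left[ 1 + 2 \sum_{j \ge 1} \rho_{j}(X)\rho_{j}(Y) \right]$ (the form given in Cryer), can be evaluated numerically once the two autocorrelation functions are known. The sketch below truncates the infinite sum and, as a hypothetical example, uses two independent AR(1) series with $\phi = 0.5$, whose autocorrelations are $\rho_{j} = \phi^{j}$. In that case the sum has the closed form $\frac{\phi_X \phi_Y}{1 - \phi_X \phi_Y}$, so the variance is $\frac{1}{n} \cdot \frac{1 + \phi_X \phi_Y}{1 - \phi_X \phi_Y}$.

```python
from math import sqrt


def bartlett_ccf_variance(acf_x, acf_y, n):
    """Approximate Var(r_k) for independent, stationary X and Y.

    acf_x[j - 1] and acf_y[j - 1] hold rho_j(X) and rho_j(Y) for j = 1, 2, ...
    The infinite sum is truncated at the shorter of the two lists.
    """
    cross = sum(a * b for a, b in zip(acf_x, acf_y))
    return (1 + 2 * cross) / n


# Hypothetical example: two independent AR(1) series with phi = 0.5,
# whose autocorrelation at lag j is phi ** j.
phi_x = phi_y = 0.5
acf_x = [phi_x ** j for j in range(1, 51)]
acf_y = [phi_y ** j for j in range(1, 51)]
n = 100

var_r = bartlett_ccf_variance(acf_x, acf_y, n)
closed_form = (1 + phi_x * phi_y) / (1 - phi_x * phi_y) / n
# Bartlett standard error vs. the naive 1/sqrt(n) that ignores autocorrelation.
print(sqrt(var_r), sqrt(1 / n))
```

Here the standard error is about $1.29/\sqrt{n}$ rather than $1/\sqrt{n}$, which shows why the simple bound is only valid when the autocorrelation products vanish, for instance when one of the series is white noise.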

Test

Suppose $Y_{t} = e_{t} + \sum_{j=0}^{m} \beta_{j} X_{t-j}$, where $e_{t}$ is an error term. For a given lag $k$, the hypotheses are as follows.

  • $H_{0}$: $\beta_{k} = 0$, that is, $X_{t-k}$ and $Y_{t}$ are uncorrelated.
  • $H_{1}$: $\beta_{k} \ne 0$, that is, $X_{t-k}$ and $Y_{t}$ are correlated.
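The model can be simulated to see where a nonzero coefficient shows up in the sample CCF. This Python sketch assumes, purely for illustration, that $X_{t}$ and $e_{t}$ are independent $N(0,1)$ white noise and that only $\beta_{2} = 0.8$ is nonzero; `simulate` and `sample_ccf` are hypothetical helper names. Because $\rho_{k}(X,Y) = \text{cor}(X_{t}, Y_{t-k})$, the dependence of $Y_{t}$ on $X_{t-2}$ appears at lag $k = -2$, i.e. as $\text{cor}(X_{t}, Y_{t+2})$, with population value $\beta_{2}\sigma_X/\sigma_Y = 0.8/\sqrt{1.64} \approx 0.62$.

```python
import random
from math import sqrt
from statistics import fmean


def sample_ccf(x, y, k):
    """r_k with X_t paired against Y_{t-k}; denominators use the full series."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    num = sum((x[t] - x_bar) * (y[t - k] - y_bar) for t in range(n) if 0 <= t - k < n)
    sxx = sum((v - x_bar) ** 2 for v in x)
    syy = sum((v - y_bar) ** 2 for v in y)
    return num / sqrt(sxx * syy)


def simulate(n, betas, seed=0):
    """Hypothetical simulator: X_t and e_t are N(0, 1) white noise,
    and Y_t = e_t + sum_j betas[j] * X_{t-j} for j = 0..m."""
    rng = random.Random(seed)
    m = len(betas) - 1
    # m extra leading draws so that Y_0 already has X_{-1}, ..., X_{-m} available.
    x_full = [rng.gauss(0, 1) for _ in range(n + m)]
    y = [
        rng.gauss(0, 1) + sum(b * x_full[t + m - j] for j, b in enumerate(betas))
        for t in range(n)
    ]
    return x_full[m:], y


# Only beta_2 is nonzero, so Y_t depends on X_{t-2} (illustrative values).
x, y = simulate(500, [0.0, 0.0, 0.8], seed=1)
r = {k: sample_ccf(x, y, k) for k in range(-5, 6)}
for k, rk in r.items():
    print(f"lag {k:+d}: r = {rk:+.3f}")
```

The spike at lag $-2$ should sit near $0.62$, while the other lags should stay within a few multiples of $1/\sqrt{500} \approx 0.045$ of zero.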

Interpretation

Under the null hypothesis, $\rho_{k}(X,Y) = 0$; if, in addition, at least one of the two series is white noise, $r_{k}$ is approximately $N\left( 0, \frac{1}{n} \right)$, so its standard error is $\frac{1}{\sqrt{n}}$. To test at significance level $\alpha$, check whether $| r_{k} |$ exceeds the upper confidence limit $\frac{z_{1-\alpha/2}}{\sqrt{n}}$. If it does, lag $k$ is a candidate significant lag; if not, $X$ and $Y$ are considered uncorrelated at that lag.
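The decision rule can be written as a small Python function that flags every lag whose $|r_{k}|$ crosses $z_{1-\alpha/2}/\sqrt{n}$. It uses `statistics.NormalDist().inv_cdf` from the standard library (Python 3.8+) for the normal quantile; the function name `significant_lags` and the CCF values in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def significant_lags(r, n, alpha=0.05):
    """Return the lags k whose |r_k| exceeds z_{1-alpha/2} / sqrt(n).

    r maps each lag k to its sample CCF value r_k. The 1/sqrt(n) standard
    error assumes Var(r_k) is approximately 1/n, i.e. the series are
    uncorrelated and at least one of them is white noise.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    bound = z / sqrt(n)
    return sorted(k for k, rk in r.items() if abs(rk) > bound)


# Hypothetical sample CCF values from series of length n = 100.
r = {-2: 0.45, -1: 0.12, 0: -0.05, 1: 0.21, 2: -0.19}
print(significant_lags(r, n=100))  # [-2, 1]
```

With $n = 100$ and $\alpha = 0.05$ the bound is about $0.196$, so lag $2$ ($|r_{2}| = 0.19$) falls just inside it. Lags that pass are only candidates: even when the series are uncorrelated, about a fraction $\alpha$ of the lags examined is expected to cross the bound by chance.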



  1. Cryer. (2008). Time Series Analysis: With Applications in R (2nd Edition), pp. 261–262.