Autocorrelation Function

Definition 1

Let $\left\{ Y_{t} \right\}_{t=1}^{n}$ be a stochastic process.

  1. $\mu_{t} := E ( Y_{t} )$ is called the mean function.
  2. The following $\gamma_{ t , s }$ is called the autocovariance function. $$ \gamma_{t , s} := \text{cov} ( Y_{t} , Y_{s} ) = E \left[ ( Y_{t} - \mu_{t} ) ( Y_{s} - \mu_{s} ) \right] $$
  3. The following $\rho_{ t , s }$ is called the autocorrelation function. $$ \rho_{ t , s } := \text{cor} ( Y_{t} , Y_{s} ) = {{ \gamma_{t , s} } \over { \sqrt{ \gamma_{t , t} \gamma_{s , s} } }} $$
  4. The following $\rho_{ k }$ is called the autocorrelation function for lag $k$. $$ \rho_{ k } := \text{cor} ( Y_{t} , Y_{t-k} ) = {{ \gamma_{t , t - k} } \over { \sqrt{ \gamma_{t , t} \gamma_{t-k , t-k} } }} $$
  5. The following $r_{ k }$ is called the sample autocorrelation function for lag $k$. $$ r_{ k } := {{ \sum_{t = k+1}^{n} \left( Y_{t} - \overline{Y} \right) \left( Y_{t-k} - \overline{Y} \right) } \over { \sum_{t=1}^{n} \left( Y_{t} - \overline{Y} \right)^2 }} $$
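
As a quick check of item 5, the sample autocorrelation can be computed directly from the formula and compared with base R. This is only a sketch: the helper name `sample_acf` and the simulated series `y` are made up for illustration, and `stats::acf` is called explicitly because its output includes lag 0, which is dropped here.

```r
# Sample autocorrelation r_k computed directly from the definition.
# `sample_acf` is a hypothetical helper name, not part of any package.
sample_acf <- function(y, k) {
  n <- length(y)
  ybar <- mean(y)
  num <- sum((y[(k + 1):n] - ybar) * (y[1:(n - k)] - ybar))
  den <- sum((y - ybar)^2)
  num / den
}

set.seed(1)
y <- rnorm(100)                                   # illustrative data only

r_manual  <- sapply(1:5, function(k) sample_acf(y, k))
# stats::acf returns lag 0 first, so drop the first element.
r_builtin <- as.numeric(stats::acf(y, lag.max = 5, plot = FALSE)$acf)[-1]

print(cbind(r_manual, r_builtin))
```

The two columns should agree, since base R divides both the lag-$k$ and lag-0 sums by the same $n$, which cancels in the ratio.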

Explanation

The autocorrelation function (ACF) describes the autocorrelation of time series data: how similar a series is to itself at a certain lag, even though it is a single variable. In contrast to regression analysis, which is interested in the correlation between different variables, it splits the series into $Y_{t}$ and $Y_{t-k}$ at lag $k$ and treats them like two variables.

Mathematical Explanation

Mathematically, suppose $Y_{t}$ comes from $MA(q)$, so that $\displaystyle Y_{t} = e_{t} - \sum_{k=1}^{q} \theta_{k} e_{t-k}$. When the $e_{t}$ are normally distributed white noise, $Y_{t}$ can be viewed as a sum of normal random variables. For $k \le q$, $\rho_{k}$ is a function of the coefficients $\theta_{1} , \cdots , \theta_{q}$, and $\rho_{k} = 0$ for every $k > q$, so the ACF is useful for finding the order $q$ of an $MA(q)$ model.
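
For a concrete case, take $MA(1)$ with the sign convention used here, $Y_{t} = e_{t} - \theta e_{t-1}$. Its lag-1 autocorrelation is $\rho_{1} = -\theta / (1 + \theta^{2})$ and every higher lag is zero. The sketch below checks this against `ARMAacf()` from base R; the value $\theta = 0.9$ is an arbitrary assumption, and because R writes MA terms with a plus sign, `-theta` is passed.

```r
theta <- 0.9                                  # assumed value, for illustration

# Closed form for MA(1) under Y_t = e_t - theta * e_{t-1}.
rho1_formula <- -theta / (1 + theta^2)

# R's convention is Y_t = e_t + ma1 * e_{t-1}, hence ma = -theta.
rho_theory <- ARMAacf(ma = -theta, lag.max = 3)

print(rho1_formula)
print(rho_theory)                             # named by lag: "0", "1", "2", "3"
```

The theoretical ACF cuts off after lag 1, which is the pattern one looks for in a correlogram when guessing $q$.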

The sACF $r_{k}$ is an estimate of the ACF $\rho_{k}$. If $Y_{t}$ comes from the $MA(q)$ model, then for $k > q$ and large $n$, $r_{k}$ approximately follows a normal distribution: $$ r_{k} \sim N \left( \rho_{k} , {{1} \over {n}} \left[ 1 + 2 \sum_{j=1}^{q} \rho_{j}^{2} \right] \right) $$ Here $\rho_{k} = 0$ because $k > q$. This result is used for hypothesis testing.
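
The large-sample variance of $r_{k}$ for $k > q$ is $\frac{1}{n} \left[ 1 + 2 \sum_{j=1}^{q} \rho_{j}^{2} \right]$, so the standard error is its square root. A minimal sketch of that calculation follows; the function name `se_rk`, the sample size, and the $MA(1)$ value $\theta = 0.9$ are all assumptions made for illustration.

```r
# Large-sample standard error of r_k for k > q under MA(q):
# sqrt((1 + 2 * sum(rho_j^2)) / n), where rho = c(rho_1, ..., rho_q).
# `se_rk` is a hypothetical helper name.
se_rk <- function(rho, n) sqrt((1 + 2 * sum(rho^2)) / n)

n <- 120                                # assumed sample size

se_white <- se_rk(numeric(0), n)        # q = 0: reduces to 1 / sqrt(n)

rho1   <- -0.9 / (1 + 0.9^2)            # MA(1) with assumed theta = 0.9
se_ma1 <- se_rk(rho1, n)                # wider than the white-noise case

print(c(white_noise = se_white, ma1 = se_ma1))
```

With $q = 0$ the sum is empty, which is why the white-noise bound is simply $1 / \sqrt{n}$.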

Tests

Let $\displaystyle Y_{t} = e_{t} - \sum_{k=1}^{q} \theta_{k} e_{t-k}$ and $k = 1 , \cdots , q$.

  • $H_{0}$: $MA(0) \iff \theta_{k} = 0$, namely, $Y_{t}$ does not follow a moving average model.
  • $H_{1}$: $MA(k) \iff \theta_{k} \ne 0$, namely, $Y_{t}$ has autocorrelation at lag $k$.

Interpretation

Under the null hypothesis, $\rho_{k} = \theta_{k} = 0$ for all $k$, so $q = 0$ and $\displaystyle r_{k} \sim N \left( 0 , {{1} \over {n}} \right)$, and the standard error becomes $\displaystyle {{1} \over {\sqrt{n} }}$. Therefore, to conduct a hypothesis test at the significance level $\alpha$, check whether $| r_{k} |$ exceeds the upper confidence limit $\displaystyle {{ z_{1 - \alpha/2} } \over { \sqrt{n} }}$. If it does, lag $k$ becomes a candidate for a significant lag; if not, the series is considered to have no autocorrelation at that lag.
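
The decision rule can be sketched in a few lines of base R. The series below is simulated white noise, so the null hypothesis holds at every lag by construction; the seed, sample size, and number of lags are arbitrary assumptions.

```r
set.seed(42)
n <- 200
y <- rnorm(n)                            # white noise: H0 is true at every lag

alpha <- 0.05
bound <- qnorm(1 - alpha / 2) / sqrt(n)  # z_{1 - alpha/2} / sqrt(n)

# Sample autocorrelations for lags 1..10 (lag 0 is dropped).
r <- as.numeric(stats::acf(y, lag.max = 10, plot = FALSE)$acf)[-1]

# A lag is flagged when |r_k| exceeds the bound.
significant <- abs(r) > bound
print(data.frame(lag = 1:10, r = round(r, 3), significant))
```

Because each lag is tested at level $\alpha$, an occasional flagged lag is expected even for pure white noise.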

Practice

20190723\_125305.png

The ma1.2.s data is a sample dataset from the TSA package, generated from an $MA(1)$ model. When an ARIMA model is actually fitted, the significance of a coefficient is likewise judged by whether the absolute value of its estimate exceeds twice its standard error.
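
The "twice the standard error" rule can be tried without the TSA package. The sketch below simulates a hypothetical $MA(1)$ series rather than using ma1.2.s; $\theta = 0.9$, the seed, and $n = 120$ are assumptions, and `-theta` is passed because R writes MA terms with a plus sign.

```r
# Hypothetical simulated MA(1) series; theta = 0.9 is an assumed value.
set.seed(2019)
theta <- 0.9
y <- arima.sim(model = list(ma = -theta), n = 120)

fit <- arima(y, order = c(0, 0, 1))

est <- coef(fit)["ma1"]                  # estimated MA coefficient
se  <- sqrt(diag(fit$var.coef))["ma1"]   # its standard error

# Rule of thumb: significant if |estimate| > 2 * standard error.
significant <- unname(abs(est) > 2 * se)

print(c(estimate = unname(est), se = unname(se)))
print(significant)
```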

1.png

The acf() function of the TSA package produces a correlogram for various $k$ like the one above. There is no need to calculate anything in your head: a bar that crosses the line is considered significant, and one that does not is considered not significant. The lines are drawn at a default significance level of $5 \%$.

Note that the bar at $k=6$ slightly exceeds the line, so it is statistically significant, but it is not considered to reflect actual autocorrelation. Such slight exceedances are very common in time series analysis, and for the sake of your mental health it is recommended to be flexible and accept them as they are.

2.png

Drawing the lines yourself as shown above is recommended as a way to confirm that you have properly understood hypothesis testing with the autocorrelation function. It takes just one line of R code, and by running it at least once yourself you can see, without complicated formulas, that $r_{k}$ is treated as normally distributed with standard error $\displaystyle \text{se} ( r_{k} ) = {{1} \over {\sqrt{n}}}$.

Code

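library(TSA)                      # provides the ma1.2.s data and TSA's acf()
data(ma1.2.s)
win.graph(6,4)                    # Windows-only graphics device, 6 x 4 inches
acf(ma1.2.s)                      # correlogram with the default 5% bounds
arima(ma1.2.s, order=c(0,0,1))    # fit MA(1); compare the estimate with its s.e.
# Upper bound z_{0.975}/sqrt(n), drawn by hand on the correlogram
abline(h=1.96*1/sqrt(length(ma1.2.s)),col='red')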

  1. Cryer. (2008). Time Series Analysis: With Applications in R (2nd Edition): p11, 109.