

Partial Autocorrelation Function

Definition 1

Let $\left\{ Y_{t} \right\}_{t=1}^{n}$ be a stochastic process. For lag $k$, let $\widehat{e_{t}}$ denote the residuals obtained by regressing $Y_{t}$ on $Y_{t-1}, \cdots, Y_{t-(k-1)}$, and let $\widehat{e_{t-k}}$ denote the residuals obtained by regressing $Y_{t-k}$ on $Y_{t-1}, \cdots, Y_{t-(k-1)}$.

  1. The following $\phi_{kk}$ is referred to as the partial autocorrelation function at lag $k$: $$ \phi_{kk} := \operatorname{cor} \left( \widehat{e_{t}} , \widehat{e_{t-k}} \right) $$
  2. The following $\widehat{\phi_{kk}}$ is referred to as the sample partial autocorrelation function at lag $k$: $$ \widehat{ \phi_{kk} } := \frac{ r_{k} - \sum_{j=1}^{k-1} \phi_{(k-1),j} r_{k-j} }{ 1 - \sum_{j=1}^{k-1} \phi_{(k-1),j} r_{j} } , \qquad \phi_{k,j} := \phi_{(k-1),j} - \phi_{kk} \phi_{(k-1),(k-j)} $$
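The recursion in Definition 1.2 can be sketched in code. The following Python function (an illustration, not part of the original post; the name `sample_pacf` is hypothetical) computes the sample autocorrelations $r_{k}$ from the data and then applies the Levinson-Durbin recursion term by term.

```python
import numpy as np

def sample_pacf(x, max_lag):
    """Sample PACF via the Levinson-Durbin recursion of Definition 1.2.
    Returns an array whose entry k (1 <= k <= max_lag) is phi_hat_kk."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    # sample autocorrelations r_0 = 1, r_1, ..., r_max_lag
    r = np.array([1.0] + [np.sum(xc[k:] * xc[:-k]) / denom
                          for k in range(1, max_lag + 1)])
    phi = np.zeros((max_lag + 1, max_lag + 1))  # phi[k, j] = phi_{k,j}
    pacf = np.zeros(max_lag + 1)
    phi[1, 1] = pacf[1] = r[1]                  # base case: phi_11 = r_1
    for k in range(2, max_lag + 1):
        # numerator: r_k - sum_j phi_{k-1,j} r_{k-j}
        num = r[k] - np.sum(phi[k - 1, 1:k] * r[k - 1:0:-1])
        # denominator: 1 - sum_j phi_{k-1,j} r_j
        den = 1.0 - np.sum(phi[k - 1, 1:k] * r[1:k])
        phi[k, k] = pacf[k] = num / den
        # update phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
    return pacf
```

Since the recursion solves the Yule-Walker equations order by order, $\widehat{\phi_{kk}}$ agrees with the last coefficient of a direct order-$k$ Yule-Walker solve.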

Description

The partial autocorrelation function captures the autocorrelation between $Y_{t}$ and $Y_{t-k}$ after eliminating the influence of the intervening values $Y_{t-1}, \cdots, Y_{t-(k-1)}$, focusing solely on the relationship between the two. Although the definition might initially appear complicated with its sudden mention of regression analysis, the concept is actually simple. Consider only $\widehat{e_{t}}$. Regressing $Y_{t}$ on $Y_{t-1}, \cdots, Y_{t-(k-1)}$ means finding the values of $\beta_{1}, \cdots, \beta_{k-1}$ that fit the equation $$ Y_{t} = \beta_{1} Y_{t-1} + \cdots + \beta_{k-1} Y_{t-(k-1)} + \widehat{e_{t}} $$ Rewriting, $$ \widehat{e_{t}} = Y_{t} - \left( \beta_{1} Y_{t-1} + \cdots + \beta_{k-1} Y_{t-(k-1)} \right) $$ This means the part of $Y_{t}$ that can be explained by $Y_{t-1}, \cdots, Y_{t-(k-1)}$ has been eliminated from $\widehat{e_{t}}$. Similarly, $\widehat{e_{t-k}}$ has the portion explainable by $Y_{t-1}, \cdots, Y_{t-(k-1)}$ removed, so calculating $\operatorname{cor} ( \widehat{e_{t}} , \widehat{e_{t-k}} )$ essentially examines the correlation between $Y_{t}$ and $Y_{t-k}$ alone, free of $Y_{t-1}, \cdots, Y_{t-(k-1)}$. This focus on only the variables of interest is why the term 'partial' autocorrelation function is fitting. [ NOTE: Despite the simplicity of the concept, calculating the sPACF was quite challenging until Levinson and Durbin proposed a method for computing $\widehat{\phi_{kk}}$ recursively. ]
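The two-regressions reading of the definition can also be computed directly. The following Python sketch (illustrative only; the name `pacf_by_regression` is hypothetical, and ordinary least squares with an intercept stands in for the regressions in the definition) correlates the two residual series for a lag $k \ge 2$.

```python
import numpy as np

def pacf_by_regression(y, k):
    """PACF at lag k (k >= 2) per the definition: regress y_t and y_{t-k}
    on the intervening lags y_{t-1}, ..., y_{t-(k-1)}, then correlate
    the two residual series."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # design matrix of intervening lags, one row per t = k, ..., n-1
    X = np.column_stack([y[k - j : n - j] for j in range(1, k)])
    X = np.column_stack([np.ones(n - k), X])  # add an intercept
    def resid(target):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        return target - X @ beta
    e_t = resid(y[k:])        # residuals of y_t
    e_tk = resid(y[:n - k])   # residuals of y_{t-k}
    return np.corrcoef(e_t, e_tk)[0, 1]
```

For a series generated by an $AR(1)$ model, the value returned for any $k \ge 2$ should be close to zero, matching the discussion below.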

Mathematical Explanation

Mathematically, if $Y_{t}$ comes from $AR(p)$, then since $\displaystyle Y_{t} = \sum_{k=1}^{p} \phi_{k} Y_{t-k} + e_{t}$, calculating the coefficient $\phi_{k}$ of $Y_{t-k}$ while excluding the other variables aids in identifying the $AR(p)$ model.

The sPACF $\widehat{\phi_{kk}}$ is an estimator of the PACF $\phi_{kk}$, and if $Y_{t}$ originates from an $AR(p)$ model, then for $k > p$ it approximately follows the normal distribution $\displaystyle N \left( 0 , \frac{1}{n} \right)$. Written as $$ \widehat{\phi_{kk}} \sim N \left( 0 , \frac{1}{n} \right) $$ this is utilized for hypothesis testing.
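This approximation is easy to check by simulation. The following Python sketch (illustrative only; `pacf_k` is a hypothetical helper computing $\widehat{\phi_{kk}}$ as the last Yule-Walker coefficient, and the $AR(1)$ coefficient $0.5$ is an assumed example value) compares the empirical standard deviation of $\widehat{\phi_{33}}$ over many simulated series with the theoretical $1 / \sqrt{n}$.

```python
import numpy as np

def pacf_k(x, k):
    """phi_hat_kk as the last coefficient of the order-k Yule-Walker solve."""
    xc = x - x.mean()
    d = np.sum(xc ** 2)
    r = np.array([1.0] + [np.sum(xc[j:] * xc[:-j]) / d
                          for j in range(1, k + 1)])
    R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
    return np.linalg.solve(R, r[1:])[-1]

rng = np.random.default_rng(1)
n, reps = 200, 2000
vals = []
for _ in range(reps):
    e = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.5 * y[t - 1] + e[t]   # AR(1), so p = 1
    vals.append(pacf_k(y, 3))           # lag 3 > p
print(np.std(vals), 1 / np.sqrt(n))     # the two should be close
```

The empirical spread at a lag beyond $p$ lands near $1/\sqrt{200} \approx 0.0707$, as the asymptotic result predicts.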

Test

Given $\displaystyle Y_{t} = \sum_{k=1}^{p} \phi_{k} Y_{t-k} + e_{t}$ and assuming $k = 1, \cdots, p$,

  • $H_{0}$: $AR(0) \iff \phi_{k} = 0$, meaning $Y_{t}$ does not follow an autoregressive model.
  • $H_{1}$: $AR(k) \iff \phi_{k} \ne 0$, meaning $Y_{t}$ has a partial autocorrelation at lag $k$.

Interpretation

Under the null hypothesis, both $p = 0$ and $\displaystyle \widehat{\phi_{kk}} \sim N \left( 0 , \frac{1}{n} \right)$ are assumed, so the standard error is $\displaystyle \frac{1}{\sqrt{n}}$. Therefore, to test at significance level $\alpha$, check whether $| \widehat{\phi_{kk}} |$ exceeds the critical value $\displaystyle \frac{ z_{1 - \alpha/2} }{ \sqrt{n} }$. If it exceeds it, the lag is considered significant; if not, it is deemed to have no partial autocorrelation.
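The test reduces to a one-line comparison. The following Python sketch (illustrative; the name `significant_lags` is hypothetical) flags the lags whose sample PACF exceeds the bound $z_{1-\alpha/2} / \sqrt{n}$.

```python
import math
from statistics import NormalDist

def significant_lags(pacf_vals, n, alpha=0.05):
    """Return the lags k (1-based) where |phi_hat_kk| exceeds the
    critical value z_{1-alpha/2} / sqrt(n)."""
    bound = NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(n)
    return [k for k, v in enumerate(pacf_vals, start=1) if abs(v) > bound]
```

For example, with $n = 100$ the bound is roughly $1.96 / 10 = 0.196$, so `significant_lags([0.5, 0.02, -0.01], 100)` returns `[1]`.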

Practice

[Figure: output of fitting an ARIMA model to the ar1.s data]

The ar1.s data in the TSA package comes from an $AR(1)$ model. When fitting an actual ARIMA model, it is also common to judge an estimate significant if its absolute value exceeds twice its standard error.

[Figure: PACF correlogram of ar1.s]

Additionally, the pacf() function in the TSA package draws a correlogram like the one above for various $k$. Without any mental calculation, a lag is significant if its bar crosses the line and insignificant otherwise. The bounds are typically computed at the $5 \%$ significance level.

[Figure: PACF correlogram with the confidence bound drawn directly]

Drawing the line directly, as shown above, is recommended to verify one's understanding of hypothesis testing with the partial autocorrelation function. Though it is just one line of R code, executing it even once makes it easy to accept that $\widehat{\phi}_{kk}$ follows a normal distribution with standard error $\displaystyle \text{se} \left( \widehat{\phi}_{kk} \right) = \frac{1}{\sqrt{n}}$.

Code

library(TSA)
data(ar1.s); win.graph(6,4); pacf(ar1.s)           # sample PACF correlogram
arima(ar1.s, order=c(1,0,0))                       # fit an AR(1) model
abline(h=1.96*1/sqrt(length(ar1.s)), col='red')    # 5% upper confidence bound

See Also


  1. Cryer. (2008). Time Series Analysis: With Applications in R (2nd Edition): p112. ↩︎