Intervention Analysis

Buildup

[Figure: time series of fine dust concentration in Seoul, 2015]

The graph above shows the actual time series of fine dust concentration in Seoul in 2015. What immediately stands out is a single day at roughly the 50th observation, that is, toward the end of February, when the concentration exceeded 500.

Anyone with some experience handling data might first suspect a measurement error, but surprisingly, it actually happened. There is even a paper studying that day (the term ‘fine dust’ was not yet widely used at the time, so its title says ‘yellow dust’ instead), and documents mentioning that day are still easy to find.

The question is how to handle this, not as an atmospheric scientist but as a statistical analyst. Because time series data is inherently ordered, hastily labeling the observation an outlier and dropping it does not make the day disappear; it leaves a missing value in the sequence instead. On the other hand, keeping such a massive outlier and fitting an ARIMA model to it could seriously distort the whole analysis.
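To get a feel for how much one extreme day can distort things, here is a minimal sketch on simulated data (not the Seoul measurements). It assumes an AR(1) series with made-up parameters, adds a single spike of 500 at index 50, and compares the sample lag-1 autocorrelation before and after. The helper names `simulate_ar1` and `lag1_autocorr` are hypothetical.

```python
import random


def simulate_ar1(n, phi, sigma, mean, seed):
    """Simulate an AR(1) series fluctuating around `mean` (illustrative parameters)."""
    rng = random.Random(seed)
    x = 0.0  # deviation from the mean, started at zero
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        series.append(mean + x)
    return series


def lag1_autocorr(y):
    """Sample lag-1 autocorrelation of the sequence y."""
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t - 1] - ybar) for t in range(1, n))
    den = sum((v - ybar) ** 2 for v in y)
    return num / den


# Assumed setup: 365 daily values, true AR coefficient 0.6.
clean = simulate_ar1(n=365, phi=0.6, sigma=10.0, mean=50.0, seed=0)

# Same series with one extreme day added at index 50.
spiked = list(clean)
spiked[50] += 500.0

r_clean = lag1_autocorr(clean)
r_spiked = lag1_autocorr(spiked)

print(f"lag-1 autocorrelation without the spike: {r_clean:.3f}")
print(f"lag-1 autocorrelation with the spike:    {r_spiked:.3f}")
```

In this setup the spike inflates the sample variance far more than the lag-1 covariance, so the estimated autocorrelation drops well below the value obtained from the clean series. Since ARIMA identification leans on exactly these autocorrelations, a single point can mislead the model choice.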

Definition 1

One way to solve this issue is intervention analysis, which is expressed by the following equation.

$$
Y_{t} = m_{t} + N_{t}
$$

Here, $N_{t}$ is an ARIMA process, and $m_{t}$ is a term that adjusts the mean, that is, shifts the level of the data.

Description

Taking the fine dust data as an example, it would suffice to define $m_{t}$ as

$$
m_{t} := \begin{cases} 500 & , t = t_{0} \\ 0 & , t \ne t_{0} \end{cases}
$$

so that about 500 is added only at the time point $t_{0}$. Since $m_{t}$ is $0$ whenever $t \ne t_{0}$, the model reduces to $Y_{t} = N_{t}$ at every other time point, where it is just an ordinary ARIMA process. In this sense, adding $m_{t}$ to the usual ARIMA analysis can fairly be called ‘intervention analysis.’ How exactly $m_{t}$ is defined varies with the data and the analysis, and its mathematical form changes accordingly.
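In practice the size of the jump is usually estimated rather than fixed at 500 by hand. The following is a minimal sketch under stated assumptions, not the procedure from the cited textbook: the data are simulated, $N_{t}$ is taken to be an AR(1) process with a nonzero mean (a simple special case of ARIMA), the time $t_{0}$ is assumed known, and the parameters are found by conditional least squares with a grid search over the AR coefficient. The function names and the symbol `omega` for the pulse size are hypothetical.

```python
import random


def simulate_with_pulse(n, phi, sigma, mean, t0, omega, seed):
    """Simulate Y_t = m_t + N_t with AR(1) noise and a pulse of size omega at t0."""
    rng = random.Random(seed)
    x = 0.0
    y = []
    for t in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        m_t = omega if t == t0 else 0.0   # the intervention term
        y.append(m_t + mean + x)          # N_t = mean + x
    return y


def fit_for_phi(y, t0, phi):
    """For fixed phi, regress Y_t - phi*Y_{t-1} on an intercept and
    x_t = P_t - phi*P_{t-1}, where P_t is 1 at t0 and 0 elsewhere."""
    z, x = [], []
    for t in range(1, len(y)):
        p_t = 1.0 if t == t0 else 0.0
        p_prev = 1.0 if t - 1 == t0 else 0.0
        z.append(y[t] - phi * y[t - 1])
        x.append(p_t - phi * p_prev)
    n = len(z)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    omega = sxz / sxx                     # least-squares slope = pulse size
    intercept = zbar - omega * xbar       # equals mean * (1 - phi)
    sse = sum((zi - intercept - omega * xi) ** 2 for xi, zi in zip(x, z))
    return sse, omega, intercept


def fit_pulse_ar1(y, t0):
    """Grid search over phi, keeping the fit with the smallest residual sum of squares."""
    best = None
    for k in range(-95, 96):
        phi = k / 100
        sse, omega, intercept = fit_for_phi(y, t0, phi)
        if best is None or sse < best[0]:
            best = (sse, phi, omega, intercept / (1 - phi))
    _, phi_hat, omega_hat, mean_hat = best
    return phi_hat, omega_hat, mean_hat


# Assumed setup: 365 days, pulse of 500 at index 50.
t0 = 50
y = simulate_with_pulse(n=365, phi=0.6, sigma=10.0, mean=50.0,
                        t0=t0, omega=500.0, seed=1)
phi_hat, omega_hat, mean_hat = fit_pulse_ar1(y, t0)

# Subtracting the estimated m_t leaves a series that behaves like N_t.
adjusted = [v - (omega_hat if t == t0 else 0.0) for t, v in enumerate(y)]

print(f"phi_hat={phi_hat:.2f}, omega_hat={omega_hat:.1f}, mean_hat={mean_hat:.1f}")
```

The grid search keeps the example dependency-free; a real analysis would more likely pass the pulse indicator as a regressor to a time series library and estimate everything jointly by maximum likelihood.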


  1. Cryer. (2008). Time Series Analysis: With Applications in R (2nd Edition): p. 250.