How to Analyze Time Series with ARIMA Models in R
Practice
Let’s load the built-in dataset WWWusage in R and plot it to see what it looks like.
WWWusage is a time series of the number of users connected to the Internet through a server, recorded minute by minute a long time ago. To understand its trend, time series analysis is necessary. Among time series models, the most representative is the ARIMA model. However, even within the ARIMA family, there are various ways to find an appropriate model.
Fortunately, in R, the auto.arima() function from the forecast package lets us obtain a plausible model quickly and easily. Calling it does not amount to a proper analysis, but for simple cases this function alone can take care of more than half of the work. auto.arima() searches for the best model according to a model selection criterion and a selection procedure, much as in variable selection.
Of course, this does not mean the result is the best or the only model, and analysts must keep looking for the model that fits the data most accurately. auto.arima() is useful enough that even skilled analysts use it to get a rough first result, but relying on auto.arima() alone should be avoided.
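As a rough illustration of why the automatic result should be treated as a starting point, the sketch below compares the model chosen by auto.arima() with one hand-picked alternative. The order c(3, 1, 0) is only an assumed candidate for illustration, not a recommendation, and the AICc values are comparable only when both models use the same amount of differencing.

```r
library(forecast)

# Model chosen automatically (default criterion: AICc, default search: stepwise)
fit_auto <- auto.arima(WWWusage)

# A more exhaustive (and slower) search over the candidate orders
fit_full <- auto.arima(WWWusage, stepwise = FALSE, approximation = FALSE)

# A hand-picked candidate; the order here is an assumption for illustration
fit_alt <- Arima(WWWusage, order = c(3, 1, 0))

# Compare the candidates on AICc (lower is better)
aicc <- c(auto = fit_auto$aicc, full = fit_full$aicc, alt = fit_alt$aicc)
print(aicc)
```

If a hand-picked model scores close to the automatic one, residual diagnostics and domain knowledge are reasonable tie-breakers.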
Precautions
As a caution, auto.arima() handles stationarity itself: the $d$ in the $ARIMA(p,d,q)$ model it returns is the number of differences required to make the data stationary. This means that if the data needs a transformation, you must pass in the transformed data, but if the data requires differencing, you should not difference it before passing it to auto.arima(). Differencing the data $d$ times yourself and then fitting would instead give the corresponding $ARMA(p,q)$ model of the differenced series. If this is hard to follow, just remember two simple rules: (1) If a transformation is needed, apply it before passing the data to auto.arima(). (2) If differencing is needed, do not apply it before passing the data to auto.arima().
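The two rules can be sketched as follows. The log transformation is used purely as an example of a transformation; whether WWWusage actually needs one is not claimed here.

```r
library(forecast)

# Rule (1): apply any transformation yourself, before fitting.
# log() is only an illustrative choice of transformation.
y <- log(WWWusage)

# Rule (2): pass the undifferenced series; auto.arima() chooses d on its own.
fit <- auto.arima(y)

# arimaorder() returns the fitted orders as a named vector (p, d, q)
ord <- arimaorder(fit)
print(ord)

# ndiffs() reports how many differences a unit-root test suggests for y
print(ndiffs(y))
```

The "d" entry of the printed orders is the number of differences the function applied internally, which is why the series should not be differenced by hand beforehand.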
Verification
Let’s check how well the fitted model matches the actual data. The black solid line is the actual data, and the red solid line is the fitted values. Although the data is simple, you can see that the model explains it well.
Code
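install.packages("forecast")  # only needed once
library(forecast)

# Print and plot the raw series (dev.new() is used instead of the Windows-only win.graph())
dev.new(width=6, height=3); WWWusage; plot(WWWusage,main='WWWusage')

# Fit an ARIMA model automatically and inspect it
out<-auto.arima(WWWusage); out
summary(out)

# Redraw the series and overlay the fitted values in red
dev.new(width=6, height=3); plot(WWWusage,main='WWWusage')
lines(fitted(out),col='red')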
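
The plot above is a visual check only. As a possible complement, the sketch below adds numerical checks: training-set error measures and a Ljung-Box test on the residuals. The lag of 10 is an assumed, conventional choice rather than something prescribed by the data.

```r
library(forecast)

out <- auto.arima(WWWusage)

# Training-set error measures (ME, RMSE, MAE, ...)
acc <- accuracy(out)
print(acc)

# Ljung-Box test: are the residuals compatible with white noise?
# fitdf is set to the number of estimated AR and MA coefficients.
ord <- arimaorder(out)
lb <- Box.test(residuals(out), lag = 10, type = "Ljung-Box",
               fitdf = ord[["p"]] + ord[["q"]])
print(lb)
```

A large p-value from the test gives no evidence of leftover autocorrelation in the residuals, while a small one suggests the model is missing some structure.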