Box-Cox Transformation

Buildup

For $x > 0$, the function $g(x) := \begin{cases} \displaystyle {{ x^{\lambda} - 1 } \over { \lambda }} & , \lambda \ne 0 \\ \log x & , \lambda = 0 \end{cases}$ is called the Box-Cox transformation.
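As a minimal sketch, the definition translates directly into base R. The function name `box_cox` is an illustrative choice, not something from a package.

```r
# Box-Cox transformation g(x) for a single lambda (illustrative helper)
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))            # the transformation is defined for positive data only
  if (lambda == 0) {
    log(x)                         # the lambda = 0 branch of the definition
  } else {
    (x^lambda - 1) / lambda        # the lambda != 0 branch
  }
}

x <- c(0.5, 1, 2, 4)
box_cox(x, 0)     # log transformation
box_cox(x, 0.5)   # square-root type transformation
box_cox(x, 1)     # x - 1, a plain shift
```

Note that every member of the family maps $x = 1$ to $0$.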

$g$ was originally known as the power transformation; because it was introduced by Box and Cox, it is also called the Box-Cox transformation. Its main uses are to make data look more normally distributed or to stabilize the variance of the data, which is useful as preprocessing before analysis techniques that assume normality or require stationarity. A power transformation requires the data to be positive, but this is usually resolved by simply adding a constant that shifts the data so that its minimum becomes positive. If this method is unsatisfactory or unappealing, one may consider the Yeo-Johnson Transformation, a generalization defined for all real numbers.
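The shift mentioned above could look like the following minimal sketch. The helper name `shift_positive` and the offset `eps = 1` are illustrative assumptions; the size of the offset is a modeling choice and influences which $\lambda$ fits best.

```r
# Shift data so that its minimum becomes positive (illustrative helper)
shift_positive <- function(x, eps = 1) {
  m <- min(x)
  if (m > 0) {
    x                # already positive, leave unchanged
  } else {
    x - m + eps      # the new minimum is exactly eps
  }
}

y <- c(-3, 0, 2, 5)
shift_positive(y)    # minimum moved to 1, differences between points preserved
```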

Mathematically, since $\displaystyle \lim_{\lambda \to 0} {{ x^{\lambda} - 1 } \over { \lambda }} = \log x$, the logarithm is just the continuous extension of the power form to $\lambda = 0$, so in practice it is enough to remember the single expression $\displaystyle g(x) = {{ x^{\lambda} - 1 } \over { \lambda }}$. Also, $g$ depends on $\lambda$, so what we actually have is a family of functions $\left\{ g_{\lambda} : \lambda \in \mathbb{R} \right\}$. The shape of $g_{\lambda}$ changes flexibly with $\lambda$, and analysts must choose $\lambda$ to suit their objectives.
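The limit can be checked with a first-order expansion of the exponential; the short derivation below is added for completeness:

$$
x^{\lambda} = e^{\lambda \log x} = 1 + \lambda \log x + O \left( \lambda^{2} \right) \implies {{ x^{\lambda} - 1 } \over { \lambda }} = \log x + O ( \lambda ) \to \log x \qquad ( \lambda \to 0 )
$$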

Particularly noteworthy are $\lambda = 0$ and $\lambda = 1/2$. $g_{0} (x) = \log(x)$ is the most frequently used transformation. For $\lambda = 1/2$ we get $\displaystyle g_{1/2} (x) = {{\sqrt{x} - 1} \over {1/2}} = 2 \left( \sqrt{x} - 1 \right)$, and since $h(x) = 2 (x - 1)$ is a linear transformation, $g_{1/2} = h \left( \sqrt{x} \right)$ is essentially the square-root transformation $\sqrt{x}$. The fact that the familiar roots $\sqrt{}$ and logs $\log$ are both covered by the Box-Cox transformation is theoretically interesting, and both should be expected to appear in practice.

In the case of $\lambda = 1$, it becomes $g(x) = x - 1$, effectively an identity transformation. Subtracting the constant $1$ from every data point hardly counts as a transformation. If the suitable $\lambda$ for the given data is found to be $1$, this can be interpreted as 'no transformation necessary'.

Along the same lines, in time series analysis, computing a confidence interval for $\lambda$ in the Box-Cox transformation can be viewed as a hypothesis test of whether the variance of the data is constant. If the confidence interval of $\lambda$ includes $1$, there is no significant difference between transforming the data and leaving it as is. Not needing a transformation indicates that the variance is already constant.

Hypothesis Testing

Let’s assume we have data $\left\{ x_{t} \right\}$.

  • $H_{0}$: $\lambda = 1$, that is, data $\left\{ x_{t} \right\}$ is stationary in variance.
  • $H_{1}$: $\lambda \ne 1$, that is, data $\left\{ x_{t} \right\}$ is not stationary in variance.

It’s important to note that this diagnostic concerns only the variance. Since it says nothing about the mean, a separate test is required for that.
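To make the confidence-interval idea concrete, here is a minimal base-R sketch of a profile-likelihood interval for $\lambda$. It assumes an i.i.d. normal model with constant mean for the transformed data, which is a simplification for illustration (`BoxCox.ar()` fits an autoregressive model instead). The names `profile_loglik` and `lambda_ci` are hypothetical, and the data are simulated.

```r
# Box-Cox transformation; expm1() keeps the result accurate for lambda near 0
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else expm1(lambda * log(x)) / lambda
}

# Profile log-likelihood of lambda (i.i.d. normal model: an assumption)
profile_loglik <- function(lambda, x) {
  n <- length(x)
  z <- box_cox(x, lambda)
  sigma2 <- mean((z - mean(z))^2)                      # ML estimate of the variance
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(x))    # second term is the Jacobian
}

# Likelihood-ratio confidence interval over a grid of candidate lambdas
lambda_ci <- function(x, grid = seq(-2, 2, by = 0.01), level = 0.95) {
  ll <- sapply(grid, profile_loglik, x = x)
  keep <- ll >= max(ll) - qchisq(level, df = 1) / 2
  list(mle = grid[which.max(ll)], ci = range(grid[keep]))
}

set.seed(1)
x <- exp(rnorm(200, mean = 2, sd = 0.5))   # simulated log-normal data
res <- lambda_ci(x)
res$mle   # point estimate of lambda
res$ci    # H0: lambda = 1 is rejected when 1 lies outside this interval
```

Because the simulated data are log-normal, the interval is expected to sit near $0$ and well below $1$, which corresponds to rejecting $H_{0}$ in favor of a log-like transformation.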

Code

Practice

In R, the BoxCox.ar() function from the TSA package allows for easy hypothesis testing.

Let’s load the built-in data UKgas.

[Figures: plot of UKgas; BoxCox.ar(UKgas) output]

UKgas records the quarterly consumption of gas in the UK, and as we can see, the fluctuation becomes more severe over the years. Meanwhile, the confidence interval for $\lambda$ produced by the test includes $0$, so the Box-Cox transformation effectively reduces to a log transformation.

[Figures: plot of log(UKgas); BoxCox.ar(log(UKgas)) output]

Taking the log stabilizes the variance considerably. To verify, running the test again on the log-transformed data shows that $1$ is included in the confidence interval. This can be interpreted as saying that, at the $95\%$ confidence level, no further transformation is necessary. However, as the graphs show, the variance isn’t completely stabilized, and since $2$ is also included in the confidence interval, one could reasonably decide to transform once more. In such a borderline case the conclusion depends on the chosen confidence level, and at $95 \%$ the decision is left to the analyst.
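A rough numerical check of the visual impression, using only base R: compare the spread of the series in an early and a late window, before and after taking the log. The window boundaries below are arbitrary choices for illustration, and the standard deviation within a window also picks up trend and seasonality, so this is a crude sketch rather than a formal test.

```r
# UKgas ships with base R (datasets package): quarterly series, 1960 Q1 to 1986 Q4
early <- window(UKgas, end = c(1969, 4))     # first ten years (arbitrary window)
late  <- window(UKgas, start = c(1977, 1))   # last ten years (arbitrary window)

ratio_raw <- sd(late) / sd(early)            # growth in spread on the original scale
ratio_log <- sd(log(late)) / sd(log(early))  # growth in spread on the log scale

c(raw = ratio_raw, log = ratio_log)          # a ratio closer to 1 means more stable spread
```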

Full Code

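library(TSA)  # provides BoxCox.ar()
UKgas         # built-in quarterly UK gas consumption series
# dev.new() opens a new graphics window on any platform (win.graph() is Windows-only)
dev.new(width=3.5,height=3.5); plot(UKgas,main='UKgas')
dev.new(width=3.5,height=3.5); BoxCox.ar(UKgas)            # confidence interval for lambda, raw data
dev.new(width=3.5,height=3.5); plot(log(UKgas),main='log(UKgas)')
dev.new(width=3.5,height=3.5); BoxCox.ar(log(UKgas))       # repeat the test after the log transformation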