Spearman Rank Correlation Coefficient r

Definition ¹

Suppose $n$ random samples of ordered pairs $(X, Y)$ are given. For the $k$-th sample, denote the rank of its $X$ value among all $n$ values of $X$ by $X_{k}$, and the rank of its $Y$ value among all $n$ values of $Y$ by $Y_{k}$. The pairs of ranks can then be written as $\left\{ \left( X_{k} , Y_{k} \right) \right\}_{k=1}^{n}$. Assuming there are no ties, the quantity $r$ defined below is called the Spearman rank correlation coefficient. $$ r = 1 - \frac{ 6 \sum_{k=1}^{n} \left( X_{k} - Y_{k} \right)^{2} }{ n (n^{2} - 1) } $$
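A minimal Python sketch of the definition, assuming no ties. The helper names `ranks` and `spearman_r` and the sample data are hypothetical and only for illustration: the raw values are first converted to ranks, then plugged into the formula above.

```python
def ranks(values):
    """Return the 1-based ranks of `values`, assuming there are no ties."""
    # Indices of `values` listed from smallest to largest value
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_r(xs, ys):
    """Spearman's r via 1 - 6 * sum(d^2) / (n (n^2 - 1)); valid only without ties."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical sample of n = 5 ordered pairs (X, Y)
xs = [2.1, 3.4, 1.9, 5.0, 4.2]
ys = [10, 30, 20, 50, 40]
print(ranks(xs), ranks(ys))
print(spearman_r(xs, ys))
```

Here the ranks are $[2, 3, 1, 5, 4]$ and $[1, 3, 2, 5, 4]$, so $\sum_{k} \left( X_{k} - Y_{k} \right)^{2} = 2$ and $r = 1 - 12/120 = 0.9$.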

Explanation

In statistics as a whole, the most widely used correlation coefficient is the Pearson correlation coefficient; within nonparametric statistics, the best-known one is the Spearman rank correlation coefficient.

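Mean and variance of ranks: if $R$ is the rank of an observation among $n$ observations with no ties, so that $R$ takes each value $1, \dots, n$ with probability $1/n$, then $$ \begin{align*} E \left( R \right) =& {\frac{ n + 1 }{ 2 }} \\ \Var \left( R \right) =& {\frac{ n^{2} - 1 }{ 12 }} \end{align*} $$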
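These two formulas can be checked exactly for small $n$ with rational arithmetic. The sketch below assumes $R$ is uniform on $\{1, \dots, n\}$ and uses the population variance (dividing by $n$); dividing the same sum of squares by $n - 1$ instead would give $n(n+1)/12$. The function name `rank_mean_var` is hypothetical.

```python
from fractions import Fraction


def rank_mean_var(n):
    """Exact mean and variance of a rank R that is uniform on {1, ..., n}."""
    values = range(1, n + 1)
    mean = Fraction(sum(values), n)
    # Population variance: average squared deviation from the mean
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


# Compare against (n + 1) / 2 and (n^2 - 1) / 12 for several n
for n in range(1, 11):
    mean, var = rank_mean_var(n)
    assert mean == Fraction(n + 1, 2)
    assert var == Fraction(n ** 2 - 1, 12)
print("formulas agree for n = 1..10")
```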

Viewed purely algebraically, this definition may look rather peculiar. In fact it starts from the same intuitive form as the Pearson correlation coefficient, applied to the ranks as shown below. Because the ranks are always $1, \dots, n$, their sum is fixed at $n(n+1)/2$ and their sum of squares at $n(n+1)(2n+1)/6$, so the expression collapses to the unexpectedly compact form given in the definition.

$$ \begin{align*} r =& {\frac{ S_{XY} }{ \sqrt{S_{XX} S_{YY}} }} \\ S_{XY} =& \sum_{k=1}^{n} \left( X_{k} - E \left( R_{X} \right) \right) \left( Y_{k} - E \left( R_{Y} \right) \right) \\ S_{XX} =& \sum_{k=1}^{n} \left( X_{k} - E \left( R_{X} \right) \right)^{2} \\ S_{YY} =& \sum_{k=1}^{n} \left( Y_{k} - E \left( R_{Y} \right) \right)^{2} \end{align*} $$
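A short derivation of that reduction, assuming no ties so that both $X_{1}, \dots, X_{n}$ and $Y_{1}, \dots, Y_{n}$ are permutations of $1, \dots, n$ and hence $E \left( R_{X} \right) = E \left( R_{Y} \right) = (n+1)/2$. Write $D = \sum_{k=1}^{n} \left( X_{k} - Y_{k} \right)^{2}$. Since $X_{k} - Y_{k} = \left( X_{k} - E \left( R_{X} \right) \right) - \left( Y_{k} - E \left( R_{Y} \right) \right)$, expanding the square gives the second line, and the first line is $n$ times the variance of ranks.

$$ \begin{align*} S_{XX} = S_{YY} =& n \Var \left( R \right) = \frac{ n (n^{2} - 1) }{ 12 } \\ D =& S_{XX} + S_{YY} - 2 S_{XY} \implies S_{XY} = S_{XX} - \frac{ D }{ 2 } \\ r = \frac{ S_{XY} }{ \sqrt{S_{XX} S_{YY}} } =& \frac{ S_{XX} - D/2 }{ S_{XX} } = 1 - \frac{ 6 D }{ n (n^{2} - 1) } \end{align*} $$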

Hypothesis testing

The Spearman correlation coefficient always lies in $[-1, 1]$, and $r$ itself can be used directly as the test statistic in the following hypothesis test.

$H_{0}$: The rankings of $X$ and $Y$ are not associated.
$H_{1}$: The rankings of $X$ and $Y$ are associated.

As with the Mann–Whitney test, the rejection region is determined from precomputed tables. For a two-tailed test, values of $r$ close to $-1$ or $1$ are taken as evidence of association, while values close to $0$ are consistent with no association. This can be viewed as the nonparametric counterpart of the parametric $t$-test for the Pearson correlation coefficient.
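Under $H_{0}$ with no ties, every one of the $n!$ ways of pairing the $Y$ ranks with the $X$ ranks is equally likely, and critical values for small $n$ can be read off this exact null distribution. The brute-force sketch below (hypothetical function names, small $n$ only, not how any particular library computes p-values) enumerates all pairings and returns an exact two-sided p-value as a fraction.

```python
from fractions import Fraction
from itertools import permutations


def spearman_from_ranks(rx, ry):
    """Exact r = 1 - 6 * sum(d^2) / (n (n^2 - 1)) for tie-free rank vectors."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - Fraction(6 * d2, n * (n ** 2 - 1))


def exact_two_sided_p(rx, ry):
    """Exact two-sided p-value under H0 (all n! pairings equally likely).

    Enumerates every permutation of 1..n, so this is only practical for small n.
    """
    n = len(rx)
    observed = abs(spearman_from_ranks(rx, ry))
    base = list(range(1, n + 1))
    hits = total = 0
    # r depends only on the pairing, so fixing X ranks at 1..n loses nothing
    for perm in permutations(base):
        total += 1
        if abs(spearman_from_ranks(base, perm)) >= observed:
            hits += 1
    return Fraction(hits, total)


print(exact_two_sided_p([1, 2, 3, 4], [1, 2, 3, 4]))
```

With $n = 4$, only the identity pairing ($r = 1$) and the reversed pairing ($r = -1$) reach $|r| = 1$, so even perfect agreement gives $p = 2/24 = 1/12$; no two-tailed test at the 5% level can reject $H_{0}$ at this sample size, which is why exact tables matter for small $n$.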

Kendall rank correlation coefficient

After Spearman's, the best-known rank correlation coefficient is the Kendall rank correlation coefficient $\tau$, defined as $$ \tau = \frac{ 2 }{ n (n-1) } \sum_{i<j} \sgn \left( X_{i} - X_{j} \right) \sgn \left( Y_{i} - Y_{j} \right) $$ Here $\sgn$ is the sign function.
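A direct $O(n^{2})$ sketch of the formula above, with hypothetical names `sgn` and `kendall_tau`. Because $\sgn$ of a difference is unchanged when values are replaced by their ranks, it can be applied to the raw data when there are no ties; with ties, the tied pairs simply contribute $0$ (this version is often called $\tau_{a}$).

```python
def sgn(x):
    """Sign function: -1, 0, or 1."""
    return (x > 0) - (x < 0)


def kendall_tau(xs, ys):
    """Kendall's tau: 2 / (n (n - 1)) times the sum of sign products over pairs i < j."""
    n = len(xs)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 for a concordant pair, -1 for a discordant pair
            total += sgn(xs[i] - xs[j]) * sgn(ys[i] - ys[j])
    return 2 * total / (n * (n - 1))


# Hypothetical sample of n = 5 ordered pairs (X, Y)
xs = [2.1, 3.4, 1.9, 5.0, 4.2]
ys = [10, 30, 20, 50, 40]
print(kendall_tau(xs, ys))
```

Of the $10$ pairs in this sample, $9$ are concordant and $1$ (the first and third observations) is discordant, so $\tau = 2 \cdot 8 / 20 = 0.8$.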