

Mann-Whitney U Test

Hypothesis testing¹

Let two continuous random variables $X, Y$ be given, and suppose a total of $\left( n_{1} + n_{2} \right)$ random samples are drawn from the two distributions as $X_{1} , \cdots , X_{n_{1}} \overset{\text{iid}}{\sim} X$ and $Y_{1} , \cdots , Y_{n_{2}} \overset{\text{iid}}{\sim} Y$. For convenience, call $X$ the first population and $Y$ the second population.

Let the cumulative distribution functions of $X$ and $Y$ be $F_{X}$ and $F_{Y}$ respectively, and assume they belong to a common location family; in other words, for some $\delta$, $F_{X} (z) = F_{Y} (z + \delta)$. The following hypothesis test concerning $\delta$ is called the Mann-Whitney $U$ test.

  • $H_{0} : \delta = 0$, the distributions of the two populations are equal.
  • $H_{1} : \delta \ne 0$, the distributions of the two populations are not equal.

Test statistics

For this, define the following test statistics.

  1. Let $T_{1}$ be the sum of the ranks of $\left\{ X_{i} \right\}_{i=1}^{n_{1}}$ in the combined sample, and $T_{2}$ the sum of the ranks of $\left\{ Y_{j} \right\}_{j=1}^{n_{2}}$ in the combined sample. $T_{1}$ and $T_{2}$ are called the Wilcoxon statistics.
  2. Let $U_{1}$ be the number of pairs $\left( X_{i} , Y_{j} \right)$ with $X_{i} > Y_{j}$, and $U_{2}$ the number of pairs with $Y_{j} > X_{i}$. $U_{1}$ and $U_{2}$ are called the Mann-Whitney statistics. Pairs with $X_{i} = Y_{j}$ are excluded; since the variables are continuous, $P \left( X_{i} = Y_{j} \right) = 0$, so such ties almost surely do not occur.

The test statistic $U_{1}$ is compared with the largest integer $u$ satisfying the following for significance level $\alpha$; if $U_{1} \le u$, then $H_{0}$ is rejected. $$ P \left( U_{1} \le u \right) \le \alpha / 2 $$ How this critical value $u$ of the rejection region is obtained is explained separately below, together with $U_{1}$.
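To make the definitions concrete, here is a minimal Python sketch that computes all four statistics directly from their definitions. The sample values are hypothetical, chosen purely for illustration.

```python
def mann_whitney_statistics(xs, ys):
    """Compute the Mann-Whitney (U1, U2) and Wilcoxon (T1, T2) statistics."""
    # U1: number of pairs with X_i > Y_j; U2: number of pairs with Y_j > X_i.
    u1 = sum(1 for x in xs for y in ys if x > y)
    u2 = sum(1 for x in xs for y in ys if y > x)

    # Rank of each observation in the combined sample (1-based, assuming no ties).
    combined = sorted(xs + ys)
    rank = {v: r for r, v in enumerate(combined, start=1)}
    t1 = sum(rank[x] for x in xs)
    t2 = sum(rank[y] for y in ys)
    return u1, u2, t1, t2

# Hypothetical samples (n1 = 4, n2 = 3).
xs = [1.2, 3.4, 2.2, 5.0]
ys = [2.9, 0.7, 4.1]
print(mann_whitney_statistics(xs, ys))  # (7, 5, 17, 11)
```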

Summary

  • [1] Sum of $U$ and sum of $T$: The sum of the two Mann-Whitney statistics $U_{1}$ and $U_{2}$ is as follows. $$ U_{1} + U_{2} = n_{1} n_{2} $$ The sum of the two Wilcoxon statistics $T_{1}$ and $T_{2}$ is as follows. $$ T_{1} + T_{2} = {{\left( n_{1} + n_{2} \right) \left( n_{1} + n_{2} + 1 \right)} \over 2} $$
  • [2] Relationship between $U$ and $T$: The relationship between $U_{1}$ and $T_{1}$ for the first population is as follows. $$ U_{1} = T_{1} - {{n_{1} \left( n_{1} + 1 \right)} \over 2} $$ The relationship between $U_{2}$ and $T_{2}$ for the second population is as follows. $$ U_{2} = T_{2} - {{n_{2} \left( n_{2} + 1 \right)} \over 2} $$ Both [1] and [2] are checked numerically in the sketch after this list.
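Continuing the sketch above (same function and samples), the identities can be verified directly:

```python
n1, n2 = len(xs), len(ys)
u1, u2, t1, t2 = mann_whitney_statistics(xs, ys)

assert u1 + u2 == n1 * n2                           # [1] sum of U: 7 + 5 = 12
assert t1 + t2 == (n1 + n2) * (n1 + n2 + 1) // 2    # [1] sum of T: 17 + 11 = 28
assert u1 == t1 - n1 * (n1 + 1) // 2                # [2] 7 = 17 - 10
assert u2 == t2 - n2 * (n2 + 1) // 2                # [2] 5 = 11 - 6
```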

Explanation

The Mann-Whitney $U$ test, also widely known as the Wilcoxon rank-sum test, is a nonparametric test used when two populations of unknown distribution are given and one wants to know whether those distributions are the same or different.

Note that, as mentioned in the assumptions of the hypothesis test, this “difference” refers not to the functional form itself but to location; for example, in the case of a normal distribution, it focuses only on differences in the population mean and does not cover other aspects such as the population variance.
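As a quick illustration of this point, here is a simulation sketch (assuming SciPy is available; the seed, sample sizes, and parameters are arbitrary choices for the example) comparing a pure scale difference with a pure location shift:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=50)

# Same location, different variance: not what the test targets.
y_scale = rng.normal(loc=0.0, scale=3.0, size=50)
# Shifted location, same variance: exactly what the test targets.
y_shift = rng.normal(loc=1.0, scale=1.0, size=50)

print(mannwhitneyu(x, y_scale, alternative="two-sided").pvalue)  # usually not small
print(mannwhitneyu(x, y_shift, alternative="two-sided").pvalue)  # usually very small
```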

Meanwhile, as Proposition [2] shows, $U$ and $T$ differ only by a constant and are effectively the same statistic; the two names persist for historical and practical reasons.

Wilcoxon rank-sum statistic

In practice, $U_{1}$ is calculated via $T_{1}$. Counting all pairs with $X_{i} > Y_{j}$ directly requires comparing each of the $n_{1}$ samples against each of the $n_{2}$ samples, giving a time complexity of $O \left( n_{1} n_{2} \right)$. Even when the sample sizes are small enough that speed is not a concern, this is more cumbersome than simply summing ranks, and regardless of the theory, $T_{1}$ gives consistent results even when exceptional ties $X_{i} = Y_{j}$ do occur.
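A sketch of this approach, assuming no ties: a single sort of the combined sample yields the ranks in $O \left( \left( n_{1} + n_{2} \right) \log \left( n_{1} + n_{2} \right) \right)$ time, and $U_{1}$ then follows from Proposition [2].

```python
def u1_via_ranks(xs, ys):
    """Compute U1 through the rank sum T1: one sort instead of a pairwise loop."""
    n1 = len(xs)
    # Tag each value with its sample of origin, then sort the combined sample once.
    labeled = sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])
    # T1 is the sum of the (1-based) combined ranks of the first sample.
    t1 = sum(rank for rank, (_, label) in enumerate(labeled, start=1) if label == 0)
    # Proposition [2]: U1 = T1 - n1(n1 + 1)/2.
    return t1 - n1 * (n1 + 1) // 2

print(u1_via_ranks([1.2, 3.4, 2.2, 5.0], [2.9, 0.7, 4.1]))  # 7, matching the direct count
```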

Mann-Whitney $U$ statistic

However, apart from how $U_{1}$ is computed in practice, the hypothesis test itself is based on the definition of $U_{1}$. If the null hypothesis is true, every ordering of the $\left( n_{1} + n_{2} \right)$ samples is equally likely, so the distribution of $U_{1}$ ultimately comes down to frequencies, the most primitive concept of probability.

When the test statistic was introduced, the critical value $u$ was said to be the largest integer satisfying $P \left( U_{1} \le u \right) \le \alpha / 2$; in fact, the left-hand side is computed, for some function $h$, as follows. $$ P \left( U_{1} \le u \right) = {\frac{ h \left( n_{1} , n_{2} , u \right) n_{1}! n_{2}! }{ \left( n_{1} + n_{2} \right)! }} $$ Here $h \left( n_{1} , n_{2} , u \right)$ is a recursive function counting the number of orderings of the $\left( n_{1} + n_{2} \right)$ samples for which $U_{1} \le u$ under the assumption that the null hypothesis $H_{0}$ is true, and Mann and Whitney proved the following². $$ \begin{align*} h \left( n_{1} , n_{2} , u \right) =& h \left( n_{1} - 1 , n_{2} , u - n_{2} \right) + h \left( n_{1} , n_{2} - 1 , u \right) \\ h \left( n_{1} , n_{2} , u \right) =& h \left( n_{2} , n_{1} , u \right) \\ h \left( n_{1} , n_{2} , 0 \right) =& h \left( n_{1} , 0 , u \right) = 1 \\ h \left( n_{1} , n_{2} , - \Delta \right) =& 0 \qquad , \Delta > 0 \end{align*} $$ Accordingly, given $n_{1}$ and $n_{2}$, the value of $u$ changes whenever the significance level $\alpha$ changes, so tables of these critical values are constructed and used for hypothesis testing.
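As a rough sketch, the recursion can be transcribed directly into a memoized function, and the critical value $u$ read off from the resulting null distribution (a naive transcription for illustration, not an optimized routine):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def h(n1, n2, u):
    """Number of orderings of n1 X's and n2 Y's for which U1 <= u, under H0."""
    if u < 0:                            # h(n1, n2, -Delta) = 0
        return 0
    if n1 == 0 or n2 == 0 or u == 0:     # h(n1, n2, 0) = h(n1, 0, u) = 1 (and symmetry)
        return 1
    # The largest observation is either an X (adding n2 pairs to U1) or a Y.
    return h(n1 - 1, n2, u - n2) + h(n1, n2 - 1, u)

def critical_u(n1, n2, alpha=0.05):
    """Largest integer u with P(U1 <= u) <= alpha/2, or None if no such u exists."""
    total = comb(n1 + n2, n1)            # (n1 + n2)! / (n1! n2!)
    best = None
    for u in range(n1 * n2 + 1):
        if h(n1, n2, u) / total <= alpha / 2:
            best = u                     # P(U1 <= u) is nondecreasing in u,
        else:
            break                        # so stop at the first exceedance
    return best

print(critical_u(10, 10))  # compare against a published table, e.g. Milton (1964)
```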

[Table of critical values $u$ for the Mann-Whitney $U$ test]

Many textbooks omit why such tables arise and how the relationship between $U_{1}$ and $T_{1}$ is established, but by carefully working through the proof one can easily gain an understanding that goes beyond the textbook treatment.

Proof

Let $I (X > Y)$ be the indicator function that equals 1 if $X > Y$ and 0 otherwise. $R(Z)$ is the rank of $Z$ in the combined sample; $R_{1} \left( X_{i} \right)$ is the rank of $X_{i}$ within the first sample, and $R_{2} \left( Y_{j} \right)$ is the rank of $Y_{j}$ within the second sample.

[1]

$$ \begin{align*} U_{1} =& \sum_{i=1}^{n_{1}} \sum_{j=1}^{n_{2}} I \left( X_{i} > Y_{j} \right) \\ U_{2} =& \sum_{i=1}^{n_{1}} \sum_{j=1}^{n_{2}} I \left( X_{i} < Y_{j} \right) \end{align*} $$ $U_{1}$ and $U_{2}$ can be represented as above by comparing $X_{i}$ and $Y_{j}$ for all $i = 1, \cdots , n_{1}$ and $j = 1, \cdots , n_{2}$. Since $I \left( X_{i} > Y_{j} \right) + I \left( X_{i} < Y_{j} \right) = I \left( X_{i} \ne Y_{j} \right) = 1$ almost surely for every pair, their sum is as follows. $$ U_{1} + U_{2} = \sum_{i=1}^{n_{1}} \sum_{j=1}^{n_{2}} I \left( X_{i} \ne Y_{j} \right) = n_{1} n_{2} $$

Arithmetic series sum formula: For an arithmetic sequence $a_{n} = a+(n-1)d$ with first term $a$ and common difference $d$, $$ \sum_{k=1}^{n} a_{k}= {{n \left\{ 2a + (n-1)d \right\} } \over {2}} $$

Since the combined sample consists of $\left( n_{1} + n_{2} \right)$ observations whose ranks are $1 , \cdots , \left( n_{1} + n_{2} \right)$, the total sum of ranks follows from the formula above with $a = d = 1$. $$ T_{1} + T_{2} = {{\left( n_{1} + n_{2} \right) \left( n_{1} + n_{2} + 1 \right)} \over 2} $$

[2]

The rank $R \left( X_{i} \right)$ of $X_{i}$ in the combined sample must equal its rank $R_{1} \left( X_{i} \right)$ within the first sample plus the number of $Y_{j}$ less than $X_{i}$, which can be expressed as follows. $$ R \left( X_{i} \right) = R_{1} \left( X_{i} \right) + \sum_{j=1}^{n_{2}} I \left( X_{i} > Y_{j} \right) $$ Since the Wilcoxon rank-sum statistic is $T_{1} = \sum_{i=1}^{n_{1}} R \left( X_{i} \right)$, summing $R \left( X_{i} \right)$ over all $i = 1, \cdots , n_{1}$ gives the following, where the third line holds because the within-sample ranks $R_{1} \left( X_{i} \right)$ are a permutation of $1 , \cdots , n_{1}$. $$ \begin{align*} \sum_{i=1}^{n_{1}} R \left( X_{i} \right) =& \sum_{i=1}^{n_{1}} R_{1} \left( X_{i} \right) + \sum_{i=1}^{n_{1}} \sum_{j=1}^{n_{2}} I \left( X_{i} > Y_{j} \right) \\ \sum_{i=1}^{n_{1}} R \left( X_{i} \right) =& T_{1} \\ \sum_{i=1}^{n_{1}} R_{1} \left( X_{i} \right) =& \sum_{i=1}^{n_{1}} i = {{n_{1} \left( n_{1} + 1 \right)} \over 2} \\ \sum_{i=1}^{n_{1}} \sum_{j=1}^{n_{2}} I \left( X_{i} > Y_{j} \right) =& U_{1} \end{align*} $$ Rewriting this yields: $$ U_{1} = T_{1} - {{n_{1} \left( n_{1} + 1 \right)} \over 2} $$ The relationship between $U_{2}$ and $T_{2}$ is proved in the same way.
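The key identity $R \left( X_{i} \right) = R_{1} \left( X_{i} \right) + \sum_{j} I \left( X_{i} > Y_{j} \right)$ can also be checked numerically, reusing the hypothetical samples from the earlier sketch:

```python
xs = [1.2, 3.4, 2.2, 5.0]
ys = [2.9, 0.7, 4.1]

combined = sorted(xs + ys)
first = sorted(xs)
for x in xs:
    r_combined = combined.index(x) + 1      # rank of X_i in the combined sample
    r_within = first.index(x) + 1           # rank of X_i within the first sample
    beaten = sum(1 for y in ys if x > y)    # number of Y_j below X_i
    assert r_combined == r_within + beaten
```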


  1. Milton, R. C. (1964). An Extended Table of Critical Values for the Mann-Whitney (Wilcoxon) Two-Sample Statistic. Journal of the American Statistical Association, 59(307), 925–934. https://doi.org/10.1080/01621459.1964.10480740 ↩︎

  2. Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18(1), 50–60. https://psycnet.apa.org/doi/10.1214/aoms/1177730491 ↩︎