Point-Biserial Correlation Coefficient
Definition
Suppose the data take the form of ordered pairs $\left\{ \left( X_{k} , Y_{k} \right) \right\}_{k=1}^{n}$, where $X$ is a continuous random variable and $Y$ is a discrete random variable taking only the values $y_{0}$ and $y_{1}$. The statistic defined as follows is called the point-biserial correlation coefficient. $$ \rho = {\frac{ m_{1} - m_{0} }{ s }} \sqrt{\frac{ n_{0} n_{1} }{ n^{2} }} $$ Here $n_{0}$ and $n_{1}$ are the sample sizes for the cases $Y = y_{0}$ and $Y = y_{1}$ respectively, $m_{0}$ and $m_{1}$ are the means of $X$ in those two cases, $n = n_{0} + n_{1}$ is the total sample size, and $s$ is the overall standard deviation of $X$.
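To make the definition concrete, here is a minimal numerical sketch in Python on hypothetical data (the sample itself and its parameters are assumptions, not from the source). It takes $s$ as the population standard deviation (`ddof=0`); under that convention $\rho$ coincides with the Pearson correlation of $X$ and the $0/1$-coded $Y$, as the derivation below shows, so we can cross-check against SciPy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical sample: X continuous, Y binary coded as 0/1
y = rng.integers(0, 2, size=200)
x = rng.normal(loc=1.5 * y, scale=1.0)  # the two group means differ by about 1.5

n = len(x)
n0, n1 = np.sum(y == 0), np.sum(y == 1)
m0, m1 = x[y == 0].mean(), x[y == 1].mean()
s = x.std(ddof=0)  # overall (population) standard deviation of X

rho = (m1 - m0) / s * np.sqrt(n0 * n1 / n**2)

# Cross-check against SciPy, which computes the equivalent Pearson r
r, _ = stats.pointbiserialr(y, x)
print(rho, r)  # should agree up to floating-point error
```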
Explanation [1]
The point-biserial correlation coefficient is a correlation coefficient that can be used when one of two variables is continuous and the other is discrete, in particular binary, so that the data can be partitioned according to whether $Y = y_{0}$ or $Y = y_{1}$.
Let $p_{0}$ and $p_{1}$ denote, respectively, the probabilities that $Y$ takes each of the two values, defined as follows. $$ \begin{align*} p_{0} =& P \left( Y = y_{0} \right) \\ p_{1} =& P \left( Y = y_{1} \right) \\ 1 =& p_{0} + p_{1} \end{align*} $$ Then the probability density function $f$ of $X$ is given in terms of the pdf $f_{0}$ of $X$ given $Y = y_{0}$ and the pdf $f_{1}$ of $X$ given $Y = y_{1}$ as follows. $$ f (x) = p_{0} f_{0}(x) + p_{1} f_{1}(x) $$ By linearity of integration, the expectation of $X$ is $$ E \left( X \right) = \int_{\mathbb{R}} x f(x) dx = p_{0} \int_{\mathbb{R}} x f_{0}(x) dx + p_{1} \int_{\mathbb{R}} x f_{1}(x) dx = p_{0} m_{0} + p_{1} m_{1} $$
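As a quick sanity check of this mixture identity, the following sketch samples from a hypothetical two-component normal mixture (the parameters are assumptions chosen for illustration) and compares the empirical mean of $X$ with $p_{0} m_{0} + p_{1} m_{1}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mixture: f = p0 * N(m0, 1) + p1 * N(m1, 1)
p1, m0, m1 = 0.3, 0.0, 2.0
p0 = 1 - p1

y = rng.random(100_000) < p1  # Y = 1 with probability p1
x = np.where(y, rng.normal(m1, 1.0, y.size), rng.normal(m0, 1.0, y.size))

print(x.mean(), p0 * m0 + p1 * m1)  # both close to 0.6
```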
Now, for the bivariate variable $\left( X, Y \right)$ define $\sigma_{ij}$ as follows. $$ \sigma_{ij} = E \left[ \left( X - E \left( X \right) \right)^{i} \left( Y - E(Y) \right)^{j} \right] $$
Recall the Pearson correlation coefficient: $$ \rho = { {\operatorname{Cov} (X,Y)} \over {\sigma_X \sigma_Y} } $$
Here $\sigma_{00} = 1$ and $\sigma_{10} = \sigma_{01} = 0$ are of little significance; in terms of the Pearson correlation coefficient, $\sigma_{11} = \operatorname{Cov} \left( X, Y \right)$, $\sigma_{X} = \sqrt{\sigma_{20}}$, and $\sigma_{Y} = \sqrt{\sigma_{02}}$, so it can be written as follows. $$ \rho = { \sigma_{11} \over \sqrt{\sigma_{20}} \sqrt{\sigma_{02}} } $$
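The following sketch (again on hypothetical data) verifies empirically that the Pearson correlation equals $\sigma_{11} / \sqrt{\sigma_{20}\, \sigma_{02}}$ when the mixed central moments are estimated from the sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = (rng.random(50_000) < 0.4).astype(float)  # binary Y recoded to 0/1
x = rng.normal(loc=2.0 * y, scale=1.0)

def sigma(i, j):
    """Empirical mixed central moment E[(X - EX)^i (Y - EY)^j]."""
    return np.mean((x - x.mean()) ** i * (y - y.mean()) ** j)

rho = sigma(1, 1) / np.sqrt(sigma(2, 0) * sigma(0, 2))
print(rho, stats.pearsonr(x, y)[0])  # agree up to floating-point error
```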
Without loss of generality, assume $y_{0} < y_{1}$; these can be recoded as $0$ and $1$ respectively, so that $E(Y) = p_{1}$. Conditioning on $Y$ and writing $\sigma_{ij}$ in integral form gives: $$ \begin{align*} \sigma_{ij} =& \sum_{y=0,1} \left( y - p_{1} \right)^{j} E \left[ \left( X - m \right)^{i} \mid Y = y \right] P \left( Y = y \right) \\ =& p_{1} \left( 1 - p_{1} \right)^{j} E \left[ \left( X - m \right)^{i} \mid Y = 1 \right] \\ & + p_{0} \left( 0 - p_{1} \right)^{j} E \left[ \left( X - m \right)^{i} \mid Y = 0 \right] \\ =& p_{1} p_{0}^{j} \int_{\mathbb{R}} \left( x - m \right)^{i} f_{1} (x) dx \\ & + p_{0} \left( - p_{1} \right)^{j} \int_{\mathbb{R}} \left( x - m \right)^{i} f_{0} (x) dx \end{align*} $$
Here $m = E(X) = p_{1} m_{1} + p_{0} m_{0}$ and $\Delta = m_{1} - m_{0}$, and $(x - m)$ can be rewritten in each case as follows.
When $Y = 1$ $$ \begin{align*} & x - m \\ =& x - m_{1} + m_{1} - m \\ =& x - m_{1} + m_{1} - p_{1} m_{1} - p_{0} m_{0} \\ =& x - m_{1} + \left( 1 - p_{1} \right) m_{1} - p_{0} m_{0} \\ =& x - m_{1} + p_{0} m_{1} - p_{0} m_{0} \\ =& x - m_{1} + p_{0} \Delta \end{align*} $$
When $Y = 0$ $$ \begin{align*} & x - m \\ =& x - m_{0} + m_{0} - m \\ =& x - m_{0} + m_{0} - p_{1} m_{1} - p_{0} m_{0} \\ =& x - m_{0} + \left( 1 - p_{0} \right) m_{0} - p_{1} m_{1} \\ =& x - m_{0} + p_{1} m_{0} - p_{1} m_{1} \\ =& x - m_{0} - p_{1} \Delta \end{align*} $$
In summary, we obtain: $$ \begin{align*} \sigma_{ij} =& p_{1} p_{0}^{j} \int_{\mathbb{R}} \left( x - m_{1} + p_{0} \Delta \right)^{i} f_{1} (x) dx \\ & + p_{0} \left( - p_{1} \right)^{j} \int_{\mathbb{R}} \left( x - m_{0} - p_{1} \Delta \right)^{i} f_{0} (x) dx \end{align*} $$
Now, to complete $\rho$, compute $\sigma_{20}$, $\sigma_{02}$, and $\sigma_{11}$. First, $$ \sigma_{20} = E \left[ \left( X - E \left( X \right) \right)^{2} \right] = s^{2} $$ is simply the overall variance of $X$. Setting $i = 0$ and $j = 2$ gives $\sigma_{02}$: $$ \begin{align*} \sigma_{02} =& p_{1} p_{0}^{2} + p_{0} p_{1}^{2} \\ =& p_{1} p_{0} \left( p_{0} + p_{1} \right) \\ =& p_{1} p_{0} \end{align*} $$ $\sigma_{11}$ is obtained in a similar way, using the fact that $\int_{\mathbb{R}} \left( x - m_{1} \right) f_{1} (x) dx = \int_{\mathbb{R}} \left( x - m_{0} \right) f_{0} (x) dx = 0$: $$ \begin{align*} \sigma_{11} =& p_{1} p_{0}^{1} \int_{\mathbb{R}} p_{0} \Delta f_{1} (x) dx + p_{0} \left( - p_{1} \right)^{1} \int_{\mathbb{R}} \left( - p_{1} \Delta \right) f_{0} (x) dx \\ =& p_{1} p_{0} p_{0} \Delta + p_{0} \left( - p_{1} \right) \left( - p_{1} \Delta \right) \\ =& p_{1} p_{0} \Delta \left( p_{0} + p_{1} \right) \\ =& p_{1} p_{0} \Delta \end{align*} $$
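To double-check the closed forms $\sigma_{02} = p_{0} p_{1}$ and $\sigma_{11} = p_{0} p_{1} \Delta$, here is a small Monte Carlo sketch with assumed parameters (not from the source).

```python
import numpy as np

rng = np.random.default_rng(3)
p1, m0, m1 = 0.4, 0.0, 1.5
p0, delta = 1 - p1, m1 - m0

y = (rng.random(200_000) < p1).astype(float)
x = rng.normal(loc=m0 + delta * y, scale=1.0)

s02 = np.mean((y - y.mean()) ** 2)                # empirical sigma_02
s11 = np.mean((x - x.mean()) * (y - y.mean()))    # empirical sigma_11
print(s02, p0 * p1)          # both ~ 0.24
print(s11, p0 * p1 * delta)  # both ~ 0.36
```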
Finally, substituting into $\rho$, and estimating $p_{0}$ and $p_{1}$ by the sample proportions $n_{0}/n$ and $n_{1}/n$, yields: $$ \begin{align*} \rho =& { \sigma_{11} \over \sqrt{\sigma_{20}} \sqrt{\sigma_{02}} } \\ =& {\frac{ p_{1} p_{0} \Delta }{ \sqrt{s^{2}} \sqrt{p_{1} p_{0}} }} \\ =& {\frac{ m_{1} - m_{0} }{ s }} \sqrt{p_{0} p_{1}} \\ =& {\frac{ m_{1} - m_{0} }{ s }} \sqrt{\frac{ n_{0} n_{1} }{ n^{2} }} \end{align*} $$
From this derivation we can confirm that the point-biserial correlation coefficient indeed functions as a correlation coefficient. Whether the sign is positive or negative is not particularly important; a value close to $0$ indicates that $f_{0} \approx f_{1}$ from the outset, in other words that the two conditional distributions are not very different.
[1] Gupta, S.D. Point biserial correlation coefficient and its generalization. Psychometrika 25, 393–408 (1960). https://doi.org/10.1007/BF02289756