logo

Z-Score and Standardization 📂Data Science

Z-Score and Standardization

Definition 1

  1. For a random variable $X$ that follows a distribution with a population mean of $\mu$ and a population standard deviation of $\sigma$, the following transformation is called standardization: $$ Z = {{ X - \mu } \over { \sigma }} $$
  2. Let’s assume we’re given quantitative data $\left\{ x \right\}$. For the sample mean $\overline{x}$ and the sample standard deviation $s$, the following statistic $z$ is known as the sample z-score: $$ z := {{ x - \overline{x} } \over { s }} $$

Explanation

A $z$-score is often used as an indicator to view the relative standing of data, regardless of the actual numeric value of the data, by focusing exclusively on the distribution itself. This process, which involves subtracting the mean from each data point and dividing by the standard deviation, is called standardization. It’s crucial to not confuse this with normalization or regularization, terms frequently used in data science. The term standardization arises from the statistical properties of z-scores; for example, when the population follows the famous normal distribution $N \left( \mu , \sigma^{2} \right)$, $$ Z = {{ X - \mu } \over { \sigma }} \sim N \left( 0, 1^{2} \right) $$ then the z-score will follow a Standard Normal Distribution. Thus, by setting the mean to $0$ and the variance to $1$, the actual numbers of the data become irrelevant and focus can be solely on the distribution.

See Also


  1. Mendenhall. (2012). Introduction to Probability and Statistics (13th Edition): p75. ↩︎