Z-Score and Standardization
Definition 1
- For a random variable $X$ that follows a distribution with a population mean of $\mu$ and a population standard deviation of $\sigma$, the following transformation is called standardization: $$ Z = {{ X - \mu } \over { \sigma }} $$
- Given quantitative data $\left\{ x \right\}$ with sample mean $\overline{x}$ and sample standard deviation $s$, the following statistic $z$ is known as the sample z-score: $$ z := {{ x - \overline{x} } \over { s }} $$
Explanation
A $z$-score is often used as an indicator of the relative standing of a data point, independent of its actual numeric value, by focusing exclusively on the distribution itself. The process of subtracting the mean from each data point and dividing by the standard deviation is called standardization. It’s crucial not to confuse this with normalization or regularization, terms frequently used in data science. The term standardization arises from the statistical properties of z-scores; for example, when the population follows the famous normal distribution $N \left( \mu , \sigma^{2} \right)$, the z-score follows the standard normal distribution: $$ Z = {{ X - \mu } \over { \sigma }} \sim N \left( 0, 1^{2} \right) $$ Thus, since the mean becomes $0$ and the variance becomes $1$, the actual numbers of the data become irrelevant and the focus can rest solely on the distribution.
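The sample z-score can be sketched in a few lines of Python using only the standard library; the data values below are hypothetical and chosen purely for illustration:

```python
import statistics

# Hypothetical quantitative data
data = [62, 75, 48, 91, 83, 70]

x_bar = statistics.mean(data)   # sample mean
s = statistics.stdev(data)      # sample standard deviation (n - 1 denominator)

# Standardize each data point: z = (x - x_bar) / s
z_scores = [(x - x_bar) / s for x in data]

# After standardization, the sample mean is 0 and the sample
# standard deviation is 1 (up to floating-point error)
assert abs(statistics.mean(z_scores)) < 1e-9
assert abs(statistics.stdev(z_scores) - 1) < 1e-9
```

Note that `statistics.stdev` uses the $n - 1$ denominator, matching the sample standard deviation $s$ in the definition above.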
See Also
- Standardization: Typically refers to the process in statistics of adjusting the mean to $0$ and the variance to $1$.
- Normalization: Typically refers to the process of scaling data to a specific range.
- Regularization: Typically refers to processes in machine learning to prevent overfitting.
Mendenhall. (2012). Introduction to Probability and Statistics (13th Edition): p75. ↩︎