Percentiles and Outliers

Definitions 1

Given quantitative data,

  1. A value that is greater than $p \%$ of the data and less than the remaining $(100-p) \%$ is called the $p$-percentile.
  2. The $100$-percentile and $0$-percentile (the largest and smallest values in the data) are referred to as the maximum and minimum values, respectively.
    • The difference between the maximum and minimum values is called the data’s range $R$.
  3. The $25$-percentile is called the first quartile $Q_{1}$, and the $75$-percentile is called the third quartile $Q_{3}$.
    • $\left( Q_{3} - Q_{1} \right)$ is called the interquartile range $\text{IQR}$.
  4. The minimum, first quartile, median, third quartile, and maximum are the five statistics called the Five-Number Summary. $$ \min \qquad Q_{1} \qquad \text{median} \qquad Q_{3} \qquad \max $$
  5. Empirically, data that falls outside the following range is referred to as an outlier. $$ \left[ Q_{1} - 1.5 \text{IQR} , Q_{3} + 1.5 \text{IQR} \right] $$ The lower limit is called the lower fence, and the upper limit is called the upper fence.
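
The definitions above can be tried out on a small sample. The following is a minimal sketch, not a definitive implementation: the helper names `percentile` and `five_number_summary` and the sample `data` are hypothetical, and the linear interpolation between neighboring sorted values is an assumption, since textbooks and libraries use several slightly different conventions for locating a percentile.

```python
def percentile(values, p):
    """Return the p-percentile (0 <= p <= 100) of values.

    Assumption: linear interpolation between the two nearest sorted values.
    Other conventions exist and can give slightly different quartiles.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    position = (len(ordered) - 1) * p / 100  # fractional index into sorted data
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) as the 0, 25, 50, 75, 100-percentiles."""
    return tuple(percentile(values, p) for p in (0, 25, 50, 75, 100))


data = [2, 4, 4, 5, 7, 8, 9, 11, 30]  # hypothetical sample

minimum, q1, median, q3, maximum = five_number_summary(data)
data_range = maximum - minimum   # range R
iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print("five-number summary:", minimum, q1, median, q3, maximum)
print("range:", data_range, "IQR:", iqr)
print("fences:", lower_fence, upper_fence)
```

For this sample the quartiles are 4 and 9, so the IQR is 5 and the fences are -3.5 and 16.5. The value 30 lies above the upper fence.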

Explanation

Second Quartile

The $50$-percentile, also known as the second quartile, is simply the median, so it needs no separate definition in the five-number summary. Given a sufficient amount of data, these summaries support an educated guess about the distribution of the data, and they should be the first thing to check regardless of the data being observed.

Outlier

An outlier is literally something that lies outside, meaning it falls outside the common range of the data. Values below $Q_{1} - 1.5 \text{IQR}$ are unusually small and values above $Q_{3} + 1.5 \text{IQR}$ are unusually large, and both are called outliers because they fall outside the expected range. Note that this is not a mathematically rigorous definition, as the terms ‘empirical’ and ‘common’ suggest.
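
As an illustration of the fence rule, here is a rough sketch that flags the values falling outside $\left[ Q_{1} - 1.5 \text{IQR} , Q_{3} + 1.5 \text{IQR} \right]$. The function name `find_outliers` and the sample data are hypothetical. The quartiles come from Python's standard `statistics.quantiles` with `method="inclusive"`, which is only one of several quartile conventions, so borderline values may be classified differently under another convention.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: quartiles are computed with the 'inclusive' method of
    statistics.quantiles; k=1.5 is the empirical multiplier from the text.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


data = [2, 4, 4, 5, 7, 8, 9, 11, 30]  # hypothetical sample
print(find_outliers(data))
```

Because the rule is empirical, a flagged value is a candidate worth inspecting rather than something proven to be an error.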


  1. Mendenhall. (2012). Introduction to Probability and Statistics (13th Edition): p. 76, 60, 78-80.