Qualitative Variable and Quantitative Variable

Definition 1

Qualitative Variables

Variables that measure qualitative characteristics are called qualitative variables.

  • The food is… delicious / so-so / tasteless
  • The color is… red / blue / yellow
  • The major is… mathematics / statistics / physics

Such qualitative variables are often referred to as categorical data.

Quantitative Variables

Variables that measure quantitative characteristics are called quantitative variables.

  • Age is… 20 years old / 31 years old / 11 years old
  • Height is… 170.0 cm / 170.5 cm / 162.1 cm

Quantitative variables that take distinct, separated values, like age or visual acuity, are called discrete variables, while those that take values anywhere along a continuum, like height or weight, are called continuous variables.

Explanation

The definitions may seem odd, but ‘qualitative’ and ‘quantitative’ are not terms we’re born knowing; we learn their academic meanings and apply them to everyday language. For instance, when assessing the quality of an item, we might say it has ‘high quality’ without quantifying it with a specific number like ‘1432 units of good’ or ‘17% better’.

  • Qualitative refers to characteristics that might have an order (good-bad-terrible) but are usually not numerically expressed. It’s fine if they’re categorized without order (German-French-Japanese).
  • Quantitative refers to measurable amounts. The distinction between discrete and continuous variables here can be a bit tricky.

Discrete Values?

Describing discrete values as distinct and separated, like natural numbers or the marks on a scale, is not something commonly found in textbooks, and I admit it is not ideal. A more fitting description might be:

A variable that can take only a finite or countably infinite number of values is called a discrete variable.

However, such a mathematically precise description might not be immediately helpful in understanding what discrete variables are.

Countable in this context means that something can be counted ‘one, two, …’ to answer ‘how many’, in the way familiar from Indo-European languages like English, French, or Spanish. In English, nouns that can be counted this way are called countable nouns. Mathematically, a set is countable if its elements can be put in one-to-one correspondence with the natural numbers or with a subset of them.

Examples might help clarify. These are typically discrete variables:

  • The number of pigs on a farm
  • The number of traffic accident fatalities per year
  • The number of pages in a textbook
  • The age of infants… ‘a 24-month-old boy’, ‘a girl aged 1 year 2 months’, etc.
  • The number of 1L water bottles

Some examples might be ambiguous:

  • The amount of water in three 1 L bottles… The number of bottles is discrete, but if we’re talking about the amount of water, it’s continuous.
  • Visual acuity… Commonly measured in increments of 0.1, but if it is only divided into groups like 0.5, 1.0, 1.5, it could be considered discrete, and depending on the data, even qualitative.

Classification and Regression Problems

In data science, a problem is usually called a classification problem when the dependent variable is qualitative and a regression problem when the dependent variable is quantitative.
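The rule of thumb above can be written down as a tiny lookup. This is only a sketch: the names `VariableKind`, `problem_type`, and the `targets` dictionary are hypothetical, and the kind of each variable is labeled by hand rather than detected from data.

```python
from enum import Enum


class VariableKind(Enum):
    QUALITATIVE = "qualitative"
    QUANTITATIVE = "quantitative"


def problem_type(dependent_kind: VariableKind) -> str:
    """Name the problem type implied by the dependent variable's kind."""
    if dependent_kind is VariableKind.QUALITATIVE:
        return "classification"
    return "regression"


# Hypothetical dependent variables, labeled by hand for illustration.
targets = {
    "major": VariableKind.QUALITATIVE,       # mathematics / statistics / physics
    "height_cm": VariableKind.QUANTITATIVE,  # 170.0 / 170.5 / 162.1
}

for name, kind in targets.items():
    print(f"{name}: {problem_type(kind)}")
```

In practice the hard part is deciding the kind of the variable in the first place, which is what the precautions below are about.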

Precautions

Beginners working with data make mistakes not because they fail to understand qualitative and quantitative variables, but because they are not yet used to working with them. These mistakes are common and tend to surface while studying more involved topics like regression analysis, and there is almost no way to develop intuition for these pitfalls deliberately. The following notes do not cover everything, but they give an idea of what to watch out for.

Encoding

It’s common to see an encoding such as men as $0$ and women as $1$, but the mere presence of numbers doesn’t make it a quantitative (discrete) variable.
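One way to see that the codes are only labels is to swap them. The sketch below uses a made-up sample and two hypothetical codings (`coding_a`, `coding_b`): arithmetic on the codes changes with the labeling, while the category counts do not.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample; the labels and codes are made up for illustration.
sex = ["man", "woman", "woman", "man", "woman"]

coding_a = {"man": 0, "woman": 1}
coding_b = {"man": 1, "woman": 0}  # an equally valid labeling


def decode(codes, coding):
    """Map numeric codes back to their labels."""
    inverse = {code: label for label, code in coding.items()}
    return [inverse[c] for c in codes]


codes_a = [coding_a[s] for s in sex]
codes_b = [coding_b[s] for s in sex]

# Arithmetic on the codes depends on the arbitrary choice of labels.
mean_a = mean(codes_a)
mean_b = mean(codes_b)

# The category counts are identical under either coding.
counts_a = Counter(decode(codes_a, coding_a))
counts_b = Counter(decode(codes_b, coding_b))

print(mean_a, mean_b)
print(counts_a == counts_b)
```

The mean of a 0/1 code is just the share of rows coded 1, so it says something about category frequencies and nothing about an amount of anything.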

Such encoding can also be used for privacy. Imagine medical data, which can be highly personal and specific enough to identify individuals from the data alone. In such cases, sensitive information like psychiatric history or abortion history might be published only as numeric codes.

Ratings

As with encoding, ratings might seem quantitative but are still qualitative. For example, if high school graduates are rated as $0$, bachelor’s degree holders as $1$, and PhDs as $2$, the variable looks quantitative but is qualitative: nothing says the gap between $0$ and $1$ equals the gap between $1$ and $2$. Terms like ‘low-educated’ or ‘high-educated’ are societal constructs and don’t necessarily imply a numerical order in the data.
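A small sketch can show why arithmetic on such ratings is risky. It assumes, purely for illustration, that the order high school < bachelor < PhD is accepted; the sample and the two codings (`coding_a`, `coding_b`) are hypothetical. Both codings respect that order, yet the mean changes while the median category stays put.

```python
from statistics import mean, median_low

# Hypothetical sample of education levels, made up for illustration.
sample = ["high school", "bachelor", "bachelor", "phd", "high school"]

# Two codings that both respect the order high school < bachelor < phd.
coding_a = {"high school": 0, "bachelor": 1, "phd": 2}
coding_b = {"high school": 0, "bachelor": 1, "phd": 10}


def summarize(labels, coding):
    """Return (mean of the codes, label of the median code) for one coding."""
    inverse = {code: label for label, code in coding.items()}
    codes = [coding[label] for label in labels]
    return mean(codes), inverse[median_low(codes)]


mean_a, median_label_a = summarize(sample, coding_a)
mean_b, median_label_b = summarize(sample, coding_b)

print(mean_a, median_label_a)
print(mean_b, median_label_b)
```

Summaries that only use the order survive a recoding; summaries that use the spacing between codes do not, because the spacing was never in the data.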

Hex Codes

Distinguishing between red and blue is qualitative, but what about different shades like pink, fuchsia, or crimson? If it’s about lipstick, it might still be qualitative, but for fabric colors with thousands of shades, they might be represented by RGB hex codes. While it’s rare to encounter such data, it’s important to remember that what intuitively seems qualitative can be expressed quantitatively.
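As a rough illustration, a hex code can be unpacked into three integers, after which colors can be compared numerically. The helper name `hex_to_rgb` is hypothetical, the hex values are the standard CSS named colors, and plain Euclidean distance in RGB space is an assumption chosen for simplicity, not a perceptually accurate measure.

```python
import math


def hex_to_rgb(hex_code: str) -> tuple[int, int, int]:
    """Convert a '#RRGGBB' string into three integers in the range 0..255."""
    digits = hex_code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_code!r}")
    red = int(digits[0:2], 16)
    green = int(digits[2:4], 16)
    blue = int(digits[4:6], 16)
    return (red, green, blue)


# CSS named colors and their hex codes.
shades = {"pink": "#FFC0CB", "fuchsia": "#FF00FF", "crimson": "#DC143C"}
rgb = {name: hex_to_rgb(code) for name, code in shades.items()}

# Once the shades are numbers, distances between them can be computed.
pink_to_fuchsia = math.dist(rgb["pink"], rgb["fuchsia"])
pink_to_crimson = math.dist(rgb["pink"], rgb["crimson"])

print(rgb)
print(pink_to_fuchsia, pink_to_crimson)
```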

Gender

A dataset might categorize gender in ways you do not expect. Whether or not one agrees with the political correctness behind those categories, if the data presents gender that way, it must be accepted as is.

  • True story: there have been cases where gender was encoded with numbers, and someone unfamiliar with gender issues was confused by values like 2 or 3 in the dataset.

The point is not to become an expert on gender issues but to avoid relying solely on intuition when analyzing data from unfamiliar domains.
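One defensive habit is to check the codes in a column against the dataset's codebook instead of assuming there are only two. The sketch below is hypothetical throughout: the `codebook` labels, the `codes` column, and the helper `tabulate` are made up, and a real dataset defines its own codes.

```python
from collections import Counter

# Hypothetical codebook; real datasets define their own codes and labels.
codebook = {0: "male", 1: "female", 2: "other", 3: "prefer not to say"}


def tabulate(codes, codebook):
    """Count rows per label and collect codes missing from the codebook."""
    counts = Counter()
    undocumented = set()
    for code in codes:
        if code in codebook:
            counts[codebook[code]] += 1
        else:
            undocumented.add(code)
    return counts, undocumented


# Hypothetical column of encoded values.
codes = [0, 1, 1, 2, 3, 0, 7]
counts, undocumented = tabulate(codes, codebook)

print(counts)
print(undocumented)
```

Any code that lands in `undocumented` is a prompt to read the data documentation, not a value to drop or recode by guesswork.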

Why Do We Need to Know This?

These concepts are straightforward, yet it is critical to understand and distinguish them accurately. Here, ‘we’ includes researchers applying statistics, statistics majors, and anyone who might work in data science, even with a different background.

While we study and gain experience with data, our peers are engaging with society in other ways. Unfortunately, they might not be as data-savvy, and they may ignore the precautions mentioned here and make these common mistakes. The general public might not question such errors, and even your boss might not be an exception.

It’s our responsibility to prevent these misunderstandings.


  1. Mendenhall. (2012). Introduction to Probability and Statistics (13th Edition): p. 10.