Definition of Statistics

Definition [1]

Statistics is a collection of methods for collecting, analyzing, presenting, interpreting, and making decisions about data.

  1. Descriptive statistics consists of methods that use charts or graphs and summary measures to organize, present, and describe data.
  2. Inferential statistics consists of methods for making decisions or predictions about a population based on a sample.
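To make the distinction concrete, here is a minimal sketch using only Python's standard `statistics` module. The ten scores are hypothetical, and the helper names `describe` and `mean_confidence_interval` are illustrative, not from any textbook or library. `describe` only summarizes the sample itself (descriptive statistics), while `mean_confidence_interval` uses the same sample to say something about the unknown population mean (inferential statistics), assuming a normal (z) approximation for simplicity.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample: scores of 10 students drawn from a larger population
sample = [72, 85, 90, 66, 78, 88, 95, 70, 81, 75]


def describe(data):
    """Descriptive statistics: summarize the observed data only (hypothetical helper)."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (divides by n - 1)
        "min": min(data),
        "max": max(data),
    }


def mean_confidence_interval(data, confidence=0.95):
    """Inferential statistics: interval estimate of the unknown population mean.

    Hypothetical helper; uses a normal (z) critical value as a simplifying assumption.
    """
    n = len(data)
    m = statistics.mean(data)
    se = statistics.stdev(data) / n ** 0.5  # standard error of the sample mean
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se


if __name__ == "__main__":
    print(describe(sample))
    low, high = mean_confidence_interval(sample)
    print(f"95% CI for the population mean: ({low:.2f}, {high:.2f})")
```

With only ten observations, a t critical value (about 2.262 for 9 degrees of freedom) would give a somewhat wider interval than the z value of about 1.96 used here; the z version is chosen only to keep the sketch free of third-party dependencies.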

Commentary

What follows goes beyond the textbook.

Personally, I prefer to define statistics as “a field of applied mathematics that makes active use of probability theory.”

  • This may look like a description of one characteristic of statistics rather than a definition. However, the theory underpinning statistics, especially inferential statistics, is mathematical statistics, and the arguments involved in what can properly be called statistical inference rest largely on probability.
  • Although it is not directly related to statistics, the branch of physics that brings probability theory into the study of the microscopic world is likewise called statistical mechanics.
  • Moreover, since the 2010s, machine learning, and deep learning in particular, has advanced considerably, and the state of the art for unstructured data has improved rapidly. These methods produce very good results in areas where classical statistics has struggled, such as natural language processing, computer vision, and reinforcement learning. Unfortunately, it is difficult to regard such fields as part of statistics.

For these reasons, the definition given above might more accurately be called a definition of data science. Classical machine learning, before the rise of deep learning, could be counted as part of statistics under the heading of nonparametric methods. In hindsight, however, it is time to acknowledge that statistics is not the whole of data science and to establish its own identity firmly.

There is no need to be discouraged, however. Unlike approaches such as deep learning, statistics rests on a solid theoretical foundation, and there is growing disillusionment and fatigue with performance-driven black-box techniques. Although its standing may seem somewhat diminished compared with its heyday, when it constituted the whole of data science, statistics remains the largest field within applied mathematics.


  1. Department of Statistics, Kyungpook National University. (2008). Statistics with Excel, pp. 2-3.