Definition and Etymology of Data

Overview

In modern society, hardly any educated person is completely unfamiliar with data. The concept has become so widespread that even non-specialists with no particular interest in it can readily paraphrase data or information as 'knowledge about something' or 'material for communication'. The descriptions below simply try to define data a little more strictly, from the perspective of data science.

Definition [1]

  1. A variable is a characteristic that changes over time or across the individuals or objects under consideration.
  2. The individual or object on which a variable is measured is called an experimental unit, and the value actually obtained by measuring the variable on an experimental unit is called a measurement.
  3. The set of measurements is called data.
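The three definitions can be mirrored in a small data structure. This is a minimal sketch only. The `Measurement` class, the `collect` helper, and the student heights are hypothetical illustrations and are not taken from the source. Each record ties a value to the experimental unit it came from and to the variable that was measured. The collection of these records is the data.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass(frozen=True)
class Measurement:
    unit: str      # the experimental unit: the individual or object measured
    variable: str  # the characteristic (variable) being measured
    value: Any     # the value actually observed; it need not be a number


def collect(units: Iterable[str], variable: str,
            observe: Callable[[str], Any]) -> List[Measurement]:
    """Measure `variable` on every experimental unit; the result is the data."""
    return [Measurement(unit, variable, observe(unit)) for unit in units]


# Hypothetical example: three students are the experimental units and
# height in centimetres is the variable.
heights_cm = {"student_1": 172.0, "student_2": 165.5, "student_3": 180.2}
data = collect(heights_cm, "height_cm", lambda s: heights_cm[s])

for m in data:
    print(m.unit, m.variable, m.value)
```

`value` is typed as `Any` on purpose, because a measurement does not have to be numeric.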

Explanation

Etymology of Data [2]

The English word data means 'facts that are given or acknowledged'. It derives from the Latin verb dare (dō), 'to give', and more specifically from its past participle datum, '(a thing) given'. Data is the plural of datum.

Ironically, the origin of the word points to the essence of data more accurately than any attempt to define it through variables and experiments, as above. In data science, data is something that has been given, or will be given, to us. In that respect it is fundamentally different from the objects of new discoveries or creations.

In other words, data is, by its very nature, given. To use a crude analogy, imagine inventing a longer-lasting light bulb. Suppose you improve on light bulb A, whose average lifespan is 100 hours, and develop light bulb B. You can then measure the lifespan of each bulb B (the object) by leaving it on until it burns out. The collection of these measurements is the lifespan data of light bulb B. Those values are given by light bulb B alone; they are not obtained by somehow altering the data of light bulb A.
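The light-bulb analogy can be sketched in a few lines. The burn-out times below are hypothetical values invented for illustration, not real measurements. The point is that the data of bulb B is simply whatever values the B bulbs yield. Bulb A's 100-hour average enters only as a point of comparison.

```python
from statistics import mean

MEAN_LIFESPAN_A = 100.0  # hours; bulb A's average lifespan, as stated in the text

# Hypothetical burn-out times (hours), observed by leaving each bulb B on
# until it failed. Each bulb is an experimental unit.
lifespans_b = {"B-01": 112.4, "B-02": 98.7, "B-03": 121.0, "B-04": 105.3}

# The lifespan data of bulb B is just this collection of measurements;
# nothing here is computed from bulb A's data.
data_b = list(lifespans_b.values())

print(f"mean lifespan of B: {mean(data_b):.2f} h (A: {MEAN_LIFESPAN_A:.0f} h)")
```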

Variables and Experiments?

The term for variable, whose Chinese characters (變數) literally mean 'changing number', easily brings numbers to mind. Numbers do appear often in simplified explanations of data. However, now that unstructured data is widely recognized, data is no longer understood as limited to numbers or categories. Data types include photos, documents, signals, stock prices, videos, network structures, and anything else humans can perceive. Similarly, the term used for measurement looks numeric because of its characters, but a measurement need not be a numeric value. It's recommended to use the English term measurement where possible.

Likewise, the word experiment in "experimental unit" does not refer only to work done by scientists in lab coats. Basic probability theory calls any process with an uncertain outcome a 'random experiment'. In the same way, it is enough to read the word here simply as a manner of speaking.

Population and Sample

  1. The set of all measurements an investigator is interested in is called a population.
  2. A subset of the population is called a sample.
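The two definitions above can be illustrated with a minimal sketch. The exam-score population and the sample size of 50 are hypothetical choices, and the values are simulated with Python's `random` module. The population is every measurement of interest. A sample is any subset of it, drawn here at random without replacement.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: the exam score of every one of 1,000 students,
# i.e. all the measurements the investigator is interested in.
population = {f"student_{i}": rng.randint(0, 100) for i in range(1000)}

# A sample: the measurements of 50 students chosen at random.
chosen = rng.sample(list(population), k=50)
sample = {sid: population[sid] for sid in chosen}

# Every (unit, measurement) pair in the sample also belongs to the population.
assert sample.items() <= population.items()
print(len(population), len(sample))
```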

These definitions suggest that, in practice, most data is a sample from some population. Note also that population, the English statistical term, also means population in the demographic sense, so be careful to keep the two senses apart.

At its core, statistics is about "wanting to know about a population but being unable to examine all of it, and therefore characterizing the population through a sample". In other words, statistics infers the nature of the subject of interest from data.
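A toy version of this idea follows. It is a sketch only: using the sample mean to estimate the population mean is one standard example of such inference, not the only one. The normally distributed population is simulated and hypothetical. In a real study the population would be unknown. It is generated here only so the estimate can be compared with the true value.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 measurements. In practice it is unknown;
# it is simulated here only so the estimate can be checked against it.
population = [rng.gauss(170.0, 8.0) for _ in range(10_000)]

# The investigator can only afford to measure 200 units.
sample = rng.sample(population, k=200)

# Characterize the population through the sample: the sample mean
# serves as an estimate of the (unobservable) population mean.
estimate = mean(sample)
truth = mean(population)
print(f"sample mean {estimate:.2f} vs population mean {truth:.2f}")
```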

See Also

Definition of a Sample in Mathematical Statistics

In mathematical statistics, which undergraduates usually meet in their second or third year, the sample described in this post receives a rigorous mathematical definition. It also introduces another term for data, the realization.


[1] Mendenhall. (2012). Introduction to Probability and Statistics (13th Edition): p8.

[2] https://www.etymonline.com/word/data