Sampling, With Replacement and Without Replacement
Definition
- The act of drawing a sample is called sampling or, more simply, extraction.
- Returning an already drawn unit to the population during the sampling process is called replacement.
- Sampling that uses replacement is called sampling with replacement.
- Sampling that does not use replacement is called sampling without replacement.
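The two schemes defined above can be sketched with Python's standard random module. This is only an illustration under assumptions: the five-letter population and the seed are hypothetical, chosen to keep the run repeatable. random.choices draws with replacement and random.sample draws without replacement.

```python
import random

# Hypothetical population of five distinct units (an assumption for illustration).
population = ["a", "b", "c", "d", "e"]

# Fixed seed so that repeated runs give the same draws.
rng = random.Random(0)

# With replacement: every draw is made from the full population,
# so the same unit may appear more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: a drawn unit is not returned,
# so each unit appears at most once.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Because all five units are drawn without replacement here, the second list is always a rearrangement of the population, whereas the first list may contain repeats.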
Explanation
In mathematical statistics, a sample is defined as a random variable, and the value actually observed from it is distinguished as its realization. When that level of rigor is not required, however, statistics and data science in general also use the word sampling for the act of obtaining data itself.
Sampling without replacement is much more complicated
Many theoretical results in statistics assume that samples are obtained by sampling with replacement, because the resulting observations are then independent and identically distributed (iid). To someone unfamiliar with this concept, returning an already drawn unit to the population may seem to contaminate or manipulate the data, but mathematically speaking, sampling with replacement is the scheme in which the draws depend neither on their order nor on one another, and in that sense it is closest to truly random sampling.
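The independence claim for sampling with replacement can be checked exactly on a small case. The sketch below assumes a hypothetical population of four equally likely units and two draws. It enumerates every ordered outcome and verifies that each joint probability equals the product of the two marginal probabilities.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population of four equally likely units (an assumption).
population = [1, 2, 3, 4]

# Two draws with replacement: all 16 ordered pairs are equally likely.
outcomes = list(product(population, repeat=2))
p = Fraction(1, len(outcomes))


def prob(event):
    """Exact probability of an event given as a predicate on an outcome."""
    return sum((p for o in outcomes if event(o)), Fraction(0))


p_first = prob(lambda o: o[0] == 1)               # P(first draw is 1)
p_second = prob(lambda o: o[1] == 1)              # P(second draw is 1)
p_both = prob(lambda o: o[0] == 1 and o[1] == 1)  # P(both draws are 1)

print(p_first, p_second, p_both)  # 1/4 1/4 1/16

# Independence: the joint probability factorizes for every pair of values.
factorizes = all(
    prob(lambda o: o == (a, b))
    == prob(lambda o: o[0] == a) * prob(lambda o: o[1] == b)
    for a, b in product(population, repeat=2)
)
print(factorizes)  # True
```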
In sampling without replacement, once a unit is drawn it cannot be drawn again, so the events "the first draw is a given unit" and "the second draw is that same unit" are mutually exclusive. Two mutually exclusive events that each have positive probability are dependent, so the convenient assumption of independence cannot be made. In practice, in areas that necessarily involve sampling without replacement, such as nonparametric statistics, the dependence among the random variables makes the theoretical development much more complicated.
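The loss of independence can likewise be seen by exact enumeration. The sketch below assumes a hypothetical population of four equally likely units and two draws without replacement. Each draw still has the same marginal distribution, but the joint probability no longer equals the product of the marginals.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical population of four equally likely units (an assumption).
population = [1, 2, 3, 4]

# Two draws without replacement: 12 equally likely ordered pairs,
# none of which repeats a unit.
outcomes = list(permutations(population, 2))
p = Fraction(1, len(outcomes))


def prob(event):
    """Exact probability of an event given as a predicate on an outcome."""
    return sum((p for o in outcomes if event(o)), Fraction(0))


p_first = prob(lambda o: o[0] == 1)               # P(first draw is 1)
p_second = prob(lambda o: o[1] == 1)              # P(second draw is 1)
p_both = prob(lambda o: o[0] == 1 and o[1] == 1)  # mutually exclusive events

print(p_first, p_second, p_both)  # 1/4 1/4 0

# The marginals agree, but the joint probability is not their product,
# so the two draws are dependent.
print(p_both == p_first * p_second)  # False

# Conditioning on the first draw changes the distribution of the second.
p_second_is_2 = prob(lambda o: o[1] == 2)
p_second_is_2_given_first_is_1 = (
    prob(lambda o: o[0] == 1 and o[1] == 2) / p_first
)
print(p_second_is_2, p_second_is_2_given_first_is_1)  # 1/4 1/3
```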