

Bayesian Paradigm

Buildup

Statistics can be defined as the study of methods for understanding parameters. As with measuring a physical quantity through formulas or laws, it would be ideal if parameters could be determined precisely. Since such precision is impractical, however, assumptions and samples are used to find ‘what the parameter is expected to be’. If we are interested in the average height of men in our country, $X$, we might assume $X \sim N ( \theta , \sigma^2 )$, compute $\displaystyle \hat{\theta} = \overline{x} = {{1} \over {n}} \sum_{k = 1}^{ n } x_{k}$, and take $\theta = \hat{\theta}$. This method of estimation rests on a rather simple and intuitive idea.
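As a minimal sketch of this kind of estimation, the following simulates height measurements under the assumed model and computes the sample mean as the estimate. The "true" value 173 and the sample size are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical setup: simulate n = 1000 height measurements (cm) from the
# assumed model X ~ N(theta, sigma^2), with a "true" theta = 173 and sigma = 6.
random.seed(42)
theta_true, sigma, n = 173.0, 6.0, 1000
sample = [random.gauss(theta_true, sigma) for _ in range(n)]

# The point estimate is simply the sample mean: theta_hat = (1/n) * sum(x_k)
theta_hat = sum(sample) / n
print(round(theta_hat, 1))  # close to 173 for a sample this large
```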

Frequentist

The samples we have are obtained at random from the population, and as long as the method of obtaining them is fair, no special distinction is made among samples of the same size. They are, of course, different samples in practice, but since how well a sample represents the population is entirely a matter of luck, the only certainty is that larger samples are better than smaller ones. Naturally, observations we have not yet obtained are not expected to differ significantly from the current sample; if they did, statistical analysis would be meaningless. This inference starts from the expectation that the sample and the population do not differ much, and the more samples there are, the closer this expectation comes to certainty. Inference of this kind, which considers not only the data obtained so far but also data that will be obtained in the future or has not yet been obtained, is called Frequentist Inference. Given the view that accuracy increases with the size (frequency) of the sample, the name is apt.

Bayesian

Bayesian Inference, on the other hand, considers only the samples obtained so far. Through Bayes’ theorem, the prior distribution simply changes into the posterior distribution. Parameters are assumed to have a distribution, but that distribution is not strictly believed to be accurate. Before the analysis begins, it is fine to hypothesize any prior distribution based on expert opinion or subjective experience, and there is no concern if the distribution changes once new samples are obtained. The only certainty is that the posterior distribution after the analysis is the result of reflecting the sample in the prior distribution.

What is the Bayesian Paradigm?¹

The components of the Bayesian Paradigm are as follows:

  • (1): Determining the prior distribution of parameters
  • (2): Calculation through Bayes’ theorem
  • (3): Estimation of parameters using the posterior distribution

If the prior distribution of the parameter $\theta$ is $\pi (\theta)$ and the observation is $y$, then by Bayes’ theorem, $$ p ( \theta | y ) = {{ p(y | \theta ) \pi (\theta ) } \over { p(y) }} $$ This distribution $p ( \theta | y )$ of the parameter with the data reflected is called the posterior distribution.
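All three steps can be sketched numerically on a grid of candidate parameter values. The setup here is hypothetical and not from the text above: $\theta$ is a coin's heads-probability, the prior $\pi(\theta)$ is uniform, and the observation $y$ is 7 heads in 10 tosses (a binomial likelihood).

```python
# A minimal grid sketch of the Bayes update p(theta|y) ∝ p(y|theta) * pi(theta).
from math import comb

grid = [i / 100 for i in range(101)]           # candidate values of theta
prior = [1.0 / len(grid)] * len(grid)          # (1) prior distribution
likelihood = [comb(10, 7) * t**7 * (1 - t)**3 for t in grid]

unnorm = [l * p for l, p in zip(likelihood, prior)]
evidence = sum(unnorm)                         # plays the role of p(y)
posterior = [u / evidence for u in unnorm]     # (2) Bayes' theorem

# (3) estimate theta from the posterior, e.g. by its mode
theta_map = grid[posterior.index(max(posterior))]
print(theta_map)  # 0.7, matching the observed proportion under a flat prior
```

With a flat prior the posterior mode coincides with the maximum-likelihood estimate; a non-uniform prior would pull the estimate toward the prior's mass.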

Example

Let’s consider a simple example. Suppose we have a friend named Adam who often arrives late for appointments.

If Adam’s lateness for appointments follows a normal distribution with a mean of 10 minutes and a standard deviation of 5 minutes, $N ( 10 , 5^2 )$, then both the Frequentist and the Bayesian would say the following when Adam is late for an appointment:

  • Frequentist: “Adam is inherently someone who arrives 10 minutes late.”
  • Bayesian: “Looking at it, Adam tends to be about 10 minutes late.”

The Frequentist infers that Adam is on average 10 minutes late; since that is Adam’s nature, he has been and will continue to be about 10 minutes late for appointments. The Bayesian supposes that, based on what has been observed so far, 10 minutes is the most probable amount of lateness, and thus expects Adam to be about 10 minutes late this time as well.

At first glance, their statements seem similar. That’s because, despite their different perspectives, both Frequentists and Bayesians are making statistical inferences. The difference emerges when, for example, Adam arrives on time for the next appointment:

  • Frequentist: “It’s rare for Adam to be on time; the probability is only about 2%.”
  • Bayesian: “Oh, Adam can be early sometimes. Will he be early next time too?”
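The Frequentist’s figure can be checked directly: under $N(10, 5^2)$, being on time means a lateness of at most 0 minutes, a value two standard deviations below the mean. A sketch using only the standard library:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Lateness X ~ N(10, 5^2); "on time" means X <= 0,
# two standard deviations below the mean.
p_on_time = normal_cdf(0, mu=10, sigma=5)
print(round(p_on_time, 4))  # 0.0228 — roughly a 2% chance
```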

And if asked whether Adam could arrive on time for the next appointment, their answers would definitely differ:

  • Frequentist: “It’s hard to say Adam has changed. His arriving on time this time was always a possibility.”
  • Bayesian: “While the probability of Adam being late is still high, it’s also true that the probability of him being on time has increased.”

While the Frequentist only checks whether the newly obtained observation matches the conclusion already drawn, the Bayesian immediately updates the existing conclusion, thereby obtaining a new posterior distribution. Thus, the ease of Sequential Analysis is not only a feature distinguishing it from the Frequentist approach but also an inherent advantage of Bayesian inference.
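This updating can be sketched with a conjugate normal model. The setup is hypothetical and goes beyond the story above: the spread of Adam's individual arrivals is fixed at $\sigma = 5$, and the Bayesian's prior belief about his *mean* lateness $\theta$ is taken as $N(10, 2^2)$. Observing an on-time arrival ($y = 0$) then shifts the posterior toward 0.

```python
from math import sqrt

def normal_update(mu0, tau, y, sigma):
    """Conjugate update for a normal mean with known sigma:
    prior theta ~ N(mu0, tau^2), likelihood y ~ N(theta, sigma^2)."""
    prec = 1 / tau**2 + 1 / sigma**2
    mu1 = (mu0 / tau**2 + y / sigma**2) / prec
    return mu1, sqrt(1 / prec)

# Hypothetical prior belief about Adam's mean lateness: theta ~ N(10, 2^2).
mu, tau = 10.0, 2.0

# Adam shows up on time (y = 0): the belief shifts immediately.
mu, tau = normal_update(mu, tau, y=0.0, sigma=5.0)
print(round(mu, 2))  # about 8.62: still "late on average", but less so

# Each further observation repeats the same update (sequential analysis).
mu, tau = normal_update(mu, tau, y=0.0, sigma=5.0)
```

The posterior after each observation becomes the prior for the next, which is exactly why sequential analysis is natural in this framework.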


  1. 김달호. (2013). R과 WinBUGS를 이용한 베이지안 통계학 [Bayesian Statistics Using R and WinBUGS]: p89. ↩︎