
Proof of Bayes' Theorem and Prior, Posterior Distributions

Theorem 1

Let $S$ be a sample space, $A$ an event, and $P$ a probability. If $\left\{ S_1, S_2, \cdots ,S_n \right\}$ is a partition of $S$, then the following holds. $$ P(S_k|A)=\frac { P(S_k)P(A|S_k) }{ \sum _{ i=1 }^{ n }{ P(S_i)P(A|S_i) } } $$
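As a quick illustration of the formula (not part of the source), here is a minimal Python sketch that evaluates the right-hand side for every $k$ at once; the names `bayes_posterior`, `priors`, and `likelihoods` are hypothetical placeholders for the lists of $P(S_k)$ and $P(A|S_k)$.

```python
def bayes_posterior(priors, likelihoods):
    """Return the posterior P(S_k | A) for every k.

    priors      -- list of P(S_k); should sum to 1 over the partition
    likelihoods -- list of P(A | S_k), same length as priors
    """
    # Denominator: P(A) via the law of total probability
    total = sum(p * l for p, l in zip(priors, likelihoods))
    # Numerator for each k: P(S_k) P(A | S_k)
    return [p * l / total for p, l in zip(priors, likelihoods)]
```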

Definition

The probability $P \left( S_{k} \right)$ appearing on the right-hand side of Bayes’ theorem is called the Prior Probability, and the left-hand side, $P \left( S_{k} | A \right)$, is called the Posterior Probability. The probability distributions formed by these probabilities are called the Prior Distribution and the Posterior Distribution, respectively.

Explanation

Also called Bayes’ Rule, this theorem can be proven quite easily using only two laws, yet its applications are extensive. The so-called Bayesian Paradigm built on it divides the field of statistics into two schools of thought, so its importance cannot be overstated.

What we want to know is the left-hand side of the above equation, $P(S_k|A)$. What we already know are the probabilities $P(S_k)$ that each part of the partition of the sample space $S$ occurs, and the conditional probabilities $P(A|S_k)$ that $A$ occurs given each part. In short, we start from everything we know about $S_k$ and its effect on $A$; Bayes’ theorem reverses this, letting us quantify the effect of observing $A$ on each part of the partition. If this sounds complicated, it is enough to focus on the fact that we want to find the left-hand side.
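For a concrete, purely illustrative example with made-up numbers: take a two-part partition in which $S_1$ means a condition is present and $S_2$ means it is not, with $P(S_1) = 0.01$ and $P(S_2) = 0.99$, and let $A$ be a positive test result with $P(A|S_1) = 0.9$ and $P(A|S_2) = 0.05$. Then $$ P(S_1|A) = \frac{ 0.01 \times 0.9 }{ 0.01 \times 0.9 + 0.99 \times 0.05 } = \frac{ 0.009 }{ 0.0585 } \approx 0.154 $$ so observing $A$ updates the prior probability $0.01$ of $S_1$ to a posterior of roughly $0.15$.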

Proof

By the Law of Total Probability and the Multiplication Rule of Probability, we obtain the following equation. $$ \begin{align*} P(A) =& P(A\cap S_1)+P(A\cap S_2)+\cdots+P(A\cap S_n) \\ =& P(S_1)P(A|S_1)+P(S_2)P(A|S_2)+\cdots+P(S_n)P(A|S_n) \\ =& \sum _{ i=1 }^{ n }{ P(S_i)P(A|S_i) } \end{align*} $$ Taking the reciprocal of both sides, multiplying by $P(A\cap S_k)$, and then applying the multiplication rule to the numerator on the left and the definition of conditional probability on the right gives $$ \begin{align*} & \frac { 1 }{ \sum _{ i=1 }^{ n }{ P(S_i)P(A|S_i) } }=\frac { 1 }{ P(A) } \\ \implies& \frac { P(A\cap S_k) }{ \sum _{ i=1 }^{ n }{ P(S_i)P(A|S_i) } }=\frac { P(A\cap S_k) }{ P(A) } \\ \implies& \frac { P(S_k)P(A|S_k) }{ \sum _{ i=1 }^{ n }{ P(S_i)P(A|S_i) } }=P(S_k|A) \end{align*} $$
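As a numerical sanity check of this derivation (using the hypothetical numbers from the example above and the `bayes_posterior` sketch given earlier), the posterior computed through the formula sums to one, as a probability distribution over the partition must.

```python
priors = [0.01, 0.99]      # P(S_1), P(S_2) -- hypothetical numbers
likelihoods = [0.9, 0.05]  # P(A|S_1), P(A|S_2)

# Denominator of Bayes' theorem: P(A) by the law of total probability
p_a = sum(p * l for p, l in zip(priors, likelihoods))

posterior = bayes_posterior(priors, likelihoods)

print(p_a)              # 0.0585
print(posterior)        # [0.1538..., 0.8461...]
print(sum(posterior))   # 1.0 -- the posterior is itself a probability distribution
```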


  1. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p23. ↩︎