Proof of Bayes' Theorem and Prior, Posterior Distributions

Theorem 1

Let $S$ be a sample space, $A$ an event, and $P$ a probability. If $\left\{ S_1, S_2, \cdots ,S_n \right\}$ is a partition of $S$, then the following holds.
$$
P(S_k|A)=\frac { P(S_k)P(A|S_k) }{ \sum _{ k=1 }^{ n }{ P(S_k)P(A|S_k) } }
$$

Definition

The probability $P \left( S_{k} \right)$ appearing on the right-hand side of Bayes' theorem is called the Prior Probability, and the left-hand side, $P \left( S_{k} | A \right)$, is called the Posterior Probability. The probability distributions formed by these probabilities are called the Prior Distribution and the Posterior Distribution, respectively.

Explanation

Also called Bayes' Rule, this theorem can be proven quite easily using only two laws, yet its applications are extensive. The so-called Bayesian Paradigm built on it divides the field of statistics into two schools of thought, so its importance can hardly be overstated.

What we want to know is the left-hand side of the above equation. What we already know are the probabilities of the event $A$ and of each part $S_k$ of the partition of the sample space $S$ occurring, and the probability of $A$ occurring given that each part occurs. In short, we start with everything we know about $S_k$ and its impact on $A$. Bayes' theorem reverses this, allowing us to understand the impact of $A$ on each part of the partition. If this sounds complicated, it is enough to focus on the goal of finding the left-hand side.
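As a minimal sketch of this reversal, the following Python snippet computes the posterior probabilities for a three-part partition. The prior and likelihood values and the function name `posterior` are illustrative assumptions, not taken from the text; exact fractions are used so the result can be checked by hand.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Return [P(S_k | A)] given [P(S_k)] and [P(A | S_k)] via Bayes' theorem."""
    # Numerators: P(S_k) * P(A | S_k) = P(A ∩ S_k)
    joint = [p * l for p, l in zip(prior, likelihood)]
    # Denominator: P(A), obtained by summing the numerators over the partition
    p_a = sum(joint)
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return [j / p_a for j in joint]

# Hypothetical numbers: the priors P(S_k) sum to 1
prior = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
# Hypothetical likelihoods P(A | S_k)
likelihood = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]

post = posterior(prior, likelihood)
print(post)  # [Fraction(5, 21), Fraction(2, 7), Fraction(10, 21)]
```

In this made-up example the part with the smallest prior ends up with the largest posterior, because $A$ is far more likely under it than under the other two parts.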

Proof

By the Law of Total Probability and the Multiplication Rule of Probability, we obtain the following equation.
$$
\begin{align*}
P(A)=&P(A\cap S_1)+P(A\cap S_2)+\cdots+P(A\cap S_n)
\\ =&P(S_1)P(A|S_1)+P(S_2)P(A|S_2)+\cdots+P(S_n)P(A|S_n)
\\ =& \sum _{ k=1 }^{ n }{ P(S_k)P(A|S_k) }
\end{align*}
$$
Assuming $P(A) > 0$, taking the reciprocal of both sides and then multiplying both sides by $P(A \cap S_k)$ gives us
$$
\begin{align*}
& \frac { 1 }{ \sum _{ k=1 }^{ n }{ P(S_k)P(A|S_k) } }=\frac { 1 }{ P(A) }
\\ \implies& \frac { P(A\cap S_k) }{ \sum _{ k=1 }^{ n }{ P(S_k)P(A|S_k) } }=\frac { P(A\cap S_k) }{ P(A) }
\\ \implies& \frac { P(S_k)P(A|S_k) }{ \sum _{ k=1 }^{ n }{ P(S_k)P(A|S_k) } }=P(S_k|A)
\end{align*}
$$
where the last line uses the Multiplication Rule $P(A \cap S_k) = P(S_k) P(A|S_k)$ on the left and the definition of conditional probability on the right.
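The two steps of the proof can be checked by direct counting on a small finite sample space. The sketch below assumes a uniform probability on the hypothetical set $\left\{ 1, \cdots, 12 \right\}$, partitioned by remainder modulo 3, with $A$ the event that the outcome is at most 5; none of these choices come from the text.

```python
from fractions import Fraction

# Hypothetical uniform sample space and a partition of it by remainder mod 3
omega = set(range(1, 13))
parts = [{w for w in omega if w % 3 == r} for r in range(3)]
A = {w for w in omega if w <= 5}

def P(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event), len(omega))

def cond(event, given):
    """Conditional probability P(event | given), assuming P(given) > 0."""
    return P(event & given) / P(given)

# Step 1: law of total probability with the multiplication rule
total = sum(P(S) * cond(A, S) for S in parts)
assert total == P(A)

# Step 2: the posterior computed directly and via Bayes' theorem
direct = [cond(S, A) for S in parts]
via_bayes = [P(S) * cond(A, S) / total for S in parts]
assert direct == via_bayes
print(direct)  # [Fraction(1, 5), Fraction(2, 5), Fraction(2, 5)]
```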


  1. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p23.