Conjugate Prior Distribution 📂Mathematical Statistics

Definition 1

If the prior and posterior distributions belong to the same family of distributions, the prior distribution is referred to as a Conjugate Prior.

Explanation

Although Bayesian analysis ultimately recovers the parameters through successive updates regardless of the initial prior, choosing an appropriate prior when some knowledge about the model is available can greatly simplify the mathematical computations and make the results easier to interpret.

  • (1) The conjugate prior for $\text{Bin}(n, \theta)$ is $\theta \sim \text{Beta}(\alpha, \beta)$
  • (2) The conjugate prior for $\text{Poi}(\theta)$ is $\theta \sim \text{Gamma}(a, b)$
  • (3) When the population variance $\sigma^2$ is known, the conjugate prior for $N(\mu, \sigma^2)$ is $\mu \sim N(\mu_0, \tau_0^2)$
  • (4) When the population mean $\mu$ is known, the conjugate prior for $N(\mu, \sigma^2)$ with respect to $\displaystyle \tau = \frac{1}{\sigma^2}$ is $\tau \sim \text{Gamma}(a, b)$
  • (5) When the shape parameter $a$ is known, the conjugate prior for $\text{Gamma}(a, \lambda)$ with respect to $\lambda$ is $\lambda \sim \text{Gamma}(a_0, b_0)$
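As a numeric check on case (2), here is a minimal Python sketch (not from the referenced text) of the Poisson-Gamma update: with prior $\theta \sim \text{Gamma}(a, b)$ in the rate parametrization and observations $y_1, \dots, y_n \sim \text{Poi}(\theta)$, the posterior is $\text{Gamma}(a + \sum y_i,\; b + n)$. The function name and the numbers are illustrative only.

```python
# Poisson-Gamma conjugate update (rate parametrization: mean of Gamma(a, b) is a/b).
def poisson_gamma_update(a, b, observations):
    """Return posterior Gamma hyperparameters given Poisson-distributed counts."""
    return a + sum(observations), b + len(observations)

# Hypothetical prior Gamma(2, 1) and four observed counts:
a_post, b_post = poisson_gamma_update(2.0, 1.0, [3, 5, 4, 2])
posterior_mean = a_post / b_post  # posterior mean of theta
```

Because the posterior is again a Gamma distribution, the update reduces to two additions, which is exactly the computational convenience conjugacy buys.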

Example

For example, consider the model for binomial data $p(y \mid \theta) = \binom{n}{y} \theta^{y} (1-\theta)^{n-y}$:

Its conjugate prior is $\theta \sim \text{Beta}(\alpha, \beta)$, with density
$$ \pi(\theta) \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} $$
Then the posterior distribution of $\theta$ satisfies
$$ p(\theta \mid y) \propto \theta^{y + \alpha - 1} (1-\theta)^{n - y + \beta - 1} $$
which is again a Beta distribution, $\theta \mid y \sim \text{Beta}(y + \alpha,\; n - y + \beta)$. Therefore, the posterior mean of $\theta$ is
$$ E(\theta \mid y) = \frac{y + \alpha}{n + \alpha + \beta} $$
which can also be seen as a weighted average of the sample mean $\displaystyle \frac{y}{n}$ and the prior mean $\displaystyle \frac{\alpha}{\alpha + \beta}$:
$$ E(\theta \mid y) = \frac{n}{n + \alpha + \beta} \cdot \frac{y}{n} + \frac{\alpha + \beta}{n + \alpha + \beta} \cdot \frac{\alpha}{\alpha + \beta} $$
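The worked example above can be sketched in a few lines of Python; the helper name and the counts ($y = 7$ successes in $n = 10$ trials under an assumed $\text{Beta}(2, 2)$ prior) are hypothetical.

```python
# Beta-Binomial conjugate update: prior Beta(alpha, beta), y successes in n trials
# gives posterior Beta(y + alpha, n - y + beta).
def beta_binomial_posterior(alpha, beta, y, n):
    return y + alpha, n - y + beta

alpha_post, beta_post = beta_binomial_posterior(2, 2, y=7, n=10)
# Posterior mean (y + alpha) / (n + alpha + beta):
posterior_mean = alpha_post / (alpha_post + beta_post)
```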

Therefore, as the sample size $n$ increases, the influence of the prior mean decreases, and the estimate becomes closer to the frequentist one.
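This shrinking influence of the prior can be demonstrated directly from the weighted-average form. The sketch below (with an assumed $\text{Beta}(5, 5)$ prior and a fixed success rate of $0.3$) is illustrative, not from the referenced text.

```python
# Posterior mean written as a weighted average of sample mean y/n and
# prior mean alpha/(alpha+beta); the sample's weight n/(n+alpha+beta) -> 1.
def posterior_mean(alpha, beta, y, n):
    w = n / (n + alpha + beta)  # weight on the sample mean
    return w * (y / n) + (1 - w) * (alpha / (alpha + beta))

# Fixed observed success rate 0.3 with growing sample size:
means = [posterior_mean(5, 5, 0.3 * n, n) for n in (10, 100, 10_000)]
# means decreases toward the sample mean 0.3 as n grows
```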

Returning to the discussion of conjugate priors: using a Beta prior and obtaining a Beta posterior also makes sequential analysis very convenient. Since the pmf of the binomial distribution and the pdf of the Beta distribution share the same functional form in $\theta$, the algebra stays simple throughout.
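Sequential analysis here means feeding each posterior back in as the next prior; because of conjugacy, this gives exactly the same result as one update on the pooled data. A small sketch with hypothetical batch counts:

```python
# Beta-Binomial update reused sequentially: the posterior after batch 1
# serves as the prior for batch 2.
def update(alpha, beta, y, n):
    return y + alpha, n - y + beta

a1, b1 = update(2, 2, y=3, n=5)      # first batch of data
a2, b2 = update(a1, b1, y=4, n=10)   # second batch, prior = previous posterior
batch = update(2, 2, y=7, n=15)      # single update on the pooled data
# (a2, b2) == batch: sequential and pooled updates agree
```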


  1. 김달호 (2013). Bayesian Statistics Using R and WinBUGS (R과 WinBUGS를 이용한 베이지안 통계학): p. 101. ↩︎