Conjugate Prior Distribution

Definition ¹

If, for a given likelihood, the prior and posterior distributions belong to the same family of distributions, the prior distribution is called a conjugate prior for that likelihood.

Explanation

In principle, Bayesian analysis updates what is known about the parameters from the data whatever prior one starts with. In practice, when something is known about the model, choosing an appropriate prior can greatly simplify the mathematical computations and make the results easier to interpret. Well-known conjugate pairs include the following.

(1) The conjugate prior for $\text{Bin} (n , \theta)$ is $\theta \sim \text{Beta} (\alpha, \beta)$
(2) The conjugate prior for $\text{Poi} ( \theta)$ is $\theta \sim \text{Gamma} (a, b)$
(3) When the population variance $\sigma^2$ is known, the conjugate prior for $N ( \mu , \sigma^2 )$ is $\mu \sim N ( \mu_{0} , \tau_{0}^2 )$
(4) When the population mean $\mu$ is known, the conjugate prior for $N ( \mu , \sigma^2 )$ with respect to $\displaystyle \tau = {{1} \over {\sigma^2 }}$ is $\tau \sim \text{Gamma} (a, b)$
(5) When the shape $a$ is known, the conjugate prior for the rate $\lambda$ of $\text{Gamma} (a, \lambda)$ is $\lambda \sim \text{Gamma} (a_{0}, b_{0})$
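The standard update rules behind the five pairs above can be sketched in a few lines of Python. This is a minimal illustration, not code from the cited textbook: the function names are hypothetical, and every Gamma distribution is assumed to be parameterized by shape and rate (under a scale parameterization the formulas would differ).

```python
def beta_binomial(alpha, beta, y, n):
    """(1) Beta(alpha, beta) prior, y successes in n trials -> Beta posterior."""
    return alpha + y, beta + n - y


def gamma_poisson(a, b, counts):
    """(2) Gamma(a, b) prior (shape, rate), Poisson counts -> Gamma posterior."""
    return a + sum(counts), b + len(counts)


def normal_known_variance(mu0, tau0_sq, sigma_sq, data):
    """(3) N(mu0, tau0_sq) prior on the mean, known variance sigma_sq.

    Returns the posterior mean and posterior variance of mu.
    """
    n = len(data)
    ybar = sum(data) / n
    post_precision = 1 / tau0_sq + n / sigma_sq
    post_mean = (mu0 / tau0_sq + n * ybar / sigma_sq) / post_precision
    return post_mean, 1 / post_precision


def gamma_precision_known_mean(a, b, mu, data):
    """(4) Gamma(a, b) prior (shape, rate) on the precision tau, known mean mu."""
    sum_sq = sum((x - mu) ** 2 for x in data)
    return a + len(data) / 2, b + sum_sq / 2


def gamma_rate_known_shape(a0, b0, shape, data):
    """(5) Gamma(a0, b0) prior (shape, rate) on the rate, known Gamma shape."""
    return a0 + shape * len(data), b0 + sum(data)
```

In every case the posterior is obtained by adjusting the prior's hyperparameters with simple summaries of the data, which is what makes conjugate priors convenient.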

Example

Consider the model for binomial data $p ( y | \theta ) = \binom{ n}{ y } \theta^{y} (1-\theta)^{n-y}$.

Take its conjugate prior $\theta \sim \text{Beta} (\alpha, \beta)$, so that $$ \pi (\theta) \propto \theta^{\alpha - 1} ( 1 - \theta )^{\beta -1 } $$ Since the posterior is proportional to the likelihood times the prior, the posterior distribution of $\theta$ is $$ p ( \theta | y ) \propto p ( y | \theta ) \pi ( \theta ) \propto \theta^{y + \alpha -1 } (1-\theta)^{n-y + \beta -1} $$ which is the kernel of a Beta distribution, that is, $ \theta | y \sim \text{Beta} (y + \alpha, n - y + \beta)$.

Therefore, the posterior mean of $\theta$ is $$ E ( \theta | y) = {{y + \alpha} \over {n + \alpha + \beta}} $$ which can also be written as a weighted average of the sample proportion $\displaystyle {{y} \over {n}}$ and the prior mean $\displaystyle {{\alpha } \over {\alpha + \beta}}$, $$ E ( \theta | y) = {{n} \over {n + \alpha + \beta}} {{y} \over {n}} + {{\alpha + \beta } \over {n + \alpha + \beta}} {{\alpha} \over {\alpha + \beta}} $$
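The weighted-average form of the posterior mean can be checked with exact arithmetic. The sketch below uses made-up numbers ($\alpha = \beta = 2$, $y = 7$, $n = 10$) chosen only for illustration, and the function names are hypothetical.

```python
from fractions import Fraction


def posterior_mean(alpha, beta, y, n):
    """Mean of the Beta(y + alpha, n - y + beta) posterior."""
    return Fraction(y + alpha, n + alpha + beta)


def weighted_average(alpha, beta, y, n):
    """Same quantity, written as a mix of the sample proportion and the prior mean."""
    w_data = Fraction(n, n + alpha + beta)  # weight on y / n
    w_prior = 1 - w_data                    # equals (alpha + beta) / (n + alpha + beta)
    return w_data * Fraction(y, n) + w_prior * Fraction(alpha, alpha + beta)


# Illustrative values only (not from the source).
alpha, beta, y, n = 2, 2, 7, 10
direct = posterior_mean(alpha, beta, y, n)
mixed = weighted_average(alpha, beta, y, n)
assert direct == mixed
print(direct)
```

With these values the sample proportion is $7/10$, the prior mean is $1/2$, and the posterior mean lies between the two.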

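Therefore, as the sample size $n$ increases, the weight on the prior mean shrinks toward zero and the posterior mean approaches the sample proportion $\displaystyle {{y} \over {n}}$, so the result becomes closer to frequentist inference.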
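To see how quickly the prior fades, the following sketch holds the observed proportion fixed at 0.7 and lets $n$ grow. The prior $\text{Beta}(2, 2)$, the proportion 0.7, and the helper names are assumptions made for this illustration.

```python
def prior_weight(alpha, beta, n):
    """Weight placed on the prior mean in the posterior mean."""
    return (alpha + beta) / (n + alpha + beta)


def posterior_mean(alpha, beta, y, n):
    """Mean of the Beta(y + alpha, n - y + beta) posterior."""
    return (y + alpha) / (n + alpha + beta)


alpha, beta = 2, 2  # assumed prior with mean 0.5
gaps = []
for n in (10, 100, 1000, 10000):
    y = 7 * n // 10  # keep the sample proportion at exactly 0.7
    gap = abs(posterior_mean(alpha, beta, y, n) - y / n)
    gaps.append(gap)
    print(n, round(prior_weight(alpha, beta, n), 4), round(gap, 4))
```

Both the prior weight and the gap between the posterior mean and the sample proportion fall roughly in proportion to $1/n$.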

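Returning to the discussion on conjugate priors, using a Beta distribution as the prior and obtaining a Beta distribution as the posterior also makes sequential analysis straightforward, because the posterior from one batch of data can be used directly as the prior for the next. As functions of $\theta$, the pmf of the binomial distribution and the pdf of the Beta distribution share the form $\theta^{a} (1-\theta)^{b}$, so multiplying them only adds the exponents and the calculation goes through without difficulty.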
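A small sketch of this sequential use: updating a Beta prior batch by batch gives the same posterior as updating once with the pooled data. The batches and the $\text{Beta}(1, 1)$ starting prior below are invented for illustration, and `update` is a hypothetical helper name.

```python
def update(alpha, beta, y, n):
    """Beta(alpha, beta) prior + y successes in n trials -> Beta posterior parameters."""
    return alpha + y, beta + n - y


# Illustrative batches of (successes, trials); not data from the source.
batches = [(3, 5), (4, 5), (1, 5)]

# Sequential analysis: each posterior becomes the prior for the next batch.
state = (1, 1)
for y, n in batches:
    state = update(*state, y, n)

# Single update with all the data pooled together.
total_y = sum(y for y, _ in batches)
total_n = sum(n for _, n in batches)
pooled = update(1, 1, total_y, total_n)

assert state == pooled
print(state)
```

Because the hyperparameters simply accumulate successes and failures, the order and grouping of the batches do not matter.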

¹ 김달호 (Kim Dal-Ho). (2013). R과 WinBUGS를 이용한 베이지안 통계학 [Bayesian Statistics Using R and WinBUGS]: p. 101.