
Beta Distribution

Definition 1

(Animations pdf0, pdf1, pdf2: the Beta(α, β) probability density function for α = 0.5, 1, 2 as β varies; generated by the code at the end of this post.)

For $\alpha , \beta > 0$, the continuous probability distribution $\text{Beta}(\alpha,\beta)$, called the beta distribution, has the following probability density function: $$ f(x) = {{ 1 } \over { B(\alpha,\beta) }} x^{\alpha - 1} (1-x)^{\beta - 1} \qquad , x \in [0,1] $$
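
As a quick sanity check, the density can be evaluated straight from the definition and compared against Distributions.jl. A minimal sketch, assuming SpecialFunctions.jl for the beta function $B(\alpha, \beta)$; the parameter values are arbitrary:

using Distributions, SpecialFunctions

α, β = 2.0, 3.0
f(x) = x^(α - 1) * (1 - x)^(β - 1) / beta(α, β)  # pdf straight from the definition

f(0.4), pdf(Beta(α, β), 0.4)  # the two values agree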


Basic Properties

Moment Generating Function

  • [1]: $$m(t) = 1 + \sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} {{ \alpha + r } \over { \alpha + \beta + r }} \right) {{ t^{k} } \over { k! }} \qquad , t \in \mathbb{R}$$
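
The series can be checked numerically by truncating it and comparing against a Monte Carlo estimate of $E \left( e^{tX} \right)$. A minimal sketch; the truncation level, parameters, and sample size are arbitrary choices:

using Distributions, Statistics

α, β, t = 2.0, 3.0, 1.5
K = 50  # truncation level; 50 terms are plenty for this t

# partial sum of the series above
m_series = 1 + sum(prod((α + r) / (α + β + r) for r in 0:k-1) * t^k / factorial(big(k)) for k in 1:K)

# Monte Carlo estimate of E[exp(tX)]
m_mc = mean(exp.(t .* rand(Beta(α, β), 10^6)))

m_series, m_mc  # the two estimates should match to a few decimal places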

Mean and Variance

  • [2]: If $X \sim \text{Beta}(\alpha,\beta)$, then $$ \begin{align*} E(X) =& {\alpha \over {\alpha + \beta} } \\ \text{Var} (X) =& { { \alpha \beta } \over {(\alpha + \beta + 1) { ( \alpha + \beta ) }^2 } } \end{align*} $$
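
These formulas are easy to confirm against Distributions.jl; a minimal check with arbitrary parameters:

using Distributions

α, β = 2.0, 3.0
d = Beta(α, β)

mean(d), α / (α + β)                        # both 0.4
var(d), α * β / ((α + β + 1) * (α + β)^2)   # both 0.04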

Sufficient Statistics

  • [3]: Suppose we are given a random sample $\mathbf{X} := \left( X_{1} , \cdots , X_{n} \right) \sim \text{Beta} \left( \alpha, \beta \right)$ from a beta distribution.

The sufficient statistic $T$ for $\left( \alpha, \beta \right)$ is as follows: $$ T = \left( \prod_{i} X_{i}, \prod_{i} \left( 1 - X_{i} \right) \right) $$
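
Computing $T$ from a sample is immediate; in practice one would work with $\left( \sum_{i} \log X_{i}, \sum_{i} \log \left( 1 - X_{i} \right) \right)$ instead, since the products underflow for large $n$. A minimal sketch with arbitrary parameters:

using Distributions

x = rand(Beta(2.0, 3.0), 100)                # a random sample
T = (prod(x), prod(1 .- x))                  # the sufficient statistic above
T_log = (sum(log.(x)), sum(log.(1 .- x)))    # equivalent log form, numerically safer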

Theorems

Derivation from Gamma Distribution

  • [a]: If two random variables $X_{1},X_{2}$ are independent with $X_{1} \sim \Gamma ( \alpha_{1} , 1)$ and $X_{2} \sim \Gamma ( \alpha_{2} , 1)$, then $$ {{ X_{1} } \over { X_{1} + X_{2} }} \sim \text{Beta} \left( \alpha_{1} , \alpha_{2} \right) $$
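
The claim can be spot-checked by simulation, comparing sample moments of $X_{1} / \left( X_{1} + X_{2} \right)$ with those of $\text{Beta} \left( \alpha_{1}, \alpha_{2} \right)$; a minimal sketch with arbitrary parameters:

using Distributions, Statistics

α₁, α₂, n = 2.0, 3.0, 10^6
x1 = rand(Gamma(α₁, 1.0), n)
x2 = rand(Gamma(α₂, 1.0), n)
y = x1 ./ (x1 .+ x2)

mean(y), mean(Beta(α₁, α₂))  # both ≈ 0.4
var(y), var(Beta(α₁, α₂))    # both ≈ 0.04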

Derivation from F-distribution

  • [b]: For a random variable $X \sim F \left( r_{1}, r_{2} \right)$ following an F-distribution with degrees of freedom $r_{1} , r_{2}$, the variable $Y$ defined below follows a beta distribution: $$ Y := {{ \left( r_{1} / r_{2} \right) X } \over { 1 + \left( r_{1} / r_{2} \right) X }} \sim \text{Beta} \left( {{ r_{1} } \over { 2 }} , {{ r_{2} } \over { 2 }} \right) $$
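
The same kind of simulation check works here; a minimal sketch with arbitrary degrees of freedom:

using Distributions, Statistics

r₁, r₂, n = 4.0, 6.0, 10^6
x = rand(FDist(r₁, r₂), n)
y = @. (r₁ / r₂) * x / (1 + (r₁ / r₂) * x)

mean(y), mean(Beta(r₁ / 2, r₂ / 2))  # both ≈ 0.4
var(y), var(Beta(r₁ / 2, r₂ / 2))    # both ≈ 0.04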

Description

Just as the gamma distribution takes its name from the gamma function, the beta distribution is named after the beta function. The beta function has the following relationship with the gamma function, so it can be expressed entirely in terms of gamma functions: $$ B(p,q) = {{\Gamma (p) \Gamma (q)} \over {\Gamma (p+q) }} $$ In fact, a beta distribution can be derived from gamma distributions, as theorem [a] above shows.
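
The identity itself is easy to confirm numerically; a one-liner with SpecialFunctions.jl and arbitrary positive arguments:

using SpecialFunctions

p, q = 2.3, 4.1
beta(p, q), gamma(p) * gamma(q) / gamma(p + q)  # identical up to rounding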

Just as the beta function can be seen as a generalization of the binomial coefficient, a careful look at the beta distribution’s probability density function reveals its resemblance to the probability mass function of a binomial distribution $P(k) = { _n {C} _k }{ p ^ k }{ (1-p) ^ { n - k } }$. Although the correspondence is not exact, if one regards $\alpha$ as the number of successes and $\beta$ as the number of failures, the resemblance shows up in: $$ \begin{align*} n =& \alpha + \beta \\ p =& {{\alpha } \over {\alpha + \beta}} \\ q =& {{\beta } \over {\alpha + \beta}} \end{align*} $$ In fact, in Bayesian analysis it is used as the conjugate prior distribution of the binomial distribution.
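
Concretely, conjugacy means that a $\text{Beta}(\alpha, \beta)$ prior on the success probability, combined with $k$ successes in $n$ binomial trials, yields a $\text{Beta}(\alpha + k, \beta + n - k)$ posterior. A minimal sketch; the prior and data are arbitrary:

using Distributions

α, β = 2.0, 2.0    # prior pseudo-counts of successes and failures
n, k = 10, 7       # observed: 7 successes in 10 trials

posterior = Beta(α + k, β + n - k)   # the conjugate update
mean(posterior)                      # (α + k) / (α + β + n) = 9/14 ≈ 0.643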

Proof

[1]

Though the equations are complex, there is no logical difficulty.

Exponential function series expansion: $$ { { e ^ x } }=\sum _{ n=0 }^{ \infty }{ \frac { { x } ^{ n } }{ n! } } $$

Euler integral of the first kind: $$ B(p,q)=\int_0^1 t^{p-1}(1-t)^{q-1}dt $$

$$ \begin{align*} m(t) =& \int_{0}^{1} e^{tx} {{ 1 } \over { B(\alpha,\beta) }} x^{\alpha - 1} (1-x)^{\beta - 1} dx \\ =& {{ 1 } \over { B(\alpha,\beta) }} \int_{0}^{1} \left( \sum_{k=0}^{\infty} {{ (tx)^{k} } \over { k! }} \right) x^{\alpha - 1} (1-x)^{\beta - 1} dx \\ =& {{ 1 } \over { B(\alpha,\beta) }} \sum_{k=0}^{\infty} {{ t^{k} } \over { k! }} \int_{0}^{1} x^{\alpha + k - 1} (1-x)^{\beta - 1} dx \\ =& {{ 1 } \over { B(\alpha,\beta) }} \sum_{k=0}^{\infty} {{ t^{k} } \over { k! }} B \left( \alpha + k , \beta \right) \\ =& \sum_{k=0}^{\infty} {{ t^{k} } \over { k! }} {{ B \left( \alpha + k , \beta \right) } \over { B(\alpha,\beta) }} \\ =& {{ t^{0} } \over { 0! }} {{ B \left( \alpha + 0 , \beta \right) } \over { B(\alpha,\beta) }} + \sum_{k=1}^{\infty} {{ t^{k} } \over { k! }} {{ B \left( \alpha + k , \beta \right) } \over { B(\alpha,\beta) }} \end{align*} $$

Relationship between Beta function and Gamma function: $$B(p,q) = {{\Gamma (p) \Gamma (q)} \over {\Gamma (p+q) }}$$

Expanding the Beta function into Gamma functions results in:

$$ \begin{align*} m(t) =& 1 + \sum_{k=1}^{\infty} {{ t^{k} } \over { k! }} {{ B \left( \alpha + k , \beta \right) } \over { B(\alpha,\beta) }} \\ =& 1 + \sum_{k=1}^{\infty} {{ t^{k} } \over { k! }} {{ \Gamma ( \alpha + k ) \Gamma ( \beta ) } \over { \Gamma \left( \alpha + \beta + k \right) }} {{ \Gamma ( \alpha + \beta ) } \over { \Gamma \left( \alpha \right) \Gamma \left( \beta \right) }} \\ =& 1 + \sum_{k=1}^{\infty} {{ t^{k} } \over { k! }} {{ \Gamma ( \alpha + k ) } \over { \Gamma \left( \alpha + \beta + k \right) }} {{ \Gamma ( \alpha + \beta ) } \over { \Gamma \left( \alpha \right) }} \\ =& 1 + \sum_{k=1}^{\infty} {{ t^{k} } \over { k! }} {{ \Gamma ( \alpha + k ) } \over { \Gamma \left( \alpha \right) }} {{ \Gamma ( \alpha + \beta ) } \over { \Gamma \left( \alpha + \beta + k \right) }} \\ =& 1 + \sum_{k=1}^{\infty} {{ t^{k} } \over { k! }} {{ \Gamma ( \alpha ) \prod_{r=0}^{k-1} ( \alpha + r) } \over { \Gamma \left( \alpha \right) }} {{ \Gamma ( \alpha + \beta ) } \over { \Gamma \left( \alpha + \beta \right) \prod_{r=0}^{k-1} ( \alpha + \beta + r) }} \\ =& 1 + \sum_{k=1}^{\infty} {{ t^{k} } \over { k! }} \prod_{r=0}^{k-1} {{ \alpha + r } \over { \alpha + \beta + r }} \end{align*} $$
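
The last step, collapsing the ratio of beta functions into a finite product, can be spot-checked with SpecialFunctions.jl; the parameter values here are arbitrary:

using SpecialFunctions

α, β, k = 2.0, 3.0, 5
beta(α + k, β) / beta(α, β), prod((α + r) / (α + β + r) for r in 0:k-1)  # both ≈ 0.047619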

[2]

Derive directly.
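
For instance, the mean falls out of the Euler integral in one line: $$ E(X) = {{ 1 } \over { B(\alpha,\beta) }} \int_{0}^{1} x^{\alpha} (1-x)^{\beta - 1} dx = {{ B(\alpha + 1 , \beta) } \over { B(\alpha,\beta) }} = {{ \alpha } \over { \alpha + \beta }} $$ and likewise $E \left( X^{2} \right) = B(\alpha + 2, \beta) / B(\alpha,\beta)$, from which $\text{Var}(X) = E \left( X^{2} \right) - E(X)^{2}$ gives the stated variance.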

[3]

Though the $(1 - x)$ factor may look troublesome, just derive directly.
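
By the factorization theorem, it is enough to note that the joint density factors through $T$: $$ \prod_{i=1}^{n} f \left( x_{i} \right) = {{ 1 } \over { B(\alpha,\beta)^{n} }} \left( \prod_{i} x_{i} \right)^{\alpha - 1} \left( \prod_{i} \left( 1 - x_{i} \right) \right)^{\beta - 1} $$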

[a]

Derive directly using the probability density function.
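
The standard route is a change of variables: set $y_{1} = x_{1} / \left( x_{1} + x_{2} \right)$ and $y_{2} = x_{1} + x_{2}$, so that $x_{1} = y_{1} y_{2}$ and $x_{2} = \left( 1 - y_{1} \right) y_{2}$ with Jacobian $\left| J \right| = y_{2}$; integrating the joint density over $y_{2} \in (0, \infty)$ leaves the $\text{Beta} \left( \alpha_{1}, \alpha_{2} \right)$ density in $y_{1}$.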

[b]

Derive directly using the probability density function.
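
Here the map is monotone with inverse $x = \left( r_{2} / r_{1} \right) y / (1 - y)$ and $dx/dy = \left( r_{2} / r_{1} \right) / (1 - y)^{2}$, so substituting into the F-density and simplifying leaves exactly the $\text{Beta} \left( r_{1}/2 , r_{2}/2 \right)$ density.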

Code

Below is the Julia code that generates the animated GIFs of the beta distribution’s probability density function shown above.

@time using LaTeXStrings
@time using Distributions
@time using Plots

cd(@__DIR__)

x = 0:0.01:1
B = collect(0.1:0.1:10.0); append!(B, reverse(B))  # sweep β up to 10 and back down

# α = 0.5: the density is unbounded near x = 0 since α < 1
animation = @animate for β ∈ B
    plot(x, pdf.(Beta(0.5, β), x),
     color = :black,
     label = "α = 0.5, β = $(rpad(β, 3, '0'))", size = (400,300))
    xlims!(0,1); ylims!(0,5); title!(L"\mathrm{pdf\,of\,Beta} (0.5, \beta)")
end
gif(animation, "pdf0.gif")

# α = 1: the density reduces to β(1-x)^{β-1}, monotone in x
animation = @animate for β ∈ B
    plot(x, pdf.(Beta(1, β), x),
     color = :black,
     label = "α = 1, β = $(rpad(β, 3, '0'))", size = (400,300))
    xlims!(0,1); ylims!(0,5); title!(L"\mathrm{pdf\,of\,Beta} (1, \beta)")
end
gif(animation, "pdf1.gif")

# α = 2: the density vanishes at x = 0
animation = @animate for β ∈ B
    plot(x, pdf.(Beta(2, β), x),
     color = :black,
     label = "α = 2, β = $(rpad(β, 3, '0'))", size = (400,300))
    xlims!(0,1); ylims!(0,5); title!(L"\mathrm{pdf\,of\,Beta} (2, \beta)")
end
gif(animation, "pdf2.gif")

  1. Hogg et al. (2013). Introduction to Mathematical Statistics (7th Edition): p165.