Bayesian Inference in Machine Learning
Overview
Bayesian inference is a statistical method that uses Bayes' theorem to estimate the distribution of parameters from prior knowledge and observed data.
Explanation
Assume that a random variable x follows a probability distribution with parameter θ. The purpose of Bayesian inference is to estimate the distribution of θ by examining the samples drawn from x. The key point is not the value of θ, but estimating the “distribution” of θ. Given x, the probability density function related to θ is the conditional probability density function p(θ∣x), which according to Bayes’ theorem is as follows.
$$ p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)} $$

$$ \text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}} $$
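As a minimal numeric sketch of Bayes' theorem, consider a hypothetical two-coin example (the probabilities below are made up for illustration): θ is which coin was flipped, and x is observing heads once.

```python
# Discrete sanity check of Bayes' theorem (illustrative two-coin example).
# theta ∈ {fair, biased}; x = "heads" on a single flip.
prior = {"fair": 0.5, "biased": 0.5}       # p(theta)
likelihood = {"fair": 0.5, "biased": 0.9}  # p(x = heads | theta)

# evidence: p(x) = Σ_theta p(x | theta) p(theta)
evidence = sum(likelihood[t] * prior[t] for t in prior)

# posterior: p(theta | x) = p(x | theta) p(theta) / p(x)
posterior = {t: likelihood[t] * prior[t] / evidence for t in prior}

print(posterior)  # fair ≈ 0.357, biased ≈ 0.643
```

Note that the posterior sums to 1 over θ, confirming that dividing by the evidence yields a valid distribution.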
Here, the left side p(θ∣x) that we seek is called the posterior probability (distribution). It refers to the probability concerning θ after x has been drawn, i.e., after the event has occurred.
The term p(x∣θ) on the right side is known as the likelihood.
The term p(θ) on the right side is called the prior probability (distribution). It represents the knowledge about θ before observing x.
The denominator on the right, p(x), is called the evidence.
Since p(x) does not depend on θ, it acts as a constant with respect to θ. Thus, we obtain the following.
$$ p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta) $$

$$ \text{posterior} \propto \text{likelihood} \times \text{prior} $$
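The proportionality above suggests a simple grid sketch: compute likelihood × prior over a grid of candidate θ values, then normalize. The coin data below (7 heads in 10 flips) and the uniform prior are assumed purely for illustration.

```python
# Grid sketch of posterior ∝ likelihood × prior for a coin's heads
# probability theta (illustrative data: 7 heads in 10 flips).
grid = [i / 1000 for i in range(1, 1000)]  # candidate theta values
heads, flips = 7, 10                       # observed data x

def likelihood(theta):                     # Bernoulli p(x | theta)
    return theta**heads * (1 - theta)**(flips - heads)

prior = [1.0] * len(grid)                  # uniform prior p(theta)

# unnormalized posterior; normalizing plays the role of dividing by p(x)
unnorm = [likelihood(t) * p for t, p in zip(grid, prior)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

best = grid[max(range(len(grid)), key=lambda i: posterior[i])]
print(best)  # 0.7 — the mode of this posterior
```

Because the prior is uniform here, the posterior mode coincides with the proportion of heads in the data.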
Meanwhile, when an additional random variable y is involved, the definition of the conditional probability density function gives the following.

$$ p(\theta \mid x, y) = \frac{p(x, y \mid \theta)\, p(\theta)}{p(x, y)} = \frac{p(x, y \mid \theta)\, p(\theta)}{p(x, y)} \cdot \frac{p(y)}{p(y)} = \frac{p(x \mid y, \theta)\, p(\theta)}{p(x \mid y)} $$

The last equality uses p(x, y ∣ θ) = p(x ∣ y, θ) p(y ∣ θ) together with the assumption p(y ∣ θ) = p(y), i.e., that y does not depend on θ.
Maximum a Posteriori Estimation
Finding the θ that maximizes p(θ∣x) is called maximum a posteriori estimation, or simply MAP. Since p(x) does not depend on θ, the θ_MAP that maximizes the posterior probability is as follows.
$$ \theta_{\text{MAP}} = \underset{\theta}{\arg\max}\; p(\theta \mid x) = \underset{\theta}{\arg\max}\; p(x \mid \theta)\, p(\theta) $$
Furthermore, since the logarithm is a monotonically increasing function, this is equivalent to the form below.
$$ \theta_{\text{MAP}} = \underset{\theta}{\arg\max}\; p(\theta \mid x) = \underset{\theta}{\arg\max}\; p(x \mid \theta)\, p(\theta) = \underset{\theta}{\arg\max}\; \log p(\theta \mid x) = \underset{\theta}{\arg\max}\; \log \big[ p(x \mid \theta)\, p(\theta) \big] $$
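As a sketch of MAP estimation, the coin example can be given a Beta(a, b) prior (hyperparameters chosen here purely for illustration). Maximizing the log posterior over a grid then matches the known closed-form mode of the Beta posterior, (heads + a − 1) / (flips + a + b − 2).

```python
import math

# Hypothetical coin example: Beta(a, b) prior on theta, Bernoulli data.
a, b = 2.0, 2.0      # prior hyperparameters (illustrative)
heads, flips = 7, 10  # observed data

def log_posterior(theta):
    # log p(x | theta) + log p(theta), dropping theta-independent terms
    return (heads * math.log(theta) + (flips - heads) * math.log(1 - theta)
            + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

grid = [i / 1000 for i in range(1, 1000)]
theta_map = max(grid, key=log_posterior)

# Closed-form mode of the Beta posterior for comparison
closed_form = (heads + a - 1) / (flips + a + b - 2)
print(theta_map, closed_form)  # both ≈ 0.667
```

Working with the log posterior, as in the equation above, avoids numerical underflow and turns products into sums without changing the argmax.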
Maximum Likelihood Estimation
In contrast to MAP, the estimation method that maximizes only the likelihood, without taking the prior into account, is called maximum likelihood estimation, or simply ML(E). The θ_ML that maximizes the likelihood is as follows.
$$ \theta_{\text{ML}} = \underset{\theta}{\arg\max}\; p(x \mid \theta) = \underset{\theta}{\arg\max}\; \log p(x \mid \theta) $$
This is equivalent to maximum a posteriori estimation with a uniform prior: if p(θ) is constant, it does not affect the argmax, so θ_MAP = θ_ML.
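The equivalence can be checked numerically on the same illustrative coin data: adding the log of a constant (uniform) prior leaves the argmax unchanged.

```python
import math

heads, flips = 7, 10  # observed data (illustrative)
grid = [i / 1000 for i in range(1, 1000)]

def log_likelihood(theta):  # log p(x | theta) for a Bernoulli coin
    return heads * math.log(theta) + (flips - heads) * math.log(1 - theta)

# ML estimate: maximize the log-likelihood alone
theta_ml = max(grid, key=log_likelihood)

# MAP with a uniform prior: log p(theta) is constant, so the argmax is unchanged
theta_map_uniform = max(grid, key=lambda t: log_likelihood(t) + math.log(1.0))

print(theta_ml, theta_map_uniform)  # 0.7 0.7 — identical, as claimed
```

For the Bernoulli model, θ_ML is simply the sample proportion heads/flips, which the grid search recovers.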