
Hidden Markov Chain

Buildup

[Figure: items produced at fixed intervals — green for a normal item ($1$), red for a defective one ($0$)]

Imagine a machine that produces items at fixed intervals, as shown in the figure above. If green represents a normal item, recorded as $1$, and red represents a defective item that must be discarded, recorded as $0$, then the record so far is $\left( 1, 0 , 1 \right)$. Results that are actually observable like this are called the Signal.

However, the probability of producing a defective item depends on whether the machine is normal or broken. When it is normal $+$, the defect rate is only $1\%$, but when it is broken $-$, the defect rate jumps to $4\%$. Expressed as conditional probabilities, $$\begin{matrix} p ( 1 \mid + ) = 0.99 & p ( 0 \mid + ) = 0.01 \\ p ( 1 \mid - ) = 0.96 & p ( 0 \mid - ) = 0.04 \end{matrix}$$

The diagram would look as follows.

[Figure: state transition diagram of the machine ($+$: normal, $-$: broken)]

Assume, as in the diagram above, that each time the machine produces an item it breaks down with probability $10 \%$. This is a Markov chain: the state space of the stochastic process $\left\{ X_{j} \right\}_{j = 0}^{n}$ is $\left\{ + , - \right\}$, and our interest is the probability $p \left( X_{j} = + \right)$ that the machine is still normal after the $j$th production. Of course, we assume the machine starts out normal, that is, $\begin{matrix} p(X_{0} = +) = 1 \\ p(X_{0} = - ) = 0 \end{matrix}$.

If the machine is left broken, it will keep producing defective items, so the question is how we can tell that it is broken. Of course, a manager could find out with a single inspection, but in a large factory with an enormous number of machines, even such inspections become a heavy burden. Realistically, we have no choice but to guess which machine has a problem from the items it produces.

Definition

When the Markov chain (the cause) is hidden and only the resulting signal (the effect) can be observed, the Markov chain $\left\{ X_{j} \right\}_{j=0}^{n}$ is called a Hidden Markov Chain.

Example

(1)

Now let’s calculate the probability $\displaystyle p \left( X_{n} = x \mid S_{(n)} = s_{(n)} \right)$ given the signal $s_{(n)} = ( s_{1} , s_{2} , \cdots , s_{n} )$.


Define $F_{n} (x) := p \left( X_{n} = x , S_{(n)} = s_{(n)} \right)$. Then $$ \begin{align*} & p \left( X_{n} = x \mid S_{(n)} = s_{(n)} \right) \\ =& {{ p \left( X_{n} = x , S_{(n)} = s_{(n)} \right) } \over { p \left( S_{(n)} = s_{(n)} \right) }} \\ =& {{ F_{n} (x) } \over { \sum_{y} F_{n} (y) }} \end{align*} $$ so if we can compute $F_{n} (x)$, the conditional probability $\displaystyle p \left( X_{n} = x \mid S_{(n)} = s_{(n)} \right)$ follows immediately by normalization. Using that $(1)$ given the previous state $X_{n-1}$, the current state and signal $\left( X_{n} , S_{n} \right)$ are independent of the past signals $S_{(n-1)}$, and $(2)$ the signal $S_{n}$ depends only on the current state $X_{n}$ of the Markov chain $\left\{ X_{j} \right\}_{j=1}^{n}$, we get $$ \begin{align*} F_{n} (x) =& p \left( X_{n} = x , S_{(n)} = s_{(n)} \right) \\ =& p \left( X_{n} = x , S_{(n-1)} = s_{(n-1)} , S_{n} = s_{n} \right) \\ =& \sum_{y} p \left( X_{n} = x , X_{n-1} = y , S_{(n-1)} = s_{(n-1)} , S_{n} = s_{n} \right) \\ =& \sum_{y} p \left( X_{n-1} = y , S_{(n-1)} = s_{(n-1)} \right) {{ p \left( X_{n} = x , X_{n-1} = y , S_{(n-1)} = s_{(n-1)} , S_{n} = s_{n} \right)} \over { p \left( X_{n-1} = y , S_{(n-1)} = s_{(n-1)} \right) }} \\ =& \sum_{y} F_{n-1} ( y ) p \left( X_{n} = x , S_{n} = s_{n} \mid X_{n-1} = y , S_{(n-1)} = s_{(n-1)} \right) \\ \overset{(1)}{=}& \sum_{y} F_{n-1} ( y ) p \left( X_{n} = x , S_{n} = s_{n} \mid X_{n-1} = y \right) \\ =& \sum_{y} F_{n-1} ( y ) {{ p \left( S_{n} = s_{n} , X_{n} = x , X_{n-1} = y \right) } \over { p ( X_{n-1} = y ) }} \\ =& \sum_{y} F_{n-1} ( y ) {{ p ( X_{n} = x , X_{n-1} = y ) } \over { p ( X_{n-1} = y ) }} {{ p \left( S_{n} = s_{n} , X_{n-1} = y , X_{n} = x \right) } \over { p ( X_{n} = x , X_{n-1} = y ) }} \\ =& \sum_{y} F_{n-1} ( y ) p \left( X_{n} = x \mid X_{n-1} = y \right) p \left( S_{n} = s_{n} \mid X_{n} = x , X_{n-1} = y \right) \\ \overset{(2)}{=}& \sum_{y} F_{n-1} ( y ) p \left( X_{n} = x \mid X_{n-1} = y \right) p \left( S_{n} = s_{n} \mid X_{n} = x \right) \\ =& \sum_{y} F_{n-1} ( y ) p_{yx} p \left( s_{n} \mid x \right) \end{align*} $$ To summarize, we obtain the recursion $$ F_{n} (x) = p \left( s_{n} \mid x \right) \sum_{y} F_{n-1} ( y ) p_{yx} $$
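To make the recursion concrete, here is a minimal Python sketch of the update $F_{n} (x) = p \left( s_{n} \mid x \right) \sum_{y} F_{n-1} ( y ) p_{yx}$ followed by the normalization step. The function names `forward_step` and `filter_state`, and the dictionary layout of `trans` and `emit`, are illustrative choices, not something defined in this post; the first vector $F_{1}$ is passed in as-is so that any convention for the first step can be used.

```python
def forward_step(F_prev, s, trans, emit):
    """One step of the recursion: F_n(x) = p(s_n | x) * sum_y F_{n-1}(y) * p_{yx}."""
    return {
        x: emit[x][s] * sum(F_prev[y] * trans[y][x] for y in F_prev)
        for x in F_prev
    }


def filter_state(F_first, signal_rest, trans, emit):
    """Run the recursion over the remaining signal, then normalize:
    p(X_n = x | S_(n) = s_(n)) = F_n(x) / sum_y F_n(y)."""
    F = dict(F_first)
    for s in signal_rest:
        F = forward_step(F, s, trans, emit)
    total = sum(F.values())
    return {x: Fx / total for x, Fx in F.items()}
```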

(2)

Now, for the example in this post, let us calculate $\displaystyle p \left( X_{3} = + \mid S_{(3)} = (1,0,1) \right)$ under the assumptions above, that is, the probability that the machine is normal after observing the signal $(1, 0, 1)$.


In the Markov chain, the transition probabilities between states are $$\begin{matrix} p_{++} = 0.9 & p_{+-} = 0.1 \\ p_{-+} = 0 & p_{--} = 1 \end{matrix}$$ and for each state $$\begin{matrix} p ( 1 \mid + ) = 0.99 & p ( 0 \mid + ) = 0.01 \\ p ( 1 \mid - ) = 0.96 & p ( 0 \mid - ) = 0.04 \end{matrix}$$ Since the machine is assumed to be normal initially, $$ F_{1} (+) = p( 1 \mid + ) = 0.99 \\ F_{1} (-) = p (0 \mid + ) = 0.01 $$ Since $s_{2} = 0$, $$ F_{2} ( + ) = p ( 0 \mid + ) \left[ F_{1} ( + ) p_{++} + F_{1} ( - ) p_{-+} \right] = 0.01 \times [ 0.99 \times 0.9 + 0 ] = 0.00891 \\ F_{2} ( - ) = p ( 0 \mid - ) \left[ F_{1} ( + ) p_{+-} + F_{1} ( - ) p_{--} \right] = 0.04 \times [ 0.99 \times 0.1 + 0.01 \times 1 ] = 0.00436 $$ Since $s_{3} = 1$, $$ F_{3} ( + ) = p ( 1 \mid + ) \left[ F_{2} ( + ) p_{++} + F_{2} ( - ) p_{-+} \right] = 0.99 \times [ 0.00891 \times 0.9 + 0 ] = 0.00793881 \\ F_{3} ( - ) = p ( 1 \mid - ) \left[ F_{2} ( + ) p_{+-} + F_{2} ( - ) p_{--} \right] = 0.96 \times [ 0.00891 \times 0.1 + 0.00436 \times 1 ] = 0.00504096 $$ Therefore $$\displaystyle p \left( X_{3} = + \mid S_{(3)} = (1,0,1) \right) = {{ F_{3} ( + ) } \over { F_{3} ( + ) + F_{3} ( - ) }} = {{ 0.00793881 } \over { 0.01297977 }} = 0.6116 \cdots$$
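To double-check the arithmetic, here is a short self-contained Python script that simply repeats the hand calculation above, starting from the same first step $F_{1}(+) = 0.99$, $F_{1}(-) = 0.01$; the dictionary layout is again only an illustrative choice.

```python
# Transition probabilities p_{yx} and signal probabilities p(s | x) from the example.
trans = {'+': {'+': 0.9, '-': 0.1},
         '-': {'+': 0.0, '-': 1.0}}
emit  = {'+': {1: 0.99, 0: 0.01},
         '-': {1: 0.96, 0: 0.04}}

# First step as in the post: the machine is assumed normal initially.
F = {'+': 0.99, '-': 0.01}

# Remaining signal: s_2 = 0, s_3 = 1.
for s in (0, 1):
    F = {x: emit[x][s] * sum(F[y] * trans[y][x] for y in F) for x in F}

prob_normal = F['+'] / (F['+'] + F['-'])
print(round(F['+'], 8), round(F['-'], 8))  # ≈ 0.00793881, 0.00504096
print(round(prob_normal, 4))               # ≈ 0.6116
```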

Intuitively, if the defect rate is not that high and yet one out of three items is defective, we would guess that the probability of the machine being normal has dropped considerably. The calculation confirms this: the probability that the machine is normal barely exceeds $60 \%$.