## Category Archive

You are viewing the archive for the category Hidden Markov Models.

## Hidden Markov Model training using the Baum-Welch Algorithm

The Baum-Welch algorithm determines (locally) optimal parameters for a Hidden Markov Model, essentially by using three re-estimation equations.

One for the initial probabilities:

\begin{align} \pi_i &= \frac{E\left(\text{Number of times a sequence started with state}\, s_i\right)}{E\left(\text{Number of times a sequence started with any state}\right)} \end{align}

Another for the transition probabilities:

\begin{align} a_{ij} &= \frac{E\left(\text{Number of times the state changed from}\, s_i \, \text{to}\,s_j\right)}{E\left(\text{Number of times the state changed from}\, s_i \, \text{to any state}\right)} \end{align}

And the last one for the emission probabilities:

\begin{align} b_{ik} &= \frac{E\left(\text{Number of times the state was}\, s_i \, \text{and the observation was}\,v_k\right)}{E\left(\text{Number of times the state was}\, s_i\right)} \end{align}
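To make the count-and-normalize reading of these three formulas concrete before turning to the unlabeled case, here is a minimal sketch (NumPy, with a hypothetical toy corpus; states and symbols are plain indices) that estimates all three parameter sets from fully labeled sequences:

```python
import numpy as np

# Hypothetical toy setup: N hidden states, M observation symbols, and a
# labeled corpus where each training sequence comes with its state sequence.
N, M = 2, 3
corpus = [
    # (state sequence, observation sequence), as index lists
    ([0, 0, 1, 1], [0, 1, 2, 2]),
    ([1, 0, 0, 1], [2, 0, 1, 2]),
]

pi_counts = np.zeros(N)       # times a sequence started in state i
a_counts = np.zeros((N, N))   # transitions from state i to state j
b_counts = np.zeros((N, M))   # times state i emitted symbol k

for states, observations in corpus:
    pi_counts[states[0]] += 1
    for t in range(len(states) - 1):
        a_counts[states[t], states[t + 1]] += 1
    for s, o in zip(states, observations):
        b_counts[s, o] += 1

# Normalize each count table by its row sum ("... any state" denominators).
# A real implementation would smooth rows whose counts are all zero.
pi = pi_counts / pi_counts.sum()
A = a_counts / a_counts.sum(axis=1, keepdims=True)
B = b_counts / b_counts.sum(axis=1, keepdims=True)
```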

If one had a fully labeled training corpus representing all possible outcomes, this would be exactly the optimal solution: count each occurrence, normalize, and you're good. If, however, no such labeled training corpus is available – i.e. only observations are given, with no corresponding state sequences – the expected values $E(c)$ of these counts have to be estimated. This can be done (and is done) using the forward and backward probabilities $\alpha_t(i)$ and $\beta_t(i)$, as described below.
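As a preview of that estimation step, here is a minimal, unscaled sketch of one Baum-Welch re-estimation pass over a single observation sequence (NumPy; all names are mine, and a production version would rescale $\alpha$ and $\beta$ to avoid numerical underflow):

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One re-estimation step on a single observation sequence.

    pi: (N,) initial probabilities, A: (N, N) transitions,
    B: (N, M) emissions, obs: sequence of symbol indices.
    """
    obs = np.asarray(obs)
    N, T = len(pi), len(obs)

    # Forward probabilities: alpha[t, i] = P(o_1..o_t, q_t = s_i)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward probabilities: beta[t, i] = P(o_{t+1}..o_T | q_t = s_i)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    likelihood = alpha[T - 1].sum()

    # gamma[t, i] = P(q_t = s_i | obs): expected state occupancy
    gamma = alpha * beta / likelihood

    # xi[t, i, j] = P(q_t = s_i, q_{t+1} = s_j | obs): expected transitions
    xi = (alpha[:-1, :, None] * A[None]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood

    # The three re-estimation equations, expectations in place of counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```

Iterating this step (over all training sequences, with the expected counts accumulated before normalizing) monotonically increases the likelihood until the locally optimal parameters are reached.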