The second entry equals ≈ 0.44.

In a Hidden Markov Model (HMM), we have an invisible Markov chain (which we cannot observe), and each state generates at random one of k observations, which are visible to us.

Let’s look at an example. Suppose we have the Markov chain from the previous part, with three states (snow, rain and sunshine), P, the transition probability matrix, and q, the initial probabilities. This is the invisible Markov chain: suppose we are home and cannot see the weather. We can, however, feel the temperature inside our room, and suppose there are two possible observations: hot and cold, where:
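The emission probabilities can be read off the "observable layer matrix" defined in the code later in this post; as a sketch, they can be laid out like this:

```python
import pandas as pd

# Emission probabilities P(observation | hidden state), taken from the
# "observable layer matrix" defined later in this post.
b_df = pd.DataFrame({'Cold': [1.0, 0.8, 0.3],
                     'Hot':  [0.0, 0.2, 0.7]},
                    index=['Snow', 'Rain', 'Sunshine'])
print(b_df)
```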

Basic Example

As a first example, we apply the HMM to calculate the probability that it feels cold for two consecutive days. Over these two days, there are 3*3=9 options for the underlying sequence of hidden Markov states. Here is the probability computation for one of these 9 options:
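For instance, taking the hidden sequence (Sunshine, Sunshine) and using the initial, transition and emission values from the code further below, this option contributes:

```python
# P(hidden = (Sunshine, Sunshine), obs = (cold, cold))
#   = q(Sunshine) * P(cold|Sunshine) * P(Sunshine -> Sunshine) * P(cold|Sunshine)
p = 0.8 * 0.3 * 0.5 * 0.3
print(round(p, 6))  # ≈ 0.036
```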

Summing up all options gives the desired probability.
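A minimal sketch of this summation, using the initial vector q = (0, 0.2, 0.8) and the transition/emission matrices defined in the code later in this post:

```python
import numpy as np

q = np.array([0.0, 0.2, 0.8])          # initial probabilities (Snow, Rain, Sunshine)
P = np.array([[0.3, 0.3, 0.4],         # transition probability matrix
              [0.1, 0.45, 0.45],
              [0.2, 0.3, 0.5]])
b_cold = np.array([1.0, 0.8, 0.3])     # P(cold | state)

# Sum over all 3*3 = 9 hidden-state options for two cold days
total = sum(q[i] * b_cold[i] * P[i, j] * b_cold[j]
            for i in range(3) for j in range(3))
print(round(total, 4))  # ≈ 0.2368
```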

Finding Hidden States — Viterbi Algorithm

In some cases we are given a series of observations and want to find the most probable corresponding sequence of hidden states.

A brute-force solution would take exponential time (like the calculations above); a more efficient approach is the Viterbi algorithm. Its main idea is as follows: we are given a sequence of observations o₁,…,o_T. For each state i and t = 1,…,T, we define

δₜ(i) = max over x₁,…,xₜ₋₁ of P(x₁,…,xₜ₋₁, xₜ = i, o₁,…,oₜ)

That is, the maximum probability of a path which ends at time t at state i, together with our observations. The key point is that, by the Markov property, if the most likely path that ends at state i at time t passes through some state i* at time t−1, then its prefix must itself be the most likely path ending at i* at time t−1. This gives us the following forward recursion:

δₜ(j) = maxᵢ [δₜ₋₁(i) · aᵢⱼ] · αⱼ(oₜ)

Here, αⱼ(oₜ) denotes the probability of observing oₜ when the hidden Markov state is j, and aᵢⱼ is the transition probability from state i to state j.

Here is an example. Let us generate a sequence of 14 days, in which 1 denotes hot temperature and 0 denotes cold. We will use the algorithm to find the most likely weather sequence for these two weeks.

```python
import numpy as np
import pandas as pd

obs_map = {'Cold': 0, 'Hot': 1}
obs = np.array([1,1,0,1,0,0,1,0,1,1,0,0,0,1])

inv_obs_map = dict((v, k) for k, v in obs_map.items())
obs_seq = [inv_obs_map[v] for v in list(obs)]

print("Simulated Observations:\n",
      pd.DataFrame(np.column_stack([obs, obs_seq]),
                   columns=['Obs_code', 'Obs_seq']))

states = ['Cold', 'Hot']
hidden_states = ['Snow', 'Rain', 'Sunshine']
pi = [0, 0.2, 0.8]  # initial probabilities vector
state_space = pd.Series(pi, index=hidden_states, name='states')

a_df = pd.DataFrame(columns=hidden_states, index=hidden_states)
a_df.loc[hidden_states[0]] = [0.3, 0.3, 0.4]
a_df.loc[hidden_states[1]] = [0.1, 0.45, 0.45]
a_df.loc[hidden_states[2]] = [0.2, 0.3, 0.5]
print("\n HMM matrix:\n", a_df)
a = a_df.values

observable_states = states
b_df = pd.DataFrame(columns=observable_states, index=hidden_states)
b_df.loc[hidden_states[0]] = [1, 0]
b_df.loc[hidden_states[1]] = [0.8, 0.2]
b_df.loc[hidden_states[2]] = [0.3, 0.7]
print("\n Observable layer matrix:\n", b_df)
b = b_df.values
```

We get:

```
Simulated Observations:
    Obs_code Obs_seq
0         1     Hot
1         1     Hot
2         0    Cold
3         1     Hot
4         0    Cold
5         0    Cold
6         1     Hot
7         0    Cold
8         1     Hot
9         1     Hot
10        0    Cold
11        0    Cold
12        0    Cold
13        1     Hot

 HMM matrix:
          Snow  Rain Sunshine
Snow       0.3   0.3      0.4
Rain       0.1  0.45     0.45
Sunshine   0.2   0.3      0.5

 Observable layer matrix:
          Cold  Hot
Snow         1    0
Rain       0.8  0.2
Sunshine   0.3  0.7
```

Now let’s use the algorithm:

```python
path, delta, phi = viterbi(pi, a, b, obs)
state_map = {0: 'Snow', 1: 'Rain', 2: 'Sunshine'}
state_path = [state_map[v] for v in path]
pd.DataFrame().assign(Observation=obs_seq).assign(Best_Path=state_path)
```

We get:

```
Start Walk Forward

s=0 and t=1: phi[0, 1] = 2.0
s=1 and t=1: phi[1, 1] = 2.0
s=2 and t=1: phi[2, 1] = 2.0
s=0 and t=2: phi[0, 2] = 2.0
s=1 and t=2: phi[1, 2] = 2.0
s=2 and t=2: phi[2, 2] = 2.0
s=0 and t=3: phi[0, 3] = 0.0
s=1 and t=3: phi[1, 3] = 1.0
s=2 and t=3: phi[2, 3] = 1.0
s=0 and t=4: phi[0, 4] = 2.0
s=1 and t=4: phi[1, 4] = 2.0
s=2 and t=4: phi[2, 4] = 2.0
s=0 and t=5: phi[0, 5] = 0.0
s=1 and t=5: phi[1, 5] = 1.0
s=2 and t=5: phi[2, 5] = 1.0
s=0 and t=6: phi[0, 6] = 0.0
s=1 and t=6: phi[1, 6] = 1.0
s=2 and t=6: phi[2, 6] = 1.0
s=0 and t=7: phi[0, 7] = 2.0
s=1 and t=7: phi[1, 7] = 2.0
s=2 and t=7: phi[2, 7] = 2.0
s=0 and t=8: phi[0, 8] = 0.0
s=1 and t=8: phi[1, 8] = 1.0
s=2 and t=8: phi[2, 8] = 1.0
s=0 and t=9: phi[0, 9] = 2.0
s=1 and t=9: phi[1, 9] = 2.0
s=2 and t=9: phi[2, 9] = 2.0
s=0 and t=10: phi[0, 10] = 2.0
s=1 and t=10: phi[1, 10] = 2.0
s=2 and t=10: phi[2, 10] = 2.0
s=0 and t=11: phi[0, 11] = 0.0
s=1 and t=11: phi[1, 11] = 1.0
s=2 and t=11: phi[2, 11] = 1.0
s=0 and t=12: phi[0, 12] = 0.0
s=1 and t=12: phi[1, 12] = 1.0
s=2 and t=12: phi[2, 12] = 1.0
s=0 and t=13: phi[0, 13] = 0.0
s=1 and t=13: phi[1, 13] = 1.0
s=2 and t=13: phi[2, 13] = 1.0
--------------------------------------------------
Start Backtrace

path[12] = 1
path[11] = 1
path[10] = 1
path[9] = 2
path[8] = 2
path[7] = 1
path[6] = 2
path[5] = 1
path[4] = 1
path[3] = 2
path[2] = 1
path[1] = 2
path[0] = 2
```

Which leads to the output:

We used the following implementation, based on [2]:

```python
def viterbi(pi, a, b, obs):
    nStates = np.shape(b)[0]
    T = np.shape(obs)[0]

    # init blank path
    path = np.zeros(T, dtype=int)
    # delta --> highest probability of any path that reaches state i
    delta = np.zeros((nStates, T))
    # phi --> argmax by time step for each state
    phi = np.zeros((nStates, T))

    # init delta and phi
    delta[:, 0] = pi * b[:, obs[0]]
    phi[:, 0] = 0

    print('\nStart Walk Forward\n')
    # the forward algorithm extension
    for t in range(1, T):
        for s in range(nStates):
            delta[s, t] = np.max(delta[:, t-1] * a[:, s]) * b[s, obs[t]]
            phi[s, t] = np.argmax(delta[:, t-1] * a[:, s])
            print('s={s} and t={t}: phi[{s}, {t}] = {phi}'.format(s=s, t=t, phi=phi[s, t]))

    # find optimal path
    print('-'*50)
    print('Start Backtrace\n')
    path[T-1] = np.argmax(delta[:, T-1])
    for t in range(T-2, -1, -1):
        path[t] = phi[path[t+1], t+1]
        print('path[{}] = {}'.format(t, path[t]))

    return path, delta, phi
```

Learning and the Baum-Welch Algorithm

A similar approach to the one above can be used for parameter learning of an HMM. We have some dataset, and we want to find the parameters which fit the HMM best. The Baum-Welch algorithm is an iterative (expectation-maximization) procedure which finds a local maximum of the probability of the observations P(O|M), where M denotes the model with the parameters we want to fit. Since the model lets us compute P(O|M), we can use a Bayesian approach to estimate P(M|O) and converge to a (local) optimum.
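As a sketch (not the implementation from [2]), a minimal NumPy version of the Baum-Welch re-estimation loop for a discrete HMM could look like this; the function name and the random initialisation are illustrative assumptions:

```python
import numpy as np

def baum_welch(obs, n_states, n_obs, n_iter=50, seed=0):
    """Minimal Baum-Welch (EM) sketch for a discrete HMM."""
    rng = np.random.default_rng(seed)
    # random row-stochastic initialisation of pi, A (transitions), B (emissions)
    pi = rng.random(n_states); pi /= pi.sum()
    A = rng.random((n_states, n_states)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_obs)); B /= B.sum(axis=1, keepdims=True)
    T = len(obs)
    for _ in range(n_iter):
        # E-step: scaled forward pass
        alpha = np.zeros((T, n_states)); c = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        # scaled backward pass
        beta = np.zeros((T, n_states)); beta[-1] = 1.0
        for t in range(T-2, -1, -1):
            beta[t] = (A @ (B[:, obs[t+1]] * beta[t+1])) / c[t+1]
        # state and transition posteriors
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((T-1, n_states, n_states))
        for t in range(T-1):
            xi[t] = alpha[t][:, None] * A * (B[:, obs[t+1]] * beta[t+1])[None, :]
            xi[t] /= xi[t].sum()
        # M-step: re-estimate parameters from expected counts
        pi = gamma[0]
        A = xi.sum(axis=0) / (gamma[:-1].sum(axis=0)[:, None] + 1e-12)
        for k in range(n_obs):
            B[:, k] = gamma[obs == k].sum(axis=0)
        B /= (gamma.sum(axis=0)[:, None] + 1e-12)
    return pi, A, B
```

Running it on the 14-day observation sequence above (with 3 hidden states and 2 observation symbols) returns row-stochastic estimates of pi, A and B; since EM only finds a local maximum, the result depends on the initialisation.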

HMMs have various applications, from character recognition to financial forecasting (detecting regimes in markets).

References

[1] https://cse.buffalo.edu/~jcorso/t/CSE555/files/lecture_hmm.pdf

[2] http://www.blackarbs.com/blog/introduction-hidden-markov-models-python-networkx-sklearn/2/9/2017