In the following, we assume that you have installed GHMM including the Python bindings. Download the UnfairCasino.py-file.

You might have seen the unfair casino example (Chair Biological Sequence Analysis, Durbin et. al, 1998), where a dealer in a casino occasionally exchanges a fair dice with a loaded one. We do not know if and when this exchange happens, we only can observe the throws. This stochastic process we will model with a HMM.

Below `>` is your shell prompt and
`>>>` is the prompt of the Python interpreter and
you should type whatever follows the prompt omitting the
blank. Alternatively, you can enter the commands in a text file `
foo.py` and execute that text file with `python2.3 -i foo.py`. The
`-i` option will bring up the Python prompt after executing `
foo.py` with all the variables and functions you have defined in the
file.

> python2.3 Python 2.3 (#4, Aug 21 2003, 11:51:20) [GCC 2.95.3 20010315 (release)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import ghmm >>> help('ghmm') Help on module ghmm: NAME ghmm - The Design of ghmm.py ...

There are two states in our example. Either the dice is fair (state 0; Python indexes arrays like C and C++ from 0) or it is loaded (state 1). We will define the transition and emission matrices explicitly.

To save us some typing (namely `ghmm.` before any class or
function from GHMM), we will import all symbols from ghmm directly.

Our alphabet are the numbers on the dice {1,2,3,4,5,6}. To be consistent with Python's range function the upper limit is>>> from ghmm import *

The transition matrix>>> sigma = IntegerRange(1,7)

>>> A = [[0.9, 0.1], [0.3, 0.7]] >>> efair = [1.0 / 6] * 6 >>> print efair [0.16666666666666666, 0.16666666666666666, ... 0.16666666666666666] >>> eloaded = [3.0 / 13, 3.0 / 13, 2.0 / 13, 2.0 / 13, 2.0 / 13, 1.0 / 13] >>> B = [efair, eloaded] >>> pi = [0.5] * 2 >>> m = HMMFromMatrices(sigma, DiscreteDistribution(sigma), A, B, pi) >>> print m

At this time GHMM still exposes some internals. For example, symbols
from discrete alphabets are internally represented as *0,...,|alphabet|
-1* and the alphabet is responsible for translating between internal
and external representations.

First we sample from the model, specifying the length of the sequence, and then transfer the observations to the external representation.

>>> obs_seq = m.sampleSingle(20) >>> print obs_seq >>> obs = map(sigma.external, obs_seq) >>> print obs [1, 5, 4, 1, 3, 4, 1, 1, 3, 4, 3, 5, 5, 1, 5, 2, 1, 5, 3, 5]

In a real application we would want to train the model. That is, we would like to estimate the parameters of the model from data with the goal to maximize the likelihood of that data given the model. This is done for HMMs with the Baum-Welch algorithm which is actually an instance of the well-known Expectation-Maximization procedure for missing data problems in statistics. The process is iterative and hence sometime called re-estimation.

We can use `m` as a starting point and use some 'real'
data to train it. The variable `train_seq` is an
`EmissionSequence` or `SequenceSet` object.

>>> from UnfairCasino import train_seq >>> m.baumWelch(train_seq)

If you want more fine-grained control over the learning procedure, you can do single steps and monitor the relevant diagnostics yourself, or employ meta-heuristics such as noise-injection to avoid getting stuck in local maxima. [Currently for continous emissions only]

A Viterbi-path is a sequence of states *Q* maximizing the probability
*P[Q|observations, model]*. If we compute a
Viterbi-path for a unfair casino sequence of throws, the state
sequence tells us when the casino was fair (state 0) and when it
was unfair (state 1).

>>> from UnfairCasino import test_seq >>> v = m.viterbi(test_seq) >>> print v

You can investigate how successful your model is at detecting cheating. Just create artificial observations and see if you can detect them.

>>> my_seq = EmissionSequence(sigma, [1] * 20 + [6] * 10 + [1] * 40) >>> print my_seq >>> print m.viterbi(my_seq)