Table of Contents

Module: ghmm /amd/bernoulli/1/home/abt_vin/georgi/hmm/0.7/ghmm//ghmmwrapper/ghmm.py

Python bindings for the GHMM C-library.

The Design of ghmm.py

HMMs are stochastic models that encode a probability density over sequences of symbols. These symbols can be discrete letters (A, C, G and T for DNA; 1, 2, 3, 4, 5, 6 for dice), real numbers (weather measurements over time: temperature), vectors of either, or combinations thereof (weather again: temperature, pressure, precipitation).

Note: We always speak of emissions, emission sequences and so forth when we refer to the sequence of symbols. Other names for the same objects are observation and observation sequence, respectively.

The objects one has to deal with in HMM modelling are the following:

1) The domain the emissions come from: the EmissionDomain. The term domain is meant in the mathematical sense and encompasses both finite discrete alphabets and continuous sets such as the real numbers or intervals of the reals.

For technical reasons there can be two representations of an emission symbol: an external one and an internal one. The external representation is the view of the application using ghmm.py; the internal one is used by both ghmm.py and the ghmm C library. The two representations can coincide, but this is not guaranteed. Discrete alphabets of size k are represented internally as [0, 1, 2, ..., k-1]. It is the domain object's job to provide the mapping between the representations in both directions.

NOTE: Do not make assumptions about the internal representations. They might change.
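The mapping between external and internal representations for a discrete alphabet can be pictured with the minimal sketch below, which assumes that each symbol's internal code is simply its position in the alphabet. The class SimpleAlphabet and its methods are hypothetical illustrations, not the Alphabet API of ghmm.py; real code should go through the domain object and not rely on the codes themselves.

```python
class SimpleAlphabet:
    """Hypothetical stand-in for a discrete EmissionDomain (not ghmm's Alphabet)."""

    def __init__(self, symbols):
        # The order of the external symbols fixes their internal codes 0..k-1.
        self.symbols = list(symbols)
        self._code = {s: i for i, s in enumerate(self.symbols)}
        if len(self._code) != len(self.symbols):
            raise ValueError("alphabet contains duplicate symbols")

    def __len__(self):
        return len(self.symbols)

    def internal(self, symbol):
        # External -> internal; raises KeyError for symbols outside the domain.
        return self._code[symbol]

    def external(self, code):
        # Internal -> external; reject codes outside 0..k-1 (including negatives).
        if not 0 <= code < len(self.symbols):
            raise IndexError("internal code %d out of range" % code)
        return self.symbols[code]

    def internal_sequence(self, seq):
        return [self.internal(s) for s in seq]

    def external_sequence(self, codes):
        return [self.external(c) for c in codes]


dna = SimpleAlphabet("ACGT")
codes = dna.internal_sequence("GATTACA")   # [2, 0, 3, 3, 0, 1, 0]
dice = SimpleAlphabet(range(1, 7))         # faces 1..6 -> codes 0..5
```

Keeping both directions in one object means the rest of the code only ever handles internal codes, which is the division of labour the text assigns to the domain object.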

2) Every domain must provide a distribution, which is usually parameterized. A distribution associated with a domain should allow us to efficiently compute $\Pr[x \mid \theta]$, the probability of a symbol $x$ given the distribution parameters $\theta$.

The distribution defines the type of distribution used to model emissions in every state of the HMM. The type is identical for all states; only the parameterization differs from state to state.
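The idea of one distribution type shared by all states, with per-state parameters, can be illustrated by the following sketch. It evaluates the probability (or density) of an emission x given one state's parameters, once for a Gaussian and once for a discrete distribution. The two-state parameter tables and function names are hypothetical examples, not ghmm.py objects.

```python
import math


def gaussian_density(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma**2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical two-state continuous model: both states use a Gaussian
# (same distribution type) but each has its own (mu, sigma).
GAUSS_PARAMS = [(0.0, 1.0), (5.0, 2.0)]


def continuous_emission(state, x):
    mu, sigma = GAUSS_PARAMS[state]
    return gaussian_density(x, mu, sigma)


# Hypothetical two-state discrete model over internal codes 0..3 (e.g. A, C, G, T):
# each state's parameters are one probability vector, so evaluation is a lookup.
DISCRETE_PARAMS = [
    [0.25, 0.25, 0.25, 0.25],  # state 0: uniform
    [0.10, 0.40, 0.40, 0.10],  # state 1: favours codes 1 and 2
]


def discrete_emission(state, code):
    return DISCRETE_PARAMS[state][code]
```

For the discrete case the cost of evaluating a state's distribution is a single table lookup, which is one reason discrete emissions are cheap.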

3) We consider sequences of emissions from a single emission domain (EmissionSequence) and, very often, sets of such sequences (SequenceSet).

4) The HMM: An HMM consists of two major components: a Markov chain over states (implemented as a weighted directed graph with adjacency and inverse-adjacency lists) and the per-state emission distributions. For reasons of efficiency the HMM itself is static as far as the topology of the underlying Markov chain (and, obviously, the EmissionDomain) is concerned: you cannot add or delete transitions in an HMM.

Transition probabilities and the parameters of the per-state emission distributions can easily be modified. In particular, Baum-Welch re-estimation is supported. While a transition cannot be deleted from the graph, you can set its probability to zero, which has the same effect from a theoretical point of view. However, the corresponding edge is still traversed during computation.
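Why a zero-probability edge still costs time can be seen in the sketch below: a fixed-topology transition graph with adjacency and inverse-adjacency lists, plus a textbook forward algorithm that sums over each state's in-edges. TransitionGraph, set_transition and forward_likelihood are hypothetical names, and this is a plain-Python illustration of the standard algorithm, not the C implementation behind ghmm.py.

```python
class TransitionGraph:
    """Hypothetical sketch (not ghmm's classes): a Markov chain with fixed
    topology, stored as adjacency (out-edge) and inverse-adjacency (in-edge) lists."""

    def __init__(self, n_states, edges):
        # edges: iterable of (source, target, probability); the edge set is fixed here.
        self.n_states = n_states
        self.out_edges = [[] for _ in range(n_states)]
        self.in_edges = [[] for _ in range(n_states)]
        self.prob = {}
        for i, j, p in edges:
            self.out_edges[i].append(j)
            self.in_edges[j].append(i)
            self.prob[(i, j)] = p

    def set_transition(self, i, j, p):
        # Probabilities of existing edges may change; edges are never added or removed.
        if (i, j) not in self.prob:
            raise KeyError("no transition %d -> %d" % (i, j))
        self.prob[(i, j)] = p


def forward_likelihood(graph, pi, B, codes):
    """P(codes | model) by the forward algorithm for a discrete HMM.

    B[j][c] is the probability that state j emits internal code c. For each
    state j the sum runs over its in-edges, so an edge whose probability was
    set to 0 contributes nothing but is still visited."""
    alpha = [pi[j] * B[j][codes[0]] for j in range(graph.n_states)]
    for c in codes[1:]:
        alpha = [
            B[j][c] * sum(alpha[i] * graph.prob[(i, j)] for i in graph.in_edges[j])
            for j in range(graph.n_states)
        ]
    return sum(alpha)


g = TransitionGraph(2, [(0, 0, 0.9), (0, 1, 0.1), (1, 0, 0.2), (1, 1, 0.8)])
pi = [0.5, 0.5]
B = [[0.5, 0.5], [0.1, 0.9]]
p = forward_likelihood(g, pi, B, [0, 1])   # 0.176
```

After g.set_transition(0, 1, 0.0) and g.set_transition(0, 0, 1.0) the likelihood of [0, 1] drops to 0.166, but state 0 is still listed in g.in_edges[1], so the loop keeps visiting that edge.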

States in HMMs are referred to by their integer index. State sequences are simply lists of integers.
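Because a state sequence is just a list of integers, it can index directly into a transition matrix A, an emission matrix B and a prior pi. The sketch below computes the joint probability of an emission sequence (as internal codes) and a state path for a discrete HMM; joint_probability and the small matrices are hypothetical illustrations, not part of ghmm.py.

```python
def joint_probability(A, B, pi, states, codes):
    """P(codes, states) for a discrete HMM, with the state path given as a
    plain list of integer state indices."""
    if not states or len(states) != len(codes):
        raise ValueError("state and emission sequences must be non-empty and of equal length")
    p = pi[states[0]] * B[states[0]][codes[0]]
    for t in range(1, len(states)):
        # Transition from the previous state, then emission in the current state.
        p *= A[states[t - 1]][states[t]] * B[states[t]][codes[t]]
    return p


A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.5, 0.5], [0.1, 0.9]]
pi = [0.5, 0.5]
path = [0, 1]                                   # a state sequence: just a list of ints
p = joint_probability(A, B, pi, path, [0, 1])   # 0.5*0.5 * 0.1*0.9 = 0.0225
```

Summing this quantity over all four possible paths of length 2 gives the likelihood of the emission sequence [0, 1], 0.176 for these matrices.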

If you want to store application-specific data for each state, you have to do it yourself.

Subclasses of HMM implement specific types of HMM. The type depends on the EmissionDomain, the Distribution used, the specific extensions to the standard HMM, and so forth.

5) HMMFactory: This provides a way of constructing HMMs. Classes derived from HMMFactory can read HMMs from files, construct them explicitly (for a discrete alphabet, from a transition matrix, an emission matrix and a prior), or serve as the basis for GUI-based model building.

There are several ways of using the HMMFactory.

Static construction:

HMMOpen(fileName)    # Calls an object of type HMMOpen instantiated in ghmm
HMMOpen(fileName, type=HMM.FILE_XML)
HMMFromMatrices(emission_domain, distribution, A, B, pi)    # B is a list of distribution parameters

Examples:

hmm = HMMOpen("some-hmm.xml")
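HMMFromMatrices takes a transition matrix A, emission parameters B and a prior pi. Checking these before the call can catch shape errors early. The helper below is hypothetical and not part of ghmm.py; it assumes a discrete emission domain and that every row of A, every row of B and pi itself should sum to one, which may not hold for every model type the library supports.

```python
def check_discrete_matrices(A, B, pi, tol=1e-9):
    """Hypothetical sanity check for the A, B, pi arguments of a discrete
    from-matrices construction: shapes must agree and, by assumption here,
    every row of A and B and the vector pi must be a probability vector.
    Returns (number of states, alphabet size)."""
    n = len(A)
    if n == 0 or any(len(row) != n for row in A):
        raise ValueError("A must be a non-empty square matrix")
    if len(B) != n or len(pi) != n:
        raise ValueError("B and pi need exactly one entry per state")
    k = len(B[0])
    if any(len(row) != k for row in B):
        raise ValueError("all rows of B must have the alphabet size")
    for name, rows in (("A", A), ("B", B), ("pi", [pi])):
        for r, row in enumerate(rows):
            if any(x < 0 for x in row) or abs(sum(row) - 1.0) > tol:
                raise ValueError("%s row %d is not a probability vector" % (name, r))
    return n, k


# A fair/loaded dice example over the faces 1..6 (internal codes 0..5).
A = [[0.9, 0.1], [0.3, 0.7]]
B = [[1.0 / 6] * 6, [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]]
pi = [0.5, 0.5]
shape = check_discrete_matrices(A, B, pi)   # (2, 6)
```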

Imported modules   
from Graph import Graph, EdgeWeight
import StringIO
import copy
import ghmmhelper
import ghmmwrapper
from math import log, ceil
import modhmmer
from os import path
import re
from string import join
import sys
import xmlutil
Functions   
HMMDiscriminativePerformance
HMMDiscriminativeTraining
HMMwriteList
IntegerRange
SequenceSetOpen
readMultipleHMMERModels
verbose
  HMMDiscriminativePerformance 
HMMDiscriminativePerformance ( HMMList,  SeqList )

Exceptions   
TypeError, 'Inputs not equally long'
  HMMDiscriminativeTraining 
HMMDiscriminativeTraining (
        HMMList,
        SeqList,
        nrSteps=50,
        gradient=0,
        )

Exceptions   
TypeError, 'Inputs not equally long'
TypeError, 'discriminative training is at the moment only implemented on discrete HMMs'
UnknownInputType, "TrainingType " + str( gradient ) + " not supported."
  HMMwriteList 
HMMwriteList ( fileName,  hmmList )

  IntegerRange 
IntegerRange ( a,  b )

  SequenceSetOpen 
SequenceSetOpen ( emissionDomain,  fileName )

Reads a sequence file with multiple sequence sets.

Returns a list of SequenceSet objects.

Exceptions   
IOError, 'File ' + str( fileName ) + ' not found.'
  readMultipleHMMERModels 
readMultipleHMMERModels ( fileName )

Reads a file containing multiple HMMs in HMMER format, returns list of HMM objects.

Exceptions   
IOError, 'File ' + str( fileName ) + ' not found.'
  verbose 
verbose ( message,  level=1 )

Classes   

Alphabet

Discrete, finite alphabet

BackgroundDistribution

ComplexEmissionSequence

A complex emission sequence holds the encoded representations of one

ContinousDistribution

DiscreteDistribution

A DiscreteDistribution over an Alphabet: The discrete distribution

DiscreteEmissionHMM

HMMs with discrete emissions.

DiscretePairDistribution

A DiscreteDistribution over TWO Alphabets: The discrete distribution

Distribution

Abstract base class for distributions over EmissionDomains

EmissionDomain

Abstract base class for emissions produced by an HMM.

EmissionSequence

An EmissionSequence contains the internal representation of

Float

GHMMError

Base class for exceptions in this module.

GHMMOutOfDomain

GaussianDistribution

XXX attributes unused at this point

GaussianEmissionHMM

HMMs with Gaussian distribution as emissions.

GaussianMixtureDistribution

GaussianMixtureHMM

HMMs with mixtures of Gaussians as emissions.

HMM

The HMM base class.

HMMFactory

An HMMFactory is the base class of HMM factories.

HMMFromMatricesFactory

HMMOpenFactory

IndexOutOfBounds

InvalidModelParameters

LabelDomain

To be used for labelled HMMs. We could use an Alphabet directly, but this way it is more explicit.

MixtureContinousDistribution

NoValidCDataType

PairHMM

Pair HMMs with discrete emissions over multiple alphabets.

PairHMMOpenFactory

Factory to create PairHMM objects from XML files.

SequenceCannotBeBuild

SequenceSet

SequenceSetSubset

SequenceSetSubset contains a subset of the sequences from a SequenceSet object.

StateLabelHMM

Labelled HMMs with discrete emissions.

UnknownInputType

badCPointer



This document was automatically generated on Fri Jan 20 14:56:59 2006 by HappyDoc version WORKING