GHMM: General Hidden Markov Model library

The General Hidden Markov Model library (GHMM) is a freely available C library implementing efficient data structures and algorithms for basic and extended HMMs with discrete and continuous emissions. It comes with Python wrappers which provide a much nicer interface and added functionality. The GHMM is licensed under the LGPL.

Defining an HMM with two states which output either heads or tails (think of two coins which get exchanged occasionally) is as easy as this:


> python
Python 2.6.4 (r264:75706, Mar 16 2010, 09:46:46) 
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ghmm
>>> sigma = ghmm.Alphabet(['h','t'])
>>> m = ghmm.HMMFromMatrices(sigma, ghmm.DiscreteDistribution(sigma),
... [[0.9, 0.1], [0.3, 0.7]], [[0.5, 0.5], [0.2, 0.8]], [1.0, 0.0]) 
>>> print m
...
>>> help(m)
...

Here the last three arguments for HMMFromMatrices are the transition matrix, the emission matrix, and the initial distribution. For example, 0.9 is the probability of staying in the first state. The first state emits heads or tails with equal probability; the second state produces tails with a probability of 0.8 and heads with a probability of 0.2. Because the initial distribution is [1.0, 0.0], the HMM always starts in the first state.
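To make the roles of the three matrices concrete, here is a minimal sketch in plain Python that samples from the two-coin model and computes the probability of an observation sequence with the forward algorithm. It deliberately does not use the GHMM API; the names STATES, SYMBOLS, draw, sample and likelihood are hypothetical helpers introduced only for this illustration.

```python
# Plain-Python illustration of the two-coin model; this is NOT the GHMM API.
import random

STATES = [0, 1]
SYMBOLS = ['h', 't']
A = [[0.9, 0.1], [0.3, 0.7]]   # transition matrix: A[i][j] = P(next state j | state i)
B = [[0.5, 0.5], [0.2, 0.8]]   # emission matrix:   B[i][k] = P(symbol k | state i)
PI = [1.0, 0.0]                # initial distribution over states


def draw(weights, rng):
    """Draw an index i with probability weights[i] (weights sum to 1)."""
    r = rng.random()
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return i
    return len(weights) - 1  # guard against floating-point round-off


def sample(length, rng):
    """Generate `length` symbols by walking the hidden state chain."""
    state = draw(PI, rng)
    out = []
    for _ in range(length):
        out.append(SYMBOLS[draw(B[state], rng)])  # emit from current state
        state = draw(A[state], rng)               # then move to next state
    return out


def likelihood(obs):
    """Forward algorithm: probability of the sequence `obs` under the model."""
    k = SYMBOLS.index(obs[0])
    alpha = [PI[i] * B[i][k] for i in STATES]
    for sym in obs[1:]:
        k = SYMBOLS.index(sym)
        alpha = [sum(alpha[i] * A[i][j] for i in STATES) * B[j][k]
                 for j in STATES]
    return sum(alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    print("".join(sample(20, rng)))
    print(likelihood(['h', 't']))
```

For the sequence heads, tails the forward recursion gives 0.5 * 0.9 * 0.5 + 0.5 * 0.1 * 0.8 = 0.265: the model starts in the first state, emits heads with probability 0.5, and then either stays and emits tails from the first coin or switches and emits tails from the second.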

Features

Discrete and continuous emissions
Mixtures of PDFs for continuous emissions
Non-homogeneous Markov chains
Pair HMMs
Clustering and mixture modelling for HMMs
Graphical Editor HMMEd
Python bindings
XML-based file format
Portable (autoconf, automake)

Development

The GHMM is under active development by Alexander Schliep's group for bioinformatics at Rutgers University. Development is hosted on SourceForge at http://sourceforge.net/projects/ghmm/, where you have access to the Subversion repository, mailing lists and forums.

Publications

The GHMM has been used in numerous scientific publications by the core group. A GHMM software publication is forthcoming. The GHMM has also been used extensively as a teaching tool in bioinformatics and machine learning classes at the Freie Universität Berlin and at Rutgers.

A. Schönhuth, I.G. Costa and A. Schliep. Semi-supervised Clustering of Yeast Gene Expression. In Cooperation in Classification and Data Analysis, Springer, 151–160, 2009.

Michael Seifert. Analyzing Microarray Data Using Homogenous and Inhomogenous Hidden Markov Models. Diplomarbeit im Studiengang Bioinformatik, Martin-Luther-Universität Halle (2006).

A. Schliep, I. G. Costa, C. Steinhoff, A. A. Schönhuth. Analyzing Gene Expression Time-Courses. IEEE/ACM Transactions on Computational Biology and Bioinformatics (Special Issue on Machine Learning for Bioinformatics), 2005, 2(3):179-193.

Matthias Heinig. Development of a Pair HMM based Gene Finder for the Paramecium Genome. Master Thesis, FU Berlin (2005).

I. G. Costa, A. Schönhuth, A. Schliep. The Graphical Query Language: a tool for analysis of gene expression time-courses. Bioinformatics, 2005, 21(10):2544-2545.

A. Schliep, C. Steinhoff, A. Schönhuth. Robust inference of groups in gene expression time-courses using mixtures of HMM. Proceedings of the ISMB 2004.

A. Schliep, B. Georgi, W. Rungsarityotin, I. G. Costa, A. Schönhuth. The General Hidden Markov Model Library: Analyzing Systems with Unobservable States. Proceedings of the Heinz-Billing-Preis 2004: 121-136.

A. Schliep, A. Schönhuth, C. Steinhoff. Using Hidden Markov Models to Analyze Gene Expression Time Course Data. Proceedings of the ISMB 2003. Bioinformatics, 2003 Jul; 19 Suppl 1: I255-I263.

B. Knab, A. Schliep, B. Steckemetz and B. Wichern. Model-Based Clustering With Hidden Markov Models and its Application to Financial Time-Series Data. In Between Data Science and Applied Data Analysis, Springer, 561–569, 2003.

B. Georgi. A Graph-based Approach to Clustering of Profile Hidden Markov Models. Bachelor Thesis, FU Berlin.

A. Weisse. Detecting Circular Permutations in Remote Homologue Proteins. Bachelor Thesis, FU Berlin.

Bernd Wichern. Hidden Markov Models for the analysis of data from saving and loan banks. Ph.D. Thesis. ZAIK, University of Cologne, Germany (2001). In German.

Bernhard Knab. Extension of Hidden Markov Models for the analysis of financial time-series data. Ph.D. Thesis. ZAIK, University of Cologne, Germany (2000). In German.