The General Hidden Markov Model library (GHMM) is a freely available C library implementing efficient data structures and algorithms for basic and extended HMMs with discrete and continuous emissions. It comes with Python wrappers that provide a more convenient interface and additional functionality. The GHMM is licensed under the LGPL.

Defining an HMM with two states that output either heads or tails (think of two coins that are swapped occasionally) is as easy as this:

```python
> python
Python 2.6.4 (r264:75706, Mar 16 2010, 09:46:46)
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ghmm
>>> sigma = ghmm.Alphabet(['h', 't'])
>>> m = ghmm.HMMFromMatrices(sigma, ghmm.DiscreteDistribution(sigma),
...     [[0.9, 0.1], [0.3, 0.7]], [[0.5, 0.5], [0.2, 0.8]], [1.0, 0.0])
>>> print m
...
>>> help(m)
...
```

Here the last three arguments for HMMFromMatrices are the transition matrix, the emission matrix (one row of heads/tails probabilities per state) and the initial state distribution.
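To show what such a model computes, here is a pure-Python sketch of the forward algorithm, which gives the probability of an observed sequence of heads and tails under the two-coin parameters above. It is not GHMM's implementation; the names A, B, pi, SYMBOLS and forward_likelihood are illustrative assumptions.

```python
# Pure-Python sketch of the forward algorithm (not GHMM's implementation).
# Parameters copied from the two-coin example: state 0 is a fair coin,
# state 1 is a coin biased towards tails.
A = [[0.9, 0.1], [0.3, 0.7]]  # A[i][j]: probability of moving from state i to j
B = [[0.5, 0.5], [0.2, 0.8]]  # B[i][k]: probability that state i emits symbol k
pi = [1.0, 0.0]               # initial state distribution
SYMBOLS = ['h', 't']          # symbol index k corresponds to SYMBOLS[k]


def forward_likelihood(obs, A, B, pi, symbols=SYMBOLS):
    """Return P(obs | model) for a non-empty sequence of symbols.

    Sums over all hidden state paths in O(T * N^2) time instead of
    enumerating the N^T paths one by one.
    """
    idx = [symbols.index(o) for o in obs]
    n = len(pi)
    # alpha[i] = P(o_1 .. o_t, state at time t = i)
    alpha = [pi[i] * B[i][idx[0]] for i in range(n)]
    for k in idx[1:]:
        # Sum over predecessor states, then emit the next symbol from state j.
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][k]
                 for j in range(n)]
    return sum(alpha)


print(round(forward_likelihood("ht", A, B, pi), 6))  # 0.265
```

For "ht" the result is 0.225 + 0.04: the model starts with the fair coin, shows heads, and then either keeps the fair coin or switches to the biased one before showing tails. Multiplying raw probabilities like this underflows on long sequences; practical implementations rescale alpha at each step or work with logarithms, which this sketch omits to keep the recursion visible.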

Main features of the GHMM:

- Discrete and continuous emissions
- Mixtures of PDFs for continuous emissions
- Non-homogeneous Markov chains
- Pair HMMs
- Clustering and mixture modelling for HMMs
- Graphical editor HMMEd
- Python bindings
- XML-based file format
- Portable (autoconf, automake)
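As a rough illustration of what a mixture of PDFs means for continuous emissions, here is a pure-Python sketch (not GHMM's API) of the emission density of a single state whose output follows a weighted sum of Gaussian densities. The function names and parameter values are hypothetical. In an HMM with continuous emissions, such a density takes the place that a row of the discrete emission matrix has for a coin.

```python
import math


def normal_pdf(x, mean, var):
    """Density of the univariate normal distribution N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_emission(x, weights, means, variances):
    """Emission density of one state: sum_k w_k * N(x; mean_k, var_k).

    The weights must be non-negative and sum to 1 so that the result is
    itself a probability density.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must be non-negative and sum to 1")
    return sum(w * normal_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Hypothetical state whose output mixes two Gaussian components.
weights = [0.3, 0.7]
means = [0.0, 2.0]
variances = [1.0, 0.5]
print(mixture_emission(1.0, weights, means, variances))
```

A mixture lets a single state describe multimodal output that one Gaussian could not capture, at the cost of more parameters to estimate per state.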

The GHMM is under active development by Alexander Schliep's bioinformatics group at Rutgers University. Development is hosted at SourceForge (http://sourceforge.net/projects/ghmm/), which provides access to the Subversion repository, mailing lists and forums.

The GHMM has been used in numerous scientific publications by the core group, and a dedicated GHMM software publication is forthcoming. The GHMM has also been used extensively as a teaching tool in bioinformatics and machine learning classes at the Freie Universität Berlin and at Rutgers. Selected publications and theses:

A. Schönhuth, I. G. Costa and A. Schliep. Semi-supervised Clustering of Yeast Gene Expression. In Cooperation in Classification and Data Analysis, Springer, 151–160, 2009.

Michael Seifert. Analyzing Microarray Data Using Homogenous and Inhomogenous Hidden Markov Models. Diploma thesis (Diplomarbeit) in Bioinformatics, Martin-Luther-Universität Halle, 2006.

A. Schliep, I. G. Costa, C. Steinhoff, A. Schönhuth. Analyzing Gene Expression Time-Courses. IEEE/ACM Transactions on Computational Biology and Bioinformatics (Special Issue on Machine Learning for Bioinformatics), 2(3):179–193, 2005.

Matthias Heinig. Development of a Pair HMM based Gene Finder for the Paramecium Genome. Master's thesis, FU Berlin, 2005.

I. G. Costa, A. Schönhuth, A. Schliep. The Graphical Query Language: a tool for analysis of gene expression time-courses. Bioinformatics, 21(10):2544–2545, 2005.

A. Schliep, C. Steinhoff, A. Schönhuth. Robust inference of groups in gene expression time-courses using mixtures of HMMs. Proceedings of ISMB 2004.

A. Schliep, B. Georgi, W. Rungsarityotin, I. G. Costa, A. Schönhuth. The General Hidden Markov Model Library: Analyzing Systems with Unobservable States. Proceedings of the Heinz-Billing-Price 2004, 121–136.

A. Schliep, A. Schönhuth, C. Steinhoff. Using Hidden Markov Models to Analyze Gene Expression Time Course Data. Proceedings of ISMB 2003. Bioinformatics, 19(Suppl 1):i255–i263, July 2003.

B. Knab, A. Schliep, B. Steckemetz and B. Wichern. Model-Based Clustering With Hidden Markov Models and its Application to Financial Time-Series Data. In Between Data Science and Applied Data Analysis, Springer, 561–569, 2003.

B. Georgi. A Graph-based Approach to Clustering of Profile Hidden Markov Models. Bachelor's thesis, FU Berlin.

A. Weisse. Detecting Circular Permutations in Remote Homologue Proteins. Bachelor's thesis, FU Berlin.

Bernd Wichern. Hidden Markov Models for the analysis of data from savings and loan banks. Ph.D. thesis, ZAIK, University of Cologne, Germany, 2001. In German.

Bernhard Knab. Extension of Hidden Markov Models for the analysis of financial time-series data. Ph.D. thesis, ZAIK, University of Cologne, Germany, 2000. In German.