The Graphical Query Language:
A GHMM-based tool for querying and clustering Gene-Expression time-course data

(c) 2003-2004 Alexander Schliep (c) 2004-2011 Ivan G. Costa and Alexander Schliep

GQL is a suite of tools for analyzing time-course experiments. Currently, it is adapted to gene expression data. The two main tools are GQLQuery, for querying data sets, and GQLCluster, which computes groupings using a number of methods (model-based clustering using HMMs as cluster models and estimation of a mixture of HMMs).

Availability

GQL is freely available under the GPL. The first public release of GQL is available in source form and as binaries for Windows and Mac.

Note that downloading and using GQL implies acceptance of the GPL. GQL is not freeware, nor public domain and the copyright will be enforced. As the GPL has strong consequences for any work derived from GQL, commercial entities can inquire about non-exclusive licenses.

If you use GQL in your research, please cite the papers listed under Documentation.

GQLQuery: Querying time-courses

The GUI has been ported to Python using Tkinter and the brand-new Python bindings for GHMM. It runs on all Linux/Unix systems. Executable binaries for Mac and Windows are provided.

GQLCluster: Finding groups in time-courses

Linux & Unix Version

Prerequisites

You will need to install the following packages before you install GHMM. Version 1.0 of GQL only works with the most recent version of GHMM.

Installation

Troubleshooting

Mac OS

Prerequisites

You will need Mac OS 10.3 or a later version.

Installation

Windows

Prerequisites

You will need Windows 98 or a later version.

Installation

See readme.txt, available in the Unix/Linux version, for a non-binary installation on Windows.

Documentation

The papers below describe the methods implemented in GQL.

A. Schliep, A. Schönhuth, C. Steinhoff. Using Hidden Markov Models to Analyze Gene Expression Time Course Data. Proceedings of the ISMB 2003. Bioinformatics, Jul 2003, 19 Suppl 1:I255-I263.
A. Schliep, C. Steinhoff, A. Schönhuth. Robust inference of groups in gene expression time-courses using mixtures of HMM. Proceedings of the ISMB 2004. Bioinformatics, Aug 2004, 20 Suppl 1:I283-I289.
A. Schliep, I. G. Costa, C. Steinhoff, A. Schönhuth. Analysing gene expression time-courses. IEEE Transactions on Computational Biology and Bioinformatics, to appear.
I. G. Costa, A. Schönhuth, A. Schliep. The Graphical Query Language: a tool for analysis of gene expression time-courses. Bioinformatics, 2005, 21(10):2544-2545.

File formats:

Both tools support the GHMM file formats for input data and model descriptions (see GHMM). They also read input data from standard tab-separated files, such as those used by most gene expression analysis tools. In this format, each line represents a gene. The first column holds the gene identifier, the second column any type of annotation of the gene, and the remaining columns the measured time points. Missing values should be encoded either as 'NaN' or by leaving the position empty. Sample files of all formats are provided in examples.



YHR124W	 meiosis                                -0.377685	-0.427071	-0.479749	 0.175438
YGR072W	 mRNA decay, nonsense-mediated unknown  -0.067600	-0.664033	-0.412644	 0.090134
YGR145W	 unknown                       	         0.266238	-0.854138	-0.103595	 0.371387
YIR031C  allantoin utilization                  -0.017010	 0.650807	 0.461851	-0.146432
YJR010W	 methionine biosynthesis                      NaN	 0.847968	 0.078140	-0.137952
YMR172W	 osmotic stress response                -0.734039	-0.258823	-0.135069	 0.127290
YIR032C  ureidoglycolate hydrolase              -0.287924	 0.701009	 0.464117	-0.160077
YHR053C	 metallothionein                        -0.263116	 0.780098	-0.363840	-0.396216

Example of a gene expression file with 4 time points. The second column holds functional annotation of the genes.
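To make the layout concrete, here is a minimal sketch of how such a file could be read in Python. It is not GQL's own reader: the function name read_expression_table and the in-memory sample are assumptions made for illustration, and the sketch treats both 'NaN' and an empty field as a missing value.

```python
import csv
import io
import math


def read_expression_table(handle):
    """Parse tab-separated rows: gene id, annotation, then one value per time point.

    Hypothetical helper for illustration only; it is not part of GQL.
    """
    genes = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and rows without a gene identifier.
        if not row or not row[0].strip():
            continue
        gene_id = row[0].strip()
        annotation = row[1].strip() if len(row) > 1 else ""
        values = []
        for field in row[2:]:
            text = field.strip()
            # A missing value is either the text 'NaN' or an empty position.
            if text == "" or text.lower() == "nan":
                values.append(math.nan)
            else:
                values.append(float(text))
        genes[gene_id] = (annotation, values)
    return genes


# Three rows in the format described above; the last one leaves a position empty.
sample = (
    "YHR124W\tmeiosis\t-0.377685\t-0.427071\t-0.479749\t0.175438\n"
    "YJR010W\tmethionine biosynthesis\tNaN\t0.847968\t0.078140\t-0.137952\n"
    "YMR172W\tosmotic stress response\t-0.734039\t\t-0.135069\t0.127290\n"
)
table = read_expression_table(io.StringIO(sample))
```

Each entry of table maps a gene identifier to its annotation and a list with one value per time point, with missing measurements stored as NaN.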

GQL also uses tab-separated files for partial labels. These files have only two columns: the first contains the gene id and the second a numerical label (from 1 to n).



YHR124W  1
YGR072W  1
YGR145W  1
YIR031C  2 
YJR010W	 2 
YMR172W	 2 
YIR032C  2
YHR053C	 2
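A label file of this kind could be read as sketched below. This is an illustrative sketch, not GQL code: the names read_partial_labels and group_by_label are hypothetical, and splitting on any whitespace (rather than strictly on tabs) is an assumption chosen so that trailing blanks are tolerated.

```python
import io


def read_partial_labels(handle):
    """Read 'gene id, label' pairs, one per line (hypothetical helper, not part of GQL)."""
    labels = {}
    for line_number, line in enumerate(handle, start=1):
        # Assumption: fields are split on any whitespace, so trailing blanks are ignored.
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, got {len(fields)}")
        gene_id, label = fields[0], int(fields[1])
        # Labels are numbered from 1 to n.
        if label < 1:
            raise ValueError(f"line {line_number}: label must be >= 1, got {label}")
        labels[gene_id] = label
    return labels


def group_by_label(labels):
    """Collect the gene ids that share each label."""
    groups = {}
    for gene_id, label in labels.items():
        groups.setdefault(label, []).append(gene_id)
    return groups


sample = "YHR124W\t1\nYGR072W\t1\nYIR031C\t2 \nYJR010W\t2 \n"
labels = read_partial_labels(io.StringIO(sample))
groups = group_by_label(labels)
```

Grouping the genes by label gives a quick check that every label from 1 to n has at least one gene assigned to it.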

Release Notes

Version 1.0:
This version has been in heavy use over the last few months. There are still some missing features and bugs.