The Graphical Query Language:
A GHMM-based tool for querying and clustering Gene-Expression time-course data

(c) 2003-2004 Alexander Schliep (c) 2004-2005 Ivan Costa

Documentation

The papers below describe the methods implemented in GQL.

A. Schliep, A. Schönhuth, C. Steinhoff. Using Hidden Markov Models to Analyze Gene Expression Time Course Data. Proceedings of the ISMB 2003. Bioinformatics, Jul 2003; 19 Suppl 1: I255-I263.
A. Schliep, C. Steinhoff, A. Schönhuth. Robust inference of groups in gene expression time-courses using mixtures of HMM. Proceedings of the ISMB 2004. Bioinformatics, Aug 2004; 20 Suppl 1: I283-I289.
A. Schliep, I. G. Costa, C. Steinhoff, A. Schönhuth. Analysing gene expression time-courses. IEEE Transactions on Computational Biology and Bioinformatics, to appear.
I. G. Costa, A. Schönhuth, A. Schliep. The Graphical Query Language: a tool for analysis of gene expression time-courses. Bioinformatics, 2005; 21(10): 2544-2545.

File formats:

Both tools support the GHMM file formats for input data and model descriptions (see GHMM). They also read standard tab-separated files of the kind used by most gene expression analysis tools. In this format, each line represents a gene and the columns hold the measured time points. The first column holds the gene identifier and the second column any annotation of the gene; the remaining columns hold the expression values, one per time point. Missing values should be encoded either as 'NaN' or by leaving the position empty. Sample files in all formats are provided in examples.



YHR124W	 meiosis                                -0.377685	-0.427071	-0.479749	 0.175438
YGR072W	 mRNA decay, nonsense-mediated unknown  -0.067600	-0.664033	-0.412644	 0.090134
YGR145W	 unknown                       	         0.266238	-0.854138	-0.103595	 0.371387
YIR031C  allantoin utilization                  -0.017010	 0.650807	 0.461851	-0.146432
YJR010W	 methionine biosynthesis                      NaN	 0.847968	 0.078140	-0.137952
YMR172W	 osmotic stress response                -0.734039	-0.258823	-0.135069	 0.127290
YIR032C  ureidoglycolate hydrolase              -0.287924	 0.701009	 0.464117	-0.160077
YHR053C	 metallothionein                        -0.263116	 0.780098	-0.363840	-0.396216

Example of a gene expression file with 4 time points. The second column holds the functional annotation of each gene.
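
The tab-separated layout described above can be read with a few lines of Python. The following is only a minimal sketch and not part of GQL: the function name parse_expression_lines is hypothetical, and the row EXAMPLE1 is an invented row that demonstrates an empty position; the other two rows are taken from the example file. Both ways of marking a missing value are mapped to a floating-point NaN.

```python
import math

def parse_expression_lines(lines):
    """Map gene id -> (annotation, list of expression values).

    Hypothetical helper, not a GQL function. Assumes tab-separated
    fields: gene id, annotation, then one value per time point.
    """
    profiles = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        gene_id = fields[0].strip()
        annotation = fields[1].strip() if len(fields) > 1 else ""
        values = []
        for cell in fields[2:]:
            cell = cell.strip()
            # A missing value is either an empty position or 'NaN'.
            if cell == "" or cell.lower() == "nan":
                values.append(math.nan)
            else:
                values.append(float(cell))
        profiles[gene_id] = (annotation, values)
    return profiles

sample = [
    "YHR124W\tmeiosis\t-0.377685\t-0.427071\t-0.479749\t0.175438",
    "YJR010W\tmethionine biosynthesis\tNaN\t0.847968\t0.078140\t-0.137952",
    # Hypothetical row: the second time point is left empty.
    "EXAMPLE1\tunknown\t0.1\t\t0.3\t0.4",
]
profiles = parse_expression_lines(sample)
```

To read a real file, the same function can be given an open file object, since it only iterates over lines.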

GQL also uses tab-separated files for partial labels. These files have only two columns: the first contains the gene id and the second a numerical label (from 1 to n).



YHR124W  1
YGR072W  1
YGR145W  1
YIR031C  2 
YJR010W	 2 
YMR172W	 2 
YIR032C  2
YHR053C	 2
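
A label file of this form can be read in the same spirit. Again, this is a minimal sketch under the assumption that the two columns are separated by a tab; parse_label_lines is a hypothetical name and not a GQL function. The sample rows are taken from the example above.

```python
def parse_label_lines(lines):
    """Map gene id -> integer label (hypothetical helper, not a GQL function)."""
    labels = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        gene_id, label = line.split("\t")[:2]
        # int() ignores surrounding whitespace, so trailing spaces are harmless.
        labels[gene_id.strip()] = int(label)
    return labels

sample_labels = [
    "YHR124W\t1",
    "YGR072W\t1",
    "YIR031C\t2 ",
]
labels = parse_label_lines(sample_labels)

# Group the gene ids by label, e.g. to inspect each labelled group.
groups = {}
for gene_id, label in labels.items():
    groups.setdefault(label, []).append(gene_id)
```

Because the labels are partial, genes that do not appear in the label file simply have no entry in the resulting dictionary.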

Release Notes

Version 1.0:
We have had this version in heavy use over the past months. Some features are still missing, and some bugs remain.