The Graphical Query Language
GQL is a suite of tools for analyzing time-course experiments. Currently, it is adapted to gene expression data. The two main tools are GQLQuery, for querying data sets, and GQLCluster, which computes groupings using one of several methods (model-based clustering with HMMs as cluster models, and estimation of a mixture of HMMs).
GQL is freely available under the GPL. The first public release of GQL is available in source form and as binaries for Windows or Mac.
Note that downloading and using GQL implies acceptance of the GPL. GQL is neither freeware nor in the public domain, and the copyright will be enforced. As the GPL has strong consequences for any work derived from GQL, commercial entities can inquire about non-exclusive licenses.
If you use GQL in your research, please cite the papers listed below.
The GUI has been ported to Python using Tkinter and the brand-new Python bindings for GHMM. It runs on all Linux/Unix boxes. Executable binaries for Mac and Windows are provided.
You will need to install the following packages before you install GHMM. Version 1.0 of GQL will only work with the most recent version of GHMM.
You will need Mac OS 10.3 or later.
You will need Windows 98 or later.
The papers below describe the methods implemented in GQL.
A. Schliep, A. Schönhuth, C. Steinhoff. Using Hidden Markov Models to Analyze Gene Expression Time Course Data. Proceedings of the ISMB 2003. Bioinformatics. 2003 Jul; 19 Suppl 1: I255-I263
A. Schliep, C. Steinhoff, A. Schönhuth. Robust inference of groups in gene expression time-courses using mixtures of HMM. Proceedings of the ISMB 2004. Bioinformatics, Aug 2004; 20 Suppl 1: I283-I289.
A. Schliep, I. G. Costa, C. Steinhoff, A. Schönhuth. Analysing gene expression time-courses, IEEE Transactions on Computational Biology and Bioinformatics, to appear.
I. G. Costa, A. Schönhuth, A. Schliep. The Graphical Query Language: a tool for analysis of gene expression time-courses, Bioinformatics, 2005, 21(10):2544-2545.
Both tools support GHMM file formats for input data and model descriptions (see GHMM). They also read standard tab-separated files of the kind used by most gene expression analysis tools. In this format, each line represents a gene and the columns hold the measured time points. The first column holds the gene identifier and the second column any type of annotation of the gene. Missing values should be encoded either as 'NaN' or by leaving the field empty. Sample files in all formats are provided in examples.
YHR124W  meiosis                                -0.377685  -0.427071  -0.479749   0.175438
YGR072W  mRNA decay, nonsense-mediated unknown  -0.067600  -0.664033  -0.412644   0.090134
YGR145W  unknown                                 0.266238  -0.854138  -0.103595   0.371387
YIR031C  allantoin utilization                  -0.017010   0.650807   0.461851  -0.146432
YJR010W  methionine biosynthesis                      NaN   0.847968   0.078140  -0.137952
YMR172W  osmotic stress response                -0.734039  -0.258823  -0.135069   0.127290
YIR032C  ureidoglycolate hydrolase              -0.287924   0.701009   0.464117  -0.160077
YHR053C  metallothionein                        -0.263116   0.780098  -0.363840  -0.396216
Example of a gene expression file with 4 time points. The second column holds functional annotation of the genes.
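As a rough illustration of the tab-separated layout, the following Python sketch reads such data into a dictionary. It is not part of GQL: the function name parse_expression_text and the row EXAMPLE1 are hypothetical, and the sketch assumes exactly one identifier column, one annotation column, and no header line.

```python
import math

def parse_expression_text(text):
    """Parse tab-separated expression data into {gene_id: (annotation, values)}.

    Hypothetical helper, not part of GQL. Assumes no header line.
    """
    profiles = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        gene_id, annotation = fields[0], fields[1]
        values = []
        for field in fields[2:]:
            field = field.strip()
            # A missing value is either an empty field or some spelling of "NaN".
            if field == "" or field.lower() == "nan":
                values.append(math.nan)
            else:
                values.append(float(field))
        profiles[gene_id] = (annotation, values)
    return profiles

# Two rows taken from the example file, plus one made-up row (EXAMPLE1)
# that shows a missing value written as an empty field.
sample = (
    "YHR124W\tmeiosis\t-0.377685\t-0.427071\t-0.479749\t0.175438\n"
    "YJR010W\tmethionine biosynthesis\tNaN\t0.847968\t0.078140\t-0.137952\n"
    "EXAMPLE1\tunknown\t0.1\t\t0.3\t0.4\n"
)

profiles = parse_expression_text(sample)
for gene_id, (annotation, values) in profiles.items():
    print(gene_id, annotation, values)
```

Splitting on the tab character only, rather than on arbitrary whitespace, keeps annotations that contain spaces intact and preserves empty fields as missing values.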
GQL also uses tab-separated files for partial labels. These files have only two columns: the first contains the gene id and the second a numerical label (from 1 to n).
YHR124W  1
YGR072W  1
YGR145W  1
YIR031C  2
YJR010W  2
YMR172W  2
YIR032C  2
YHR053C  2
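A label file of this kind can be read in a few lines of Python. The sketch below is only an illustration and not GQL code: parse_labels is a hypothetical name, and the check that labels are integers of at least 1 is an assumption based on the "from 1 to n" description.

```python
def parse_labels(text):
    """Parse a two-column, tab-separated label file into {gene_id: label}.

    Hypothetical helper, not part of GQL.
    """
    labels = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        gene_id, label_text = line.split("\t")
        label = int(label_text)
        if label < 1:
            # Assumption: labels run from 1 to n, so smaller values are rejected.
            raise ValueError(f"label for {gene_id} must be >= 1, got {label}")
        labels[gene_id] = label
    return labels

# Three rows taken from the example label file.
sample_labels = "YHR124W\t1\nYGR072W\t1\nYIR031C\t2\n"
labels = parse_labels(sample_labels)

# Group the gene ids by label, preserving file order within each group.
groups = {}
for gene_id, label in labels.items():
    groups.setdefault(label, []).append(gene_id)

print(groups)
```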
Version 1.0:
We have had this version in heavy use for the last few months. There are still some missing features and bugs.