A simple hyper-geometric approach
for discovering putative transcription factor binding sites
Y. Barash, G. Bejerano and N. Friedman
In Algorithms in Bioinformatics: Proc. First International Workshop.
Abstract
A central issue in molecular biology is understanding the regulatory
mechanisms that control gene expression. The recent flood of genomic
and post-genomic data
opens the way for computational methods that elucidate the key
components that play a role in these mechanisms. One important
consequence is the ability to recognize groups of genes that are
co-expressed using microarray expression data. We then wish to identify
in silico putative transcription factor binding sites in the promoter
regions of these genes that might explain the co-regulation and hint at
possible regulators.
In this paper we describe a simple and fast, yet powerful,
two-stage approach to this task.
Using a rigorous hyper-geometric statistical analysis and
a straightforward computational procedure we find small conserved
sequence kernels.
These are then stochastically expanded into PSSMs using an EM-like procedure.
We demonstrate the utility and speed of our methods by applying them
to several data sets from recent literature. We also compare these
results with those of MEME when run on the same sets.
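To make the hyper-geometric analysis mentioned above more concrete, the following is a minimal sketch of the standard hyper-geometric tail test applied to a candidate kernel. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact scoring, and the names count_containing and hypergeom_tail, as well as the toy sequences, are hypothetical. The sketch assumes N promoter sequences in total, K of which contain the kernel, and a co-expressed group of n genes, k of which contain it.

```python
from math import comb

def count_containing(seqs, kmer):
    """Number of sequences that contain kmer at least once (hypothetical helper)."""
    return sum(kmer in s for s in seqs)

def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of promoter sequences
    K: sequences (out of N) that contain the kernel
    n: size of the co-expressed group
    k: sequences in the group that contain the kernel
    """
    lo = max(k, 0)
    hi = min(K, n)
    total = comb(N, n)
    # comb() returns 0 when the lower index exceeds the upper one,
    # so impossible terms contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return favourable / total

# Toy data, assumed for illustration only.
all_promoters = ["ACGTGA", "TTACGT", "GGGCCC", "ACGTAC", "TTTTTT",
                 "CACGTT", "GATTAC", "CCCGGG", "AAAAAA", "TGCATG"]
group = all_promoters[:3]   # pretend these three genes are co-expressed
kernel = "ACGT"

N = len(all_promoters)
K = count_containing(all_promoters, kernel)
n = len(group)
k = count_containing(group, kernel)
p_value = hypergeom_tail(N, K, n, k)
```

A small p_value indicates that the kernel occurs in the co-expressed group more often than expected if the group had been drawn at random from all promoters. In practice many candidate kernels are tested, so some multiple-testing correction would be needed; that step is omitted here.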
nir@cs.huji.ac.il