Discovering Conserved DNA Motifs in Upstream Regulatory Regions of Co-Expressed Genes
Xiaole Liu, Jun S. Liu, Douglas L. Brutlag
Stanford Medical Informatics, Stanford University
The development of high-throughput genome sequencing and gene expression techniques has created a demand for data-mining tools. BioProspector, a C program that uses a Gibbs sampling strategy, examines the upstream regions of genes that share a gene expression pattern and searches them for regulatory sequence motifs. BioProspector uses a Markov background model to capture the dependencies among non-motif bases, which greatly improves the specificity of the reported motifs. The parameters of the Markov background model are either estimated from user-specified sequences or pre-computed from whole-genome sequences. A new motif scoring function allows each input sequence to contain zero, one, or multiple copies of the motif. In addition, BioProspector can model gapped motifs and motifs with palindromic patterns, both of which are prevalent in prokaryotes. Together, these modifications greatly improve the program's performance. Besides showing preliminary success in finding the binding motifs for S. cerevisiae RAP1, B. subtilis RNA polymerase, and E. coli CRP, we have used BioProspector to find the σ54 motif in the M. xanthus genome, many B. subtilis motifs from the DBTBS collection of promoters, and motifs from yeast expression data.
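The idea of scoring motif sites against a Markov background, rather than a background that treats bases independently, can be illustrated with a short Python sketch. It is not BioProspector's C implementation: the function names (`train_markov_background`, `consensus_pwm`, `site_log_ratio`, `find_sites`), the pseudocount, the order-1 toy example, and the single fixed score threshold used to report zero, one, or several sites per sequence are all assumptions made for illustration. The sketch estimates P(base | preceding k bases) from background sequences and scores each window as the log ratio of its probability under a position weight matrix (PWM) versus the Markov background.

```python
import math
from collections import defaultdict
from itertools import product

BASES = "ACGT"


def train_markov_background(seqs, order=2, pseudocount=1.0):
    """Estimate P(base | preceding `order` bases) from background sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            if base in BASES and all(c in BASES for c in context):
                counts[context][base] += 1
    model = {}
    for ctx_tuple in product(BASES, repeat=order):
        ctx = "".join(ctx_tuple)
        total = sum(counts[ctx].values()) + 4 * pseudocount
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in BASES}
    return model


def consensus_pwm(consensus, match=0.85):
    """Hypothetical helper: a PWM that favours `consensus` at every column."""
    other = (1.0 - match) / 3
    return [{b: (match if b == c else other) for b in BASES} for c in consensus]


def site_log_ratio(seq, start, pwm, model, order):
    """log P(site | PWM) - log P(site | Markov background, given upstream bases).

    Simplification (assumed): the changed conditioning of bases just after
    the site is ignored.
    """
    score = 0.0
    for j, column in enumerate(pwm):
        i = start + j
        base = seq[i]
        score += math.log(column[base])
        context = seq[i - order:i] if i >= order else None
        if context in model:
            score -= math.log(model[context][base])
        else:
            score -= math.log(0.25)  # too little (or ambiguous) context: uniform fallback
    return score


def find_sites(seq, pwm, model, order, threshold=0.0):
    """Return every (start, score) at or above `threshold`: zero, one, or many per sequence."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = site_log_ratio(seq, start, pwm, model, order)
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy GC-rich background; a real run would use genome-wide or user-supplied sequences.
    background = train_markov_background(["GCGCGGCCGCGCGGCC" * 3], order=1)
    hits = find_sites("TATAGCGCTATA", consensus_pwm("TATA"), background, order=1)
    print([start for start, _ in hits])
```

With this toy GC-rich background, both TATA copies (starts 0 and 8) pass the zero threshold, while a purely GC sequence yields no sites at all. Because each background probability is conditioned on the preceding bases, compositionally biased stretches such as GC runs are explained by the background and are less likely to be mistaken for motif signal.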
BioProspector requires the user to specify a motif width. Recently, J. S. Liu and his student developed BioOptimizer, an algorithm that automatically adjusts a user-specified motif width to optimize the information content of the motif. The program can be downloaded from http://www.people.fas.harvard.edu/~junliu/BioOptimizer/.
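To make "the motif's information" concrete, the following Python sketch measures the information of each PWM column as its relative entropy (in bits) against a background distribution, then trims uninformative flanking columns. This is only an illustration of the quantity involved, not BioOptimizer's algorithm: the uniform background, the `min_bits=0.5` cutoff, and the function names `column_information` and `trim_low_information` are hypothetical choices.

```python
import math

BASES = "ACGT"


def column_information(column, background=None):
    """Relative entropy (bits) of one PWM column against a background distribution."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    return sum(p * math.log2(p / background[b]) for b, p in column.items() if p > 0)


def trim_low_information(pwm, min_bits=0.5):
    """Drop flanking columns below `min_bits` (hypothetical cutoff).

    Returns the trimmed PWM and the offset of its first column in the original.
    """
    info = [column_information(col) for col in pwm]
    left, right = 0, len(pwm)
    while left < right and info[left] < min_bits:
        left += 1
    while right > left and info[right - 1] < min_bits:
        right -= 1
    return pwm[left:right], left
```

A fully conserved column scores 2 bits against a uniform background and a uniform column scores 0, so a width-4 PWM whose outer columns are uniform would be trimmed to its two informative core columns.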
Obtaining a local copy of BioProspector:
BioProspector is free of charge for academic use. For details, see the Brutlag Bioinformatics Group Software Download and Academic License Instructions pages.
Liu X, Brutlag DL, Liu JS. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput. 2001;:127-38.