Publications
Papers
2008
Deshpande
O, Batzoglou S, Feldman M, Cavalli-Sforza L. A serial founder effect model for human settlements out of Africa. Proceedings of the Royal Society B, in
press.
Valouev
A, Johnson DS, Sundquist A, Medina C, Elisabeth A, Batzoglou S, Myers RM, Sidow
A. Genome-wide analysis of transcription
factor binding sites based on ChIP-Seq data. Nature Methods, in press.
Do
CB, Foo Chuan-Sheng, Batzoglou S. A max-margin model for efficient
simultaneous alignment and folding of RNA sequences. Accepted in ISMB 2008.
Do
CB, Batzoglou S. What is the EM algorithm? Nature Biotechnology, in press.
Sundquist
A, Fratkin E, Do CB, Batzoglou S. Effect
of genetic divergence in identifying ancestral origin using HAPAA. Genome Research 18:676-682, 2008.
Proceedings of the Twelfth Annual International Conference on Computational
Molecular Biology, (RECOMB 2008), pp. 423.
Flannick J, Novak A, Do CB, Srinivasan BS, Batzoglou S. Automatic parameter learning for multiple network alignment. Proceedings of the Twelfth Annual International Conference on Computational Molecular Biology, (RECOMB 2008), pp. 214-231, 2008.
2007
Sundquist A, Bigdeli S, Jalili R, El-Sayed YY, Taslimi MM,
Druzin ML, Waller S, Pullen KM, Batzoglou S, Ronaghi M. Bacterial
flora typing with deep, targeted, chip-based Pyrosequencing. BMC Microbiology, 7:108, 2007.
Gross
SS, Do CB, Sirota M, Batzoglou S. A
discriminative, phylogeny-free approach to multiple informant de novo gene
prediction. Genome
Biology, 8:R269, 2007.
Drosophila
Comparative Genome Sequencing and Analysis Consortium. Evolution of genes
and genomes in the context of the Drosophila phylogeny. Nature 450: 203-218, 2007.
Srinivasan BS, Shah NH, Flannick JA, Abeliuk E, Novak AF, Batzoglou S. Current progress in network research: toward reference networks for key model organisms. Briefings in Bioinformatics 8(5): 318-32, 2007.
The
ENCODE Project Consortium. Identification and analysis of functional
elements in 1% of the human genome by the ENCODE pilot project. Nature 447: 799-816, 2007.
Sundquist
A, Ronaghi M, Tang H, Pevzner P, Batzoglou S. Whole-genome
sequencing and assembly with high-throughput short-read technologies. PLOS One, 2(5): e484, 2007.
Margulies E, Cooper GM, Asimenos G, Thomas DJ, Dewey CN,
Siepel A, Birney E, Keefe D, Schwartz AS, Hou M, Taylor J, Nikolaev S,
Montoya-Burgos JI, Löytynoja A, Whelan S, Pardi F, Massingham T, Brown JB,
Bickel P, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Schuler G, Church D,
Rosenbloom KR, Kent WJ, NISC Comparative Sequencing Program, Baylor College of
Medicine Human Genome Sequencing Center, Washington University Genome
Sequencing Center, Broad Institute, UCSC Genome Browser Team, Antonarakis SE,
Batzoglou S, Goldman N, Hardison R, Haussler D, Miller W, Pachter L, Green ED,
Sidow A. Analyses of deep mammalian sequence alignments and constraint
predictions for 1% of the human genome. Genome Research 17 (6): 760-774, 2007.
Gross
SS, Russakovsky O, Do CB, Batzoglou S. Training
conditional random fields for maximum labelwise accuracy. NIPS 2006.
2006
Phuong
TM, Do CB, Edgar RC, Batzoglou S. Multiple
alignment of protein sequences with repeats and rearrangements. Nucleic Acids Research 34(20):
5932-5942, 2006.
Naughton
B, Fratkin E, Batzoglou S, Brutlag DL. A
graph-based motif detection algorithm models complex nucleotide dependencies in
transcription factor binding sites. Nucleic Acids Research 34(20): 5730-5739, 2006.
Srinivasan
BS, Do CB, Batzoglou S. RECOMB
2006: Evidence for Intelligent (Algorithm) Design. Genome Biology, 7:322, 2006.
Flannick
J, Novak A, Srinivasan BS, McAdams HH, Batzoglou S. Graemlin:
General and Robust Alignment of Multiple Large Interaction Networks.
Genome Research, 16:1169
Davydov
E, Batzoglou S. A computational model for RNA multiple structural
alignment. Theoretical
Computer Science, Special Issue on Combinatorial Pattern Matching, in press.
Do
CB, Woods DA, Batzoglou S. CONTRAfold:
RNA Secondary Structure Prediction without Physics-Based Models. ISMB 2006 Conference Proceedings,
Bioinformatics 22:e90
Fratkin
E, Naughton B, Brutlag DL, Batzoglou S. MotifCut:
Finding Regulatory Motifs with Maximum Density Subgraphs. ISMB 2006 Conference Proceedings,
Bioinformatics 22: e150
Edgar
RC, Batzoglou S. Multiple
Sequence Alignment. Current
Opinion in Structural Biology, 16: 368
Do
CB, Gross SS, Batzoglou S. CONTRAlign:
Discriminative Training for Protein Sequence Alignment. Proceedings
of the Tenth Annual International Conference on Computational Molecular
Biology, (RECOMB 2006), pp. 160
Srinivasan
BS, Novak A, Flannick J, Batzoglou S, McAdams H. Integrated
Protein Interaction Networks for 11 Microbes. Proceedings of
the Tenth Annual International Conference on Computational Molecular Biology,
(RECOMB 2006), pp. 1
2005
Galagan
JE, Calvo SE, Cuomo C, Ma L-J, Wortman J, Batzoglou S, Lee S-I, Baştürkmen
M, Spevak CC, Clutterbuck J, Kapitonov V, Jurka J, Scazzocchio C, Farman M,
Butler J, Purcell S, Harris S, Braus GH, Draht O, Busch S, D'Enfert C, Bouchier
C, Goldman GH, Bell-Pedersen D, Griffiths-Jones S, Doonan JH, Yu J, Vienken K,
Pain A, Freitag M, Selker EU, Archer DB, Peñalva MA, Oakley BR, Momany M,
Tanaka T, Kumagai T, Asai K, Machida M, Nierman WC, Denning DW, Caddick M,
Hynes M, Paoletti M, Fischer R, Miller B, Dyer P, Sachs MS, Osmani SA, Birren
B. Sequencing of Aspergillus nidulans and comparative analysis with
A. fumigatus and A. oryzae. Nature 438:1105
Flannick
J, Batzoglou S. Using
multiple alignments to improve seeded local alignment algorithms. Nucleic
Acids Research 33(14): 4563
Manohar
A, Batzoglou S. TreeRefiner:
a tool for refining a multiple alignment on a phylogenetic tree.
Proceedings of the CSB 2005, pp 111
Cooper
GM, Stone EA, Asimenos G, NISC Comparative Sequencing Program, Green ED,
Batzoglou S, Sidow A. Distribution and intensity of
constraint in mammalian genomic sequence. Genome Research 15: 901
Do
CB, Mahabhashyam MS, Brudno M, Batzoglou S.
Batzoglou
S. The
many faces of sequence alignment. Briefings in Bioinformatics 1:
6
Batzoglou
S. Algorithmic
Challenges in Mammalian Genome Sequence Assembly. Special Review,
In: Dunn M, Jorde L, Little P, Subramaniam S, editors. Encyclopedia of
genomics, proteomics and bioinformatics. Hoboken
(New Jersey): John Wiley and Sons, 2005.
2004
Liu Y, Wei
L, Batzoglou S, Brutlag DL, Liu JS, Liu S. A suite of
web-based programs to search for transcriptional regulatory motifs. Nucleic
Acids Research 32: W204
Davydov E, Batzoglou
Brudno M,
Poliakov A, Salamov A, Cooper GM, Sidow A, Rubin EM, Solovyev V, Batzoglou S,
Dubchak I. Automated
whole-genome multiple alignment of Rat, Mouse, and Human. Genome Research 14: 685-692, 2004.
Cooper GM,
Brudno M, Stone EA, Dubchak I, Batzoglou S, Sidow A. Characterization of
evolutionary rates and constraints in three mammalian genomes. Genome Research 14: 539-548, 2004.
Liu Y, Liu
XS, Wei L, Altman RB, Batzoglou S. Eukaryotic regulatory element
conservation and identification using comparative genomics. Genome Research, 14: 451
2003
Lee S-I,
Batzoglou S. ICA-based
clustering of genes from microarray expression data. In Advances in Neural Information
Processing Systems (NIPS),
Lee S-I,
Batzoglou S. Application
of independent component analysis to microarrays. Genome Biology 2003, 4:R76.
Shan N,
Couronne O, Pennacchio LA, Brudno M, Batzoglou S, Joy S, Bethel W, Rubin EM,
Hamann B, Dubchak
Brudno M,
Chapman MA, Gottgens B, Batzoglou S, Morgenstern B. Fast and sensitive
multiple alignment of large genomic sequences. BMC Bioinformatics 4:66, 2003.
Brudno M,
Malde S, Poliakov A, Do C, Couronne O, Dubchak I, Batzoglou S. Glocal
alignment: finding rearrangements during alignment. Special Issue on the Proceedings
of the ISMB 2003, Bioinformatics 19: 54i-62i,
2003.
Taher L, Rinner O, Garg S, Sczyrba A, Brudno M, Batzoglou S, Morgenstern B. AGenDA: homology-based gene prediction. Bioinformatics, 19:1575-1577, 2003.
Khambata-Ford
S, Liu Y, Gleason C, Dickson M, Altman RB, Batzoglou S, Myers RM. Identification
of promoter regions in the human genome by using a retroviral plasmid
library-based functional reporter gene assay. Genome Research 13:1765-1774,
2003.
Cooper GM, Brudno M, NISC
Comparative Sequencing Program, Green ED, Batzoglou S, Sidow A. Quantitative
estimates of sequence divergence for comparative analyses of mammalian genomes.
Genome Research 13:813-820, 2003.
Brudno M, Do C, Cooper GM, Kim MF, Davydov E, NISC Comparative Sequencing
Program, Green ED, Sidow A, Batzoglou S. LAGAN and
Multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA.
Genome Research 13: 721-731, 2003.
2002 and before
Batzoglou
S, Jaffe D, Stanley K, Butler J, Gnerre S, Mauceli E, Berger B, Mesirov JP,
Lander ES. ARACHNE:
A whole genome shotgun assembler. Genome Research
12:177-189, 2002.
Lander
ES et al. Initial
sequencing and analysis of the human genome.
Nature 409:860-921, 2001.
Batzoglou S, Pachter L, Mesirov JP,
Berger B, Lander ES. Human and mouse gene structure:
comparative analysis and application to exon prediction. Genome
Research 10:950-958, 2000.
Batzoglou S, Pachter L, Mesirov JP, Berger B, Lander ES. Human and mouse gene structure:
comparative analysis and application to exon prediction. Proceedings
of the Fourth Annual International Conference on Computational Molecular
Biology, (RECOMB 2000) pp. 46-53.
Batzoglou S, Mesirov JP, Berger B, Lander ES. Sequencing
a genome by walking with clone-ends: A mathematical analysis.
Proceedings of the Fourth Annual International Conference on
Computational Molecular Biology, (RECOMB 2000) p.45.
Istrail S, Hurd A, Lippert RA, Walenz B, Batzoglou S, Conway JH, Peyerl FW. Prediction
of Self-Assembly of Energetic Tiles and Dominos: Experiments, Mathematics and
Software. Sandia Labs Technical Report, March 2000.
Batzoglou
S, Mesirov JP, Berger B, Lander ES. Sequencing
a genome by walking with clone-ends: A mathematical analysis.
Genome Research 9:1163-1174, 1999.
Pachter L, Batzoglou S, Spitkovsky
VI, Banks E, Lander ES, Kleitman DJ, Berger B. A dictionary
based approach to gene annotation. Journal of
Computational Biology 6:419-430, 1999.
Pachter L, Batzoglou S, Spitkovsky VI, Beebee W, Lander ES, Berger B, Kleitman
DJ. A dictionary based approach to gene
annotation. Proceedings of the Third Annual
International Conference on Computational Molecular Biology, (RECOMB 1999),
285-294.
Batzoglou S, Istrail S. Physical
mapping with repeated probes: The hypergraph superstring problem.
Lecture Notes in Computer Science 1645:66 ff, 1999,
Special Issue on Combinatorial Pattern Matching 1999.
Batzoglou S, Berger B, Kleitman DJ, Lander ES, Pachter L. Recent
developments in computational gene recognition. Documenta
Mathematica, Extra Volume ICM I, 649-658, 1998.
Agarwala R, Batzoglou S, Dancik V, Decatur SE, Hannenhalli S, Farach M,
Muthukrishnan S, Skiena S. Local
rules for protein folding on a triangular lattice and generalized
hydrophobicity in the HP model. Journal of Computational
Biology 4: 275-296, 1997.
Agarwala R, Batzoglou S, Dancik V, Decatur SE, Hannenhalli S, Farach M,
Muthukrishnan S, Skiena S. Local
rules for protein folding on a triangular lattice and generalized
hydrophobicity in the HP model. Proceedings of the
8th Annual ACM-SIAM Symposium on Discrete Algorithms, (SODA 97)
390-399.
Agarwala R, Batzoglou S, Dancik V, Decatur SE, Hannenhalli S, Farach M,
Muthukrishnan S, Skiena S. Local rules for protein folding on a
triangular lattice and generalized hydrophobicity in the HP model.
Proceedings of the First Annual International Conference on
Computational Molecular Biology, (RECOMB 1997) 1-2.
Decatur S, Batzoglou S. Protein
folding in the hydrophobic-polar model on the 3D triangular lattice. Proceedings
of the 6th Annual MIT Student Workshop on Computing Technology,
1997.
Abstracts of Conference Talks
Novak AF,
Flannick JA, Srinivasan B, McAdams HH, Batzoglou S. NUKE: fast and
scalable multiple alignment of protein interaction networks. CSHL Conference Genome Informatics,
October 28-November 1, 2005, Cold Spring Harbor Laboratory, Cold Spring Harbor,
New York, 2005.
Srinivasan B,
Novak AF, Flannick JA, Batzoglou S, McAdams HH. Integrated protein
interaction networks for 230 microbes. In BCATS 2005 Symposium Proceedings, p. 26, 2005.
Fratkin E, Naughton B, Brutlag D, Batzoglou S. Motif finding in DNA sequences using maximum density subgraphs. 2nd Moscow Conference in Computational Molecular Biology (MCCMB), 2005.
Asimenos G,
Cooper GM, Holbert D, Sidow A, Batzoglou S. A reference mammalian
whole-genome alignment. The Biology of Genomes, CSHL, May 2005.
Dubchak I,
Brudno M, Poliakov A, Kislyuk A, Sundararajan M, Batzoglou S. Glocal
(global/local) alignment methods for comparison of DNA sequences and whole
genome assemblies utilized in
Liu Y, Batzoglou
S, Kim SK. Global identification of Caenorhabditis elegans regulatory
motifs. In BCATS 2004 Symposium Proceedings, p. 26, 2004.
Do CB, Brudno M,
Batzoglou S. ProbCons: Probabilistic consistency-based multiple alignment
of amino acid sequences. Intelligent Systems in Molecular Biology (ISMB)
2004. Best
Paper Award.
Do CB, Brudno M,
Batzoglou S. Probabilistic consistency-based multiple alignment of
proteins. In BCATS 2003 Symposium Proceedings, 2003.
Liu Y, Liu XS,
Stuart JM, Kim SK, Batzoglou S. Predicting the Activity of
Transcription Factor Binding Motifs. In BCATS 2003 Symposium
Proceedings, 2003.
Brudno M, Malde
S, Poliakov A, Do CB, Couronne O, Dubchak I, Batzoglou S. Glocal
alignment: finding rearrangements during alignment. Joint
CSHL/Wellcome Trust Conference Genome Informatics May 7-11
Brudno M,
Poliakov A, Couronne O, Do CB, Batzoglou S, Dubchak I. Multiple alignment
of whole genomes: a pipeline approach. Joint CSHL/Wellcome Trust
Conference Genome Informatics May 7-11
Brudno M, Do CB,
Cooper GM, Kim M, Davydov E, NISC Comparative Sequencing Program, Green ED,
Sidow A, Batzoglou S. Multiple genomic sequence alignment. Advances
in Genome Biology and Technology (AGBT)
Brudno M, Do CB,
Kim M, Batzoglou S. Multiple genomic sequence alignment. In BCATS
2002 Symposium Proceedings, p.23, 2002.
Brudno M, Kim M,
Batzoglou S. Multiple alignment of genomic sequences. Joint
CSHL/Wellcome Trust Conference Genome Informatics September 4-8 p.6.
Wellcome Trust Genome Campus,
Couronne O, Bray
N, Khatib F, Dubchak I, Batzoglou S, Pachter L. Comparative whole genome
shotgun assembly. Abstracts of papers presented at the Joint
CSHL/Wellcome Trust Conference Genome Sequencing and Biology May 7-11 p.63,
p.123.
Jaffe D, Batzoglou
B, Stanley K,
Batzoglou S,
Jaffe D, Stanley K, Berger B, Mesirov JP, Lander ES. ARACHNE: A whole
genome shotgun assembler. Abstracts of papers presented at the 2001
meeting on Genome Sequencing and Biology May 9-13 p.199.
Batzoglou S,
Jaffe D, Stanley K, Berger B, Mesirov JP, Lander ES. ARACHNE: A
whole-genome shotgun assembler. Advances in Genome Biology and
Technology (AGBT)
Batzoglou S,
Istrail S. Physical mapping with repeated probes: the hypergraph
superstring problem. In Ninth
Conference
Posters
Gross SS,
Russakovsky O, Do CB, Batzoglou S. Training Conditional Random
Fields for Maximum Labelwise Accuracy. NIPS 2006.
Gross SS, Do
CB, Batzoglou S. De novo gene prediction using a semi-Markov conditional
random field. RECOMB 2006.
Do CB, Woods
DA, Batzoglou S. CONTRAfold: RNA secondary structure prediction without
physics-based models. RECOMB 2006.
Flannick J,
Batzoglou S. Using
multiple alignments to improve seeded local alignment algorithms. CSHL Conference Genome Informatics,
October 28-November 1, 2005, Cold Spring Harbor Laboratory, Cold Spring Harbor,
New York, 2005.
Srinivasan B,
Novak AF, Flannick JA, Batzoglou S, McAdams HH. Integrated protein
interaction networks for 230 microbes. CSHL Conference Genome Informatics, October 28-November 1,
2005, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 2005. Best
Poster Award.
Sundquist A,
Ronaghi M, Tang H, Pevzner P, Batzoglou S. A strategy
for whole-genome sequencing and assembly with high-throughput, short-read
technologies. CSHL
Conference Genome Informatics, October 28-November 1, 2005, Cold Spring Harbor
Laboratory, Cold Spring Harbor, New York, 2005.
Do CB, Gross SS, Edgar RC, Batzoglou S. CONTRAlign: a discriminative framework for protein sequence alignment. In BCATS 2005 Symposium Proceedings, p. 45, 2005. Best Poster Award.
Gross SS, Do
CB, Batzoglou S. CONTRAST: de novo gene prediction using a semi-Markov
conditional random field. In BCATS 2005 Symposium Proceedings, p.
82, 2005.
Sundquist A,
Ronaghi M, Tang H, Pevzner P, Batzoglou S. A strategy for whole-genome
sequencing and assembly with high-throughput, short-read technologies. In BCATS
2005 Symposium Proceedings, p. 38, 2005.
Cooper GM,
Stone EA, Asimenos G, NISC Comparative Sequencing Program, Green ED, Batzoglou
S, Sidow A. Characterization of the effects of purifying selection in a
sample of the human genome. The Biology of Genomes, CSHL, May 2005.
Davydov E,
Batzoglou S. A computational model for RNA multiple structural alignment.
In BCATS 2004 Symposium Proceedings, p. 42, 2004.
Brudno M,
Sundararajan M, Poliakov A, Kislyuk A, Dubchak I, Batzoglou S.
Constructing synteny maps for whole-genome alignments. Joint
CSHL/Wellcome Trust Conference Genome Informatics September 22-26, 2004.
Do CB, Brudno
M, Batzoglou S. LAGAN2: Probabilistic global alignment of DNA under
multiple conservation models. Intelligent Systems in Molecular Biology
(ISMB) 2003.
Lee S,
Batzoglou S. Discovering biological processes from microarray data using
independent component analysis. Joint CSHL/Wellcome Trust
Conference Genome Informatics May 7-11
Do C, Brudno M,
Batzoglou S. The draft problem: genomic sequence alignment reconsidered.
In BCATS 2002 Symposium Proceedings, p.41, 2002.
Taher L, Rinner
O, Garg S, Brudno M, Batzoglou S, Morgenstern B. AGenDA: A WWW server for
gene recognition by comparative sequence analysis. In Thomas Lengauer,
Hans-Peter Lenhof, Ruth Christmann (editors) European Conference on
Computational Biology 2002, Poster Abstracts pp. 236-238, 2002.
Vinson J, Jaffe
D, Stange-Thomann N, Galagan J, Batzoglou S, Nusbaum C, Birren B, Zody M,
Mesirov J, Sidow A, Lander ES. Highly polymorphic genomes: a challenge
for assembly and an opportunity for comparative genomics. Joint
CSHL/Wellcome Trust Conference Genome Informatics September 4-8 p.86.
Wellcome Trust Genome Campus,
Vinson J, Jaffe
D, Sidow A, Batzoglou S, Butler J, Nusbaum C, Birren B, Stange-Thomann N, Zody
M, Mesirov J, and Lander ES. Assembly of the highly polymorphic genome of
Ciona savignyi. Abstracts of papers presented at the Joint
CSHL/Wellcome Trust Conference Genome Sequencing and Biology May 7-11
p.291.
Nusbaum C,
Endrizzi M, Calvo S, Foley K, Stange-Thomann N, Sachs M, Kinsey J, Staben C,
Jaffe D, Batzoglou S, Galagan J, and Birren B. Sequencing the Neurospora
Genome. Abstracts of papers presented at the 2001 meeting on Genome
Sequencing and Biology May 9-13 p.182.
Other
Manuscripts
PhD THESIS. Serafim Batzoglou. Computational Genomics: Mapping,
Comparison, and Annotation of Genomes. Ph.D. Dissertation, Department of
Electrical Engineering and Computer Science, MIT, June 2000. Here
is my thesis in PDF format. Please note that I had to reformat it a little
(merging files from a time when PCs had only 64 MB of RAM and I had to split a
Word document into many files...)
Computational
Genomics: Mapping, Comparison, and Annotation of Genomes.
Area Exam. Serafim
Batzoglou. DNA
Computing and Molecular Self-Assembly. Area examination,
Department of Electrical Engineering and Computer Science, MIT, December
1999. Note: I seem to have lost the figures that show the NP-completeness
gadgets.
Solutions Manual. Serafim Batzoglou
and Victor Boyko. Discrete Algorithms: A solutions manual. Morgan Kaufmann,
1998.