Date: Tue, 16 Dec 1997 01:39:19 PST
From: Tandy Warnow
Sender: TheoryNet List
Reply-To: Theory-A - TheoryNet World-Wide Events, Tandy Warnow
Subject: Symposium on "Big Tree Reconstruction"
To: THEORYNT@LISTSERV.NODAK.EDU

SYMPOSIUM ANNOUNCEMENT AND CALL FOR PAPERS

Estimating Large Scale Phylogenies: Biological, Statistical, and Algorithmic Problems

SPONSORS: the University of Pennsylvania Program in Computational Biology and DIMACS
LOCATION: Princeton University

DATE: June 26-28, 1998

FORMAT: Paper presentations and posters. All papers for oral presentation must be submitted in full and will be peer reviewed.

REGISTRATION FEE: None, but please send mail expressing interest in attending. Hotel information will be sent out in a later mailing, and hotel reservations should be made in advance to ensure desirable accommodations.

STUDENT SUPPORT: Limited financial assistance is available for students and postdocs who wish to attend the symposium. Please send requests for such support to Tandy Warnow, tandy@central.cis.upenn.edu, by April 15. Announcements of awarded support (conditional upon submission of a poster abstract) will be made by May 1.

PAPER SUBMISSION DEADLINE: April 15, 1998. Please submit papers by mail or email (PostScript or MS Word file only) to:

Junhyong Kim
Dept. of Biology
Yale University
165 Prospect St.
New Haven, CT 06511
(203)-432-9917
(203)-432-3854 (fax)
junhyong_kim@quickmail.yale.edu

Co-organizers: Junhyong Kim (Yale University), Tandy Warnow (University of Pennsylvania), and Ken Rice (SmithKline Beecham)

INTRODUCTION

Biological organization is fundamentally based on an evolutionary history of bifurcating descent-with-modification. Phylogenetic estimation is the inference of this genealogical history from present-day data. Phylogenetic trees, the graph representation of the genealogical history, play a central role in evolutionary biology, and phylogenetic estimation techniques are being applied to a wide variety of computational biology problems. The size of a phylogenetic estimation problem is measured by the number of taxa and the number of characters. Until recently, computational and data limitations kept most phylogenetic estimation problems to small numbers of taxa.
However, the availability of computational resources and the influx of large molecular data sets are enabling researchers to tackle increasingly large problems, and the analysis of large-scale data sets is rapidly becoming a central problem in phylogenetic biology.

Recent experimental evidence has established the existence of large trees that can be estimated accurately, as well as large trees that are difficult to estimate accurately with reasonable numbers of characters. Some of these examples have suggested that taxon sampling (increasing the size of the estimation problem through the addition of taxa rather than characters) might lead to more easily estimated trees. Conversely, it has been argued that big trees are hard to infer for a variety of reasons: NP-hardness of the optimization problems, properties of the search space, inadequacy of the heuristics, and even possible inadequacy of the optimization criteria.

Unfortunately, very little actual evidence is available to support any conjectures about how the performance of estimators scales with the size of the phylogenetic problem. In addition, the question of scaling is itself confused by poorly delineated notions. For example, the size of the tree also involves the maximum amount of divergence (not only the number of taxa and characters), and there are no agreed-upon standard measures of estimator performance.

The goal of this symposium is to identify precisely the key problems concerning how the performance of phylogenetic estimators scales with the size of the problem, and to gather experimental and theoretical results addressing these problems.

FORMAT

The symposium will consist of four topic sessions with paper presentations, followed by a panel discussion by invited experts. The four topics and some of the questions to be addressed in each session are:

Biological problems

1. What are the limits to sampling characters and taxa?
2. What are examples of very difficult problems?
3.
What are the reasonable models of character evolution and tree shape?
4. What are the most important problems in systematics?
5. What can we say about evolutionary history from data other than rows and columns of homologous characters?

Empirical results

1. What do simulation studies tell us about the performance of different methods and how they scale with the size of the problem?
2. What properties of the tree models affect accuracy, and how do those properties scale?
3. Are there any methodological biases?
4. What can we say, from the existing studies, about performance under more realistic models of sequence evolution?
5. Is there a need to standardize experimental studies, perhaps through the establishment of a testbed of different model trees, methods, etc.?

Algorithmic problems

1. What is the relationship between standard optimization problems (distance-based criteria, parsimony, etc.) and estimating the topology of evolutionary trees? Which of the standard optimization criteria are best suited to obtaining highly accurate topology estimates, given bounds on the available sequence length?
2. How much of the difficulty is due to inadequate solutions to the right NP-hard optimization problems?
3. Are there new optimization problems or approaches (not necessarily linked to optimization criteria) that are promising?
4. How good are the existing heuristics for solving the relevant optimization problems, and what new approaches might give better results on important optimization problems?
5. How should we evaluate the performance of algorithms?
6. Are there "algorithms engineering" issues that will make these methods less powerful, and how do we handle them?
7. Is it possible to design methods that can efficiently characterize all optimal and near-optimal trees, rather than just a single optimum?

Statistical problems

1. What bounds can we obtain on the convergence rate of different methods?
2.
How do various statistical properties of different methods scale with the size of the problem?
3. What is the relationship between estimating the whole tree versus some subset of the tree?
4. What is the distribution of specific tree characteristics, such as smallest edge length, smallest diameter for quartet covering, stemminess, etc., with respect to the tree model sampling distribution?
5. Can we obtain accuracy-bounded estimates (sacrificing resolution)?