A Structural EM Algorithm for Phylogenetic Inference
N. Friedman, and M. Ninio, I. Pe'er, and T. Pupko
In Proc. Fifth Annual Inter. Conf.
on Computational Molecular Biology (RECOMB 2001).
Postscript version (1040K)
PDF version (560K).
Abstract
A central task in the study of evolution is the reconstruction of a
phylogenetic tree from sequences of current-day taxa. A well supported
approach to tree reconstruction performs maximum likelihood
(ML) analysis. Unfortunately, searching for the
maximum likelihood phylogenetic tree is
computationally expensive. In this paper, we describe a new algorithm
that uses Structural-EM for
learning maximum likelihood
trees. This algorithm is similar to the standard EM method for
estimating branch lengths, except that
during iterations of this algorithms the topology is improved as well
as the branch length.
The algorithm performs iterations of
two steps.
In the E-Step, we use the current tree topology and branch
lengths to compute expected sufficient statistics, which
summarize the data.
In the M-Step, we search for a topology that maximizes
the likelihood with respect to these expected sufficient statistics.
As we show, searching for better topologies inside the M-step can be
done efficiently, as opposed to standard search over topologies.
We prove that each iteration of this procedure increases the
likelihood of the topology, and thus the procedure must converge.
We evaluate our new algorithm on both synthetic and real sequence data,
and show that it is both dramatically faster and finds more plausible
trees than standard search for maximum likelihood phylogenies.
Back to Nir's publications page
nir@cs.huji.ac.il