Class Discovery in Gene Expression Data
A. Ben-Dor, N. Friedman, and Z. Yakhini.
In Proc. Fifth Annual Inter. Conf.
on Computational Molecular Biology (RECOMB 2001).
Abstract
Recent studies demonstrate the discovery of putative disease subtypes
from gene expression data. The underlying computational problem is to
partition the set of sample tissues into statistically meaningful
classes. In this paper we present a novel approach to class discovery
and develop automatic analysis methods. Our approach is based on
statistically scoring candidate partitions according to the
overabundance of genes that separate the different classes. Indeed, in
biological datasets, an overabundance of genes separating known
classes is typically observed. We measure overabundance against a
stochastic null model. This allows for highlighting subtle, yet
meaningful, partitions that are supported on a small subset of the
genes.
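
To make the idea of scoring a partition by overabundance concrete, here is a minimal Python sketch. It is not the scoring method of the paper, which the abstract does not specify. As a simplifying assumption, a gene "separates" a two-class partition only if it separates it perfectly, and the null model assumes that each gene's values are distinct, exchangeable across samples, and independent across genes. The names `separates_perfectly`, `binomial_tail`, and `overabundance` are hypothetical.

```python
import math

def separates_perfectly(values, labels):
    """True if all class-0 values lie strictly on one side of all class-1 values."""
    a = [v for v, l in zip(values, labels) if l == 0]
    b = [v for v, l in zip(values, labels) if l == 1]
    return max(a) < min(b) or max(b) < min(a)

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def overabundance(expr, labels):
    """Count separating genes and compare the count against the null model.

    expr:   list of genes, each a list of per-sample expression values
    labels: one 0/1 class label per sample (both classes must be present)
    Returns (number of separating genes, null probability of seeing that many or more).
    """
    n0, n1 = labels.count(0), labels.count(1)
    if n0 == 0 or n1 == 0:
        raise ValueError("both classes must be non-empty")
    # Assumed null: of the C(n, n0) equally likely labelings of one gene,
    # exactly 2 put one class entirely below the other.
    p0 = 2 / math.comb(n0 + n1, n0)
    k = sum(separates_perfectly(gene, labels) for gene in expr)
    return k, binomial_tail(len(expr), k, p0)

# Toy data (made up for illustration): 4 genes, 6 samples, 3 samples per class.
labels = [0, 0, 0, 1, 1, 1]
expr = [
    [1, 2, 3, 7, 8, 9],
    [9, 8, 7, 1, 2, 3],
    [0.1, 0.2, 0.3, 0.9, 0.8, 0.7],
    [1, 5, 2, 4, 3, 6],
]
k, p = overabundance(expr, labels)
print(k, p)  # 3 separating genes; tail probability is about 0.0037
```

A small tail probability means that more genes separate the partition than the null model would predict, which is the sense in which a partition supported by only a few genes can still stand out.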
Using simulated annealing, we explore the space of all possible
partitions of the set of samples, seeking partitions with statistically
significant overabundance of differentially expressed genes. We
demonstrate the performance of our methods on synthetic data, where we
recover planted partitions. Finally, we turn to tumor expression
datasets, and show that we find several highly pronounced partitions.
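
The search over partitions can be sketched in the same spirit. The following Python example is an illustrative assumption rather than the authors' implementation: it handles two-class partitions only, proposes moves by switching a single sample to the other class, uses a geometric cooling schedule, and scores partitions with a simplified perfect-separation surprise score. The function names, parameters, and toy data are all hypothetical.

```python
import math
import random

def surprise(expr, labels):
    """-log10 of the null probability of seeing at least as many perfectly
    separating genes (a simplified stand-in score, not the paper's)."""
    n0, n1 = labels.count(0), labels.count(1)
    p0 = 2 / math.comb(n0 + n1, n0)  # assumed null: distinct, exchangeable values
    k = 0
    for gene in expr:
        a = [v for v, l in zip(gene, labels) if l == 0]
        b = [v for v, l in zip(gene, labels) if l == 1]
        k += max(a) < min(b) or max(b) < min(a)
    n = len(expr)
    tail = sum(math.comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))
    return -math.log10(tail)

def anneal_partition(n_samples, score, steps=3000, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing over two-class labelings; returns the best one visited."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    if len(set(labels)) < 2:               # ensure both classes are present
        labels[0] = 1 - labels[0]
    current = score(labels)
    best_labels, best = labels[:], current
    cooling = (t_end / t_start) ** (1 / steps)
    t = t_start
    for _ in range(steps):
        i = rng.randrange(n_samples)
        labels[i] = 1 - labels[i]          # propose: move one sample across
        if len(set(labels)) < 2:
            labels[i] = 1 - labels[i]      # undo: the move would empty a class
        else:
            candidate = score(labels)
            # Always accept improvements; accept a worse partition with
            # probability exp(delta / t), which shrinks as t cools.
            if candidate >= current or rng.random() < math.exp((candidate - current) / t):
                current = candidate
                if current > best:
                    best_labels, best = labels[:], current
            else:
                labels[i] = 1 - labels[i]  # undo the rejected move
        t *= cooling
    return best_labels, best

# Synthetic data (made up): four genes follow a planted partition of six
# samples into {0, 1, 2} and {3, 4, 5}; the last two genes are noise.
expr = [
    [1, 2, 3, 7, 8, 9],
    [3, 1, 2, 9, 7, 8],
    [8, 9, 7, 2, 3, 1],
    [2, 3, 1, 8, 9, 7],
    [5, 1, 6, 2, 4, 3],
    [2, 6, 1, 5, 3, 4],
]
best_labels, best = anneal_partition(6, lambda lab: surprise(expr, lab))
print(best_labels, round(best, 3))
```

Tracking the best partition visited, rather than returning the final state, keeps a good partition from being lost if the search later drifts away from it. On this toy data the planted partition has the highest score among all two-class partitions, up to swapping the class labels.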
nir@cs.huji.ac.il