Rich probabilistic models for gene expression (2001) by E. Segal, B. Taskar, A. Gasch, N. Friedman, and D. Koller
Clustering is commonly used for analyzing gene expression data.
Despite their successes, clustering methods suffer from a number of
limitations. First, these methods reveal similarities that exist over
all of the measurements, while obscuring relationships that exist over
only a subset of the data. Second, clustering methods cannot readily
incorporate additional types of information, such as clinical data or
known attributes of genes. To circumvent these shortcomings, we propose the
use of a single coherent probabilistic model that encompasses much of the
rich structure in the genomic expression data, while incorporating
additional information such as experiment type, putative binding
sites, or functional information. We show how this model can be learned
from the data, allowing us to discover patterns in the data and
dependencies between the gene expression patterns and additional
attributes. The learned model reveals context-specific
relationships that exist only over a subset of the experiments in the
dataset. We demonstrate the power of our approach on synthetic data and on
two real-world gene expression datasets for yeast. For example, we
demonstrate a novel functionality that falls naturally out of our framework:
predicting the "cluster" of the array resulting from a gene mutation based
only on the gene's expression pattern in the context of other mutations.
E. Segal, B. Taskar, A. Gasch, N. Friedman, and D. Koller (2001). "Rich probabilistic models for gene expression." Bioinformatics, 17(Suppl 1), S243-52.
Proc. ISMB 2001.
author = "E. Segal and B. Taskar and A. Gasch and N. Friedman and D. Koller",
title = "Rich probabilistic models for gene expression",
journal = "Bioinformatics",
volume = "17",
number = "Suppl 1",
pages = "S243--52",
year = "2001",
note = "Proc. ISMB 2001",