Tissue Classification with Gene Expression Profiles
A. Ben-Dor, L. Bruhn, N. Friedman, I. Nachman, M. Schummer, and Z. Yakhini
J. Computational Biology, 7: 559-584, 2000.
Abstract
Constantly improving gene expression profiling technologies are expected
to provide understanding and insight into cancer related cellular processes.
Gene expression data is also expected to significantly aid in the development
of efficient cancer diagnosis and classification platforms. In this work
we examine three sets of gene expression data measured across sets of tumor
and normal clinical samples. The first set consists of 2,000 genes, measured
in 62 epithelial colon samples [Alon et al., PNAS, 1999]. The second
consists of ~100,000 clones, measured in 32 ovarian samples (unpublished
extension of the data set described in [Schummer et al., Gene,
1999]). The third set consists of ~7,100 genes, measured in 72 bone marrow
and peripheral blood samples [Golub et al., Science, 1999].
We examine the use of scoring methods that measure how well individual gene
expression levels separate tissue types (e.g., tumors from normals).
These are then coupled with high dimensional classification methods to
assess the classification power of complete expression profiles. We present
results of performing leave-one-out cross validation (LOOCV) experiments
on the three data sets, employing a nearest neighbor classifier, SVM,
AdaBoost and a novel clustering based classification technique.
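
As a rough illustration of the pipeline described above (per-gene scoring
followed by LOOCV with a nearest neighbor classifier), the following minimal
sketch may help. It rests on several assumptions: the score is a simple
signal-to-noise-style stand-in and not necessarily the scoring method used in
the paper, genes are re-selected inside each fold, and all function names and
the toy data are hypothetical.

```python
import math


def separation_score(values, labels):
    """Stand-in score: |mean difference| / (sum of class standard deviations).

    This is an illustrative choice, not necessarily the paper's score.
    """
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    sd_a = math.sqrt(sum((v - mean_a) ** 2 for v in a) / len(a))
    sd_b = math.sqrt(sum((v - mean_b) ** 2 for v in b) / len(b))
    # Small constant avoids division by zero for constant genes.
    return abs(mean_a - mean_b) / (sd_a + sd_b + 1e-9)


def top_genes(samples, labels, k):
    """Return indices of the k highest-scoring genes."""
    n_genes = len(samples[0])
    scores = [
        separation_score([s[g] for s in samples], labels) for g in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def nearest_neighbor_label(train, train_labels, query, genes):
    """Label of the training sample closest to query (squared Euclidean)."""
    def dist(sample):
        return sum((sample[g] - query[g]) ** 2 for g in genes)

    best = min(range(len(train)), key=lambda i: dist(train[i]))
    return train_labels[best]


def loocv_accuracy(samples, labels, k):
    """Hold out each sample once; select genes on the remaining samples only."""
    correct = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = top_genes(train, train_labels, k)
        predicted = nearest_neighbor_label(train, train_labels, samples[i], genes)
        correct += predicted == labels[i]
    return correct / len(samples)


# Hypothetical toy data: rows are samples, columns are genes.
# Label 1 = tumor, 0 = normal; only gene 0 separates the classes.
samples = [
    [5.0, 1.0, 0.3],
    [5.2, 0.2, 0.9],
    [4.8, 0.9, 0.1],
    [1.0, 0.8, 0.7],
    [1.1, 0.1, 0.2],
    [0.9, 0.5, 0.8],
]
labels = [1, 1, 1, 0, 0, 0]
print(loocv_accuracy(samples, labels, k=1))
```

Selecting genes inside each fold, rather than once on all samples, keeps the
held-out sample from influencing which genes are chosen; whether the paper
proceeds this way is not stated in the abstract.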
As tumor samples can differ from normal samples in their cell-type composition,
we also perform LOOCV experiments using appropriately modified sets of
genes, attempting to eliminate the resulting bias.
We demonstrate a success rate of at least 90% in tumor vs. normal classification,
using sets of selected genes, both with and without members related to cellular
contamination. These results are insensitive to the exact selection mechanism
over a certain range.
nir@cs.huji.ac.il