Pathway Expression Rank Analysis (p-XRAY): A Novel Tool for Gene Set Expression Analysis
James A. Eddy, University of Illinois, Urbana, IL 61801, Donald Geman, Johns Hopkins University, Baltimore, MD 21218, and Nathan D. Price, University of Illinois Urbana-Champaign, Urbana, IL 61801.
Systems analysis of high-throughput gene expression data is useful in aiding clinical diagnoses and elucidating disease mechanisms. Because expression data are high-dimensional relative to the number of experimental samples (patient replicates), and because normalization is commonly needed to account for microarray platform variability, it is often difficult to build robust predictive models based on single genes, gene pairs, or small clusters of genes. New approaches for expression analysis of biologically meaningful gene sets (e.g., pathways) may yield further insight into cellular dysfunction and provide more informative means of differentiating complex phenotypes. We have developed a computational method, pathway Expression Rank Analysis (p-XRAY), that quantifies the regulation of gene expression within pathways in different phenotypic states. Using a priori pathway definitions, we determine a characteristic ordering of pathway genes for each phenotype based on relative levels of expression (ranks); normalization of gene expression values is therefore not needed. The p-XRAY algorithm assigns scores to samples in a test set based on the conservation of this ordering in each sample, compressing expression values to a single measure for each pathway. Generating multiple sets of conservation scores for all samples, each with respect to a particular phenotype's characteristic ordering, leads to an elegant and effective classification metric, differential expression rank conservation (dERC), which performs binary classification based simply on whether the metric is positive or negative. Specifically, the dERC metric is used to find pathways where gene order is highly conserved within each of the classes being compared, but minimally conserved across classes. We used p-XRAY with dERC to identify pathways where gene expression was highly differentially regulated between gastrointestinal stromal tumor (GIST) and leiomyosarcoma (LMS) samples.
Using these classifier pathways, we distinguished GIST and LMS with 98.6% accuracy in cross-validation. These results show p-XRAY to be a promising tool for analysis and classification of disease expression data.
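
The rank-based scoring described above can be illustrated with a minimal Python sketch. The abstract does not specify how the characteristic ordering or the conservation score is computed, so everything below is an assumption made for illustration: the ordering is taken as the majority vote over all pairwise gene comparisons within a class, the conservation score is the fraction of pairwise comparisons in a sample that agree with that ordering, and dERC is the difference between the two conservation scores. The function names (pairwise_order, rank_template, conservation_scores, derc) and the toy data are hypothetical and are not part of p-XRAY.

```python
import numpy as np

def pairwise_order(expr):
    """For an (n_samples, n_genes) array, return a boolean array of shape
    (n_samples, n_pairs) that is True where gene i < gene j, for all i < j."""
    i, j = np.triu_indices(expr.shape[1], k=1)
    return expr[:, i] < expr[:, j]

def rank_template(expr):
    """Assumed characteristic ordering: majority vote over a class's samples
    for each pairwise comparison."""
    return pairwise_order(expr).mean(axis=0) > 0.5

def conservation_scores(expr, template):
    """Assumed conservation score: fraction of pairwise comparisons in each
    sample that match the template ordering."""
    return (pairwise_order(expr) == template).mean(axis=1)

def derc(expr, template_a, template_b):
    """Assumed dERC: conservation w.r.t. class A minus conservation w.r.t. class B."""
    return conservation_scores(expr, template_a) - conservation_scores(expr, template_b)

# Toy pathway of four genes (made-up values, not real expression data).
# Class A tends toward increasing order, class B toward decreasing order.
train_a = np.array([[1., 2., 3., 4.], [10., 20., 30., 40.], [2., 1., 3., 4.]])
train_b = np.array([[4., 3., 2., 1.], [40., 30., 20., 10.], [4., 3., 1., 2.]])

# Test samples on very different scales; only the within-sample order matters.
test = np.array([[5., 6., 7., 8.], [800., 700., 600., 500.]])

template_a = rank_template(train_a)
template_b = rank_template(train_b)
d = derc(test, template_a, template_b)

# Binary classification by the sign of the metric.
labels = np.where(d > 0, "A", "B")
print(d.tolist(), labels.tolist())  # [1.0, -1.0] ['A', 'B']
```

Because each score depends only on the order of genes within a sample, the second test sample is classified correctly even though its values are roughly a hundred times larger than the first, which reflects the point that normalization is not required.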