Swati Gupta (1), James A. Eddy (2), Seth Hanson (3), and Nathan D. Price (3). (1) Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, MC-195, Institute for Genomic Biology, 1206 W. Gregory Drive, Urbana, IL 61801, (2) Bioengineering, University of Illinois at Urbana-Champaign, MC-195, Institute for Genomic Biology, 1206 W. Gregory Drive, Urbana, IL 61801, (3) Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, MC-195, Institute for Genomic Biology, 1206 W. Gregory Drive, Urbana, IL 61801
Ever-evolving experimental techniques for high-throughput data generation keep data mining a major opportunity in bioinformatics. Most current pathway-level comparisons of expression data evaluate the enrichment of individual pathways between two sample sets (e.g., cancer vs. non-cancer, or responsive vs. non-responsive to treatment). Common examples of this approach include Gene Ontology (GO) enrichment and Gene Set Enrichment Analysis (GSEA). Here we present a microarray data analysis approach, Gene Set Expression Reversal Analysis (GSERA), that adds to these methods by also comparing all pairs of gene sets in order to uncover switches in their relative expression between classes. The statistical significance of reversals in gene set expression levels between biological states is determined from their false discovery rates (FDR). Our program returns a sublist of top-scoring gene set pairs that add the most value compared with either gene set alone. The user can choose among predefined metabolic and signaling pathways, define gene sets dynamically by accessing the Gene Ontology database, or supply any gene sets of their choosing. In the cancer data sets we have studied, the program provides added information value and very low false discovery rates compared with other similar programs. Our work indicates that assessing relative changes between pairs of pathways will yield significant additional insights not found by methods that consider genes or gene sets individually.
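To make the idea of a pairwise reversal concrete, the following is a minimal sketch and not the GSERA implementation, whose scoring details are not given in this abstract. It assumes that each gene set is summarized per sample by its mean expression and that a pair is scored by how much the frequency of "set S above set T" differs between the two classes. The function names, the toy data, and the scoring rule are all illustrative assumptions; significance testing and FDR estimation are omitted.

```python
from itertools import combinations
from statistics import mean


def gene_set_scores(expression, gene_sets):
    """Summarize each gene set per sample (assumed summary: mean expression).

    expression: dict mapping gene -> list of per-sample values
    gene_sets:  dict mapping set name -> list of gene names
    """
    scores = {}
    for name, genes in gene_sets.items():
        rows = [expression[g] for g in genes if g in expression]
        # zip(*rows) walks the samples; average over the member genes.
        scores[name] = [mean(column) for column in zip(*rows)]
    return scores


def reversal_scores(scores, labels):
    """Rank gene set pairs by an assumed reversal score.

    For a pair (S, T) the score is
    |P(S > T in class 0) - P(S > T in class 1)|,
    so 1.0 means the ordering flips completely between classes.
    """
    class0 = [i for i, c in enumerate(labels) if c == 0]
    class1 = [i for i, c in enumerate(labels) if c == 1]
    ranked = []
    for s, t in combinations(sorted(scores), 2):
        frac0 = mean(1.0 if scores[s][i] > scores[t][i] else 0.0 for i in class0)
        frac1 = mean(1.0 if scores[s][i] > scores[t][i] else 0.0 for i in class1)
        ranked.append((abs(frac0 - frac1), s, t))
    ranked.sort(reverse=True)  # highest reversal score first
    return ranked


# Hypothetical toy data: 5 genes, 4 samples (two per class).
expression = {
    "g1": [5, 6, 1, 2],
    "g2": [7, 6, 1, 0],
    "g3": [2, 3, 8, 9],
    "g4": [2, 1, 8, 7],
    "g5": [10, 10, 10, 10],
}
gene_sets = {
    "up_in_0": ["g1", "g2"],
    "up_in_1": ["g3", "g4"],
    "high": ["g5"],
}
labels = [0, 0, 1, 1]

ranked = reversal_scores(gene_set_scores(expression, gene_sets), labels)
for score, s, t in ranked:
    print(f"{s} vs {t}: reversal score {score:.2f}")
```

In this toy example the pair (up_in_0, up_in_1) switches order between the classes and ranks first, while pairs involving the constantly high set show no reversal. A real analysis would additionally need a null distribution (for example from label permutations) to attach false discovery rates to the top-scoring pairs.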