Monday, November 5, 2007 - 2:00 PM

Inferring Large-Scale Gene Regulatory Structures Using Optimization-Based Approaches

Meng Piao Tan (1), Christodoulos A. Floudas (1), and James R. Broach (2). (1) Department of Chemical Engineering, Princeton University, Princeton, NJ 08544, (2) Department of Molecular Biology, Princeton University, Princeton, NJ 08544

Novel high-throughput techniques of molecular biology are capable of producing in vivo time series expression data that are relatively high in quantity and quality. These data implicitly contain a great deal of information about the biological system they describe, such as its functional connectivity and regulatory patterns. The ability to use these data to map gene-transcription factor relationships provides important insights into complex cellular signal transduction pathways. Gene regulatory networks are important because they are the on-off switches and rheostats of a cell operating at the gene level. The networks dynamically orchestrate the level of expression for each gene in the genome by controlling whether and how vigorously that gene will be transcribed into RNA. The current research effort on gene regulatory networks is substantial, but models that accurately describe regulatory networks are still lacking. This is largely because biological systems are typically data-poor. For instance, modeling network connectivity requires knowledge of specific kinetic parameters that often have to be estimated themselves [1]. It has also been shown using nonlinear stability analysis that a dynamic analysis of gene networks requires both mRNA and protein expression data [2]. However, despite the advent of high-throughput experimental techniques to measure mRNA levels, the ability to measure protein abundances effectively on a large scale remains limited [3].

In this study, we propose a novel Mixed-Integer Nonlinear Programming (MINLP) methodology [4] to quantify and infer the most feasible large-scale gene regulatory network given only experimental mRNA expression data and a list of candidate transcription factors. The approach is iterative: an initial coarse estimate of the regulatory network is used to deduce an appropriate set of rate constants and kinetic parameters, and the results are then used to obtain a refined prediction of the transcription regulatory model. We test our proposed algorithm on a set of 45 high-quality gene clusters [5,6] obtained from gene expression patterns from the yeast Saccharomyces cerevisiae. The dataset is obtained from experiments designed to examine the roles of the Ras, Snf1, and Sch9 proteins in effecting transcriptional changes as part of the yeast cellular response to glucose. We show that our methodology is able to replicate known transcriptional connections as well as uncover new potential regulatory relationships.
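
The alternation between a discrete structure choice and a continuous parameter fit can be illustrated with a toy sketch. This is not the MINLP formulation of [4], whose details are not given in this abstract. It assumes a simple linear rate model, dx/dt = sum_j w_j * tf_j - k * x, replaces the MINLP solver with exhaustive enumeration of small regulator subsets, and uses synthetic data. All names (select_structure, fit_parameters, infer_network, k_init, penalty) are hypothetical.

```python
import itertools
import numpy as np

def select_structure(lhs, tf, max_regulators, penalty):
    """Discrete step: choose the regulator subset that best explains lhs.
    Exhaustive enumeration stands in for a mixed-integer solver (assumption)."""
    best_subset, best_score = (), float(np.mean(lhs ** 2))
    for size in range(1, max_regulators + 1):
        for subset in itertools.combinations(range(tf.shape[1]), size):
            A = tf[:, list(subset)]
            w, *_ = np.linalg.lstsq(A, lhs, rcond=None)
            # Mean squared error plus a sparsity penalty per active regulator.
            score = float(np.mean((lhs - A @ w) ** 2)) + penalty * size
            if score < best_score:
                best_subset, best_score = subset, score
    return best_subset

def fit_parameters(dxdt, x, tf, subset):
    """Continuous step: least-squares fit of weights and degradation rate k
    for a fixed regulator subset, using dx/dt = sum_j w_j * tf_j - k * x."""
    A = np.column_stack([tf[:, list(subset)], -x])
    coef, *_ = np.linalg.lstsq(A, dxdt, rcond=None)
    return coef[:-1], float(coef[-1])

def infer_network(t, x, tf, k_init=0.1, max_regulators=2,
                  penalty=1e-3, max_iter=10):
    """Alternate structure selection and parameter fitting until the
    selected regulator set stops changing."""
    dxdt = np.gradient(x, t, edge_order=2)  # finite-difference derivative
    k, subset, weights = k_init, None, None
    for _ in range(max_iter):
        # Coarse structure estimate using the current guess for k.
        new_subset = select_structure(dxdt + k * x, tf, max_regulators, penalty)
        # Refine the kinetic parameters for that structure.
        weights, k = fit_parameters(dxdt, x, tf, new_subset)
        if new_subset == subset:
            break
        subset = new_subset
    return subset, weights, k

# Synthetic example: four candidate regulators, two of which (0 and 2)
# actually drive the target gene.
t = np.linspace(0.0, 10.0, 1001)
tf = np.column_stack([np.sin(t), np.cos(t), np.sin(2 * t), np.cos(3 * t)])
k_true, w0, w2 = 0.3, 1.0, 0.5

def particular(s):
    # Closed-form particular solution of dx/dt + k x = w0 sin(s) + w2 sin(2s).
    return (w0 * (k_true * np.sin(s) - np.cos(s)) / (k_true ** 2 + 1)
            + w2 * (k_true * np.sin(2 * s) - 2 * np.cos(2 * s)) / (k_true ** 2 + 4))

x = particular(t) - particular(0.0) * np.exp(-k_true * t)  # x(0) = 0
subset, weights, k = infer_network(t, x, tf)
print(subset, weights, k)
```

In this sketch the first pass selects regulators with a deliberately wrong degradation rate (k_init), the parameter fit corrects that rate, and the second pass re-selects regulators with the corrected rate. Exhaustive enumeration is only practical for a handful of candidates; a problem at the scale described here would need a proper mixed-integer solver.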

[1] Ronen, M., Rosenberg, R., Shraiman, B. I., Alon, U.: Assigning Numbers to the Arrows: Parametrizing a Gene Regulation Network by Using Accurate Expression Kinetics. PNAS 99(16), 10555-10560 (2002)
[2] Hatzimanikatis, V., Lee, K. H.: Dynamical Analysis of Gene Networks Requires Both mRNA and Protein Expression Information. Metabolic Engineering 1, 275-281 (1999)
[3] Greenbaum, D., Colangelo, C., Williams, K., Gerstein, M.: Comparing Protein Abundance and mRNA Expression Levels on a Genomic Scale. Genome Biology 4, Article 117 (2003)
[4] Tan, M. P., Broach, J. R., Floudas, C. A.: An Optimization-Based Approach to Rigorously Infer Large-Scale Gene Regulatory Structures from DNA Microarray Data. In Preparation (2007)
[5] Tan, M. P., Broach, J. R., Floudas, C. A.: A Novel Clustering Approach and Prediction of Optimal Number of Clusters: Global Optimum Search with Enhanced Positioning. Journal of Global Optimization, In Press (2007)
[6] Tan, M. P., Broach, J. R., Floudas, C. A.: Microarray Data Mining: A Novel Optimization-Based Iterative Clustering Approach to Uncover Biologically Coherent Structures. Submitted for Publication (2007)