438719 A Systems Biology Definition of the Core Proteome of Metabolism and Expression

Sunday, November 8, 2015
Exhibit Hall 1 (Salt Palace Convention Center)
Laurence Yang1, Justin Tan1, Edward J. O'Brien2, Jonathan M. Monk1, Donghyuk Kim3, Howard Li1, Pep Charusanti1, Ali Ebrahim1, Colton J. Lloyd1, James T. Yurkovich1, Bin Du1, Andreas Dräger1,4, Alex Thomas5, Yuekai Sun6, Michael A. Saunders7 and Bernhard O. Palsson8, (1)Bioengineering, University of California, San Diego, La Jolla, CA, (2)Bioinformatics, University of California, San Diego, La Jolla, CA, (3)Department of Bioengineering, University of California, San Diego, La Jolla, CA, (4)Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany, (5)Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark, (6)Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, (7)Department of Management Science and Engineering, Stanford University, Stanford, CA, (8)Department of Pediatrics, University of California, San Diego, La Jolla, CA

Finding the minimal set of gene functions needed to sustain life is of both fundamental and practical importance. Minimal gene lists have been proposed using comparative genomics-based core proteome definitions. However, a definition of the core proteome that is supported by empirical data, is understood at the systems level, and provides a basis for computing essential cell functions is lacking. Here, we use a genome-scale model of metabolism and expression to define a functional core proteome consisting of 356 gene products, accounting for 44% of the Escherichia coli proteome by mass based on proteomics data. This systems biology core proteome includes 212 genes not found in previous comparative genomics-based core proteome definitions, accounts for 65% of known essential genes in E. coli, and has 78% gene function overlap with minimal genomes (Buchnera aphidicola and Mycoplasma genitalium). Based on transcriptomics data across environmental and genetic backgrounds, the core proteome is significantly enriched in non-differentially expressed genes and depleted in differentially expressed genes. Compared with non-core genes, core genes also have more similar expression levels across genetic backgrounds (twofold higher Spearman rank correlation) and exhibit significantly more complex transcriptional and post-transcriptional regulatory features (40% more transcription start sites per gene, 22% longer 5' UTRs). Thus, genome-scale systems biology approaches rigorously identified a functional core proteome needed to support growth. This framework, validated using high-throughput datasets, facilitates a mechanistic understanding of systems-level core proteome function through in silico models.
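The overlap and enrichment comparisons described above (for example, the share of known essential genes covered by the core proteome) can be illustrated with a minimal sketch. The abstract does not state which statistical test was used; a one-sided hypergeometric test is assumed here as a common choice for gene-set enrichment. The function names and the toy gene identifiers are hypothetical and are not data from the study.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn without replacement from a
    universe of M genes, of which n belong to the reference set."""
    upper = min(n, N)
    total = comb(M, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / total


def overlap_summary(core, reference, universe):
    """Summarise how strongly a candidate core set overlaps a reference set.

    Assumption: enrichment is scored with a one-sided hypergeometric test;
    the abstract does not specify the test actually used.
    """
    universe = set(universe)
    core = set(core) & universe
    reference = set(reference) & universe
    shared = core & reference
    return {
        "shared": len(shared),
        "fraction_of_reference": len(shared) / len(reference),
        "p_enrichment": hypergeom_sf(
            len(shared), len(universe), len(reference), len(core)
        ),
    }


# Toy example with made-up gene identifiers (not data from the study).
universe = [f"g{i}" for i in range(20)]
core = ["g0", "g1", "g2", "g3", "g4", "g5"]
essential = ["g0", "g1", "g2", "g3", "g10"]

summary = overlap_summary(core, essential, universe)
print(summary)
```

With real inputs, `core` would be the model-derived core gene list, `reference` a set such as known essential genes or non-differentially expressed genes, and `universe` all genes considered in the comparison.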
