Biological cells are complex dynamic systems comprising thousands of interacting nucleic acids, proteins, and metabolites that act in concert to perform coordinated, well-regulated tasks. The ability to quantify system-wide cellular responses to various stimuli is paramount to our understanding of complex disease states and of how pharmacological agents elicit desired outcomes. Investigators are therefore increasingly relying on large-scale –omics data (transcriptomics, proteomics, metabolomics) to experimentally capture changes in signaling and metabolic pathway components. While the burden of data acquisition has lightened with improved instrumentation, interpreting the data in the proper biological context remains challenging. Metabolomics data, in particular, are difficult to interpret in the context of pathways because a single metabolite may be utilized by several functional modules. Both univariate and multivariate statistical tools can determine which metabolites explain a differential response among experimental treatment groups. However, these data-driven methods do not utilize the vast domain-specific biochemical knowledge of known stoichiometric and regulatory interactions, which can be used to systematically uncover the signaling or metabolic pathways activated by a perturbation. To this end, mapping metabolomics data onto well-curated metabolic graph networks can help identify interactions that may not be intuitive from simply observing the metabolic pathway maps offered by the KEGG database.
In this study, we introduce Metabolomic Modularity Analysis (MMA), a graph-based algorithm that systematically identifies modules of reactions enriched with metabolites flagged as statistically significant by univariate analysis. Briefly, a metabolic network is abstracted as a reaction-centric graph in which an undirected edge is drawn between two reactions if a metabolite produced by the first reaction is consumed by the second. The length of each edge is weighted so that the distance between two reactions is shorter when the reactions involve statistically significant metabolites. When Newman's hierarchical partitioning algorithm is applied to the relative reaction-pair shortest paths, the resulting modules feature reactions from the surrounding local topological neighborhood, favoring adjacent reaction pairs that involve statistically significant metabolites. A defining feature of reaction-centric modularity is that interactions between reactions mediated by the production and consumption of cofactors and other hub metabolites are also accounted for.
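The graph construction described above can be illustrated with a small sketch. It is not the authors' implementation: the toy reactions, the flagged metabolite, the helper names (`build_reaction_graph`, `shortest_paths`), and the specific weighting rule (edge length 0.5 when both linked reactions involve a flagged metabolite, 1.0 otherwise) are all assumptions made for illustration. Only the first two steps are shown, namely building the undirected reaction-centric graph and computing shortest paths; the hierarchical partitioning step is omitted.

```python
import heapq
from itertools import combinations

# Hypothetical toy network: reaction -> (consumed metabolites, produced metabolites).
REACTIONS = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"C"}, {"D"}),
    "R4": ({"B"}, {"D"}),
}
SIGNIFICANT = {"B"}  # made-up set of metabolites flagged by univariate analysis


def build_reaction_graph(reactions, significant, short=0.5, normal=1.0):
    """Link two reactions when one produces a metabolite the other consumes.

    Assumed weighting rule: the edge is shortened when both reactions
    involve at least one flagged metabolite.
    """
    graph = {r: {} for r in reactions}
    for r1, r2 in combinations(reactions, 2):
        cons1, prod1 = reactions[r1]
        cons2, prod2 = reactions[r2]
        if not ((prod1 & cons2) or (prod2 & cons1)):
            continue  # no producer/consumer relationship, so no edge
        flagged1 = (cons1 | prod1) & significant
        flagged2 = (cons2 | prod2) & significant
        length = short if (flagged1 and flagged2) else normal
        graph[r1][r2] = length  # undirected: store the edge both ways
        graph[r2][r1] = length
    return graph


def shortest_paths(graph, source):
    """Dijkstra's algorithm: shortest distance from source to each reachable reaction."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, length in graph[node].items():
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


graph = build_reaction_graph(REACTIONS, SIGNIFICANT)
distances_from_r4 = shortest_paths(graph, "R4")
```

In this toy case, reactions that share the flagged metabolite sit closer together than those linked only through unflagged metabolites, which is the property a distance-based partitioning step would then exploit.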
We apply MMA to time-course metabolomics data collected from biopsies during subnormothermic machine perfusion (SNMP) of nine discarded human livers that had been rejected for transplantation. These livers had endured various degrees of warm ischemia time (WIT) prior to organ procurement, and three of them exhibited over 30% macrovesicular steatosis. Of the 155 primary metabolites measured, the pre-perfusion levels (t = 0 hours) of 33 were significantly correlated with WIT, suggesting that they could be putative biomarkers of ischemic injury. MMA was performed on the human ReconX metabolic network (7439 reactions and 2626 metabolites) after flagging these 33 metabolites in the cytosolic compartment as significant. For a network this large, computing the adjacency distance matrix requires C(7439, 2), or about 2.77 × 10⁷, pairwise computations for the initial network, which pushes total run times to the order of days on a laptop and renders the method impractical for repeated analysis. Fortunately, MathWorks® now offers the MATLAB Distributed Computing Server (MDCS) on Amazon's Elastic Compute Cloud (EC2). For this study, one cluster node (c3.8xlarge) with 16 workers was rented at a rate of $1.68/hour for the cluster and $1.12/hour for MDCS. Several key parts of the MMA algorithm are parallelizable, so a full run completed in 2.7 hours, at a total cost of $8.40 for the 3 hours of server time. The MMA partitioning resulted in 4755 hierarchical modules, of which 223 contain at least one significant metabolite. As an example, one such module contained four metabolites significantly correlated with WIT: arachidic acid (arach[c]), cholesterol (chsterol[c]), stearic acid (ocdca[c]), and palmitic acid (hdca[c]). The stoichiometry of the reactions in the module suggests that ischemia may affect coenzyme A (coA[c]) pools, which could explain why the levels of these four metabolites are affected (Figure 1).
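The pair count and the server cost quoted above can be checked with a few lines of arithmetic. This is only a sanity check under one stated assumption: usage is billed in whole hours, as the "3 hours of server time" figure implies. The variable names are illustrative, and rates are kept in integer cents to avoid floating-point rounding.

```python
import math

N_REACTIONS = 7439        # reactions in the network, from the text
CLUSTER_RATE_CENTS = 168  # $1.68/hour for the cluster node
MDCS_RATE_CENTS = 112     # $1.12/hour for MDCS
RUN_HOURS = 2.7           # reported wall-clock time for one full run

# Number of unordered reaction pairs, C(n, 2) = n * (n - 1) / 2.
pair_count = math.comb(N_REACTIONS, 2)

# Assumption: billing rounds up to the next whole hour.
billed_hours = math.ceil(RUN_HOURS)
total_cents = (CLUSTER_RATE_CENTS + MDCS_RATE_CENTS) * billed_hours

print(f"{pair_count:,} reaction pairs (~{pair_count:.2e})")
print(f"{billed_hours} billed hours, ${total_cents / 100:.2f} total")
```

The computed values agree with the figures in the text: roughly 2.77 × 10⁷ reaction pairs, and $8.40 for three billed hours at a combined rate of $2.80/hour.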
Ongoing work involves applying MMA to identify modules activated by perfusion itself, by flagging metabolites that differ significantly between time points, and determining how ischemia or fat content affects the modules uncovered.
In this study, we demonstrate that graph network-based analysis of metabolomics data is made practical and feasible by parallel computing on Amazon's cloud. Prospectively, similar graph-based algorithms could be employed to analyze proteomics and transcriptomics data as well, and ultimately to devise ways of incorporating all of this information to best characterize cellular dynamics.
Figure 1. Example of a module identified using MMA. Metabolite nodes are represented as ellipses, while reaction nodes are represented as rectangles. Metabolite nodes colored in red are those whose pre-perfusion levels are correlated with WIT (p < 0.10).
Topical Conference: Emerging Frontiers in Systems and Synthetic Biology