Data Correction, Normalization and Validation for Enhanced Accuracy of GC-MS Metabolomic Analysis:
Time-Series Metabolomic Analysis of the Arabidopsis thaliana Response to Elevated CO2 as a Case Study

Harin H. Kanani and Maria I. Klapa. University of Maryland, 2113 Chemical and Nuclear Engineering Building, College Park, MD 20742-2111

Metabolomic profiling has emerged as a platform technology for quantitatively assessing the metabolic fingerprint of a biological system and is one of the fastest-growing “omics” technologies to date. In its short five-year history, metabolomic analysis has already shown potential for commercial applications in nutrition, healthcare, diagnostics, toxicology, agricultural biotechnology and industrial biotechnology. Despite rapidly increasing interest, investment and growth, however, issues of measurement accuracy, sample preparation, protocol standardization, speed and user-friendliness remain to be resolved.

Gas chromatography-mass spectrometry (GC-MS) has been the instrument most commonly used for metabolomic analysis (in >90% of metabolomic labs), owing to its low cost and technical advantages, including superior separation capability, larger spectral libraries, robustness and sensitivity. GC-MS metabolomics, however, requires derivatization of the original sample. Quantitative GC-MS metabolomics must therefore account for potential systematic biases that might distort the one-to-one proportional relationship between the original metabolite concentrations and the derivative peak-area profiles. Correcting the metabolomic profile for these biases is imperative, because otherwise there is a high risk of assigning biological significance to changes that are due only to the chemical kinetics of derivatization.
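The distortion described above can be illustrated with a toy model. This sketch is hypothetical and is not part of the presented strategy: it assumes one metabolite that is completely converted into two derivatives, D1 and D2, with a split between them that depends on derivatization conditions, and it assumes equal response factors for both derivatives. Under these assumptions, two samples with the same metabolite concentration show an apparent twofold drop if only the D1 peak is tracked.

```python
def derivative_areas(concentration, fraction_d1, response_d1=1.0, response_d2=1.0):
    """Hypothetical peak areas of two derivatives (D1, D2) of one metabolite.

    Assumes complete derivatization: a fraction `fraction_d1` of the metabolite
    becomes D1 and the remainder becomes D2. The response factors (peak area per
    unit amount) are illustrative placeholders, not measured values.
    """
    area_d1 = concentration * fraction_d1 * response_d1
    area_d2 = concentration * (1.0 - fraction_d1) * response_d2
    return area_d1, area_d2


# Identical metabolite concentration in both samples (made-up numbers) ...
control = derivative_areas(concentration=10.0, fraction_d1=0.8)
# ... but a different D1/D2 split, e.g. caused by a different derivatization time.
treated = derivative_areas(concentration=10.0, fraction_d1=0.4)

# Tracking D1 alone suggests the metabolite concentration halved.
fold_change_d1 = treated[0] / control[0]
# Summing both derivatives recovers the true ratio of 1, but only because
# equal response factors were assumed in this toy model.
fold_change_sum = sum(treated) / sum(control)

print(f"D1 only: {fold_change_d1:.2f}, D1 + D2: {fold_change_sum:.2f}")
```

With unequal response factors a plain sum is no longer exact either, so in this toy model the response of each derivative would also have to be known.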

For the first time, a streamlined data correction, normalization and validation strategy** that does not compromise the high-throughput nature of metabolomic analysis is presented. Its importance will be demonstrated by comparing the results of a short-term (over 30 hours) time-series metabolomic analysis of the response of 12-day-old Arabidopsis thaliana liquid cultures to elevated CO2 (1%), obtained with and without the presented data correction and normalization strategy. Moreover, in the course of this study, 15 derivative peaks of NH2-group-containing compounds, which to date had either not been reported or had been listed as unknown in public databases, were annotated.
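As a point of reference only, a common generic normalization in GC-MS metabolomics divides each peak area by the area of an internal standard and by the sample amount, and then expresses a time series relative to a reference time point. This sketch implements that generic scheme; it is not the patented strategy presented here, whose details are not given in this abstract. The internal standard name (ribitol), the per-milligram scaling, the function names and all numbers are assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical type: metabolite (or peak) name -> peak area or normalized value.
Profile = Dict[str, float]


def normalize_profile(peak_areas: Profile, internal_standard: str,
                      sample_mass_mg: float) -> Profile:
    """Divide each peak area by the internal-standard area and the sample mass."""
    standard_area = peak_areas[internal_standard]
    if standard_area <= 0 or sample_mass_mg <= 0:
        raise ValueError("internal-standard area and sample mass must be positive")
    # The internal standard itself is dropped from the normalized profile.
    return {
        name: area / standard_area / sample_mass_mg
        for name, area in peak_areas.items()
        if name != internal_standard
    }


def relative_to_reference(series: List[Profile], reference_index: int = 0) -> List[Profile]:
    """Express each normalized profile as a ratio to the reference time point."""
    reference = series[reference_index]
    return [{name: profile[name] / reference[name] for name in reference}
            for profile in series]


# Made-up peak areas for two time points; "ribitol" is the assumed internal standard.
raw = [
    {"ribitol": 200.0, "glycine": 50.0, "serine": 80.0},  # t = 0 h (reference)
    {"ribitol": 100.0, "glycine": 50.0, "serine": 40.0},  # later time point
]
normalized = [normalize_profile(p, "ribitol", sample_mass_mg=10.0) for p in raw]
ratios = relative_to_reference(normalized)
print(ratios)
```

Here the raw glycine area is unchanged while the internal-standard area halves, so the normalized glycine ratio at the later time point is 2.0; the raw serine area halves along with the standard, so its normalized ratio stays at 1.0.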

The presented data correction, normalization and validation technique significantly increases the reliability and reproducibility of GC-MS metabolomic analysis.

* This work is funded by US NSF (QSB-0331312)

** US Letter patent application 11/362,717

Web Page: www.glue.umd.edu/~kanani