Existing computational approaches for identifying and quantitating proteins with multiple post-translational modifications (PTMs) from LC-MS/MS data generally decouple this complex but complementary data structure into independent sub-problems to keep them tractable. However, this approach ignores important chromatographic relationships that exist in the m/z and time dimensions, and it can only quantitate a modified form if that form has been "positively identified" by tandem MS. This is a severe limitation. Because of the limited dynamic range of the mass spectrometer, tandem mass spectra (whose quality depends on many factors) are available for only a fraction of the peptides actually present in a biological sample. Consequently, existing strategies fail to provide robust and comprehensive biological readouts. They can neither detect lower-abundance forms nor resolve co-eluting isobaric peptides for which only partial, mixed, or no tandem MS information is available.
We have developed a method for targeted protein systems that solves a much larger-scale problem: simultaneously annotating and quantifying all of the peptides present in an LC-MS/MS run. The approach can use both label-free and isotopically labeled data from a variety of sources, including metabolically labeled amino acids and/or their transiently labeled post-translational modifications. The key concept of our novel method is that chromatographic information (such as peak shape, isotopic distribution, and elution time relative to physically related peptides) is incorporated directly into the MS2 identification and MS1 quantitation problems. We formulate this as a single optimization problem, so all of these modes of information are considered simultaneously. The resulting model, which better represents the actual structure of large-scale LC-MS/MS data, is solved to global optimality to identify and quantify all of the targeted peptides at once. Furthermore, the method can deconvolve co-eluting isobaric peptides present in mixed tandem mass spectra (i.e., spectra containing the fragment ions of more than one modified form) and can make accurate and robust identifications from incomplete tandem MS. To motivate the main ideas of our approach, we will present specific examples in which a particular peptide can be annotated and quantified unambiguously only when information beyond the tandem MS is considered.
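To make the idea of deconvolving a mixed tandem mass spectrum concrete, the sketch below splits one MS2 spectrum between two co-eluting isobaric isoforms. It uses non-negative least squares (NNLS) solved by projected gradient descent. This is a simplified stand-in chosen for illustration, not the authors' formulation. It covers only the MS2 deconvolution step and ignores the chromatographic terms (peak shape, isotopic distribution, relative elution time) that the method described above also incorporates. The functions `nnls_pg` and `deconvolve`, the fragment-ion templates, and the synthetic 70:30 "observed" spectrum are all hypothetical.

```python
# Hypothetical sketch: apportion a mixed MS2 spectrum among co-eluting isobaric
# isoforms by non-negative least squares (NNLS). This is NOT the authors'
# optimization model; templates and intensities below are made-up numbers.
from typing import Dict, List


def nnls_pg(A: List[List[float]], b: List[float], iters: int = 5000) -> List[float]:
    """Minimise 0.5 * ||A x - b||^2 subject to x >= 0 by projected gradient descent.

    A is m x n: rows are fragment ions, columns are candidate isoforms.
    """
    m, n = len(A), len(A[0])
    # Normal-equation terms: G = A^T A (n x n) and c = A^T b (length n).
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    c = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # ||G||_F bounds G's largest eigenvalue, so 1 / ||G||_F is a safe step size.
    frob = sum(g * g for row in G for g in row) ** 0.5
    if frob == 0.0:
        return [0.0] * n
    step = 1.0 / frob
    x = [0.0] * n
    for _ in range(iters):
        grad = [sum(G[i][j] * x[j] for j in range(n)) - c[i] for i in range(n)]
        # Gradient step, then project back onto the non-negative orthant.
        x = [max(0.0, x[i] - step * grad[i]) for i in range(n)]
    return x


def deconvolve(templates: Dict[str, List[float]], observed: List[float]) -> Dict[str, float]:
    """Return each isoform's estimated fraction of the mixed MS2 signal."""
    names = list(templates)
    # Column j of A is the fragment-ion template of isoform j.
    A = [[templates[name][k] for name in names] for k in range(len(observed))]
    x = nnls_pg(A, observed)
    total = sum(x)
    return {name: (xi / total if total > 0 else 0.0) for name, xi in zip(names, x)}


if __name__ == "__main__":
    # Row order (hypothetical ions): two ions shared by both isoforms,
    # two site-determining ions unique to iso1, two unique to iso2.
    templates = {
        "iso1": [1.0, 0.8, 0.6, 0.5, 0.0, 0.0],
        "iso2": [1.0, 0.8, 0.0, 0.0, 0.6, 0.5],
    }
    observed = [1.0, 0.8, 0.42, 0.35, 0.18, 0.15]  # synthetic 70:30 mixture
    fractions = deconvolve(templates, observed)
    print({k: round(v, 3) for k, v in fractions.items()})  # {'iso1': 0.7, 'iso2': 0.3}
```

The shared ions alone cannot tell the two isoforms apart. Only the site-determining ions constrain the split, which is why partial or mixed spectra benefit from extra evidence such as the chromatographic information emphasized in the abstract. The non-negativity constraint keeps the estimated abundances physically meaningful.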
The utility of our method will be demonstrated and compared with existing quantitation algorithms using both bottom-up and middle-down LC-MS/MS data from mixtures of hyper-modified proteins in several biological samples. The systems presented in this work were selected to highlight the understudied yet ubiquitous existence of multiply modified proteins and the functional implications of their PTMs in the molecular processes of eukaryotes. These hyper-modified protein systems include: (1) chromatin-related proteins such as histones, for which several hundred modified isoforms are present per sample and correlate directly with specific nuclear events; (2) proteins essential for regulating telomere replication, in which several residues are differentially phosphorylated at the same time; and (3) enzymatic proteins, for which the modification state of the functional protein provides important insight into its structure. These systems further highlight the critical importance of obtaining quantitative measurements for all modified isoforms of a peptide. The lower-abundance forms, for which no tandem mass spectra are available, are often the most biologically relevant, yet other methods do not detect them. Lastly, we will illustrate how the robust and comprehensive readouts generated by our algorithm can reveal biological insight into these processes.
Topical 3: 2011 Annual Meeting of the American Electrophoresis Society (AES)