PTM Curator: An Automated Method for the Frequency Analysis of the Experimental and Putative Post-Translational Modification Statistics Contained in the Swiss-Prot Database
Post-translational modifications (PTMs) contribute broadly to the recent explosion of proteomic data and introduce a complexity surpassing that of protein design [1]. PTMs are chemical modifications of a protein made during or after its translation, and they have wide-ranging effects that broaden its functionality [2]. Previous estimates suggest that more than half of all proteins are glycoproteins [3]. Whereas a given position can carry only one mutation, different post-translational modifications may occur in tandem. Because new modifications, and new instances of known ones, are constantly being discovered, there has been no method to readily assess their relative levels.
In this work, we report the relative abundance of each PTM, both experimentally observed and putative, using the high-quality, manually curated, proteome-wide data contained in the Swiss-Prot [4-6] database. We find that, at most, fewer than one-fifth of the proteins in the global dataset are glycosylated. We further examine the frequency of D-amino acids in the database relative to the global set. Remarkably, D-alanine is the most frequent, which may have implications for the origins of life given alanine’s status as the “default” amino acid. Only 837 D-amino acids were found among the 187,941,074 amino acids in the database [7].
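Each relative abundance reported here is a ratio of counts. As a minimal sketch using only the two totals stated above (the function name `relative_frequency` is our own illustration, not part of the PTM Curator), the overall D-amino acid frequency works out to roughly 4.5 per million residues. The same ratio, taken over proteins rather than residues, underlies the glycosylation estimate.

```python
def relative_frequency(count: int, total: int) -> float:
    """Fraction of all items in the dataset that carry a given annotation."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


# Counts reported in the text for the Swiss-Prot dataset.
D_AMINO_ACID_COUNT = 837
TOTAL_RESIDUES = 187_941_074

freq = relative_frequency(D_AMINO_ACID_COUNT, TOTAL_RESIDUES)
per_million = freq * 1_000_000
print(f"D-amino acid frequency: {freq:.3e} ({per_million:.2f} per million residues)")
```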
These frequencies can be converted into probabilities and conditional probabilities that represent the likelihood of identifying a PTM on a peptide or protein. This can support the development of probabilistic methods for untargeted proteomic assignment that determine the expected value associated with potential PTM sites on proteins, helping to account for uncertainty. Similarly, the statistics can help validate targeted assignments of modifications.
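One way to read this, sketched below under stated assumptions, is to estimate P(modification | residue type) as a ratio of counts and then take the expected number of modified sites on a peptide as the sum of those per-position probabilities (linearity of expectation). The counts are HYPOTHETICAL placeholders, not Swiss-Prot values, and the names `conditional_probabilities` and `expected_modified_sites` are illustrative; this is not presented as the authors' method.

```python
from typing import Dict, Tuple

# HYPOTHETICAL counts, for illustration only (not Swiss-Prot values):
# residue_counts[aa]    = number of residues of type aa in the dataset
# mod_counts[(mod, aa)] = number of those residues annotated with mod
residue_counts: Dict[str, int] = {"S": 1000, "T": 800, "Y": 500}
mod_counts: Dict[Tuple[str, str], int] = {
    ("phospho", "S"): 50,
    ("phospho", "T"): 20,
    ("phospho", "Y"): 5,
}


def conditional_probabilities(
    mod_counts: Dict[Tuple[str, str], int], residue_counts: Dict[str, int]
) -> Dict[Tuple[str, str], float]:
    """Estimate P(mod | residue type) as a simple ratio of counts."""
    return {(mod, aa): n / residue_counts[aa] for (mod, aa), n in mod_counts.items()}


def expected_modified_sites(
    sequence: str, mod: str, cond_probs: Dict[Tuple[str, str], float]
) -> float:
    """Expected number of positions in `sequence` carrying `mod`.

    By linearity of expectation this is the sum of the per-position
    probabilities; residue types never observed with `mod` contribute 0.
    """
    return sum(cond_probs.get((mod, aa), 0.0) for aa in sequence)


probs = conditional_probabilities(mod_counts, residue_counts)
expected = expected_modified_sites("SSTYA", "phospho", probs)
print(f"P(phospho | S) = {probs[('phospho', 'S')]:.3f}")
print(f"Expected phospho sites on SSTYA = {expected:.3f}")
```

Summing probabilities gives the mean number of sites without assuming the sites are independent; computing the full distribution of site counts would require such an assumption.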
We make available to the academic community a continuously updated, system-wide resource, the PTM Curator [7] (http://selene.princeton.edu/PTMCuration). New features have been added for ease of use: we list the UniProt IDs associated with each modification, filter the modification statistics by whether they occurred on a prokaryote or a eukaryote, and provide the modifications occurring on mammalian proteins. Using our curation method, scientists can unambiguously assess “how many” of each PTM exist.
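The filtering features described above amount to grouping annotated sites by modification, optionally restricted by taxonomic domain or to mammals, while collecting the accessions involved. The sketch below assumes a HYPOTHETICAL record layout (`PTMRecord`) and placeholder accessions (`ACC1` and so on); it does not reflect how the PTM Curator actually stores or serves its data.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class PTMRecord(NamedTuple):
    """One annotated modification site (hypothetical record layout)."""
    accession: str      # UniProt accession of the protein
    domain: str         # "prokaryote" or "eukaryote"
    mammalian: bool     # True if the source organism is a mammal
    modification: str   # modification name, e.g. "Phosphoserine"


def summarize(
    records: List[PTMRecord],
    domain: Optional[str] = None,
    mammalian_only: bool = False,
) -> Tuple[Dict[str, int], Dict[str, List[str]]]:
    """Count sites per modification and list the accessions carrying each one,
    optionally restricted to one domain and/or to mammalian proteins."""
    counts: Dict[str, int] = defaultdict(int)
    accessions: Dict[str, Set[str]] = defaultdict(set)
    for rec in records:
        if domain is not None and rec.domain != domain:
            continue
        if mammalian_only and not rec.mammalian:
            continue
        counts[rec.modification] += 1
        accessions[rec.modification].add(rec.accession)
    return dict(counts), {mod: sorted(ids) for mod, ids in accessions.items()}


# Each record is one site; placeholder accessions, not real UniProt entries.
records = [
    PTMRecord("ACC1", "eukaryote", True, "Phosphoserine"),
    PTMRecord("ACC1", "eukaryote", True, "Phosphoserine"),
    PTMRecord("ACC2", "eukaryote", False, "N-acetylalanine"),
    PTMRecord("ACC3", "prokaryote", False, "Phosphoserine"),
]

euk_counts, euk_accessions = summarize(records, domain="eukaryote")
print(euk_counts)
print(euk_accessions)
```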
1. Baliban, R.C., et al., A Novel Approach for Untargeted Post-translational Modification Identification Using Integer Linear Optimization and Tandem Mass Spectrometry. Molecular & Cellular Proteomics, 2010. 9(5): p. 764-779.
2. Walsh, C., Posttranslational modification of proteins: expanding nature's inventory. 2006, Englewood, Colo.: Roberts and Co. Publishers. xxi, 490 p.
3. Apweiler, R., H. Hermjakob, and N. Sharon, On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochimica et Biophysica Acta (BBA) - General Subjects, 1999. 1473(1): p. 4-8.
4. Bairoch, A. and R. Apweiler, The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research, 2000. 28(1): p. 45-48.
5. Jung, E., et al., Annotation of glycoproteins in the SWISS-PROT database. PROTEOMICS, 2001. 1(2): p. 262-268.
6. Farriol-Mathis, N., et al., Annotation of post-translational modifications in the Swiss-Prot knowledge base. PROTEOMICS, 2004. 4(6): p. 1537-1550.
7. Khoury, G.A., R.C. Baliban, and C.A. Floudas, Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database. Scientific Reports, 2011. 1.