Systematic Identification of Relevant Order Parameters in Biophysical Systems

Wednesday, November 11, 2009: 1:50 PM
Lincoln A (Gaylord Opryland Hotel)

Andrew L. Ferguson, Chemical Engineering, Princeton University, Princeton, NJ
Lilia V. Bravewolf, Chemical Engineering, Princeton University, Princeton, NJ
Pablo G. Debenedetti, Chemical Engineering, Princeton University, Princeton, NJ
Athanassios Z. Panagiotopoulos, Chemical Engineering, Princeton University, Princeton, NJ
Yannis G. Kevrekidis, Chemical Engineering, Princeton University, Princeton, NJ

The systematic determination of thermodynamically and kinetically meaningful low-dimensional embeddings of high-dimensional datasets remains an important problem, with implications for the visualization, clustering and coarse-grained simulation of complex dynamical systems. It is well established that many processes residing in an ostensibly D-dimensional space actually lie on an intrinsic manifold of dimensionality d << D. The dimensionality and shape of the manifold are generally unknown a priori, and its parameterization may be a highly non-linear function of the original variables. For example, transition path sampling has been used to demonstrate that the transitions between the C7eq and Cax conformations of alanine dipeptide in vacuum are well characterized by two backbone torsional angles [1], indicating that the system lies close to a two-dimensional manifold parameterized by these variables.

The diffusion mapping technique [2-4] relies on the construction of a Markov matrix describing a random walk over a dataset, where the probability of hopping from one data point to another is specified by a pairwise similarity metric. In biophysical systems, the negative exponential of the root mean squared deviation between molecular conformations is a common choice. The diffusive proximity of two data points is quantified by a diffusion distance, which is governed by the probability of reaching one point from the other in a specified number of applications of the Markov transition matrix. Points that are connected by many short pathways are separated by a small diffusion distance, whereas those connected by few, long routes are separated by a large one. For datasets sampled uniformly over the domain, the eigenvectors of the Markov matrix are discrete approximations to the corresponding eigenfunctions of the continuous Laplace-Beltrami operator, which is a generalization of the familiar Laplacian to arbitrary surfaces and the generator of a continuous diffusion process on that surface. In the case of non-uniform sampling, the eigenvectors approximate the eigenfunctions of a Fokker-Planck operator describing a continuous diffusion process in the presence of potential wells. Mapping the original dataset onto the leading eigenvectors of the Markov matrix (the so-called diffusion mapping) results in an embedding in which Euclidean distances between points correspond to their diffusion distances in the original space. Subsequent analyses may be conducted to reconstruct the intrinsic manifold, estimate its dimensionality and interpret the diffusion map embeddings in the original variables.
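The construction described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: it assumes a Gaussian kernel on Euclidean distances as a stand-in for an RMSD-based similarity, and the function name `diffusion_map` and the bandwidth parameter `epsilon` are hypothetical choices made for illustration.

```python
import numpy as np


def diffusion_map(X, epsilon, n_components=2):
    """Return the leading non-trivial eigenvalues and eigenvectors of a
    Markov matrix built over the rows of X (one data point per row).

    Assumptions (not from the abstract): Gaussian kernel on Euclidean
    distance, kernel bandwidth `epsilon`, no density renormalization.
    """
    X = np.asarray(X, dtype=float)

    # Pairwise squared Euclidean distances (stand-in for squared RMSD).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Pairwise similarity: large for nearby points, small for distant ones.
    K = np.exp(-d2 / (2.0 * epsilon))

    # The Markov matrix M = D^-1 K has unit row sums. Its symmetric
    # conjugate S = D^-1/2 K D^-1/2 shares its eigenvalues and can be
    # diagonalized with a symmetric eigensolver.
    inv_sqrt_d = 1.0 / np.sqrt(K.sum(axis=1))
    S = K * inv_sqrt_d[:, None] * inv_sqrt_d[None, :]

    evals, evecs = np.linalg.eigh(S)      # ascending order
    order = np.argsort(evals)[::-1]       # reorder to descending
    evals, evecs = evals[order], evecs[:, order]

    # Convert back to right eigenvectors of M.
    psi = evecs * inv_sqrt_d[:, None]

    # Discard the trivial pair (eigenvalue 1, constant eigenvector).
    return evals[1:n_components + 1], psi[:, 1:n_components + 1]
```

For points sampled uniformly along a one-dimensional curve embedded in three dimensions, the first non-trivial eigenvector returned by this sketch should vary monotonically along the curve, so that it recovers the single intrinsic coordinate. The appropriate value of `epsilon` depends on the sampling density and would need to be tuned for real trajectory data.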

In this work, we apply the diffusion map technique to ideal-gas and solvated n-alkane molecular dynamics trajectories to systematically identify the “right” variables with which to describe the dynamic evolution of these systems and to construct low-dimensional projections of the free energy surface. Our findings suggest that, consistent with our recent work [5], the chain radius of gyration is the primary order parameter for the system, with a variable correlated with a hairpin-to-globular transition also of significant importance. We have also conducted long atomistic molecular dynamics simulations of the alanine dipeptide in explicit solvent and determined the top two eigenvectors to be correlated with the Φ and Ψ backbone dihedral angles known to parameterize the free energy landscape. Finally, we introduce novel techniques to incorporate solvent variables into the diffusion map analysis of these two systems, in order to move away from a solute-centered perspective. Preliminary results for the hydrocarbon systems suggest that the diffusion map may be able to capture cavitation as a key variable in hydrophobic collapse, as has been suggested in the literature [6].
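One simple way to relate a diffusion map coordinate to a candidate order parameter such as the radius of gyration is to compute the order parameter for every trajectory frame and correlate it with the eigenvector entries. The sketch below is illustrative only and is not the analysis performed in this work: it assumes equal atomic masses, a hypothetical array layout of (frames, atoms, 3), and uses the Pearson correlation coefficient as the measure of association.

```python
import numpy as np


def radius_of_gyration(frames):
    """Radius of gyration of each frame, assuming equal atomic masses.

    `frames` has the (assumed) shape (n_frames, n_atoms, 3).
    """
    frames = np.asarray(frames, dtype=float)
    # Subtract the centroid of each frame, then take the root mean
    # squared distance of the atoms from that centroid.
    centered = frames - frames.mean(axis=1, keepdims=True)
    return np.sqrt((centered ** 2).sum(axis=2).mean(axis=1))


def correlation_with_coordinate(observable, coordinate):
    """Pearson correlation between a per-frame observable and one
    per-frame diffusion map coordinate (eigenvector entries)."""
    return float(np.corrcoef(observable, coordinate)[0, 1])
```

A correlation coefficient close to +1 or -1 would suggest that the eigenvector and the observable carry similar information; because the sign of an eigenvector is arbitrary, only the magnitude is meaningful. A strongly non-linear relationship could be missed by this linear measure.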

  1. Bolhuis, P.G.; Dellago, C.; Chandler, D. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 11, 5877-5882.
  2. Coifman, R.R.; Lafon, S.; Lee, A.B.; Maggioni, M.; Nadler, B.; Warner, F.; Zucker, S.W. Geometric Diffusions as a Tool for Harmonic Analysis and Structure Definition of Data: Diffusion Maps. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 21, 7426-7431.
  3. Belkin, M.; Niyogi, P. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation 2003, 15, 1373-1396.
  4. Nadler, B.; Lafon, S.; Coifman, R.R.; Kevrekidis, I.G. Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators. In Advances in Neural Information Processing Systems; MIT Press: Boston, 2005, 955-962.
  5. Ferguson, A.L.; Debenedetti, P.G.; Panagiotopoulos, A.Z. Solubility and Molecular Conformations of n-Alkane Chains in Water. J. Phys. Chem. B 2009, 113, 6405-6414.
  6. Miller, T.F.; Vanden-Eijnden, E.; Chandler, D. Solvent Coarse-Graining and the String Method Applied to the Hydrophobic Collapse of a Hydrated Chain. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 37, 14559-14564.
