601109 Discovery of Self-Assembling π-Conjugated Peptides By Active Learning-Directed Coarse-Grained Molecular Simulation

Wednesday, November 18, 2020
Computational Molecular Science and Engineering Forum (21) (Poster Gallery)
Kirill Shmilovich (1), Rachael A. Mansbach (2), Hythem Sidky (1), Olivia E. Dunne (1), Sayak S. Panda (3,4), John D. Tovar (3,4,5) and Andrew Ferguson (6); (1) Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, (2) Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, (3) Department of Chemistry, Johns Hopkins University, Baltimore, MD, (4) Institute of NanoBioTechnology, Johns Hopkins University, Baltimore, MD, (5) Department of Materials Science and Engineering, Johns Hopkins University, Baltimore, MD, (6) Pritzker School of Molecular Engineering, The University of Chicago, Chicago, IL

In this work we integrate coarse-grained molecular dynamics simulation, deep representational learning, and Bayesian optimization to discover pi-conjugated peptides capable of self-assembling into biocompatible optoelectronic nanoaggregates. The pi-conjugated peptides studied here are triblock molecules consisting of a central aromatic core flanked by peptide wings. This class of molecules has emerged as an extensible building block for self-assembling electronics: they have been demonstrated experimentally to form mesoscopic fibers micrometers in length and nanometers in diameter, and overlap between pi-orbitals in these supramolecular assemblies leads to the emergence of optical and electronic properties. Edisonian trial-and-error discovery of these molecules, through either experiment or simulation, is rendered impossible by the combinatorial explosion of the molecular design space of pi-cores and peptide wings. We efficiently navigate the design space in search of high-performing candidates by deploying an active learning procedure that integrates three machine learning components: (i) an unsupervised deep representation learning approach to learn continuous low-dimensional embeddings of the discrete molecular design space, (ii) a supervised surrogate model using Gaussian process regression to predict molecular performance measured in simulation as a function of this embedded space, and (iii) a Bayesian optimization of the surrogate model to dictate which molecules should be evaluated next. Using this protocol, we derive a converged surrogate model for predicting the molecular performance of one particular peptide family, comprising tetrapeptide wings and an oligophenylenevinylene pi-core, after sampling only 2.3% of the design space. We identify molecules that we predict to possess unprecedented self-assembly behavior and optoelectronic activity, while uncovering design rules to guide the rational engineering of these molecular systems.
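The surrogate-and-selection loop of components (ii) and (iii) can be illustrated with a minimal sketch. It is not the authors' implementation: the random 2D points stand in for the learned embeddings of component (i), `toy_score` is a hypothetical placeholder for the simulation-derived performance measure, and the RBF kernel, its hyperparameters, and the expected-improvement acquisition function are all assumptions chosen for brevity.

```python
import math
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(X_train, y_train, X_query, noise=1e-4):
    """Gaussian process posterior mean and standard deviation at X_query."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v**2, axis=0)  # the RBF prior variance is 1
    return mean, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    erf = np.vectorize(math.erf, otypes=[float])
    cdf = 0.5 * (1.0 + erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def toy_score(x):
    """Hypothetical stand-in for a performance score measured in simulation."""
    return -float(np.sum((x - 0.5) ** 2))

def active_learning(embeddings, score_fn, n_init=5, n_rounds=20, seed=0):
    """Repeatedly fit the surrogate, then evaluate the candidate with highest EI."""
    rng = np.random.default_rng(seed)
    sampled = [int(i) for i in rng.choice(len(embeddings), size=n_init, replace=False)]
    scores = [score_fn(embeddings[i]) for i in sampled]
    for _ in range(n_rounds):
        remaining = np.setdiff1d(np.arange(len(embeddings)), sampled)
        y = np.array(scores)
        y_centered = y - y.mean()  # center targets to match the zero-mean GP prior
        mean, std = gp_posterior(embeddings[sampled], y_centered, embeddings[remaining])
        ei = expected_improvement(mean, std, y_centered.max())
        pick = int(remaining[np.argmax(ei)])
        sampled.append(pick)
        scores.append(score_fn(embeddings[pick]))
    return sampled, scores

# Usage: 200 hypothetical candidates embedded in 2D, 25 of them evaluated.
candidates = np.random.default_rng(1).uniform(-2.0, 2.0, size=(200, 2))
sampled, scores = active_learning(candidates, toy_score)
print(f"evaluated {len(sampled)} of {len(candidates)} candidates; best score {max(scores):.4f}")
```

Each round evaluates only one new candidate, so the number of expensive evaluations stays a small fraction of the candidate pool. In practice the convergence criterion, kernel, and acquisition function would be chosen to suit the embedded design space.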
