610067 The Open Catalyst Project Dataset

Monday, November 16, 2020
Applications of Data Science to Molecules and Materials (T3) (PreRecorded+)
Lowik Chanussot (1), Abhishek Das (2), Javier Heras-Domingo (3), Siddharth Goyal (4), Caleb Ho (4), Thibaut Lavril (4), Aini Palizhati (5), Devi Parikh (6), Morgane Riviere (4), Muhammed Shuaibi (5), Kevin Tran (7), Zachary Ulissi (5), Junwoong Yoon (5) and C. Lawrence Zitnick (4); (1) Facebook AI Research, Paris, France, (2) Georgia Tech, Atlanta, GA, (3) Carnegie Mellon University, Pittsburgh, PA, (4) Facebook AI Research, Menlo Park, CA, (5) Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA, (6) Georgia Tech and Facebook AI Research, Atlanta, GA, (7) Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA

The Open Catalyst Project aims to develop new ML methods and models that accelerate catalyst simulation for renewable energy technologies and improve our ability to predict activity and selectivity across catalyst compositions. To achieve that in the short term, we need participation from the ML community in solving key challenges in catalysis. One path to that interaction is the development of grand challenge datasets that are representative of common challenges in catalysis, large enough to excite the ML community, and large enough to take advantage of, and encourage advances in, deep learning models. Similar datasets have had a large impact in small-molecule drug discovery, organic photovoltaics, and inorganic crystal structure prediction. We present the first open dataset from this effort, covering thermochemical intermediates across stable multi-metallic and p-block doped surfaces. The dataset includes full-accuracy DFT calculations across 53 elements and their binary and ternary materials, on various low-index facets. Adsorbates span 56 common reaction intermediates relevant to carbon, oxygen, and nitrogen thermal and electrochemical reactions. Off-equilibrium structures are also generated and included to aid in the design and fitting of machine learning force fields. Collectively, these calculations form the largest systematic dataset that bridges organic and inorganic chemistry, and they will enable a new generation of catalyst structure/property relationships. We will discuss fixed train/test splits that represent common chemical challenges, as well as an open challenge website, both intended to encourage competition and buy-in from the ML community.
