470979 Integrated Upstream and Downstream Data Curation Tools As a Key to Enabling Reproducibility, Usability and Data Sharing

Wednesday, November 16, 2016: 4:05 PM
Yosemite A (Hilton San Francisco Union Square)
Frederick R. Phelan Jr. (1), Thomas Rosch (1), Cheol Jeong (1), Brian Moroz (2) and Sharief Youssef (2); (1) Materials Science and Engineering Division, NIST, Gaithersburg, MD, (2) Software and Systems Division, NIST, Gaithersburg, MD

In this presentation, we describe the development of a computational “workbench” whose goal is to provide an integrated computational and data environment to support multiscale modeling of soft materials for the Materials Genome Initiative (MGI). The design has three essential elements: a modular program structure that supports the addition of new functionality through Python scripting and run-time plugins; a hierarchical data structure that enables unified representation of materials at different levels of granularity; and integration of the NIST Materials Data Curation System (MDCS) [1-2] into the environment to support ontology-based materials descriptions. This presentation emphasizes the database element of the design. The XML-schema-based database environment allows us to visualize the inter-relationships between data elements and enables automated curation of both upstream and downstream data in the workflow. We show how controlling the data in this manner is essential for ensuring reproducibility, greatly enhances usability, and allows users to build progressive materials reference libraries that can be pushed or shared by various means. We illustrate this with examples that include tools being developed for coarse-grained force-field development and for property calculation.
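As a rough illustration of the idea of pairing a hierarchical materials description with curated upstream and downstream data, the following minimal sketch builds a small hierarchy and serializes it to XML together with its inputs and results. It is not the workbench's code and does not use the MDCS schema or API: the `Node` class, the `curate` function, the element names (`record`, `upstream`, `downstream`) and all numerical values are hypothetical placeholders chosen only for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a hypothetical material hierarchy (system, molecule, bead)."""
    kind: str
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def to_xml(self) -> ET.Element:
        # The node kind becomes the XML tag; children are nested recursively.
        elem = ET.Element(self.kind, {"name": self.name})
        for key, value in self.attributes.items():
            ET.SubElement(elem, "attribute", {"key": key}).text = str(value)
        for child in self.children:
            elem.append(child.to_xml())
        return elem


def curate(material: Node, inputs: dict, results: dict) -> ET.Element:
    """Bundle upstream inputs and downstream results with the material they describe."""
    record = ET.Element("record")
    record.append(material.to_xml())
    upstream = ET.SubElement(record, "upstream")
    for key, value in inputs.items():
        ET.SubElement(upstream, "parameter", {"key": key}).text = str(value)
    downstream = ET.SubElement(record, "downstream")
    for key, value in results.items():
        ET.SubElement(downstream, "result", {"key": key}).text = str(value)
    return record


# Placeholder values for illustration only; they are not data from the presentation.
bead = Node("bead", "A", {"mass": 56.0})
chain = Node("molecule", "chain", {"n_beads": 10}, [bead])
system = Node("system", "melt", {"temperature_K": 450}, [chain])

record = curate(system, {"timestep_fs": 2.0}, {"density_g_cm3": 0.78})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the inputs and results in the same record as the material description is one simple way to make a calculation traceable; a real system would presumably validate such records against an XML schema before storing them.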

References

  1. Materials Data and Informatics, http://www.nist.gov/itl/ssd/is/materials-data-and-informatics.cfm
  2. Materials Data Curation System, https://github.com/usnistgov/MDCS
