454766 Using Semi-Supervised Machine Learning to Map the Phase Diagrams of Open Materials Data Sets

Monday, November 14, 2016: 2:48 PM
Yosemite A (Hilton San Francisco Union Square)
Jason Hattrick-Simpers, Department of Chemical Engineering, University of South Carolina, Columbia, SC, Jonathan Kenneth Bunn, Chemical Engineering, University of South Carolina, Columbia, SC and Jianjun Hu, Computer Science and Engineering, University of South Carolina, Columbia

The Materials Genome Initiative (MGI) is a government program intended to expedite materials discovery, optimization, and deployment by combining theory, experiments, and data science in an integrated workflow. This effort requires that large quantities of high-quality experimental data be made publicly available via minable materials databases so that new data mining techniques and theoretical results can be validated. High-throughput experimentation has been identified as an important technology for creating large-scale experimental databases that delineate the impact of synthesis, processing, and composition on crystal phase stability and figure of merit. To date, however, relatively few of these rich data sets are freely available, and those that are available are primarily restricted to unlabeled, as-obtained structural data.

Such data is critically important to identifying the materials genome, as it creates the linkage between composition, structure, and property. Algorithmic approaches to automated phase diagram mapping have been an active topic in the high-throughput field for a number of years, and most studies have focused on an open FeGaPd data set that has been available for about 10 years. Here, we demonstrate a semi-supervised machine learning technique, SS-AutoPhase, which uses a two-step approach to automatically identify phases within structural data sets. In the first step, clustering analysis is used to automatically select a representative subset of samples to be manually analyzed by a human expert. In the second step, these labeled samples are used to train an AdaBoost classifier that identifies the presence of the different phases in the FeGaPd diffraction data. SS-AutoPhase was used to identify the metallographic phases in 278 diffraction patterns from a FeGaPd sputtered composition spread sample. The accuracy of SS-AutoPhase was greater than 82.6% for all phases when 15% of the diffraction patterns were used for training. Furthermore, the phase diagram predicted by SS-AutoPhase was compared to phase labels from a human expert and to other algorithmic approaches. This comparison showed that SS-AutoPhase not only had very high agreement with the expert phase labels, but was also able to detect and correctly identify a previously unreported phase. Finally, we will report on a first-of-its-kind identification of a novel ferromagnetic shape memory alloy via the data mining of an open materials database.
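The two-step workflow described above can be illustrated with a minimal, self-contained sketch. This is not the SS-AutoPhase implementation: the four-bin synthetic "diffraction patterns", the hypothetical phases A and B, the plain k-means clustering, the choice of three representatives per cluster, the hand-written threshold-stump AdaBoost with a widest-gap tie-break, and the use of one binary classifier per phase are all assumptions made for illustration. The ground-truth tuples stand in for the labels a human expert would supply.

```python
import math
import random

def kmeans(points, k, iters=10):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def pick_representatives(points, k, per_cluster):
    """Step 1: choose the samples nearest each cluster centroid for expert labeling."""
    centroids, assign = kmeans(points, k)
    chosen = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        idx.sort(key=lambda i: math.dist(points[i], centroids[c]))
        chosen.extend(idx[:per_cluster])
    return sorted(chosen)

def best_stump(X, y, w):
    """Lowest weighted-error threshold stump; ties go to the widest gap."""
    best = None
    for f in range(len(X[0])):
        vals = sorted({x[f] for x in X})
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2  # midpoint between neighboring feature values
            for sign in (1, -1):
                pred = [sign if x[f] > t else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                key = (err, lo - hi)
                if best is None or key < best[0]:
                    best = (key, f, t, sign, pred)
    return best

def fit_adaboost(X, y, rounds=5):
    """Step 2: discrete AdaBoost over stumps; labels in y are +1 or -1."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        (err, _gap), f, t, sign, pred = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, t, sign))
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    score = sum(a * (s if x[f] > t else -s) for a, f, t, s in model)
    return 1 if score >= 0 else -1

# Synthetic stand-in data: phase A peaks in bins 0-1, phase B in bins 2-3.
rng = random.Random(0)

def pattern(has_a, has_b):
    hi_a, hi_b = (1.0 if has_a else 0.1), (1.0 if has_b else 0.1)
    return [v + rng.uniform(-0.05, 0.05) for v in (hi_a, hi_a, hi_b, hi_b)]

truth = [(True, False)] * 20 + [(True, True)] * 20 + [(False, True)] * 20
patterns = [pattern(a, b) for a, b in truth]

# 3 clusters x 3 representatives = 9 of 60 patterns (15%) go to the "expert".
labeled = pick_representatives(patterns, k=3, per_cluster=3)
X_train = [patterns[i] for i in labeled]
accuracy = {}
for phase, name in enumerate(("A", "B")):
    y_train = [1 if truth[i][phase] else -1 for i in labeled]  # expert labels (assumed)
    model = fit_adaboost(X_train, y_train)
    hits = sum(predict(model, p) == (1 if t[phase] else -1)
               for p, t in zip(patterns, truth))
    accuracy[name] = hits / len(patterns)
print(len(labeled), accuracy)
```

Because the synthetic clusters are cleanly separated, this toy problem is far easier than real diffraction data, so its accuracy should not be compared with the figures reported for the FeGaPd data set. The sketch only shows how clustering can limit the expert's labeling effort to a small, representative subset before a boosted classifier is trained on those labels.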

