468669 Fault Tolerant Computing through Machine Learning

Tuesday, November 15, 2016: 10:00 AM
Carmel I (Hotel Nikko San Francisco)
David Sroczynski (1), Christine Kyauk (1), Ioannis G. Kevrekidis (1), Paul Villoutreix (1) and Joakim Anden (2); (1) Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, (2) Program for Applied and Computational Mathematics, Princeton University, Princeton, NJ

In modern, massively parallel scientific computation, domain decomposition approaches assign
different segments of a domain, and the different subfields/equations solved for in each segment,
to different processors.
If a processor fails during a "computation era", before information is exchanged between nodes,
one faces the serious question of whether and how the computation can proceed.

In many cases, the different fields that these processors compute are all functions of some
intrinsic lower-dimensional coarse variables (e.g., time during the computation, long-wavelength
features of the solution).
If the computational algorithms share some such common information,
we can use machine learning, and in particular diffusion maps, a nonlinear manifold learning algorithm, to
"register" the computational data in the coarse space and to "fill in", to the best of our ability, data that
are missing or corrupted because of a processor failure.
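
As a rough illustration of the "registration" step, the following minimal Python sketch computes diffusion map coordinates for a set of unordered snapshots. It is not the authors' implementation: the function name diffusion_map, the Gaussian kernel with a median-distance scale, the density normalization, and the synthetic one-parameter family of snapshots are all assumptions made for the example.

```python
import numpy as np

def diffusion_map(X, n_coords=2, eps=None):
    """Return leading nontrivial eigenvalues and diffusion coordinates of the rows of X."""
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    if eps is None:
        eps = np.median(D2)              # assumed kernel scale (a common heuristic)
    K = np.exp(-D2 / eps)                # Gaussian kernel
    q = K.sum(axis=1)
    K = K / np.outer(q, q)               # density normalization (alpha = 1)
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))      # symmetric matrix similar to the Markov matrix
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]       # sort eigenvalues in descending order
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]     # right eigenvectors of the Markov matrix
    return vals[1:n_coords + 1], psi[:, 1:n_coords + 1]  # skip the trivial constant mode

# Hypothetical data: snapshots of a field that depends on one hidden coarse variable t,
# stored in random order (no time stamps are given to the algorithm).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64)
t = rng.uniform(0.0, np.pi, 150)
U = (np.cos(t)[:, None] * np.sin(np.pi * x)[None, :]
     + np.sin(t)[:, None] * np.sin(2.0 * np.pi * x)[None, :])

vals, coords = diffusion_map(U)
corr = np.corrcoef(coords[:, 0], t)[0, 1]
print("correlation between first diffusion coordinate and hidden t:", corr)
```

In this synthetic setting the first diffusion coordinate varies monotonically with the hidden variable t, so sorting the snapshots by it recovers their ordering up to an overall direction.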

This allows us to learn functional relationships between aspects of the data fields that are
not common across processors, effectively fusing the data sets.
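
A minimal sketch of the "fill in" idea, under stated assumptions: suppose a coarse coordinate psi is already known for every snapshot (for instance, one recovered from data that a surviving processor shares), while the field held by a failed processor is missing for some snapshots. The missing rows can then be estimated by learning the field as a function of psi from the snapshots where it is available. The helper fill_missing, the use of piecewise-linear interpolation, and the synthetic psi and field are hypothetical choices for illustration, not the method used in this work.

```python
import numpy as np

def fill_missing(psi, U, missing):
    """Fill rows of U flagged in `missing` by interpolating each column against psi."""
    obs = ~missing
    order = np.argsort(psi[obs])         # np.interp needs increasing sample points
    psi_obs = psi[obs][order]
    U_obs = U[obs][order]
    U_filled = U.copy()
    for j in range(U.shape[1]):          # one 1-D interpolation per spatial point
        U_filled[missing, j] = np.interp(psi[missing], psi_obs, U_obs[:, j])
    return U_filled

# Hypothetical setup: 200 snapshots, 40 of which were lost on the failed processor.
rng = np.random.default_rng(1)
n, m = 200, 32
t = rng.uniform(0.0, np.pi, n)           # hidden coarse variable
psi = t + 0.2 * np.sin(t)                # assumed coarse coordinate, monotone in t
x = np.linspace(0.0, 1.0, m)
U_true = np.exp(-0.5 * t)[:, None] * np.sin(np.pi * x)[None, :]

missing = np.zeros(n, dtype=bool)
missing[rng.choice(n, 40, replace=False)] = True
U_damaged = U_true.copy()
U_damaged[missing] = np.nan              # data lost to the failure

U_filled = fill_missing(psi, U_damaged, missing)
rel_err = (np.linalg.norm(U_filled[missing] - U_true[missing])
           / np.linalg.norm(U_true[missing]))
print("relative reconstruction error on missing snapshots:", rel_err)
```

Interpolation is only one simple option for a single coarse coordinate; with several coarse coordinates, a kernel-based regression in the coarse space would play the same role.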

We demonstrate our approach on two illustrative PDE systems with various spatiotemporal patterns of missing data.

The approach meshes well with equation-free computation schemes, in particular with patch dynamics;
beyond helping to partially restore corrupted or missing data, it can help determine
the size of the computational domain over which simulations need not be performed,
and can help determine processor redundancy for different anticipated failure patterns.

This is joint work with Prof. G. Karniadakis and Dr. Seungjoon Lee at Brown University.

