468669 Fault Tolerant Computing through Machine Learning
to different segments of a domain, and different subfields/equations solved for in each segment,
being computed on different processors.
If a processor fails during a "computation era", before information is exchanged between nodes,
it is unclear whether, and how, the computation can proceed.
In many cases, the different fields that these processors compute are all functions of some
intrinsic lower-dimensional coarse variables (e.g., time during the computation, long-wavelength
features of the solution).
If the computational algorithms share some such common information,
we can use machine learning, and in particular diffusion maps, a nonlinear manifold learning algorithm, to
"register" the computational data in the coarse space and to "fill in", to the best of our ability, data that
are missing or corrupted because of a processor failure.
This allows us to learn functional relationships between aspects of the data fields that are
not common across processors, effectively fusing the data sets.
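
As a rough illustration of the registration and fill-in idea, the following Python sketch uses a toy setup that is not taken from this work: two hypothetical processors compute fields (field_a, field_b) that both depend on one hidden coarse variable t, and field_b is lost over a contiguous block of samples. The kernel scale (median heuristic), the density normalization, and the linear interpolation used for fill-in are all assumptions made for the example, not the authors' implementation.

```python
import numpy as np

def diffusion_map_coordinate(X, alpha=1.0):
    """First nontrivial diffusion-map coordinate of the rows of X (sketch)."""
    # Pairwise squared Euclidean distances between samples.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    eps = np.median(d2)                  # heuristic kernel scale (assumption)
    K = np.exp(-d2 / eps)
    q = K.sum(axis=1)
    K = K / np.outer(q, q) ** alpha      # reduce the effect of sampling density
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))      # symmetric conjugate of the Markov matrix
    vals, vecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    # The top eigenvector is trivial (constant); take the next one.
    return vecs[:, -2] / np.sqrt(d)

# Toy data: both fields are functions of a hidden coarse variable t.
n = 200
t = np.linspace(0.0, 1.0, n)
field_a = np.column_stack([np.cos(np.pi * t), np.sin(np.pi * t)])  # intact processor
field_b = t ** 2                                                   # failing processor

# Hypothetical failure: field_b is lost for a contiguous block of samples.
missing = np.zeros(n, dtype=bool)
missing[80:120] = True
field_b_observed = np.where(missing, np.nan, field_b)

# "Register" every sample in the coarse space using the intact field only.
phi = diffusion_map_coordinate(field_a)

# "Fill in" the lost values by interpolating field_b against the coarse coordinate.
known = ~missing
order = np.argsort(phi[known])
filled = field_b_observed.copy()
filled[missing] = np.interp(phi[missing],
                            phi[known][order],
                            field_b_observed[known][order])

max_error = np.max(np.abs(filled[missing] - field_b[missing]))
print(f"max fill-in error on the lost block: {max_error:.4f}")
```

In this toy case the diffusion-map coordinate phi is a one-to-one function of t, so the surviving field acts as a proxy for the coarse variable. In a realistic setting the coarse space may have more than one dimension, and a more careful regression than linear interpolation would likely be needed.
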
We demonstrate our approach on two illustrative PDE systems with various spatiotemporal patterns of missing data.
The approach meshes well with equation-free computation schemes, in particular with patch dynamics;
beyond helping to partially restore corrupted or missing data, it can help determine
the size of the computational domain over which simulations need not be performed,
and can help determine processor redundancy for different anticipated failure patterns.
This is joint work with Prof. G. Karniadakis and Dr. Seungjoon Lee at Brown University.
See more of this Group/Topical: Computing and Systems Technology Division