399462 ALAMO: Automatic Learning of Algebraic Models Using Optimization

Tuesday, April 28, 2015: 4:30 PM
12B (Austin Convention Center)
Nick Sahinidis, Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA

We address the problem of discovering algebraic relationships that are hidden in a set of data, an experimental process, or a simulation model. The problem lies at the interface of statistical experimental design, optimization, and machine learning. We present a methodology for developing models that are simple and accurate while minimizing the number of experiments or simulations of the system under study. The methodology begins by building a low-complexity model of the system using integer optimization techniques. The model is then tested, exploited, and improved by using derivative-free optimization to adaptively sample new experimental or simulation points. Semi-infinite optimization techniques enable a combined data- and theory-driven approach to model building. We provide computational comparisons between ALAMO, the software implementation of the proposed methodology, and a variety of machine learning and statistical techniques, including Latin hypercube sampling, simple least-squares regression, and the lasso. Finally, we demonstrate how ALAMO’s adaptive sampling technique can be used to learn models by selecting small numbers of samples from huge data sets or even from infinitely many data points.
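The two stages described above (a low-complexity model chosen by subset selection, then adaptive sampling at points where the model is worst) can be illustrated with a small Python sketch. Everything in it is an assumption for illustration, not ALAMO's actual formulation or API: the hypothetical basis library `BASIS`, the BIC-style score with a floor on the mean squared error, exhaustive subset enumeration standing in for the integer program, relative error as the sampling criterion, and a plain grid search standing in for the derivative-free optimizer.

```python
# Illustrative sketch only; not ALAMO's implementation or API.
import itertools

import numpy as np

# Hypothetical library of candidate basis functions for one input x > 0.
BASIS = {
    "x": lambda x: x,
    "x^2": lambda x: x**2,
    "x^3": lambda x: x**3,
    "1/x": lambda x: 1.0 / x,
    "log(x)": np.log,
    "exp(x)": np.exp,
}


def fit_subset(names, x, y):
    """Least-squares fit of y on the named basis functions (no intercept)."""
    A = np.column_stack([BASIS[n](x) for n in names])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, float(resid @ resid)


def bic(sse, n, k):
    """Assumed BIC-style score; the MSE floor treats near-exact fits as ties."""
    return n * np.log(max(sse / n, 1e-12)) + k * np.log(n)


def best_subset(x, y, max_terms=3):
    """Exhaustive subset search, standing in for an integer program."""
    best = None
    for k in range(1, max_terms + 1):
        for names in itertools.combinations(BASIS, k):
            coef, sse = fit_subset(names, x, y)
            score = bic(sse, len(x), k)
            # Strict '<' keeps the first (fewest-term, earliest) model on ties.
            if best is None or score < best[0]:
                best = (score, names, coef)
    return best[1], best[2]


def predict(names, coef, x):
    """Evaluate the selected linear-in-coefficients model at x."""
    return sum(c * BASIS[n](x) for n, c in zip(names, coef))


def adaptive_fit(simulate, lo, hi, x0, tol=1e-6, max_iter=10):
    """Refit, then add the point of largest relative error until below tol."""
    x = np.asarray(x0, dtype=float)
    y = simulate(x)
    # Grid search: a crude stand-in for a derivative-free optimizer.
    candidates = np.linspace(lo, hi, 401)
    for _ in range(max_iter):
        names, coef = best_subset(x, y)
        f = simulate(candidates)
        err = np.abs(f - predict(names, coef, candidates)) / np.abs(f)
        i = int(np.argmax(err))
        if err[i] < tol:
            break
        # Sample the black box where the current model is worst.
        x = np.append(x, candidates[i])
        y = np.append(y, f[i])
    return names, coef, x
```

For example, with a hypothetical noise-free black box `simulate(x) = 2x² + 3 log x` on [1, 5] and starting points 1 and 5, every two-term model interpolates the two points, so the first fit is essentially arbitrary; the error-maximizing sample exposes it, and the refit on three points should recover the `x^2` and `log(x)` terms with coefficients near 2 and 3.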

Session: Big Data Analytics – Vendor Perspective (invited session) II
Group/Topical: Big Data Analytics