463600 An Optimization-Based Approach for Learning Simple Parametric Surrogate Models

Tuesday, November 15, 2016: 1:04 PM
Monterey I (Hotel Nikko San Francisco)
Zachary Wilson, Carnegie Mellon University, Pittsburgh, PA and Nick Sahinidis, Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA

Data obtained through simulations or experiments are routinely used to inform process decisions and build system models. To describe complex systems with nonlinear behavior, non-parametric methods such as artificial neural networks or support vector machines are routinely used to build accurate system models. However, these models often suffer from overfitting. Moreover, their non-convex functional forms make them difficult to interpret and to incorporate directly into algebraic optimization algorithms. To obtain simple yet accurate algebraic models, we recently developed the ALAMO methodology to learn models from exogenous data [1]. ALAMO performs a number of nonlinear transformations of input variables to populate a regression basis set.
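As a rough illustration of what populating a regression basis set can look like, the following minimal sketch applies a small, fixed list of nonlinear transformations to each input column. The function name build_basis and the particular transformation set (powers, exponential, logarithm) are assumptions made for this example; they are not taken from ALAMO.

```python
import numpy as np

def build_basis(X, names=None):
    """Return (Phi, labels): candidate basis columns built from each input column."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if names is None:
        names = [f"x{j + 1}" for j in range(d)]
    # Hypothetical transformation set, chosen only for illustration.
    transforms = [
        ("{}", lambda x: x),
        ("{}^2", lambda x: x ** 2),
        ("{}^3", lambda x: x ** 3),
        ("exp({})", np.exp),
        ("log({})", np.log),
    ]
    columns, labels = [], []
    for j in range(d):
        for template, fn in transforms:
            # Skip the logarithm when an input column is not strictly positive.
            if template.startswith("log") and np.any(X[:, j] <= 0):
                continue
            columns.append(fn(X[:, j]))
            labels.append(template.format(names[j]))
    return np.column_stack(columns), labels
```

Each column of Phi is one candidate regression variable, so a model that is linear in the selected columns remains a simple algebraic expression in the original inputs.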

In this paper, we present a systematic computational study of several fitness metrics that can be used in an optimization-based subset selection methodology to identify an optimal subset of regression variables. These metrics include Mallows’ Cp, Akaike’s information criterion, and the Bayesian information criterion, among others. The resulting models consist of a linear combination of nonlinear transformations of input variables, and their simple algebraic form can help provide insight into the system at hand. We complement these exact optimization algorithms with fast heuristics and describe their computational performance in ALAMO. Moreover, we present a systematic comparison of ALAMO’s optimization-based approach to fitting models from data with a number of other parametric model-building methods, including the lasso implementation in Matlab [2] and R’s leaps routine [3].
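To make the role of a fitness metric in subset selection concrete, the sketch below scores every subset of up to max_terms basis columns by ordinary least squares and keeps the subset with the lowest metric value. Brute-force enumeration is used here only as a stand-in for the exact optimization algorithms and heuristics discussed above; it is not ALAMO's algorithm. The metric formulas are the common textbook forms for Gaussian linear regression (up to additive constants), and all function names are hypothetical.

```python
import itertools
import numpy as np

def fit_rss(Phi, y, subset):
    """Least-squares fit on the chosen columns; returns (RSS, coefficients)."""
    A = Phi[:, list(subset)]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    # Guard against log(0) when the fit happens to be exact.
    return max(float(resid @ resid), np.finfo(float).tiny), coef

def aic(rss, n, k):
    """Akaike's information criterion for a k-term model on n samples."""
    return n * np.log(rss / n) + 2 * k

def bic(rss, n, k):
    """Bayesian information criterion for a k-term model on n samples."""
    return n * np.log(rss / n) + k * np.log(n)

def mallows_cp(rss, n, k, sigma2_full):
    """Mallows' Cp; sigma2_full is the error variance estimated from the full model."""
    return rss / sigma2_full + 2 * k - n

def best_subset(Phi, y, metric=bic, max_terms=3):
    """Enumerate subsets of up to max_terms columns; return (score, subset, coef)."""
    n, p = Phi.shape
    best = (np.inf, None, None)
    for k in range(1, min(max_terms, p) + 1):
        for subset in itertools.combinations(range(p), k):
            rss, coef = fit_rss(Phi, y, subset)
            score = metric(rss, n, k)
            if score < best[0]:
                best = (score, subset, coef)
    return best
```

Enumeration grows combinatorially with the size of the basis set, which is one reason exact optimization formulations and fast heuristics are of interest for larger problems. To use mallows_cp with best_subset, sigma2_full can be fixed beforehand, for example with functools.partial.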

References

[1] Cozad, A., N. V. Sahinidis, and D. C. Miller, Automatic learning of algebraic models for optimization, AIChE Journal, 60, 2211-2227, 2014.

[2] http://www.mathworks.com/help/stats/lasso.html
[3] https://cran.r-project.org/web/packages/leaps/leaps.pdf

See more of this Session: Big Data Analytics in Chemical Engineering
See more of this Group/Topical: Computing and Systems Technology Division