283768 Efficient Surrogate Model Generation with Adaptive Sequential Sampling
By constructing a simpler approximation of the full system, surrogate-based optimization allows traditional derivative-based optimization techniques to be applied to more complex problems. However, constructing a surrogate model requires executing the original model many times to gather the data used to train it. Depending on the complexity of the original model, this step can become cost-prohibitive. For example, if the original model is a computational fluid dynamics simulation of a packed two-phase reactor, a single simulation run might take anywhere from days to weeks to solve. It is therefore important to choose the number and location of the samples so as to minimize the upfront computational cost of generating accurate surrogate models. In this work, three adaptive sequential sampling algorithms are developed for surrogate model construction, with artificial neural networks used as the surrogate models. Their performance is compared on the basis of the number of samples needed to generate surrogate models with 5% accuracy for three challenge functions: the Shekel, Ackley, and Beale functions.
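For reference, two of the three challenge functions can be written down in a few lines. The sketch below uses the commonly cited forms of the Ackley function (with the usual constants 20, 0.2, and 2π) and the Beale function; the exact variants, dimensions, and domains used in this work are not stated here, so these are assumptions. The Shekel function is omitted because it depends on a parameter table that is not given in this abstract.

```python
import math
from typing import Sequence


def ackley(x: Sequence[float]) -> float:
    """Ackley function in its commonly cited form (a=20, b=0.2, c=2*pi).

    Global minimum of 0 at the origin. The constants are the textbook
    defaults, assumed here rather than taken from the abstract.
    """
    d = len(x)
    mean_sq = sum(v * v for v in x) / d
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / d
    return (-20.0 * math.exp(-0.2 * math.sqrt(mean_sq))
            - math.exp(mean_cos) + 20.0 + math.e)


def beale(x: float, y: float) -> float:
    """Beale function (two-dimensional); global minimum of 0 at (3, 0.5)."""
    return ((1.5 - x + x * y) ** 2
            + (2.25 - x + x * y ** 2) ** 2
            + (2.625 - x + x * y ** 3) ** 2)
```

Both functions are cheap to evaluate, which is what makes them convenient stand-ins for an expensive simulation when counting how many samples a sampling algorithm needs.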
All three algorithms are sequential design techniques, i.e., they begin with an initial sample set, train the networks, evaluate the performance of the trained networks (we used K-fold cross-validation as the model evaluation technique), select n new data points where n is a given fraction of the current sample size, and train the networks again with the new full data set. This procedure is repeated until a stopping criterion is satisfied.
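The sequential design loop described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the callables `run_model`, `train`, `propose`, and `converged`, along with the `growth_fraction` and `max_iterations` parameters, are hypothetical names introduced here. In particular, `train` stands in for fitting and cross-validating the neural networks, and `propose` for whichever point-selection rule a given algorithm uses.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sequential_design(
    run_model: Callable[[Point], float],
    initial_points: Sequence[Point],
    train: Callable[[List[Point], List[float]], Any],
    propose: Callable[[Any, List[Point], int], List[Point]],
    converged: Callable[[Any, List[Point], List[float]], bool],
    growth_fraction: float = 0.2,   # assumed value; the abstract gives none
    max_iterations: int = 50,       # safety cap, also an assumption
):
    """Grow the sample set until the stopping criterion is satisfied."""
    X = list(initial_points)
    y = [run_model(x) for x in X]          # expensive evaluations
    surrogate = train(X, y)                # train (and evaluate) the networks
    for _ in range(max_iterations):
        if converged(surrogate, X, y):
            break
        # n is a given fraction of the current sample size (at least one point).
        n_new = max(1, math.ceil(growth_fraction * len(X)))
        new_points = propose(surrogate, X, n_new)
        X.extend(new_points)
        y.extend(run_model(x) for x in new_points)
        surrogate = train(X, y)            # retrain on the full data set
    return surrogate, X, y
```

Keeping the selection rule and stopping criterion as arguments lets the same loop serve all three algorithms, which differ only in how new points are chosen and when sampling stops.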
The first algorithm begins the process of selecting new points by generating a large Latin hypercube sample of proposed points. The set of neural networks, each trained on the data of a single fold, is used to predict the output variable at each proposed point. The variance of the predictions at a given point provides an estimate of the surrogate-model variance at that point. The points with the highest variance estimates are where the models' predictions are most uncertain, and hence these points are selected for evaluation with the original model. The neural networks are then retrained with the full data set. This process is repeated until the maximum variance estimate falls below a target value. The second algorithm follows the same execution steps; however, rather than using the variance alone, it selects the sequential sample points with a performance metric that combines the estimated variance and the distance to the nearest-neighbor sample point. This algorithm terminates when the performance metric is below a target value for all proposed points. The addition of the nearest-neighbor distance makes this algorithm both space-filling and adaptive, so that it both locates and fully models fluctuations in the objective function. The final algorithm uses incremental Latin hypercube sampling (iLHS, as presented in Nuchitprasittichai and Cremaschi 2012) until the mean squared error stabilizes, and then switches to the first algorithm to add more samples in areas of insufficient information.
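The point-selection step of the first two algorithms can be illustrated with a short sketch. It assumes the trained networks are available as plain callables and uses a simple stratified Latin hypercube generator. The abstract does not give the formula that combines variance and nearest-neighbor distance, so the weighted sum controlled by the hypothetical `distance_weight` parameter below is only a placeholder; setting it to zero recovers the variance-only rule of the first algorithm.

```python
import math
import random
from statistics import pvariance
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Model = Callable[[Point], float]


def latin_hypercube(n: int, bounds: Sequence[Tuple[float, float]],
                    rng: random.Random) -> List[Point]:
    """Draw n points with exactly one point per stratum in every dimension."""
    columns = []
    for low, high in bounds:
        strata = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(strata)  # decouple the strata across dimensions
        columns.append([low + u * (high - low) for u in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]


def score(models: Sequence[Model], candidate: Point,
          samples: Sequence[Point], distance_weight: float = 0.0) -> float:
    """Variance of the model predictions, optionally plus a distance term.

    The weighted-sum form is an assumption made for illustration only.
    """
    variance = pvariance([m(candidate) for m in models])
    if distance_weight == 0.0:
        return variance
    nearest = min(math.dist(candidate, s) for s in samples)
    return variance + distance_weight * nearest


def select_points(models: Sequence[Model], candidates: Sequence[Point],
                  samples: Sequence[Point], n_new: int,
                  distance_weight: float = 0.0) -> List[Point]:
    """Return the n_new proposed points with the highest scores."""
    ranked = sorted(candidates,
                    key=lambda c: score(models, c, samples, distance_weight),
                    reverse=True)
    return ranked[:n_new]
```

The stopping test for either algorithm then amounts to checking whether the largest score over all proposed points has dropped below the chosen target value.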
Algorithm 1 is the simplest and fastest algorithm, but it requires that the initial sample size be sufficient to locate all regions of interest in the objective function. The space-filling criterion in algorithm 2 overcomes this problem, but at the additional computational cost of calculating the nearest-neighbor distance. Algorithm 3 uses iLHS to create sufficiently space-filling samples before beginning the adaptive sampling, but the iLHS process is costly because the algorithm maintains a Latin hypercube sample at each iteration, eliminating previously sampled points if necessary. Although many past sampling algorithms were designed for a specific type of surrogate model, all three algorithms presented here generalize to any type of surrogate model. These algorithms also scale well to higher-dimensional problems, which are common in chemical engineering.