269747 Strategic Search for Experimental Conditions for Efficient Product Design
In the experimental design of materials and products, experimental results depend on many parameters, such as the chemical constitution and composition of the raw materials and the synthesis conditions. The number of parameter combinations is large, and the relationships between parameters and physical properties are complicated. Hence, searching for a candidate with the desired properties is costly and time-consuming.
Until now, experimenters have chosen parameter combinations from a large set of options based on experience and intuition. This inefficient search method increases the number of experiments and the development cost.
Prediction models built with regression methods are useful for understanding the relationships between parameters and physical properties. Traditionally, the parameter search has relied on the values of physical properties predicted by a regression model. However, in regions where the data density is low, the predictive ability of a regression model is also low, and the reliability of the predicted values is not ensured.
In this study, we use regression methods and evaluate candidates using both the predicted physical properties and the data distribution as evaluation criteria. The proposed method makes it possible to obtain a material with the desired physical properties in fewer experiments.
First, a physical property prediction model between the experimental parameters (X) and a physical property (y) is built from a database by using a regression method such as Gaussian process regression (GP), partial least squares (PLS), or support vector regression (SVR). Second, the X values of new candidates are input into the model and their y values are predicted. Third, the candidates are evaluated with the criteria described below. After the evaluation, the physical property of the best candidates is measured. If the measured property is insufficient, the prediction model is updated with the new data. This flow is repeated until the target values of y are obtained.
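The flow above can be sketched as a generic loop. This is a minimal illustration, not the authors' implementation: the function name sequential_search and the callables fit, score, measure, and in_target are hypothetical placeholders for the regression method, the evaluation criterion, the experiment, and the target check. The sketch also assumes that one candidate is measured per iteration, which the abstract does not specify.

```python
from typing import Any, Callable, List, Optional, Tuple

def sequential_search(
    candidates: List[Any],
    labeled: List[Tuple[Any, float]],
    fit: Callable[[List[Tuple[Any, float]]], Any],
    score: Callable[[Any, Any], float],
    measure: Callable[[Any], float],
    in_target: Callable[[float], bool],
    max_experiments: int = 100,
) -> Tuple[Optional[Any], Optional[float], int]:
    """Repeat fit -> evaluate -> measure until a measured y hits the target.

    Returns (x, y, number_of_experiments); x and y are None if the target
    was not reached within max_experiments or the pool ran out.
    """
    pool = list(candidates)   # untested candidates
    data = list(labeled)      # (x, y) pairs measured so far
    n_experiments = 0
    while pool and n_experiments < max_experiments:
        model = fit(data)                                   # step 1: build/update model
        best = max(pool, key=lambda x: score(model, x))     # steps 2-3: predict and rank
        pool.remove(best)
        y = measure(best)                                   # run the experiment
        n_experiments += 1
        if in_target(y):
            return best, y, n_experiments
        data.append((best, y))                              # insufficient: update the data
    return None, None, n_experiments
```

In practice, fit would wrap a GP, PLS, or SVR model and score would return the overall criterion for a candidate; here they are left abstract so that the loop structure stays visible.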
Two evaluation criteria are used in this method. The first is the probability (P) of obtaining a material whose y lies in the objective range (y1≤y≤y2). P is obtained by integrating, over this range, a normal distribution whose mean is the predicted value of y (ypred) and whose variance is the estimated variance of the prediction errors. This variance can be calculated with GP.
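Written out, the integral of a normal density over y1≤y≤y2 is the difference of two cumulative distribution values, P = Φ((y2 − ypred)/σ) − Φ((y1 − ypred)/σ), where σ is the square root of the estimated variance. A minimal sketch using only the Python standard library follows; the function names are illustrative, and the handling of a zero standard deviation is an assumption added for robustness.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_in_range(y_pred: float, y_std: float, y1: float, y2: float) -> float:
    """Probability that y ~ N(y_pred, y_std**2) falls in [y1, y2]."""
    if y_std <= 0.0:
        # Assumption: with no predictive uncertainty, P is 1 or 0.
        return 1.0 if y1 <= y_pred <= y2 else 0.0
    upper = normal_cdf((y2 - y_pred) / y_std)
    lower = normal_cdf((y1 - y_pred) / y_std)
    return upper - lower
```

For example, a candidate predicted exactly at the center of a range that spans one standard deviation on each side has P of about 0.68, whereas the same prediction with a much larger σ gets a lower P because more of the distribution falls outside the range.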
The other criterion is the data density (DD), calculated with a one-class support vector machine (OCSVM). In regions where the data density is high, the predictive ability is expected to be high; these regions are called the applicability domain of a regression model.
After range-scaling P and DD, their weighted average is used as the overall criterion. We varied the weights of the two criteria and, for each regression method, investigated the number of experiments required to reach the objective region.
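The combination of the two criteria can be sketched as follows. The sketch assumes that range-scaling means min-max scaling to [0, 1] over the current candidate set and that the two weights sum to one; neither detail is stated in the abstract, and the function names are illustrative.

```python
from typing import List, Sequence

def range_scale(values: Sequence[float]) -> List[float]:
    """Min-max scale values to [0, 1] (assumed meaning of range-scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All candidates are equal on this criterion: no preference.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def overall_criterion(p: Sequence[float], dd: Sequence[float],
                      weight_p: float = 0.5) -> List[float]:
    """Weighted average of range-scaled P and DD, one value per candidate."""
    if len(p) != len(dd):
        raise ValueError("p and dd must have the same length")
    p_scaled = range_scale(p)
    dd_scaled = range_scale(dd)
    return [weight_p * a + (1.0 - weight_p) * b
            for a, b in zip(p_scaled, dd_scaled)]
```

With weight_p = 0.5 the two criteria contribute equally; weight_p = 1.0 ranks candidates by P alone and weight_p = 0.0 by DD alone.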
The proposed method was applied to several types of data. Simulation data were generated with the logarithmic Goldstein-Price function, which has two input variables (X) and one output variable (y). Its minimum value was 1.10, at x1 = -1 and x2 = 0. We generated 1,681 data points by varying each x over -2≤x≤2 in steps of 0.1. The objective region of y was set to -5≤y≤3, and the first 10 data points for constructing the prediction model were selected randomly from the candidates whose values of y were larger than 7.
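The simulation data set can be reproduced approximately with the sketch below. It assumes that the function is the natural logarithm of the standard Goldstein-Price function, which is consistent with the reported minimum (ln 3 ≈ 1.10) but is not stated explicitly. In the standard variable ordering used here the minimum lies at (0, -1), so the coordinates quoted in the abstract presumably follow the opposite labeling of the two inputs. The random seed is an arbitrary choice for reproducibility.

```python
import math
import random

def goldstein_price(x1: float, x2: float) -> float:
    """Standard Goldstein-Price test function (global minimum 3 at (0, -1))."""
    a = 1 + (x1 + x2 + 1) ** 2 * (
        19 - 14 * x1 + 3 * x1 ** 2 - 14 * x2 + 6 * x1 * x2 + 3 * x2 ** 2)
    b = 30 + (2 * x1 - 3 * x2) ** 2 * (
        18 - 32 * x1 + 12 * x1 ** 2 + 48 * x2 - 36 * x1 * x2 + 27 * x2 ** 2)
    return a * b

def log_goldstein_price(x1: float, x2: float) -> float:
    """Natural log of the Goldstein-Price function (assumed form)."""
    return math.log(goldstein_price(x1, x2))

# 41 values per axis from -2 to 2 in steps of 0.1 -> 41 * 41 = 1,681 points.
axis = [i / 10 for i in range(-20, 21)]
grid = [(x1, x2, log_goldstein_price(x1, x2)) for x1 in axis for x2 in axis]

best = min(grid, key=lambda row: row[2])               # grid point with minimum y
target = [row for row in grid if -5 <= row[2] <= 3]    # objective region of y
start_pool = [row for row in grid if row[2] > 7]       # pool for initial data
initial = random.Random(0).sample(start_pool, 10)      # first 10 data points
```

Drawing the initial data only from points with y larger than 7 means that the search starts far from the objective region, so the first models have no direct information about it.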
The weight of P in the weighted average was set to 0.5, so that P and DD were treated equally. With GP or SVR, which are nonlinear regression methods, fewer experiments were needed than with PLS, which is a linear regression method. When the search for candidates in the objective region of y was repeated 50 times, the average number of experiments with the proposed method was 13.6 when GP was used. In contrast, when only the predicted value was used, as in the traditional approach, the average number was 27.9. These results confirm the effectiveness of the proposed method.
Next, we applied the method to real data: the logarithmic water solubility values of 1,290 chemical compounds obtained from the literature. The X-variables were 187 structural descriptors and the y-variable was the water solubility (logS). The objective region of y was 1.15≤y≤2.00, and the first 10 data points were selected randomly from the chemical compounds whose values of y were smaller than -5. In this case, PLS required fewer experiments than GP and SVR because the relationship between X and y in this data set is strongly linear. The average number of experiments was 4.98 with the proposed method and 5.60 when only the predicted value was used as the traditional evaluation criterion.
The proposed method reduced the number of experiments required, and therefore more efficient experimental design of materials and products can be expected.
G. Li, V. Aute, and S. Azarm, Structural and Multidisciplinary Optimization, 40, 137-155, 2010.
I. V. Tetko, V. Y. Tanchuk, T. N. Kasheva, and A. E. P. Villa, J. Chem. Inf. Comput. Sci., 41, 1488-1493, 2001.