Scaling up chemical production can create performance gaps in final products due to variations in process conditions. To streamline production scale-up, it is important to identify the key process factors that drive critical performance metrics. Three features of the data make statistical modeling challenging: multi-level data (critical performance metrics measured at different levels or stages), multicollinearity (high correlations among process factors), and the “curse of dimensionality” (too many process factors and interactions relative to the number of observations).
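Multicollinearity among process factors is commonly screened with variance inflation factors (VIFs). For each factor, VIF = 1/(1 - R^2), where R^2 comes from regressing that factor on all the others; this equals the matching diagonal entry of the inverse correlation matrix. The dependency-free sketch below only illustrates the idea and is not the screening used in this study. The factor names (`temp`, `pressure`, `feed`), their values, and the helper functions are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment each row with the matching row of the identity matrix.
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfectly collinear factors)")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[i][i] for i in range(k)]


# Hypothetical process factors: pressure tracks temperature almost linearly.
temp = [150, 155, 160, 165, 170, 175]
pressure = [2.0, 2.1, 2.2, 2.35, 2.4, 2.5]
feed = [10, 12, 9, 11, 10, 13]

for name, v in zip(["temp", "pressure", "feed"], vif([temp, pressure, feed])):
    print(f"{name:>8}: VIF = {v:.1f}")
```

Here `temp` and `pressure` receive large VIFs because each is nearly a linear function of the other, while `feed` stays close to 1. A common rule of thumb treats VIFs above about 10 as a sign of serious multicollinearity.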
In a real Dow scale-up example, we applied classical regression methods (Stepwise Regression, Partial Least Squares, and Generalized models) and a Symbolic Regression approach (an algorithm that searches over both linear and nonlinear models) to two critical performance metrics. Several important process factors were identified consistently across all methods for both metrics. In this presentation, we also demonstrate model validation on an independent test data set to compare the different models. Among all the models, Ridge regression had the highest prediction accuracy but also the greatest model complexity, because it includes all process factors and interactions. Models on the Pareto front of the Symbolic Regression approach were selected for comparison with the classical models on the test data.
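Ridge regression copes with correlated factors and a large number of terms by adding a penalty lam * ||beta||^2 to the least-squares loss, which on centered data gives beta = (Xc'Xc + lam*I)^-1 Xc'yc. Its accuracy is then judged on data that were not used for fitting. The sketch below is a minimal, dependency-free illustration under stated assumptions: the helper names, the unpenalized intercept, the pairwise-interaction expansion, and the toy data are hypothetical and not taken from the Dow study.

```python
import math
from itertools import combinations


def add_interactions(X):
    """Append every pairwise product x_i * x_j (i < j) to each row."""
    return [list(row) + [row[i] * row[j] for i, j in combinations(range(len(row)), 2)]
            for row in X]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger lam")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ridge(X, y, lam):
    """Ridge on centered data: beta = (Xc'Xc + lam*I)^-1 Xc'yc; intercept not penalized."""
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    yc = [v - ybar for v in y]
    gram = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
            for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    beta = solve(gram, xty)
    intercept = ybar - sum(b * mu for b, mu in zip(beta, xbar))
    return intercept, beta


def predict(model, X):
    intercept, beta = model
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Toy data (hypothetical): y = 1 + 2*x1 + 0.5*x2 exactly, split into train and test sets.
X_train = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y_train = [1 + 2 * a + 0.5 * b for a, b in X_train]
X_test = [[6, 2], [2, 6]]
y_test = [1 + 2 * a + 0.5 * b for a, b in X_test]

for lam in (0.0, 1.0, 10.0):
    model = fit_ridge(X_train, y_train, lam)
    print(f"lam={lam:>5}: test RMSE = {rmse(y_test, predict(model, X_test)):.4f}")
```

Passing `add_interactions(X_train)` and `add_interactions(X_test)` instead mimics, in miniature, a design that includes all factors and their interactions. In practice lam would be tuned (for example by cross-validation on the training data) before the model is scored once on the independent test set.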
From practical and statistical perspectives, we selected two models, each with relatively high prediction accuracy and low model complexity, for the two performance metrics. With this work, the business can streamline process optimization by using in-process, real-time measurements to estimate critical performance metrics. Successful implementation of these process models will accelerate the launch of the new products in the commercial manufacturing plant.
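Selecting models that balance test accuracy against complexity can be framed as a Pareto front. A model is kept only if no other model is at least as simple and at least as accurate while being strictly better on one of the two criteria. The sketch below assumes that complexity is counted as the number of model terms and uses placeholder test RMSE values, which are not results from this work. `simplest_within` is a hypothetical tie-breaking rule, only one of many ways to pick a single model from the front.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    complexity: int    # assumed measure: number of terms in the model
    test_rmse: float   # error on the independent test set


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    no_worse = a.complexity <= b.complexity and a.test_rmse <= b.test_rmse
    strictly_better = a.complexity < b.complexity or a.test_rmse < b.test_rmse
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Non-dominated candidates, ordered from simplest to most complex."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c.complexity)


def simplest_within(front: List[Candidate], tolerance: float) -> Candidate:
    """Hypothetical rule: simplest front model within `tolerance` (relative) of the best RMSE."""
    best = min(c.test_rmse for c in front)
    return next(c for c in front if c.test_rmse <= best * (1 + tolerance))


# Placeholder values for illustration only; these are not results from the study.
candidates = [
    Candidate("m1", 5, 0.44),
    Candidate("m2", 8, 0.42),
    Candidate("m3", 9, 0.38),
    Candidate("m4", 10, 0.47),
    Candidate("m5", 12, 0.45),
    Candidate("m6", 40, 0.35),
]

front = pareto_front(candidates)
print([c.name for c in front])                      # -> ['m1', 'm2', 'm3', 'm6']
print(simplest_within(front, tolerance=0.10).name)  # -> m3
```

Models `m4` and `m5` drop out because `m3` is both simpler and more accurate. A looser tolerance trades a little accuracy for a simpler model, and `tolerance=0.0` simply returns the most accurate model on the front.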