Monday, November 5, 2007

Multivariate Linear Regression As A Tool In Modeling The Kinetics Of Complex Chemical Reactions

Foteini Makrydaki, Chemical and Biological Engineering, Tufts University, Science and Technology Center, 4 Colby Street, Medford, MA 02155, and Christos Georgakis, Department of Chemical and Biological Engineering, Tufts University, Science and Technology Center, 4 Colby Street, Medford, MA 02155.

The goal of this research project is to systematically identify the structural mismatch between model and data, and to refine the kinetic forms iteratively until they are rich enough to minimize that mismatch. Complex reaction systems consist of a large number of species whose concentrations change over time under the influence of several simultaneously active reactions. Our methodology aims at a quantitative description of the kinetics using experimental data in which many, if not all, of the reactions are active. Such a model, together with a description of its accuracy, can then be used to define the operational space of the process. Pharmaceutical reactions are of particular interest.
In the first step of the methodology, Singular Value Decomposition (SVD) is applied to the changes in the amounts of the measured compounds, obtained from the concentration vs. time data. The number of singular values that are significantly different from zero indicates the number of linearly independent reactions. The second step identifies the possible stoichiometric models using Structured Target Factor Analysis (STFA) [1]. The third step, which is the focus of the present paper, identifies the kinetic model that best describes the data. For an accepted stoichiometric model, the SVD analysis provides the evolution of the reaction extents matrix, which is used to calculate the reaction rates at each time. Statistical analysis of the correlation structure between the reaction rates and the species concentrations then gives a strong indication of the most plausible kinetic forms for each reaction. In the last step, nonlinear regression is used to estimate accurately the parameters of each of the final candidate models. Comparing the accuracy of the competing models leads to the selection of the most appropriate one; if no model is clearly superior, a new set of experiments is designed to provide the additional data best suited to discriminating among the rival models.
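The SVD-based steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the A -> B -> C system, the rate constants, the relative tolerance `rel_tol`, and the function name `count_independent_reactions` are all hypothetical choices made for illustration. The data are noise-free, the stoichiometric matrix is assumed known, the extents are recovered with a plain pseudo-inverse, and the rates come from finite differences at unit volume.

```python
import numpy as np

def count_independent_reactions(delta_n, rel_tol=1e-8):
    """Count singular values of the mole-change matrix that are
    significantly different from zero (relative to the largest one).
    rel_tol is an illustrative threshold; with noisy measurements it
    would have to reflect the noise level instead."""
    s = np.linalg.svd(delta_n, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0])), s

# Hypothetical test system: A -> B -> C, first-order steps, A0 = 1.
N = np.array([[-1.0, 1.0, 0.0],    # reaction 1: A -> B
              [0.0, -1.0, 1.0]])   # reaction 2: B -> C
t = np.linspace(0.0, 5.0, 20)
k1, k2 = 1.0, 0.5                  # assumed rate constants
cA = np.exp(-k1 * t)
cB = k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
cC = 1.0 - cA - cB

# True extents of the two reactions and the resulting changes in amounts
# (rows: sampling times, columns: species A, B, C).
X = np.column_stack([1.0 - cA, cC])
delta_n = X @ N

n_rxn, s = count_independent_reactions(delta_n)
print("singular values:", s)
print("independent reactions:", n_rxn)

# For an accepted stoichiometry N, recover the extents by least squares
# (pseudo-inverse) and differentiate them numerically to obtain rates.
extents_est = delta_n @ np.linalg.pinv(N)
rates = np.gradient(extents_est, t, axis=0)
```

With real data the third singular value would not vanish exactly, so deciding what counts as "significantly different from zero" requires an estimate of the measurement noise.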
In this presentation, we address the use of Multivariate Linear Regression as a tool for assessing the suitability of a large set of kinetic forms for each of the reactions. The approach determines the best set of two, three, or more species to include in each reaction rate expression and quantifies the significance of each of them. Denominator or inhibition terms are also considered in an order-of-magnitude framework. Statistical criteria such as R², P-values, and Mallows' Cp are used to identify the most promising model(s). Two evolutionary methods are considered for selecting the most appropriate components of each kinetic form: Best Subsets Regression and Stepwise Regression. The methodology is applied to both simulated and experimental reaction engineering problems: the epoxidation of oleic acid and aspirin hydrolysis.
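As an illustration of Best Subsets Regression ranked by Mallows' Cp, the sketch below fits every subset of candidate species to a reaction rate. It rests on assumptions that do not come from the abstract: the rate is taken to be a power law, so that ln r is linear in the ln c_i, and the data are synthetic, with a hypothetical true rate r = 0.8 * cA * cB^2 and small multiplicative noise. The function name `best_subsets` and the species labels are likewise hypothetical.

```python
import itertools
import numpy as np

def best_subsets(X, y, names):
    """Fit ordinary least squares (with intercept) for every non-empty
    subset of the columns of X and rank the fits by Mallows' Cp."""
    n, m = X.shape

    def fit(cols):
        A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        return float(resid @ resid), beta

    sse_full, _ = fit(range(m))
    s2_full = sse_full / (n - (m + 1))      # error variance from full model
    sst = float(np.sum((y - y.mean()) ** 2))

    results = []
    for size in range(1, m + 1):
        for cols in itertools.combinations(range(m), size):
            sse, beta = fit(cols)
            p = size + 1                     # parameters incl. intercept
            results.append({
                "species": tuple(names[j] for j in cols),
                "r2": 1.0 - sse / sst,
                "cp": sse / s2_full - n + 2 * p,
                "coef": beta,                # [ln k, reaction orders...]
            })
    return sorted(results, key=lambda r: r["cp"])

# Synthetic data (assumption): four candidate species, two of which matter.
rng = np.random.default_rng(0)
names = ["A", "B", "C", "D"]
conc = rng.uniform(0.1, 2.0, size=(40, 4))
ln_rate = (np.log(0.8) + 1.0 * np.log(conc[:, 0]) + 2.0 * np.log(conc[:, 1])
           + 0.02 * rng.standard_normal(40))

ranked = best_subsets(np.log(conc), ln_rate, names)
for r in ranked[:3]:
    print(r["species"], "R2 = %.4f" % r["r2"], "Cp = %.2f" % r["cp"])
```

A subset whose Cp is close to its number of parameters p shows little bias relative to the full model. Stepwise Regression would search the same space greedily rather than exhaustively, which matters once the number of candidate terms makes full enumeration impractical.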

[1] Fotopoulos, J., Georgakis, C., and Stenger, H. G.: “Structured Target Factor Analysis for the Stoichiometric Modeling of Batch Reactors,” Proceedings of the 1994 American Control Conference, pp. 495-499, Baltimore, MD, June 29-July 1.