A Systematic Approach of Using Material Properties Data for Pharmaceutical Process Simulation
Jun Zhang1, Frances Pereira2, Ravendra Singh1, Sean Bermingham2, Fernando Muzzio1, Rohit Ramachandran1, Marianthi Ierapetritou1
1C-SOPS, Chemical and Biochemical Engineering, Rutgers University, Piscataway, NJ, USA
2Process Systems Enterprise Inc, London, UK
The pharmaceutical industry faces a “data-rich” world, because data are continuously generated by the many experiments carried out during process development. The rate of data generation has been anticipated to double every month, and it is still being accelerated by the development of rapid experimental techniques. The benefits of harnessing this accumulated data include narrowing the design space of the pharmaceutical process to be explored, reducing the number of experiments to be performed, and providing a clear picture of target product and process performance. However, less attention has been given to data utilization: a survey of the pharmaceutical industry indicated that less than 10% of the data are used. Data are used inefficiently because they lack systematic organization and because very little work has addressed information extraction from such data sets. To address this issue, a systematic approach has been developed that consists of material properties data representation, a search function, and multivariate data analysis.
The data representation systematically expresses material properties data, generated by different characterization devices, as a set of specifications, and these specifications are organized in an XML file (XML, the Extensible Markup Language, is defined by the W3C). Each data point of material properties is represented as an XML file whose unique filename serves as its ID. Within each XML file, the measurements of material properties are organized in a hierarchical structure in which each node represents a specification, i.e. the name of a specific measurement with its associated value.
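To make the representation above concrete, the following minimal sketch builds one such hierarchical record with Python's standard `xml.etree.ElementTree` module and reads a value back. The element names (`material`, `measurement_group`, `specification`), the attribute names, and all numerical values are illustrative assumptions; the abstract does not specify the actual schema.

```python
import xml.etree.ElementTree as ET


def build_record(material, category, measurements):
    """Build a hierarchical XML record for one material data point.

    `measurements` maps a group name to {specification name: (value, unit)}.
    The tag and attribute names are hypothetical, not the authors' schema.
    """
    root = ET.Element("material", name=material, category=category)
    for group_name, specs in measurements.items():
        group = ET.SubElement(root, "measurement_group", name=group_name)
        for spec_name, (value, unit) in specs.items():
            # Each node is one specification: a measurement name plus its value.
            spec = ET.SubElement(group, "specification", name=spec_name, unit=unit)
            spec.text = str(value)
    return root


# Placeholder values for illustration only, not measured data.
record = build_record(
    "Paracetamol",
    "API",
    {
        "flow": {"bulk_density": (0.35, "g/mL")},
        "particle_size": {"d50": (45.0, "um")},
    },
)

# Serialize to text (this is what would be written to the uniquely named file),
# then parse it back and look up one specification by its position in the tree.
xml_text = ET.tostring(record, encoding="unicode")
parsed = ET.fromstring(xml_text)
d50_node = parsed.find(
    "./measurement_group[@name='particle_size']/specification[@name='d50']"
)
d50 = float(d50_node.text)
```

In this sketch the file name would play the role of the record ID, so the ID is deliberately not stored inside the XML itself.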
The search function allows the user to retrieve the desired material properties data, and it consists of a user interface, an ontology base and a comparison algorithm. The user interface allows the user to define the specifications of the data to be retrieved, e.g. the API name, as well as the numerical criterion to be used for the search, e.g. ±10% of a specified viscosity. The ontology base consists of a set of ontologies that describe the relationships among the terms used in the material properties data, e.g. Paracetamol is a type of API. The ontologies establish logical links among the data so that more related data can be explored, which expands the search space and can help the user to understand the data further. The comparison algorithm compares the user’s specifications with each XML file, using the numerical criterion and the ontologies, and returns the data that are relevant to the user’s specifications.
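The interplay of the ontology base and the numerical criterion described above can be sketched as follows. This is only an assumed, simplified stand-in for the comparison algorithm: the ontology is reduced to a single "is a type of" dictionary, the records are plain dictionaries rather than parsed XML files, and the record IDs, materials other than Paracetamol, and viscosity values are hypothetical placeholders.

```python
# Hypothetical "is a type of" ontology: child term -> parent term.
IS_A = {
    "Paracetamol": "API",
    "Ibuprofen": "API",
    "Lactose": "Excipient",
}


def is_a(term, target):
    """Return True if `term` equals `target` or is a descendant of it."""
    while term is not None:
        if term == target:
            return True
        term = IS_A.get(term)  # climb one level; None ends the walk
    return False


def within_tolerance(value, target, rel_tol=0.10):
    """Numerical criterion: value lies within +/- rel_tol of target."""
    return abs(value - target) <= rel_tol * abs(target)


def search(records, material_type, prop, target, rel_tol=0.10):
    """Return the IDs of records matching the ontology term and the criterion."""
    return [
        rec["id"]
        for rec in records
        if is_a(rec["material"], material_type)
        and prop in rec["properties"]
        and within_tolerance(rec["properties"][prop], target, rel_tol)
    ]


# Placeholder records for illustration only, not measured data.
records = [
    {"id": "rec-001", "material": "Paracetamol", "properties": {"viscosity": 0.38}},
    {"id": "rec-002", "material": "Ibuprofen", "properties": {"viscosity": 0.50}},
    {"id": "rec-003", "material": "Lactose", "properties": {"viscosity": 0.41}},
]

# Query: any API whose viscosity is within +/-10% of 0.40.
hits = search(records, "API", "viscosity", 0.40)
```

Searching for the parent term `API` rather than a specific compound is what lets the ontology widen the result set to related materials.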
Based on the returned data, a partial least squares (PLS) regression algorithm is used to correlate the material properties and process parameters with the process output in order to generate a predictive model. Such a predictive model ensures that the information passed between each unit’s inputs and outputs is consistent, which greatly facilitates process simulation, especially simulation of a whole process composed of unit models whose inputs and outputs would otherwise be inconsistent.
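As an illustration of the regression step above, here is a minimal pure-Python sketch of single-response PLS (PLS1) using the NIPALS deflation scheme. It is not the authors' implementation: the function names are hypothetical, the data are synthetic and noise-free, and in practice an established routine such as scikit-learn's `PLSRegression` would normally be used instead. Each row of `X` is assumed to hold two material properties and one process parameter, and `y` is the process output.

```python
def pls1_fit(X, y, n_components):
    """Fit a PLS1 model by NIPALS; returns (x_mean, y_mean, components)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p_load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict y for one sample by repeating the deflation on that sample."""
    x_mean, y_mean, components = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, p_load, q in components:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p_load[j] for j in range(len(e))]
    return y_hat


# Synthetic, noise-free data: y = 2*x1 - x2 + 0.5*x3 + 1 (illustration only).
X = [[1, 2, 3], [2, 1, 0], [3, 4, 1], [4, 3, 5], [5, 6, 2], [6, 5, 4]]
y = [2 * a - b + 0.5 * c + 1 for a, b, c in X]

model = pls1_fit(X, y, n_components=3)
prediction = pls1_predict(model, [2.5, 3.0, 1.5])
```

With as many components as independent inputs, PLS1 coincides with ordinary least squares, so the synthetic linear relation is recovered; with real, collinear material-property data one would use fewer components, chosen by cross-validation.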
As a case study, a specific blending process consisting of feeding, co-milling and blending, whose unit models have inconsistent inputs and outputs, is used. gPROMS is selected as the simulation platform to demonstrate how this framework can be implemented and how the data can be used fully and flexibly in process simulation studies.
Zhang J, Hunter A, Zhou Y. A logic-reasoning based system to harness bioprocess experimental data and knowledge for design. Biochemical Engineering Journal, 2013, 74: 127-135.
Boukouvala F, Niotis V, Ramachandran R, Muzzio F, Ierapetritou M. An integrated approach for dynamic flowsheet modeling and sensitivity analysis of a continuous tablet manufacturing process. Computers & Chemical Engineering, 2012, 42: 30-47.