470591 Advanced Modeling of Tissue:Blood Partition Coefficients for Industrial Chemicals
Several approaches incorporating quantitative structure-activity relationships (QSARs) have been proposed for predicting partition coefficients for physiologically based biokinetic (PBBK) modeling, including (a) the Peyret, Poulin and Krishnan algorithm, which is based on the fractional content of cells and interstitial fluid in tissue, of plasma and erythrocytes in blood, and of tissue lipids, together with the lipophilicity of the compound of interest; (b) the molecular fractions algorithm proposed by Béliveau et al., which takes into account the frequency of occurrence of the molecular fragments of each compound; and (c) Abraham’s solvation equation for estimating biological properties, which takes into account the excess molar refraction, the compound dipolarity/polarizability, the solute effective or summation hydrogen-bond acidity, the solute effective or summation hydrogen-bond basicity and the McGowan characteristic volume, which can be calculated for any solute directly from its molecular structure.
The methodological approach presented in this study models tissue/blood partition coefficients for five main human tissues (muscle, kidney, adipose, liver, brain) using PaDEL Descriptor and QSARINS. PaDEL Descriptor is open-source software for the calculation of 1D, 2D and 3D molecular descriptors and fingerprints of chemical compounds. QSARINS is used for the development of QSAR models, with Multiple Linear Regression (MLR) by Ordinary Least Squares (OLS) as the modeling method and a Genetic Algorithm (GA) for descriptor selection. In QSARINS, models are analysed using tools such as Principal Component Analysis (PCA), fitting criteria, internal and external validation criteria and an applicability domain procedure. Users can browse through the different options to arrive at a robust and reliable model in line with the OECD principles.
The first step of QSAR modeling was the preparation of input data, which included the experimental values of the tissue/blood partition coefficients and the molecular descriptors of the corresponding chemical compounds. The dataset consisted of 33 environmental chemical compounds, which were split into a training set and a prediction set. The splitting was based on random selection through property sampling, performed after ordering the chemicals by descending experimental value. The PaDEL descriptors were then prereduced to remove the semi-constant and intercorrelated ones. A set of 435 descriptors for each chemical compound was used for the development and analysis of the QSAR models.
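The splitting step can be illustrated with a minimal sketch. It is not the QSARINS implementation: the function name, the block size of four and the fixed seed are assumptions made only for illustration. The chemicals are ordered by descending experimental value, and one chemical is drawn at random from each consecutive block, so that the prediction set spans the whole response range.

```python
import random


def split_by_ordered_response(values, block_size=4, seed=0):
    """Return (training_indices, prediction_indices).

    Hypothetical helper: orders compounds by descending response and
    draws one compound at random from every complete block.
    """
    rng = random.Random(seed)
    # Indices of the compounds, ordered by descending experimental value.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    train, pred = [], []
    for start in range(0, len(order), block_size):
        block = order[start:start + block_size]
        if len(block) < block_size:
            # An incomplete final block stays entirely in the training set.
            train.extend(block)
            continue
        chosen = rng.choice(block)
        pred.append(chosen)
        train.extend(i for i in block if i != chosen)
    return sorted(train), sorted(pred)
```

With 33 compounds and a block size of four, this sketch assigns 8 compounds to the prediction set and 25 to the training set; the actual proportions used in the study are not stated in the text.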
The next step was the dataset analysis and the development of QSAR models. The distribution of chemical compounds in the chemical space was explored using PCA. The score plot indicated that the molecules clustered by structure, and the loading plot showed which descriptors were most influential in categorizing the chemicals. As mentioned above, MLR by OLS was used for the development of the models. Variable selection was performed by a genetic algorithm, which searched for the best combination of variables for the derived models. A large number of models were developed and ranked according to their fitting performance. To evaluate the validity of the models, internal validation (the Leave One Out (LOO) and Leave Many Out (LMO) techniques and Y-scrambling) and external validation methods were applied. The selection of the best model for each tissue/blood partition coefficient was based on the Multi-Criteria Decision Making (MCDM) value, which summarizes the fitting, cross-validation and external validation criteria.
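As a rough illustration of the internal validation described above, the following sketch fits an MLR model by OLS and computes the fitting R2 and the leave-one-out Q2, defined here as 1 - PRESS/TSS. It is written in plain Python for self-containment; all function names are hypothetical and it does not reproduce QSARINS or the genetic-algorithm descriptor selection.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_fit(X, y):
    """Return [intercept, b1, ...] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def r2_fit(X, y):
    """Fitting R2 = 1 - RSS/TSS on the training set."""
    coef = ols_fit(X, y)
    mean_y = sum(y) / len(y)
    rss = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(X, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - rss / tss


def q2_loo(X, y):
    """Leave-one-out Q2 = 1 - PRESS/TSS; the model is refitted n times."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coef = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / tss
```

For an OLS model, Q2 from leave-one-out can never exceed the fitting R2, so a small gap between the two suggests a model that is not overfitted.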
The fitting performance (R²) of the selected models for predicting the muscle, kidney, adipose, liver and brain/blood partition coefficients was 0.92, 0.92, 0.97, 0.94 and 0.96, respectively. The LOO technique (Q²LOO) gave a predictive performance of 0.88, 0.90, 0.96, 0.92 and 0.94, while the LMO technique (Q²LMO) gave 0.86, 0.89, 0.95, 0.92 and 0.93, respectively. The external validation value was 0.56, 0.62, 0.98, 0.81 and 0.81 for the muscle, kidney, adipose, liver and brain/blood partition coefficients, respectively. The absence of chance correlation was confirmed by the low values obtained from the Y-scrambling method. The Root Mean Square Error (RMSE) ranged from 0.08 to 0.16 for the training set and from 0.18 to 0.27 for the prediction set. The Applicability Domain (AD) analysis showed that there were no outliers, supporting the reliability of each of the developed QSAR models.
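The idea behind the Y-scrambling check can be sketched as follows. To keep the example short it uses a single descriptor, for which the OLS R2 equals the squared Pearson correlation; this is a simplification of the multi-descriptor models above, and the function names, iteration count and seed are assumptions made only for illustration.

```python
import random


def r_squared(x, y):
    """R2 of a one-descriptor OLS fit (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_scrambling(x, y, n_iter=100, seed=0):
    """Return (R2 of the real model, mean R2 over scrambled responses)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_iter):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the descriptor-response pairing
        scores.append(r_squared(x, shuffled))
    return r_squared(x, y), sum(scores) / len(scores)
```

If the mean R2 of the scrambled models stays well below that of the real model, the real fit is unlikely to be the result of chance correlation.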
The proposed models for the estimation of tissue/blood partition coefficients were checked for their fitting, validity and applicability. They were found to be stable, reliable and capable of predicting the partition coefficients of “data poor” chemical compounds that fall within the applicability domain. The developed predictive models could serve as a tool to fill data gaps for environmental chemicals with unknown tissue/blood partitioning values. In this way, animal testing could be reduced and the wider use of PBBK models could be encouraged. In conclusion, the approach supports the “safe by design” concept for environmental chemicals by allowing toxicokinetic behavior to be predicted from molecular parameters, promoting green chemistry and reducing product development costs.