Investigators increasingly use quantitative structure-property relationship (QSPR) models to describe thermophysical properties; consequently, improved implementation strategies and guidelines for this modeling technique are needed.
A designed experiment is used to evaluate both linear and nonlinear QSPR methodologies, together with linear, nonlinear, and hybrid descriptor feature selection strategies, for two case studies involving melting point and acentric factor data. In both case studies, the same initial set of descriptors, generated from different software packages, is reduced to a final modeling set. The descriptors in these final sets are compared, and differences in the nonlinear QSPR model predictions obtained with these sets are calculated. The efficacy of several feature selection methods, including differential evolution, genetic programming, mutual information, and support vector machines, is compared. This information provides useful insight for both novel modeling efforts and improved predictive capabilities.