444698 Comparative Analysis of Molecular Structure Identifiability Based on Signatures and Descriptors

Monday, April 11, 2016
Exhibit Hall E (George R. Brown)
Zelimir Kurtanjek, Faculty of Food Technology and Biotechnology, University of Zagreb, Zagreb, Croatia

Chemoinformatics methodology and the use of large data sets of molecular information are becoming integrated into computer-aided chemical engineering process design software. The computer design and/or selection of molecules with target properties from QSAR models is seen as a large-scale combinatorial computational problem. Information on molecular structure, and inferences about its properties, rest mostly on two approaches: molecular structure coding based on graph theory (the extended valence of Faulon et al.[1]) and chemical molecular descriptors[2]. Software tools are available for the automatic calculation of chemoinformatic data, but the required inverse modelling, from target properties to molecular structures, is difficult and remains an open problem. Chemoinformatic mappings lack systematic formal mathematical properties: they are nonlinear, discontinuous and highly synergistic. Consequently, linear and nonlinear continuous models generalise poorly and are mostly limited to specific cases. Here, models based on decision trees/random forests are applied, and their accuracy is evaluated for inverse classification from chemoinformatic data to molecular structures. The test molecules are alkanes, alkenes, acetones, aromatics, organic acids and halogenated hydrocarbons in the range C1-C12, and a set of binary ionic liquids (cations: imidazolium, pyridinium, quinolinium, ammonium, phosphonium). The results indicate that molecular descriptors outperform the graph-based approach for predicting molecular properties. The graph-based extended valences, however, give higher accuracy for the inverse mapping.
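The following Python sketch illustrates what "identifiability" means for the inverse mapping. It compares a simple count descriptor with a graph-based code on two structural isomers, 2-methylpentane and 3-methylpentane. These two molecules are illustrative choices and are not taken from the test set. The graph code uses a Morgan-style iteration: the atom degree at height 0, then the sum of the neighbours' previous values at each further height. This iteration is assumed here as a simplified stand-in for the extended valence of Faulon et al. and is not their exact definition. The inversion is an exact lookup, not the decision-tree/random-forest models used in the study. The names `LIBRARY`, `extended_valence_code` and `inverse_map` are hypothetical.

```python
from collections import defaultdict

# Hydrogen-suppressed carbon skeletons as adjacency lists (atom index -> neighbours).
# Illustrative assumption: two C6 isomers, not molecules from the study's data set.
LIBRARY = {
    "2-methylpentane": {0: [1], 1: [0, 2, 5], 2: [1, 3], 3: [2, 4], 4: [3], 5: [1]},
    "3-methylpentane": {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]},
}


def extended_valences(graph, height):
    """Level 0 is the atom degree; each further level replaces an atom's value
    with the sum of its neighbours' previous values (Morgan-style iteration,
    an assumed stand-in for the extended valence of Faulon et al.)."""
    values = {atom: len(nbrs) for atom, nbrs in graph.items()}
    for _ in range(height):
        values = {atom: sum(values[n] for n in nbrs) for atom, nbrs in graph.items()}
    return values


def extended_valence_code(graph, max_height):
    """Molecule-level code: the sorted atom values at every height up to
    max_height. Sorting makes the code independent of atom numbering."""
    return tuple(
        tuple(sorted(extended_valences(graph, h).values(), reverse=True))
        for h in range(max_height + 1)
    )


def carbon_count(graph):
    """A simple count descriptor; identical for all structural isomers."""
    return len(graph)


def inverse_map(library, encode):
    """Invert an encoding by grouping molecule names under their code.
    A code identifies a structure only if exactly one name maps to it."""
    groups = defaultdict(list)
    for name, graph in library.items():
        groups[encode(graph)].append(name)
    return dict(groups)


if __name__ == "__main__":
    by_count = inverse_map(LIBRARY, carbon_count)
    by_degree = inverse_map(LIBRARY, lambda g: extended_valence_code(g, 0))
    by_height1 = inverse_map(LIBRARY, lambda g: extended_valence_code(g, 1))
    print(len(by_count), len(by_degree), len(by_height1))  # 1 1 2
```

Both isomers have six carbons and the same degree multiset (3, 2, 2, 1, 1, 1). As a result, the count descriptor and the height-0 code each collapse them into a single entry. At height 1 the codes become (5, 4, 3, 3, 3, 2) and (5, 4, 4, 3, 2, 2), so the inverse lookup becomes unique.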


