212124 Thermodynamic Data, Physicochemical Properties, and Molecular Information on More Than 2.5 Million Chemical Compounds

Tuesday, March 15, 2011: 4:30 PM
Water Tower (Hyatt Regency Chicago)
Yangsoo Kim (1), Won-chon Jung (1), Ae-ri Sung (1), Jeongjae Jeon (2), Yunkyung Kwon (2), Junhyuk Cho (2), O Hyoung Kwon (2), Yongkun Leem (2) and Tae-Yun Park (2); (1) Research and Development, Visual Simulation, Co. Ltd, Seoul, South Korea, (2) Research & Development, Visual Simulation, Co. Ltd, Seoul, South Korea

Scientists and engineers often face a shortage of thermodynamic data, physicochemical properties, and molecular information for chemical compounds. This is not surprising if one considers that over 50 million chemicals had been registered as of October 2010, while the number of chemicals for which thermodynamic or physicochemical data are available is only on the order of 10,000. Experimental measurements are expensive and time-consuming, and reliable calculation methods are rarely available.

We have developed computer modules that predict thermodynamic data and physicochemical properties from quantum mechanical calculations of reasonable accuracy and from more than 2,000 molecular descriptors. The predictions were carefully verified against millions of experimental data points collected over more than 3 years by more than 50 scientists and engineers. This verification confirmed that our computer modules can predict thermodynamic data and physicochemical properties at the accuracy level of experiments.

An automatic procedure covering every step from molecule generation to the prediction of data and information has been developed, and a computing center containing over 550 computers has been built to process a massive number of chemical compounds. Using this procedure and computing system, more than 2.5 million chemicals have been processed. The final data and information have been loaded into a database server. To search and browse the data for a target chemical, information browsing software has also been developed, which provides online access to the database server and online information retrieval.

Our database contains a total of 2,140 data and information items per molecule, consisting of 46 thermodynamic data and physicochemical properties, 3 spectra (IR, NMR, VCD), 69 items of quantum mechanical information, 2,004 molecular descriptors, and 18 drug-related properties. The 2.5 million chemical compounds include radicals; hydrocarbons; fuels such as gasoline, jet fuel, diesel, and biodiesel; compounds involved in commercial processes such as thermal cracking, combustion, reforming, and isomerization; drug-like molecules; and hetero-compounds containing oxygen or nitrogen.
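
As a quick consistency check, the per-molecule breakdown above can be summed to confirm the stated total of 2,140 items. The short Python sketch below does this and also gives a rough lower-bound estimate of the database size, assuming (this is our assumption, not a statement from the abstract) that every one of the 2.5 million molecules carries all 2,140 items. The dictionary keys are illustrative labels only.

```python
# Per-molecule item counts as stated in the abstract.
# The key names are illustrative labels, not identifiers from the actual database.
ITEMS_PER_MOLECULE = {
    "thermodynamic_and_physicochemical": 46,
    "spectra_ir_nmr_vcd": 3,
    "quantum_mechanical_information": 69,
    "molecular_descriptors": 2004,
    "drug_related_properties": 18,
}

total_items = sum(ITEMS_PER_MOLECULE.values())

# Rough size estimate: assumes every molecule has a complete set of items.
MOLECULES_PROCESSED = 2_500_000  # "more than 2.5 million" in the abstract
estimated_entries = total_items * MOLECULES_PROCESSED

print(f"Items per molecule: {total_items}")
print(f"Estimated entries (lower bound under the stated assumption): {estimated_entries:,}")
```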

Because of the limited quantum computation time, molecules containing atoms other than C, H, N, O, and S, and compounds with more than 25 carbon atoms, were not processed. They will be processed in the near future, after our current computing system is upgraded.
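
The scope restriction described above can be illustrated with a minimal Python sketch. This is not the authors' software: the function names, the constants, and the use of flat molecular formulas (such as `C8H18`, without parentheses or charges) are all assumptions made for illustration.

```python
import re

# Scope limits taken from the abstract: only C, H, N, O, S atoms, at most 25 carbons.
ALLOWED_ELEMENTS = {"C", "H", "N", "O", "S"}
MAX_CARBONS = 25


def parse_formula(formula):
    """Count atoms in a flat molecular formula such as 'C8H18' (hypothetical helper).

    Parentheses, charges, and isotopes are not handled.
    """
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def is_in_scope(formula):
    """Return True if the formula satisfies both limits stated in the abstract."""
    counts = parse_formula(formula)
    only_allowed_elements = set(counts) <= ALLOWED_ELEMENTS
    few_enough_carbons = counts.get("C", 0) <= MAX_CARBONS
    return only_allowed_elements and few_enough_carbons


for formula in ["C8H18", "C25H52", "C26H54", "CH3Cl"]:
    print(formula, is_in_scope(formula))
```

Under these assumptions, octane (`C8H18`) and a C25 alkane pass, while a C26 alkane and chloromethane (`CH3Cl`) are excluded.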


Session: Industry Needs and Goals for University Research
Group/Topical: Topical 5: University Research for Industry