467310 Machine Learning and Natural Language Processing for Pharmaceutical Product Engineering

Tuesday, November 15, 2016: 2:50 PM
Union Square 3 & 4 (Hilton San Francisco Union Square)
Miguel Francisco Remolona and Venkat Venkatasubramanian, Department of Chemical Engineering, Columbia University, New York, NY

Pharmaceutical product engineering is a “Big Data” discipline. It requires a detailed understanding of the drug chemistry, both during production and within the body, of the manufacturing processes and conditions, and of the drug's pharmacokinetics in the disease being treated, all of which are data intensive. A typical New Drug Application (NDA), for example, contains more than 100,000 pages of diverse information. In this talk, we present HOLMES, a framework for automatically extracting knowledge from primary sources related to pharmaceutical product engineering. The extracted information is stored in ontologies, which are computer-readable semantic knowledge representations used in artificial intelligence. We describe the Machine Learning (ML) algorithms and Natural Language Processing (NLP) techniques that HOLMES uses for Entity and Concept Recognition and for Relation Extraction. We also discuss our progress on three fronts: the creation of an entity-concept-and-relation databank (7,968 entities and concepts, 1,665 relations); the application of different ML algorithms to joint Entity and Concept detection; and the development of a relation clustering algorithm based on common feature sets.
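The abstract names a relation clustering algorithm "based on common feature sets" but does not specify it. As a hedged illustration only, the sketch assumes each extracted relation phrase is represented by a set of features (hypothetical ones here: part of speech, argument entity types, lemma) and groups relations whose feature sets overlap enough. It uses Jaccard similarity with a threshold and single-linkage merging via union-find. The feature names, the 0.5 threshold, and the toy relations are illustrative assumptions, not the authors' method or data.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_relations(features: dict[str, frozenset], threshold: float = 0.5) -> list[list[str]]:
    """Single-linkage clustering (assumed technique): relations whose feature
    sets have Jaccard similarity >= threshold end up in the same cluster."""
    parent = {r: r for r in features}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair of relations that share enough features.
    for r1, r2 in combinations(features, 2):
        if jaccard(features[r1], features[r2]) >= threshold:
            parent[find(r1)] = find(r2)

    # Collect members by root; sort for deterministic output.
    clusters: dict[str, list[str]] = {}
    for r in features:
        clusters.setdefault(find(r), []).append(r)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy relation phrases with hypothetical feature sets.
example_relations = {
    "inhibits": frozenset({"VERB", "drug->enzyme", "lemma:inhibit"}),
    "is an inhibitor of": frozenset({"NOUN", "drug->enzyme", "lemma:inhibit"}),
    "is dissolved in": frozenset({"VERB", "compound->solvent", "lemma:dissolve"}),
    "dissolves in": frozenset({"VERB", "compound->solvent", "lemma:dissolve"}),
}

if __name__ == "__main__":
    print(cluster_relations(example_relations))
```

With these toy inputs the two "inhibit" phrases share 2 of 4 features (similarity 0.5) and the two "dissolve" phrases are identical, so the sketch yields two clusters. Raising the threshold above 0.5 would split the "inhibit" pair. Single linkage is a simple choice here; an actual system might weight features or use a different clustering criterion.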

Session: Tools and Techniques for Product Design
Group/Topical: Process Development Division