
Application of text mining methods to extract comprehensive information about the biological activity of drugs: case-study for antiviral compounds

Authors:
Biziukova Nadezhda, Institute of Biomedical Chemistry, Moscow, Russia
Sobolev Boris, Institute of Biomedical Chemistry, Moscow, Russia
Karasev Dmitry, Institute of Biomedical Chemistry, Moscow, Russia
Ionov Nikita, Institute of Biomedical Chemistry, Moscow, Russia
Sukhachev Vladislav, Institute of Biomedical Chemistry, Moscow, Russia
Taktashov Rustam, Institute of Biomedical Chemistry, Moscow, Russia
Rudik Anastasia, Institute of Biomedical Chemistry, Moscow, Russia
Ivanov Sergey, Institute of Biomedical Chemistry, Moscow, Russia; Pirogov Russian National Research Medical University, Moscow, Russia
Tarasova Olga, Institute of Biomedical Chemistry, Moscow, Russia

Abstract ID: 403

Event: BGRS-abstracts

Sections: [Sym 3] Section “Pharmacology cheminformatics and chemical biology”

Studies in mathematical biology, bioinformatics, and computer-aided drug discovery require large amounts of data. Typically, researchers use biomedical databases for this purpose.
These databases are curated manually from primary sources, namely the texts of scientific publications. Text mining algorithms are being actively developed to obtain structured information from texts automatically, and they can support fast and accurate extraction of knowledge for further use, including expert analysis and the development of new hypotheses. The aim of our research is to develop an integrated approach for extracting comprehensive information on the biological activity of low molecular weight chemical compounds and to test it using antiviral drugs as a case study. We developed several approaches to recognize chemical and biological named entities (chemical compounds, proteins/genes, diseases, species, cell lines, miRNAs, and single nucleotide polymorphisms) using machine learning (including deep learning), regular expressions, and dictionaries. To find associations between the extracted objects, we developed a rule-based approach that not only indicates the existence of a relationship between objects but also characterizes it based on semantic features. We processed more than 150,000 abstracts of relevant publications and retrospectively verified the findings. The developed approach allows for rapid extraction of biomedical knowledge from the texts of scientific publications.
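
The following Python sketch illustrates, in a deliberately simplified form, how dictionary-based and regular expression-based entity recognition can be combined with a rule-based search for associations. It is not the authors' implementation: the dictionary entries, the SNP pattern, the trigger words, the relation labels, and the names find_entities and find_relations are all assumptions made for illustration, and the machine learning components mentioned above are not shown.

```python
import re
from dataclasses import dataclass

# Hypothetical toy dictionary; the actual vocabularies are not given in the abstract.
DICTIONARY = {
    "remdesivir": "CHEMICAL",
    "sofosbuvir": "CHEMICAL",
    "rna polymerase": "PROTEIN",
    "sars-cov-2": "SPECIES",
    "hepatitis c": "DISEASE",
}

# Assumed pattern for dbSNP-style identifiers ("rs" followed by digits).
SNP_PATTERN = re.compile(r"\brs\d+\b")

# Hypothetical trigger words that characterize a relationship.
TRIGGERS = {"inhibits": "INHIBITION", "treats": "TREATMENT"}


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int
    end: int


def find_entities(sentence):
    """Return dictionary and regex matches, sorted by position in the sentence."""
    entities = []
    for term, label in DICTIONARY.items():
        # Match the term as a whole token, ignoring case.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            entities.append(Entity(m.group(), label, m.start(), m.end()))
    for m in SNP_PATTERN.finditer(sentence):
        entities.append(Entity(m.group(), "SNP", m.start(), m.end()))
    return sorted(entities, key=lambda e: e.start)


def find_relations(sentence):
    """Toy rule: a trigger word between two adjacent entities links them."""
    entities = find_entities(sentence)
    relations = []
    for left, right in zip(entities, entities[1:]):
        between = sentence[left.end:right.start].lower()
        for trigger, relation in TRIGGERS.items():
            if re.search(r"\b" + re.escape(trigger) + r"\b", between):
                relations.append((left.text, relation, right.text))
    return relations


if __name__ == "__main__":
    example = "Remdesivir inhibits the RNA polymerase of SARS-CoV-2."
    for entity in find_entities(example):
        print(entity.label, entity.text)
    print(find_relations(example))
```

In this sketch the trigger word supplies the semantic label of the relationship, so the output states both that two entities are associated and how. A realistic rule set would also need to handle negation, passive constructions, and entities that are not adjacent in the sentence.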