
Application of text mining methods to extract comprehensive information about the biological activity of drugs: case-study for antiviral compounds
by Biziukova Nadezhda, Sobolev Boris, Karasev Dmitry, Ionov Nikita, Sukhachev Vladislav, Taktashov Rustam, Rudik Anastasia, Ivanov Sergey, Tarasova Olga | All authors: Institute of Biomedical Chemistry, Moscow, Russia | Ivanov Sergey also: Pirogov Russian National Research Medical University, Moscow, Russia
Abstract ID: 403
Event: BGRS-abstracts
Sections: [Sym 3] Section “Pharmacology cheminformatics and chemical biology”

Studies in mathematical biology, bioinformatics, and computer-aided drug discovery require large amounts of data. Typically, researchers use biomedical databases for this purpose.
These databases are curated manually from primary sources, namely the texts of scientific publications. Text mining algorithms are being actively developed to extract structured information from texts automatically; they can support fast and accurate knowledge extraction for further use, including expert analysis and the development of new hypotheses. The aim of our research is to develop an integrated approach for extracting comprehensive information on the biological activity of low-molecular-weight chemical compounds and to test it on antiviral drugs. We developed several approaches to recognizing chemical and biological named entities (chemical compounds, proteins/genes, diseases, species, cell lines, miRNAs, and single nucleotide polymorphisms) based on machine learning (including deep learning), regular expressions, and dictionaries. To find associations between the extracted entities, we developed a rule-based approach that not only indicates the existence of a relationship between entities but also characterizes it based on semantic features. We processed more than 150,000 abstracts of relevant publications and retrospectively verified the findings. The developed approach allows rapid extraction of biomedical knowledge from the texts of scientific publications.
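
As an illustration only, the dictionary-based and regular-expression-based recognition and the rule-based association step described above might be sketched as follows. This is a minimal toy example, not the authors' implementation: the dictionary entries, the SNP pattern, the trigger words, and the function names find_entities and find_relations are all hypothetical assumptions, and the machine learning component is omitted.

```python
import re

# Hypothetical toy dictionary: surface form -> entity type (not the authors' lexicon).
DICTIONARY = {
    "remdesivir": "CHEMICAL",
    "oseltamivir": "CHEMICAL",
    "neuraminidase": "PROTEIN",
    "influenza": "DISEASE",
}

# Hypothetical regular expression for SNP identifiers such as "rs12979860".
SNP_PATTERN = re.compile(r"\brs\d+\b")

# Hypothetical trigger words used to characterize a relationship.
TRIGGERS = {"inhibits": "INHIBITION", "activates": "ACTIVATION"}


def find_entities(sentence):
    """Return sorted (start, end, text, type) tuples from dictionary and regex lookup."""
    entities = []
    for term, label in DICTIONARY.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, re.IGNORECASE):
            entities.append((m.start(), m.end(), m.group(), label))
    for m in SNP_PATTERN.finditer(sentence):
        entities.append((m.start(), m.end(), m.group(), "SNP"))
    return sorted(entities)


def find_relations(sentence):
    """Link neighbouring entities when a trigger word occurs in the text between them."""
    entities = find_entities(sentence)
    relations = []
    for left, right in zip(entities, entities[1:]):
        between = sentence[left[1]:right[0]].lower()
        for trigger, relation_type in TRIGGERS.items():
            if re.search(r"\b" + re.escape(trigger) + r"\b", between):
                relations.append((left[2], relation_type, right[2]))
    return relations


if __name__ == "__main__":
    print(find_relations("Oseltamivir inhibits neuraminidase of influenza virus."))
```

In this sketch a relationship is reported only when a trigger word lies between two neighbouring entities in the same sentence; a real rule set would need to handle negation, passive constructions, and longer-range dependencies.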