Studies in mathematical biology, bioinformatics, and computer-aided drug discovery require large amounts of data. Typically, researchers use biomedical databases for this purpose. However, as a rule, these databases are enriched manually with information from primary sources, that is, the texts of scientific publications. Because this process is labor-intensive, text mining algorithms are being actively developed to extract structured information from texts automatically. The aim of our research is to develop an integrated approach for extracting comprehensive information on the biological activity of low-molecular-weight chemical compounds and to test it on antiviral drugs. We have developed several approaches to recognizing chemical and biological named entities (chemical compounds, proteins/genes, diseases, species, cell lines, miRNAs, and single nucleotide polymorphisms) based on machine learning (including deep learning), regular expressions, and dictionaries. To find associations between the extracted entities, we developed a rule-based approach that not only indicates the existence of a relationship between entities but also characterizes it based on semantic features. We processed more than 150,000 abstracts of relevant publications and retrospectively verified the findings.
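To make the dictionary-based, regular-expression-based, and rule-based steps more concrete, the following is a minimal sketch in Python. It is not the implementation used in this work: the dictionary entries, the patterns, the trigger stems, the relation labels, and the names `DICTIONARY`, `PATTERNS`, `RULES`, `find_entities`, and `find_relations` are all hypothetical and chosen only for illustration. The machine learning component is omitted.

```python
import re
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    text: str
    label: str
    start: int
    end: int


# Hypothetical toy dictionary: surface form -> entity type.
DICTIONARY = {
    "oseltamivir": "CHEMICAL",
    "neuraminidase": "PROTEIN",
    "influenza": "DISEASE",
}

# Illustrative patterns for entity types with regular naming schemes.
PATTERNS = {
    "SNP": re.compile(r"\brs\d+\b"),
    "MIRNA": re.compile(r"\b(?:hsa-)?miR-\d+[a-z]?(?:-[35]p)?\b"),
}

# Hypothetical rules: (left type, trigger stem, right type, relation label).
RULES = [
    ("CHEMICAL", "inhibit", "PROTEIN", "inhibits"),
    ("CHEMICAL", "treat", "DISEASE", "treats"),
]


def find_entities(text: str) -> List[Entity]:
    """Collect dictionary and regex matches, sorted by position."""
    entities = []
    for term, label in DICTIONARY.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            entities.append(Entity(m.group(), label, m.start(), m.end()))
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            entities.append(Entity(m.group(), label, m.start(), m.end()))
    return sorted(entities, key=lambda e: e.start)


def find_relations(text: str) -> List[Tuple[str, str, str]]:
    """Label an entity pair when a trigger stem occurs between the two mentions."""
    entities = find_entities(text)
    relations = []
    for i, left in enumerate(entities):
        for right in entities[i + 1:]:
            between = text[left.end:right.start].lower()
            for left_type, stem, right_type, relation in RULES:
                if (left.label, right.label) == (left_type, right_type) and stem in between:
                    relations.append((left.text, relation, right.text))
    return relations
```

Calling `find_relations` on a single sentence returns a list of `(entity, relation, entity)` triples. Typing each rule by the entity types on both sides is one simple way to attach a semantic label to a co-occurrence rather than only reporting that two entities appear together. A realistic system would also need sentence splitting, negation handling, and entity normalization, none of which are covered here.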