Studies in mathematical biology, bioinformatics, and computer-aided drug discovery require large amounts of data. Typically, researchers obtain such data from biomedical databases.
These databases are curated manually using information from primary sources, namely the texts of scientific publications. Text mining algorithms, which are being actively developed to extract structured information from texts automatically, can enable fast and accurate knowledge extraction for further use, including expert analysis and the formulation of new hypotheses. The aim of our research is to develop an integrated approach for extracting comprehensive information on the biological activity of low-molecular-weight chemical compounds and to test it using antiviral drugs as an example. We developed several approaches to recognizing chemical and biological named entities (chemical compounds, proteins/genes, diseases, species, cell lines, miRNAs, and single nucleotide polymorphisms) based on machine learning (including deep learning), regular expressions, and dictionaries. To find associations between the extracted entities, we developed a rule-based approach that not only indicates whether a relationship exists between entities but also characterizes it based on semantic features. We processed more than 150,000 abstracts of relevant publications and retrospectively verified the findings. The developed approach enables rapid extraction of biomedical knowledge from the texts of scientific publications.
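The abstract does not describe the implementation, so the following is only a minimal Python sketch of how dictionary- and regular-expression-based entity recognition could feed a single rule that both detects a relationship and assigns it a semantic type. The dictionaries, patterns, trigger words, and function names (`dictionary_entities`, `regex_entities`, `extract_relations`) are hypothetical placeholders, not the authors' vocabularies or rules, and the machine learning component is omitted entirely.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int
    end: int


# Hypothetical toy dictionaries; a real system would load curated vocabularies.
DICTIONARIES = {
    "CHEMICAL": ["oseltamivir", "remdesivir", "ribavirin"],
    "PROTEIN": ["neuraminidase", "RNA-dependent RNA polymerase"],
}

# Hypothetical regular expressions for entity types with a regular surface form.
PATTERNS = {
    "SNP": re.compile(r"\brs\d+\b"),
    "MIRNA": re.compile(r"\b(?:hsa-)?miR-\d+[a-z]?\b"),
}

# Hypothetical trigger words, each mapped to a semantic relation type.
TRIGGERS = {
    "inhibits": "inhibition",
    "blocks": "inhibition",
    "activates": "activation",
    "binds": "binding",
}


def dictionary_entities(text):
    """Case-insensitive dictionary lookup, matching whole words only."""
    found = []
    for label, terms in DICTIONARIES.items():
        for term in terms:
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for m in pattern.finditer(text):
                found.append(Entity(m.group(), label, m.start(), m.end()))
    return found


def regex_entities(text):
    """Apply each entity-type pattern and collect every match."""
    return [
        Entity(m.group(), label, m.start(), m.end())
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def extract_relations(sentence):
    """Toy rule: CHEMICAL ... trigger ... PROTEIN, in that order, in one sentence.

    Returns (chemical, relation_type, protein) triples, so the rule both
    detects a relationship and characterizes it by the trigger's meaning.
    """
    entities = dictionary_entities(sentence) + regex_entities(sentence)
    chemicals = [e for e in entities if e.label == "CHEMICAL"]
    proteins = [e for e in entities if e.label == "PROTEIN"]
    relations = []
    for word, relation_type in TRIGGERS.items():
        trigger_re = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for t in trigger_re.finditer(sentence):
            for c in chemicals:
                for p in proteins:
                    if c.end <= t.start() and t.end() <= p.start:
                        relations.append((c.text, relation_type, p.text))
    return relations


if __name__ == "__main__":
    print(extract_relations("Oseltamivir inhibits neuraminidase of influenza virus."))
```

A purely positional rule like this is easy to audit but misses passive constructions and negation; a production rule set would presumably also use syntactic or semantic features of the sentence.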