Microorganisms of natural and technogenic ecosystems represent an inexhaustible pool of metabolic pathways for the utilization of some compounds and the biosynthesis of others. Modern experimental technologies in genetics and microbiology enable an effective search of microbiological collections for new strains that are promising for biotechnological applications, as well as complete sequencing of their genomes. The practice of the world's major biotechnology companies over the past 30 years indicates that integrating experimental and computational approaches is critically important for constructing strains of microorganisms with target properties. Such approaches include multi-parameter profiling of microorganisms based on genomic, proteomic and other data, together with methods of bioinformatics and computational systems biology, which make it possible to reconstruct, from genomic information, the gene networks that control the production of target substances and to build mathematical models of their functioning.
This paper proposes a software pipeline for the assembly and annotation of sequenced genomes of microorganisms and for the reconstruction of their gene networks.
The developed pipeline was applied to data from the collection of microorganisms of the Kurchatov Genomic Center. To date, 2086 genomes have been processed. The average genome length is about 6,000,000 bp, and the average number of genes per genome is 5627. KEGG gene networks and metabolic pathways were reconstructed for all analyzed genomes.
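Collection-level figures such as the average genome length and the average gene count can be obtained from per-genome summaries. The following is a minimal sketch of that aggregation step, not the pipeline's actual code: the names `GenomeSummary`, `fasta_length` and `collection_stats` are hypothetical, and the sketch assumes that each assembly is available as FASTA text and that the gene count has already been produced by an annotation step.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GenomeSummary:
    """Per-genome record (hypothetical structure, for illustration only)."""
    strain_id: str
    length_bp: int
    gene_count: int


def fasta_length(fasta_text: str) -> int:
    """Return the total sequence length in bp, summed over all contigs."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")  # skip headers and blanks
    )


def collection_stats(genomes: list[GenomeSummary]) -> dict:
    """Aggregate per-genome summaries into collection-level averages."""
    if not genomes:
        raise ValueError("no genomes to summarize")
    return {
        "genomes": len(genomes),
        "mean_length_bp": mean(g.length_bp for g in genomes),
        "mean_gene_count": mean(g.gene_count for g in genomes),
    }


if __name__ == "__main__":
    # Toy input; real assemblies would be read from files.
    toy = [
        GenomeSummary("strain_A", fasta_length(">c1\nACGT\n"), gene_count=2),
        GenomeSummary("strain_B", fasta_length(">c1\nACG\n>c2\nTTA\n"), gene_count=4),
    ]
    print(collection_stats(toy))
```

Keeping the per-genome record separate from the aggregation means the same summaries could be reused for other collection-level statistics, for example the distribution of genome lengths.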