{"id":15662,"date":"2022-06-22T10:50:25","date_gmt":"2022-06-22T03:50:25","guid":{"rendered":"https:\/\/bgrssb.icgbio.ru\/2022\/?p=15662"},"modified":"2022-09-20T10:32:44","modified_gmt":"2022-09-20T03:32:44","slug":"pygenomics-python-package-for-processing-genomic-intervals-and-bioinformatic-data-formats","status":"publish","type":"post","link":"https:\/\/bgrssb.icgbio.ru\/2022\/ru\/2022\/06\/22\/pygenomics-python-package-for-processing-genomic-intervals-and-bioinformatic-data-formats\/","title":{"rendered":"Pygenomics: Python package for processing genomic intervals and bioinformatic data formats"},"content":{"rendered":"<p><em>by Gaik Tamazian | Nikolay Cherkasov | Alexander Kanapin | Anastasia Samsonova | Saint Petersburg State University | Saint Petersburg State University | Saint Petersburg State University | Saint Petersburg State University<\/em><\/p>\n<p><strong>Motivation<\/strong>: Computational analysis of genome sequencing data and its derivatives, such as assembled genome sequences, annotated genes, repeats, and genomic variants, plays an important role in modern bioinformatic studies. Such studies are usually implemented as computational pipelines that combine calls to bioinformatic programs (e.g., genome assemblers or gene prediction programs) with auxiliary routines that convert input and output files of these programs between bioinformatic data formats and query the resulting files. Many bioinformatic data formats are based on genomic intervals, so querying files in these formats requires operations on intervals.<br \/>\n<strong>Methods<\/strong>: We present pygenomics, an open-source Python package that provides routines for reading and writing bioinformatic data in various formats and for operating on genomic intervals. 
Pygenomics is implemented in pure Python and depends only on the Python standard library. The package follows the functional programming paradigm, which ensures immutability of the package entities, the absence of side effects in package functions other than those related to input\/output (I\/O), and extensible stream-based I\/O.<br \/>\n<strong>Results<\/strong>: Pygenomics supports reading and writing a number of bioinformatic data formats, including BAM, BED, GFF3, and VCF. The package provides an application programming interface (API) and a command-line interface (CLI) for calling its routines from source code or as stand-alone programs, respectively. Because pygenomics is implemented in pure Python, its routines can be seamlessly incorporated into Snakemake pipelines and run with both the CPython and PyPy interpreters. The absence of external dependencies, the pure-Python implementation, and the property-based testing framework facilitate deployment of pygenomics to various computational platforms.<\/p>\n<a href=\"https:\/\/bgrssb.icgbio.ru\/2022\/wp-content\/uploads\/sites\/3\/2022\/06\/pygenomics_presentation.pdf\" class=\"pdfemb-viewer\" style=\"\" data-width=\"max\" data-height=\"max\"  data-toolbar=\"bottom\" data-toolbar-fixed=\"off\">pygenomics_presentation<br\/><\/a>\n","protected":false},"excerpt":{"rendered":"<p>by Gaik Tamazian | Nikolay Cherkasov | Alexander Kanapin | Anastasia Samsonova | Saint Petersburg State University | Saint Petersburg State University | Saint Petersburg State University | Saint Petersburg State University Motivation: Computational analysis of genome sequencing data and its derivatives such as assembled genome sequences, annotated genes, repeats, and genomic variants plays an 
[&hellip;]<\/p>\n","protected":false},"author":3967,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[17],"tags":[120,121,122],"_links":{"self":[{"href":"https:\/\/bgrssb.icgbio.ru\/2022\/ru\/wp-json\/wp\/v2\/posts\/15662"}],"collection":[{"href":"https:\/\/bgrssb.icgbio.ru\/2022\/ru\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bgrssb.icgbio.ru\/2022\/ru\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bgrssb.icgbio.ru\/2022\/ru\/wp-json\/wp\/v2\/users\/3967"}],"replies":[{"embeddable":true,"href":"https:\/\/bgrssb.icgbio.ru\/2022\/ru\/wp-json\/wp\/v2\/comments?post=15662"}],"version-history":[{"count":1,"href":"https:\/\/bgrssb.icgbio.ru\/2022\/ru\/wp-json\/wp\/v2\/posts\/15662\/revisions"}],"predecessor-version":[{"id":15664,"href":"https:\/\/bgrssb.icgbio.ru\/2022\/ru\/wp-json\/wp\/v2\/posts\/15662\/revisions\/15664"}],"wp:attachment":[{"href":"https:\/\/bgrssb.icgbio.ru\/2022\/ru\/wp-json\/wp\/v2\/media?parent=15662"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bgrssb.icgbio.ru\/2022\/ru\/wp-json\/wp\/v2\/categories?post=15662"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bgrssb.icgbio.ru\/2022\/ru\/wp-json\/wp\/v2\/tags?post=15662"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
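The abstract describes three ideas: operations on genomic intervals, immutable package entities, and stream-based reading and writing of formats such as BED, all using only the Python standard library. The following is a minimal illustrative sketch of those ideas under stated assumptions. It does not use or reproduce the pygenomics API. The names `Interval`, `read_bed`, `overlaps`, `intersect`, and `write_bed` are hypothetical. Intervals follow the BED convention of 0-based, half-open coordinates.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, TextIO
import io


@dataclass(frozen=True)
class Interval:
    """Hypothetical immutable genomic interval: 0-based, half-open [start, end)."""
    chrom: str
    start: int
    end: int

    def __post_init__(self) -> None:
        if self.start < 0 or self.end < self.start:
            raise ValueError(f"invalid interval: {self.start}-{self.end}")


def read_bed(stream: TextIO) -> Iterator[Interval]:
    """Lazily yield intervals from the first three columns of a BED text stream."""
    for line in stream:
        line = line.rstrip("\n")
        # Skip blank lines and BED comment/header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        yield Interval(chrom, int(start), int(end))


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap if they share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return a new Interval covering the shared bases, or None; inputs are never mutated."""
    if not overlaps(a, b):
        return None
    return Interval(a.chrom, max(a.start, b.start), min(a.end, b.end))


def write_bed(intervals: Iterable[Interval], stream: TextIO) -> None:
    """Write intervals as three-column BED lines to any text stream."""
    for iv in intervals:
        stream.write(f"{iv.chrom}\t{iv.start}\t{iv.end}\n")


if __name__ == "__main__":
    bed = io.StringIO("track name=demo\nchr1\t100\t200\tgeneA\nchr1\t150\t300\tgeneB\n")
    a, b = read_bed(bed)
    out = io.StringIO()
    # filter(None, ...) drops a None result when the intervals do not overlap.
    write_bed(filter(None, [intersect(a, b)]), out)
    print(out.getvalue(), end="")  # prints chr1<TAB>150<TAB>200
```

Because `read_bed` and `write_bed` accept any text stream, the same functions work with files, `io.StringIO`, or `sys.stdin`. A frozen dataclass makes each interval immutable, so operations such as `intersect` return new objects instead of modifying their inputs.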