{"id":15730,"date":"2022-06-24T13:31:54","date_gmt":"2022-06-24T06:31:54","guid":{"rendered":"https:\/\/bgrssb.icgbio.ru\/2022\/?p=15730"},"modified":"2022-09-20T10:32:31","modified_gmt":"2022-09-20T03:32:31","slug":"less-is-more-filtering-and-trimming-ont-sequencing-data","status":"publish","type":"post","link":"https:\/\/bgrssb.icgbio.ru\/2022\/2022\/06\/24\/less-is-more-filtering-and-trimming-ont-sequencing-data\/","title":{"rendered":"Less is more: filtering and trimming ONT sequencing data"},"content":{"rendered":"<p><em>by Natalia Nenasheva | MIPT, VIGG, Genotek<\/em><\/p>\n<p>Since many tasks in bioinformatics require high-quality reads with unique genome mapping,<br \/>\nwe tested whether the overall ONT data quality can be improved by filtering out low-quality reads<br \/>\nor read fragments. To this end, we developed a set of criteria applied to the raw (FASTQ) base<br \/>\ncalling data. To further improve the potential of ONT data, we used a reference based on combined Illumina \u2013<br \/>\nONT sequencing rather than the standard database genome reference sequence.<br \/>\nWhen filtering the data, we took into account the leveling score normalized to the length of<br \/>\nthe read and the amount of fragmentation of the read, and we kept only data with a<br \/>\nsufficiently high quality score (Phred score > 10). The read fragmentation score served as<br \/>\nthe third filter. It has been shown that the most fragmented reads most often have a<br \/>\nlow quality score.<br \/>\nSince the detection of genetic variants requires alignment to the genome, we<br \/>\ntried to improve this step as well. For one of the samples, Illumina<br \/>\nsequencing data was obtained in addition to ONT data. 
We hypothesized that it would be possible to take<br \/>\nsingle-nucleotide polymorphisms into account and compile a reference sequence for these samples.<\/p>\n<a href=\"https:\/\/bgrssb.icgbio.ru\/2022\/wp-content\/uploads\/sites\/3\/2022\/06\/NNenasheva_poster.pdf\" class=\"pdfemb-viewer\" style=\"\" data-width=\"max\" data-height=\"max\"  data-toolbar=\"bottom\" data-toolbar-fixed=\"off\">NNenasheva_poster<br\/><\/a>\n","protected":false},"excerpt":{"rendered":"<p>by Natalia Nenasheva | MIPT, VIGG, Genotek Since many tasks in bioinformatics require high-quality reads with unique genome mapping, we tested whether the overall ONT data quality can be improved by filtering out low-quality reads or read fragments. To this end, we developed a set of criteria applied to the raw (FASTQ) base calling [&hellip;]<\/p>\n","protected":false},"author":3967,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[5],"tags":[196,195,197],"_links":{"self":[{"href":"https:\/\/bgrssb.icgbio.ru\/2022\/wp-json\/wp\/v2\/posts\/15730"}],"collection":[{"href":"https:\/\/bgrssb.icgbio.ru\/2022\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bgrssb.icgbio.ru\/2022\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bgrssb.icgbio.ru\/2022\/wp-json\/wp\/v2\/users\/3967"}],"replies":[{"embeddable":true,"href":"https:\/\/bgrssb.icgbio.ru\/2022\/wp-json\/wp\/v2\/comments?post=15730"}],"version-history":[{"count":1,"href":"https:\/\/bgrssb.icgbio.ru\/2022\/wp-json\/wp\/v2\/posts\/15730\/revisions"}],"predecessor-version":[{"id":15732,"href":"https:\/\/bgrssb.icgbio.ru\/2022\/wp-json\/wp\/v2\/posts\/15730\/revisions\/15732"}],"wp:attachment":[{"href":"https:\/\/bgrssb.icgbio.ru\/2022\/wp-json\/wp\/v2\/media?parent=15730"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bgrssb.icgbio.ru\/2022\/wp-json\/wp\/v2\/categories?post=15730"},{"taxonomy":"post_tag","embeddable":true,"h
ref":"https:\/\/bgrssb.icgbio.ru\/2022\/wp-json\/wp\/v2\/tags?post=15730"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}