Genotyping by sequencing (GBS) identifies genetic variability and genotypes samples more rapidly and more cost-effectively than whole-genome sequencing. GBS has proven reliable and flexible across a number of plant species and populations. The method has been applied to genetic mapping, molecular marker detection, genomic selection, genetic diversity studies, variety identification, conservation biology, and evolutionary ecology.
GBS significantly reduces both the cost and the time required to sequence the samples under study. As a result, the volume of sequencing data keeps growing, which creates a need for high-quality bioinformatics tools to analyze it.
In the present work, we developed GBS-DP, a bioinformatics pipeline for analyzing GBS-derived data. The pipeline is applicable to any species, is fully automated, and can process large data sets (more than 400 samples). It reports the basic characteristics of the sequenced libraries: read length for each library, average read depth, number of reads per library, and average coverage of each library and of all libraries combined. It also reports the polymorphisms found between the analyzed genotypes, the distribution of the identified SNPs across chromosomes, a principal component analysis of the genotypes based on the identified SNPs, and a phylogenetic tree.
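The per-library characteristics listed above can be illustrated with a minimal Python sketch. It is not the GBS-DP implementation: the function name `library_stats`, the `LibraryStats` record, and the toy data are hypothetical. It assumes an uncompressed FASTQ stream with exactly four lines per record, and it takes average coverage to be total sequenced bases divided by the reference length, which is only one possible definition (a GBS pipeline may instead compute coverage over the sites actually sequenced).

```python
import io
from dataclasses import dataclass


@dataclass
class LibraryStats:
    n_reads: int             # number of reads in the library
    mean_read_length: float  # average read length, in bases
    mean_coverage: float     # total bases / reference length (assumed definition)


def library_stats(fastq_handle, reference_length):
    """Summarize one library from a four-line-per-record FASTQ stream."""
    n_reads = 0
    total_bases = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # the second line of each record holds the sequence
            n_reads += 1
            total_bases += len(line.strip())
    mean_len = total_bases / n_reads if n_reads else 0.0
    return LibraryStats(n_reads, mean_len, total_bases / reference_length)


# Toy library: three reads against a hypothetical 100 bp reference.
toy_fastq = (
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\nACGTACGT\n+\nIIIIIIII\n"
    "@r3\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
)
stats = library_stats(io.StringIO(toy_fastq), reference_length=100)
print(stats)
```

For the toy input, the three reads contain 30 bases in total, so the sketch reports 3 reads, a mean read length of 10 bases, and a coverage of 0.3. Applying the same function to each library in turn, and summing bases across libraries, would give the per-library and combined figures.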