
Cost-effective estimation of optimal number of reads in GBS sequencing
by Zamalutdinov Aleksei, Boldyrev Stepan, Ben Cecile, Gentzbittel Laurent | Skolkovo Institute of Science and Technology, Moscow, Russia
Abstract ID: 558
Event: BGRS-abstracts
Sections: [Sym 6] Section “Genomics, genetics and systems biology of plants”

Genotyping-by-sequencing (GBS) is a cost-effective approach to large-scale genotyping that has been widely used in various species, particularly those with large genomes. To reduce genome complexity, GBS digests the genome with one or more restriction enzymes. These enzymes differ in their properties and therefore affect the size, number and genomic localisation of the resulting fragments. The choice of restriction enzymes for library preparation is thus a critical step in GBS. Several protocols and enzyme combinations have been evaluated to reduce costs and increase accuracy. However, the quality and cost of genotyping also depend on the number of reads: more reads are expected to yield more SNPs. Using a real soybean data set generated with the HindIII-NlaIII enzyme combination, we have shown that the number of reads affects both the final number of SNPs and their genomic localisation. Yet estimating the optimal number of reads by genotyping a test GBS library is either costly, when high coverage is used, or inaccurate, when few reads are used. An alternative, indirect approach is proposed, based on high-coverage genotyping-by-sequencing of a single sample without SNP calling. This approach makes it possible to define the range of read numbers over which the cost remains reasonable for the output obtained. We believe that it provides a simple way to estimate the appropriate number of reads per library and that it can be extended to other species and enzyme combinations.
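The abstract does not describe how the read range is derived from the single high-coverage sample. One common way to reason about such a question, assumed here purely for illustration, is a saturation (rarefaction) curve: subsample the reads at increasing depths, count how many GBS loci are recovered at each depth, and stop where extra reads add few new loci. The sketch below assumes each read has already been assigned to a locus identifier (for example, a unique tag sequence or a mapped restriction fragment), so no SNP calling is involved. `saturation_curve`, `pick_depth`, `min_cov` and `min_gain_per_read` are hypothetical names, and the gain threshold is only a stand-in for whatever cost criterion one actually adopts.

```python
import random
from collections import Counter
from typing import Sequence


def saturation_curve(read_loci: Sequence[str], depths: Sequence[int],
                     min_cov: int = 1, seed: int = 0) -> list[tuple[int, int]]:
    """Nested rarefaction (illustrative assumption, not the authors' method):
    shuffle reads once, then for each depth d count the loci that reach
    `min_cov` reads within the first d reads."""
    if min_cov < 1:
        raise ValueError("min_cov must be >= 1")
    shuffled = list(read_loci)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    counts: Counter = Counter()
    covered = 0  # loci with at least min_cov reads so far
    pos = 0
    curve = []
    for d in sorted(set(depths)):  # unique, increasing depths
        if d > len(shuffled):
            raise ValueError(f"depth {d} exceeds the {len(shuffled)} available reads")
        while pos < d:  # extend the subsample incrementally (nested subsets)
            locus = shuffled[pos]
            counts[locus] += 1
            if counts[locus] == min_cov:  # locus just crossed the threshold
                covered += 1
            pos += 1
        curve.append((d, covered))
    return curve


def pick_depth(curve: Sequence[tuple[int, int]], min_gain_per_read: float) -> int:
    """Return the first depth beyond which extra reads add fewer than
    `min_gain_per_read` new loci per read (a hypothetical cost cut-off);
    return the deepest point if the curve never flattens that much."""
    if not curve:
        raise ValueError("empty curve")
    for (d0, n0), (d1, n1) in zip(curve, curve[1:]):
        if (n1 - n0) / (d1 - d0) < min_gain_per_read:
            return d0
    return curve[-1][0]
```

Shuffling once and taking growing prefixes keeps the curve monotone, which makes the flattening point easier to read than with independent subsamples at each depth. In practice, `read_loci` would come from the high-coverage sample, `depths` would span the candidate read numbers per library, and raising `min_cov` would approximate the coverage needed before a locus becomes usable for genotyping.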