
Toward improved calling of allele-specific events in high-throughput sequencing data with a mixture of compound Dirichlet negative multinomial distributions
by Georgy Meshcheryakov (Institute of Protein Research, RAS) | Ivan Vladimirovich Kulakovskiy (Institute of Protein Research, RAS)
Abstract ID: 531
Event: BGRS-abstracts
Sections: [Sym 12] Section “Systems theory, big biological data analysis, ontologies and artificial intelligence”

Heterozygous sites in gene regulatory regions may have allele-dependent chromatin states or protein affinity. High-throughput sequencing efficiently reveals nucleotide variants that distinguish the homologous chromosomes of the human genome. An imbalance between the numbers of reads supporting the two alleles at individual heterozygous sites can reveal functional consequences of nucleotide substitutions in regulatory regions. Thus, allele-specific gene regulation can be studied using data from assays focused on regulatory regions of the genome, such as ChIP-Seq for transcription factor binding, or DNase-Seq and ATAC-Seq for chromatin accessibility. Quantitative evaluation of allelic imbalance requires a proper statistical framework that accounts for noise, technical variability, and biases in the underlying read count distributions. Here we present a novel approach that models the read counts of the two alleles jointly with bivariate compound Dirichlet negative multinomial distributions, in order to improve accuracy and better reflect the underlying nature of high-throughput sequencing experiments.
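To make the model family concrete, below is a minimal sketch of the log-probability of a pair of allele read counts under a Dirichlet negative multinomial (DNM) distribution, plus a finite mixture of such components. It assumes the textbook DNM parameterisation: a negative multinomial with stopping parameter r whose outcome probabilities (p_0, p_1, p_2) are drawn from a Dirichlet with weights alpha0 (the "failure" outcome) and alpha = (alpha_1, alpha_2) for the two counted alleles. The function names `dnm_logpmf` and `mixture_logpmf` are hypothetical; this is not the MIXALIME implementation, and details such as the exact parameterisation, handling of low-coverage sites, and how mixture parameters are fitted are not given in the abstract.

```python
import math
from typing import Sequence


def dnm_logpmf(x: Sequence[int], r: float, alpha0: float,
               alpha: Sequence[float]) -> float:
    """Log-probability of counts x = (x_1, ..., x_m) under a Dirichlet negative
    multinomial distribution (textbook parameterisation, assumed here).

    r      -- stopping parameter of the negative multinomial (r > 0)
    alpha0 -- Dirichlet weight of the 'failure' outcome p_0 (alpha0 > 0)
    alpha  -- Dirichlet weights of the m counted outcomes; for allelic read
              counts m = 2, one weight per allele (an illustrative assumption)
    """
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    total = sum(x)
    a_sum = alpha0 + sum(alpha)
    # Negative multinomial coefficient: Gamma(r + X) / (Gamma(r) * prod x_i!)
    logp = math.lgamma(r + total) - math.lgamma(r)
    logp -= sum(math.lgamma(k + 1) for k in x)
    # Terms left after integrating the outcome probabilities against the Dirichlet
    logp += math.lgamma(a_sum) - math.lgamma(a_sum + r + total)
    logp += math.lgamma(alpha0 + r) - math.lgamma(alpha0)
    logp += sum(math.lgamma(k + a) - math.lgamma(a) for k, a in zip(x, alpha))
    return logp


def mixture_logpmf(x: Sequence[int], weights: Sequence[float],
                   components: Sequence[tuple]) -> float:
    """Log-probability under a finite mixture of DNM components.

    weights    -- mixture weights, assumed positive and summing to one
    components -- one (r, alpha0, alpha) tuple per component
    Uses log-sum-exp to avoid underflow for large read counts.
    """
    logs = [math.log(w) + dnm_logpmf(x, *params)
            for w, params in zip(weights, components)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))
```

For instance, `dnm_logpmf([12, 4], 8.0, 30.0, [5.0, 5.0])` scores 12 and 4 reads for the two alleles under one hypothetical parameter set; in a mixture, each component carries its own r, alpha0 and alpha, so sites with different overdispersion or coverage regimes can be described by different components.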

The approach is implemented in an updated version of the MIXALIME software.