Heterozygous sites in gene regulatory regions may differ between alleles in chromatin state or protein-binding affinity. High-throughput sequencing efficiently reveals nucleotide variants on homologous chromosomes of the human genome. An allelic imbalance in the read counts supporting the alternative alleles at an individual site points to functional consequences of nucleotide substitutions in regulatory regions. Thus, allele-specific gene regulation can be studied using data from assays that target regulatory regions of the genome, such as ChIP-Seq for transcription factor binding, or DNase-Seq and ATAC-Seq for chromatin accessibility. Quantitative evaluation of allelic imbalance requires a proper statistical framework that accounts for noise, technical variability, and biases in the underlying read count distributions. Here we present a novel approach utilizing joint bivariate Dirichlet negative multinomial compound distributions to improve accuracy and better reflect the underlying nature of high-throughput sequencing experiments.
The approach is implemented as an updated version of the MIXALIME software.
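
For orientation only, the notion of allelic imbalance at a single heterozygous site can be illustrated with the simplest possible baseline, an exact two-sided binomial test on the reference and alternative read counts. This is a minimal sketch and not the Dirichlet negative multinomial model described above, nor the MIXALIME interface: the function name `binomial_two_sided_pvalue` and the null reference-allele fraction `p_ref = 0.5` are assumptions made for illustration, and the test ignores the overdispersion and mapping biases that the proposed framework is designed to handle.

```python
from math import comb


def binomial_two_sided_pvalue(ref_count: int, alt_count: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance at one site.

    Hypothetical helper for illustration; a naive baseline that assumes
    reads are independent draws with a fixed reference-allele fraction.
    """
    n = ref_count + alt_count
    if n == 0:
        # No reads cover the site, so there is no evidence of imbalance.
        return 1.0
    # Null probability of each possible reference-allele read count 0..n.
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_count]
    # Sum the probabilities of all outcomes no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # A balanced site and a strongly imbalanced site with the same coverage.
    print(binomial_two_sided_pvalue(5, 5))
    print(binomial_two_sided_pvalue(10, 0))
```

Real read counts are typically more dispersed than a binomial model allows, so a baseline of this kind tends to overstate significance; that gap is the motivation for compound distributions such as the one proposed here.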