Mixture models are used to model heterogeneous data formed by sampling from multiple known distributions. The exact proportion of the data belonging to each of the specified distributions is usually unknown and is a parameter to be estimated. In the Bayesian treatment of the problem, the mixture weights are assumed to follow a prior distribution, typically a Beta (or, in the multivariate case, Dirichlet) distribution. The posterior weight distribution is then readily inferred for each data point, and its maximum-a-posteriori value is usually interpreted as the probability that the data point belongs to a particular mixture component. In some applications, however, it is known that subsets of data points jointly originate from the same mixture component. The posterior distribution then takes a seemingly intractable form that is difficult to compute. This problem arises, for instance, when dealing with unphased data in high-throughput sequencing experiments, where multiple samples/observations of the same SNP share the same unknown phase. The number of SNPs can be in the tens of thousands and the average number of observations per SNP can be as high as a hundred, making the application of stochastic algorithms such as MCMC infeasible. Here, we derive an analytical form for the posterior weight distribution of such a mixture model and provide an algorithm to compute it in O(n^2), making its computation possible in most practical cases.
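To make the grouped-assignment setting concrete, the following is an illustrative sketch (not taken from the paper) under the simplest assumptions: two mixture components with densities f_1 and f_2, a Beta(alpha, beta) prior on the weight w, and n groups of observations that each share one latent component. In LaTeX notation,

\begin{align*}
p(w \mid X)
  &\propto w^{\alpha-1}(1-w)^{\beta-1}
     \prod_{g=1}^{n}
     \Bigl( w \prod_{i \in g} f_1(x_{g i})
           + (1-w) \prod_{i \in g} f_2(x_{g i}) \Bigr) \\
  &= w^{\alpha-1}(1-w)^{\beta-1}
     \sum_{k=0}^{n} c_k\, w^{k}(1-w)^{\,n-k}.
\end{align*}

Under these assumptions the posterior is a finite mixture of Beta(alpha + k, beta + n - k) densities, and the coefficients c_k can be obtained by multiplying in one linear factor per group, an O(n^2) convolution; the analytical form and algorithm derived in the paper may differ in their details.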