
Machine learning for rational design and reliable prediction of activity of gene regulatory regions

Authors:
Penzar Dmitry, AIRI

Abstract ID: 691

Event: BGRS-abstracts

Sections: [Sym 1] Section “Regulatory genomics”

In previous studies, we showed how to adapt the EfficientNetV2 architecture, originally developed for image classification, to the analysis of nucleotide sequences. The resulting model, LegNet, outperformed multiple competing solutions, including those based on recurrent and attention-based networks, at predicting reporter protein expression in yeast cells solely from the promoter sequence.
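As an illustration of how a convolutional architecture designed for images can take a nucleotide sequence as input, the sketch below one-hot encodes a DNA string into a 4 x L matrix that plays the role of a one-dimensional, four-channel image. This is a common convention in sequence modelling and is only assumed here; the abstract does not specify the exact input encoding used by LegNet, and the names `ALPHABET` and `one_hot_encode` are hypothetical.

```python
# Illustrative sketch only. The channel order A, C, G, T and the handling of
# unknown bases are assumptions, not details taken from the abstract.
ALPHABET = "ACGT"


def one_hot_encode(seq):
    """Return a 4 x len(seq) matrix as a list of rows, one row per base channel.

    Bases outside ALPHABET (for example N) become all-zero columns.
    """
    seq = seq.upper()
    return [[1.0 if base == channel else 0.0 for base in seq]
            for channel in ALPHABET]


encoded = one_hot_encode("ACGTN")
for channel, row in zip(ALPHABET, encoded):
    print(channel, row)
```

Each row corresponds to one nucleotide channel, so a 1D convolution over the sequence axis sees the data in the same way a 2D convolution sees the colour channels of an image.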

Here we show that LegNet achieves state-of-the-art (SOTA) results in three tasks: predicting expression from yeast promoter sequences, estimating the activity of Drosophila regulatory elements, and estimating the activity of human enhancers. For human gene regulatory regions, the model outperforms Enformer, which was pre-trained on a large number of experiments from the ENCODE database. Furthermore, models trained on regulatory regions of the human genome are capable of predicting allele-specific events based on data from ADASTRA [8] and UDACHA.
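One common way to use a sequence-to-activity model for allele-specific prediction is to score the reference and alternative alleles separately and compare the two predictions. The sketch below shows that idea under stated assumptions: the abstract does not describe the actual scoring procedure, `toy_predict` is a placeholder (GC fraction) standing in for a trained model, and all function names are hypothetical.

```python
# Illustrative sketch only. toy_predict is NOT a trained model; it is a
# stand-in so the example runs without any external dependency.

def substitute(seq, pos, alt):
    """Return seq with the base at 0-based position pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("variant position is outside the sequence")
    return seq[:pos] + alt + seq[pos + 1:]


def allelic_effect(predict, ref_seq, pos, alt):
    """Predicted activity of the alternative allele minus the reference."""
    return predict(substitute(ref_seq, pos, alt)) - predict(ref_seq)


def toy_predict(seq):
    """Placeholder activity score: fraction of G and C bases (assumption)."""
    return sum(base in "GC" for base in seq) / len(seq)


effect = allelic_effect(toy_predict, "ACGTACGT", 0, "G")
print(effect)
```

A positive value means the alternative allele is predicted to be more active than the reference under the supplied model, and a negative value means the opposite.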

We have also proposed and tested a diffusion-based generative model for designing promoters with a desired level of expression, the first of its kind that is not limited to handling categorical variables only.