Gene expression is a complex process controlled by diverse regulatory mechanisms. The advent and rapid development of massively parallel reporter assays made it possible to profile the regulatory effects of thousands of untranslated regions (UTRs) and, with the help of deep learning, to identify and model their sequence-level regulatory grammar. Yet existing studies do not adequately capture or predict the UTR-dependent patterns of cell type-specific mRNA translation and stability.
In this work, we present PARADE, a deep learning framework for estimating the cell type-specific activity of 5' and 3' UTRs and for the rational design of UTR sequences constrained by desired expression patterns.
To obtain training data, we performed a massively parallel reporter assay (MPRA) covering 20k 5' UTR and 20k 3' UTR sequences in five distinct cell types. We then trained a neural network to regress the normalized reporter fluorescence of each UTR sequence in each of the five cell types.
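The regression step can be illustrated with a minimal sketch. It assumes one-hot encoded sequences and, for brevity, replaces the neural network with a linear multi-output model fitted by stochastic gradient descent, with one output per cell type. The function names (`one_hot`, `predict`, `fit`, `mse`), the 10-nt toy sequences, and the synthetic targets are hypothetical and do not reflect the PARADE architecture or the MPRA data.

```python
import random

BASES = "ACGT"
N_CELL_TYPES = 5  # one regression output per cell type


def one_hot(seq):
    """Encode a sequence as a flat 4*L one-hot vector (U is read as T)."""
    vec = []
    for base in seq.upper().replace("U", "T"):
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def predict(weights, bias, x):
    """Return one predicted activity value per cell type."""
    return [sum(w * xi for w, xi in zip(weights[c], x)) + bias[c]
            for c in range(N_CELL_TYPES)]


def fit(seqs, targets, epochs=50, lr=0.01, seed=0):
    """Fit a linear multi-output model by SGD on the squared error."""
    rng = random.Random(seed)
    data = [(one_hot(s), t) for s, t in zip(seqs, targets)]
    dim = len(data[0][0])
    weights = [[0.0] * dim for _ in range(N_CELL_TYPES)]
    bias = [0.0] * N_CELL_TYPES
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            pred = predict(weights, bias, x)
            for c in range(N_CELL_TYPES):
                err = pred[c] - y[c]  # d(0.5 * err**2) / d(pred)
                bias[c] -= lr * err
                for i, xi in enumerate(x):
                    if xi:  # only active one-hot positions have nonzero gradient
                        weights[c][i] -= lr * err * xi
    return weights, bias


def mse(weights, bias, seqs, targets):
    """Mean squared error over all sequences and cell types."""
    total = 0.0
    for s, t in zip(seqs, targets):
        pred = predict(weights, bias, one_hot(s))
        total += sum((p - y) ** 2 for p, y in zip(pred, t))
    return total / (len(seqs) * N_CELL_TYPES)


# Toy data generated from a hidden linear rule; NOT real MPRA measurements.
rng = random.Random(1)
seqs = ["".join(rng.choice(BASES) for _ in range(10)) for _ in range(100)]
hidden = [[rng.gauss(0, 1) for _ in range(40)] for _ in range(N_CELL_TYPES)]
targets = [[sum(w * x for w, x in zip(hidden[c], one_hot(s)))
            for c in range(N_CELL_TYPES)] for s in seqs]

weights, bias = fit(seqs, targets)
zero_w = [[0.0] * 40 for _ in range(N_CELL_TYPES)]
print("baseline MSE:", round(mse(zero_w, [0.0] * N_CELL_TYPES, seqs, targets), 3))
print("fitted MSE:", round(mse(weights, bias, seqs, targets), 3))
```

A real model would swap the linear layer for convolutional or attention layers, but the multi-output structure, a shared sequence encoding mapped to one activity value per cell type, stays the same.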
To generate new sequences, PARADE employs three methods: fine-grained filtering of random sequences, a genetic algorithm, and a diffusion-denoising neural network. The performance of each method was then evaluated and validated experimentally.
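Two of these design strategies can be sketched around any sequence-to-activity predictor. The example below is a hypothetical illustration, not the PARADE implementation: a fixed random position weight matrix stands in for the trained network, the 30-nt length and target profile are invented, and `filter_random` and `genetic_design` (truncation selection, single-point crossover, point mutation, elitism) are names chosen here for illustration. The diffusion-based generator is not covered.

```python
import random

BASES = "ACGT"
LENGTH = 30        # hypothetical UTR length for the toy example
N_CELL_TYPES = 5

# Hypothetical stand-in for the trained predictor: a fixed random
# position weight matrix per cell type.
_wrng = random.Random(42)
TOY_WEIGHTS = [[[_wrng.uniform(-1, 1) for _ in BASES] for _ in range(LENGTH)]
               for _ in range(N_CELL_TYPES)]


def predict_profile(seq):
    """Toy per-cell-type activity: mean position weight of the chosen bases."""
    idx = [BASES.index(b) for b in seq]
    return [sum(w[i][j] for i, j in enumerate(idx)) / LENGTH for w in TOY_WEIGHTS]


def loss(seq, target):
    """Squared distance between predicted and desired activity profiles."""
    return sum((p - t) ** 2 for p, t in zip(predict_profile(seq), target))


def random_seq(rng):
    return "".join(rng.choice(BASES) for _ in range(LENGTH))


def filter_random(target, n_samples, keep, rng):
    """Score many random sequences and keep the `keep` closest to the target."""
    pool = [random_seq(rng) for _ in range(n_samples)]
    return sorted(pool, key=lambda s: loss(s, target))[:keep]


def mutate(seq, rate, rng):
    """Point mutation: resample each base with probability `rate`."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in seq)


def crossover(a, b, rng):
    cut = rng.randrange(1, LENGTH)  # single-point crossover
    return a[:cut] + b[cut:]


def genetic_design(target, rng, pop_size=50, generations=40, elite=5, rate=0.03):
    """Evolve sequences toward `target`; return the best one and its loss history."""
    pop = [random_seq(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda s: loss(s, target))
        history.append(loss(pop[0], target))
        parents = pop[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - elite)
        ]
        pop = pop[:elite] + children  # elites survive unchanged
    pop.sort(key=lambda s: loss(s, target))
    history.append(loss(pop[0], target))
    return pop[0], history


rng = random.Random(0)
target = [0.3, -0.2, -0.2, -0.2, -0.2]  # hypothetical: active in the first cell type only
best_filtered = filter_random(target, n_samples=2000, keep=1, rng=rng)[0]
best_ga, history = genetic_design(target, rng)
print("filtering loss:", round(loss(best_filtered, target), 4))
print("GA loss:", round(history[-1], 4))
```

Because the top `elite` sequences are copied into each new generation unchanged, the best loss recorded in `history` can never increase, whereas random filtering only improves by drawing more samples.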