
Motif discovery and benchmarking from diverse assays highlight the best tools for predicting DNA-binding specificity of human transcription factors
by GRECO-BIT/Codebook Consortium | VIGG, MSU, IPR, Altius, UToronto, EPFL, SIB, IOCB, Sirius Univ., MPIB, MLU Halle, UBC
Abstract ID: 527
Event: BGRS-abstracts
Sections: [Sym 1] Section “Regulatory genomics”

Computational methods for discovering DNA motifs specifically bound by proteins have been developed for over three decades, stimulated by continuing progress in experimental methods for profiling DNA-protein interactions. Yet, despite the availability of many advanced motif models, the position weight matrix, the traditional representation of DNA binding specificity, remains the most common approach. As the GRECO-BIT/Codebook initiative, we used new data from several thousand experiments on several hundred less-studied human transcription factors (TFs) to derive binding motifs with diverse DNA motif discovery tools, ranging from multiple sequence aligners to deep learning methods. We performed an extensive benchmarking study that highlights the best-performing tools, and we assembled a rich catalog of motifs, many of which belong to previously unexplored transcription factors. Next, we used a carefully selected subset of forty TFs to call on the wisdom of crowds: we organized an open competition to construct the best sequence-level models of transcription factor binding sites (TFBS) from these as yet unpublished data. The challenge in Inferring Binding Specificities (IBIS) is built on TF-DNA interaction data obtained with five different experimental techniques and aims at a fair assessment of existing and novel methods for de novo TFBS motif discovery, using both classic Position Weight Matrices and Arbitrary Advanced Approaches (“triple-A” models). We plan to report the best-performing software tools and key findings of the within-Consortium motif benchmarking study and to preview the first insights from the open IBIS Challenge, which will end in mid-summer 2024.