Transcription factor binding sites: data integration, stable identifiers and incremental builds

by Kolmykov Semyon | Kondrakhin Yury | Sharipov Ruslan | Yevshin Ivan | Ryabova Anna | Kolpakov Fedor
Sirius University of Science and Technology, Sochi, 354340, Russian Federation | Federal Research Center for Information and Computational Technologies, Novosibirsk, 630090, Russian Federation | Novosibirsk State University, 630090, Russian Federation | BIOSOFT.RU, LLC, Novosibirsk, 630090, Russian Federation

The report describes modifications to the previously developed algorithm for meta-analysis of ChIP-seq datasets through the rank aggregation approach (METARA). METARA
evaluates the quality (or reliability) of each meta-cluster by calculating its rank aggregation score.
There are two main reasons for creating IMETARA (incremental METARA):
1) to significantly reduce the run time of meta-cluster building;
2) to introduce stable identifiers for each meta-cluster, which will allow researchers to
refer directly to the meta-clusters of TF binding sites, as they currently refer to SNPs.
Thus, when new ChIP-seq experiments are added to the ChIP-seq data collection already
stored in GTRD, IMETARA only recalculates the rank aggregation scores of existing
meta-clusters, adds novel meta-clusters, and assigns them stable IDs.
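
The incremental idea can be illustrated with a minimal Python sketch. It is not the IMETARA implementation: the abstract gives neither the rank aggregation formula, nor the rule for matching peaks to meta-clusters, nor the identifier format. The sketch therefore assumes a simple interval-overlap match, uses the mean of normalized ranks as a placeholder score, and invents the ID pattern `MC000001`; the names `Peak`, `MetaCluster` and `incremental_update` are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Peak:
    chrom: str
    start: int
    end: int
    norm_rank: float  # assumed: rank of the peak within its experiment, scaled to (0, 1]


@dataclass
class MetaCluster:
    stable_id: str  # hypothetical ID format, never changed once assigned
    chrom: str
    start: int
    end: int
    ranks: list = field(default_factory=list)

    @property
    def score(self) -> float:
        # Placeholder (mean normalized rank), NOT the METARA rank aggregation score.
        return sum(self.ranks) / len(self.ranks)


def overlaps(mc: MetaCluster, peak: Peak) -> bool:
    # Assumed matching rule: same chromosome and intersecting half-open intervals.
    return mc.chrom == peak.chrom and mc.start < peak.end and peak.start < mc.end


def incremental_update(clusters, new_peaks, id_counter):
    """Add peaks from new experiments without rebuilding existing meta-clusters."""
    for peak in new_peaks:
        hit = next((mc for mc in clusters if overlaps(mc, peak)), None)
        if hit is not None:
            # Existing meta-cluster: ID and boundaries stay fixed, only the score input grows.
            hit.ranks.append(peak.norm_rank)
        else:
            # Novel meta-cluster: take the next unused stable ID.
            new_id = f"MC{next(id_counter):06d}"
            clusters.append(
                MetaCluster(new_id, peak.chrom, peak.start, peak.end, [peak.norm_rank])
            )
    return clusters


ids = count(1)
# Initial build from one experiment.
clusters = incremental_update([], [Peak("chr1", 100, 200, 0.2)], ids)
# A new experiment arrives: one peak hits the existing meta-cluster, one is novel.
clusters = incremental_update(
    clusters, [Peak("chr1", 150, 250, 0.4), Peak("chr2", 10, 60, 0.9)], ids
)
for mc in clusters:
    print(mc.stable_id, mc.chrom, mc.start, mc.end, round(mc.score, 3))
```

In this sketch the first meta-cluster keeps its identifier across the update while its score changes, and only the chr2 peak triggers a new ID, which is the behaviour the abstract attributes to IMETARA.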

BGRS_SB-2022_SKolmykov_slides