Peak caller comparison through quality control of ChIP-Seq datasets

Ruslan N. Sharipov (1), Yury V. Kondrakhin (2), Semyon K. Kolmykov (3), Ivan S. Yevshin (4), Anna S. Ryabova (5), Fedor A. Kolpakov (6)

(1) BIOSOFT.RU, LLC; Novosibirsk State University, Novosibirsk, Russia, shrus79@biosoft.ru
(2) Institute of Computational Technologies SB RAS; BIOSOFT.RU, LLC, Novosibirsk, Russia, yvkondrat@mail.ru
(3) FRC Institute of Cytology and Genetics SB RAS; Institute of Computational Technologies SB RAS, Novosibirsk, Russia, kolmykovsk@gmail.com
(4) Institute of Computational Technologies SB RAS; BIOSOFT.RU, LLC, Novosibirsk, Russia, ivan@biosoft.ru
(5) Institute of Computational Technologies SB RAS; BIOSOFT.RU, LLC, Novosibirsk, Russia, anna@biosoft.ru
(6) Institute of Computational Technologies SB RAS; BIOSOFT.RU, LLC, Novosibirsk, Russia, fedor@biosoft.ru

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) is a widely used experimental technology for identifying functional protein-DNA interactions. Databases such as GTRD, ChIP-Atlas and ReMap systematically collect and annotate large numbers of ChIP-Seq datasets generated by different peak callers, including MACS2. Quality control of such datasets is indispensable, since peak callers may produce different results for the same ChIP-Seq experiment. We performed a comparative analysis of widely used peak callers using two metrics that control the false positive and false negative rates. We found that MACS2 outperformed its competitors.
