Transcriptional R-loops are dynamic three-stranded nucleic acid structures in which a nascent, guanine-rich RNA is hybridized to the template DNA strand, leaving the other (non-template) DNA strand looped out. We have proposed that the sequences of this looped-out, single-stranded non-template strand determine the initiation and stabilization of R-loops genome-wide, and we found that such sequences are ubiquitous in eukaryotic genomes (Wongsurawat et al., NAR, 2012; Kuznetsov et al., NAR, 2018). We called this family of sequences R-loop forming sequences (RLFS). We also introduced the Quantitative models of R-Loop Forming Sequences (QmRLFS) tools (Jenjaroenpun et al., NAR, 2016). According to the QmRLFS model, an RLFS is a guanine-rich sequence on the non-template strand consisting of three zones: an R-loop initiation zone (RIZ), which contains a few clusters of G repeats; a linker sequence; and an R-loop elongation zone (REZ), which is highly G-rich. QmRLFS identifies the sizes, positions, and boundaries of RLFSs with high specificity and sensitivity. In vitro, the model predicts R-loop size and boundaries with 86-92% accuracy, thereby providing objective controls that can help improve R-loop detection methods. The objective of this review is to consider computationally predicted RLFSs and R-loop biology studies in the context of improving the quality control, accuracy, reproducibility, and resolution of experimental methods. We also discuss the prospects of integrating computational biology, bioinformatics, functional genomics, single-cell and image analyses of R-loops, and mathematical modeling to advance the study of R-loop pathobiology in cancer and to meet the needs of clinical oncology.
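To make the three-zone structure (RIZ, linker, REZ) concrete, here is a minimal sketch of how such a scan could be organized on a single strand. It is not the QmRLFS implementation. The RIZ pattern (at least two G3+ clusters separated by 1-10 nt), the maximum linker length, the minimum REZ length, and the REZ G-fraction threshold are all hypothetical values chosen for illustration. The names `find_rlfs`, `extend_rez`, and `RLFS` are likewise invented for this example. Searching the reverse complement (the opposite strand) is omitted.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds chosen for illustration only; they are NOT the
# published QmRLFS parameters.
RIZ_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,10}?G{3,})+")  # >= 2 G-clusters (G3+)
MAX_LINKER = 50    # assumed maximum linker length (nt)
MIN_REZ_LEN = 20   # assumed minimum REZ length (nt)
MIN_REZ_G = 0.40   # assumed minimum G fraction in the REZ


@dataclass(frozen=True)
class RLFS:
    """Half-open coordinates [start, end) on the scanned (non-template) strand."""
    riz_start: int
    riz_end: int
    rez_start: int
    rez_end: int


def extend_rez(seq: str, start: int) -> Optional[int]:
    """Return the end of the longest G-terminated region seq[start:end]
    meeting the assumed REZ length and G-fraction thresholds, else None."""
    best = None
    g = 0  # number of G's in seq[start:end]
    for end in range(start + 1, len(seq) + 1):
        if seq[end - 1] == "G":
            g += 1
            length = end - start
            if length >= MIN_REZ_LEN and g / length >= MIN_REZ_G:
                best = end
    return best


def find_rlfs(seq: str) -> List[RLFS]:
    """Scan one strand: a RIZ match, then a linker, then a G-rich REZ."""
    seq = seq.upper()
    hits = []
    for m in RIZ_PATTERN.finditer(seq):
        # Try each REZ start within MAX_LINKER nt downstream of the RIZ.
        last = min(m.end() + MAX_LINKER, len(seq) - 1)
        for rez_start in range(m.end(), last + 1):
            if seq[rez_start] != "G":
                continue  # require the REZ to begin with a G
            rez_end = extend_rez(seq, rez_start)
            if rez_end is not None:
                hits.append(RLFS(m.start(), m.end(), rez_start, rez_end))
                break  # keep the nearest qualifying REZ for this RIZ
    return hits
```

For example, `find_rlfs("AAAA" + "GGGTGGGAGGG" + "T" * 15 + "GGGGA" * 4 + "GGGG" + "T" * 10)` returns `[RLFS(riz_start=4, riz_end=15, rez_start=30, rez_end=54)]`: a RIZ of three G3 clusters, a 15-nt linker, and a 24-nt REZ. The 15-nt linker is deliberately longer than the 10-nt spacer allowed inside the RIZ pattern; with a shorter linker, this naive regex would absorb the REZ into the RIZ. That limitation is one reason a real model separates zone definitions more carefully than this sketch does.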