Here, we develop three novel G4-BS models (subsets of single-strand DNA sequences) and use them to identify pG4-BS in the human genome. We show that about 90% of human protein-coding genes contain at least one pG4-BS in either the promoter region or the gene body, and that pG4-BS occur at high density preferentially at transcriptional checkpoints, including intron–exon junctions and the regions around the transcription start site (TSS) and the transcription termination site (TTS). Many pG4-BS also overlap with other regulatory elements, such as DNase I hypersensitive sites and transcription factor binding sites (TFBS). Using gene ontology (GO) analysis, we identify subsets of pG4-BS enriched in genes associated with GO terms related to transcriptional regulation, the nervous system, and disease. In addition, we consider pG4-BS on both the template and non-template strands of promoters and study the associations of these complex back-forward G4 architectures with genes and pathways. Analyses integrating our computational predictions with G4-seq and G4 CUT&Tag data strongly support the existence of G4-B conformations. We further use nuclear magnetic resonance (NMR) spectroscopy, circular dichroism (CD) spectroscopy, and ultraviolet (UV) melting experiments to identify stable pG4-BS, predicted by our models, in cancer-associated genes. Finally, we provide an in-depth structural characterization of the G4-B formed by a novel pG4-BS located in the promoter region of the E2F8 gene.