The RNA genome of SARS-CoV-2 has been shown to be organized into structural and functional blocks of RNA information demarcated by short RNA breakpoint sequences. These breakpoints promote recombination at specific, non-random locations within the viral genome and consist of short repetitive sequences, namely palindromes. Palindromic sequences are involved in the formation of RNA secondary structures; they can serve as recognition sites for RNA-binding proteins and as sites of RNA recombination. We analyzed SARS-CoV-2 genomes with particular attention to mutations within palindromic sequences.
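To make the notion of a palindrome concrete, the following is a minimal Python sketch. It assumes the usual nucleic-acid definition, in which a palindrome is a stretch that equals its own reverse complement and can therefore pair with itself in a secondary structure. The function name `find_palindromes` and the `min_len` parameter are illustrative, and the code is not a description of how any particular repeat-finding tool works internally.

```python
# Watson-Crick complements; RNA input is normalized so that U is treated as T.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def find_palindromes(seq, min_len):
    """Return (start, palindrome) for every maximal reverse-complement
    palindrome of at least min_len bases; start is a 0-based offset.

    Reverse-complement palindromes always have even length, so it is enough
    to expand outward from every gap between two neighbouring bases.
    """
    s = seq.upper().replace("U", "T")
    hits = []
    for center in range(1, len(s)):
        left, right = center - 1, center
        # Grow the window while the outer bases are complementary.
        # Ambiguous symbols (N, gaps) have no complement entry and stop it.
        while left >= 0 and right < len(s) and COMPLEMENT.get(s[left]) == s[right]:
            left -= 1
            right += 1
        if right - left - 1 >= min_len:
            hits.append((left + 1, s[left + 1:right]))
    return hits


if __name__ == "__main__":
    # Toy sequence, not viral data: GAATTC equals its own reverse complement.
    print(find_palindromes("CCGAATTCAA", min_len=6))
```

The expansion runs in time proportional to the sequence length times the longest palindrome, which is sufficient for a genome of roughly 30,000 bases; reported sequences are returned in the T alphabet even for RNA input.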
A dataset of complete isolate nucleotide sequences was extracted from https://www.ncbi.nlm.nih.gov/sars-cov-2. Each sequence was annotated with a World Health Organization (WHO) SARS-CoV-2 annotation. Each nucleotide sequence was individually aligned to the SARS-CoV-2 reference sequence (NC_045512.2) using the MAFFT alignment program. The StatRepeats program was used to determine all palindromes with a minimum length of 8 nucleotides. A total of 801,935,394 palindromes were determined. The average number of palindromes per isolate increases steadily across successive 4-month intervals of the pandemic data: 1.92, 3.51, 9.31, 14.84, and 20.66. The pipeline for the analysis of the viral genome in relation to mutation rate is presented. The highest number of palindromes is located around positions 22,000 (left part) and 24,300 (right part) of the genome, with positions counted from the beginning of the isolates. Of the total number of mutations, almost 78% resulted in amino-acid changes in the corresponding proteins. Available tools and the problems of mutation rate dependence on sequence context will be discussed.
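As an illustration of how the share of amino-acid-changing mutations can be computed, the sketch below classifies single-nucleotide substitutions inside a coding sequence using the standard genetic code. It is written under stated assumptions: the coding sequence is in frame and starts at offset 0, only substitutions are considered (no insertions or deletions), and the helper names `amino_acid_change` and `fraction_nonsynonymous` are hypothetical rather than part of the pipeline described above.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_change(cds, offset, alt):
    """Return (reference_aa, mutated_aa) for replacing cds[offset] with alt.

    cds is an in-frame coding sequence and offset is 0-based. Codons that
    contain ambiguous symbols are translated as 'X'.
    """
    cds = cds.upper().replace("U", "T")
    alt = alt.upper().replace("U", "T")
    start = offset - offset % 3
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("offset falls in an incomplete trailing codon")
    pos = offset % 3
    alt_codon = ref_codon[:pos] + alt + ref_codon[pos + 1:]
    return CODON_TABLE.get(ref_codon, "X"), CODON_TABLE.get(alt_codon, "X")


def fraction_nonsynonymous(cds, substitutions):
    """Fraction of (offset, alt) substitutions that change the amino acid."""
    changed = 0
    for offset, alt in substitutions:
        ref_aa, alt_aa = amino_acid_change(cds, offset, alt)
        if ref_aa != alt_aa:
            changed += 1
    return changed / len(substitutions)


if __name__ == "__main__":
    toy_cds = "ATGGAT"  # toy example encoding Met-Asp, not viral data
    print(amino_acid_change(toy_cds, 4, "G"))
    print(fraction_nonsynonymous(toy_cds, [(4, "G"), (5, "C")]))
```

In a real analysis the substitutions would come from comparing each aligned isolate with the reference, and offsets would have to be mapped from genome coordinates into each annotated coding region before calling these helpers.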