A software package was developed to extract data from available information resources on all human genes, their transcripts, their promoter sequences, and the locations of SNPs within these promoters. A multi-threaded version of the SNP_TATA_Comparator program was developed for genome-wide assessment of how known SNPs in the promoters of all human genes change the affinity of the TATA-binding protein (TBP) for DNA and the expression of these genes. The genomic data were integrated with the clinical and phenotypic observations available in the ClinVar database.
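Each SNP can be scored independently of the others, and this independence is what makes a multi-threaded genome-wide run possible. The following minimal Python sketch shows that parallel structure. It assumes hypothetical `PromoterSnp` records and a hypothetical placeholder score `tbp_affinity`. It does not reproduce the SNP_TATA_Comparator affinity model or its statistical test.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class PromoterSnp:
    """Hypothetical record: one SNP inside a promoter sequence."""
    gene: str
    promoter: str  # promoter sequence, 5'->3' on the gene's strand
    offset: int    # 0-based index of the SNP within `promoter`
    alt: str       # alternative allele (single nucleotide)


def tbp_affinity(seq: str) -> float:
    """Hypothetical placeholder: counts overlapping 'TATA' occurrences.

    This is NOT the SNP_TATA_Comparator model; it only stands in for a
    real affinity estimate so that the pipeline runs end to end.
    """
    return float(sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "TATA"))


def score_snp(snp: PromoterSnp) -> tuple[str, float, float]:
    """Score the reference and the alternative promoter for one SNP."""
    ref_seq = snp.promoter
    alt_seq = ref_seq[:snp.offset] + snp.alt + ref_seq[snp.offset + 1:]
    return snp.gene, tbp_affinity(ref_seq), tbp_affinity(alt_seq)


def score_all(snps: list[PromoterSnp], workers: int = 8) -> list[tuple[str, float, float]]:
    """Score all SNPs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_snp, snps))
```

For example, `score_all([PromoterSnp("GENE_A", "GGCTATAAAAGGC", 5, "G")])` returns `[("GENE_A", 1.0, 0.0)]` with this toy score, because the substitution destroys the only `TATA` occurrence. In CPython, the global interpreter lock limits threads on CPU-bound scoring. When the scoring runs in pure Python, `ProcessPoolExecutor` is a near drop-in alternative.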
As a result, the database contains the following information:
- 62,603 genes, of which 19,314 encode proteins.
- 117,414 transcripts, of which 63,141 encode proteins.
- 5,305,816 SNP variants located in gene promoters within the interval [-90, -1] relative to the transcription start site, of which 3,199,285 lie in the promoters of protein-coding genes.
- 445,875 SNP variants in the promoters of protein-coding genes are predicted to change the affinity of TBP for the promoter statistically significantly (p < 0.05). For 3,847 of these variants, clinical and phenotypic observations are available in the ClinVar database.
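These counts rest on two steps. First, each SNP is placed within the [-90, -1] window relative to the transcription start site. Second, only variants with p < 0.05 are kept. The following minimal Python sketch covers both steps. It assumes 1-based genomic coordinates, a TSS numbered +1 with no position 0, and hypothetical `VariantResult` records and ClinVar identifier sets. The text does not state the package's actual coordinate conventions or data formats.

```python
from dataclasses import dataclass

# Promoter window relative to the TSS, as given in the text.
PROMOTER_START, PROMOTER_END = -90, -1


def relative_position(snp_pos: int, tss: int, strand: str) -> int:
    """Position of a SNP relative to the TSS (+1), skipping 0.

    Assumes 1-based genomic coordinates; -1 is the base immediately
    upstream of the TSS on the transcript's own strand.
    """
    if strand == "+":
        d = snp_pos - tss
    elif strand == "-":
        d = tss - snp_pos
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return d + 1 if d >= 0 else d


def in_promoter_window(snp_pos: int, tss: int, strand: str) -> bool:
    return PROMOTER_START <= relative_position(snp_pos, tss, strand) <= PROMOTER_END


@dataclass(frozen=True)
class VariantResult:
    """Hypothetical per-variant output of the affinity analysis."""
    variant_id: str
    p_value: float


def significant_with_clinvar(results, clinvar_ids, alpha=0.05):
    """Return (significant variants, significant variants found in ClinVar)."""
    significant = [r for r in results if r.p_value < alpha]
    with_clinvar = [r for r in significant if r.variant_id in clinvar_ids]
    return significant, with_clinvar
```

For a plus-strand transcript with the TSS at coordinate 1000, positions 910 to 999 fall within the window. On the minus strand, the same window spans coordinates 1001 to 1090.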
The results of the genome-wide analysis are presented. They include the distribution of genes by number of transcripts and the distribution of TBP-affinity-changing SNPs across positions within promoters. They also include patterns linking TBP affinity for the promoter, the specificity of the TBP-binding site in the promoter, and other promoter characteristics.
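Once each transcript is assigned to a gene and each SNP has a promoter position, both distributions reduce to simple counting. The following minimal Python sketch assumes hypothetical inputs: a mapping from transcript ID to gene ID, and a list of TSS-relative positions for the affinity-changing SNPs.

```python
from collections import Counter


def transcripts_per_gene_distribution(transcript_to_gene: dict[str, str]) -> Counter:
    """Map 'number of transcripts' -> 'number of genes with that many'."""
    per_gene = Counter(transcript_to_gene.values())  # gene -> transcript count
    return Counter(per_gene.values())


def snps_by_position(relative_positions: list[int]) -> list[int]:
    """Count SNPs at each promoter position from -90 to -1.

    Index 0 corresponds to -90 and index 89 to -1; positions outside
    the window are ignored.
    """
    counts = Counter(relative_positions)
    return [counts[pos] for pos in range(-90, 0)]
```

For example, `transcripts_per_gene_distribution({"t1": "g1", "t2": "g1", "t3": "g2"})` reports one gene with one transcript and one gene with two.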