Constructing a pipeline for genome variant / gene functioning hybrid prioritization: a case study of type II diabetes

Irina Kolesnikova1, Valery Polunovsky2, Konstantin Gunbin3
1LLC NCGI, Novosibirsk, Russia, i.kolesnikova@mygenetics.ru
2LLC NCGI, Novosibirsk, Russia, valeriy.polunovskiy@mygenetics.ru
3ICG SB RAS, Novosibirsk, Russia; NSU, Novosibirsk, Russia, genkvg@bionet.nsc.ru

According to the recently proposed “omnigenic model”, the genes implicated in determining complex diseases can be divided into core genes and peripheral genes. Mutations in core genes directly affect disease development, while mutations in peripheral genes can only indirectly modulate disease risk. In this study, to discriminate core genes from their major regulators, we hierarchically combine genome variant prioritization with the prioritization of genes and target tissues.

Elizaveta Elgaeva
3 years ago

1. Could you please explain how the core genes were identified? Am I right that the first and second steps of your pipeline (the genome region filtration steps) were performed to reveal these core genes?
2. Have you thought about using the CEDAR and WESTRA expression data alongside the GTEx data?

Konstantin Gunbin
3 years ago

Many thanks for your questions!

(I) Identification of the core genes of a complex disease is a major problem in any GWAS. The omnigenic model proposed by Jonathan K. Pritchard and his colleagues is so far the only theoretical basis for the search for core genes. One of the main ideas for identifying core genes is to subdivide genes into two clearly defined groups: 1) genes that belong to gene networks and / or co-expression networks of molecular processes whose mutational changes lead directly to disease, and 2) genes that modify (at any level) the work of these molecular processes. It has been shown that core genes may lack statistically significant polymorphism associations with the disease in any GWAS because of the strong negative selection acting on them. Therefore, the only way to identify core genes is to analyze the evolutionary conservation, functional significance and connectivity (in gene networks) of all genes whose polymorphisms are associated with the disease. That is why our pipeline was created.
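
To make the idea of combining several lines of evidence concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the actual implementation of the pipeline: the function names, the hypothetical gene labels, the toy scores and the choice of a simple mean of ranks are all assumed for the example.

```python
from statistics import mean


def rank_scores(scores):
    """Map each gene to a rank in [0, 1]; a higher score gives a higher rank.

    Ties are broken by insertion order, which is good enough for a sketch.
    """
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 1.0}
    return {gene: i / (n - 1) for i, gene in enumerate(ordered)}


def prioritize(conservation, functional, connectivity):
    """Average the three per-criterion ranks and sort genes, best first."""
    ranks = [
        rank_scores(conservation),
        rank_scores(functional),
        rank_scores(connectivity),
    ]
    # Only genes scored on all three criteria can be compared.
    genes = set(conservation) & set(functional) & set(connectivity)
    combined = {g: mean(r[g] for r in ranks) for g in genes}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Toy input: hypothetical genes with made-up scores (not real data).
conservation = {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.7}
functional = {"GENE_A": 0.8, "GENE_B": 0.2, "GENE_C": 0.5}
connectivity = {"GENE_A": 12, "GENE_B": 3, "GENE_C": 30}

for gene, score in prioritize(conservation, functional, connectivity):
    print(gene, round(score, 3))
```

Ranks are used instead of raw scores so that criteria measured on different scales (for example, a conservation score and a network degree) contribute equally; a real analysis would likely weight the criteria and handle ties more carefully.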

(II) In our work, we try to use all of the available information. It is interesting to note that even the GTEx expression data are, unfortunately, not reliable for some tissues; this can be clearly seen by comparing them with similar information from the Human Protein Atlas and / or ARCHS4 and / or TISSUES databases. This feature of GTEx may be related both to the experimental procedures and to (undocumented) features of the data processing.

Elizaveta Elgaeva
3 years ago

Thank you very much for your detailed answer.

I have learned something new from you.
My only remaining question is: could the joint use of the GTEx, CEDAR and WESTRA expression data somehow smooth out the inaccuracies in the GTEx data? Perhaps combining these datasets would improve the analysis.

Konstantin Gunbin
3 years ago

Thank you for the idea; it is indeed possible to use CEDAR and WESTRA to smooth the GTEx expression data. We will have to try it.
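
One possible way to do such smoothing is sketched below in Python. This is an untested idea rather than anything from the pipeline: the three sources are hypothetical stand-ins for the expression datasets, the values are made up, and the per-source z-scoring followed by a median across sources is an assumed design choice.

```python
from statistics import mean, median, pstdev


def zscore(values):
    """Standardize one source's expression values for a gene across tissues."""
    mu = mean(values.values())
    sd = pstdev(values.values())
    if sd == 0:
        # A flat profile carries no tissue preference.
        return {tissue: 0.0 for tissue in values}
    return {tissue: (v - mu) / sd for tissue, v in values.items()}


def consensus_expression(sources):
    """Median of per-source z-scores for every tissue seen in any source."""
    standardized = [zscore(profile) for profile in sources.values()]
    tissues = set().union(*standardized)
    return {
        t: median(z[t] for z in standardized if t in z)
        for t in sorted(tissues)
    }


# Toy profiles of one hypothetical gene in three hypothetical sources.
sources = {
    "source_a": {"liver": 10, "blood": 2},
    "source_b": {"liver": 8, "blood": 4},
    "source_c": {"liver": 1, "blood": 9},  # disagrees with the other two
}

print(consensus_expression(sources))
```

Standardizing within each source puts measurements from different platforms on a comparable scale, and the median keeps a single discordant source from dominating the consensus. Whether this is appropriate depends on how much the tissues covered by the datasets actually overlap.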