Genome-wide association studies (GWAS) are among the most important approaches for finding the genetic basis of complex traits in humans and other organisms. Hundreds of successful GWAS have been conducted over the past decades; however, annotating and interpreting the detected associations remains challenging. The development of efficient methods for functional annotation of GWAS results is complicated by the lack of readily available datasets with a known set of causal genetic variants and biological processes involved in the development of the studied trait. The goal of this work was to create a method for simulating GWAS datasets with a predefined set of causal variants and/or processes, and to apply it to evaluate the accuracy of tools for gene set enrichment analysis of GWAS results. We developed a new tool, bioGWAS, that simulates realistic GWAS datasets. bioGWAS is implemented as a bioinformatic pipeline using the Snakemake workflow management system. The tool efficiently generated GWAS results with a given set of causal variants and the gene sets in which these variants are located. We also used data generated by bioGWAS to evaluate how well the MAGMA, Pascal, and LSEA tools identify the given gene sets. This analysis showed that LSEA has a significant advantage over the other two tools. Thus, bioGWAS expands the arsenal of approaches for developing and evaluating methods for annotating GWAS results.