N-glycosylation is a common post-translational modification that affects the physical and biological functions of proteins. N-glycan biosynthesis pathways are well studied, but the mechanisms of their genetic regulation remain poorly understood, which hinders the development of glycome-associated disease biomarkers. The total blood plasma N-glycome is a mixture of the N-glycomes of individual glycoproteins. Reconstructing the N-glycan concentrations of individual proteins from the glycan concentrations measured across all blood plasma proteins would yield new datasets for genome-wide association studies (GWASes) without profiling new samples. To predict immunoglobulin G N-glycan concentrations from the total blood plasma N-glycome, we compared five models: simple linear regression, lasso regression, ridge regression, an elastic net, and canonical correlation analysis. The simple linear regression model showed the greatest accuracy. We applied this model to total blood plasma N-glycosylation GWAS results to obtain predicted immunoglobulin G N-glycosylation GWAS results. We validated these results by comparing the set of loci significantly associated with immunoglobulin G N-glycosylation to the loci previously published by Klarić et al. The predicted immunoglobulin G N-glycosylation GWAS results are consistent with the published loci. The developed method is capable of reconstructing most of the N-glycosylation spectrum of immunoglobulin G. We also showed that GWAS results for an individual glycoprotein can be obtained from the results for the total blood plasma N-glycome.
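
One way to see why a linear model can be applied directly to GWAS results: if a predicted trait is a weighted sum of plasma traits, the single-SNP regression slope for the predicted trait equals the same weighted sum of the per-trait slopes. The sketch below illustrates this on simulated data. It is not the authors' pipeline; the weights, effect sizes, sample size, and the helper names `slope` and `propagate_effect` are all hypothetical and chosen only for illustration. Standard errors are not handled here, since they would additionally depend on the correlations among the plasma traits.

```python
import random


def slope(g, y):
    """OLS slope of y regressed on g (a single-SNP effect estimate)."""
    n = len(g)
    mean_g = sum(g) / n
    mean_y = sum(y) / n
    sxy = sum((gi - mean_g) * (yi - mean_y) for gi, yi in zip(g, y))
    sxx = sum((gi - mean_g) ** 2 for gi in g)
    return sxy / sxx


def propagate_effect(weights, plasma_betas):
    """Combine per-trait SNP effects using the linear-model weights."""
    return sum(w * b for w, b in zip(weights, plasma_betas))


random.seed(0)
n_samples = 500

# Simulated genotype dosages (0, 1 or 2) for one SNP.
genotype = [random.choice([0, 1, 2]) for _ in range(n_samples)]

# Three simulated plasma glycan traits with assumed true SNP effects.
true_effects = [0.3, -0.1, 0.0]
plasma = [[e * g + random.gauss(0, 1) for g in genotype] for e in true_effects]

# Hypothetical weights and intercept, standing in for a fitted linear
# regression that predicts one IgG glycan trait from the plasma traits.
weights = [0.5, 1.2, -0.7]
intercept = 0.1

# Predicted IgG trait for every sample.
predicted_igg = [
    intercept + sum(w * trait[i] for w, trait in zip(weights, plasma))
    for i in range(n_samples)
]

# Route 1: combine the per-trait GWAS effect estimates.
plasma_betas = [slope(genotype, trait) for trait in plasma]
beta_propagated = propagate_effect(weights, plasma_betas)

# Route 2: run the association directly on the predicted trait.
beta_direct = slope(genotype, predicted_igg)

print(beta_propagated, beta_direct)
```

The two routes agree up to floating-point error, because the regression slope is linear in the outcome and the intercept cancels after centering. This is what makes it possible, in principle, to work from summary statistics rather than individual-level data.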