Hierarchical Bayesian Model for Rare Variant Association Analysis Integrating Genotype Uncertainty in Human Sequence Data
Authors: Liang He, Janne Pitkäniemi, Antti‐Pekka Sarin, Veikko Salomaa, Mikko J Sillanpää, Samuli Ripatti
Institution: 1. Department of Public Health, Hjelt Institute, University of Helsinki, Helsinki, Finland; 2. Finnish Cancer Registry, Institute for Statistical and Epidemiological Cancer Research, Helsinki, Finland; 3. Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland; 4. Public Health Genomics Unit, National Institute for Health and Welfare, Helsinki, Finland; 5. Chronic Disease Epidemiology and Prevention Unit, National Institute for Health and Welfare, Helsinki, Finland; 6. Department of Mathematical Sciences, University of Oulu, Oulu, Finland; 7. Department of Biology, University of Oulu, Oulu, Finland; 8. Biocenter Oulu, University of Oulu, Oulu, Finland; 9. Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
Abstract: Next‐generation sequencing (NGS) has enabled the study of rare genetic variants, which may explain part of the missing heritability of complex diseases. Most existing methods for rare variant (RV) association detection do not account for the sequencing errors that are common in NGS data. Because minor alleles are observed so rarely, these errors can substantially affect the power and perturb the accuracy of association tests. We developed a hierarchical Bayesian approach to estimate the association between RVs and complex diseases. Our integrated framework combines the misclassification probability with shrinkage‐based Bayesian variable selection. It handles neutral and protective RVs measured with error flexibly, and is robust enough to detect causal RVs across a wide spectrum of minor allele frequency (MAF). Imputation uncertainty and MAF are incorporated into the integrated framework to achieve optimal statistical power. We demonstrate that sequencing error significantly affects the findings, and that our proposed model can account for it to improve statistical power in both simulated and real data. We further show that our model outperforms existing methods such as the sequence kernel association test (SKAT). Finally, we illustrate the behavior of the proposed method using a Finnish low‐density lipoprotein cholesterol study, and show that it identifies an RV known as FH North Karelia in the LDLR gene, with three carriers among 1,155 individuals, which is missed by both SKAT and Granvil.
Keywords: rare variant; classification error; shrinkage‐based Bayesian variable selection; LDL‐C
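
To illustrate why genotype misclassification matters most when minor alleles are rarely observed, the following minimal sketch applies Bayes' rule to a single variant. It is not the authors' hierarchical model: the function name `posterior_true_carrier` and the error rates used in the example are hypothetical, and the carrier frequency simply reuses the three carriers among 1,155 individuals mentioned in the abstract as an illustrative value.

```python
def posterior_true_carrier(carrier_freq, false_pos, false_neg):
    """Probability that an individual called as a carrier truly carries the variant.

    carrier_freq: assumed prior probability of being a true carrier
    false_pos:    assumed probability a non-carrier is called a carrier
    false_neg:    assumed probability a true carrier is called a non-carrier
    """
    # Joint probability: true carrier and called as carrier.
    true_call = carrier_freq * (1.0 - false_neg)
    # Joint probability: non-carrier but called as carrier.
    spurious_call = (1.0 - carrier_freq) * false_pos
    total = true_call + spurious_call
    if total == 0.0:
        raise ValueError("a carrier call has zero probability under these inputs")
    return true_call / total


if __name__ == "__main__":
    # Hypothetical error rates, chosen only for illustration.
    rare = posterior_true_carrier(3 / 1155, false_pos=0.001, false_neg=0.01)
    common = posterior_true_carrier(0.20, false_pos=0.001, false_neg=0.01)
    print(f"rare variant:   P(true carrier | called carrier) = {rare:.3f}")
    print(f"common variant: P(true carrier | called carrier) = {common:.3f}")
```

Under the same assumed error rates, a carrier call for the rare variant is far less certain than one for the common variant, which is the kind of uncertainty a model with a misclassification probability is meant to propagate into the association estimate.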