Hierarchical Bayesian Model for Rare Variant Association Analysis Integrating Genotype Uncertainty in Human Sequence Data |
| |
Authors: | Liang He, Janne Pitkäniemi, Antti‐Pekka Sarin, Veikko Salomaa, Mikko J Sillanpää, Samuli Ripatti |
| |
Institution: | 1. Department of Public Health, Hjelt Institute, University of Helsinki, Helsinki, Finland; 2. Finnish Cancer Registry, Institute for Statistical and Epidemiological Cancer Research, Helsinki, Finland; 3. Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland; 4. Public Health Genomics Unit, National Institute for Health and Welfare, Helsinki, Finland; 5. Chronic Disease Epidemiology and Prevention Unit, National Institute for Health and Welfare, Helsinki, Finland; 6. Department of Mathematical Sciences, University of Oulu, Oulu, Finland; 7. Department of Biology, University of Oulu, Oulu, Finland; 8. Biocenter Oulu, University of Oulu, Oulu, Finland; 9. Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK |
| |
Abstract: | Next‐generation sequencing (NGS) has enabled the study of rare genetic variants, which may explain the missing heritability of complex diseases. Most existing methods for rare variant (RV) association detection do not account for the sequencing errors that are common in NGS data. Because minor alleles are rarely observed, these errors can substantially affect the power and perturb the accuracy of association tests. We developed a hierarchical Bayesian approach to estimate the association between RVs and complex diseases. Our integrated framework combines the misclassification probability with shrinkage‐based Bayesian variable selection. It handles neutral and protective RVs flexibly in the presence of measurement error, and is robust enough to detect causal RVs across a wide spectrum of minor allele frequency (MAF). Imputation uncertainty and MAF are incorporated into the integrated framework to achieve optimal statistical power. We demonstrate that sequencing error significantly affects the findings, and that our proposed model can exploit information about this error to improve statistical power in both simulated and real data. We further show that our model outperforms existing methods, such as the sequence kernel association test (SKAT). Finally, we illustrate the behavior of the proposed method using a Finnish low‐density lipoprotein cholesterol study, and show that it identifies an RV known as FH North Karelia in the LDLR gene with three carriers in 1,155 individuals, which is missed by both SKAT and Granvil. |
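To illustrate why sequencing errors matter so much when minor alleles are rarely observed, the following minimal sketch computes the expected share of observed carrier calls that are true carriers. It is not the authors' model: the function name `carrier_ppv` and the per-genotype error rates are hypothetical assumptions chosen for illustration, and only the cohort size (1,155) and carrier count (3) come from the abstract.

```python
def carrier_ppv(n_individuals, n_true_carriers, false_pos_rate, false_neg_rate=0.0):
    """Expected fraction of observed carrier calls that are true carriers.

    Hypothetical helper for illustration only; the error rates are assumed,
    not taken from the study.
    """
    n_non_carriers = n_individuals - n_true_carriers
    # True carriers that are still called as carriers.
    expected_true_calls = n_true_carriers * (1.0 - false_neg_rate)
    # Non-carriers that are wrongly called as carriers.
    expected_false_calls = n_non_carriers * false_pos_rate
    total_calls = expected_true_calls + expected_false_calls
    return expected_true_calls / total_calls if total_calls > 0 else float("nan")


# Cohort size and carrier count from the abstract; error rates are assumptions.
for rate in (0.0, 0.001, 0.01):
    print(f"false-positive rate {rate}: PPV = {carrier_ppv(1155, 3, rate):.3f}")
```

With only three true carriers, even a small assumed false-positive rate produces a number of spurious carrier calls comparable to the true ones, which is the motivation for modeling the misclassification probability explicitly.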
| |
Keywords: | rare variant; classification error; shrinkage‐based Bayesian variable selection; LDL‐C |