Similar articles
 20 similar articles found (search time: 15 ms)
1.
To confirm associations with a large number of single nucleotide polymorphisms (SNPs), each with only a small effect size, as hypothesized in the polygenic theory for schizophrenia, the International Schizophrenia Consortium (2009, Nature 460:748–752) proposed a polygenic risk score (PRS) test and demonstrated its effectiveness when applied to psychiatric disorders. The basic idea of the PRS test is to use one half of the sample to select and up‐weight the SNPs that are more likely to be associated, and then to use the other half of the sample to test for aggregated effects of the selected SNPs. Intrigued by the novelty and increasing use of the PRS test, we aimed to evaluate and improve its performance for GWAS data. First, by an analysis of the PRS test, we point out its connection with the Sum test [Chapman and Whittaker, 2008, Genet Epidemiol 32:560–566; Pan, 2009, Genet Epidemiol 33:497–507]; given the known advantages and disadvantages of the Sum test, this connection motivated the development of several other polygenic tests, some of which may be more powerful than the PRS test under certain situations. Second, more importantly, to overcome the low statistical efficiency of the data‐splitting strategy as adopted in the PRS test, we reformulate and thus modify the PRS test, obtaining several adaptive tests, which are closely related to the adaptive sum of powered score (SPU) test studied in the context of rare variant analysis [Pan et al., 2014, Genetics 197:1081–1095]. We use both simulated data and a real GWAS dataset of alcohol dependence to show dramatically improved power of the new tests over the PRS test; due to its superior performance and simplicity, we recommend the whole sample‐based adaptive SPU test for polygenic testing. We hope to raise the awareness of the limitations of the PRS test and potential power gain of the adaptive SPU test.
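As a rough illustration of the sum of powered score idea mentioned in this abstract, the sketch below computes per‐SNP score components and raises them to a power before summing. It assumes the usual form U_j = Σ_i x_ij (y_i − ȳ) with no covariates and SPU(γ) = Σ_j U_j^γ, with γ = ∞ taken as max |U_j|; the function names, the toy data and the single‐γ permutation P‐value are illustrative assumptions, not the authors' implementation. The adaptive version, which combines several values of γ, is not shown.

```python
import random

def score_vector(genotypes, phenotype):
    """Per-SNP score U_j = sum_i x_ij * (y_i - mean(y)); assumes no covariates."""
    n = len(phenotype)
    y_bar = sum(phenotype) / n
    resid = [y - y_bar for y in phenotype]
    n_snps = len(genotypes[0])
    return [sum(genotypes[i][j] * resid[i] for i in range(n))
            for j in range(n_snps)]

def spu(scores, gamma):
    """Sum of powered scores; gamma = float('inf') is taken as max |U_j|."""
    if gamma == float("inf"):
        return max(abs(u) for u in scores)
    return sum(u ** gamma for u in scores)

def permutation_pvalue(genotypes, phenotype, gamma, n_perm=999, seed=0):
    """Permutation P-value for one fixed gamma (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs(spu(score_vector(genotypes, phenotype), gamma))
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        if abs(spu(score_vector(genotypes, shuffled), gamma)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data: 4 individuals (rows) x 2 SNPs (columns), binary phenotype.
toy_genotypes = [[0, 1], [1, 0], [2, 1], [0, 0]]
toy_phenotype = [0, 0, 1, 1]
toy_scores = score_vector(toy_genotypes, toy_phenotype)
```

With γ = 1 the statistic is a plain sum of the scores, which is the sense in which the abstract links the PRS test to the Sum test; larger γ puts more weight on the SNPs with the largest scores.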

2.
Genome‐wide association studies (GWASs) have identified hundreds of single nucleotide polymorphisms (SNPs) associated with complex human diseases. However, risk prediction models based on them have limited discriminatory accuracy. It has been suggested that including many such SNPs can improve predictive performance. Here, we studied various aspects of model building to improve discriminatory accuracy, as measured by the area under the receiver operating characteristic curve (AUC), including: (1) How well does a one‐phase procedure that selects SNPs and estimates odds ratios on the same data perform? (2) How should training data be allocated between SNP selection (Phase 1) and estimation (Phase 2) in a two‐phase procedure? (3) Should SNP selection be based on P‐value thresholding or ranking P‐values? (4) How many SNPs should be selected? and (5) Is multivariate estimation preferred to univariate estimation in the presence of linkage disequilibrium (LD)? We used realistic estimates of the distributions of genetic effect sizes, allele frequencies, and LD patterns based on GWAS data for Crohn's disease and prostate cancer. Theory and simulations were used to estimate AUC. Empirical risk models based on 10,000 cases and controls had considerably lower AUC than theoretically achievable. The most critical aspect of prediction model building was initial SNP selection. The single‐phase procedure achieved higher AUC than the two‐phase procedure. Multivariate estimation did not perform as well as univariate (marginal) estimation. For complex diseases and samples of 10,000 or fewer cases and controls, one should limit the number of SNPs to tens or hundreds.

3.
Pathway‐based genome‐wide association studies (GWAS) can exploit collective effects of causal variants in a pathway to increase power of detection. However, current methods for pathway‐based GWAS do not consider epistatic effects of genetic variants, although interactions between genetic variants may play an important role in influencing complex traits. In this paper, we employed a Bayesian Lasso logistic regression model for pathway‐based GWAS to include all possible main effects and a large number of pairwise interactions of single nucleotide polymorphisms (SNPs) in a pathway, and then inferred the model with an efficient group empirical Bayesian Lasso (EBLasso) method. Using the inferred model, the statistical significance of a pathway was tested with the Wald statistic. Reliable effects in a significant pathway were selected using the stability selection technique. Extensive computer simulations demonstrated that our group EBLasso method significantly outperformed two competitive methods in most simulation setups and offered similar performance in other simulation setups. When applied to a GWAS dataset for Parkinson disease, EBLasso identified three significant pathways including the primary bile acid biosynthesis pathway, the neuroactive ligand–receptor interaction, and the MAPK signaling pathway. All effects identified in the primary bile acid biosynthesis pathway and many of the effects in the other two pathways were epistatic effects. The group EBLasso method provides a valuable tool for pathway‐based GWAS to identify main and epistatic effects of genetic variants.

4.
Detection of gene–gene interaction has become increasingly popular over the past decade in genome‐wide association studies (GWAS). Besides traditional logistic regression analysis for detecting interactions between two markers, new methods have been developed in recent years, such as comparing linkage disequilibrium (LD) in case and control groups. All these methods form the building blocks of most screening strategies for disease susceptibility loci in GWAS. In this paper, we are interested in comparing the competing methods and providing practical guidelines for selecting appropriate testing methods for interaction in GWAS. We first review a series of existing statistical methods to detect interactions, and then examine different definitions of interactions to gain insight into the theoretical relationship between the existing testing methods. Lastly, we perform extensive simulations to compare powers of various methods to detect either interaction between two markers at two unlinked loci or the overall association allowing for both interaction and main effects. This investigation reveals informative characteristics of various methods that are helpful to GWAS investigators.

5.
Meta‐analysis of genome‐wide association studies (GWAS) has achieved great success in detecting loci underlying human diseases. Incorporating GWAS results from diverse ethnic populations for meta‐analysis, however, remains challenging because of the possible heterogeneity across studies. Conventional fixed‐effects (FE) or random‐effects (RE) methods may not be most suitable to aggregate multiethnic GWAS results because of violation of the homogeneous effect assumption across studies (FE) or low power to detect signals (RE). Three recently proposed methods, modified RE (RE‐HE) model, binary‐effects (BE) model and a Bayesian approach (Meta‐analysis of Transethnic Association [MANTRA]), show increased power over FE and RE methods while incorporating heterogeneity of effects when meta‐analyzing trans‐ethnic GWAS results. We propose a two‐stage approach to account for heterogeneity in trans‐ethnic meta‐analysis in which we clustered studies with cohort‐specific ancestry information prior to meta‐analysis. We compare this to a no‐prior‐clustering (crude) approach, evaluating type I error and power of these two strategies, in an extensive simulation study to investigate whether the two‐stage approach offers any improvements over the crude approach. We find that the two‐stage approach and the crude approach for all five methods (FE, RE, RE‐HE, BE, MANTRA) provide well‐controlled type I error. However, the two‐stage approach shows increased power for BE and RE‐HE, and similar power for MANTRA and FE compared to their corresponding crude approach, especially when there is heterogeneity across the multiethnic GWAS results. These results suggest that prior clustering in the two‐stage approach can be an effective and efficient intermediate step in meta‐analysis to account for the multiethnic heterogeneity.
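To make the contrast between the conventional FE and RE methods concrete, here is a minimal sketch of inverse‐variance fixed‐effects pooling and a random‐effects estimate. The abstract does not say which RE estimator was used; the DerSimonian–Laird moment estimator of the between‐study variance is assumed here, and all function names and input values are illustrative.

```python
import math

def fixed_effects(betas, ses):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)

def cochran_q(betas, ses):
    """Cochran's Q heterogeneity statistic around the FE estimate."""
    beta_fe, _ = fixed_effects(betas, ses)
    return sum((b - beta_fe) ** 2 / se ** 2 for b, se in zip(betas, ses))

def random_effects(betas, ses):
    """Random-effects pooling with the DerSimonian-Laird tau^2 (assumed here)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    k = len(betas)
    c = total - sum(w ** 2 for w in weights) / total
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (cochran_q(betas, ses) - (k - 1)) / c) if c > 0 else 0.0
    star = [1.0 / (se ** 2 + tau2) for se in ses]
    beta = sum(w * b for w, b in zip(star, betas)) / sum(star)
    return beta, math.sqrt(1.0 / sum(star)), tau2

# Two hypothetical cohorts with heterogeneous effects.
fe_beta, fe_se = fixed_effects([0.1, 0.3], [0.1, 0.1])
re_beta, re_se, re_tau2 = random_effects([0.1, 0.3], [0.1, 0.1])
```

In this toy input both methods give the same pooled effect, but the RE standard error is larger because the estimated between‐study variance is added to every study's variance, which is one way to see the loss of power under heterogeneity that the abstract attributes to RE.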

6.
Genome‐wide association studies (GWASs) for complex diseases often collect data on multiple correlated endo‐phenotypes. Multivariate analysis of these correlated phenotypes can improve the power to detect genetic variants. Multivariate analysis of variance (MANOVA) can perform such association analysis at a GWAS level, but the behavior of MANOVA under different trait models has not been carefully investigated. In this paper, we show that MANOVA is generally very powerful for detecting association but there are situations, such as when a genetic variant is associated with all the traits, where MANOVA may not have any detection power. In these situations, marginal model based methods, however, perform much better than multivariate methods. We investigate the behavior of MANOVA, both theoretically and using simulations, and derive the conditions where MANOVA loses power. Based on our findings, we propose a unified score‐based test statistic USAT that can perform better than MANOVA in such situations and nearly as well as MANOVA elsewhere. Our proposed test reports an approximate asymptotic P‐value for association and is computationally very efficient to implement at a GWAS level. We have studied through extensive simulations the performance of USAT, MANOVA, and other existing approaches and demonstrated the advantage of using the USAT approach to detect association between a genetic variant and multivariate phenotypes. We applied USAT to data from three correlated traits collected on 5,816 Caucasian individuals from the Atherosclerosis Risk in Communities (ARIC, The ARIC Investigators [1989]) Study and detected some interesting associations.

7.
Recently, large‐scale genome‐wide association study (GWAS) meta‐analyses have boosted the number of known signals for some traits into the tens and hundreds. Typically, however, variants are only analysed one‐at‐a‐time. This complicates the ability of fine‐mapping to identify a small set of SNPs for further functional follow‐up. We describe a new and scalable algorithm, joint analysis of marginal summary statistics (JAM), for the re‐analysis of published marginal summary statistics under joint multi‐SNP models. The correlation is accounted for according to estimates from a reference dataset, and models and SNPs that best explain the complete joint pattern of marginal effects are highlighted via an integrated Bayesian penalized regression framework. We provide both enumerated and Reversible Jump MCMC implementations of JAM and present some comparisons of performance. In a series of realistic simulation studies, JAM demonstrated identical performance to various alternatives designed for single region settings. In multi‐region settings, where the only multivariate alternative involves stepwise selection, JAM offered greater power and specificity. We also present an application to real published results from MAGIC (meta‐analysis of glucose and insulin related traits consortium), a GWAS meta‐analysis of more than 15,000 people. We re‐analysed several genomic regions that produced multiple significant signals with glucose levels 2 hr after oral stimulation. Through joint multivariate modelling, JAM was able to formally rule out many SNPs, and for one gene, ADCY5, suggests that an additional SNP, which transpired to be more biologically plausible, should be followed up with equal priority to the reported index.

8.
Bilirubin is an effective antioxidant and is influenced by both genetic and environmental factors. Recent genome‐wide association studies (GWAS) have identified multiple loci affecting serum total bilirubin levels. However, most of the studies were conducted in European populations and little attention has been devoted either to genetic variants associated with direct and indirect bilirubin levels or to the gene‐environment interactions on bilirubin levels. In this study, a two‐stage GWAS was performed to identify genetic variants associated with all types of bilirubin levels in 10,282 Han Chinese individuals. Gene‐environment interactions were further examined. Briefly, two previously reported loci, UGT1A1 on 2q37 (rs6742078 and rs4148323, combined P = 1.44 × 10^-89 and P = 5.05 × 10^-69, respectively) and SLCO1B3 on 12p12 (rs2417940, combined P = 6.93 × 10^-19) were successfully replicated. The two loci explained 9.2% and 0.9% of the total variations of total bilirubin levels, respectively. Ethnic genetic differences were observed between Chinese and European populations. More importantly, a significant interaction was found between rs2417940 in SLCO1B3 gene and smoking on total bilirubin levels (P = 1.99 × 10^-3). Single nucleotide polymorphism (SNP) rs2417940 had stronger effects on total bilirubin levels in nonsmokers than in smokers, suggesting that the effects of SLCO1B3 genotype on bilirubin levels were partly dependent on smoking status. Consistent associations and interactions were observed for serum direct and indirect bilirubin levels.

9.
For analyzing complex trait association with sequencing data, most current studies test aggregated effects of variants in a gene or genomic region. Although gene‐based tests have insufficient power even for moderately sized samples, pathway‐based analyses combine information across multiple genes in biological pathways and may offer additional insight. However, most existing pathway association methods were originally designed for genome‐wide association studies, and have not been comprehensively evaluated for sequencing data. Moreover, region‐based rare variant association methods, although potentially applicable to pathway‐based analysis by extending their region definition to gene sets, have never been rigorously tested. In the context of exome‐based studies, we use simulated and real datasets to evaluate pathway‐based association tests. Our simulation strategy adopts a genome‐wide genetic model that distributes total genetic effects hierarchically into pathways, genes, and individual variants, allowing the evaluation of pathway‐based methods with realistic quantifiable assumptions on the underlying genetic architectures. The results show that, although no single pathway‐based association method offers superior performance in all simulated scenarios, a modification of the Gene Set Enrichment Analysis approach using statistics from single‐marker tests without gene‐level collapsing (weighted Kolmogorov‐Smirnov [WKS]‐Variant method) is consistently powerful. Interestingly, directly applying rare variant association tests (e.g., sequence kernel association test) to pathway analysis offers similar power, but its results are sensitive to assumptions of genetic architecture. We applied pathway association analysis to an exome‐sequencing dataset for chronic obstructive pulmonary disease, and found that the WKS‐Variant method confirms associated genes previously published.

10.
Genomewide association studies (GWAS) sometimes identify loci at which both the number and identities of the underlying causal variants are ambiguous. In such cases, statistical methods that model effects of multiple single‐nucleotide polymorphisms (SNPs) simultaneously can help disentangle the observed patterns of association and provide information about how those SNPs could be prioritized for follow‐up studies. Current multi‐SNP methods, however, tend to assume that SNP effects are well captured by additive genetics; yet when genetic dominance is present, this assumption translates to reduced power and faulty prioritizations. We describe a statistical procedure for prioritizing SNPs at GWAS loci that efficiently models both additive and dominance effects. Our method, LLARRMA‐dawg, combines a group LASSO procedure for sparse modeling of multiple SNP effects with a resampling procedure based on fractional observation weights. It estimates for each SNP the robustness of association with the phenotype both to sampling variation and to competing explanations from other SNPs. In producing an SNP prioritization that best identifies underlying true signals, we show the following: our method easily outperforms a single‐marker analysis; when additive‐only signals are present, our joint model for additive and dominance is equivalent to or only slightly less powerful than modeling additive‐only effects; and when dominance signals are present, even in combination with substantial additive effects, our joint model is unequivocally more powerful than a model assuming additivity. We also describe how performance can be improved through calibrated randomized penalization, and discuss how dominance in ungenotyped SNPs can be incorporated through either heterozygote dosage or multiple imputation.

11.
The analysis of gene‐environment (G × E) interactions remains one of the greatest challenges in the post‐genome‐wide association study (GWAS) era. Recent methods constitute a compromise between the robust but underpowered case‐control and powerful case‐only methods. Inferences of the latter are biased when the assumption of gene‐environment (G‐E) independence in controls fails. We propose a novel empirical hierarchical Bayes approach to G × E interaction (EHB‐GE), which benefits from greater rank power while accounting for population‐based G‐E correlation. Building on Lewinger et al.'s ([2007] Genet Epidemiol 31:871–882) hierarchical Bayes prioritization approach, the method first obtains posterior G‐E correlation estimates in controls for each marker, borrowing strength from G‐E information across the genome. These posterior estimates are then subtracted from the corresponding case‐only G × E estimates. We compared EHB‐GE with rival methods using simulation. EHB‐GE has similar or greater rank power to detect G × E interactions in the presence of large numbers of G‐E correlations with weak to strong effects or only a low number of such correlations with large effect. When there are no or only a few weak G‐E correlations, Murcray et al.'s method ([2009] Am J Epidemiol 169:219–226) identifies markers with low G × E interaction effects better. We applied EHB‐GE and competing methods to four lung cancer case‐control GWAS from the Interdisciplinary Research in Cancer of the Lung/International Lung Cancer Consortium with smoking as environmental factor. A number of genes worth investigating were identified by the EHB‐GE approach.

12.
Polygenic risk scores (PRSs) are a method to summarize the additive trait variance captured by a set of SNPs, and can increase the power of set‐based analyses by leveraging public genome‐wide association study (GWAS) datasets. PRS aims to assess the genetic liability to some phenotype on the basis of polygenic risk for the same or a different phenotype estimated from independent data. We propose the application of PRSs as a set‐based method with an additional component of adjustment for linkage disequilibrium (LD), with potential extension of the PRS approach to analyze biologically meaningful SNP sets. We call this method POLARIS: POlygenic Ld‐Adjusted RIsk Score. POLARIS identifies the LD structure of SNPs using spectral decomposition of the SNP correlation matrix and replaces the individuals' SNP allele counts with LD‐adjusted dosages. Using a raw genotype dataset together with SNP effect sizes from a second independent dataset, POLARIS can be used for set‐based analysis. MAGMA is an alternative set‐based approach employing principal component analysis to account for LD between markers in a raw genotype dataset. We used simulations, both with simple constructed and real LD‐structure, to compare the power of these methods. POLARIS shows more power than MAGMA applied to the raw genotype dataset only, but less or comparable power to combined analysis of both datasets. POLARIS has the advantages that it produces a risk score per person per set using all available SNPs, and aims to increase power by leveraging the effect sizes from the discovery set in a self‐contained test of association in the test dataset.
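For orientation, a plain (unadjusted) polygenic risk score is a weighted sum of each person's allele counts, with the weights taken from effect sizes estimated in an independent dataset. The sketch below shows only that basic sum; the LD adjustment that distinguishes POLARIS is not implemented, and the function name and toy values are illustrative assumptions.

```python
def polygenic_risk_scores(allele_counts, effect_sizes):
    """One score per person: sum over SNPs of allele count x effect size.

    allele_counts: list of rows, one per person, each with one count per SNP.
    effect_sizes: per-SNP weights from an independent discovery dataset.
    """
    scores = []
    for row in allele_counts:
        if len(row) != len(effect_sizes):
            raise ValueError("each person needs one allele count per SNP")
        scores.append(sum(g * b for g, b in zip(row, effect_sizes)))
    return scores

# Two hypothetical people, three SNPs in the set.
toy_counts = [[0, 1, 2], [2, 0, 1]]
toy_effects = [0.5, -0.25, 0.125]
toy_prs = polygenic_risk_scores(toy_counts, toy_effects)
```

Restricting the columns to the SNPs of one gene or pathway gives a score per person per set, which is the quantity the abstract describes as an advantage of the set‐based formulation.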

13.
A number of investigators have proposed regression methods for testing linkage between a phenotypic trait and a genetic marker with sib‐pair observations. Xu et al. [Am J Hum Genet 67:1025–8, 2000] studied a unified method for testing linkage, which tends to be more powerful than existing procedures. Often there are multiple traits, which are linked to a common set of genetic markers. In this paper, we present a simple generalization of the unified test to combine information from multiple traits optimally. We use the simulated Genetic Analysis Workshop 12 data to illustrate this methodology and show the advantage of using the combined tests over the single‐trait tests. For the four quantitative traits (Q1,...,Q4) studied, our linkage results suggest that major loci affecting Q1 and Q2 localize at or near markers D02G172, D19G032, and D09G122, while loci affecting Q3 and Q4 localize at or near markers D09G122 and D17G051. © 2001 Wiley‐Liss, Inc.

14.
Although a standard genome‐wide significance level has been accepted for the testing of association between common genetic variants and disease, the era of whole‐genome sequencing (WGS) requires a new threshold. The allele frequency spectrum of sequence‐identified variants is very different from common variants, and the identified rare genetic variation is usually jointly analyzed in a series of genomic windows or regions. In nearby or overlapping windows, these test statistics will be correlated, and the degree of correlation is likely to depend on the choice of window size, overlap, and the test statistic. Furthermore, multiple analyses may be performed using different windows or test statistics. Here we propose an empirical approach for estimating genome‐wide significance thresholds for data arising from WGS studies, and we demonstrate that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region. Because analysis of WGS may need to be repeated with different choices of test statistics or windows, this prediction approach makes it computationally feasible to estimate genome‐wide significance thresholds for different analysis choices. Based on UK10K whole‐genome sequence data, we derive genome‐wide significance thresholds ranging between 2.5 × 10^-8 and 8 × 10^-8 for our analytic choices in window‐based testing, and thresholds of 0.6 × 10^-8–1.5 × 10^-8 for a combined analytic strategy of testing common variants using single‐SNP tests together with rare variants analyzed with our sliding‐window test strategy.
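One common way to obtain an empirical significance threshold for correlated tests, which may or may not match the procedure in this abstract, is to record the smallest P‐value from each of many null replicates and take the α‐quantile of those minima. The sketch below assumes that approach; the function names and the toy minima are hypothetical, the extrapolation from a small region described by the authors is not shown, and the Bonferroni function is included only as the independent‐tests reference point.

```python
import math

def empirical_threshold(null_min_pvalues, alpha=0.05):
    """Alpha-quantile of the per-replicate minimum P-values under the null."""
    if not null_min_pvalues:
        raise ValueError("need at least one null replicate")
    ordered = sorted(null_min_pvalues)
    # Rank of the quantile; forced to at least 1 so a value is always returned.
    # With very few replicates this falls back to the smallest minimum.
    rank = max(1, math.floor(alpha * len(ordered)))
    return ordered[rank - 1]

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold if all n_tests were independent."""
    return alpha / n_tests

# Hypothetical minimum P-values from 8 null replicates.
toy_minima = [8e-8, 1e-8, 5e-8, 3e-8, 2e-8, 9e-8, 4e-8, 7e-8]
toy_threshold = empirical_threshold(toy_minima, alpha=0.25)
```

Because window‐based statistics in overlapping regions are correlated, an empirical threshold of this kind is typically less stringent than a Bonferroni correction over the raw number of windows.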

15.
Exhaustive testing of all possible SNP pairs in a genome‐wide association study (GWAS) generally yields low power to detect gene‐gene (G × G) interactions because of small effect sizes and stringent requirements for multiple‐testing correction. We introduce a new two‐step procedure for testing G × G interactions in case‐control GWAS to detect interacting single nucleotide polymorphisms (SNPs) regardless of their marginal effects. In an initial screening step, all SNP pairs are tested for gene‐gene association in the combined sample of cases and controls. In the second step, the pairs that pass the screening are followed up with a traditional test for G × G interaction. We show that the two‐step method is substantially more powerful to detect G × G interactions than the exhaustive testing approach. For example, with 2,000 cases and 2,000 controls, the two‐step method can have more than 90% power to detect an interaction odds ratio of 2.0 compared to less than 50% power for the exhaustive testing approach. Moreover, we show that a hybrid two‐step approach that combines our newly proposed two‐step test and the two‐step test that screens for marginal effects retains the best power properties of both. The two‐step procedures we introduce have the potential to uncover genetic signals that have not been previously identified in an initial single‐SNP GWAS. We demonstrate the computational feasibility of the two‐step G × G procedure by performing a G × G scan in the asthma GWAS of the University of Southern California Children's Health Study.

16.
Genome‐wide association studies (GWAS) of complex traits have generated many association signals for single nucleotide polymorphisms (SNPs). To understand the underlying causal genetic variant(s), focused DNA resequencing of targeted genomic regions is commonly used, yet the current cost of resequencing limits sample sizes for resequencing studies. Information from the large GWAS can be used to guide choice of samples for resequencing, such as the SNP genotypes in the targeted genomic region. Viewing the GWAS tag‐SNPs as imperfect surrogates for the underlying causal variants, yet expecting that the tag‐SNPs are correlated with the causal variants, a reasonable approach is a two‐phase case‐control design, with the GWAS serving as the first‐phase and the resequencing study serving as the second‐phase. Using stratified sampling based on both tag‐SNP genotypes and case‐control status, we explore the gains in power of a two‐phase design relative to randomly sampling cases and controls for resequencing (i.e., ignoring tag‐SNP genotypes). Simulation results show that stratified sampling based on both tag‐SNP genotypes and case‐control status is not likely to have lower power than stratified sampling based only on case‐control status, and can sometimes have substantially greater power. The gain in power depends on the amount of linkage disequilibrium between the tag‐SNP and causal variant alleles, as well as the effect size of the causal variant. Hence, the two‐phase design provides an efficient approach to follow‐up GWAS signals with DNA resequencing.

17.
Family‐based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single marker based, testing one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEEs) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P‐values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP‐SNP interactions. Simulation studies show that the proposed method properly controls for type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single marker based minimum P‐value GEE test for an SNP‐set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.

18.
Genome‐wide association studies (GWAS) have been a standard practice in identifying single nucleotide polymorphisms (SNPs) for disease susceptibility. We propose a new approach, termed integrative GWAS (iGWAS) that exploits the information of gene expressions to investigate the mechanisms of the association of SNPs with a disease phenotype, and to incorporate the family‐based design for genetic association studies. Specifically, the relations among SNPs, gene expression, and disease are modeled within the mediation analysis framework, which allows us to disentangle the genetic effect on a disease phenotype into two parts: an effect mediated through a gene expression (mediation effect, ME) and an effect through other biological mechanisms or environment‐mediated mechanisms (alternative effect, AE). We develop omnibus tests for the ME and AE that are robust to underlying true disease models. Numerical studies show that the iGWAS approach is able to facilitate discovering genetic association mechanisms, and outperforms the SNP‐only method for testing genetic associations. We conduct a family‐based iGWAS of childhood asthma that integrates genetic and genomic data. The iGWAS approach identifies six novel susceptibility genes (MANEA, MRPL53, LYCAT, ST8SIA4, NDFIP1, and PTCH1) using the omnibus test with false discovery rate less than 1%, whereas no gene using SNP‐only analyses survives with the same cut‐off. The iGWAS analyses further characterize that genetic effects of these genes are mostly mediated through their gene expressions. In summary, the iGWAS approach provides a new analytic framework to investigate the mechanism of genetic etiology, and identifies novel susceptibility genes of childhood asthma that were biologically meaningful.

19.
Genome‐wide association studies (GWAS) that draw samples from multiple studies with a mixture of relationship structures are becoming more common. Analytical methods exist for using mixed‐sample data, but few methods have been proposed for the analysis of genotype‐by‐environment (G×E) interactions. Using GWAS data from a study of sarcoidosis susceptibility genes in related and unrelated African Americans, we explored the current analytic options for genotype association testing in studies using both unrelated and family‐based designs. We propose a novel method, generalized least squares (GLX), to estimate both SNP and G×E interaction effects for categorical environmental covariates and compared this method to generalized estimating equations (GEE), logistic regression, the Cochran–Armitage trend test, and the WQLS and MQLS methods. We used simulation to demonstrate that the GLX method reduces type I error under a variety of pedigree structures. We also demonstrate its superior power to detect SNP effects while offering computational advantages and comparable power to detect G×E interactions versus GEE. Using this method, we found two novel SNPs that demonstrate a significant genome‐wide interaction with insecticide exposure: rs10499003 and rs7745248, located in the intronic and 3' UTR regions of the FUT9 gene on chromosome 6q16.1.

20.
Next‐generation sequencing (NGS) has led to the study of rare genetic variants, which possibly explain the missing heritability for complex diseases. Most existing methods for rare variant (RV) association detection do not account for the common presence of sequencing errors in NGS data. The errors can largely affect the power and perturb the accuracy of association tests due to rare observations of minor alleles. We developed a hierarchical Bayesian approach to estimate the association between RVs and complex diseases. Our integrated framework combines the misclassification probability with shrinkage‐based Bayesian variable selection. It allows for flexibility in handling neutral and protective RVs with measurement error, and is robust enough for detecting causal RVs with a wide spectrum of minor allele frequency (MAF). Imputation uncertainty and MAF are incorporated into the integrated framework to achieve the optimal statistical power. We demonstrate that sequencing error does significantly affect the findings, and our proposed model can take advantage of it to improve statistical power in both simulated and real data. We further show that our model outperforms existing methods, such as sequence kernel association test (SKAT). Finally, we illustrate the behavior of the proposed method using a Finnish low‐density lipoprotein cholesterol study, and show that it identifies an RV known as FH North Karelia in LDLR gene with three carriers in 1,155 individuals, which is missed by both SKAT and Granvil.
