首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Most regression‐based tests of the association between a low‐count variant and a binary outcome do not protect type 1 error, especially when tests are rejected based on a very low significance threshold. Noted exception is the Firth test. However, it was recently shown that in meta‐analyzing multiple studies all asymptotic, regression‐based tests, including the Firth, may not control type 1 error in some settings, and the Firth test may suffer a substantial loss of power. The problem is exacerbated when the case‐control proportions differ between studies. I propose the BinomiRare exact test that circumvents the calibration problems of regression‐based estimators. It quantifies the strength of association between the variant and the disease outcome based on the departure of the number of diseased individuals carrying the variant from the expected distribution of disease probability, under the null hypothesis of no association between the disease outcome and the rare variant. I provide a meta‐analytic strategy to combine tests across multiple cohorts, which requires that each cohort provides the disease probabilities of all carriers of the variant in question, and the number of diseased individuals among the carriers. I show that BinomiRare controls type 1 error in meta‐analysis even when the case‐control proportions differ between the studies, and does not lose power compared to pooled analysis. I demonstrate the test in studying the association of rare variants with asthma in the Hispanic Community Health Study/Study of Latinos.  相似文献   

2.
Next generation sequencing technology has enabled the paradigm shift in genetic association studies from the common disease/common variant to common disease/rare‐variant hypothesis. Analyzing individual rare variants is known to be underpowered; therefore association methods have been developed that aggregate variants across a genetic region, which for exome sequencing is usually a gene. The foreseeable widespread use of whole genome sequencing poses new challenges in statistical analysis. It calls for new rare‐variant association methods that are statistically powerful, robust against high levels of noise due to inclusion of noncausal variants, and yet computationally efficient. We propose a simple and powerful statistic that combines the disease‐associated P‐values of individual variants using a weight that is the inverse of the expected standard deviation of the allele frequencies under the null. This approach, dubbed as Sigma‐P method, is extremely robust to the inclusion of a high proportion of noncausal variants and is also powerful when both detrimental and protective variants are present within a genetic region. The performance of the Sigma‐P method was tested using simulated data based on realistic population demographic and disease models and its power was compared to several previously published methods. The results demonstrate that this method generally outperforms other rare‐variant association methods over a wide range of models. Additionally, sequence data on the ANGPTL family of genes from the Dallas Heart Study were tested for associations with nine metabolic traits and both known and novel putative associations were uncovered using the Sigma‐P method.  相似文献   

3.
The transmission disequilibrium test (TDT) is the gold standard for testing the association between a genetic variant and disease in samples consisting of affected individuals and their parents. In practice, more complex pedigree structures, that is siblings with no parents, or three-generational pedigrees with possibly missing genotypes, are common. There are several generalizations of the TDT that are suitable for use with arbitrary pedigree structures. We consider three such frequently used generalizations, family-based association test, pedigree disequilibrium test, and generalized disequilibrium test, that have accompanying software and compare them regarding validity and power in the single variant setting. We use simulations to study the effects of population admixture, populations whose genotypes are not in Hardy–Weinberg equilibrium (HWE), different pedigree structures, and the presence of linkage. Whereas our results show that some TDT generalizations can have a substantially increased Type 1 error, these tests are often used in substantive research without caveats about the validity of their Type 1 error. For the association analysis of rare variants in sequencing studies, region-based extensions of the TDT generalizations, that rely on the postulated robustness of the single variant tests, have been proposed. We discuss the implications of our results for these region-based extensions.  相似文献   

4.
There is an emerging interest in sequencing‐based association studies of multiple rare variants. Most association tests suggested in the literature involve collapsing rare variants with or without weighting. Recently, a variance‐component score test [sequence kernel association test (SKAT)] was proposed to address the limitations of collapsing method. Although SKAT was shown to outperform most of the alternative tests, its applications and power might be restricted and influenced by missing genotypes. In this paper, we suggest a new method based on testing whether the fraction of causal variants in a region is zero. The new association test, T REM, is derived from a random‐effects model and allows for missing genotypes, and the choice of weighting function is not required when common and rare variants are analyzed simultaneously. We performed simulations to study the type I error rates and power of four competing tests under various conditions on the sample size, genotype missing rate, variant frequency, effect directionality, and the number of non‐causal rare variant and/or causal common variant. The simulation results showed that T REM was a valid test and less sensitive to the inclusion of non‐causal rare variants and/or low effect common variants or to the presence of missing genotypes. When the effects were more consistent in the same direction, T REM also had better power performance. Finally, an application to the Shanghai Breast Cancer Study showed that rare causal variants at the FGFR2 gene were detected by T REM and SKAT, but T REM produced more consistent results for different sets of rare and common variants. Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

5.
Recent advancements in next‐generation DNA sequencing technologies have made it plausible to study the association of rare variants with complex diseases. Due to the low frequency, rare variants need to be aggregated in association tests to achieve adequate power with reasonable sample sizes. Hierarchical modeling/kernel machine methods have gained popularity among many available methods for testing a set of rare variants collectively. Here, we propose a new score statistic based on a hierarchical model by additionally modeling the distribution of rare variants under the case‐control study design. Results from extensive simulation studies show that the proposed method strikes a balance between robustness and power and outperforms several popular rare‐variant association tests. We demonstrate the performance of our method using the Dallas Heart Study.  相似文献   

6.
Recent progress in sequencing technologies makes it possible to identify rare and unique variants that may be associated with complex traits. However, the results of such efforts depend crucially on the use of efficient statistical methods and study designs. Although family‐based designs might enrich a data set for familial rare disease variants, most existing rare variant association approaches assume independence of all individuals. We introduce here a framework for association testing of rare variants in family‐based designs. This framework is an adaptation of the sequence kernel association test (SKAT) which allows us to control for family structure. Our adjusted SKAT (ASKAT) combines the SKAT approach and the factored spectrally transformed linear mixed models (FaST‐LMMs) algorithm to capture family effects based on a LMM incorporating the realized proportion of the genome that is identical by descent between pairs of individuals, and using restricted maximum likelihood methods for estimation. In simulation studies, we evaluated type I error and power of this proposed method and we showed that regardless of the level of the trait heritability, our approach has good control of type I error and good power. Since our approach uses FaST‐LMM to calculate variance components for the proposed mixed model, ASKAT is reasonably fast and can analyze hundreds of thousands of markers. Data from the UK twins consortium are presented to illustrate the ASKAT methodology.  相似文献   

7.
Although genome‐wide association studies (GWAS) have now discovered thousands of genetic variants associated with common traits, such variants cannot explain the large degree of “missing heritability,” likely due to rare variants. The advent of next generation sequencing technology has allowed rare variant detection and association with common traits, often by investigating specific genomic regions for rare variant effects on a trait. Although multiple correlated phenotypes are often concurrently observed in GWAS, most studies analyze only single phenotypes, which may lessen statistical power. To increase power, multivariate analyses, which consider correlations between multiple phenotypes, can be used. However, few existing multivariant analyses can identify rare variants for assessing multiple phenotypes. Here, we propose Multivariate Association Analysis using Score Statistics (MAAUSS), to identify rare variants associated with multiple phenotypes, based on the widely used sequence kernel association test (SKAT) for a single phenotype. We applied MAAUSS to whole exome sequencing (WES) data from a Korean population of 1,058 subjects to discover genes associated with multiple traits of liver function. We then assessed validation of those genes by a replication study, using an independent dataset of 3,445 individuals. Notably, we detected the gene ZNF620 among five significant genes. We then performed a simulation study to compare MAAUSS's performance with existing methods. Overall, MAAUSS successfully conserved type 1 error rates and in many cases had a higher power than the existing methods. This study illustrates a feasible and straightforward approach for identifying rare variants correlated with multiple phenotypes, with likely relevance to missing heritability.  相似文献   

8.
Next‐generation sequencing technologies have afforded unprecedented characterization of low‐frequency and rare genetic variation. Due to low power for single‐variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel‐machine regression and adaptive testing methods for aggregative rare‐variant association testing have been demonstrated to be powerful approaches for pathway‐level analysis, although these methods tend to be computationally intensive at high‐variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare‐variant analysis using component gene‐level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family‐wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case‐control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open‐source R code for public use to facilitate easy application of our methods to existing rare‐variant analysis results.  相似文献   

9.
Genome‐wide association studies have identified hundreds of genetic variants associated with complex diseases although most variants identified so far explain only a small proportion of heritability, suggesting that rare variants are responsible for missing heritability. Identification of rare variants through large‐scale resequencing becomes increasing important but still prohibitively expensive despite the rapid decline in the sequencing costs. Nevertheless, group testing based overlapping pool sequencing in which pooled rather than individual samples are sequenced will greatly reduces the efforts of sample preparation as well as the costs to screen for rare variants. Here, we proposed an overlapping pool sequencing to screen rare variants with optimal sequencing depth and a corresponding cost model. We formulated a model to compute the optimal depth for sufficient observations of variants in pooled sequencing. Utilizing shifted transversal design algorithm, appropriate parameters for overlapping pool sequencing could be selected to minimize cost and guarantee accuracy. Due to the mixing constraint and high depth for pooled sequencing, results showed that it was more cost‐effective to divide a large population into smaller blocks which were tested using optimized strategies independently. Finally, we conducted an experiment to screen variant carriers with frequency equaled 1%. With simulated pools and publicly available human exome sequencing data, the experiment achieved 99.93% accuracy. Utilizing overlapping pool sequencing, the cost for screening variant carriers with frequency equaled 1% in 200 diploid individuals dropped to at least 66% at which target sequencing region was set to 30 Mb.  相似文献   

10.
The recent development of high‐throughput sequencing technologies calls for powerful statistical tests to detect rare genetic variants associated with complex human traits. Sampling related individuals in sequencing studies offers advantages over sampling unrelated individuals only, including improved protection against sequencing error, the ability to use imputation to make more efficient use of sequence data, and the possibility of power boost due to more observed copies of extremely rare alleles among relatives. With related individuals, familial correlation needs to be accounted for to ensure correct control over type I error and to improve power. Recognizing the limitations of existing rare‐variant association tests for family data, we propose MONSTER (Minimum P‐value Optimized Nuisance parameter Score Test Extended to Relatives), a robust rare‐variant association test, which generalizes the SKAT‐O method for independent samples. MONSTER uses a mixed effects model that accounts for covariates and additive polygenic effects. To obtain a powerful test, MONSTER adaptively adjusts to the unknown configuration of effects of rare‐variant sites. MONSTER also offers an analytical way of assessing P‐values, which is desirable because permutation is not straightforward to conduct in related samples. In simulation studies, we demonstrate that MONSTER effectively accounts for family structure, is computationally efficient and compares very favorably, in terms of power, to previously proposed tests that allow related individuals. We apply MONSTER to an analysis of high‐density lipoprotein cholesterol in the Framingham Heart Study, where we are able to replicate association with three genes.  相似文献   

11.
Although genome‐wide association studies (GWAS) have identified thousands of trait‐associated genetic variants, there are relatively few findings on the X chromosome. For analysis of low‐frequency variants (minor allele frequency <5%), investigators can use region‐ or gene‐based tests where multiple variants are analyzed jointly to increase power. To date, there are no gene‐based tests designed for association testing of low‐frequency variants on the X chromosome. Here we propose three gene‐based tests for the X chromosome: burden, sequence kernel association test (SKAT), and optimal unified SKAT (SKAT‐O). Using simulated case‐control and quantitative trait (QT) data, we evaluate the calibration and power of these tests as a function of (1) male:female sample size ratio; and (2) coding of haploid male genotypes for variants under X‐inactivation. For case‐control studies, all three tests are reasonably well‐calibrated for all scenarios we evaluated. As expected, power for gene‐based tests depends on the underlying genetic architecture of the genomic region analyzed. Studies with more (haploid) males are generally less powerful due to decreased number of chromosomes. Power generally is slightly greater when the coding scheme for male genotypes matches the true underlying model, but the power loss for misspecifying the (generally unknown) model is small. For QT studies, type I error and power results largely mirror those for binary traits. We demonstrate the use of these three gene‐based tests for X‐chromosome association analysis in simulated data and sequencing data from the Genetics of Type 2 Diabetes (GoT2D) study.  相似文献   

12.
While it is well established that genetics can be a major contributor to population variation of complex traits, the relative contributions of rare and common variants to phenotypic variation remains a matter of considerable debate. Here, we simulate genetic and phenotypic data across different case/control panel sampling strategies, sequencing methods, and genetic architecture models based on evolutionary forces to determine the statistical performance of rare variant association tests (RVATs) widely in use. We find that the highest statistical power of RVATs is achieved by sampling case/control individuals from the extremes of an underlying quantitative trait distribution. We also demonstrate that the use of genotyping arrays, in conjunction with imputation from a whole-genome sequenced (WGS) reference panel, recovers the vast majority (90%) of the power that could be achieved by sequencing the case/control panel using current tools. Finally, we show that for dichotomous traits, the statistical performance of RVATs decreases as rare variants become more important in the trait architecture. Our results extend previous work to show that RVATs are insufficiently powered to make generalizable conclusions about the role of rare variants in dichotomous complex traits.  相似文献   

13.
Genome‐wide association (GWA) studies have proved to be extremely successful in identifying novel common polymorphisms contributing effects to the genetic component underlying complex traits. Nevertheless, one source of, as yet, undiscovered genetic determinants of complex traits are those mediated through the effects of rare variants. With the increasing availability of large‐scale re‐sequencing data for rare variant discovery, we have developed a novel statistical method for the detection of complex trait associations with these loci, based on searching for accumulations of minor alleles within the same functional unit. We have undertaken simulations to evaluate strategies for the identification of rare variant associations in population‐based genetic studies when data are available from re‐sequencing discovery efforts or from commercially available GWA chips. Our results demonstrate that methods based on accumulations of rare variants discovered through re‐sequencing offer substantially greater power than conventional analysis of GWA data, and thus provide an exciting opportunity for future discovery of genetic determinants of complex traits. Genet. Epidemiol. 34: 188–193, 2010. © 2009 Wiley‐Liss, Inc.  相似文献   

14.
Over the past few years, an increasing number of studies have identified rare variants that contribute to trait heritability. Due to the extreme rarity of some individual variants, gene‐based association tests have been proposed to aggregate the genetic variants within a gene, pathway, or specific genomic region as opposed to a one‐at‐a‐time single variant analysis. In addition, in longitudinal studies, statistical power to detect disease susceptibility rare variants can be improved through jointly testing repeatedly measured outcomes, which better describes the temporal development of the trait of interest. However, usual sandwich/model‐based inference for sequencing studies with longitudinal outcomes and rare variants can produce deflated/inflated type I error rate without further corrections. In this paper, we develop a group of tests for rare‐variant association based on outcomes with repeated measures. We propose new perturbation methods such that the type I error rate of the new tests is not only robust to misspecification of within‐subject correlation, but also significantly improved for variants with extreme rarity in a study with small or moderate sample size. Through extensive simulation studies, we illustrate that substantially higher power can be achieved by utilizing longitudinal outcomes and our proposed finite sample adjustment. We illustrate our methods using data from the Multi‐Ethnic Study of Atherosclerosis for exploring association of repeated measures of blood pressure with rare and common variants based on exome sequencing data on 6,361 individuals.  相似文献   

15.
In genome‐wide association studies of binary traits, investigators typically use logistic regression to test common variants for disease association within studies, and combine association results across studies using meta‐analysis. For common variants, logistic regression tests are well calibrated, and meta‐analysis of study‐specific association results is only slightly less powerful than joint analysis of the combined individual‐level data. In recent sequencing and dense chip based association studies, investigators increasingly test low‐frequency variants for disease association. In this paper, we seek to (1) identify the association test with maximal power among tests with well controlled type I error rate and (2) compare the relative power of joint and meta‐analysis tests. We use analytic calculation and simulation to compare the empirical type I error rate and power of four logistic regression based tests: Wald, score, likelihood ratio, and Firth bias‐corrected. We demonstrate for low‐count variants (roughly minor allele count [MAC] < 400) that: (1) for joint analysis, the Firth test has the best combination of type I error and power; (2) for meta‐analysis of balanced studies (equal numbers of cases and controls), the score test is best, but is less powerful than Firth test based joint analysis; and (3) for meta‐analysis of sufficiently unbalanced studies, all four tests can be anti‐conservative, particularly the score test. We also establish MAC as the key parameter determining test calibration for joint and meta‐analysis.  相似文献   

16.
A large number of rare genetic variants have been discovered with the development in sequencing technology and the lowering of sequencing costs. Rare variant analysis may help identify novel genes associated with diseases and quantitative traits, adding to our knowledge of explaining heritability of these phenotypes. Many statistical methods for rare variant analysis have been developed in recent years, but some of them require the strong assumption that all rare variants in the analysis share the same direction of effect, and others requiring permutation to calculate the P‐values are computer intensive. Among these methods, the sequence kernel association test (SKAT) is a powerful method under many different scenarios. It does not require any assumption on the directionality of effects, and statistical significance is computed analytically. In this paper, we extend SKAT to be applicable to family data. The family‐based SKAT (famSKAT) has a different test statistic and null distribution compared to SKAT, but is equivalent to SKAT when there is no familial correlation. Our simulation studies show that SKAT has inflated type I error if familial correlation is inappropriately ignored, but has appropriate type I error if applied to a single individual per family to obtain an unrelated subset. In contrast, famSKAT has the correct type I error when analyzing correlated observations, and it has higher power than competing methods in many different scenarios. We illustrate our approach to analyze the association of rare genetic variants using glycemic traits from the Framingham Heart Study.  相似文献   

17.
For analyzing complex trait association with sequencing data, most current studies test aggregated effects of variants in a gene or genomic region. Although gene‐based tests have insufficient power even for moderately sized samples, pathway‐based analyses combine information across multiple genes in biological pathways and may offer additional insight. However, most existing pathway association methods are originally designed for genome‐wide association studies, and are not comprehensively evaluated for sequencing data. Moreover, region‐based rare variant association methods, although potentially applicable to pathway‐based analysis by extending their region definition to gene sets, have never been rigorously tested. In the context of exome‐based studies, we use simulated and real datasets to evaluate pathway‐based association tests. Our simulation strategy adopts a genome‐wide genetic model that distributes total genetic effects hierarchically into pathways, genes, and individual variants, allowing the evaluation of pathway‐based methods with realistic quantifiable assumptions on the underlying genetic architectures. The results show that, although no single pathway‐based association method offers superior performance in all simulated scenarios, a modification of Gene Set Enrichment Analysis approach using statistics from single‐marker tests without gene‐level collapsing (weighted Kolmogrov‐Smirnov [WKS]‐Variant method) is consistently powerful. Interestingly, directly applying rare variant association tests (e.g., sequence kernel association test) to pathway analysis offers a similar power, but its results are sensitive to assumptions of genetic architecture. We applied pathway association analysis to an exome‐sequencing data of the chronic obstructive pulmonary disease, and found that the WKS‐Variant method confirms associated genes previously published.  相似文献   

18.
A combination of common and rare variants is thought to contribute to genetic susceptibility to complex diseases. Recently, next‐generation sequencers have greatly lowered sequencing costs, providing an opportunity to identify rare disease variants in large genetic epidemiology studies. At present, it is still expensive and time consuming to resequence large number of individual genomes. However, given that next‐generation sequencing technology can provide accurate estimates of allele frequencies from pooled DNA samples, it is possible to detect associations of rare variants using pooled DNA sequencing. Current statistical approaches to the analysis of associations with rare variants are not designed for use with pooled next‐generation sequencing data. Hence, they may not be optimal in terms of both validity and power. Therefore, we propose here a new statistical procedure to analyze the output of pooled sequencing data. The test statistic can be computed rapidly, making it feasible to test the association of a large number of variants with disease. By simulation, we compare this approach to Fisher's exact test based either on pooled or individual genotypic data. Our results demonstrate that the proposed method provides good control of the Type I error rate, while yielding substantially higher power than Fisher's exact test using pooled genotypic data for testing rare variants, and has similar or higher power than that of Fisher's exact test using individual genotypic data. Our results also provide guidelines on how various parameters of the pooled sequencing design affect the efficiency of detecting associations. Genet. Epidemiol. 34: 492–501, 2010. © 2010 Wiley‐Liss, Inc.  相似文献   

19.
Next‐generation DNA sequencing technologies are facilitating large‐scale association studies of rare genetic variants. The depth of the sequence read coverage is an important experimental variable in the next‐generation technologies and it is a major determinant of the quality of genotype calls generated from sequence data. When case and control samples are sequenced separately or in different proportions across batches, they are unlikely to be matched on sequencing read depth and a differential misclassification of genotypes can result, causing confounding and an increased false‐positive rate. Data from Pilot Study 3 of the 1000 Genomes project was used to demonstrate that a difference between the mean sequencing read depth of case and control samples can result in false‐positive association for rare and uncommon variants, even when the mean coverage depth exceeds 30× in both groups. The degree of the confounding and inflation in the false‐positive rate depended on the extent to which the mean depth was different in the case and control groups. A logistic regression model was used to test for association between case‐control status and the cumulative number of alleles in a collapsed set of rare and uncommon variants. Including each individual's mean sequence read depth across the variant sites in the logistic regression model nearly eliminated the confounding effect and the inflated false‐positive rate. Furthermore, accounting for the potential error by modeling the probability of the heterozygote genotype calls in the regression analysis had a relatively minor but beneficial effect on the statistical results. Genet. Epidemiol. 35: 261‐268, 2011. © 2011 Wiley‐Liss, Inc.  相似文献   

20.
Several methods have been proposed to increase power in rare variant association testing by aggregating information from individual rare variants (MAF < 0.005). However, how to best combine rare variants across multiple ethnicities and the relative performance of designs using different ethnic sampling fractions remains unknown. In this study, we compare the performance of several statistical approaches for assessing rare variant associations across multiple ethnicities. We also explore how different ethnic sampling fractions perform, including single‐ethnicity studies and studies that sample up to four ethnicities. We conducted simulations based on targeted sequencing data from 4,611 women in four ethnicities (African, European, Japanese American, and Latina). As with single‐ethnicity studies, burden tests had greater power when all causal rare variants were deleterious, and variance component‐based tests had greater power when some causal rare variants were deleterious and some were protective. Multiethnic studies had greater power than single‐ethnicity studies at many loci, with inclusion of African Americans providing the largest impact. On average, studies including African Americans had as much as 20% greater power than equivalently sized studies without African Americans. This suggests that association studies between rare variants and complex disease should consider including subjects from multiple ethnicities, with preference given to genetically diverse groups.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号