20 similar articles found.
1.
Recent sequencing efforts have focused on exploring the influence of rare variants on complex diseases. Gene‐level tests that aggregate information across rare variants within a gene have become attractive for enriching the rare variant association signal. Among them, the sequence kernel association test (SKAT) has proved to be a very powerful method for jointly testing multiple rare variants within a gene. In this article, we explore an alternative SKAT. We propose to use the univariate likelihood ratio statistics from the marginal model for individual variants as input into the kernel association test. We show how to compute its P‐value efficiently based on the asymptotic chi‐square mixture distribution. We demonstrate through extensive numerical studies that the proposed method has competitive performance. Its usefulness is further illustrated with application to associations between rare exonic variants and type 2 diabetes (T2D) in the Atherosclerosis Risk in Communities (ARIC) study. We identified an exome‐wide significant rare variant set in the gene ZZZ3 worthy of further investigation.
2.
Han Chen Thomas Lumley Jennifer Brody Nancy L. Heard‐Costa Caroline S. Fox L. Adrienne Cupples Josée Dupuis 《Genetic epidemiology》2014,38(3):191-197
Rare variant tests have been of great interest in testing genetic associations with diseases and disease‐related quantitative traits in recent years. Among these tests, the sequence kernel association test (SKAT) is an omnibus test for effects of rare genetic variants, in a linear or logistic regression framework. It is often described as a variance component test treating the genotypic effects as random. When the linear kernel is used, its test statistic can be expressed as a weighted sum of single‐marker score test statistics. In this paper, we extend the test to survival phenotypes in a Cox regression framework. Because of the anticonservative small‐sample performance of the score test in a Cox model, we substitute signed square‐root likelihood ratio statistics for the score statistics, and confirm that the small‐sample control of type I error is greatly improved. This test can also be applied in meta‐analysis. We show in our simulation studies that this test has superior statistical power except in a few specific scenarios, as compared to burden tests in a Cox model. We also present results in an application to time‐to‐obesity using genotypes from Framingham Heart Study SNP Health Association Resource.
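As a rough illustration of the weighted‐sum identity mentioned in the abstract above, the following Python sketch computes a linear‐kernel statistic in two ways and checks that they agree. It is not the authors' implementation: the function names, the toy data, the intercept‐only null model, and the convention that the kernel is G diag(w²) Gᵀ are all assumptions made for illustration, and the squared score statistics are used because the quadratic form reduces to them under that convention.

```python
def residuals(y):
    """Residuals from an intercept-only null model (an illustrative assumption)."""
    mean_y = sum(y) / len(y)
    return [yi - mean_y for yi in y]


def score_statistics(resid, G):
    """Single-marker scores S_j = sum_i G[i][j] * resid[i]."""
    n_variants = len(G[0])
    return [sum(row[j] * r for row, r in zip(G, resid)) for j in range(n_variants)]


def q_from_scores(scores, weights):
    """Weighted sum of squared single-marker scores: sum_j (w_j * S_j)^2."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


def q_from_kernel(resid, G, weights):
    """Quadratic form resid^T K resid with K = G diag(w^2) G^T."""
    n = len(resid)
    m = len(weights)
    total = 0.0
    for i in range(n):
        for k in range(n):
            # Kernel entry for individuals i and k.
            k_ik = sum(weights[j] ** 2 * G[i][j] * G[k][j] for j in range(m))
            total += resid[i] * k_ik * resid[k]
    return total


# Hypothetical toy data: 4 individuals, 2 variants.
y = [1.0, 0.0, 2.0, 1.5]
G = [[0, 1], [1, 0], [0, 0], [1, 1]]
weights = [1.0, 2.0]

resid = residuals(y)
scores = score_statistics(resid, G)
print(q_from_scores(scores, weights), q_from_kernel(resid, G, weights))
```

Both routes give the same value on the toy data, which is the sense in which the kernel statistic can be read as a weighted sum over single‐marker statistics. The Cox‐model extension described in the abstract replaces the score statistics with signed square‐root likelihood ratio statistics, which this sketch does not attempt.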
3.
A large number of rare genetic variants have been discovered with developments in sequencing technology and the lowering of sequencing costs. Rare variant analysis may help identify novel genes associated with diseases and quantitative traits, adding to our understanding of the heritability of these phenotypes. Many statistical methods for rare variant analysis have been developed in recent years, but some of them require the strong assumption that all rare variants in the analysis share the same direction of effect, and others, which require permutation to calculate P‐values, are computationally intensive. Among these methods, the sequence kernel association test (SKAT) is a powerful method under many different scenarios. It does not require any assumption on the directionality of effects, and statistical significance is computed analytically. In this paper, we extend SKAT to be applicable to family data. The family‐based SKAT (famSKAT) has a different test statistic and null distribution compared to SKAT, but is equivalent to SKAT when there is no familial correlation. Our simulation studies show that SKAT has inflated type I error if familial correlation is inappropriately ignored, but has appropriate type I error if applied to a single individual per family to obtain an unrelated subset. In contrast, famSKAT has the correct type I error when analyzing correlated observations, and it has higher power than competing methods in many different scenarios. We illustrate our approach by analyzing the association of rare genetic variants with glycemic traits from the Framingham Heart Study.
4.
Karim Oualkacha Zari Dastani Rui Li Pablo E. Cingolani Timothy D. Spector Christopher J. Hammond J. Brent Richards Antonio Ciampi Celia M. T. Greenwood 《Genetic epidemiology》2013,37(4):366-376
Recent progress in sequencing technologies makes it possible to identify rare and unique variants that may be associated with complex traits. However, the results of such efforts depend crucially on the use of efficient statistical methods and study designs. Although family‐based designs might enrich a data set for familial rare disease variants, most existing rare variant association approaches assume independence of all individuals. We introduce here a framework for association testing of rare variants in family‐based designs. This framework is an adaptation of the sequence kernel association test (SKAT) which allows us to control for family structure. Our adjusted SKAT (ASKAT) combines the SKAT approach and the factored spectrally transformed linear mixed models (FaST‐LMMs) algorithm to capture family effects based on an LMM incorporating the realized proportion of the genome that is identical by descent between pairs of individuals, and using restricted maximum likelihood methods for estimation. In simulation studies, we evaluated type I error and power of this proposed method and we showed that regardless of the level of the trait heritability, our approach has good control of type I error and good power. Since our approach uses FaST‐LMM to calculate variance components for the proposed mixed model, ASKAT is reasonably fast and can analyze hundreds of thousands of markers. Data from the UK twins consortium are presented to illustrate the ASKAT methodology.
5.
Wei Pan 《Genetic epidemiology》2015,39(8):651-663
We study the problem of testing for single marker‐multiple phenotype associations based on genome‐wide association study (GWAS) summary statistics without access to individual‐level genotype and phenotype data. Because summary data for most published GWASs are substantially easier to obtain than individual‐level phenotype and genotype data, and multiple correlated traits have often been collected, the problem studied here has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its applications to analyses of a meta‐analyzed GWAS dataset with three blood lipid traits and another with sex‐stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta‐analyzed) genome‐wide summary statistics, then extend the method to meta‐analysis of multiple sets of genome‐wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice as more powerful than or complementary to existing methods.
6.
Detecting the association between a set of variants and a phenotype of interest is an important first step in genetic and genomic studies. Although this problem has attracted a large amount of attention in the scientific community and several related statistical approaches have been proposed in the literature, powerful and robust statistical tests are still highly desired and yet to be developed in this area. In this paper, we propose a powerful and robust association test, which combines information from each individual single-nucleotide polymorphism based on sequential independent burden tests. We compare the proposed approach with some popular tests through a comprehensive simulation study and real data application. Our results show that, in general, the new test is more powerful; the gain in detection power can be substantial in many situations, compared to other methods.
7.
Daniel J. Schaid Shannon K. McDonnell Jason P. Sinnwell Stephen N. Thibodeau 《Genetic epidemiology》2013,37(5):409-418
Searching for rare genetic variants associated with complex diseases can be facilitated by enriching for diseased carriers of rare variants by sampling cases from pedigrees enriched for disease, possibly with related or unrelated controls. This strategy, however, complicates analyses because of shared genetic ancestry, as well as linkage disequilibrium among genetic markers. To overcome these problems, we developed broad classes of “burden” statistics and kernel statistics, extending commonly used methods for unrelated case‐control data to allow for known pedigree relationships, for autosomes and the X chromosome. Furthermore, by replacing pedigree‐based genetic correlation matrices with estimates of genetic relationships based on large‐scale genomic data, our methods can be used to account for population‐structured data. By simulations, we show that the type I error rates of our developed methods are near the asymptotic nominal levels, allowing rapid computation of P‐values. Our simulations also show that a linear weighted kernel statistic is generally more powerful than a weighted “burden” statistic. Because the proposed statistics are rapid to compute, they can be readily used for large‐scale screening of the association of genomic sequence data with disease status.
8.
Genome‐wide association studies (GWASs) for complex diseases often collect data on multiple correlated endo‐phenotypes. Multivariate analysis of these correlated phenotypes can improve the power to detect genetic variants. Multivariate analysis of variance (MANOVA) can perform such association analysis at a GWAS level, but the behavior of MANOVA under different trait models has not been carefully investigated. In this paper, we show that MANOVA is generally very powerful for detecting association but there are situations, such as when a genetic variant is associated with all the traits, where MANOVA may not have any detection power. In these situations, marginal model based methods, however, perform much better than multivariate methods. We investigate the behavior of MANOVA, both theoretically and using simulations, and derive the conditions where MANOVA loses power. Based on our findings, we propose a unified score‐based test statistic, USAT, that can perform better than MANOVA in such situations and nearly as well as MANOVA elsewhere. Our proposed test reports an approximate asymptotic P‐value for association and is computationally very efficient to implement at a GWAS level. We have studied through extensive simulations the performance of USAT, MANOVA, and other existing approaches and demonstrated the advantage of using the USAT approach to detect association between a genetic variant and multivariate phenotypes. We applied USAT to data from three correlated traits collected on 5,816 Caucasian individuals from the Atherosclerosis Risk in Communities (ARIC; The ARIC Investigators [1989]) Study and detected some interesting associations.
9.
Daniel C. Posner Honghuang Lin James B. Meigs Eric D. Kolaczyk Josée Dupuis 《Genetic epidemiology》2020,44(4):352-367
We propose a novel variant set test for rare-variant association studies, which leverages multiple single-nucleotide variant (SNV) annotations. Our approach optimizes a convex combination of different sequence kernel association test (SKAT) statistics, where each statistic is constructed from a different annotation and combination weights are optimized through a multiple kernel learning algorithm. The combination test statistic is evaluated empirically through data splitting. In simulations, we find our method preserves type I error and has greater power than SKAT(-O) when SNV weights are not misspecified and sample sizes are large. We utilize our method in the Framingham Heart Study (FHS) to identify SNV sets associated with fasting glucose. While we are unable to detect any genome-wide significant associations between fasting glucose and 4-kb windows of rare variants in 6,419 FHS participants, our method identifies suggestive associations between fasting glucose and rare variants near ROCK2 and within CPLX1. These two genes were previously reported to be involved in obesity-mediated insulin resistance and glucose-induced insulin secretion by pancreatic beta-cells, respectively. These findings will need to be replicated in other cohorts and validated by functional genomic studies.
10.
Next‐generation sequencing technologies make direct testing of rare variant associations possible. However, the development of powerful statistical methods for rare variant association studies is still underway. Most existing methods are burden and quadratic tests. Recent studies show that the performance of both burden and quadratic tests depends strongly upon the underlying assumptions and that no test demonstrates consistently acceptable power. Thus, combined tests that combine information from the burden and quadratic tests have been proposed recently. However, results from recent studies (including this study) show that there exist tests that can outperform both burden and quadratic tests. In this article, we propose three classes of tests that include tests outperforming both burden and quadratic tests. Then, we propose the optimal combination of single‐variant tests (OCST) by combining information from tests of the three classes. We use extensive simulation studies to compare the performance of OCST with that of burden, quadratic and optimal single‐variant tests. Our results show that OCST either is the most powerful test or has similar power to the most powerful test. We also compare the performance of OCST with that of the two existing combined tests. Our results show that OCST has better power than the two combined tests.
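To make the contrast between burden and quadratic tests concrete, here is a minimal Python sketch under simplifying assumptions. It is not the OCST method or any statistic defined in the paper: the two functions, the unit weights, and the made‐up single‐variant scores are hypothetical, chosen only to show why a burden‐type statistic is sensitive to the direction of variant effects while a quadratic‐type statistic is not.

```python
def burden_statistic(scores, weights):
    """Burden-type statistic: collapse the weighted scores first, then square."""
    collapsed = sum(w * s for w, s in zip(weights, scores))
    return collapsed ** 2


def quadratic_statistic(scores, weights):
    """Quadratic-type statistic: square each weighted score, then sum."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


# Hypothetical single-variant scores for a two-variant set.
unit_weights = [1.0, 1.0]
same_direction = [2.0, 2.0]       # both variants push the trait the same way
opposite_direction = [2.0, -2.0]  # effects of equal size but opposite sign

print(burden_statistic(same_direction, unit_weights))         # 16.0
print(quadratic_statistic(same_direction, unit_weights))      # 8.0
print(burden_statistic(opposite_direction, unit_weights))     # 0.0
print(quadratic_statistic(opposite_direction, unit_weights))  # 8.0
```

With opposite‐direction effects the collapsed sum cancels to zero, whereas the quadratic form is unchanged. The raw values are not comparable across the two statistics because their null distributions differ, so this only illustrates the cancellation and says nothing about power.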
11.
Liang He Janne Pitkäniemi Antti‐Pekka Sarin Veikko Salomaa Mikko J. Sillanpää Samuli Ripatti 《Genetic epidemiology》2015,39(2):89-100
Next‐generation sequencing (NGS) has led to the study of rare genetic variants, which possibly explain the missing heritability for complex diseases. Most existing methods for rare variant (RV) association detection do not account for the common presence of sequencing errors in NGS data. The errors can largely affect the power and perturb the accuracy of association tests due to rare observations of minor alleles. We developed a hierarchical Bayesian approach to estimate the association between RVs and complex diseases. Our integrated framework combines the misclassification probability with shrinkage‐based Bayesian variable selection. It allows for flexibility in handling neutral and protective RVs with measurement error, and is robust enough for detecting causal RVs with a wide spectrum of minor allele frequency (MAF). Imputation uncertainty and MAF are incorporated into the integrated framework to achieve the optimal statistical power. We demonstrate that sequencing error does significantly affect the findings, and our proposed model can take advantage of it to improve statistical power in both simulated and real data. We further show that our model outperforms existing methods, such as the sequence kernel association test (SKAT). Finally, we illustrate the behavior of the proposed method using a Finnish low‐density lipoprotein cholesterol study, and show that it identifies an RV known as FH North Karelia in the LDLR gene with three carriers in 1,155 individuals, which is missed by both SKAT and Granvil.
12.
Natural genetic structures like genes may contain multiple variants that work as a group to determine a biologic outcome. The effect of rare variants, mutations occurring in less than 5% of samples, is hypothesized to be explained best as groups collectively associated with a biologic function. Therefore, it is important to develop powerful association tests to identify a true association between an outcome of interest and a group of variants, in particular a group with many rare variants. In this article we first delineate a novel penalized regression‐based global test for the association between sets of variants and a disease phenotype. Next, we use Genetic Analysis Workshop 18 (GAW18) data to assess the power of the new global association test to capture a relationship between an aggregated group of variants and a simulated hypertension status. Rare variant only, common variant only, and combined variant groups are studied. The power values are compared to those obtained from eight well‐regarded global tests (Score, Sum, SSU, SSUw, UminP, aSPU, aSPUw, and sequence kernel association test (SKAT)) that do not use penalized regression and a set of tests using either the SSU or score statistics and least absolute shrinkage and selection operator penalty (LASSO) logistic regression. Association testing of rare variants with our method was the top performer when there was low linkage disequilibrium (LD) between and within causal variants. This was similarly true when simultaneously testing rare and common variants in low LD scenarios. Finally, our method was able to provide meaningful variant‐specific association information.
13.
Given the functional relevance of many rare variants, their identification is frequently critical for dissecting disease etiology. Functional variants are likely to be aggregated in family studies enriched with affected members, and this aggregation increases the statistical power to detect rare variants associated with a trait of interest. Longitudinal family studies provide additional information for identifying genetic and environmental factors associated with disease over time. However, methods to analyze rare variants in longitudinal family data remain fairly limited. These methods should be capable of accounting for different sources of correlations and handling large amounts of sequencing data efficiently. To identify rare variants associated with a phenotype in longitudinal family studies, we extended pedigree‐based burden (BT) and kernel (KS) association tests to genetic longitudinal studies. Generalized estimating equation (GEE) approaches were used to generalize the pedigree‐based BT and KS to multiple correlated phenotypes under the generalized linear model framework, adjusting for fixed effects of confounding factors. These tests accounted for complex correlations between repeated measures of the same phenotype (serial correlations) and between individuals in the same family (familial correlations). We conducted comprehensive simulation studies to compare the proposed tests with mixed‐effects models and marginal models, using GEEs under various configurations. When the proposed tests were applied to data from the Diabetes Heart Study, we found exome variants of POMGNT1 and JAK1 genes were associated with type 2 diabetes.
14.
Sharon M. Lutz Tasha E. Fingerlin John E. Hokanson Christoph Lange 《Genetic epidemiology》2017,41(2):163-170
Through genome‐wide association studies, numerous genes have been shown to be associated with multiple phenotypes. To determine the overlap of genetic susceptibility of correlated phenotypes, one can apply multivariate regression or dimension reduction techniques, such as principal components analysis, and test for the association with the principal components of the phenotypes rather than the individual phenotypes. However, as these approaches test whether there is a genetic effect for at least one of the phenotypes, a significant test result does not necessarily imply pleiotropy. Recently, a method called Pleiotropy Estimation and Test Bootstrap (PET‐B) has been proposed to specifically test for pleiotropy (i.e., that two normally distributed phenotypes are both associated with the single nucleotide polymorphism of interest). Although the method examines the genetic overlap between the two quantitative phenotypes, the extension to binary phenotypes, three or more phenotypes, and rare variants is not straightforward. We provide two approaches to formally test this pleiotropic relationship in multiple scenarios. These approaches depend on permuting the phenotypes of interest and comparing the set of observed P‐values to the set of permuted P‐values in relation to the origin (e.g., a vector of zeros) either using the Hausdorff metric or a cutoff‐based approach. These approaches are appropriate for categorical and quantitative phenotypes, more than two phenotypes, common variants and rare variants. We evaluate these approaches under various simulation scenarios and apply them to the COPDGene study, a case‐control study of chronic obstructive pulmonary disease in current and former smokers.
15.
Wan‐Yu Lin Nengjun Yi Xiang‐Yang Lou Degui Zhi Kui Zhang Guimin Gao Hemant K. Tiwari Nianjun Liu 《Genetic epidemiology》2013,37(6):560-570
For most complex diseases, the fraction of heritability that can be explained by the variants discovered from genome‐wide association studies is minor. Although the so‐called “rare variants” (minor allele frequency [MAF] < 1%) have attracted increasing attention, they are unlikely to account for much of the “missing heritability” because very few people may carry these rare variants. The genetic variants that are likely to fill in the “missing heritability” include uncommon causal variants (MAF < 5%), which are generally untyped in association studies using tagging single‐nucleotide polymorphisms (SNPs) or commercial SNP arrays. Developing powerful statistical methods can help to identify chromosomal regions harboring uncommon causal variants, while bypassing the genome‐wide or exome‐wide next‐generation sequencing. In this work, we propose a haplotype kernel association test (HKAT) that is equivalent to testing the variance component of random effects for distinct haplotypes. With an appropriate weighting scheme given to haplotypes, we can further enhance the ability of HKAT to detect uncommon causal variants. With scenarios simulated according to the population genetics theory, HKAT is shown to be a powerful method for detecting chromosomal regions harboring uncommon causal variants.
16.
K. Alaine Broadaway Richard Duncan Karen N. Conneely Lynn M. Almli Bekh Bradley Kerry J. Ressler Michael P. Epstein 《Genetic epidemiology》2015,39(5):366-375
The etiology of complex traits likely involves the effects of genetic and environmental factors, along with complicated interaction effects between them. Consequently, there has been interest in applying genetic association tests of complex traits that account for potential modification of the genetic effect in the presence of an environmental factor. One can perform such an analysis using a joint test of gene and gene‐environment interaction. An optimal joint test would be one that remains powerful under a variety of models ranging from those of strong gene‐environment interaction effect to those of little or no gene‐environment interaction effect. To fill this demand, we have extended a kernel machine based approach for association mapping of multiple SNPs to consider joint tests of gene and gene‐environment interaction. The kernel‐based approach for joint testing is promising, because it incorporates linkage disequilibrium information from multiple SNPs simultaneously in analysis and permits flexible modeling of interaction effects. Using simulated data, we show that our kernel machine approach typically outperforms the traditional joint test under strong gene‐environment interaction models and further outperforms the traditional main‐effect association test under models of weak or no gene‐environment interaction effects. We illustrate our test using genome‐wide association data from the Grady Trauma Project, a cohort of highly traumatized, at‐risk individuals, which has previously been investigated for interaction effects.
17.
For rare‐variant association analysis, due to the extremely low frequencies of these variants, it is necessary to aggregate them by a prior set (e.g., genes and pathways) in order to achieve adequate power. In this paper, we consider hierarchical models to relate a set of rare variants to phenotype by modeling the effects of variants as a function of variant characteristics while allowing for variant‐specific effect (heterogeneity). We derive a set of two score statistics, testing the group effect by variant characteristics and the heterogeneity effect. We make a novel modification to these score statistics so that they are independent under the null hypothesis and their asymptotic distributions can be derived. As a result, the computational burden is greatly reduced compared with permutation‐based tests. Our approach provides a general testing framework for rare‐variant association, which includes many commonly used tests, such as the burden test [Li and Leal, 2008] and the sequence kernel association test [Wu et al., 2011], as special cases. Furthermore, in contrast to these tests, our proposed test has an added capacity to identify which components of variant characteristics and heterogeneity contribute to the association. Simulations under a wide range of scenarios show that the proposed test is valid, robust, and powerful. An application to the Dallas Heart Study illustrates that apart from identifying genes with significant associations, the new method also provides additional information regarding the source of the association. Such information may be useful for generating hypotheses in future studies.
18.
19.
Xuefeng Wang Seunggeun Lee Xiaofeng Zhu Susan Redline Xihong Lin 《Genetic epidemiology》2013,37(8):778-786
Family‐based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single marker based, testing one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEEs) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P‐values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP‐SNP interactions. Simulation studies show that the proposed method properly controls for type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single marker based minimum P‐value GEE test for an SNP‐set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.
20.
Changshuai Wei Ming Li Zihuai He Olga Vsevolozhskaya Daniel J. Schaid Qing Lu 《Genetic epidemiology》2014,38(8):699-708
With advancements in next‐generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high‐dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU‐SEQ, for the high‐dimensional association analysis of sequencing data. Based on a nonparametric U‐statistic, WU‐SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU‐SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy‐tailed distribution). Even when the assumptions were satisfied, WU‐SEQ still attained comparable performance to SKAT. Finally, we applied WU‐SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very low density lipoprotein cholesterol.