Similar literature
Found 20 similar articles; search took 46 ms.
1.
Genetic association studies have provided new insights into the genetic variability of human complex traits with a focus mainly on continuous or binary traits. Methods have been proposed to take into account disease heterogeneity between subgroups of patients when studying common variants but none was specifically designed for rare variants. Because rare variants are expected to have stronger effects and to be more heterogeneously distributed among cases than common ones, subgroup analyses might be particularly attractive in this context. To address this issue, we propose an extension of burden tests by using a multinomial regression model, which enables association tests between rare variants and multicategory phenotypes. We evaluated the type I error and the power of two burden tests, CAST and WSS, by simulating data under different scenarios. In the case of genetic heterogeneity between case subgroups, we showed an advantage of multinomial regression over logistic regression, which considers all the cases against the controls. We replicated these results on real data from Moyamoya disease where the burden tests performed better when cases were stratified according to age-of-onset. We implemented the functions for association tests in the R package “Ravages” available on GitHub.
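The two burden scores named in this abstract can be sketched as follows. This is a minimal illustration, not the Ravages implementation: it assumes the usual textbook definitions (CAST as a carrier indicator for rare variants; WSS with Madsen-Browning weights estimated from controls), and the function names, the `maf_threshold` default, and the array layout are all hypothetical. The resulting per-individual score would then be used as a covariate in a logistic or multinomial regression.

```python
import numpy as np

def cast_score(genotypes, maf, maf_threshold=0.01):
    """CAST-style score: 1 if an individual carries any rare allele, else 0.

    genotypes: (n_individuals, n_variants) array of minor-allele counts (0/1/2).
    maf: per-variant minor allele frequencies (assumed to be supplied by the caller).
    """
    genotypes = np.asarray(genotypes)
    rare = np.asarray(maf) <= maf_threshold
    return (genotypes[:, rare].sum(axis=1) > 0).astype(int)

def wss_score(genotypes, is_control):
    """WSS-style score: allele counts weighted by 1 / sqrt(n * q * (1 - q)).

    q is the smoothed allele frequency in controls, (m + 1) / (2 * n_u + 2),
    where m is the allele count in controls and n_u the number of controls.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)
    controls = genotypes[is_control]
    n_total = genotypes.shape[0]
    n_controls = controls.shape[0]
    q = (controls.sum(axis=0) + 1.0) / (2.0 * n_controls + 2.0)
    weights = 1.0 / np.sqrt(n_total * q * (1.0 - q))
    return genotypes @ weights
```

Rarer variants receive larger WSS weights, so a single rare allele contributes more to the score than a single allele at a more frequent site.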

2.
This paper presents a unified framework for transmission-disequilibrium tests for discrete and continuous traits. A conditional score test is derived that maximizes power to detect small effects for any exponential family distribution, which includes binary and normal distributions, and distributions that are skewed or have non-normal kurtosis. The specific distributional form need not be specified, and the method applies to sibships of arbitrary size. Formulas for the distribution of the test statistic are given for models including complex genetic effects (additive, dominant, and recessive gene action), covariates, multiple gene models including gene-gene interactions or heterogeneity, and gene-environment interactions. We develop refinements of our method for trait-based sampling designs and multiple siblings that can have dramatic effects on power.

3.
In genetic association studies, joint modeling of related traits/phenotypes can utilize the correlation between them and thereby provide more power and uncover additional information about genetic etiology. Moreover, detecting rare genetic variants is of current scientific interest as a key to missing heritability. Logistic Bayesian LASSO (LBL) has been proposed recently to detect rare haplotype variants using case-control data, that is, a single binary phenotype. As there is currently no haplotype association method that can handle multiple binary phenotypes, we extend LBL to fill this gap. We develop a bivariate model by using a latent variable to induce correlation between the two outcomes. We carry out extensive simulations to investigate the bivariate LBL and compare it with the univariate LBL. The bivariate LBL performs better than or similarly to the univariate LBL in most settings. It has the highest gain in power when a haplotype is associated with both traits and it affects at least one trait in a direction opposite to the direction of the correlation between the traits. We analyze two data sets, Genetic Analysis Workshop 19 sequence data on systolic and diastolic blood pressures and a genome-wide association data set on lung cancer and smoking, and detect several associated rare haplotypes.

4.
Many gene mapping studies of complex traits have identified genes or variants that influence multiple phenotypes. With the advent of next‐generation sequencing technology, there has been substantial interest in identifying rare variants in genes that possess cross‐phenotype effects. In the presence of such effects, modeling both the phenotypes and rare variants collectively using multivariate models can achieve higher statistical power compared to univariate methods that either model each phenotype separately or perform separate tests for each variant. Several studies collect phenotypic data over time, and using such longitudinal data can further increase the power to detect genetic associations. Although rare‐variant approaches exist for testing cross‐phenotype effects at a single time point, there is no analogous method for performing such analyses using longitudinal outcomes. In order to fill this important gap, we propose an extension of the Gene Association with Multiple Traits (GAMuT) test, a method for cross‐phenotype analysis of rare variants using a framework based on the distance covariance. The approach allows for both binary and continuous phenotypes and can also adjust for covariates. Our simple adjustment to the GAMuT test allows it to handle longitudinal data and to gain power by exploiting temporal correlation. The approach is computationally efficient and applicable on a genome‐wide scale due to the use of a closed‐form test whose significance can be evaluated analytically. We use simulated data to demonstrate that our method has favorable power over competing approaches and also apply our approach to exome chip data from the Genetic Epidemiology Network of Arteriopathy.

5.
Family data represent a rich resource for detecting association between rare variants (RVs) and human traits. However, most RV association analysis methods developed in recent years are data‐driven burden tests, which can adaptively learn weights from data but require permutation to evaluate significance, and thus are not readily applicable to family data, because random permutation will destroy family structure. Direct application of these methods to family data may result in a significant inflation of false positives. To overcome this issue, we have developed a generalized, weighted sum mixed model (WSMM), and corresponding computational techniques that can incorporate family information into data‐driven burden tests, and allow adaptive and efficient permutation tests in family data. Using simulated and real datasets, we demonstrate that the WSMM method can be used to appropriately adjust for genetic relatedness among family members and controls the inflation of false positives well. We compare WSMM with a non‐data‐driven, family‐based Sequence Kernel Association Test (famSKAT), showing that WSMM has significantly higher power in some cases. WSMM provides a generalized, flexible framework for adapting different data‐driven burden tests to analyze data with any family structure, and it can be extended to binary and time‐to‐onset traits, with or without covariates.

6.
Recent advances in sequencing technologies have made it possible to explore the influence of rare variants on complex diseases and traits. Meta‐analysis is essential to this exploration because large sample sizes are required to detect rare variants. Several methods are available to conduct meta‐analysis for rare variants under fixed‐effects models, which assume that the genetic effects are the same across all studies. In practice, genetic associations are likely to be heterogeneous among studies because of differences in population composition, environmental factors, phenotype and genotype measurements, or analysis method. We propose random‐effects models which allow the genetic effects to vary among studies and develop the corresponding meta‐analysis methods for gene‐level association tests. Our methods take score statistics, rather than individual participant data, as input and thus can accommodate any study design and any phenotype. We produce the random‐effects versions of all commonly used gene‐level association tests, including burden, variable threshold, and variance‐component tests. We demonstrate through extensive simulation studies that our random‐effects tests are substantially more powerful than the fixed‐effects tests in the presence of moderate and high between‐study heterogeneity and achieve similar power to the latter when the heterogeneity is low. The usefulness of the proposed methods is further illustrated with data from the National Heart, Lung, and Blood Institute Exome Sequencing Project (NHLBI ESP). The relevant software is freely available.
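The contrast between fixed-effects and random-effects pooling can be illustrated with a generic single-parameter sketch. This is not the gene-level score-statistic method of the abstract; it swaps in the classical inverse-variance estimator and the DerSimonian-Laird moment estimator of between-study variance, applied to hypothetical per-study effect estimates and variances. All function and variable names are assumptions made for illustration.

```python
import numpy as np

def fixed_effect(estimates, variances):
    """Inverse-variance pooled estimate and its variance (one common effect assumed)."""
    b = np.asarray(estimates, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    pooled = np.sum(w * b) / np.sum(w)
    return pooled, 1.0 / np.sum(w)

def random_effects_dl(estimates, variances):
    """DerSimonian-Laird random-effects pooling; returns (estimate, variance, tau2)."""
    b = np.asarray(estimates, dtype=float)
    v = np.asarray(variances, dtype=float)
    if b.size < 2:
        raise ValueError("at least two studies are needed to estimate heterogeneity")
    w = 1.0 / v
    pooled_fixed, _ = fixed_effect(b, v)
    # Cochran's Q measures the spread of study estimates around the fixed-effect mean.
    q = np.sum(w * (b - pooled_fixed) ** 2)
    denom = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (b.size - 1)) / denom)
    # Between-study variance tau2 is added to each study's own variance.
    w_star = 1.0 / (v + tau2)
    pooled = np.sum(w_star * b) / np.sum(w_star)
    return pooled, 1.0 / np.sum(w_star), tau2
```

When the study estimates agree, the estimated between-study variance is zero and the two approaches coincide; when they disagree, the random-effects variance grows to reflect that heterogeneity.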

7.
Given the functional relevance of many rare variants, their identification is frequently critical for dissecting disease etiology. Functional variants are likely to be aggregated in family studies enriched with affected members, and this aggregation increases the statistical power to detect rare variants associated with a trait of interest. Longitudinal family studies provide additional information for identifying genetic and environmental factors associated with disease over time. However, methods to analyze rare variants in longitudinal family data remain fairly limited. These methods should be capable of accounting for different sources of correlations and handling large amounts of sequencing data efficiently. To identify rare variants associated with a phenotype in longitudinal family studies, we extended pedigree‐based burden (BT) and kernel (KS) association tests to genetic longitudinal studies. Generalized estimating equation (GEE) approaches were used to generalize the pedigree‐based BT and KS to multiple correlated phenotypes under the generalized linear model framework, adjusting for fixed effects of confounding factors. These tests accounted for complex correlations between repeated measures of the same phenotype (serial correlations) and between individuals in the same family (familial correlations). We conducted comprehensive simulation studies to compare the proposed tests with mixed‐effects models and marginal models, using GEEs under various configurations. When the proposed tests were applied to data from the Diabetes Heart Study, we found that exome variants of the POMGNT1 and JAK1 genes were associated with type 2 diabetes.

8.
Kernel machine (KM) models are a powerful tool for exploring associations between sets of genetic variants and complex traits. Although most KM methods use a single kernel function to assess the marginal effect of a variable set, KM analyses involving multiple kernels have become increasingly popular. Multikernel analysis allows researchers to study more complex problems, such as assessing gene‐gene or gene‐environment interactions, incorporating variance‐component based methods for population substructure into rare‐variant association testing, and assessing the conditional effects of a variable set adjusting for other variable sets. The KM framework is robust, powerful, and provides efficient dimension reduction for multifactor analyses, but requires the estimation of high dimensional nuisance parameters. Traditional estimation techniques, including regularization and the “expectation‐maximization (EM)” algorithm, have a large computational cost and are not scalable to the large sample sizes needed for rare variant analysis. Therefore, in the context of gene‐environment interaction, we propose a computationally efficient and statistically rigorous “fastKM” algorithm for multikernel analysis that is based on a low‐rank approximation to the nuisance effect kernel matrices. Our algorithm is applicable to various trait types (e.g., continuous, binary, and survival traits) and can be implemented using any existing single‐kernel analysis software. Through extensive simulation studies, we show that our algorithm has similar performance to an EM‐based KM approach for quantitative traits while running much faster. We also apply our method to the Vitamin Intervention for Stroke Prevention (VISP) clinical trial, examining gene‐by‐vitamin effects on recurrent stroke risk and gene‐by‐age effects on change in homocysteine level.
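For orientation, a single-kernel score statistic of the kind this abstract builds on can be sketched as below. This is not the fastKM algorithm: it assumes a continuous trait, an intercept-only null model, and a SKAT-style weighted linear kernel K = G W G' whose square-root weights come from the Beta(1, 25) density. The function names are hypothetical, the statistic is left unnormalized, and no p-value is computed (that would require the null distribution, a mixture of chi-square variables).

```python
import numpy as np

def beta_1_25_sqrt_weights(maf):
    """Square-root variant weights from the Beta(1, 25) density: 25 * (1 - maf)^24."""
    maf = np.asarray(maf, dtype=float)
    return 25.0 * (1.0 - maf) ** 24

def kernel_score_statistic(y, genotypes, maf):
    """Unnormalized statistic Q = r' G W G' r, with r the residuals under the null.

    y: continuous trait values; genotypes: (n_individuals, n_variants) allele counts.
    The null model here is intercept-only, so r = y - mean(y) (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(genotypes, dtype=float)
    residuals = y - y.mean()
    sqrt_w = beta_1_25_sqrt_weights(maf)
    # Per-variant score S_j = sum_i g_ij * r_i, scaled by the square-root weight.
    per_variant = (g.T @ residuals) * sqrt_w
    return float(np.sum(per_variant ** 2))
```

Because the per-variant scores are squared before summing, variants with effects in opposite directions do not cancel, which is the main contrast with a burden score.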

9.
Family‐based designs enriched with affected subjects and disease associated variants can increase statistical power for identifying functional rare variants. However, few rare variant analysis approaches are available for time‐to‐event traits in family designs and none of them is applicable to the X chromosome. We developed novel pedigree‐based burden and kernel association tests for time‐to‐event outcomes with right censoring for pedigree data, referred to as FamRATS (family‐based rare variant association tests for survival traits). Cox proportional hazard models were employed to relate a time‐to‐event trait with rare variants with flexibility to encompass all ranges and collapsing of multiple variants. In addition, robustness to violations of the proportional hazards assumption was investigated for the proposed and four existing tests, including the conventional population‐based Cox proportional model and the burden, kernel, and sum of squares statistic (SSQ) tests for family data. The proposed tests can be applied to large‐scale whole‐genome sequencing data. They are appropriate for practical use under a wide range of misspecified Cox models, as well as for population‐based, pedigree‐based, or hybrid designs. In our extensive simulation study and data example, we showed that the proposed kernel test is the most powerful and robust choice among the proposed burden test and the four existing rare variant survival association tests. When applied to the Diabetes Heart Study, the proposed tests found that exome variants of the JAK1 gene on chromosome 1 showed the most significant association with age at onset of type 2 diabetes in the exome‐wide analysis.

10.
Complex human diseases are affected by genetic and environmental risk factors and their interactions. Gene–environment interaction (GEI) tests for aggregate genetic variant sets have been developed in recent years. However, existing statistical methods become rate limiting for large biobank-scale sequencing studies with correlated samples. We propose efficient Mixed-model Association tests for GEne–Environment interactions (MAGEE), for testing GEI between an aggregate variant set and environmental exposures on quantitative and binary traits in large-scale sequencing studies with related individuals. Joint tests for the aggregate genetic main effects and GEI effects are also developed. A null generalized linear mixed model adjusting for covariates but without any genetic effects is fit only once in a whole genome GEI analysis, thereby vastly reducing the overall computational burden. Score tests for variant sets are performed as a combination of genetic burden and variance component tests by accounting for the genetic main effects using matrix projections. The computational complexity is dramatically reduced in a whole genome GEI analysis, which makes MAGEE scalable to hundreds of thousands of individuals. We applied MAGEE to the exome sequencing data of 41,144 related individuals from the UK Biobank, and the analysis of 18,970 protein coding genes finished within 10.4 CPU hours.

11.
Family‐based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single marker based, which test one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEEs) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P‐values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP‐SNP interactions. Simulation studies show that the proposed method properly controls for type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single marker based minimum P‐value GEE test for an SNP‐set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.

12.
Traditional genome‐wide association studies (GWASs) usually focus on single‐marker analysis, which only assesses marginal effects. Pathway analysis, on the other hand, considers the hierarchical structure of biological pathways, genes, and markers and therefore provides additional insights into the genetic architecture underlying complex diseases. Recently, a number of methods for pathway analysis have been proposed to assess the significance of a biological pathway from a collection of single‐nucleotide polymorphisms. In this study, we propose a novel approach for pathway analysis that assesses the effects of genes using the sequence kernel association test and the effects of pathways using an extended adaptive rank truncated product statistic. It has been increasingly recognized that complex diseases are caused by both common and rare variants. We propose a new weighting scheme that allows genetic variants across the whole allelic frequency spectrum to be analyzed together without any form of frequency cutoff for defining rare variants. The proposed approach is flexible. It is applicable to both binary and continuous traits, and incorporating covariates is easy. Furthermore, it can be readily applied to GWAS data, exome‐sequencing data, and deep resequencing data. We evaluate the new approach on data simulated under comprehensive scenarios and show that it has the highest power in most of the scenarios while maintaining the correct type I error rate. We also apply our proposed methodology to data from a study of the association between bipolar disorder and candidate pathways from the Wellcome Trust Case Control Consortium (WTCCC) to show its utility.

13.
Group 14 of Genetic Analysis Workshop 17 examined several issues related to analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we conclude that power is (as expected) dependent on locus-specific heritability or contribution to disease risk, large samples will be required to detect rare causal variants with small effect sizes, extreme phenotype sampling designs may increase power for smaller laboratory costs, methods that allow joint analysis of multiple variants per gene or pathway are more powerful in general than analyses of individual rare variants, population-specific analyses can be optimal when different subpopulations harbor private causal mutations, and machine learning methods may be useful for selecting subsets of predictors for follow-up in the presence of extreme locus heterogeneity and large numbers of potential predictors.

14.
15.
With the advance of high‐throughput sequencing technologies, it has become feasible to investigate the influence of the entire spectrum of sequencing variations on complex human diseases. Although association studies utilizing the new sequencing technologies hold great promise to unravel novel genetic variants, especially rare genetic variants that contribute to human diseases, the statistical analysis of high‐dimensional sequencing data remains a challenge. Advanced analytical methods are greatly needed to facilitate high‐dimensional sequencing data analyses. In this article, we propose a generalized genetic random field (GGRF) method for association analyses of sequencing data. Like other similarity‐based methods (e.g., SIMreg and SKAT), the new method has the advantages of avoiding the need to specify thresholds for rare variants and allowing for testing multiple variants with different directions and magnitudes of effect. The method is built on the generalized estimating equation framework and thus accommodates a variety of disease phenotypes (e.g., quantitative and binary phenotypes). Moreover, it has a nice asymptotic property, and can be applied to small‐scale sequencing data without need for small‐sample adjustment. Through simulations, we demonstrate that the proposed GGRF attains an improved or comparable power over a commonly used method, SKAT, under various disease scenarios, especially when rare variants play a significant role in disease etiology. We further illustrate GGRF with an application to a real dataset from the Dallas Heart Study. By using GGRF, we were able to detect the association of two candidate genes, ANGPTL3 and ANGPTL4, with serum triglyceride.

16.
Consider the integrative analysis of genetic data with multiple correlated response variables. The goal is to identify important gene–environment (G × E) interactions along with main gene and environment effects that are associated with the responses. The homogeneity and heterogeneity models can be adopted to describe the genetic basis of multiple responses. To accommodate possible nonlinear effects of some environment effects, a multi‐response partially linear varying coefficient model is assumed. Penalization is adopted for marker selection. The proposed penalization method can select genetic variants with G × E interactions, no G × E interactions, and no main effects simultaneously. It adopts different penalties to accommodate the homogeneity and heterogeneity models. The proposed method can be effectively computed using a coordinate descent algorithm. A simulation study and the analysis of the Health Professionals Follow‐up Study, which has two correlated continuous traits, SNP measurements and multiple environment effects, show superior performance of the proposed method over its competitors. Copyright © 2014 John Wiley & Sons, Ltd.

17.
Functional linear models are developed in this paper for testing associations between quantitative traits and genetic variants, which can be rare variants or common variants or the combination of the two. By treating multiple genetic variants of an individual in a human population as a realization of a stochastic process, the genome of an individual in a chromosome region is a continuum of sequence data rather than discrete observations. The genome of an individual is viewed as a stochastic function that contains both linkage and linkage disequilibrium (LD) information of the genetic markers. By using techniques of functional data analysis, both fixed and mixed effect functional linear models are built to test the association between quantitative traits and genetic variants adjusting for covariates. After extensive simulation analysis, it is shown that the F‐distributed tests of the proposed fixed effect functional linear models have higher power than the sequence kernel association test (SKAT) and its optimal unified test (SKAT‐O) for three scenarios in most cases: (1) the causal variants are all rare, (2) the causal variants are both rare and common, and (3) the causal variants are common. The superior performance of the fixed effect functional linear models is most likely due to their optimal utilization of both genetic linkage and LD information of multiple genetic variants in a genome and similarity among different individuals, while SKAT and SKAT‐O only model the similarities and pairwise LD but do not model linkage and higher order LD information sufficiently. In addition, the proposed fixed effect models generate accurate type I error rates in simulation studies. We also show that the functional kernel score tests of the proposed mixed effect functional linear models are preferable in candidate gene analysis and small sample problems. The methods are applied to analyze three biochemical traits in data from the Trinity Students Study.

18.
In genetic association analysis, a joint test of multiple distinct phenotypes can increase power to identify sets of trait-associated variants within genes or regions of interest. Existing multiphenotype tests for rare variants make specific assumptions about the patterns of association with underlying causal variants, and the violation of these assumptions can reduce power to detect association. Here, we develop a general framework for testing pleiotropic effects of rare variants on multiple continuous phenotypes using multivariate kernel regression (Multi-SKAT). Multi-SKAT models effect sizes of variants on the phenotypes through a kernel matrix and performs a variance component test of association. We show that many existing tests are equivalent to specific choices of kernel matrices within the Multi-SKAT framework. To increase power of detecting association across tests with different kernel matrices, we developed a fast and accurate approximation of the significance of the minimum observed P value across tests. To account for related individuals, our framework uses random effects for the kinship matrix. Using simulated data and amino acid and exome-array data from the METabolic Syndrome In Men (METSIM) study, we show that Multi-SKAT can improve power over single-phenotype SKAT-O test and existing multiple-phenotype tests, while maintaining Type I error rate.

19.
Genome-wide association studies (GWAS) have thus far achieved substantial success. In the last decade, a large number of common variants underlying complex diseases have been identified through GWAS. In most existing GWAS, the identified common variants are obtained by single marker-based tests, that is, testing one single-nucleotide polymorphism (SNP) at a time. Generally, the basic functional unit of inheritance is a gene, rather than a SNP. Thus, results from gene-level association tests can be more readily integrated with downstream functional and pathogenic investigation. In this paper, we propose a general gene-based p-value adaptive combination approach (GPA) which can integrate association evidence of multiple genetic variants using only GWAS summary statistics (either p-values or other test statistics). The proposed method could be used to test genetic association for both continuous and binary traits through not only one study but also multiple studies, which would be helpful to overcome the limitation of existing methods that can only be applied to a specific type of data. We conducted thorough simulation studies to verify that the proposed method controls type I errors well, and performs favorably compared to single-marker analysis and other existing methods. We demonstrated the utility of our proposed method through analysis of GWAS meta-analysis results for fasting glucose and lipids from the international MAGIC consortium and Global Lipids Consortium, respectively. The proposed method identified some novel trait-associated genes which can improve our understanding of the mechanisms involved in β-cell function, glucose homeostasis, and lipid traits.
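The general idea of combining per-variant p-values into one gene-level test can be illustrated with Fisher's method. This is a stand-in chosen for simplicity, not the adaptive GPA procedure described in the abstract, and it assumes independent p-values, which variants in linkage disequilibrium within a gene generally are not. The function names are hypothetical; the chi-square tail probability uses the closed form available for even degrees of freedom, so only the standard library is needed.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even degrees of freedom.

    For df = 2k, P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k df under independence."""
    p_values = list(p_values)
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return statistic, chi2_sf_even_df(statistic, 2 * len(p_values))
```

Two variants that are each only nominally significant can yield a clearly smaller combined p-value, which is the motivation for gene-level aggregation; with correlated variants the independence assumption would make this combined value anti-conservative.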

20.
As whole-exome/genome sequencing data become increasingly available in genetic epidemiology research consortia, there is emerging interest in testing the interactions between rare genetic variants and environmental exposures that modify the risk of complex diseases. However, testing rare-variant–based gene-by-environment interactions (GxE) is more challenging than testing the genetic main effects due to the difficulty in correctly estimating the latter under the null hypothesis of no GxE effects and the presence of neutral variants. In response, we have developed a family of powerful and data-adaptive GxE tests, called “aGE” tests, in the framework of the adaptive powered score test, originally proposed for testing the genetic main effects. Using extensive simulations, we show that aGE tests can control the type I error rate in the presence of a large number of neutral variants or a nonlinear environmental main effect, and the power is more resilient to the inclusion of neutral variants than that of existing methods. We demonstrate the performance of the proposed aGE tests using Pancreatic Cancer Case-Control Consortium Exome Chip data. An R package “aGE” is available at http://github.com/ytzhong/projects/ .
