Similar literature
20 similar articles found.
1.
Traditional genome‐wide association studies (GWASs) usually focus on single‐marker analysis, which only assesses marginal effects. Pathway analysis, on the other hand, considers the hierarchical structure of biological pathways, genes, and markers, and therefore provides additional insight into the genetic architecture underlying complex diseases. Recently, a number of methods for pathway analysis have been proposed to assess the significance of a biological pathway from a collection of single‐nucleotide polymorphisms. In this study, we propose a novel approach for pathway analysis that assesses the effects of genes using the sequence kernel association test and the effects of pathways using an extended adaptive rank truncated product statistic. It has been increasingly recognized that complex diseases are caused by both common and rare variants. We propose a new weighting scheme that allows genetic variants across the whole allelic frequency spectrum to be analyzed together without any frequency cutoff for defining rare variants. The proposed approach is flexible: it is applicable to both binary and continuous traits, and covariates are easy to incorporate. Furthermore, it can be readily applied to GWAS data, exome‐sequencing data, and deep resequencing data. We evaluate the new approach on data simulated under comprehensive scenarios and show that it has the highest power in most of the scenarios while maintaining the correct type I error rate. We also apply our proposed methodology to data from a study of the association between bipolar disorder and candidate pathways from the Wellcome Trust Case Control Consortium (WTCCC) to show its utility.
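The rank truncated product idea mentioned in this abstract can be illustrated with a small sketch. It multiplies the K smallest p-values and, in the adaptive version, keeps the best result over several candidate values of K. This is only a minimal sketch, assuming independent p-values and a simple Monte Carlo null; the function names are hypothetical, and it is not the authors' extended statistic, which combines gene-level results from the sequence kernel association test.

```python
import bisect
import math
import random

def log_rtp(pvalues, k):
    """Log of the rank truncated product: sum of logs of the k smallest p-values."""
    return sum(math.log(p) for p in sorted(pvalues)[:k])

def adaptive_rtp_pvalue(pvalues, ks, n_sim=2000, seed=0):
    """Monte Carlo p-value of the adaptive RTP, assuming independent uniform nulls."""
    rng = random.Random(seed)
    m = len(pvalues)
    # Row 0 holds the observed p-values; the remaining rows are null replicates.
    # 1 - random() lies in (0, 1], so log() never receives zero.
    rows = [list(pvalues)]
    rows += [[1.0 - rng.random() for _ in range(m)] for _ in range(n_sim)]
    best = [1.0] * len(rows)
    for k in ks:
        column = [log_rtp(row, k) for row in rows]
        ordered = sorted(column)
        for i, stat in enumerate(column):
            # Fraction of rows whose statistic is at least as small (as extreme).
            p_k = bisect.bisect_right(ordered, stat) / len(rows)
            best[i] = min(best[i], p_k)
    # Compare the observed minimum p-value with the minima of the null replicates.
    as_extreme = sum(1 for b in best[1:] if b <= best[0])
    return (as_extreme + 1) / (n_sim + 1)
```

Taking the minimum over K and then recalibrating it against the same replicates avoids having to fix the truncation point in advance, at the cost of one extra layer of simulation.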

2.
With its potential to discover a much greater amount of genetic variation, next‐generation sequencing is fast becoming an emergent tool for genetic association studies. However, the cost of sequencing all individuals in a large‐scale population study is still high in comparison to most alternative genotyping options. While the ability to identify individual‐level data is lost (without bar‐coding), sequencing pooled samples can substantially lower costs without compromising the power to detect significant associations. We propose a hierarchical Bayesian model that estimates the association of each variant using pools of cases and controls, accounting for the variation in read depth across pools and sequencing error. To investigate the performance of our method across a range of number of pools, number of individuals within each pool, and average coverage, we undertook extensive simulations varying effect sizes, minor allele frequencies, and sequencing error rates. In general, the number of pools and pool size have dramatic effects on power while the total depth of coverage per pool has only a moderate impact. This information can guide the selection of a study design that maximizes power subject to cost, sample size, or other laboratory constraints. We provide an R package (hiPOD: hierarchical Pooled Optimal Design) to find the optimal design, allowing the user to specify a cost function, cost, and sample size limitations, and distributions of effect size, minor allele frequency, and sequencing error rate.

3.
In this paper we propose a new method to analyze time‐to‐event data in longitudinal genetic studies. This method addresses the fundamental problem of incorporating uncertainty when analyzing survival data and imputed single‐nucleotide polymorphisms (SNPs) from genome‐wide association studies (GWAS). Our method incorporates uncertainty in the likelihood function, in contrast to existing methods that incorporate the uncertainty in the design matrix. Through simulation studies and real data analyses, we show that our proposed method is unbiased and provides powerful results. We also show how combining results from different GWAS (meta‐analysis) may lead to wrong results when effects are not estimated using our approach. The model is implemented in an R package that is designed to analyze uncertainty arising not only from imputed SNPs, but also from copy number variants.

4.
The recent development of high‐throughput sequencing technologies calls for powerful statistical tests to detect rare genetic variants associated with complex human traits. Sampling related individuals in sequencing studies offers advantages over sampling unrelated individuals only, including improved protection against sequencing error, the ability to use imputation to make more efficient use of sequence data, and the possibility of power boost due to more observed copies of extremely rare alleles among relatives. With related individuals, familial correlation needs to be accounted for to ensure correct control over type I error and to improve power. Recognizing the limitations of existing rare‐variant association tests for family data, we propose MONSTER (Minimum P‐value Optimized Nuisance parameter Score Test Extended to Relatives), a robust rare‐variant association test, which generalizes the SKAT‐O method for independent samples. MONSTER uses a mixed effects model that accounts for covariates and additive polygenic effects. To obtain a powerful test, MONSTER adaptively adjusts to the unknown configuration of effects of rare‐variant sites. MONSTER also offers an analytical way of assessing P‐values, which is desirable because permutation is not straightforward to conduct in related samples. In simulation studies, we demonstrate that MONSTER effectively accounts for family structure, is computationally efficient and compares very favorably, in terms of power, to previously proposed tests that allow related individuals. We apply MONSTER to an analysis of high‐density lipoprotein cholesterol in the Framingham Heart Study, where we are able to replicate association with three genes.

5.
An important aspect of disease gene mapping is replication, that is, a putative finding in one group of individuals is confirmed in another set of individuals. As it can happen by chance that individuals share an estimated disease position, we developed a statistical approach to determine the p-value for multiple individuals or families to share a possibly small number of candidate susceptibility variants. Here, we focus on candidate variants for dominant traits that have been obtained by our previously developed heterozygosity analysis, and we test the sharing of candidate variants obtained for different individuals. Our approach allows for multiple pathogenic variants in a gene to contribute to disease, and for estimated disease variant positions to be imprecise. Statistically, the method developed here falls into the category of equivalence testing, where the classical null and alternative hypotheses of homogeneity and heterogeneity are reversed. The null hypothesis situation is created by permuting genomic locations of variants for one individual after another. We applied our methodology to the ALSPAC data set of 1,927 whole-genome sequenced individuals, where some individuals carry a pathogenic variant for the BRCA1 gene, but no two individuals carry the same variant. Our shared genomic segment analysis found significant evidence for BRCA1 pathogenic variants within ±5 kb of a given DNA variant.

6.
Objective: Observational studies have shown the association between iron status and osteoarthritis (OA). However, due to difficulties of determining sequential temporality, their causal association is still elusive. Based on the summary data of genome-wide association studies (GWASs) of a large-scale population, this study explored the genetic causal association between iron status and OA. Methods: First, we took a series of quality control steps to select eligible instrumental SNPs which were strongly associated with exposure. The genetic causal association between iron status and OA was analyzed using the two-sample Mendelian randomization (MR). Inverse-variance weighted (IVW), MR-Egger, weighted median, simple mode, and weighted mode methods were used for analysis. The results were mainly based on IVW (random effects), followed by sensitivity analysis. IVW and MR-Egger were used for heterogeneity testing. MR-Egger was also used for pleiotropy testing. Leave-one-SNP-out analysis was used to identify single nucleotide polymorphisms (SNPs) with potential impact. Maximum likelihood, penalized weighted median, and IVW (fixed effects) were performed to further validate the reliability of results. Results: IVW results showed that transferrin saturation had a positive causal association with knee osteoarthritis (KOA), hip osteoarthritis (HOA) and KOA or HOA (p < 0.05, OR > 1), and there was a negative causal association between transferrin and HOA and KOA or HOA (p < 0.05, OR < 1). The results of heterogeneity test showed that our IVW analysis results were basically free of heterogeneity (p > 0.05). The results of the pleiotropy test showed that there was no pleiotropy in our IVW analysis (p > 0.05). The analysis results of maximum likelihood, penalized weighted median and IVW (fixed effects) were consistent with our IVW results. No genetic causal association was found between serum iron and ferritin and OA. 
Conclusions: This study provides evidence of a causal association between iron status and OA, which offers novel insights for genetic research on OA.
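For readers unfamiliar with the inverse-variance weighted (IVW) estimator named in this abstract, the following is a minimal sketch of its fixed-effect form computed from per-SNP summary statistics. The function name and the numbers in the usage note are hypothetical, and the sketch does not reproduce the study's pipeline, which relied mainly on random-effects IVW together with several sensitivity analyses.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect from summary statistics.

    Equivalent to averaging the per-SNP ratios beta_outcome / beta_exposure
    with weights beta_exposure**2 / se_outcome**2.
    """
    numerator = 0.0
    total_weight = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        numerator += bx * by / (se * se)
        total_weight += bx * bx / (se * se)
    estimate = numerator / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    # If the outcome effects are log odds ratios, exp(estimate) is an odds ratio.
    return estimate, std_error, math.exp(estimate)
```

With made-up inputs such as `ivw_estimate([0.1, 0.2, 0.4], [0.05, 0.1, 0.2], [0.01, 0.02, 0.02])`, every SNP implies the same ratio of 0.5, so the pooled estimate is 0.5 as well.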

7.
Along with the accumulated data of genetic variants and biomedical phenotypes in the genome era, statistical identification of pleiotropy is of growing interest for dissecting and understanding genetic correlations between complex traits. We proposed a novel method for estimating and testing the pleiotropic effect of a genetic variant on two quantitative traits. Based on a covariance decomposition and estimation, our method quantifies pleiotropy as the portion of between‐trait correlation explained by the same genetic variant. Unlike most multiple‐trait methods that assess potential pleiotropy (i.e., whether a variant contributes to at least one trait), our method formulates a statistic that tests exact pleiotropy (i.e., whether a variant contributes to both of two traits). We developed two approaches (a regression approach and a bootstrapping approach) for this test and investigated their statistical properties in comparison with other potential pleiotropy test methods. Our simulation shows that the regression approach produces correct P‐values under both the complete null (i.e., a variant has no effect on either trait) and the incomplete null (i.e., a variant has an effect on only one of the two traits), but requires large sample sizes to achieve good power, whereas the bootstrapping approach has better power and produces conservative P‐values under the complete null. We demonstrate our method for detecting exact pleiotropy using a real GWAS dataset. Our method provides an easy‐to‐implement tool for measuring, testing, and understanding the pleiotropic effect of a single variant on the correlation architecture of two complex traits.

8.
In a genome‐wide association study (GWAS), association between genotype and phenotype at autosomal loci is generally tested by regression models. However, X‐chromosome data are often excluded from published analyses of autosomes because of the difference between males and females in number of X chromosomes. Failure to analyze X‐chromosome data at all is obviously less than ideal, and can lead to missed discoveries. Even when X‐chromosome data are included, they are often analyzed with suboptimal statistics. Several mathematically sensible statistics for X‐chromosome association have been proposed. The optimality of these statistics, however, is based on very specific simple genetic models. In addition, while previous simulation studies of these statistics have been informative, they have focused on single‐marker tests and have not considered the types of error that occur even under the null hypothesis when the entire X chromosome is scanned. In this study, we comprehensively tested several X‐chromosome association statistics using simulation studies that include the entire chromosome. We also considered a wide range of trait models for sex differences and phenotypic effects of X inactivation. We found that models that do not incorporate a sex effect can have large type I error in some cases. We also found that many of the best statistics perform well even when there are modest deviations, such as trait variance differences between the sexes or small sex differences in allele frequencies, from assumptions.

9.
Functional linear models are developed in this paper for testing associations between quantitative traits and genetic variants, which can be rare variants, common variants, or a combination of the two. By treating multiple genetic variants of an individual in a human population as a realization of a stochastic process, the genome of an individual in a chromosome region is a continuum of sequence data rather than discrete observations. The genome of an individual is viewed as a stochastic function that contains both linkage and linkage disequilibrium (LD) information of the genetic markers. By using techniques of functional data analysis, both fixed and mixed effect functional linear models are built to test the association between quantitative traits and genetic variants adjusting for covariates. After extensive simulation analysis, it is shown that the F‐distributed tests of the proposed fixed effect functional linear models have higher power than the sequence kernel association test (SKAT) and its optimal unified test (SKAT‐O) for three scenarios in most cases: (1) the causal variants are all rare, (2) the causal variants are both rare and common, and (3) the causal variants are common. The superior performance of the fixed effect functional linear models is most likely due to their optimal utilization of both genetic linkage and LD information of multiple genetic variants in a genome and similarity among different individuals, while SKAT and SKAT‐O only model the similarities and pairwise LD but do not model linkage and higher order LD information sufficiently. In addition, the proposed fixed effect models generate accurate type I error rates in simulation studies. We also show that the functional kernel score tests of the proposed mixed effect functional linear models are preferable in candidate gene analysis and small sample problems. The methods are applied to analyze three biochemical traits in data from the Trinity Students Study.

10.
With rapid advancements of sequencing technologies and accumulations of electronic health records, a large number of genetic variants and multiple correlated human complex traits have become available in many genetic association studies. Thus, it becomes necessary and important to develop new methods that can jointly analyze the association between multiple genetic variants and multiple traits. Compared with methods that only use a single marker or trait, the joint analysis of multiple genetic variants and multiple traits is more powerful since such an analysis can fully incorporate the correlation structure of genetic variants and/or traits and their mutual dependence patterns. However, most existing methods that simultaneously analyze multiple genetic variants and multiple traits are only applicable to unrelated samples. We develop a new method called MF‐TOWmuT to detect association of multiple phenotypes and multiple genetic variants in a genomic region with family samples. MF‐TOWmuT is based on an optimally weighted combination of variants. Our method can be applied to both rare and common variants and both qualitative and quantitative traits. Our simulation results show that (1) the type I error of MF‐TOWmuT is preserved; (2) MF‐TOWmuT outperforms two existing methods, the Multiple Family‐based Quasi‐Likelihood Score Test and the Multivariate Family‐based Rare Variant Association Test, in terms of power. We also illustrate the usefulness of MF‐TOWmuT by analyzing genotypic and phenotypic data from the Genetics of Kidneys in Diabetes study. An R program is available at https://github.com/gaochengPRC/MF-TOWmuT .

11.
Instrumental variable regression is one way to overcome unmeasured confounding and estimate causal effect in observational studies. Built on structural mean models, considerable work has recently been developed for consistent estimation of causal relative risk and causal odds ratio. Such models can sometimes suffer from identification issues for weak instruments. This has hampered the applicability of Mendelian randomization analysis in genetic epidemiology. When there are multiple genetic variants available as instrumental variables, and causal effect is defined in a generalized linear model in the presence of unmeasured confounders, we propose to test concordance between instrumental variable effects on the intermediate exposure and instrumental variable effects on the disease outcome, as a means to test the causal effect. We show that a class of generalized least squares estimators provide valid and consistent tests of causality. For causal effect of a continuous exposure on a dichotomous outcome in logistic models, the proposed estimators are shown to be asymptotically conservative. When the disease outcome is rare, such estimators are consistent because of the log‐linear approximation of the logistic function. Optimality of such estimators relative to the well‐known two‐stage least squares estimator and the double‐logistic structural mean model is further discussed. Copyright © 2014 John Wiley & Sons, Ltd.

12.
Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene‐based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well‐controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and the sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have power higher than or similar to that of the Cox SKAT LRT except when 50%/50% of causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than the Cox BT LRT. The models and related test statistics can be useful in whole genome and whole exome association studies. An age‐related macular degeneration dataset was analyzed as an example.

13.
The etiology of complex traits likely involves the effects of genetic and environmental factors, along with complicated interaction effects between them. Consequently, there has been interest in applying genetic association tests of complex traits that account for potential modification of the genetic effect in the presence of an environmental factor. One can perform such an analysis using a joint test of gene and gene‐environment interaction. An optimal joint test would be one that remains powerful under a variety of models ranging from those of strong gene‐environment interaction effect to those of little or no gene‐environment interaction effect. To meet this demand, we have extended a kernel machine based approach for association mapping of multiple SNPs to consider joint tests of gene and gene‐environment interaction. The kernel‐based approach for joint testing is promising, because it incorporates linkage disequilibrium information from multiple SNPs simultaneously in analysis and permits flexible modeling of interaction effects. Using simulated data, we show that our kernel machine approach typically outperforms the traditional joint test under strong gene‐environment interaction models and further outperforms the traditional main‐effect association test under models of weak or no gene‐environment interaction effects. We illustrate our test using genome‐wide association data from the Grady Trauma Project, a cohort of highly traumatized, at‐risk individuals, which has previously been investigated for interaction effects.

14.
In the past few years, a plethora of methods for rare variant association with phenotype have been proposed. These methods aggregate information from multiple rare variants across genomic region(s), but there is little consensus as to which method is most effective. The weighting scheme adopted when aggregating information across variants is one of the primary determinants of effectiveness. Here we present a systematic evaluation of multiple weighting schemes through a series of simulations intended to mimic large sequencing studies of a quantitative trait. We evaluate existing phenotype‐independent and phenotype‐dependent methods, as well as weights estimated by penalized regression approaches including Lasso, Elastic Net, and SCAD. We find that the difference in power between phenotype‐dependent schemes is negligible when high‐quality functional annotations are available. When functional annotations are unavailable or incomplete, all methods suffer from power loss; however, the variable selection methods outperform the others at the cost of increased computational time. Therefore, in the absence of good annotation, we recommend variable selection methods (which can be viewed as “statistical annotation”) on top of regions implicated by a phenotype‐independent weighting scheme. Further, once a region is implicated, variable selection can help to identify potential causal single nucleotide polymorphisms for biological validation. These findings are supported by an analysis of a high coverage targeted sequencing study of 1,898 individuals.
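To make the notion of a phenotype-independent weighting scheme concrete, the sketch below shows two frequency-based weights that are widely used in rare variant tests, and a weighted burden score built from them. The abstract does not say which schemes were evaluated, so the choice of these two, along with all function names, is an assumption made purely for illustration.

```python
import math

def inverse_sd_weight(maf):
    """Weight 1 / sqrt(q(1 - q)): up-weights variants with low minor allele frequency q."""
    return 1.0 / math.sqrt(maf * (1.0 - maf))

def beta_1_25_weight(maf):
    """Beta(1, 25) density evaluated at q, which equals 25 * (1 - q)**24."""
    return 25.0 * (1.0 - maf) ** 24

def burden_scores(genotypes, mafs, weight_fn):
    """Weighted sum of minor allele counts per individual.

    genotypes: one row per individual, one 0/1/2 allele count per variant.
    mafs: minor allele frequency of each variant, in the same column order.
    """
    weights = [weight_fn(q) for q in mafs]
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
```

Both weights depend only on allele frequency, never on the trait, which is what makes them phenotype-independent; the resulting scores can then be tested against the phenotype in an ordinary regression.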

15.
So HC, Li M, Sham PC. Genetic Epidemiology, 2011, 35(6): 447-456
Genome-wide association studies (GWAS) have become increasingly popular recently and contributed to the discovery of many susceptibility variants. However, a large proportion of the heritability remains unexplained. This observation raises queries regarding the ability of GWAS to uncover the genetic basis of complex diseases. In this study, we propose a simple and fast statistical framework to estimate the total heritability explained by all true susceptibility variants in a GWAS. It is expected that many true risk variants will not be detected in a GWAS due to limited power. The proposed framework aims at recovering the "hidden" heritability. Importantly, only the summary z-statistics are required as input and no raw genotype data are needed. The strategy is to recover the true effect sizes from the observed z-statistics. The methodology does not rely on any distributional assumptions of the effect sizes of variants. Both binary and quantitative traits can be handled and covariates may be included. Population-based or family-based designs are allowed as long as the summary statistics are available. Simulations were conducted and showed satisfactory performance of the proposed approach. Application to real data (Crohn's disease, HDL, LDL, and triglycerides) reveals that at least around 10-20% of variance in liability or phenotype can be explained by GWAS panels. This translates to around 10-40% of the total heritability for the studied traits.
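As a small illustration of how a summary z-statistic relates to variance explained for a quantitative trait, the sketch below uses the textbook identity r² = t² / (t² + n − 2) for a simple linear regression. It is not the authors' estimator: the abstract describes recovering true effect sizes from the observed z-statistics, whereas this naive version plugs the observed values in directly and will tend to overstate the total when many SNPs carry little or no signal. The function names are hypothetical.

```python
def variance_explained(z, n):
    """Proportion of variance implied by a simple-regression statistic z with n samples."""
    return z * z / (z * z + n - 2)

def naive_total(z_scores, n):
    """Naive sum over SNPs assumed independent; no correction for noise or selection."""
    return sum(variance_explained(z, n) for z in z_scores)
```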

16.
Identification of gene‐environment interaction (G × E) is important in understanding the etiology of complex diseases. Based on our previously developed Set Based gene EnviRonment InterAction test (SBERIA), in this paper we propose a powerful framework for enhanced set‐based G × E testing (eSBERIA). The major challenge of signal aggregation within a set is how to tell signals from noise. eSBERIA tackles this challenge by adaptively aggregating the interaction signals within a set weighted by the strength of the marginal and correlation screening signals. eSBERIA then combines the screening‐informed aggregate test with a variance component test to account for the residual signals. Additionally, we develop a case‐only extension for eSBERIA (coSBERIA) and an existing set‐based method, which boosts the power not only by exploiting the G‐E independence assumption but also by avoiding the need to specify main effects for a large number of variants in the set. Through extensive simulation, we show that coSBERIA and eSBERIA are considerably more powerful than existing methods within the case‐only and the case‐control method categories across a wide range of scenarios. We conduct a genome‐wide G × E search by applying our methods to Illumina HumanExome Beadchip data of 10,446 colorectal cancer cases and 10,191 controls and identify two novel interactions between nonsteroidal anti‐inflammatory drugs (NSAIDs) and MINK1 and PTCHD3.

17.
Many longitudinal cohort studies have both genome‐wide measures of genetic variation and repeated measures of phenotypes and environmental exposures. Genome‐wide association study analyses have typically used only cross‐sectional data to evaluate quantitative phenotypes and binary traits. Incorporation of repeated measures may increase power to detect associations, but also requires specialized analysis methods. Here, we discuss one such method, generalized estimating equations (GEE), in the contexts of analysis of main effects of rare genetic variants and analysis of gene‐environment interactions. We illustrate the potential for increased power using GEE analyses instead of cross‐sectional analyses. We also address challenges that arise, such as the need for small‐sample corrections when the minor allele frequency of a genetic variant and/or the prevalence of an environmental exposure is low. To illustrate methods for detection of gene‐drug interactions on a genome‐wide scale, using repeated measures data, we conduct single‐study analyses and meta‐analyses across studies in three large cohort studies participating in the Cohorts for Heart and Aging Research in Genomic Epidemiology consortium: the Atherosclerosis Risk in Communities study, the Cardiovascular Health Study, and the Rotterdam Study. Copyright © 2014 John Wiley & Sons, Ltd.

18.
Recent studies suggest that rare variants play an important role in the etiology of many traits. Although a number of methods have been developed for genetic association analysis of rare variants, they all assume a relatively homogeneous population under study. Such an assumption may not be valid for samples collected from admixed populations such as African Americans and Hispanic Americans as there is a great extent of local variation in ancestry in these populations. To ensure valid and more powerful rare variant association tests performed in admixed populations, we have developed a local ancestry‐based weighted dosage test, which is able to take into account local ancestry of rare alleles, uncertainties in rare variant imputation when imputed data are included, and the direction of effect that rare variants exert on phenotypic outcome. We used simulated sequence data to show that our proposed test has controlled type I error rates, whereas naïve application of existing rare variants tests and tests that adjust for global ancestry lead to inflated type I error rates. We showed that our test has higher power than tests without proper adjustment of ancestry. We also applied the proposed method to a candidate gene study on low‐density lipoprotein cholesterol. Our results suggest that it is important to appropriately control for potential population stratification induced by local ancestry difference in the analysis of rare variants in admixed populations.

19.
Most rare‐variant association tests for complex traits are applicable only to population‐based or case‐control resequencing studies. There are fewer rare‐variant association tests for family‐based resequencing studies, which is unfortunate because pedigrees possess many attractive characteristics for such analyses. Family‐based studies can be more powerful than their population‐based counterparts due to increased genetic load and further enable the implementation of rare‐variant association tests that, by design, are robust to confounding due to population stratification. With this in mind, we propose a rare‐variant association test for quantitative traits in families; this test integrates the QTDT approach of Abecasis et al. [Abecasis et al., 2000a] into the kernel‐based SNP association test KMFAM of Schifano et al. [Schifano et al., 2012]. The resulting within‐family test enjoys the many benefits of the kernel framework for rare‐variant association testing, including rapid evaluation of P‐values and preservation of power when a region harbors rare causal variation that acts in different directions on phenotype. Additionally, by design, this within‐family test is robust to confounding due to population stratification. Although within‐family association tests are generally less powerful than their counterparts that use all genetic information, we show that we can recover much of this power (while still ensuring robustness to population stratification) using a straightforward screening procedure. Our method accommodates covariates and allows for missing parental genotype data, and we have written software implementing the approach in R for public use.

20.
Mendelian randomization, the use of genetic variants as instrumental variables (IV), can test for and estimate the causal effect of an exposure on an outcome. Most IV methods assume that the function relating the exposure to the expected value of the outcome (the exposure‐outcome relationship) is linear. However, in practice, this assumption may not hold. Indeed, often the primary question of interest is to assess the shape of this relationship. We present two novel IV methods for investigating the shape of the exposure‐outcome relationship: a fractional polynomial method and a piecewise linear method. We divide the population into strata using the exposure distribution, and estimate a causal effect, referred to as a localized average causal effect (LACE), in each stratum of the population. The fractional polynomial method performs metaregression on these LACE estimates. The piecewise linear method estimates a continuous piecewise linear function, the gradient of which is the LACE estimate in each stratum. Both methods were demonstrated in a simulation study to estimate the true exposure‐outcome relationship well, particularly when the relationship was a fractional polynomial (for the fractional polynomial method) or was piecewise linear (for the piecewise linear method). The methods were used to investigate the shape of the relationship of body mass index with systolic blood pressure and diastolic blood pressure.
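The piecewise linear construction described in this abstract can be sketched as follows: given stratum boundaries and one gradient per stratum, build a continuous function whose slope inside each stratum equals that stratum's estimate. This is a minimal sketch that takes the gradients as given; it does not estimate the LACE values themselves, and the function name, the starting value, and the extrapolation beyond the outer boundaries are assumptions for illustration.

```python
def piecewise_linear(x, breakpoints, slopes, value_at_start=0.0):
    """Continuous piecewise linear function with gradient slopes[i] in stratum i.

    breakpoints: increasing stratum boundaries, one more than the number of slopes.
    value_at_start: function value at breakpoints[0].
    Outside the outer boundaries the nearest stratum's slope is extended.
    """
    if len(slopes) != len(breakpoints) - 1:
        raise ValueError("need exactly one slope per stratum")
    if x <= breakpoints[0]:
        return value_at_start + slopes[0] * (x - breakpoints[0])
    y = value_at_start
    for left, right, slope in zip(breakpoints, breakpoints[1:], slopes):
        if x <= right:
            return y + slope * (x - left)
        # Accumulate the full rise across this stratum so the pieces join up.
        y += slope * (right - left)
    return y + slopes[-1] * (x - breakpoints[-1])
```

For example, with boundaries `[20, 25, 30]` and gradients `[1.0, 2.0]`, the function rises by 5 over the first stratum and by 10 over the second, and the two pieces meet at 25 without a jump.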
