Similar Literature
Retrieved 20 similar documents (search time: 296 ms)
1.
Pathway analysis can complement point‐wise single nucleotide polymorphism (SNP) analysis in exploring genomewide association study (GWAS) data to identify specific disease‐associated genes that can be candidate causal genes. We propose a straightforward methodology that can be used for conducting a gene‐based pathway analysis using summary GWAS statistics in combination with widely available reference genotype data. We used this method to perform a gene‐based pathway analysis of a type 1 diabetes (T1D) meta‐analysis GWAS (of 7,514 cases and 9,045 controls). An important feature of the conducted analysis is the removal of the major histocompatibility complex gene region, the major genetic risk factor for T1D. Thirty‐one of the 1,583 (2%) tested pathways were identified to be enriched for association with T1D at a 5% false discovery rate. We analyzed these 31 pathways and their genes to identify SNPs in or near these pathway genes that showed potentially novel association with T1D and attempted to replicate the association of 22 SNPs in additional samples. Replication P‐values were skewed () with 12 of the 22 SNPs showing . Support, including replication evidence, was obtained for nine T1D associated variants in genes ITGB7 (rs11170466, ), NRP1 (rs722988, ), BAD (rs694739, ), CTSB (rs1296023, ), FYN (rs11964650, ), UBE2G1 (rs9906760, ), MAP3K14 (rs17759555, ), ITGB1 (rs1557150, ), and IL7R (rs1445898, ). The proposed methodology can be applied to other GWAS datasets for which only summary level data are available.

2.
Investigators often meta‐analyze multiple genome‐wide association studies (GWASs) to increase the power to detect associations of single nucleotide polymorphisms (SNPs) with a trait. Meta‐analysis is also performed within a single cohort that is stratified by, e.g., sex or ancestry group. Having correlated individuals among the strata may complicate meta‐analyses, limit power, and inflate Type 1 error. For example, in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), sources of correlation include genetic relatedness, shared household, and shared community. We propose a novel mixed‐effect model for meta‐analysis, “MetaCor,” which accounts for correlation between stratum‐specific effect estimates. Simulations show that MetaCor controls inflation better than alternatives such as ignoring the correlation between the strata or analyzing all strata together in a “pooled” GWAS, especially with different minor allele frequencies (MAFs) between strata. We illustrate the benefits of MetaCor on two GWASs in the HCHS/SOL. Analysis of dental caries (tooth decay) stratified by ancestry group detected a genome‐wide significant SNP (rs7791001, P‐value = , compared to in pooled), with different MAFs between strata. Stratified analysis of body mass index (BMI) by ancestry group and sex reduced overall inflation from (pooled) to (MetaCor). Furthermore, even after removing close relatives to obtain nearly uncorrelated strata, a naïve stratified analysis resulted in compared to for MetaCor.
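The core idea of combining correlated stratum-specific estimates can be illustrated with a generalized least squares (GLS) combination, where the off-diagonal entries of the covariance matrix capture the between-stratum correlation. This is only a minimal sketch of the idea, not the MetaCor model itself; the function name and interface are invented for illustration.

```python
import numpy as np

def gls_meta(beta, cov):
    # beta: stratum-specific effect estimates; cov: their full covariance
    # matrix (diagonal = variances, off-diagonal = correlation between
    # strata, e.g., induced by relatives shared across strata).
    beta = np.asarray(beta, dtype=float)
    cov = np.asarray(cov, dtype=float)
    ones = np.ones_like(beta)
    w = np.linalg.solve(cov, ones)        # Sigma^{-1} 1
    var_meta = 1.0 / (ones @ w)           # (1' Sigma^{-1} 1)^{-1}
    beta_meta = var_meta * (w @ beta)     # GLS combined estimate
    return beta_meta, var_meta
```

With a diagonal covariance matrix this reduces to the usual inverse-variance weighted fixed-effect meta-analysis; a positive off-diagonal term widens the combined standard error, which is why ignoring it inflates Type 1 error.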

3.
Lu Q  Wei C  Ye C  Li M  Elston RC 《Genetic epidemiology》2012,36(6):583-593
The potential importance of the joint action of genes, whether modeled with or without a statistical interaction term, has long been recognized. However, identifying such action has been a great challenge, especially when millions of genetic markers are involved. We propose a likelihood ratio‐based Mann‐Whitney test to search for joint gene action either among candidate genes or genome‐wide. It extends the traditional univariate Mann‐Whitney test to assess the joint association of genotypes at multiple loci with disease, allowing for high‐order statistical interactions. Because only one overall significance test is conducted for the entire analysis, it avoids the issue of multiple testing. Moreover, the approach adopts a computationally efficient algorithm, making a genome‐wide search feasible in a reasonable amount of time on a high‐performance personal computer. We evaluated the approach using both theoretical and real data. By applying the approach to 40 type 2 diabetes (T2D) susceptibility single‐nucleotide polymorphisms (SNPs), we identified a four‐locus model strongly associated with T2D in the Wellcome Trust (WT) study (permutation P‐value < 0.001), and replicated the same finding in the Nurses’ Health Study/Health Professionals Follow‐Up Study (NHS/HPFS) (P‐value = ). We also conducted a genome‐wide search on 385,598 SNPs in the WT study. The analysis took approximately 55 hr on a personal computer, identifying the same first two loci, but overall a different set of four SNPs, jointly associated with T2D (P‐value = ). The nominal significance of this same association reached in the NHS/HPFS. Genet. Epidemiol. 00:1‐11, 2012. © 2012 Wiley Periodicals, Inc.

4.
Epigenome‐wide association studies (EWAS) are designed to characterise population‐level epigenetic differences across the genome and link them to disease. Most commonly, they assess DNA‐methylation status at cytosine‐guanine dinucleotide (CpG) sites, using platforms such as the Illumina 450k array that profile a subset of CpGs genome wide. An important challenge in the context of EWAS is determining a significance threshold for declaring a CpG site as differentially methylated, taking multiple testing into account. We used a permutation method to estimate a significance threshold specifically for the 450k array and a simulation extrapolation approach to estimate a genome‐wide threshold. These methods were applied to five different EWAS datasets derived from a variety of populations and tissue types. We obtained an estimate of for the 450k array, and a genome‐wide estimate of . We further demonstrate the importance of these results by showing that previously recommended sample sizes for EWAS should be adjusted upwards, requiring samples between ~10% and ~20% larger in order to maintain type‐1 errors at the desired level.
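The permutation idea can be sketched for a quantitative phenotype and a sample-by-probe methylation matrix: permute the phenotype, record the maximum association statistic over all probes, and take a quantile of those maxima as an array-wide critical value. To keep the sketch self-contained it thresholds on the absolute correlation rather than converting to P-values; names and defaults are illustrative, not the authors' pipeline.

```python
import numpy as np

def permutation_critical_value(pheno, meth, n_perm=200, alpha=0.05, seed=0):
    # meth: n_samples x n_probes matrix; pheno: length-n_samples vector.
    # Each permutation breaks the phenotype-methylation link while keeping
    # the correlation structure among probes, so the distribution of
    # per-permutation maxima gives a family-wise critical value.
    rng = np.random.default_rng(seed)
    y0 = np.asarray(pheno, dtype=float)
    X = np.asarray(meth, dtype=float)
    n = len(y0)
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each probe
    maxima = np.empty(n_perm)
    for b in range(n_perm):
        y = rng.permutation(y0)
        yc = (y - y.mean()) / y.std()
        r = Xc.T @ yc / n                       # per-probe correlations
        maxima[b] = np.abs(r).max()
    return float(np.quantile(maxima, 1 - alpha))
```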

5.
The current era of targeted treatment has accelerated the interest in studying gene‐treatment, gene‐gene, and gene‐environment interactions using statistical models in the health sciences. Interactions are incorporated into models as product terms of risk factors. The statistical significance of interactions is traditionally examined using a likelihood ratio test (LRT). Epidemiological and clinical studies also evaluate interactions in order to understand the prognostic and predictive values of genetic factors. However, it is not clear how different types and magnitudes of interaction effects are related to prognostic and predictive values. The contribution of interaction to prognostic values can be examined via improvements in the area under the receiver operating characteristic curve due to the inclusion of interaction terms in the model (). We develop a resampling‐based approach to test the significance of this improvement and show that it is equivalent to the LRT. Predictive values provide insights into whether carriers of genetic factors benefit from specific treatment or preventive interventions relative to noncarriers, under some definition of treatment benefit. However, there is no unique definition of the term treatment benefit. We show that and relative excess risk due to interaction (RERI) measure predictive values under two specific definitions of treatment benefit. We investigate the properties of the LRT, , and RERI using simulations. We illustrate these approaches using published melanoma data, with the goal of evaluating possible interventions on sun exposure in relation to the MC1R gene.

6.
It is hypothesized that certain alleles can have a protective effect not only when inherited by the offspring but also as noninherited maternal antigens (NIMA). To estimate the NIMA effect, large samples of families are needed. When large samples are not available, we propose a combined approach to estimate the NIMA effect from ascertained nuclear families and twin pairs. We develop a likelihood‐based approach allowing for several ascertainment schemes, to accommodate the outcome‐dependent sampling scheme, and a family‐specific random term, to take into account the correlation between family members. We estimate the parameters using maximum likelihood based on the combined joint likelihood () approach. Simulations show that the combined approach is more efficient for estimating the NIMA odds ratios than a families‐only approach. To illustrate our approach, we used data from a family and a twin study from the United Kingdom on rheumatoid arthritis, and confirmed the protective NIMA effect, with an odds ratio of 0.477 (95% CI 0.264–0.864).

7.

1 Background

Epistasis and gene‐environment interactions are known to contribute significantly to variation of complex phenotypes in model organisms. However, their identification in human association studies remains challenging for myriad reasons. In the case of epistatic interactions, the large number of potential interacting sets of genes presents computational, multiple hypothesis correction, and other statistical power issues. In the case of gene‐environment interactions, the lack of consistently measured environmental covariates in most disease studies precludes searching for interactions and creates difficulties for replicating studies.

2 Results

In this work, we develop a new statistical approach to address these issues that leverages genetic ancestry, defined as the proportion of ancestry derived from each ancestral population (e.g., the fraction of European/African ancestry in African Americans), in admixed populations. We applied our method to gene expression and methylation data from African American and Latino admixed individuals, respectively, identifying nine interactions that were significant at . We show that two of the interactions in methylation data replicate, and the remaining six are significantly enriched for low P‐values ().

3 Conclusion

We show that genetic ancestry can be a useful proxy for unknown and unmeasured covariates in the search for interaction effects. These results have important implications for our understanding of the genetic architecture of complex traits.

8.
Many important complex diseases are composed of a series of phenotypes, which makes disease diagnosis and its genetic dissection difficult. The standard procedures to determine heritability in such complex diseases are either applied to single‐phenotype analyses, used to compare findings across phenotypes, or rely on multidimensional reduction procedures such as principal components analysis of all phenotypes. However, each method has its own problems, and the challenges are even greater for extended family data and categorical phenotypes. In this paper, we propose a methodology to determine a scale for complex outcomes involving multiple categorical phenotypes in extended pedigrees using item response theory (IRT) models that take all categorical phenotypes into account, allowing informative comparison among individuals. An advantage of the IRT framework is that a straightforward joint heritability parameter can be estimated for categorical phenotypes. Furthermore, our methodology allows many possible extensions, such as the inclusion of covariates and multiple variance components. We use a Markov chain Monte Carlo algorithm for the parameter estimation and validate our method through simulated data. As an application we consider the metabolic syndrome as the multiple‐phenotype disease, using data from the Baependi Heart Study consisting of 1,696 individuals in 95 families. We fit IRT models both without covariates and with age and age squared as covariates. The results showed that adjusting for covariates yields a higher joint heritability () than without covariates (), indicating that the covariates absorbed some of the error variance.

9.
The sequence kernel association test (SKAT) is widely used to test for associations between a phenotype and a set of genetic variants that are usually rare. Evaluating tail probabilities or quantiles of the null distribution for SKAT requires computing the eigenvalues of a matrix related to the genotype covariance between markers. Extracting the full set of eigenvalues of this matrix (an n × n matrix, for n subjects) has computational complexity proportional to n³. As SKAT is often used when n is large, this step becomes a major bottleneck in its use in practice. We therefore propose fastSKAT, a new computationally inexpensive but accurate approximation to the tail probabilities, in which the k largest eigenvalues of a weighted genotype covariance matrix or the largest singular values of a weighted genotype matrix are extracted, and a single term based on the Satterthwaite approximation is used for the remaining eigenvalues. While the method is not particularly sensitive to the choice of k, we also describe how to choose its value, and show how fastSKAT can automatically alert users to the rare cases where the choice may affect results. As well as providing a faster implementation of SKAT, the new method also enables entirely new applications of SKAT that were not possible before; we give examples grouping variants by topologically associating domains, and comparing chromosome‐wide association by class of histone marker.
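The remainder-collapsing step can be sketched as follows: keep the k largest eigenvalues of the mixture Q = Σᵢ λᵢ χ²₁ exactly, and match the remaining eigenvalues to a single scaled chi-square a·χ²_d by equating first and second moments (the Satterthwaite idea). For simplicity this sketch evaluates the top-k part by Monte Carlo, whereas fastSKAT uses exact mixture-of-chi-square computations; function names and defaults are illustrative.

```python
import numpy as np

def tail_prob(q, lam, k=10, n_mc=200_000, seed=0):
    # Approximate P(Q > q) for Q = sum_i lam_i * chi2_1.
    rng = np.random.default_rng(seed)
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]
    top, rest = lam[:k], lam[k:]
    # Top-k eigenvalues handled directly (Monte Carlo here for brevity).
    draws = rng.chisquare(1, size=(n_mc, top.size)) @ top
    # Remainder: match mean (a*d = s1) and variance (2*a^2*d = 2*s2).
    s1, s2 = rest.sum(), (rest ** 2).sum()
    if s1 > 0:
        a, d = s2 / s1, s1 ** 2 / s2
        draws = draws + a * rng.chisquare(d, size=n_mc)
    return float(np.mean(draws > q))
```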

10.
In genome‐wide association studies (GWAS), “generalization” is the replication of a genotype‐phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family‐wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow‐up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two‐stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow‐up studies. We develop the directional generalization FWER (FWERg) and FDR (FDRg) controlling r‐values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of single nucleotide polymorphism (SNP)–trait associations. Our methods control FWERg or FDRg under various SNP selection rules based on P‐values in the discovery study. We find that it is often beneficial to use a more lenient P‐value threshold than the genome‐wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P‐values (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P‐values (89 regions), we generalized SNPs from 27 regions.

11.
Assessing the magnitude of heterogeneity in a meta‐analysis is important for determining the appropriateness of combining results. The most popular measure of heterogeneity, I2, was derived under an assumption of homogeneity of the within‐study variances, which is almost never true, and the alternative estimator, , uses the harmonic mean to estimate the average of the within‐study variances, which may also lead to bias. This paper thus presents a new measure for quantifying the extent to which the variance of the pooled random‐effects estimator is due to between‐studies variation, , that overcomes the limitations of the previous approach. We show that this measure estimates the expected value of the proportion of total variance due to between‐studies variation and we present its point and interval estimators. The performance of all three heterogeneity measures is evaluated in an extensive simulation study. A negative bias for was observed when the number of studies was very small and became negligible as the number of studies increased, while and I2 showed a tendency to overestimate the impact of heterogeneity. The coverage of confidence intervals based upon was good across different simulation scenarios but was substantially lower for and I2, especially for high values of heterogeneity and when a large number of studies were included in the meta‐analysis. The proposed measure is implemented in a user‐friendly function available for routine use in R and SAS. It will be useful in quantifying the magnitude of heterogeneity in meta‐analysis and should supplement the P‐value for the test of heterogeneity obtained from the Q test. Copyright © 2016 John Wiley & Sons, Ltd.
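For orientation, the classical I2 statistic criticized above is computed from Cochran's Q; the paper's proposed measure is different and is not reproduced here. A minimal sketch:

```python
import numpy as np

def i_squared(effects, variances):
    # Classical I^2: proportion of total variation across study estimates
    # attributable to between-study heterogeneity rather than sampling
    # error, derived from Cochran's Q statistic.
    y = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)   # inverse-variance weights
    ybar = (w * y).sum() / w.sum()                 # fixed-effect pooled mean
    Q = (w * (y - ybar) ** 2).sum()                # Cochran's Q
    df = len(y) - 1
    return max(0.0, (Q - df) / Q) if Q > 0 else 0.0
```

Note that the inverse-variance weights treat the reported within-study variances as known; the homogeneity assumption the paper objects to enters through how those variances feed into the derivation of I2.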

12.
Precise measurement of sedentary behavior and physical activity is necessary to characterize the dose‐response relationship between these variables and health outcomes. The most frequently used methods employ portable devices to measure mechanical or physiological parameters (e.g., pedometers, heart rate monitors, accelerometers). There is considerable variability in the accuracy of total energy expenditure (TEE) estimates from these devices. This review examines the potential of measurement of ventilation () to provide an estimate of free‐living TEE. The existence of a linear relationship between ventilation and energy expenditure (EE) was demonstrated in the mid‐20th century. However, few studies have investigated this parameter as an estimate of EE, due to the cumbersome equipment required to measure it. Portable systems that measure ventilation without the use of a mouthpiece have existed for about 20 years (respiratory inductive plethysmography). However, these devices are adapted for clinical monitoring and are too cumbersome to be used in conditions of daily life. Technological innovations of recent years (small electromagnetic coils glued on the chest/back) suggest that ventilation could be estimated from variations in rib cage and abdominal distances. This method of TEE estimation is based on the development of individual/group calibration curves to predict the relationship between ventilation and oxygen consumption. The new method provides a reasonably accurate estimate of TEE in different free‐living conditions such as sitting, standing, and walking. Further work is required to integrate these electromagnetic coils into a jacket or T‐shirt to create a wearable device suitable for long‐term use in free‐living conditions.

13.
Recent studies have examined the genetic correlations of single-nucleotide polymorphism (SNP) effect sizes across pairs of populations to better understand the genetic architectures of complex traits. These studies have estimated , the cross-population correlation of joint-fit effect sizes at genotyped SNPs. However, the value of depends both on the cross-population correlation of true causal effect sizes () and on the similarity in linkage disequilibrium (LD) patterns in the two populations, which drive tagging effects. Here, we derive the value of the ratio as a function of LD in each population. By applying existing methods to obtain estimates of , we can use this ratio to estimate . Our estimates of were equal to 0.55 (SE = 0.14) between Europeans and East Asians averaged across nine traits in the Genetic Epidemiology Research on Adult Health and Aging data set, 0.54 (SE = 0.18) between Europeans and South Asians averaged across 13 traits in the UK Biobank data set, and 0.48 (SE = 0.06) and 0.65 (SE = 0.09) between Europeans and East Asians in summary statistic data sets for type 2 diabetes and rheumatoid arthritis, respectively. These results implicate substantially different causal genetic architectures across continental populations.

14.
Genome‐wide association studies, which typically report regression coefficients summarizing the associations of many genetic variants with various traits, are potentially a powerful source of data for Mendelian randomization investigations. We demonstrate how such coefficients from multiple variants can be combined in a Mendelian randomization analysis to estimate the causal effect of a risk factor on an outcome. The bias and efficiency of estimates based on summarized data are compared to those based on individual‐level data in simulation studies. We investigate the impact of gene–gene interactions, linkage disequilibrium, and ‘weak instruments’ on these estimates. Both an inverse‐variance weighted average of variant‐specific associations and a likelihood‐based approach for summarized data give similar estimates and precision to the two‐stage least squares method for individual‐level data, even when there are gene–gene interactions. However, these summarized data methods overstate precision when variants are in linkage disequilibrium. If the P‐value in a linear regression of the risk factor for each variant is less than , then weak instrument bias will be small. We use these methods to estimate the causal association of low‐density lipoprotein cholesterol (LDL‐C) on coronary artery disease using published data on five genetic variants. A 30% reduction in LDL‐C is estimated to reduce coronary artery disease risk by 67% (95% CI: 54% to 76%). We conclude that Mendelian randomization investigations using summarized data from uncorrelated variants are similarly efficient to those using individual‐level data, although the necessary assumptions cannot be so fully assessed.
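The inverse-variance weighted (IVW) combination of variant-specific associations mentioned above can be sketched as follows, assuming uncorrelated variants and, as is standard for IVW, ignoring uncertainty in the variant-exposure associations. The function name and interface are illustrative.

```python
import numpy as np

def ivw_estimate(beta_x, beta_y, se_y):
    # beta_x: per-variant associations with the risk factor (exposure);
    # beta_y: per-variant associations with the outcome;
    # se_y:   standard errors of the outcome associations.
    # The causal effect is a weighted average of per-variant ratio
    # estimates beta_y / beta_x, with inverse-variance weights.
    bx = np.asarray(beta_x, dtype=float)
    by = np.asarray(beta_y, dtype=float)
    se = np.asarray(se_y, dtype=float)
    w = bx ** 2 / se ** 2                 # weights for the ratio estimates
    est = (w * (by / bx)).sum() / w.sum()
    return est, float(np.sqrt(1.0 / w.sum()))
```

When variants are in linkage disequilibrium the weights are no longer independent and, as the abstract notes, this estimator overstates precision; a correlation-aware version would replace the weighted sum with a GLS step using the variant correlation matrix.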

15.
Advances in DNA sequencing technology facilitate investigating the impact of rare variants on complex diseases. However, using a conventional case‐control design, large samples are needed to capture enough rare variants to achieve sufficient power for testing the association between suspected loci and complex diseases. In such large samples, population stratification may easily cause spurious signals. One approach to overcome stratification is to use a family‐based design. For rare variants, this strategy is especially appropriate, as power can be increased considerably by analyzing cases with affected relatives. We propose a novel framework for association testing in affected sibpairs by comparing the allele count of rare variants on chromosome regions shared identical by descent to the allele count of rare variants on nonshared chromosome regions, referred to as the test for rare variant association with family‐based internal control (TRAFIC). This design is generally robust to population stratification as cases and controls are matched within each sibpair. We evaluate the power analytically using a general model for the effect size of rare variants. For the same number of genotyped people, TRAFIC shows superior power over the conventional case‐control study for variants with summed risk allele frequency ; this power advantage is even more substantial when considering allelic heterogeneity. For complex models of gene‐gene interaction, this power advantage depends on the direction of interaction and overall heritability. In sum, we introduce a new method for analyzing rare variants in affected sibpairs that is robust to population stratification, and provide freely available software.

16.
When comparing two treatment groups in a time‐to‐event analysis, it is common to use a composite event consisting of two or more distinct outcomes. The goal of this paper is to develop a statistical methodology to derive efficiency guidelines for deciding whether to expand a study primary endpoint from (for example, non‐fatal myocardial infarction and cardiovascular death) to the composite of and (for example, non‐fatal myocardial infarction, cardiovascular death or revascularisation). We investigate this problem by considering the asymptotic relative efficiency of a log‐rank test for comparing treatment groups with respect to a primary relevant endpoint versus the composite primary endpoint, say , of and , where is some additional endpoint. Copyright © 2012 John Wiley & Sons, Ltd.

17.
Case‐control association studies often collect from their subjects information on secondary phenotypes. Reusing the data and studying the association between genes and secondary phenotypes provide an attractive and cost‐effective approach that can lead to discovery of new genetic associations. A number of approaches have been proposed, including simple and computationally efficient ad hoc methods that ignore ascertainment or stratify on case‐control status. Justification for these approaches relies on the assumption of no covariates and the correct specification of the primary disease model as a logistic model. Both might not be true in practice, for example, in the presence of population stratification or when the primary disease model follows a probit model. In this paper, we investigate the validity of ad hoc methods in the presence of covariates and possible disease model misspecification. We show that in taking an ad hoc approach, it may be desirable to include covariates that affect the primary disease in the secondary phenotype model, even though these covariates are not necessarily associated with the secondary phenotype. We also show that when the disease is rare, ad hoc methods can lead to severely biased estimation and inference if the true disease model follows a probit model instead of a logistic model. Our results are justified theoretically and via simulations. Applied to real data analysis of genetic associations with cigarette smoking, the ad hoc methods collectively identified as highly significant () single nucleotide polymorphisms from over 10 genes that had been identified in previous studies of smoking cessation.

18.
There has been increasing interest in developing more powerful and flexible statistical tests to detect genetic associations with multiple traits, as arising from neuroimaging genetic studies. Most existing methods treat a single trait or multiple traits as the response while treating an SNP as a predictor coded under an additive inheritance mode. In this paper, we follow an earlier approach in treating an SNP as an ordinal response while treating traits as predictors in a proportional odds model (POM). In this way, it is not only easier to handle mixed types of traits, e.g., some quantitative and some binary, but it is also potentially more robust to the commonly adopted additive inheritance mode. More importantly, we develop an adaptive test in a POM so that it can maintain high power across many possible situations. Compared to the existing methods treating multiple traits as responses, e.g., in a generalized estimating equation (GEE) approach, the proposed method can be applied to a high‐dimensional setting where the number of phenotypes (p) can be larger than the sample size (n), in addition to the usual small‐p setting. The promising performance of the proposed method was demonstrated with applications to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data, in which either structural MRI driven phenotypes or resting‐state functional MRI (rs‐fMRI) derived brain functional connectivity measures were used as phenotypes. The applications led to the identification of several top SNPs of biological interest. Furthermore, simulation studies showed competitive performance of the new method, especially for .

19.
Bayes factors (BFs) are becoming increasingly important tools in genetic association studies, partly because they provide a natural framework for including prior information. The Wakefield BF (WBF) approximation is easy to calculate and assumes a normal prior on the log odds ratio (logOR) with a mean of zero. However, the prior variance (W) must be specified. Because of the potentially high sensitivity of the WBF to the choice of W, we propose several new BF approximations with , but allow W to take a probability distribution rather than a fixed value. We provide several prior distributions for W which lead to BFs that can be calculated easily in freely available software packages. These priors allow a wide range of densities for W and provide considerable flexibility. We examine some properties of the priors and BFs and show how to determine the most appropriate prior based on elicited quantiles of the prior odds ratio (OR). We show by simulation that our novel BFs have superior true‐positive rates at low false‐positive rates compared to those from both P‐value and WBF analyses across a range of sample sizes and ORs. We give an example of utilizing our BFs to fine‐map the CASP8 region using genotype data on approximately 46,000 breast cancer case and 43,000 healthy control samples from the Collaborative Oncological Gene‐environment Study (COGS) Consortium, and compare the single‐nucleotide polymorphism ranks to those obtained using WBFs and P‐values from univariate logistic regression.

20.
When evaluating a newly developed statistical test, an important step is to check its type 1 error (T1E) control using simulations. This is often achieved by the standard simulation design S0 under the so-called “theoretical” null of no association. In practice, whole-genome association analyses scan through a large number of genetic markers (s) for the ones associated with an outcome of interest (), where comes from an alternative while the majority of s are not associated with ; the relationships are under the “empirical” null. This reality can be better represented by two other simulation designs, where design S1.1 simulates from an alternative model based on , then evaluates its association with independently generated ; while design S1.2 evaluates the association between permuted and . More than a decade ago, Efron (2004) noted the important distinction between the “theoretical” and “empirical” null in false discovery rate control. Using scale tests for variance heterogeneity, direct univariate, and multivariate interaction tests as examples, here we show that not all null simulation designs are equal. In examining the accuracy of a likelihood ratio test, while simulation design S0 suggested the method was accurate, designs S1.1 and S1.2 revealed its increased empirical T1E rate if applied in a real‐data setting. The inflation becomes more severe at the tail and does not diminish as sample size increases. This is an important observation that calls for new practices for methods evaluation and T1E control interpretation.
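The three designs can be sketched for a quantitative trait with additively coded markers. The design labels S0, S1.1, and S1.2 follow the abstract; the linear trait model, effect size, and allele frequency are assumptions made for illustration only.

```python
import numpy as np

def null_replicate(design, n=1000, beta=0.5, seed=0):
    # Generate one null (x, y) replicate, where x is the tested marker.
    rng = np.random.default_rng(seed)
    x = rng.binomial(2, 0.3, n)                # tested (null) marker
    if design == "S0":
        y = rng.normal(size=n)                 # theoretical null: pure noise
    elif design == "S1.1":
        g = rng.binomial(2, 0.3, n)            # a causal marker elsewhere
        y = beta * g + rng.normal(size=n)      # y from an alternative model,
                                               # x generated independently
    elif design == "S1.2":
        y = beta * x + rng.normal(size=n)      # y depends on x ...
        x = rng.permutation(x)                 # ... then x is permuted
    else:
        raise ValueError(f"unknown design {design!r}")
    return x, y
```

Running a candidate test on many replicates from each design and comparing the empirical rejection rates at a nominal level is then the comparison the abstract describes: a test can look well calibrated under S0 yet inflate under S1.1 or S1.2.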
