Similar Literature
Found 20 similar articles (search time: 57 ms)
1.
Pathway analysis can complement point‐wise single nucleotide polymorphism (SNP) analysis in exploring genomewide association study (GWAS) data to identify specific disease‐associated genes that can be candidate causal genes. We propose a straightforward methodology that can be used for conducting a gene‐based pathway analysis using summary GWAS statistics in combination with widely available reference genotype data. We used this method to perform a gene‐based pathway analysis of a type 1 diabetes (T1D) meta‐analysis GWAS (of 7,514 cases and 9,045 controls). An important feature of the conducted analysis is the removal of the major histocompatibility complex gene region, the major genetic risk factor for T1D. Thirty‐one of the 1,583 (2%) tested pathways were identified to be enriched for association with T1D at a 5% false discovery rate. We analyzed these 31 pathways and their genes to identify SNPs in or near these pathway genes that showed potentially novel association with T1D and attempted to replicate the association of 22 SNPs in additional samples. Replication P‐values were skewed () with 12 of the 22 SNPs showing . Support, including replication evidence, was obtained for nine T1D associated variants in genes ITGB7 (rs11170466, ), NRP1 (rs722988, ), BAD (rs694739, ), CTSB (rs1296023, ), FYN (rs11964650, ), UBE2G1 (rs9906760, ), MAP3K14 (rs17759555, ), ITGB1 (rs1557150, ), and IL7R (rs1445898, ). The proposed methodology can be applied to other GWAS datasets for which only summary level data are available.  相似文献   

2.
Investigators often meta‐analyze multiple genome‐wide association studies (GWASs) to increase the power to detect associations of single nucleotide polymorphisms (SNPs) with a trait. Meta‐analysis is also performed within a single cohort that is stratified by, e.g., sex or ancestry group. Having correlated individuals among the strata may complicate meta‐analyses, limit power, and inflate Type 1 error. For example, in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), sources of correlation include genetic relatedness, shared household, and shared community. We propose a novel mixed‐effect model for meta‐analysis, “MetaCor,” which accounts for correlation between stratum‐specific effect estimates. Simulations show that MetaCor controls inflation better than alternatives such as ignoring the correlation between the strata or analyzing all strata together in a “pooled” GWAS, especially with different minor allele frequencies (MAFs) between strata. We illustrate the benefits of MetaCor on two GWASs in the HCHS/SOL. Analysis of dental caries (tooth decay) stratified by ancestry group detected a genome‐wide significant SNP (rs7791001, P‐value = , compared to in pooled), with different MAFs between strata. Stratified analysis of body mass index (BMI) by ancestry group and sex reduced overall inflation from (pooled) to (MetaCor). Furthermore, even after removing close relatives to obtain nearly uncorrelated strata, a naïve stratified analysis resulted in compared to for MetaCor.  相似文献   

3.
Lu Q  Wei C  Ye C  Li M  Elston RC 《Genetic epidemiology》2012,36(6):583-593
The potential importance of the joint action of genes, whether modeled with or without a statistical interaction term, has long been recognized. However, identifying such action has been a great challenge, especially when millions of genetic markers are involved. We propose a likelihood ratio‐based Mann‐Whitney test to search for joint gene action either among candidate genes or genome‐wide. It extends the traditional univariate Mann‐Whitney test to assess the joint association of genotypes at multiple loci with disease, allowing for high‐order statistical interactions. Because only one overall significance test is conducted for the entire analysis, it avoids the issue of multiple testing. Moreover, the approach adopts a computationally efficient algorithm, making a genome‐wide search feasible in a reasonable amount of time on a high performance personal computer. We evaluated the approach using both theoretical and real data. By applying the approach to 40 type 2 diabetes (T2D) susceptibility single‐nucleotide polymorphisms (SNPs), we identified a four‐locus model strongly associated with T2D in the Wellcome Trust (WT) study (permutation P‐value < 0.001), and replicated the same finding in the Nurses’ Health Study/Health Professionals Follow‐Up Study (NHS/HPFS) (P‐value = ). We also conducted a genome‐wide search on 385,598 SNPs in the WT study. The analysis took approximately 55 hr on a personal computer, identifying the same first two loci, but overall a different set of four SNPs, jointly associated with T2D (P‐value = ). The nominal significance of this same association reached in the NHS/HPFS. Genet. Epidemiol. 00:1‐11, 2012. © 2012 Wiley Periodicals, Inc.  相似文献   

4.
Epigenome‐wide association studies (EWAS) are designed to characterise population‐level epigenetic differences across the genome and link them to disease. Most commonly, they assess DNA‐methylation status at cytosine‐guanine dinucleotide (CpG) sites, using platforms such as the Illumina 450k array that profile a subset of CpGs genome wide. An important challenge in the context of EWAS is determining a significance threshold for declaring a CpG site as differentially methylated, taking multiple testing into account. We used a permutation method to estimate a significance threshold specifically for the 450k array and a simulation extrapolation approach to estimate a genome‐wide threshold. These methods were applied to five different EWAS datasets derived from a variety of populations and tissue types. We obtained an estimate of for the 450k array, and a genome‐wide estimate of . We further demonstrate the importance of these results by showing that previously recommended sample sizes for EWAS should be adjusted upwards, requiring samples between ~10% and ~20% larger in order to maintain type‐1 errors at the desired level.  相似文献   

5.
The current era of targeted treatment has accelerated the interest in studying gene‐treatment, gene‐gene, and gene‐environment interactions using statistical models in the health sciences. Interactions are incorporated into models as product terms of risk factors. The statistical significance of interactions is traditionally examined using a likelihood ratio test (LRT). Epidemiological and clinical studies also evaluate interactions in order to understand the prognostic and predictive values of genetic factors. However, it is not clear how different types and magnitudes of interaction effects are related to prognostic and predictive values. The contribution of interaction to prognostic values can be examined via improvements in the area under the receiver operating characteristic curve due to the inclusion of interaction terms in the model (). We develop a resampling based approach to test the significance of this improvement and show that it is equivalent to LRT. Predictive values provide insights into whether carriers of genetic factors benefit from specific treatment or preventive interventions relative to noncarriers, under some definition of treatment benefit. However, there is no unique definition of the term treatment benefit. We show that and relative excess risk due to interaction (RERI) measure predictive values under two specific definitions of treatment benefit. We investigate the properties of LRT, , and RERI using simulations. We illustrate these approaches using published melanoma data to understand the benefits of possible intervention on sun exposure in relation to the MC1R gene. The goal is to evaluate possible interventions on sun exposure in relation to MC1R.  相似文献   
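The abstract above names RERI without stating it. As a minimal sketch, assuming the standard additive-scale definition in terms of relative risks (the function and argument names are illustrative and do not come from the paper):

```python
def reri(rr11, rr10, rr01):
    """Relative excess risk due to interaction on the additive scale.

    rr11: relative risk with both factors present
    rr10: relative risk with only the first factor present
    rr01: relative risk with only the second factor present
    All three are relative to the group with neither factor.
    """
    return rr11 - rr10 - rr01 + 1.0


# Made-up relative risks, for illustration only
value = reri(3.0, 1.5, 2.0)
print(value)  # 0.5
```

A value of 0 corresponds to no departure from additivity of the two risk factors; a positive value indicates more joint risk than the separate effects would add up to.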

6.
In genome‐wide association studies (GWAS), “generalization” is the replication of genotype‐phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family‐wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow‐up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two‐stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow‐up studies. We develop the directional generalization FWER (FWERg) and FDR (FDRg) controlling r‐values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of Single Nucleotide Polymorphism‐(SNP)‐trait associations. Our methods control FWERg or FDRg under various SNP selection rules based on P‐values in the discovery study. We find that it is often beneficial to use a more lenient P‐value threshold than the genome‐wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P‐values (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P‐values (89 regions), we generalized SNPs from 27 regions.  相似文献   

7.
The sequence kernel association test (SKAT) is widely used to test for associations between a phenotype and a set of genetic variants that are usually rare. Evaluating tail probabilities or quantiles of the null distribution for SKAT requires computing the eigenvalues of a matrix related to the genotype covariance between markers. Extracting the full set of eigenvalues of this matrix (an n × n matrix, for n subjects) has computational complexity proportional to n³. As SKAT is often used when , this step becomes a major bottleneck in its use in practice. We therefore propose fastSKAT, a new computationally inexpensive but accurate approximation to the tail probabilities, in which the k largest eigenvalues of a weighted genotype covariance matrix or the largest singular values of a weighted genotype matrix are extracted, and a single term based on the Satterthwaite approximation is used for the remaining eigenvalues. While the method is not particularly sensitive to the choice of k, we also describe how to choose its value, and show how fastSKAT can automatically alert users to the rare cases where the choice may affect results. As well as providing a faster implementation of SKAT, the new method also enables entirely new applications of SKAT that were not possible before; we give examples grouping variants by topologically associating domains, and comparing chromosome‐wide association by class of histone marker.
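The Satterthwaite idea mentioned above can be sketched as moment matching: a weighted sum of independent 1-df chi-square variables with weights λ_i is replaced by a single scaled chi-square, a·χ²_ν, with the same mean and variance. The helper below is hypothetical and takes an explicit list of eigenvalues purely for illustration; how fastSKAT obtains the required sums without computing the remaining eigenvalues is not described in the abstract.

```python
def satterthwaite_params(eigenvalues):
    """Match the mean and variance of sum_i lambda_i * chi2_1.

    The sum has mean sum(lambda) and variance 2 * sum(lambda^2).
    A scaled chi-square a * chi2_nu has mean a * nu and variance
    2 * a^2 * nu, so a = sum(lambda^2) / sum(lambda) and
    nu = sum(lambda)^2 / sum(lambda^2).
    """
    s1 = sum(eigenvalues)
    s2 = sum(lam * lam for lam in eigenvalues)
    scale = s2 / s1
    df = s1 * s1 / s2
    return scale, df


# Made-up "remaining" eigenvalues, for illustration only
scale, df = satterthwaite_params([2.0, 2.0])
print(scale, df)  # 2.0 2.0
```

When all weights are equal the match is exact, as in the example; with unequal weights it is only an approximation, which is presumably why the abstract reserves it for the smaller eigenvalues.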

8.
It is hypothesized that certain alleles can have a protective effect not only when inherited by the offspring but also as noninherited maternal antigens (NIMA). To estimate the NIMA effect, large samples of families are needed. When large samples are not available, we propose a combined approach to estimate the NIMA effect from ascertained nuclear families and twin pairs. We develop a likelihood‐based approach allowing for several ascertainment schemes, to accommodate for the outcome‐dependent sampling scheme, and a family‐specific random term, to take into account the correlation between family members. We estimate the parameters using maximum likelihood based on the combined joint likelihood () approach. Simulations show that the is more efficient for estimating the NIMA odds ratios as compared to a families‐only approach. To illustrate our approach, we used data from a family and a twin study from the United Kingdom on rheumatoid arthritis, and confirmed the protective NIMA effect, with an odds ratio of 0.477 (95% CI 0.264–0.864).  相似文献   

9.
Many important complex diseases are composed of a series of phenotypes, which makes the disease diagnosis and its genetic dissection difficult. The standard procedures to determine heritability in such complex diseases are either applied for single phenotype analyses or to compare findings across phenotypes or multidimensional reduction procedures, such as principal components analysis using all phenotypes. However, each method has its own problems, and the challenges are even more complex for extended family data and categorical phenotypes. In this paper, we propose a methodology to determine a scale for complex outcomes involving multiple categorical phenotypes in extended pedigrees using item response theory (IRT) models that take all categorical phenotypes into account, allowing informative comparison among individuals. An advantage of the IRT framework is that a straightforward joint heritability parameter can be estimated for categorical phenotypes. Furthermore, our methodology allows many possible extensions such as the inclusion of covariates and multiple variance components. We use a Markov Chain Monte Carlo algorithm for the parameter estimation and validate our method through simulated data. As an application we consider the metabolic syndrome as the multiple phenotype disease using data from the Baependi Heart Study consisting of 1,696 individuals in 95 families. We fit IRT models without covariates and with age and age squared as covariates. The results showed that adjusting for covariates yields a higher joint heritability () than without covariates (), indicating that the covariates absorbed some of the error variance.

10.
Genome‐wide association studies allow detection of non‐genotyped disease‐causing variants through testing of nearby genotyped SNPs. This approach may fail when there are no genotyped SNPs in strong LD with the causal variant. Several genotyped SNPs in weak LD with the causal variant may, however, when considered together, provide equivalent information. This observation motivates popular but computationally intensive approaches based on imputation or haplotyping. Here we present a new method and accompanying software designed for this scenario. Our approach proceeds by selecting, for each genotyped “anchor” SNP, a nearby genotyped “partner” SNP, chosen via a specific algorithm we have developed. These two SNPs are used as predictors in linear or logistic regression analysis to generate a final significance test. In simulations, our method captures much of the signal captured by imputation, while taking a fraction of the time and disk space, and generating a smaller number of false‐positives. We apply our method to a case/control study of severe malaria genotyped using the Affymetrix 500K array. Previous analysis showed that fine‐scale sequencing of a Gambian reference panel in the region of the known causal locus, followed by imputation, increased the signal of association to genome‐wide significance levels. Our method also increases the signal of association from to . Our method thus, in some cases, eliminates the need for more complex methods such as sequencing and imputation, and provides a useful additional test that may be used to identify genetic regions of interest.

11.
When evaluating a newly developed statistical test, an important step is to check its type 1 error (T1E) control using simulations. This is often achieved by the standard simulation design S0 under the so-called “theoretical” null of no association. In practice, the whole-genome association analyses scan through a large number of genetic markers (s) for the ones associated with an outcome of interest (), where comes from an alternative while the majority of s are not associated with ; the relationships are under the “empirical” null. This reality can be better represented by two other simulation designs, where design S1.1 simulates from an alternative model based on , then evaluates its association with independently generated ; while design S1.2 evaluates the association between permuted and . More than a decade ago, Efron (2004) noted the important distinction between the “theoretical” and “empirical” null in false discovery rate control. Using scale tests for variance heterogeneity, direct univariate, and multivariate interaction tests as examples, here we show that not all null simulation designs are equal. In examining the accuracy of a likelihood ratio test, while simulation design S0 suggested that the method is accurate, designs S1.1 and S1.2 revealed its increased empirical T1E rate if applied in a real-data setting. The inflation becomes more severe at the tail and does not diminish as sample size increases. This is an important observation that calls for new practices for methods evaluation and T1E control interpretation.

12.
We develop linear mixed models (LMMs) and functional linear mixed models (FLMMs) for gene-based tests of association between a quantitative trait and genetic variants on pedigrees. The effects of a major gene are modeled as a fixed effect, the contributions of polygenes are modeled as a random effect, and the correlations of pedigree members are modeled via inbreeding/kinship coefficients. F-statistics and χ² likelihood ratio test (LRT) statistics based on the LMMs and FLMMs are constructed to test for association. We show empirically that the F-distributed statistics provide good control of the type I error rate. The F-test statistics of the LMMs have similar or higher power than the FLMMs, kernel-based famSKAT (family-based sequence kernel association test), and burden test famBT (family-based burden test). The F-statistics of the FLMMs perform well when analyzing a combination of rare and common variants. For small samples, the LRT statistics of the FLMMs control the type I error rate well at the nominal levels and . For moderate/large samples, the LRT statistics of the FLMMs control the type I error rates well. The LRT statistics of the LMMs can lead to inflated type I error rates. The proposed models are useful in whole genome and whole exome association studies of complex traits.

13.
Case‐control association studies often collect from their subjects information on secondary phenotypes. Reusing the data and studying the association between genes and secondary phenotypes provide an attractive and cost‐effective approach that can lead to discovery of new genetic associations. A number of approaches have been proposed, including simple and computationally efficient ad hoc methods that ignore ascertainment or stratify on case‐control status. Justification for these approaches relies on the assumption of no covariates and the correct specification of the primary disease model as a logistic model. Both might not be true in practice, for example, in the presence of population stratification or the primary disease model following a probit model. In this paper, we investigate the validity of ad hoc methods in the presence of covariates and possible disease model misspecification. We show that in taking an ad hoc approach, it may be desirable to include covariates that affect the primary disease in the secondary phenotype model, even though these covariates are not necessarily associated with the secondary phenotype. We also show that when the disease is rare, ad hoc methods can lead to severely biased estimation and inference if the true disease model follows a probit model instead of a logistic model. Our results are justified theoretically and via simulations. Applied to real data analysis of genetic associations with cigarette smoking, ad hoc methods collectively identified as highly significant () single nucleotide polymorphisms from over 10 genes, genes that were identified in previous studies of smoking cessation.  相似文献   

14.
There has been increasing interest in developing more powerful and flexible statistical tests to detect genetic associations with multiple traits, as arising from neuroimaging genetic studies. Most existing methods treat a single trait or multiple traits as response while treating an SNP as a predictor coded under an additive inheritance mode. In this paper, we follow an earlier approach in treating an SNP as an ordinal response while treating traits as predictors in a proportional odds model (POM). In this way, it is not only easier to handle mixed types of traits, e.g., some quantitative and some binary, but it is also potentially more robust to the commonly adopted additive inheritance mode. More importantly, we develop an adaptive test in a POM so that it can maintain high power across many possible situations. Compared to the existing methods treating multiple traits as responses, e.g., in a generalized estimating equation (GEE) approach, the proposed method can be applied to a high dimensional setting where the number of phenotypes (p) can be larger than the sample size (n), in addition to a usual small p setting. The promising performance of the proposed method was demonstrated with applications to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data, in which either structural MRI driven phenotypes or resting‐state functional MRI (rs‐fMRI) derived brain functional connectivity measures were used as phenotypes. The applications led to the identification of several top SNPs of biological interest. Furthermore, simulation studies showed competitive performance of the new method, especially for .

15.
Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, $H_0: \alpha\beta = 0$ ($\alpha$: effect of the exposure on the mediator after adjusting for confounders; $\beta$: effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we reviewed three classes of methods for large-scale one-at-a-time mediation hypothesis testing. These methods are commonly used for continuous outcomes and continuous mediators assuming there is no exposure-mediator interaction so that the product $\alpha\beta$ has a causal interpretation as the indirect effect. The first class of methods ignores the impact of different structures under the composite null hypothesis, namely, (1) $\alpha = 0, \beta \ne 0$; (2) $\alpha \ne 0, \beta = 0$; and (3) $\alpha = \beta = 0$. The second class of methods weights the reference distribution under each case of the null to form a mixture reference distribution. The third class constructs a composite test statistic using the three p values obtained under each case of the null so that the reference distribution of the composite statistic is approximately $U(0,1)$. In addition to these existing methods, we developed the Sobel-comp method belonging to the second class, which uses a corrected mixture reference distribution for Sobel's test statistic. We performed extensive simulation studies to compare all six methods belonging to these three classes in terms of the false positive rates (FPRs) under the null hypothesis and the true positive rates under the alternative hypothesis. We found that the second class of methods which uses a mixture reference distribution could best maintain the FPRs at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level using data from the Multi-Ethnic Study of Atherosclerosis (MESA). We provide guidelines for choosing the optimal mediation hypothesis testing method in practice and develop an R package medScan available on the CRAN for implementing all the six methods.
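For orientation, the classical Sobel statistic that the abstract builds on can be sketched as follows. This is the plain first-order Sobel test with a standard normal reference distribution, i.e. the kind of test that ignores the composite null. It is not the Sobel-comp correction, and it does not use the medScan package; the function and argument names are illustrative assumptions.

```python
import math


def sobel_test(alpha, se_alpha, beta, se_beta):
    """First-order Sobel z statistic and two-sided normal p value.

    alpha, beta: estimated exposure->mediator and mediator->outcome effects
    se_alpha, se_beta: their standard errors
    The statistic is undefined when both estimates are exactly zero.
    """
    se_product = math.sqrt(alpha**2 * se_beta**2 + beta**2 * se_alpha**2)
    z = alpha * beta / se_product
    # Two-sided tail area of N(0, 1): 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Made-up estimates, for illustration only
z, p = sobel_test(alpha=0.5, se_alpha=0.1, beta=0.4, se_beta=0.1)
```

Under case (3) of the null, $\alpha = \beta = 0$, the statistic is not standard normal, which is the motivation the abstract gives for using a mixture reference distribution instead.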

16.
Recent studies have examined the genetic correlations of single-nucleotide polymorphism (SNP) effect sizes across pairs of populations to better understand the genetic architectures of complex traits. These studies have estimated , the cross-population correlation of joint-fit effect sizes at genotyped SNPs. However, the value of depends both on the cross-population correlation of true causal effect sizes () and on the similarity in linkage disequilibrium (LD) patterns in the two populations, which drive tagging effects. Here, we derive the value of the ratio as a function of LD in each population. By applying existing methods to obtain estimates of , we can use this ratio to estimate . Our estimates of were equal to 0.55 ( SE = 0.14) between Europeans and East Asians averaged across nine traits in the Genetic Epidemiology Research on Adult Health and Aging data set, 0.54 ( SE = 0.18) between Europeans and South Asians averaged across 13 traits in the UK Biobank data set, and 0.48 ( SE = 0.06) and 0.65 ( SE = 0.09) between Europeans and East Asians in summary statistic data sets for type 2 diabetes and rheumatoid arthritis, respectively. These results implicate substantially different causal genetic architectures across continental populations.  相似文献   

17.
Genome‐wide association studies, which typically report regression coefficients summarizing the associations of many genetic variants with various traits, are potentially a powerful source of data for Mendelian randomization investigations. We demonstrate how such coefficients from multiple variants can be combined in a Mendelian randomization analysis to estimate the causal effect of a risk factor on an outcome. The bias and efficiency of estimates based on summarized data are compared to those based on individual‐level data in simulation studies. We investigate the impact of gene–gene interactions, linkage disequilibrium, and ‘weak instruments’ on these estimates. Both an inverse‐variance weighted average of variant‐specific associations and a likelihood‐based approach for summarized data give similar estimates and precision to the two‐stage least squares method for individual‐level data, even when there are gene–gene interactions. However, these summarized data methods overstate precision when variants are in linkage disequilibrium. If the P‐value in a linear regression of the risk factor for each variant is less than , then weak instrument bias will be small. We use these methods to estimate the causal association of low‐density lipoprotein cholesterol (LDL‐C) on coronary artery disease using published data on five genetic variants. A 30% reduction in LDL‐C is estimated to reduce coronary artery disease risk by 67% (95% CI: 54% to 76%). We conclude that Mendelian randomization investigations using summarized data from uncorrelated variants are similarly efficient to those using individual‐level data, although the necessary assumptions cannot be so fully assessed.  相似文献   
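The inverse‐variance weighted (IVW) combination described above can be sketched from summarized data alone. The sketch assumes uncorrelated variants and uses the usual fixed-effect form, in which each variant's ratio estimate is weighted by its approximate inverse variance; the function and argument names are illustrative, and the numbers are made up rather than taken from the paper.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Inverse-variance weighted causal estimate from summarized data.

    beta_x: per-variant associations with the risk factor
    beta_y: per-variant associations with the outcome
    se_y:   standard errors of the outcome associations
    Assumes the variants are uncorrelated (not in linkage disequilibrium).
    """
    num = sum(bx * by / s**2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx**2 / s**2 for bx, s in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)


# Two hypothetical variants
estimate, std_err = ivw_estimate([1.0, 2.0], [0.5, 1.0], [0.1, 0.1])
```

Because the standard error depends only on a sum over variants treated as independent, feeding in correlated variants would shrink it artificially, which matches the abstract's warning that summarized data methods overstate precision under linkage disequilibrium.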

18.
We propose a novel variant set test for rare-variant association studies, which leverages multiple single-nucleotide variant (SNV) annotations. Our approach optimizes a convex combination of different sequence kernel association test (SKAT) statistics, where each statistic is constructed from a different annotation and combination weights are optimized through a multiple kernel learning algorithm. The combination test statistic is evaluated empirically through data splitting. In simulations, we find our method preserves type I error at and has greater power than SKAT(-O) when SNV weights are not misspecified and sample sizes are large (). We utilize our method in the Framingham Heart Study (FHS) to identify SNV sets associated with fasting glucose. While we are unable to detect any genome-wide significant associations between fasting glucose and 4-kb windows of rare variants () in 6,419 FHS participants, our method identifies suggestive associations between fasting glucose and rare variants near ROCK2 () and within CPLX1 (). These two genes were previously reported to be involved in obesity-mediated insulin resistance and glucose-induced insulin secretion by pancreatic beta-cells, respectively. These findings will need to be replicated in other cohorts and validated by functional genomic studies.  相似文献   

19.
Advances in DNA sequencing technology facilitate investigating the impact of rare variants on complex diseases. However, using a conventional case‐control design, large samples are needed to capture enough rare variants to achieve sufficient power for testing the association between suspected loci and complex diseases. In such large samples, population stratification may easily cause spurious signals. One approach to overcome stratification is to use a family‐based design. For rare variants, this strategy is especially appropriate, as power can be increased considerably by analyzing cases with affected relatives. We propose a novel framework for association testing in affected sibpairs by comparing the allele count of rare variants on chromosome regions shared identical by descent to the allele count of rare variants on nonshared chromosome regions, referred to as test for rare variant association with family‐based internal control (TRAFIC). This design is generally robust to population stratification as cases and controls are matched within each sibpair. We evaluate the power analytically using a general model for the effect size of rare variants. For the same number of genotyped people, TRAFIC shows superior power over the conventional case‐control study for variants with summed risk allele frequency ; this power advantage is even more substantial when considering allelic heterogeneity. For complex models of gene‐gene interaction, this power advantage depends on the direction of interaction and overall heritability. In sum, we introduce a new method for analyzing rare variants in affected sibpairs that is robust to population stratification, and provide freely available software.

20.
Polygenic risk scores (PRSs) are weighted sums of risk allele counts of single-nucleotide polymorphisms (SNPs) associated with a disease or trait. PRSs are typically constructed based on published results from Genome-Wide Association Studies (GWASs), the majority of which have been performed in large populations of European ancestry (EA) individuals. Although many genotype-trait associations have generalized across populations, the optimal choice of SNPs and weights for PRSs may differ between populations due to different linkage disequilibrium (LD) and allele frequency patterns. We compare various approaches for PRS construction, using GWAS results from both large EA studies and a smaller study in Hispanics/Latinos: The Hispanic Community Health Study/Study of Latinos (HCHS/SOL, ). We consider multiple approaches for selecting SNPs and for computing SNP weights. We study the performance of the resulting PRSs in an independent study of Hispanics/Latinos from the Women’s Health Initiative (WHI, ). We support our investigation with simulation studies of potential genetic architectures in a single locus. We observed that selecting variants based on EA GWASs generally performs well, except for blood pressure traits. However, the use of EA GWASs for weight estimation was suboptimal. Using non-EA GWAS results to estimate weights improved results.
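The definition in the first sentence of this abstract, a weighted sum of risk allele counts, can be sketched in a few lines. The function name, genotypes, and weights below are hypothetical; in practice the weights would be effect estimates taken from a GWAS, and the abstract's point is that which GWAS they come from matters.

```python
def polygenic_risk_score(allele_counts, weights):
    """Weighted sum of risk-allele counts for one individual.

    allele_counts: number of risk alleles (0, 1, or 2) at each selected SNP
    weights:       one weight per SNP, e.g. a GWAS effect estimate
    """
    if len(allele_counts) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    return sum(w * g for g, w in zip(allele_counts, weights))


# One hypothetical individual genotyped at three SNPs, with made-up weights
score = polygenic_risk_score([0, 1, 2], [0.10, 0.25, 0.05])
```

Swapping in a different weight vector for the same SNP set, for instance weights estimated in a non-EA study, changes only the second argument, which is the comparison the abstract describes.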
