Similar articles
Found 20 similar articles (search time: 46 ms)
1.
2.
3.
4.
5.
Genome‐wide association studies (GWAS) for complex diseases have focused primarily on single‐trait analyses for disease status and disease‐related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL‐cholesterol, HDL‐cholesterol, and triglycerides (TGs) separately. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Recently several multivariate methods have been proposed that require individual‐level data. Here, we develop metaUSAT (where USAT is unified score‐based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Although the existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction or are powerful when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual‐level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P‐value for association and is computationally efficient for implementation at a genome‐wide level. Simulation experiments show that metaUSAT maintains proper type‐I error at low error levels. It has similar and sometimes greater power to detect association across a wide array of scenarios compared to existing methods, which are usually powerful for some specific association scenarios only. 
When applied to plasma lipids summary data from the METSIM and the T2D‐GENES studies, metaUSAT detected genome‐wide significant loci beyond the ones identified by univariate analyses. Evidence from larger studies suggests that the variants additionally detected by our test are, indeed, associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of a common disease or traits.
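
To make the idea of a summary-statistics multi-trait test concrete, here is a minimal sketch of a two-trait Wald-type test. It is not metaUSAT itself, only an illustration of the kind of quadratic-form test that multi-trait methods build on. The function name and the assumption that the null correlation `r` between the two z-scores is known are hypothetical choices made for this example.

```python
import math


def two_trait_wald_p(z1: float, z2: float, r: float) -> float:
    """P-value of a 2-df Wald-type test for one variant and two traits.

    z1, z2: per-trait z-scores taken from univariate GWAS summary statistics.
    r: correlation between the two z-scores under the null (assumed known
       here, e.g. estimated beforehand from genome-wide null variants).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # Quadratic form z' R^{-1} z for the correlation matrix R = [[1, r], [r, 1]].
    stat = (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)
    # A chi-square variable with 2 degrees of freedom has survival
    # function exp(-x / 2), so no special-function library is needed.
    return math.exp(-stat / 2.0)


# Same-direction versus opposite-direction effects on positively correlated traits.
p_same = two_trait_wald_p(2.0, 2.0, 0.5)
p_opposite = two_trait_wald_p(2.0, -2.0, 0.5)
```

With positively correlated traits, this statistic is larger when the two effects point in opposite directions, which is one reason a single test statistic rarely performs well across all association structures.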

6.
Several genome‐wide association studies (GWAS) have been published on various complex diseases. Although new loci have been found to be associated with these diseases, only very little of the genetic risk for these diseases can be explained. As GWAS are still underpowered to find small main effects, and gene‐gene interactions are likely to play a role, the data might currently not be analyzed to its full potential. In this study, we evaluated alternative methods to study GWAS data. Instead of focusing on the single nucleotide polymorphisms (SNPs) with the highest statistical significance, we took advantage of prior biological information and tried to detect overrepresented pathways in the GWAS data. We evaluated whether pathway classification analysis can help prioritize the biological pathways most likely to be involved in the disease etiology. In this study, we present the various benefits and limitations of pathway‐classification tools in analyzing GWAS data. We show multiple differences in outcome between pathway tools analyzing the same dataset. Furthermore, analyzing randomly selected SNPs always results in significantly overrepresented pathways, large pathways have a higher chance of becoming statistically significant, and the bioinformatics tools used in this study are biased toward detecting well‐defined pathways. As an example, we analyzed data from two GWAS on type 2 diabetes (T2D): the Diabetes Genetics Initiative (DGI) and the Wellcome Trust Case Control Consortium (WTCCC). Occasionally the results from the DGI and the WTCCC GWAS showed concordance in overrepresented pathways, but discordance in the corresponding genes. Thus, incorporating gene networks and pathway classification tools into the analysis can point toward significantly overrepresented molecular pathways, which cannot be picked up using traditional single‐locus analyses. However, the limitations discussed in this study need to be addressed before these methods can be widely used. Genet. Epidemiol. 33:419–431, 2009. © 2009 Wiley‐Liss, Inc.
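
As an illustration of what "overrepresented" means in this setting, the sketch below computes a one-sided hypergeometric P-value for a single pathway. This is a common choice for overrepresentation analysis, not necessarily the statistic used by the pathway tools evaluated in the abstract. The function name and argument names are hypothetical.

```python
from math import comb


def overrepresentation_p(n_genes: int, pathway_size: int,
                         n_hits: int, hits_in_pathway: int) -> float:
    """One-sided hypergeometric P(X >= hits_in_pathway).

    n_genes: number of genes in the background set.
    pathway_size: number of background genes annotated to the pathway.
    n_hits: number of genes selected from the GWAS (e.g. top-ranked genes).
    hits_in_pathway: number of selected genes that fall in the pathway.
    """
    upper = min(pathway_size, n_hits)
    total = comb(n_genes, n_hits)
    # Sum the probabilities of observing k pathway genes among the hits,
    # for every k at least as large as the observed count.
    tail = sum(
        comb(pathway_size, k) * comb(n_genes - pathway_size, n_hits - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 5 in the pathway, 2 hits, both in the pathway.
p_toy = overrepresentation_p(10, 5, 2, 2)
```

Because the test conditions on pathway size, it does not by itself explain the size bias reported in the abstract; that bias would have to be checked empirically, for instance by rerunning the analysis on randomly selected SNPs.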

7.
For analyzing complex trait association with sequencing data, most current studies test aggregated effects of variants in a gene or genomic region. Although gene‐based tests have insufficient power even for moderately sized samples, pathway‐based analyses combine information across multiple genes in biological pathways and may offer additional insight. However, most existing pathway association methods were originally designed for genome‐wide association studies and have not been comprehensively evaluated for sequencing data. Moreover, region‐based rare variant association methods, although potentially applicable to pathway‐based analysis by extending their region definition to gene sets, have never been rigorously tested. In the context of exome‐based studies, we use simulated and real datasets to evaluate pathway‐based association tests. Our simulation strategy adopts a genome‐wide genetic model that distributes total genetic effects hierarchically into pathways, genes, and individual variants, allowing the evaluation of pathway‐based methods with realistic quantifiable assumptions on the underlying genetic architectures. The results show that, although no single pathway‐based association method offers superior performance in all simulated scenarios, a modification of the Gene Set Enrichment Analysis approach using statistics from single‐marker tests without gene‐level collapsing (weighted Kolmogorov‐Smirnov [WKS]‐Variant method) is consistently powerful. Interestingly, directly applying rare variant association tests (e.g., sequence kernel association test) to pathway analysis offers similar power, but its results are sensitive to assumptions of genetic architecture. We applied pathway association analysis to exome‐sequencing data for chronic obstructive pulmonary disease, and found that the WKS‐Variant method confirms previously published associated genes.

8.
Although genome‐wide association studies (GWAS) have now discovered thousands of genetic variants associated with common traits, such variants cannot explain the large degree of “missing heritability,” likely due to rare variants. The advent of next generation sequencing technology has allowed rare variant detection and association with common traits, often by investigating specific genomic regions for rare variant effects on a trait. Although multiple correlated phenotypes are often concurrently observed in GWAS, most studies analyze only single phenotypes, which may lessen statistical power. To increase power, multivariate analyses, which consider correlations between multiple phenotypes, can be used. However, few existing multivariate analyses can identify rare variants associated with multiple phenotypes. Here, we propose Multivariate Association Analysis using Score Statistics (MAAUSS), to identify rare variants associated with multiple phenotypes, based on the widely used sequence kernel association test (SKAT) for a single phenotype. We applied MAAUSS to whole exome sequencing (WES) data from a Korean population of 1,058 subjects to discover genes associated with multiple traits of liver function. We then validated those genes in a replication study, using an independent dataset of 3,445 individuals. Notably, we detected the gene ZNF620 among five significant genes. We then performed a simulation study to compare MAAUSS's performance with existing methods. Overall, MAAUSS successfully conserved type 1 error rates and in many cases had higher power than the existing methods. This study illustrates a feasible and straightforward approach for identifying rare variants correlated with multiple phenotypes, with likely relevance to missing heritability.

9.
Many complex diseases are influenced by genetic variations in multiple genes, each with only a small marginal effect on disease susceptibility. Pathway analysis, which identifies biological pathways associated with disease outcome, has become increasingly popular for genome‐wide association studies (GWAS). In addition to combining weak signals from a number of SNPs in the same pathway, results from pathway analysis also shed light on the biological processes underlying disease. We propose a new pathway‐based analysis method for GWAS, the supervised principal component analysis (SPCA) model. In the proposed SPCA model, a selected subset of SNPs most associated with disease outcome is used to estimate the latent variable for a pathway. The estimated latent variable for each pathway is an optimal linear combination of a selected subset of SNPs; therefore, the proposed SPCA model provides the ability to borrow strength across the SNPs in a pathway. In addition to identifying pathways associated with disease outcome, SPCA also carries out additional within‐category selection to identify the most important SNPs within each gene set. The proposed model operates in a well‐established statistical framework and can handle design information such as covariate adjustment and matching information in GWAS. We compare the proposed method with currently available methods using data with realistic linkage disequilibrium structures, and we illustrate the SPCA method using the Wellcome Trust Case‐Control Consortium Crohn Disease (CD) data set. Genet. Epidemiol. 34:716–724, 2010. © 2010 Wiley‐Liss, Inc.

10.
Many variants with low frequencies or with low to modest effects likely remain unidentified in genome-wide association studies (GWAS) because of stringent genome-wide thresholds for detection. To improve the power of detection, variant prioritization based on their functional annotations and epigenetic landmarks has been used successfully. Here, we propose a novel method of prioritization of a GWAS by exploiting gene-level knowledge (e.g., annotations to pathways and ontologies) and show that it further improves power. Often, disease associated variants are found near genes that are coinvolved in specific biological pathways relevant to disease process. Utilization of this knowledge to conduct a prioritized scan increases the power to detect loci that map to genes clustered in a few specific pathways. We have developed a computationally scalable framework based on penalized logistic regression (termed GKnowMTest: Genomic Knowledge-guided Multiple Testing) to enable a prioritized pathway-guided GWAS scan with a very large number of gene-level annotations. We demonstrate that the proposed strategy improves overall power and maintains the Type 1 error globally. Our method works on genome-wide summary level data and a user-specified list of pathways (e.g., those extracted from large pathway databases without reference to biology of a specific disease). It automatically reweights the input p values by incorporating the pathway enrichments as “adaptively learned” from the data using a cross-validation technique to avoid overfitting. We used whole-genome simulations and some publicly available GWAS data sets to illustrate the application of our method. The GKnowMTest framework has been implemented as a user-friendly open-source R package.

11.
Genome‐wide association studies (GWAS) have been a standard practice in identifying single nucleotide polymorphisms (SNPs) for disease susceptibility. We propose a new approach, termed integrative GWAS (iGWAS), that exploits the information of gene expressions to investigate the mechanisms of the association of SNPs with a disease phenotype, and to incorporate the family‐based design for genetic association studies. Specifically, the relations among SNPs, gene expression, and disease are modeled within the mediation analysis framework, which allows us to disentangle the genetic effect on a disease phenotype into two parts: an effect mediated through a gene expression (mediation effect, ME) and an effect through other biological mechanisms or environment‐mediated mechanisms (alternative effect, AE). We develop omnibus tests for the ME and AE that are robust to underlying true disease models. Numerical studies show that the iGWAS approach is able to facilitate discovering genetic association mechanisms, and outperforms the SNP‐only method for testing genetic associations. We conduct a family‐based iGWAS of childhood asthma that integrates genetic and genomic data. The iGWAS approach identifies six novel susceptibility genes (MANEA, MRPL53, LYCAT, ST8SIA4, NDFIP1, and PTCH1) using the omnibus test with false discovery rate less than 1%, whereas no gene using SNP‐only analyses survives with the same cut‐off. The iGWAS analyses further indicate that the genetic effects of these genes are mostly mediated through their gene expressions. In summary, the iGWAS approach provides a new analytic framework to investigate the mechanism of genetic etiology, and identifies novel susceptibility genes of childhood asthma that are biologically meaningful.

12.
Genome‐wide association studies (GWAS) for nonsyndromic cleft lip with or without cleft palate (CL/P) have identified multiple genes as important in the etiology of this common birth defect. We performed a candidate gene/pathway analysis explicitly considering gene‐gene (G × G) interaction to further explore the etiology of CL/P. Animal models have shown the WNT signaling pathway plays an important role in mid‐facial development, and various genes in this pathway have been associated with nonsyndromic CL/P in previous studies. We propose a combined approach to search for possible G × G interactions using machine learning and regression‐based methods to test for interactions between genes in the WNT family, and between these genes and other genes identified by GWAS in case‐parent trios. Using this combined approach of regression‐based and machine learning methods in CL/P case‐parent trios, we found robust evidence of G × G interaction between markers in WNT5B and MAFB (empiric P‐value = 0.0076 among Asian trios and P‐value = 0.018 among European trios). Additional evidence for epistatic interaction was seen between markers in WNT5A, IRF6, and C1orf107 among Asian trios, and between markers in the 8q24 region and WNT5B among European trios.

13.
As the cost of genome‐wide genotyping decreases, the number of genome‐wide association studies (GWAS) has increased considerably. However, the transition from GWAS findings to the underlying biology of various phenotypes remains challenging. As a result, due to its system‐level interpretability, pathway analysis has become a popular tool for gaining insights on the underlying biology from high‐throughput genetic association data. In pathway analyses, gene sets representing particular biological processes are tested for significant associations with a given phenotype. Most existing pathway analysis approaches rely on single‐marker statistics and assume that pathways are independent of each other. As biological systems are driven by complex biomolecular interactions, the complex relationships between single‐nucleotide polymorphisms (SNPs) and pathways need to be addressed. To incorporate the complexity of gene‐gene interactions and pathway‐pathway relationships, we propose a system‐level pathway analysis approach, synthetic feature random forest (SF‐RF), which is designed to detect pathway‐phenotype associations without making assumptions about the relationships among SNPs or pathways. In our approach, the genotypes of SNPs in a particular pathway are aggregated into a synthetic feature representing that pathway via Random Forest (RF). Multiple synthetic features are analyzed using RF simultaneously and the significance of a synthetic feature indicates the significance of the corresponding pathway. We further complement SF‐RF with pathway‐based Statistical Epistasis Network (SEN) analysis that evaluates interactions among pathways. By investigating the pathway SEN, we hope to gain additional insights into the genetic mechanisms contributing to the pathway‐phenotype association. We apply SF‐RF to a population‐based genetic study of bladder cancer and further investigate the mechanisms that help explain the pathway‐phenotype associations using SEN. The bladder cancer associated pathways we found are consistent with existing biological knowledge and reveal novel and plausible hypotheses for future biological validations.

14.
Genome-wide association studies (GWAS) have successfully identified thousands of genetic variants contributing to disease and other phenotypes. However, significant obstacles hamper our ability to elucidate causal variants, identify genes affected by causal variants, and characterize the mechanisms by which genotypes influence phenotypes. The increasing availability of genome-wide functional annotation data is providing unique opportunities to incorporate prior information into the analysis of GWAS to better understand the impact of variants on disease etiology. Although there have been many advances in incorporating prior information into prioritization of trait-associated variants in GWAS, functional annotation data have played a secondary role in the joint analysis of GWAS and molecular (i.e., expression) quantitative trait loci (eQTL) data in assessing evidence for association. To address this, we develop a novel mediation framework, iFunMed, to integrate GWAS and eQTL data using publicly available functional annotation data. iFunMed extends the scope of standard mediation analysis by incorporating information from multiple genetic variants at a time and leveraging variant-level summary statistics. Data-driven computational experiments show how informative annotations improve single-nucleotide polymorphism (SNP) selection performance while emphasizing robustness of iFunMed to noninformative annotations. Application to Framingham Heart Study data indicates that iFunMed is able to boost detection of SNPs with mediation effects that can be attributed to regulatory mechanisms.

15.
16.
In spite of the tremendous success of genome-wide association studies (GWAS) in identifying genetic variants associated with complex traits and common diseases, many more are yet to be discovered. Hence, it is always desirable to improve the statistical power of GWAS. In parallel with the intensive efforts of integrating GWAS with functional annotations or other omic data, we propose leveraging other published GWAS summary data to boost statistical power for a new/focus GWAS; the traits of the published GWAS may or may not be genetically correlated with the target trait of the new GWAS. Building on weighted hypothesis testing with a solid theoretical foundation, we develop a novel and effective method to construct single-nucleotide polymorphism (SNP)-specific weights based on 22 published GWAS data sets with various traits, detecting sometimes dramatically increased numbers of significant SNPs and independent loci as compared to the standard/unweighted analysis. For example, by integrating a schizophrenia GWAS summary data set with 19 other GWAS summary data sets of nonschizophrenia traits, our new method identified 1,585 genome-wide significant SNPs mapping to 15 linkage disequilibrium-independent loci, largely exceeding 818 significant SNPs in 13 independent loci identified by the standard/unweighted analysis; furthermore, using a later and larger schizophrenia GWAS summary data set as the validation data, 1,423 (out of 1,585) significant SNPs identified by the weighted analysis, compared to 705 (out of 818) by the unweighted analysis, were confirmed, while all 15 and 13 independent loci were also confirmed. Similar conclusions were reached with lipids and Alzheimer's disease (AD) traits. We conclude that the proposed approach is a simple and cost-effective way to improve GWAS power.
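
The weighted hypothesis testing this abstract builds on can be sketched with a weighted Bonferroni rule. This is only the generic rule, not the authors' weight-construction method: how the SNP-specific weights are derived from other GWAS is not described in the abstract, so the weights below are simply assumed to be given. The function name is hypothetical.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Each hypothesis i is rejected when p_i <= alpha * w_i / m, where the
    non-negative weights are rescaled to have mean 1. The weights are
    assumed to come from external data, independent of the p-values tested.
    """
    m = len(p_values)
    if len(weights) != m:
        raise ValueError("p_values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    mean_w = sum(weights) / m
    if mean_w == 0:
        raise ValueError("at least one weight must be positive")
    # Rescale so the weights average to 1, which keeps the total error budget at alpha.
    normalized = [w / mean_w for w in weights]
    return [p <= alpha * w / m for p, w in zip(p_values, normalized)]


# Toy example: up-weighting the first SNP lets it pass a threshold it would
# miss under the unweighted rule (threshold alpha / m = 0.0125).
weighted = weighted_bonferroni([0.02, 0.02, 0.5, 0.5], [3, 1, 0, 0])
unweighted = weighted_bonferroni([0.02, 0.02, 0.5, 0.5], [1, 1, 1, 1])
```

Up-weighting some SNPs necessarily down-weights others, so the gain in power depends on how informative the external weights are.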

17.
Tissue factor pathway inhibitor (TFPI) regulates the formation of intravascular blood clots, which manifest clinically as ischemic heart disease, ischemic stroke, and venous thromboembolism (VTE). TFPI plasma levels are heritable, but the genetics underlying TFPI plasma level variability are poorly understood. Herein we report the first genome‐wide association scan (GWAS) of TFPI plasma levels, conducted in 251 individuals from five extended French‐Canadian families ascertained on VTE. To improve discovery, we also applied a hypothesis‐driven (HD) GWAS approach that prioritized single nucleotide polymorphisms (SNPs) in (1) hemostasis pathway genes, and (2) vascular endothelial cell (EC) regulatory regions, which are among the highest expressers of TFPI. Our GWAS identified 131 SNPs with suggestive evidence of association (P‐value < 5 × 10⁻⁸), but no SNPs reached the genome‐wide threshold for statistical significance. Hemostasis pathway genes were not enriched for TFPI plasma level associated SNPs (global hypothesis test P‐value = 0.147), but EC regulatory regions contained more TFPI plasma level associated SNPs than expected by chance (global hypothesis test P‐value = 0.046). We therefore stratified our genome‐wide SNPs, prioritizing those in EC regulatory regions via stratified false discovery rate (sFDR) control, and reranked the SNPs by q‐value. The minimum q‐value was 0.27, and the top‐ranked SNPs did not show association evidence in the MARTHA replication sample of 1,033 unrelated VTE cases. Although this study did not result in new loci for TFPI, our work lays out a strategy to utilize epigenomic data in prioritization schemes for future GWAS.
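
The stratify-then-rerank step can be sketched as follows. This is a simplified illustration, assuming that Benjamini-Hochberg adjusted P-values are an acceptable stand-in for q-values; the study's actual sFDR procedure may differ, for instance by estimating the proportion of null SNPs in each stratum. Function names and stratum labels are hypothetical.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def stratified_q_values(p_values, strata):
    """Compute adjusted P-values separately within each stratum."""
    groups = {}
    for idx, label in enumerate(strata):
        groups.setdefault(label, []).append(idx)
    q = [0.0] * len(p_values)
    for indices in groups.values():
        q_group = bh_q_values([p_values[i] for i in indices])
        for i, value in zip(indices, q_group):
            q[i] = value
    return q


# Toy example: SNPs 0 and 2 lie in a prioritized stratum ("ec").
p = [0.01, 0.04, 0.03, 0.5]
q_pooled = bh_q_values(p)
q_strat = stratified_q_values(p, ["ec", "other", "ec", "other"])
# Rerank all SNPs by their stratified values, smallest first.
reranked = sorted(range(len(p)), key=lambda i: q_strat[i])
```

In this toy input, the first SNP's adjusted value drops from 0.04 when pooled to 0.02 when stratified, because its stratum is small and enriched for low P-values. SNPs in the other stratum pay for that gain with larger adjusted values.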

18.
Unraveling the underlying biological mechanisms or pathways behind the effects of genetic variations on complex diseases remains one of the major challenges in the post‐GWAS (where GWAS is genome‐wide association study) era. To further explore the relationship between genetic variations, biomarkers, and diseases for elucidating underlying pathological mechanisms, a huge effort has been placed on examining pleiotropic and gene‐environmental interaction effects. We propose a novel genetic stochastic process model (GSPM) that can be applied to GWAS and jointly investigate the genetic effects on longitudinally measured biomarkers and risks of diseases. This model is characterized by more profound biological interpretation and takes into account the dynamics of biomarkers during follow‐up when investigating the hazards of a disease. We illustrate the rationale and evaluate the performance of the proposed model through two GWAS. One is to detect single nucleotide polymorphisms (SNPs) having interaction effects on type 2 diabetes (T2D) with body mass index (BMI) and the other is to detect SNPs affecting the optimal BMI level for protecting from T2D. We identified multiple SNPs that showed interaction effects with BMI on T2D, including a novel SNP rs11757677 in the CDKAL1 gene (P = 5.77 × 10⁻⁷). We also found a SNP rs1551133 located on 2q14.2 that reversed the effect of BMI on T2D (P = 6.70 × 10⁻⁷). In conclusion, the proposed GSPM provides a promising and useful tool in GWAS of longitudinal data for interrogating pleiotropic and interaction effects to gain more insights into the relationship between genes, quantitative biomarkers, and risks of complex diseases.

19.
Genome-wide association studies (GWAS) have thus far achieved substantial success. In the last decade, a large number of common variants underlying complex diseases have been identified through GWAS. In most existing GWAS, the identified common variants are obtained by single marker-based tests, that is, testing one single-nucleotide polymorphism (SNP) at a time. Generally, the basic functional unit of inheritance is a gene, rather than a SNP. Thus, results from gene-level association tests can be more readily integrated with downstream functional and pathogenic investigation. In this paper, we propose a general gene-based p-value adaptive combination approach (GPA) which can integrate association evidence of multiple genetic variants using only GWAS summary statistics (either p-value or other test statistics). The proposed method could be used to test genetic association for both continuous and binary traits through not only one study but also multiple studies, which would be helpful to overcome the limitation of existing methods that can only be applied to a specific type of data. We conducted thorough simulation studies to verify that the proposed method controls type I errors well, and performs favorably compared to single-marker analysis and other existing methods. We demonstrated the utility of our proposed method through analysis of GWAS meta-analysis results for fasting glucose and lipids from the international MAGIC consortium and Global Lipids Consortium, respectively. The proposed method identified some novel trait associated genes which can improve our understanding of the mechanisms involved in β-cell function, glucose homeostasis, and lipids traits.

20.
Genome‐wide association studies (GWASs) are unraveling the genetics of adult brain neuroanatomy as measured by cross‐sectional anatomic magnetic resonance imaging (aMRI). However, the genetic mechanisms that shape childhood brain development are, as yet, largely unexplored. In this study we identify common genetic variants associated with childhood brain development as defined by longitudinal aMRI. Genome‐wide single nucleotide polymorphism (SNP) data were determined in two cohorts: one enriched for attention‐deficit/hyperactivity disorder (ADHD) (LONG cohort: 458 participants; 119 with ADHD) and the other from a population‐based cohort (Generation R: 257 participants). The growth of the brain's major regions (cerebral cortex, white matter, basal ganglia, and cerebellum) and one region of interest (the right lateral prefrontal cortex) were defined on all individuals from two aMRIs, and a GWAS and a pathway analysis were performed. In addition, association between polygenic risk for ADHD and brain growth was determined for the LONG cohort. For white matter growth, GWAS meta‐analysis identified a genome‐wide significant intergenic SNP (rs12386571, P = 9.09 × 10⁻⁹), near AKR1B10. This gene is part of the aldo‐keto reductase superfamily and shows neural expression. No enrichment of neural pathways was detected and polygenic risk for ADHD was not associated with the brain growth phenotypes in the LONG cohort that was enriched for the diagnosis of ADHD. The study illustrates the use of a novel brain growth phenotype defined in vivo for further study.
