Similar Documents
20 similar documents found.
1.
2.
In genome‐wide association studies (GWAS), “generalization” is the replication of a genotype‐phenotype association in a population whose ancestry differs from that of the population in which it was first identified. Current practice for declaring generalizations is to test associations while controlling the family‐wise error rate (FWER) in the discovery study, and then to control error measures separately in the follow‐up study. This approach does not guarantee control of the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two‐stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of association in the discovery and follow‐up studies. We develop directional generalization FWER (FWERg) and FDR (FDRg) controlling r‐values, which are used to declare associations as generalized. The framework also extends to generalization testing of a published list of single nucleotide polymorphism (SNP)‐trait associations. Our methods control FWERg or FDRg under various SNP selection rules based on P‐values in the discovery study. We find that it is often beneficial to use a more lenient P‐value threshold than the genome‐wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P‐values (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P‐values (89 regions), we generalized SNPs from 27 regions.
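The abstract does not specify how the r‐values are computed. The sketch below therefore uses a simpler and more conservative stand‐in, which is an assumption and not the authors' method. For each SNP it forms a max‐P (partial‐conjunction) p‐value: the larger of the two‐sided discovery p‐value and the one‐sided follow‐up p‐value in the discovery direction. Benjamini‐Hochberg at level q is then applied to these values. The helper names and the z‐score inputs are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed with the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def generalization_pvalues(z_discovery, z_followup):
    """Hypothetical max-P stand-in (NOT the paper's r-values).

    For each SNP, returns max(p1, p2): p1 is the two-sided discovery p-value,
    p2 the one-sided follow-up p-value in the direction of the discovery
    estimate. A small value requires evidence in BOTH studies, same sign.
    """
    pvals = []
    for z1, z2 in zip(z_discovery, z_followup):
        p1 = 2.0 * normal_cdf(-abs(z1))
        direction = 1.0 if z1 >= 0 else -1.0
        p2 = normal_cdf(-direction * z2)
        pvals.append(max(p1, p2))
    return pvals


def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns a reject flag per test."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    n_reject = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            n_reject = rank  # largest rank that meets its BH threshold
    rejected = [False] * m
    for i in order[:n_reject]:
        rejected[i] = True
    return rejected


# Toy z-scores for three SNPs (illustrative values only).
p = generalization_pvalues([6.0, 5.0, 3.0], [4.0, -4.0, 0.5])
print(benjamini_hochberg(p, q=0.05))  # [True, False, False]
```

The second SNP is strongly associated in both studies, but its two estimates have opposite signs, so it is not declared generalized. This is the directional (in)consistency the framework accounts for.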

3.
Although genome‐wide association studies (GWAS) have been performed in longitudinal studies, most have used only a single trait measure. GWAS of fasting glucose have generally included only normoglycemic individuals. We examined the impact of both repeated measures and sample selection on GWAS in the Atherosclerosis Risk In Communities (ARIC) study, which obtained four longitudinal measures of fasting glucose and included individuals both with and without prevalent diabetes. The sample included Caucasians, and the Affymetrix 6.0 chip was used for genotyping. Sample sizes for GWAS analyses ranged from 8,372 (first study visit) to 5,782 (average fasting glucose). Candidate SNP analyses with SNPs identified through fasting glucose or diabetes GWAS were conducted in 9,133 individuals, including 761 with prevalent diabetes. For a constant sample size, smaller P‐values were obtained for the average measure of fasting glucose than for values at any single visit, and two additional significant GWAS signals were detected. For four candidate SNPs (rs780094, rs10830963, rs7903146, and rs4607517), the strength of association between genotype and glucose was significantly different (P‐interaction < 0.05) in those with and without prevalent diabetes. For all five fasting glucose candidate SNPs (rs780094, rs10830963, rs560887, rs4607517, and rs13266634), the association with measured fasting glucose was more significant in the smaller sample without prevalent diabetes than in the larger combined sample of those with and without diabetes. This analysis demonstrates the potential utility of averaging trait values in GWAS and explores the advantage of using only individuals without prevalent diabetes in GWAS of fasting glucose. Genet. Epidemiol. 34: 665‐673, 2010. © 2010 Wiley‐Liss, Inc.
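A small simulation sketch shows why averaging repeated measures can sharpen a GWAS signal at a fixed sample size. Averaging four visits divides the visit‐to‐visit noise variance by four but leaves the genotype effect unchanged, so the slope t statistic grows. All parameter values are illustrative assumptions, not ARIC data, and the additive genotype model is assumed.

```python
import numpy as np


def slope_t_stat(x, y):
    """t statistic for the OLS slope of y on x (model includes an intercept)."""
    x = x - x.mean()
    y = y - y.mean()
    beta = (x @ y) / (x @ x)
    resid = y - beta * x
    se = np.sqrt((resid @ resid) / (len(x) - 2) / (x @ x))
    return beta / se


# Illustrative assumptions: additive effect of 0.1 SD per allele, MAF 0.3,
# stable between-person variation plus independent noise at each of 4 visits.
rng = np.random.default_rng(2010)
n, visits, maf, effect = 50_000, 4, 0.3, 0.1
genotype = rng.binomial(2, maf, size=n).astype(float)
person = rng.normal(0.0, 1.0, size=n)
noise = rng.normal(0.0, 1.0, size=(n, visits))
glucose = (effect * genotype + person)[:, None] + noise

t_single = slope_t_stat(genotype, glucose[:, 0])          # first visit only
t_average = slope_t_stat(genotype, glucose.mean(axis=1))  # mean of 4 visits
print(round(t_single, 1), round(t_average, 1))
```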

4.
Genome‐wide association studies (GWAS) have been successful in identifying common variants related to complex disorders. However, some disorders have proved resistant to this strategy with few associations confirmed, despite evidence from twin and family studies of a genetic component. Sophisticated strategies that account for phenotypic heterogeneity may be required to uncover these genetic contributions. Age at onset is an example of a potential source of this heterogeneity in ischaemic stroke. We explore the contribution of age at onset in the Wellcome Trust Case‐Control Consortium 2 ischaemic stroke study. We first examine four established stroke loci in younger onset cases. We extend this to all single‐nucleotide polymorphisms (SNPs) genome‐wide, testing for stronger association signals in younger subsets of cases. Finally, we estimate the pseudoheritability accounted for by common SNPs present on genome‐wide genotyping arrays for cases stratified by age at onset. We find evidence for stronger associations in younger onset cases for the four established stroke loci. Genome‐wide, in cardioembolic and small vessel stroke subphenotypes, a significant number of SNPs show stronger association P‐values when the oldest cases are removed. Finally, we show that the pseudoheritability estimated by common SNPs in cardioembolic stroke increased from 16.5% for older onset cases to 28.5% for younger onset cases. Our results indicate that age at onset is a valuable measure for case ascertainment and in analysis of GWAS in ischaemic stroke: focussing on younger cases who may have a stronger genetic predisposition increases power to detect associations.

5.
Many complex diseases are influenced by genetic variations in multiple genes, each with only a small marginal effect on disease susceptibility. Pathway analysis, which identifies biological pathways associated with disease outcome, has become increasingly popular for genome‐wide association studies (GWAS). In addition to combining weak signals from a number of SNPs in the same pathway, results from pathway analysis also shed light on the biological processes underlying disease. We propose a new pathway‐based analysis method for GWAS, the supervised principal component analysis (SPCA) model. In the proposed SPCA model, a selected subset of SNPs most associated with disease outcome is used to estimate the latent variable for a pathway. The estimated latent variable for each pathway is an optimal linear combination of a selected subset of SNPs; therefore, the proposed SPCA model provides the ability to borrow strength across the SNPs in a pathway. In addition to identifying pathways associated with disease outcome, SPCA also carries out additional within‐category selection to identify the most important SNPs within each gene set. The proposed model operates in a well‐established statistical framework and can handle design information such as covariate adjustment and matching information in GWAS. We compare the proposed method with currently available methods using data with realistic linkage disequilibrium structures, and we illustrate the SPCA method using the Wellcome Trust Case‐Control Consortium Crohn Disease (CD) data set. Genet. Epidemiol. 34: 716‐724, 2010. © 2010 Wiley‐Liss, Inc.
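Below is a minimal NumPy sketch of the supervised PCA idea: screen the SNPs in a pathway, then summarize the retained SNPs with their first principal component. It rests on several assumptions. Genotypes use additive 0/1/2 coding, no SNP is monomorphic, and the absolute Pearson correlation serves as the univariate screening score. The paper's exact screening statistic and its covariate handling are not given in the abstract, and the function name `supervised_pc` is hypothetical.

```python
import numpy as np


def supervised_pc(genotypes, outcome, n_keep):
    """Hypothetical supervised-PCA sketch for one pathway.

    genotypes: (n_samples, n_snps) additive allele counts; assumes no
    monomorphic SNPs (a zero standard deviation would divide by zero).
    Returns the indices of the retained SNPs and the pathway score, i.e. each
    sample's score on the first principal component of the retained SNPs.
    """
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    y = (outcome - outcome.mean()) / outcome.std()
    # Absolute Pearson correlation of each standardized SNP with the outcome.
    univariate = np.abs(g.T @ y) / len(y)
    selected = np.argsort(univariate)[::-1][:n_keep]
    # Scores on the first principal component = first left singular vector
    # scaled by the first singular value.
    u, s, _ = np.linalg.svd(g[:, selected], full_matrices=False)
    score = u[:, 0] * s[0]
    return np.sort(selected), score
```

The sign of a principal component is arbitrary, so the score is meaningful up to sign. The number of retained SNPs would itself need to be chosen, for example by cross‐validation, which is also an assumption here.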

6.
After genetic regions have been identified in genomewide association studies (GWAS), investigators often follow up with more targeted investigations of specific regions. These investigations typically are based on single nucleotide polymorphisms (SNPs) with dense coverage of a region. Methods are thus needed to test the hypothesis of any association within a given genetic region. Several approaches are available for combining P‐values obtained from individual SNP hypothesis tests. We recently proposed a sequential procedure for testing the global null hypothesis of no association in a region. When this global null hypothesis is rejected, the method provides a list of significant hypotheses and has weak control of the family‐wise error rate. In this paper, we devise a permutation‐based version of the test that accounts for correlations among tests based on SNPs in the same genetic region. In simulated data, the method has correct control of the type I error rate and higher or comparable power relative to other tests.
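The sequential procedure itself is not described in the abstract. The sketch below substitutes a standard permutation min‐P global test, which is an assumption, to show why permutation handles correlated SNP tests. Shuffling the phenotype while keeping each individual's genotypes intact preserves the LD between SNPs. The statistic is the maximum absolute genotype‐phenotype correlation. With the same sample size for every SNP, this ranks SNPs exactly as the minimum per‐SNP P‐value would. The function names are hypothetical.

```python
import numpy as np


def max_abs_corr(genotypes, outcome):
    """Largest |Pearson correlation| between any SNP column and the outcome."""
    g = genotypes - genotypes.mean(axis=0)
    y = outcome - outcome.mean()
    corr = (g.T @ y) / (np.sqrt((g ** 2).sum(axis=0)) * np.sqrt(y @ y))
    return np.max(np.abs(corr))


def permutation_region_test(genotypes, outcome, n_perm=999, seed=0):
    """Hypothetical min-P global test of 'no association anywhere in the region'.

    Permuting the outcome breaks any genotype-phenotype link but keeps the
    correlation (LD) among the SNP columns, so the null distribution of the
    maximum statistic reflects the region's dependence structure.
    """
    rng = np.random.default_rng(seed)
    observed = max_abs_corr(genotypes, outcome)
    exceed = sum(max_abs_corr(genotypes, rng.permutation(outcome)) >= observed
                 for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)
```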

7.
Both the prevalence and incidence of heart failure (HF) are increasing, especially among African Americans, but no large‐scale genome‐wide association study (GWAS) of HF‐related metabolites has been reported. We sought to identify novel genetic variants associated with metabolites previously reported to relate to HF incidence. GWASs of three metabolites identified previously as risk factors for incident HF (pyroglutamine, dihydroxy docosatrienoic acid, and X‐11787, being either hydroxy‐leucine or hydroxy‐isoleucine) were performed in 1,260 African Americans free of HF at the baseline examination of the Atherosclerosis Risk in Communities (ARIC) study. A significant association on chromosome 5q33 (rs10463316, MAF = 0.358, P = 1.92 × 10⁻¹⁰) was identified for pyroglutamine. A region on chromosome 2p13 containing a nonsynonymous substitution in N‐acetyltransferase 8 (NAT8) was associated with X‐11787 (rs13538, MAF = 0.481, P = 1.71 × 10⁻²³). For dihydroxy docosatrienoic acid, the smallest P‐value was observed for rs4006531 on chromosome 8q24 (MAF = 0.400, P = 6.98 × 10⁻⁷). None of these SNPs was individually associated with incident HF. However, a genetic risk score (GRS) created by summing the most significant risk alleles from each metabolite was associated with an 11% greater risk of HF per allele. In summary, we identified three loci associated with previously reported HF‐related metabolites. Further use of metabolomics technology will facilitate replication of these findings in independent samples.
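An unweighted genetic risk score of the kind described above can be sketched as a simple count of risk alleles. The sketch reads "11% greater risk per allele" as a hazard ratio of 1.11 per allele under a multiplicative (Cox) model, which is an assumption. Under that reading, the effect compounds across alleles rather than adding linearly. The function names are hypothetical.

```python
import numpy as np


def genetic_risk_score(risk_allele_counts):
    """Unweighted GRS: sum of risk-allele counts (0, 1 or 2) across SNPs.

    risk_allele_counts: (n_individuals, n_snps) array-like.
    """
    return np.asarray(risk_allele_counts).sum(axis=1)


def relative_hazard(score, hr_per_allele=1.11):
    """Hazard relative to a score of 0, assuming a multiplicative model."""
    return hr_per_allele ** np.asarray(score, dtype=float)


scores = genetic_risk_score([[0, 1, 2], [2, 2, 1]])
print(scores)  # [3 5]
print(relative_hazard(scores))
```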

8.
Ryan Sun  Miao Xu  Xihao Li  Sheila Gaynor  Hufeng Zhou  Zilin Li  Yohan Bossé  Stephen Lam  Ming‐Sound Tsao  Adonina Tardon  Chu Chen  Jennifer Doherty  Gary Goodman  Stig E. Bojesen  Maria T. Landi  Mattias Johansson  John K. Field  Heike Bickeböller  H‐Erich Wichmann  Angela Risch  Gadi Rennert  Suzanne Arnold  Xifeng Wu  Olle Melander  Hans Brunnström  Loic Le Marchand  Geoffrey Liu  Angeline Andrew  Eric Duell  Lambertus A. Kiemeney  Hongbing Shen  Aage Haugen  Mikael Johansson  Kjell Grankvist  Neil Caporaso  Penella Woll  M. Dawn Teare  Ghislaine Scelo  Yun‐Chul Hong  Jian‐Min Yuan  Philip Lazarus  Matthew B. Schabath  Melinda C. Aldrich  Demetrios Albanes  Raymond Mak  David Barbie  Paul Brennan  Rayjean J. Hung  Christopher I. Amos  David C. Christiani  Xihong Lin. Genetic Epidemiology, 2021, 45(1): 99–114
Clinical trial results have recently demonstrated that inhibiting inflammation by targeting the interleukin‐1β pathway can offer a significant reduction in lung cancer incidence and mortality, highlighting a pressing and unmet need to understand the benefits of inflammation‐focused lung cancer therapies at the genetic level. While numerous genome‐wide association studies (GWAS) have explored the genetic etiology of lung cancer, there remains a large gap between the type of information that may be gleaned from an association study and the depth of understanding necessary to explain and drive translational findings. Thus, in this study we jointly model and integrate extensive multiomics data sources, utilizing a total of 40 genome‐wide functional annotations that augment previously published results from the International Lung Cancer Consortium (ILCCO) GWAS, to prioritize and characterize single nucleotide polymorphisms (SNPs) that increase risk of squamous cell lung cancer through the inflammatory and immune responses. Our work bridges the gap between correlative analysis and translational follow‐up research, refining GWAS association measures in an interpretable and systematic manner. In particular, reanalysis of the ILCCO data highlights the impact of highly associated SNPs from nuclear factor‐κB signaling pathway genes as well as major histocompatibility complex mediated variation in immune responses. One consequence of prioritizing likely functional SNPs is the pruning of variants that might be selected for follow‐up work by over an order of magnitude, from potentially tens of thousands to hundreds. The strategies we introduce provide informative and interpretable approaches for incorporating extensive genome‐wide annotation data in analysis of genetic association studies.

9.
Genomewide association studies (GWAS) and candidate‐gene studies have implicated single‐nucleotide polymorphisms (SNPs) in at least 45 different genes as putative glioma risk factors. Attempts to validate these associations have yielded variable results, and few genetic risk factors have been consistently replicated. We conducted a case‐control study of Caucasian glioma cases and controls from the University of California San Francisco (810 cases, 512 controls) and the Mayo Clinic (852 cases, 789 controls) in an attempt to replicate previously reported genetic risk factors for glioma. Sixty SNPs selected from the literature (eight from GWAS and 52 from candidate‐gene studies) were successfully genotyped on an Illumina custom genotyping panel. Eight SNPs in/near seven different genes (TERT, EGFR, CCDC26, CDKN2A, PHLDB1, RTEL1, TP53) were significantly associated with glioma risk in the combined dataset (P < 0.05), with all associations in the same direction as in previous reports. Several SNP associations showed considerable differences across histologic subtypes. All eight successfully replicated associations were first identified by GWAS, whereas none of the putative risk SNPs from candidate‐gene studies was associated in the full case‐control sample (all P values > 0.05). Although several confirmed associations are located near genes long known to be involved in gliomagenesis (e.g., EGFR, CDKN2A, TP53), these associations were first discovered by the GWAS approach and are in noncoding regions. These results highlight that the deficiencies of the candidate‐gene approach lie in selecting both appropriate genes and relevant SNPs within these genes.

10.
Genome‐wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) associated with complex traits. However, the genetic heritability of most of these traits remains unexplained. To help guide future studies, we address the crucial question of whether future GWAS can detect new SNP associations and explain additional heritability given the new availability of larger GWAS SNP arrays, imputation, and reduced genotyping costs. We first describe the pairwise and imputation coverage of all SNPs in the human genome by commercially available GWAS SNP arrays, using the 1000 Genomes Project as a reference. Next, we describe the findings from 6 years of GWAS of 172 chronic diseases, calculating the power to detect each of them while taking array coverage and sample size into account. We then calculate the power to detect these SNP associations under different conditions using improved coverage and/or sample sizes. Finally, we estimate the percentages of SNP associations and heritability previously detected and detectable by future GWAS under each condition. Overall, we estimated that previous GWAS have detected less than one‐fifth of all GWAS‐detectable SNPs underlying chronic disease. Furthermore, increasing sample size has a much larger impact than increasing coverage on the potential of future GWAS to detect additional SNP‐disease associations and heritability.
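A simplified power calculation can make the coverage versus sample size trade‐off concrete. It assumes an additive model for a quantitative trait, a 1‐df Wald test under a normal approximation, and a genome‐wide α of 5 × 10⁻⁸. Array coverage enters only through r², the squared correlation between the causal SNP and its best tag, which scales the effective sample size. The function and its defaults are illustrative. The paper's calculations for chronic diseases (case‐control outcomes) would differ in detail.

```python
from statistics import NormalDist


def gwas_power(n, maf, beta, sd_trait=1.0, r2=1.0, alpha=5e-8):
    """Approximate power of a two-sided 1-df test (hypothetical helper).

    Noncentrality assumes an additive quantitative-trait model:
        ncp = n * r2 * 2 * maf * (1 - maf) * beta**2 / sd_trait**2
    where r2 is the LD between the causal SNP and the genotyped tag SNP.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ncp = n * r2 * 2 * maf * (1 - maf) * beta ** 2 / sd_trait ** 2
    shift = ncp ** 0.5
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


print(gwas_power(10_000, 0.3, 0.05))   # low power
print(gwas_power(50_000, 0.3, 0.05))   # much higher power
```

Under this model, n and r² enter only through their product. Improving a tag's r² from 0.8 to 1.0 is therefore worth at most a 25% increase in sample size, whereas sample size itself has no such ceiling.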

11.
Lung cancer is the leading cause of cancer death worldwide. Although several genetic variants associated with lung cancer have been identified, the stringent selection criteria of genome‐wide association studies (GWAS) can lead to missed variants. The objective of this study was to uncover missed variants by using the known association between lung cancer and first‐degree family history of lung cancer to enrich variant prioritization for lung cancer susceptibility regions. In this two‐stage GWAS, we first selected variants associated with both lung cancer and family history of lung cancer in four GWAS (3,953 cases, 4,730 controls), and then replicated our findings for 30 variants in a meta‐analysis of four additional studies (7,510 cases, 7,476 controls). The top‐ranked variant in the Discovery set, rs12415204 at chr10q23.33 (FFAR4), was validated in the Replication set with an overall OR of 1.09 (95% CI = 1.04, 1.14; P = 1.63 × 10⁻⁴). When the two stages were combined, the strongest association was found for rs1158970 at chr4p15.2 (KCNIP4), with an OR of 0.89 (95% CI = 0.85, 0.94; P = 9.64 × 10⁻⁶). We performed a stratified analysis of rs12415204 and rs1158970 across all eight studies by age, gender, smoking status, and histology, and found consistent results across strata. Four of the 30 replicated variants act as expression quantitative trait loci (eQTL) in 1,111 nontumor lung tissues and meet the genome‐wide 10% FDR threshold.
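The abstract does not state which meta‐analysis model was used. The sketch below assumes a fixed‐effect inverse‐variance meta‐analysis of per‐study log odds ratios, computed with the standard library only. Each study is assumed to report a log OR and its standard error. The function name is hypothetical.

```python
import math


def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effect meta-analysis (hypothetical helper).

    log_ors: per-study log odds ratios; ses: their standard errors.
    Returns (pooled OR, (lower, upper) 95% CI, two-sided P-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * Phi(-|z|)
    ci = (math.exp(pooled - 1.96 * pooled_se),
          math.exp(pooled + 1.96 * pooled_se))
    return math.exp(pooled), ci, p
```

Combining two studies with identical estimates leaves the pooled OR unchanged but shrinks its standard error by a factor of √2. This is how a replication meta‐analysis tightens the confidence interval.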

12.
The primary circulating form of vitamin D is 25‐hydroxy vitamin D (25(OH)D), a modifiable trait linked with a growing number of chronic diseases. In addition to environmental determinants of 25(OH)D, including dietary sources and skin ultraviolet B (UVB) exposure, twin‐ and family‐based studies suggest that genetics contributes substantially to vitamin D variability, with heritability estimates ranging from 43% to 80%. Genome‐wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) located in four gene regions associated with 25(OH)D. These SNPs collectively explain only a fraction of the heritability in 25(OH)D estimated by twin‐ and family‐based studies. Using 25(OH)D concentrations and GWAS data on 5,575 subjects drawn from five cohorts, we hypothesized that genome‐wide data would explain variance in measured circulating 25(OH)D beyond that explained by known genome‐wide significant 25(OH)D‐associated SNPs. We considered genome‐wide data in two forms: (1) a polygenic score composed of hundreds or thousands of SNPs that do not individually reach GWAS significance, or (2) a linear mixed model for genome‐wide complex trait analysis. GWAS‐identified SNPs explained 5.2% of the variation in circulating 25(OH)D in these samples, and there was little evidence that additional markers significantly improved predictive ability. On average, a polygenic score composed of GWAS‐identified SNPs explained a larger proportion of variation in circulating 25(OH)D than scores composed of thousands of SNPs that were, on average, nonsignificant. Employing a linear mixed model for genome‐wide complex trait analysis explained little additional variability (range 0–22%). The absence of a significant polygenic effect in this relatively large sample suggests an oligogenic architecture for 25(OH)D.
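A weighted polygenic score and the variance it explains can be sketched as follows. The sketch assumes additive allele counts and per‐SNP weights taken from external GWAS effect sizes. The R² comes from a simple linear regression of the trait on the score without covariates, whereas an actual analysis would typically adjust for covariates. The function names are hypothetical.

```python
import numpy as np


def polygenic_score(allele_counts, weights):
    """Weighted PGS: (n_individuals x n_snps) allele counts times per-SNP weights."""
    return (np.asarray(allele_counts, dtype=float)
            @ np.asarray(weights, dtype=float))


def variance_explained(score, trait):
    """R^2 from a simple linear regression of the trait on the score."""
    r = np.corrcoef(score, trait)[0, 1]
    return r ** 2
```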

13.
Genome‐wide association studies (GWASs) are unraveling the genetics of adult brain neuroanatomy as measured by cross‐sectional anatomic magnetic resonance imaging (aMRI). However, the genetic mechanisms that shape childhood brain development are, as yet, largely unexplored. In this study we identify common genetic variants associated with childhood brain development as defined by longitudinal aMRI. Genome‐wide single nucleotide polymorphism (SNP) data were determined in two cohorts: one enriched for attention‐deficit/hyperactivity disorder (ADHD) (LONG cohort: 458 participants; 119 with ADHD) and the other from a population‐based cohort (Generation R: 257 participants). The growth of the brain's major regions (cerebral cortex, white matter, basal ganglia, and cerebellum) and one region of interest (the right lateral prefrontal cortex) were defined on all individuals from two aMRIs, and a GWAS and a pathway analysis were performed. In addition, association between polygenic risk for ADHD and brain growth was determined for the LONG cohort. For white matter growth, GWAS meta‐analysis identified a genome‐wide significant intergenic SNP (rs12386571, P = 9.09 × 10⁻⁹), near AKR1B10. This gene is part of the aldo‐keto reductase superfamily and shows neural expression. No enrichment of neural pathways was detected and polygenic risk for ADHD was not associated with the brain growth phenotypes in the LONG cohort that was enriched for the diagnosis of ADHD. The study illustrates the use of a novel brain growth phenotype defined in vivo for further study.

14.
Several genome‐wide association studies (GWAS) have been published on various complex diseases. Although new loci have been found to be associated with these diseases, only a small fraction of the genetic risk for these diseases can yet be explained. Because GWAS are still underpowered to find small main effects, and gene‐gene interactions are likely to play a role, the data may not currently be analyzed to their full potential. In this study, we evaluated alternative methods for studying GWAS data. Instead of focusing on the single nucleotide polymorphisms (SNPs) with the highest statistical significance, we took advantage of prior biological information and tried to detect overrepresented pathways in the GWAS data. We evaluated whether pathway classification analysis can help prioritize the biological pathways most likely to be involved in disease etiology. We present the various benefits and limitations of pathway‐classification tools in analyzing GWAS data. We show multiple differences in outcome between pathway tools analyzing the same dataset. Furthermore, analyzing randomly selected SNPs always results in significantly overrepresented pathways. Large pathways have a higher chance of becoming statistically significant, and the bioinformatics tools used in this study are biased toward detecting well‐defined pathways. As an example, we analyzed data from two GWAS on type 2 diabetes (T2D): the Diabetes Genetics Initiative (DGI) and the Wellcome Trust Case Control Consortium (WTCCC). Occasionally the results from the DGI and the WTCCC GWAS showed concordance in overrepresented pathways but discordance in the corresponding genes. Thus, incorporating gene networks and pathway classification tools into the analysis can point toward significantly overrepresented molecular pathways that cannot be picked up by traditional single‐locus analyses. However, the limitations discussed in this study need to be addressed before these methods can be widely used. Genet. Epidemiol. 33:419–431, 2009. © 2009 Wiley‐Liss, Inc.
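Many overrepresentation tools are based on a hypergeometric (one‐sided Fisher) test. The abstract does not say whether each tool in the study works this way, so this is an illustrative assumption. The sketch computes the probability of observing at least the given overlap between the genes hit by significant SNPs and a pathway, using exact integer arithmetic from the standard library. The function name is hypothetical.

```python
from math import comb


def enrichment_pvalue(n_total, n_in_pathway, n_selected, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap) (hypothetical helper).

    n_total: genes in the background; n_in_pathway: genes in the pathway;
    n_selected: genes hit by significant SNPs; n_overlap: selected genes
    that fall in the pathway.
    """
    denom = comb(n_total, n_selected)
    upper = min(n_in_pathway, n_selected)
    tail = sum(comb(n_in_pathway, i) * comb(n_total - n_in_pathway, n_selected - i)
               for i in range(n_overlap, upper + 1))
    return tail / denom


# Same fourfold enrichment (overlap = 4x the expected count), different sizes.
print(enrichment_pvalue(20_000, 500, 200, 20))  # large pathway: tiny P-value
print(enrichment_pvalue(20_000, 50, 200, 2))    # small pathway: about 0.09
```

Both pathways show the same fourfold enrichment, yet the 500‐gene pathway reaches a far smaller P‐value than the 50‐gene one. This is one mechanism behind the pathway‐size bias noted above.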

15.
Genome‐wide association studies (GWAS) for complex diseases have focused primarily on single‐trait analyses of disease status and disease‐related quantitative traits. For example, GWAS of risk factors for coronary artery disease analyze genetic associations with plasma lipids such as total cholesterol, LDL‐cholesterol, HDL‐cholesterol, and triglycerides (TGs) separately. However, traits are often correlated, and a joint analysis may yield greater statistical power for association than multiple univariate analyses. Several multivariate methods have recently been proposed, but they require individual‐level data. Here, we develop metaUSAT (where USAT is unified score‐based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction, or are powerful when only a few of the correlated traits are associated. metaUSAT, by contrast, is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual‐level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P‐value for association and is computationally efficient for implementation at a genome‐wide level. Simulation experiments show that metaUSAT maintains proper type‐I error at low error levels. It has similar, and sometimes greater, power to detect association across a wide array of scenarios compared with existing methods, which are usually powerful only for some specific association scenarios. When applied to plasma lipids summary data from the METSIM and the T2D‐GENES studies, metaUSAT detected genome‐wide significant loci beyond those identified by univariate analyses. Evidence from larger studies suggests that the variants additionally detected by our test are indeed associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of common diseases or traits.
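The abstract does not give metaUSAT's construction. As a simpler illustration of testing one variant against several traits using only summary statistics, the sketch computes the standard multivariate Wald statistic T = zᵀR⁻¹z. Here z holds the per‐trait z‐scores and R is the between‐trait correlation of those z‐scores, assumed known (in practice it is often estimated from null SNPs). T is compared with a χ² distribution on K degrees of freedom, where K is the number of traits. This is not metaUSAT itself. The χ² tail uses closed forms for integer degrees of freedom to avoid a SciPy dependency.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Q(x; 2m) = exp(-x/2) * sum_{i<m} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: Q = 2*Phi(-sqrt x) + 2*phi(sqrt x) * sum_i x^((2i-1)/2) / (1*3*...*(2i-1))
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2))                # 2 * Phi(-sqrt(x))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # phi(sqrt(x))
    term, total = root, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return tail + 2 * density * total


def multitrait_wald(z_scores, trait_corr):
    """Hypothetical helper: T = z' R^-1 z and its chi-square(K) P-value."""
    z = np.asarray(z_scores, dtype=float)
    stat = float(z @ np.linalg.solve(np.asarray(trait_corr, dtype=float), z))
    return stat, chi2_sf(stat, len(z))
```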

16.
Prioritization is the process whereby a set of possible candidate genes or SNPs is ranked so that the most promising can be taken forward into further studies. In a genome‐wide association study, prioritization is usually based on the P‐values alone, but researchers sometimes take account of external annotation information about the SNPs, such as whether the SNP lies close to a good candidate gene. Using external information in this way is inherently subjective and is often not formalized, making the analysis difficult to reproduce. Building on previous work that has identified 14 important types of external information, we present an approximate Bayesian analysis that produces an estimate of the probability of association. The calculation combines four sources of information: the genome‐wide data, SNP information derived from bioinformatics databases, empirical SNP weights, and the researchers’ subjective prior opinions. The calculation is fast enough to be applied to millions of SNPs. Although it does rely on subjective judgments, those judgments are made explicit so that the final SNP selection can be reproduced. We show that the resulting probability of association is intuitively more appealing than the P‐value because it is easier to interpret and it makes allowance for the power of the study. We illustrate the use of the probability of association for SNP prioritization by applying it to a meta‐analysis of kidney function genome‐wide association studies. We demonstrate that SNP selection performs better using the probability of association than using P‐values alone.
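The paper's calculation combines four sources of information and is not reproduced here. As a much simpler illustration of turning a GWAS estimate into a probability of association, the sketch uses Wakefield's approximate Bayes factor instead, a swapped‐in technique rather than the method described above. It assumes a normal prior on the log odds ratio whose standard deviation `prior_sd` = 0.2 is an assumed default. The prior probability of association is supplied by the user. The function names are hypothetical.

```python
import math


def approx_bayes_factor(beta_hat, se, prior_sd=0.2):
    """Wakefield's approximate Bayes factor in favour of H0 (beta = 0).

    Assumes beta_hat ~ N(beta, se^2) and, under H1, beta ~ N(0, prior_sd^2).
    """
    v = se ** 2
    w = prior_sd ** 2
    z2 = (beta_hat / se) ** 2
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z2 * w / (v + w))


def probability_of_association(beta_hat, se, prior_prob, prior_sd=0.2):
    """Posterior probability of association from a prior probability and the ABF."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds / approx_bayes_factor(beta_hat, se, prior_sd)
    return posterior_odds / (1 + posterior_odds)
```

Unlike a P‐value, this probability depends on the standard error as well as on the z‐score. That dependence is one way such a quantity can make allowance for study power.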

17.
Protein C is an endogenous anticoagulant protein with anti‐inflammatory properties. Single‐nucleotide polymorphisms (SNPs) affect the levels of circulating protein C in European Americans. We performed a genome‐wide association (GWA) scan of plasma protein C concentration with approximately 2.5 million SNPs in 2,701 African Americans in the Atherosclerosis Risk in Communities Study. Seventy‐nine SNPs from the 20q11 and 2q14 regions reached the genome‐wide significance threshold of 5 × 10⁻⁸. A missense variant rs867186 in the PROCR gene at 20q11 is known to affect protein C levels in individuals of European descent and showed the strongest signal (P = 9.84 × 10⁻⁶⁵) in African Americans. The minor allele of this SNP was associated with higher protein C levels (β = 0.49 μg/ml; 10% variance explained). In the 2q14 region, the top SNPs were near or within the PROC gene: rs7580658 (β = 0.15 μg/ml; 2% variance explained, P = 1.7 × 10⁻¹²) and rs1799808 (β = 0.15 μg/ml; 2% variance explained, P = 2.03 × 10⁻¹²). These two SNPs were in strong linkage disequilibrium (LD) with another SNP rs1158867 that resides in a biochemically functional site and in weak to strong LD with the top PROC variants previously reported in individuals of European descent. In addition, two variants outside the PROC region were significantly and independently associated with protein C levels: rs4321325 in CYP27C1 and rs13419716 in MYO7B. In summary, this first GWA study for plasma protein C levels in African Americans confirms the associations of SNPs in the PROC and PROCR regions with circulating levels of protein C across ethnic populations and identifies new candidates for protein C regulation.

18.
Unraveling the biological mechanisms or pathways underlying the effects of genetic variation on complex diseases remains one of the major challenges in the post‐GWAS (where GWAS is genome‐wide association study) era. To further explore the relationships among genetic variants, biomarkers, and diseases and so elucidate the underlying pathological mechanisms, considerable effort has been devoted to examining pleiotropic and gene‐environment interaction effects. We propose a novel genetic stochastic process model (GSPM) that can be applied to GWAS to jointly investigate genetic effects on longitudinally measured biomarkers and on disease risks. This model offers a richer biological interpretation and takes into account the dynamics of biomarkers during follow‐up when investigating disease hazards. We illustrate the rationale and evaluate the performance of the proposed model through two GWAS. One detects single nucleotide polymorphisms (SNPs) with interaction effects with body mass index (BMI) on type 2 diabetes (T2D), and the other detects SNPs affecting the optimal BMI level for protection against T2D. We identified multiple SNPs that showed interaction effects with BMI on T2D, including a novel SNP, rs11757677, in the CDKAL1 gene (P = 5.77 × 10⁻⁷). We also found a SNP, rs1551133, located on 2q14.2 that reversed the effect of BMI on T2D (P = 6.70 × 10⁻⁷). In conclusion, the proposed GSPM provides a promising and useful tool in GWAS of longitudinal data for interrogating pleiotropic and interaction effects to gain more insight into the relationships among genes, quantitative biomarkers, and risks of complex diseases.

19.
It is increasingly recognized that pathway analyses—a joint test of association between the outcome and a group of single nucleotide polymorphisms (SNPs) within a biological pathway—could potentially complement single‐SNP analysis and provide additional insights for the genetic architecture of complex diseases. Building upon existing P‐value combining methods, we propose a class of highly flexible pathway analysis approaches based on an adaptive rank truncated product statistic that can effectively combine evidence of associations over different SNPs and genes within a pathway. The statistical significance of the pathway‐level test statistics is evaluated using a highly efficient permutation algorithm that remains computationally feasible irrespective of the size of the pathway and complexity of the underlying test statistics for summarizing SNP‐ and gene‐level associations. We demonstrate through simulation studies that a gene‐based analysis that treats the underlying genes, as opposed to the underlying SNPs, as the basic units for hypothesis testing, is a very robust and powerful approach to pathway‐based association testing. We also illustrate the advantage of the proposed methods using a study of the association between the nicotinic receptor pathway and cigarette smoking behaviors. Genet. Epidemiol. 33:700–709, 2009. Published 2009 Wiley‐Liss, Inc.
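Below is a minimal, single‐level sketch of the adaptive rank truncated product. It assumes the user has already computed observed unit‐level P‐values (for SNPs or genes). It also assumes a matrix of P‐values from the same B phenotype permutations applied to all units, so that correlation among units is preserved. The truncation points (1, 2, 5) are illustrative, and the paper's two‐level gene‐based version is not reproduced. The same permutations are reused to calibrate the minimum over truncation points, which avoids a nested permutation loop.

```python
import numpy as np


def _rtp_stats(pvals, truncation_points):
    """-sum(log p) over the k smallest P-values, for each k (rows = replicates)."""
    logs = np.log(np.sort(pvals, axis=1))
    cums = -np.cumsum(logs, axis=1)
    return cums[:, [k - 1 for k in truncation_points]]


def artp_pvalue(p_observed, p_permuted, truncation_points=(1, 2, 5)):
    """Hypothetical ARTP sketch.

    p_observed: (L,) unit-level P-values; p_permuted: (B, L) P-values from B
    phenotype permutations applied jointly to all L units.
    """
    obs = _rtp_stats(np.asarray(p_observed)[None, :], truncation_points)[0]
    perm = _rtp_stats(np.asarray(p_permuted), truncation_points)
    n_perm = perm.shape[0]
    # Stage 1a: permutation P-value of the observed statistic at each truncation point.
    p_obs_k = (1 + (perm >= obs).sum(axis=0)) / (n_perm + 1)
    # Stage 1b: the same P-value for every permutation replicate (count includes itself).
    ranks = (perm[None, :, :] >= perm[:, None, :]).sum(axis=1)
    p_perm_k = ranks / n_perm
    # Stage 2: adapt by taking the minimum over truncation points, then calibrate
    # that minimum against its permutation distribution.
    min_obs = p_obs_k.min()
    min_perm = p_perm_k.min(axis=1)
    return (1 + (min_perm <= min_obs).sum()) / (n_perm + 1)
```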

20.
As the cost of genome‐wide genotyping decreases, the number of genome‐wide association studies (GWAS) has increased considerably. However, moving from GWAS findings to the underlying biology of various phenotypes remains challenging. Because of its system‐level interpretability, pathway analysis has become a popular tool for gaining insight into the underlying biology from high‐throughput genetic association data. In pathway analyses, gene sets representing particular biological processes are tested for significant associations with a given phenotype. Most existing pathway analysis approaches rely on single‐marker statistics and assume that pathways are independent of each other. Because biological systems are driven by complex biomolecular interactions, the complex relationships between single‐nucleotide polymorphisms (SNPs) and pathways need to be taken into account. To incorporate the complexity of gene‐gene interactions and pathway‐pathway relationships, we propose a system‐level pathway analysis approach, synthetic feature random forest (SF‐RF). SF‐RF is designed to detect pathway‐phenotype associations without making assumptions about the relationships among SNPs or pathways. In our approach, the genotypes of SNPs in a particular pathway are aggregated into a synthetic feature representing that pathway via Random Forest (RF). Multiple synthetic features are then analyzed simultaneously using RF, and the significance of a synthetic feature indicates the significance of the corresponding pathway. We further complement SF‐RF with pathway‐based Statistical Epistasis Network (SEN) analysis, which evaluates interactions among pathways. By investigating the pathway SEN, we hope to gain additional insights into the genetic mechanisms contributing to the pathway‐phenotype associations. We apply SF‐RF to a population‐based genetic study of bladder cancer and use SEN to further investigate the mechanisms that help explain the pathway‐phenotype associations. The bladder cancer‐associated pathways we found are consistent with existing biological knowledge and also suggest novel, plausible hypotheses for future biological validation.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417