Similar Literature
20 similar documents found.
1.
In the last decade, numerous genome-wide linkage and association studies of complex diseases have been completed. The critical question remains of how best to use this potentially valuable information to improve study design and statistical analysis in current and future genetic association studies. With genetic effect sizes for complex diseases being relatively small, the use of all available information is essential to untangle the genetic architecture of complex diseases. One promising approach to incorporating prior knowledge from linkage scans, or other information, is to up- or down-weight P-values resulting from a genetic association study in either a frequentist or Bayesian manner. As an alternative to these methods, we propose a fully Bayesian mixture model to incorporate previous knowledge into on-going association analysis. In this approach, both the data and previous information collectively inform the association analysis, in contrast to modifying the association results (P-values) to conform to the prior knowledge. By using a Bayesian framework, one has flexibility in modeling and is able to comprehensively assess the impact of model specification on posterior inferences. We illustrate the use of this method through a genome-wide linkage study of colorectal cancer and a genome-wide association study of colorectal polyps. Genet. Epidemiol. 34:418–426, 2010. © 2010 Wiley-Liss, Inc.
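The abstract does not spell out the model, but the general idea of letting prior linkage evidence shift a SNP-level prior can be illustrated with a toy two-component mixture for association z-scores. The sketch below is not the authors' model; the mixture form, the prior probabilities pi0/pi1, and the alternative-component scale tau are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def posterior_assoc_prob(z, in_linkage_region, pi0=0.01, pi1=0.05, tau=3.0):
    """Toy two-component mixture for per-SNP z-scores.

    Null:        z ~ N(0, 1)
    Associated:  z ~ N(0, 1 + tau^2)
    The prior probability of association is pi1 for SNPs inside a
    previously linked region and pi0 elsewhere.  All parameter values
    here are illustrative placeholders, not values from the paper.
    """
    prior = np.where(in_linkage_region, pi1, pi0)
    lik_alt = norm.pdf(z, scale=np.sqrt(1.0 + tau**2))
    lik_null = norm.pdf(z, scale=1.0)
    num = prior * lik_alt
    return num / (num + (1.0 - prior) * lik_null)

z = np.array([0.5, 2.8, 3.1, -0.2])
in_region = np.array([False, False, True, True])
print(posterior_assoc_prob(z, in_region))
```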

2.
We are interested in developing integrative approaches for variable selection problems that incorporate external knowledge on a set of predictors of interest. In particular, we have developed an integrative Bayesian model uncertainty (iBMU) method, which formally incorporates multiple sources of data via a second-stage probit model on the probability that any predictor is associated with the outcome of interest. Using simulations, we demonstrate that iBMU leads to an increase in power to detect true marginal associations over more commonly used variable selection techniques, such as the least absolute shrinkage and selection operator and the elastic net. In addition, iBMU leads to a more efficient model search algorithm than the basic BMU method even when the predictor-level covariates are only modestly informative. The increase in power and efficiency of our method becomes more substantial as the predictor-level covariates become more informative. Finally, we demonstrate the power and flexibility of iBMU for integrating both gene structure and functional biomarker information into a candidate gene study investigating over 50 genes in the brain reward system and their role in smoking cessation from the Pharmacogenetics of Nicotine Addiction and Treatment Consortium. Copyright © 2013 John Wiley & Sons, Ltd.
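As a rough illustration of the second-stage idea (not the actual iBMU implementation), the sketch below turns predictor-level external covariates into prior inclusion probabilities through a probit link; the covariate matrix W and the coefficients alpha and gamma are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def prior_inclusion_prob(W, alpha, gamma):
    """Second-stage probit sketch: the prior probability that predictor j
    is associated with the outcome is Phi(alpha + W[j, :] @ gamma), where
    W holds predictor-level external covariates (e.g. functional
    annotations).  In the full model alpha and gamma would be estimated
    or integrated over; fixed values are used here for illustration.
    """
    return norm.cdf(alpha + W @ gamma)

# three predictors, two external annotations each (hypothetical values)
W = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [1.0, 1.0]])
print(prior_inclusion_prob(W, alpha=-2.0, gamma=np.array([0.8, 1.2])))
```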

3.
We present a novel statistical method for linkage disequilibrium (LD) mapping of disease susceptibility loci in case-control studies. Such studies exploit the statistical correlation, or LD, that exists between variants physically close along the genome to identify those that correlate with disease status and might thus be close to a causative mutation, which is generally assumed to be unobserved. LD structure, however, varies markedly over short distances because of variation in local recombination rates, mutation, and genetic drift, among other factors. We propose a Bayesian multivariate probit model that flexibly accounts for the local spatial correlation between markers. In a case-control setting, we use a retrospective model that properly reflects the sampling scheme and identify regions where single- or multi-locus marker frequencies differ across cases and controls. We formally quantify these differences using information-theoretic distance measures, while the fully Bayesian approach naturally accommodates unphased or missing genotype data. We demonstrate our approach on simulated data and on real data from the CYP2D6 region, which has a confirmed role in drug metabolism.

4.
Studies of gene-trait associations for complex diseases often involve multiple traits that may vary by genotype groups or patterns. Such traits are usually manifestations of lower-dimensional latent factors or disease syndromes. We illustrate the use of a variance components factor (VCF) model to model the association between multiple traits and genotype groups, as well as any other existing patient-level covariates. This model characterizes the correlations between traits as underlying latent factors that can be used in clinical decision-making. We apply it within the Bayesian framework and provide a straightforward implementation using the WinBUGS software. The VCF model is illustrated with simulated data and an example that comprises changes in plasma lipid measurements of patients who were treated with statins to lower low-density lipoprotein cholesterol, and polymorphisms from the apolipoprotein-E gene. The simulation shows that this model clearly characterizes existing multiple trait manifestations across genotype groups where individuals' group assignments are fully observed or can be deduced from the observed data. It also allows one to investigate covariate-by-genotype-group interactions that may explain the variability in the traits. The flexibility to characterize such multiple trait manifestations makes the VCF model more desirable than the univariate variance components model, which is applied to each trait separately. The Bayesian framework offers a flexible approach that allows one to incorporate prior information. Genet. Epidemiol. 34:529–536, 2010. © 2010 Wiley-Liss, Inc.

5.
We are interested in investigating the involvement of multiple rare variants within a given region by conducting analyses of individual regions with two goals: (1) to determine if regional rare variation in aggregate is associated with risk; and (2) conditional upon the region being associated, to identify the specific genetic variants within the region that are driving the association. In particular, we seek a formal integrated analysis that achieves both of our goals. For rare variants with low minor allele frequencies, there is very little power to statistically test the null hypothesis of equal allele or genotype counts for each variant. Thus, genetic association studies are often limited to detecting association within a subset of the common genetic markers. However, it is very likely that associations exist for the rare variants that may not be captured by the set of common markers. Our framework aims at constructing a risk index based on multiple rare variants within a region. Our analytical strategy is novel in that we use a Bayesian approach to incorporate model uncertainty in the selection of variants to include in the index, as well as in the direction of the associated effects. Additionally, the approach allows for inference at both the group and variant-specific levels. Using a set of simulations, we show that our methodology has added power over other popular rare variant methods to detect global associations. In addition, we apply the approach to sequence data from the WECARE Study of second primary breast cancers.

6.
The growing number of multinational clinical trials in which patient-level health care resource data are collected has raised the issue of which approach is best for making inference for individual countries with respect to the between-treatment difference in mean cost. We describe and discuss the relative merits of three approaches. The first uses the random-effects pooled estimate from all countries to estimate the difference for any particular country. The second approach estimates the difference using only the data from the specific country in question. Using empirical Bayes estimation, a third approach estimates the country-specific difference as a variance-weighted linear sum of the estimates provided by the other two approaches. The approaches are illustrated and compared using data from the ASSENT-3 trial.
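A minimal sketch of the third, empirical Bayes approach, assuming the usual random-effects shrinkage weight w = tau^2 / (tau^2 + var_country); the exact weighting used in the paper may differ, and all numbers in the example are hypothetical.

```python
def shrunken_country_estimate(theta_country, var_country, theta_pooled, tau2):
    """Empirical Bayes (shrinkage) estimate of a country-specific
    treatment difference: a variance-weighted combination of the
    country's own estimate and the pooled random-effects estimate.
    The weight w = tau2 / (tau2 + var_country) is the standard
    random-effects shrinkage factor, used here as an illustration.
    """
    w = tau2 / (tau2 + var_country)
    return w * theta_country + (1.0 - w) * theta_pooled

# hypothetical numbers: a small country with a noisy estimate is pulled
# strongly towards the pooled value
print(shrunken_country_estimate(theta_country=250.0, var_country=400.0**2,
                                theta_pooled=120.0, tau2=80.0**2))
```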

7.
Prioritization is the process whereby a set of possible candidate genes or SNPs is ranked so that the most promising can be taken forward into further studies. In a genome-wide association study, prioritization is usually based on the P-values alone, but researchers sometimes take account of external annotation information about the SNPs, such as whether a SNP lies close to a good candidate gene. Using external information in this way is inherently subjective and is often not formalized, making the analysis difficult to reproduce. Building on previous work that has identified 14 important types of external information, we present an approximate Bayesian analysis that produces an estimate of the probability of association. The calculation combines four sources of information: the genome-wide data, SNP information derived from bioinformatics databases, empirical SNP weights, and the researchers' subjective prior opinions. The calculation is fast enough that it can be applied to millions of SNPs, and although it does rely on subjective judgments, those judgments are made explicit so that the final SNP selection can be reproduced. We show that the resulting probability of association is intuitively more appealing than the P-value because it is easier to interpret and it makes allowance for the power of the study. We illustrate the use of the probability of association for SNP prioritization by applying it to a meta-analysis of kidney function genome-wide association studies and demonstrate that SNP selection performs better using the probability of association than using P-values alone.
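The paper's exact calculation is not reproduced here; the sketch below shows one standard way to turn GWAS summary statistics plus a SNP-specific prior probability into a probability of association, using Wakefield's approximate Bayes factor. The prior effect-size standard deviation (prior_sd) and the example numbers are assumptions.

```python
import numpy as np

def prob_of_association(beta_hat, se, prior_prob, prior_sd=0.2):
    """Posterior probability that a SNP is associated, combining GWAS
    summary statistics with a SNP-specific prior probability.

    Uses Wakefield's approximate Bayes factor for the alternative,
        ABF = sqrt(V / (V + W)) * exp(z^2 * W / (2 * (V + W))),
    with V = se^2, W = prior_sd^2 and z = beta_hat / se.  In the paper
    the prior also folds in annotations, empirical SNP weights and
    subjective opinion (represented crudely by prior_prob here);
    prior_sd = 0.2 is an illustrative choice, not a published value.
    """
    V, W = se**2, prior_sd**2
    z = beta_hat / se
    abf = np.sqrt(V / (V + W)) * np.exp(z**2 * W / (2.0 * (V + W)))
    prior_odds = prior_prob / (1.0 - prior_prob)
    post_odds = abf * prior_odds
    return post_odds / (1.0 + post_odds)

print(prob_of_association(beta_hat=0.09, se=0.02, prior_prob=1e-4))
```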

8.
Unraveling the nature of genetic interactions is crucial to obtaining a more complete picture of complex diseases. It is thought that gene-gene interactions play an important role in the etiology of cancer, cardiovascular disease, and immune-mediated disease. Interactions among genes are defined as phenotypic effects that differ from those observed for independent contributions of each gene, and they are usually detected by univariate logistic regression methods. Using a multivariate extension of linkage disequilibrium (LD), we have developed a new method, based on distances between sample covariance matrices for groups of single nucleotide polymorphisms (SNPs), to test for interaction effects of two groups of genes associated with a disease phenotype. Since a disease-associated interacting locus will often be in LD with more than one marker in the region, a method that examines a set of markers in a region collectively can offer greater power than traditional methods. Our method effectively identifies interaction effects in simulated data, as well as in data on the genetic contributions to the risk of graft-versus-host disease following hematopoietic stem cell transplantation.
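A simplified stand-in for the covariance-distance idea: compare the cross-covariance of two SNP sets in cases versus controls and assess the difference by permuting case/control labels. The Frobenius norm is used here for convenience and is not necessarily the distance measure used in the paper.

```python
import numpy as np

def cross_cov_distance(geno_a, geno_b, is_case):
    """Frobenius distance between the case and control cross-covariance
    matrices of two SNP sets (a multivariate-LD-style contrast)."""
    def cross_cov(a, b):
        a = a - a.mean(axis=0)
        b = b - b.mean(axis=0)
        return a.T @ b / (len(a) - 1)
    d_case = cross_cov(geno_a[is_case], geno_b[is_case])
    d_ctrl = cross_cov(geno_a[~is_case], geno_b[~is_case])
    return np.linalg.norm(d_case - d_ctrl)

def permutation_pvalue(geno_a, geno_b, is_case, n_perm=1000, seed=0):
    """Permute case/control labels to build a null distribution for the
    observed covariance distance."""
    rng = np.random.default_rng(seed)
    observed = cross_cov_distance(geno_a, geno_b, is_case)
    perms = [cross_cov_distance(geno_a, geno_b, rng.permutation(is_case))
             for _ in range(n_perm)]
    return (1 + sum(p >= observed for p in perms)) / (n_perm + 1)
```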

9.
With its potential to discover a much greater amount of genetic variation, next-generation sequencing is fast becoming an essential tool for genetic association studies. However, the cost of sequencing all individuals in a large-scale population study is still high in comparison to most alternative genotyping options. While the ability to identify individual-level data is lost (without bar-coding), sequencing pooled samples can substantially lower costs without compromising the power to detect significant associations. We propose a hierarchical Bayesian model that estimates the association of each variant using pools of cases and controls, accounting for the variation in read depth across pools and for sequencing error. To investigate the performance of our method across a range of numbers of pools, numbers of individuals within each pool, and average coverage, we undertook extensive simulations varying effect sizes, minor allele frequencies, and sequencing error rates. In general, the number of pools and the pool size have dramatic effects on power, while the total depth of coverage per pool has only a moderate impact. This information can guide the selection of a study design that maximizes power subject to cost, sample size, or other laboratory constraints. We provide an R package (hiPOD: hierarchical Pooled Optimal Design) to find the optimal design, allowing the user to specify a cost function, cost and sample size limitations, and distributions of effect size, minor allele frequency, and sequencing error rate.
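The hierarchical Bayesian estimation itself is in the hiPOD package; the sketch below only illustrates, by crude simulation, how pool number, pool size, read depth, and sequencing error feed into power for a pooled design. All parameter values and the chi-square comparison of pooled read counts are illustrative simplifications, not the paper's model.

```python
import numpy as np
from scipy.stats import chi2_contingency

def simulate_pooled_power(maf_case, maf_ctrl, n_pools, pool_size, depth,
                          error=0.005, n_rep=500, alpha=5e-4, seed=1):
    """Crude power estimate for a pooled-sequencing design: per pool,
    draw 2*pool_size alleles, perturb the pool allele frequency by a
    symmetric sequencing error, draw `depth` reads, then compare pooled
    case vs control read counts with a chi-square test."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        counts = []
        for maf in (maf_case, maf_ctrl):
            alt_reads, total = 0, 0
            for _ in range(n_pools):
                pool_freq = rng.binomial(2 * pool_size, maf) / (2 * pool_size)
                read_freq = pool_freq * (1 - error) + (1 - pool_freq) * error
                alt_reads += rng.binomial(depth, read_freq)
                total += depth
            counts.append([alt_reads, total - alt_reads])
        _, pval, _, _ = chi2_contingency(np.array(counts))
        if pval < alpha:
            hits += 1
    return hits / n_rep

print(simulate_pooled_power(maf_case=0.08, maf_ctrl=0.05,
                            n_pools=20, pool_size=25, depth=400))
```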

10.
Genome-wide association studies (GWAS) have revealed many fascinating insights into complex diseases, even from simple, single-marker statistical tests. Most of these tests are designed for testing associations between a phenotype and an autosomal genotype and are therefore not applicable to X chromosome data. Testing for association on the X chromosome raises unique challenges that have motivated the development of X-specific statistical tests in the literature. However, to date there has been no study of these methods under a wide range of realistic study designs, allele frequencies, and disease models to assess the size and power of each test. To address this, we have performed an extensive simulation study to investigate the effects of the sex ratios in the case and control cohorts, as well as the allele frequencies, on the size and power of eight test statistics under three different disease models that each account for X-inactivation. We show that existing, but under-used, methods that make use of both male and female data are uniformly more powerful than popular methods that make use of only female data. In particular, we show that Clayton's one degree of freedom statistic [Clayton, 2008] is robust and powerful across a wide range of realistic simulation parameters. Our results provide guidance on selecting the most appropriate test statistic to analyse X chromosome data from GWAS and show that much power can be gained by a more careful analysis of X chromosome GWAS data.
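As a hedged illustration of why using males and females together helps, the sketch below fits a logistic regression on an X-inactivation-style allele dose (males coded 0/2, females 0/1/2) with sex as a covariate. This coding is only in the spirit of the one-degree-of-freedom tests discussed in the abstract; it is not Clayton's published score statistic.

```python
import numpy as np
import statsmodels.api as sm

def x_dose_test(geno_female, geno_male, case_female, case_male):
    """X-chromosome association sketch: females carry 0/1/2 copies of
    the risk allele, males are coded 0/2 so that a hemizygous male
    counts like a homozygous female, and sex is included as a
    covariate.  Returns the allele-dose coefficient and its p-value.
    A plain logistic-regression stand-in, not the exact published test.
    """
    dose = np.concatenate([geno_female, 2 * geno_male])
    sex = np.concatenate([np.zeros(len(geno_female)), np.ones(len(geno_male))])
    y = np.concatenate([case_female, case_male])
    X = sm.add_constant(np.column_stack([dose, sex]))
    fit = sm.Logit(y, X).fit(disp=0)
    return fit.params[1], fit.pvalues[1]
```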

11.
The joint modeling of longitudinal and survival data has recently received much attention. Several extensions of the standard joint model, which consists of one longitudinal and one survival outcome, have been proposed, including the use of different association structures between the longitudinal and the survival outcomes. However, in general, relatively little attention has been given to the selection of the most appropriate functional form to link the two outcomes. In common practice, it is assumed that the underlying value of the longitudinal outcome is associated with the survival outcome. However, it could be that different characteristics of the patients' longitudinal profiles influence the hazard; for example, not only the current value but also the slope or the area under the curve of the longitudinal outcome. The choice of which functional form to use is an important decision that needs to be investigated because it could influence the results. In this paper, we use a Bayesian shrinkage approach to determine the most appropriate functional forms. We propose a joint model that includes different association structures for different biomarkers and assume informative priors for the regression coefficients that correspond to the terms of the longitudinal process; specifically, we consider the Bayesian lasso, Bayesian ridge, Bayesian elastic net, and horseshoe priors. These methods are applied to a dataset consisting of patients with a chronic liver disease, where it is important to investigate which characteristics of the biomarkers have an influence on survival. Copyright © 2016 John Wiley & Sons, Ltd.

12.
As more next-generation sequencing data are analyzed, researchers have affirmed that rare genetic variants are widespread among populations and likely play an important role in complex phenotypes. Recently, a handful of statistical models have been developed to analyze rare variant (RV) association in different study designs. However, due to the scarce occurrence of minor alleles in the data, appropriate statistical methods for detecting RV interaction effects are still difficult to develop. We propose a hierarchical Bayesian latent variable collapsing method (BLVCM), which circumvents these obstacles by parameterizing the signals of RVs with latent variables in a Bayesian framework and is formulated for twin data. The BLVCM can handle nonassociated variants, allow both protective and deleterious effects, capture SNP-SNP synergistic effects, provide estimates of both gene-level and individual SNP contributions, and can be applied to both independent and various twin designs. We assessed the statistical properties of the BLVCM using simulated data, and found that it achieved better power for interaction effect detection than GRANVIL and SKAT. As proof of practical application, the BLVCM was then applied to a twin study analysis of more than 20,000 gene regions to identify significant RVs associated with low-density lipoprotein cholesterol level. Some of the findings are consistent with previous studies, and we identified some novel gene regions with significant SNP-SNP synergistic effects.

13.
Current genome-wide association studies (GWAS) often involve populations that have experienced recent genetic admixture. Genotype data generated from these studies can be used to test for association directly, as in a non-admixed population. As an alternative, these data can be used to infer chromosomal ancestry, and thus allow for admixture mapping. We quantify the contribution of allele-based and ancestry-based association testing under a family design, and demonstrate that the two tests can provide non-redundant information. We propose a joint testing procedure that efficiently integrates the two sources of information. The efficiencies of the allele, ancestry, and combined tests are compared in the context of a GWAS. We discuss the impact of population history and provide guidelines for the future design and analysis of GWAS in admixed populations.

14.
In this paper we propose a new method to analyze time-to-event data in longitudinal genetic studies. This method addresses the fundamental problem of incorporating uncertainty when analyzing survival data and imputed single-nucleotide polymorphisms (SNPs) from genome-wide association studies (GWAS). Our method incorporates the uncertainty in the likelihood function, in contrast to existing methods that incorporate the uncertainty in the design matrix. Through simulation studies and real data analyses, we show that our proposed method is unbiased and provides powerful results. We also show how combining results from different GWAS (meta-analysis) may lead to incorrect results when effects are not estimated using our approach. The model is implemented in an R package that is designed to handle uncertainty arising not only from imputed SNPs but also from copy number variants.

15.
An omnibus permutation test of the overall null hypothesis can be used to assess the association of an entire ensemble of genetic markers with disease in case-control studies. In this approach, p-values for univariate marker-specific Armitage trend tests are combined to form a scalar statistic, which is then used in a permutation test to determine an overall p-value. Two previously described competing methods utilize either a standard two-sample Hotelling's T2 statistic or a global U statistic that is a weighted sum of univariate U statistics. In contrast to Hotelling's test, omnibus tests are much less sensitive to missing data and utilize all available data. In contrast to the global U test, omnibus tests do not require that the direction of the effects of the individual markers on the risk of disease be correctly specified in advance; in fact, any combination of one- and two-sided univariate tests can be used. Simulations show that, even under circumstances favoring the competing tests (no missing data; direction of effects known), omnibus permutation tests based on Fisher's combining function or the Anderson-Darling statistic typically have power comparable to or greater than that of Hotelling's test and the global U test.
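A minimal sketch of the omnibus permutation idea: combine per-marker p-values with Fisher's function and permute case/control labels for the overall p-value. A simple z-test on mean allele dose stands in for the Armitage trend test, so the details differ from the paper.

```python
import numpy as np
from scipy.stats import norm

def fisher_statistic(pvals):
    """Fisher's combining function: -2 * sum(log p) over the markers."""
    return -2.0 * np.sum(np.log(pvals))

def omnibus_permutation_test(genotypes, is_case, n_perm=1000, seed=0):
    """Omnibus permutation test sketch: per-marker two-sample tests on
    allele dose are combined with Fisher's function, and the overall
    p-value comes from permuting case/control labels."""
    rng = np.random.default_rng(seed)

    def marker_pvals(labels):
        cases, ctrls = genotypes[labels], genotypes[~labels]
        diff = cases.mean(axis=0) - ctrls.mean(axis=0)
        se = np.sqrt(cases.var(axis=0, ddof=1) / len(cases)
                     + ctrls.var(axis=0, ddof=1) / len(ctrls))
        return 2 * norm.sf(np.abs(diff / se))

    observed = fisher_statistic(marker_pvals(is_case))
    perms = [fisher_statistic(marker_pvals(rng.permutation(is_case)))
             for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in perms)) / (n_perm + 1)
```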

16.
This paper considers the design and interpretation of clinical trials comparing treatments for conditions so rare that worldwide recruitment efforts are likely to yield total sample sizes of 50 or fewer, even when patients are recruited over several years. For such studies, the sample size needed to meet a conventional frequentist power requirement is clearly infeasible. Rather, the expectation of any such trial has to be limited to the generation of an improved understanding of treatment options. We propose a Bayesian approach for the conduct of rare-disease trials comparing an experimental treatment with a control where patient responses are classified as a success or failure. A systematic elicitation from clinicians of their beliefs concerning treatment efficacy is used to establish Bayesian priors for unknown model parameters. The process of determining the prior is described, including the possibility of formally considering results from related trials. As sample sizes are small, it is possible to compute all possible posterior distributions of the two success rates. A number of allocation ratios between the two treatment groups can be considered, with a view to maximising the prior probability that the trial concludes by recommending the new treatment when in fact it is non-inferior to control. Consideration of the extent to which opinion can be changed, even by data from the best feasible design, can help to determine whether such a trial is worthwhile. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
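With success/failure outcomes and Beta priors on the two success rates, the key posterior quantity is easy to compute. The sketch below uses Monte Carlo draws and purely illustrative prior parameters and data; the paper's elicited priors and exact decision criterion would replace them.

```python
import numpy as np

def prob_experimental_noninferior(x_e, n_e, x_c, n_c,
                                  prior_e=(2, 3), prior_c=(4, 6),
                                  margin=0.10, n_draw=200_000, seed=0):
    """Posterior probability that the experimental success rate is not
    more than `margin` below control, with independent Beta priors on
    the two success rates.  The Beta parameters here are illustrative
    placeholders for the clinician-elicited priors; with such small
    samples the quantity could also be computed exactly, but Monte
    Carlo keeps the sketch short.
    """
    rng = np.random.default_rng(seed)
    p_e = rng.beta(prior_e[0] + x_e, prior_e[1] + n_e - x_e, size=n_draw)
    p_c = rng.beta(prior_c[0] + x_c, prior_c[1] + n_c - x_c, size=n_draw)
    return float(np.mean(p_e > p_c - margin))

# hypothetical data: 18 of 30 successes on experimental, 9 of 20 on control
print(prob_experimental_noninferior(x_e=18, n_e=30, x_c=9, n_c=20))
```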

17.
With the emergence of biobanks alongside large-scale genome-wide association studies (GWAS), we will soon be in the enviable situation of obtaining precise estimates of population allele frequencies for the SNPs that make up the panels in standard genotyping arrays, such as those produced by Illumina and Affymetrix. For disease association studies, it is well known that for rare diseases with known population minor allele frequencies (pMAFs) a case-only design is most powerful. That is, for a fixed budget the optimal procedure is to genotype only cases (affecteds). In such tests, experimenters look for a divergence of the allele distribution in cases from the known population pMAF, in order to test the null hypothesis of no association between disease status and allele frequency. However, what has not been previously characterized is the utility of controls (known unaffecteds) when they are available. In this study we consider frequentist and Bayesian statistical methods for testing SNP genotype association when population MAFs are known and when both cases and controls are available. We demonstrate that for rare diseases the most powerful frequentist design is, somewhat counterintuitively, to actively discard the controls even though they contain information on the association. In contrast, we develop a Bayesian test that uses all available information (cases and controls) and appears to exhibit uniformly greater power than all the frequentist methods we considered. Genet. Epidemiol. 33:371–378, 2009. © 2009 Wiley-Liss, Inc.
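A minimal sketch of the case-only idea under a known pMAF, using an exact binomial test of the case allele count against the population frequency; the paper's frequentist comparisons and its Bayesian test (which also uses the controls) are not reproduced here, and the example numbers are hypothetical.

```python
from scipy.stats import binomtest

def case_only_test(alt_alleles_in_cases, n_cases, pmaf):
    """Case-only test sketch: with the population minor allele
    frequency (pMAF) known, test whether the minor allele count among
    2 * n_cases case chromosomes diverges from it.  An exact binomial
    test stands in for whichever case-only statistic one prefers."""
    return binomtest(alt_alleles_in_cases, n=2 * n_cases, p=pmaf).pvalue

# hypothetical: 160 minor alleles among 500 cases, known pMAF of 0.12
print(case_only_test(alt_alleles_in_cases=160, n_cases=500, pmaf=0.12))
```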

18.
Genotype imputation is a critical technique for following up genome-wide association studies. Efficient methods are available for dealing with the probabilistic nature of imputed single nucleotide polymorphisms (SNPs) in population-based designs, but not for family-based studies. We have developed a new analytical approach (FBATdosage), using imputed allele dosage in the general framework of family-based association tests, to bridge this gap. Simulation studies showed that FBATdosage yielded highly consistent type I error rates, whatever the level of genotype uncertainty, and much higher power than the best-guess genotype approach. FBATdosage allows fast linkage and association testing of several million imputed variants with binary or quantitative phenotypes in nuclear families of arbitrary size with arbitrary missing data for the parents. The application of this approach to a family-based association study of leprosy susceptibility successfully refined the association signal at two candidate loci, C1orf141-IL23R on chromosome 1 and RAB32-C6orf103 on chromosome 6.

19.
Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) associated with complex traits. However, the genetic heritability of most of these traits remains unexplained. To help guide future studies, we address the crucial question of whether future GWAS can detect new SNP associations and explain additional heritability given the new availability of larger GWAS SNP arrays, imputation, and reduced genotyping costs. We first describe the pairwise and imputation coverage of all SNPs in the human genome by commercially available GWAS SNP arrays, using the 1000 Genomes Project as a reference. Next, we describe the findings from 6 years of GWAS of 172 chronic diseases, calculating the power to detect each of them while taking array coverage and sample size into account. We then calculate the power to detect these SNP associations under different conditions using improved coverage and/or sample sizes. Finally, we estimate the percentages of SNP associations and heritability previously detected and detectable by future GWAS under each condition. Overall, we estimated that previous GWAS have detected less than one-fifth of all GWAS-detectable SNPs underlying chronic disease. Furthermore, increasing sample size has a much larger impact than increasing coverage on the potential of future GWAS to detect additional SNP-disease associations and heritability.
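A standard back-of-the-envelope power calculation shows why sample size dominates coverage: coverage enters the non-centrality parameter multiplicatively through the tag-to-causal r^2, on the same footing as the (small) per-SNP variance explained. The formula and numbers below are generic assumptions, not the paper's calculation.

```python
from scipy.stats import ncx2, chi2

def gwas_power(n, var_explained, r2_coverage=1.0, alpha=5e-8):
    """Power of a 1-df GWAS test for a single causal SNP, using the
    common approximation ncp = n * r2_coverage * var_explained, where
    var_explained is the fraction of trait variance the causal SNP
    explains and r2_coverage is the LD/imputation r^2 between the
    causal SNP and the best genotyped tag."""
    crit = chi2.ppf(1.0 - alpha, df=1)
    ncp = n * r2_coverage * var_explained
    return ncx2.sf(crit, df=1, nc=ncp)

# doubling the sample size helps more than pushing coverage from 0.8 to 1.0
print(gwas_power(n=20_000, var_explained=0.002, r2_coverage=0.8))
print(gwas_power(n=20_000, var_explained=0.002, r2_coverage=1.0))
print(gwas_power(n=40_000, var_explained=0.002, r2_coverage=0.8))
```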

20.
    
A genome-wide association study (GWAS) correlates marker and trait variation in a study sample. Each subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here, we assume that subjects are randomly collected unrelateds and that trait values are normally distributed or can be transformed to normality. Over the past decade, geneticists have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies presents unique computational challenges. Penalized regression with the ℓ1 penalty (LASSO) or the minimax concave penalty (MCP) is capable of selecting a handful of associated SNPs from millions of potential SNPs. Unfortunately, model selection can be corrupted by false positives and false negatives, obscuring the genetic underpinning of a trait. Here, we compare LASSO and MCP penalized regression to iterative hard thresholding (IHT). On GWAS regression data, IHT is better at model selection and comparable in speed to both methods of penalized regression. This conclusion holds for both simulated and real GWAS data. IHT fosters parallelization and scales well in problems with large numbers of causal markers. Our parallel implementation of IHT accommodates SNP genotype compression and exploits multiple CPU cores and graphics processing units (GPUs). This allows statistical geneticists to leverage commodity desktop computers in GWAS analysis and avoid supercomputing. Availability: Source code is freely available at https://github.com/klkeys/IHT.jl.
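A minimal IHT sketch for a sparse GWAS-style regression, assuming a fixed sparsity level k and a 1/L gradient step; the IHT.jl package referenced in the abstract adds much more (step-size tuning, debiasing, genotype compression, multicore/GPU support), none of which is reproduced here.

```python
import numpy as np

def iht(X, y, k, n_iter=200):
    """Iterative hard thresholding for y ~ X @ b with at most k nonzero
    effects: gradient step on the least-squares loss, then keep only
    the k largest-magnitude coefficients."""
    n, p = X.shape
    b = np.zeros(p)
    step = 1.0 / (np.linalg.norm(X, ord=2) ** 2)  # 1 / largest eigenvalue of X'X
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b_new = b - step * grad
        keep = np.argpartition(np.abs(b_new), -k)[-k:]
        b = np.zeros(p)
        b[keep] = b_new[keep]
    return b

# toy data: 5 causal SNPs out of 200 (all values hypothetical)
rng = np.random.default_rng(0)
X = rng.binomial(2, 0.3, size=(500, 200)).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta = np.zeros(200)
beta[:5] = [0.5, -0.4, 0.3, 0.6, -0.5]
y = X @ beta + rng.normal(size=500)
print(np.nonzero(iht(X, y, k=5))[0])
```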

