Similar literature
Found 20 similar articles; search time 15 ms.
1.
A Bayesian toolkit for genetic association studies   Total citations: 3, self-citations: 0, citations by others: 3
We present a range of modelling components designed to facilitate Bayesian analysis of genetic-association-study data. A key feature of our approach is the ability to combine different submodels together, almost arbitrarily, for dealing with the complexities of real data. In particular, we propose various techniques for selecting the "best" subset of genetic predictors for a specific phenotype (or set of phenotypes). At the same time, we may control for complex, non-linear relationships between phenotypes and additional (non-genetic) covariates as well as accounting for any residual correlation that exists among multiple phenotypes. Both of these additional modelling components are shown to potentially aid in detecting the underlying genetic signal. We may also account for uncertainty regarding missing genotype data. Indeed, at the heart of our approach is a novel method for reconstructing unobserved haplotypes and/or inferring the values of missing genotypes. This can be deployed independently or, alternatively, it can be fully integrated into arbitrary genotype- or haplotype-based association models such that the missing data and the association model are "estimated" simultaneously. The impact of such simultaneous analysis on inferences drawn from the association model is shown to be potentially significant. Our modelling components are packaged as an "add-on" interface to the widely used WinBUGS software, which allows Markov chain Monte Carlo analysis of a wide range of statistical models. We illustrate their use with a series of increasingly complex analyses conducted on simulated data based on a real pharmacogenetic example.

2.
A data set from an outbreak of gastroenteritis in a school is analysed using a stochastic transmission model. The causative agent of the outbreak is believed to be a Norovirus, spread through person-to-person contact. Particular attention is given to the question of whether or not vomiting episodes enhance the spread of the virus via aerosol transmission. The methodology developed uses Bayesian model choice, implemented with reversible-jump Markov chain Monte Carlo methods. The methodology appears to be highly sensitive to assumptions made concerning the data, which provides some assurance that the conclusions are driven by observations rather than the underlying model and methodology.

3.
With new technologies, multiple types of genomic data are commonly collected on a single set of samples. However, standard analysis methods concentrate on a single data type at a time and ignore the relationships between genes, proteins, and biochemical reactions that give rise to complex phenotypes. In this paper, we propose a novel integrative model to incorporate multiple types of genomic data into an association analysis with a complex phenotype. The method combines path analysis and stochastic search variable selection into a Bayesian hierarchical model that simultaneously identifies both direct and indirect genomic effects on the phenotype. Results from a simulation study and application of the Bayesian model to a pharmacogenomic study of the drug gemcitabine demonstrate greater sensitivity to detect genomic effects in some simulation scenarios, when compared to the standard single data type analysis. Further research is required to extend and modify this integrative modeling framework to increase computational efficiency to best capitalize on the wealth of information available across multiple genomic data types.

4.
Logistic regression is the standard method for assessing predictors of diseases. In logistic regression analyses, a stepwise strategy is often adopted to choose a subset of variables. Inference about the predictors is then made based on the chosen model constructed of only those variables retained in that model. This method subsequently ignores both the variables not selected by the procedure, and the uncertainty due to the variable selection procedure. This limitation may be addressed by adopting a Bayesian model averaging approach, which selects a number of all possible such models, and uses the posterior probabilities of these models to perform all inferences and predictions. This study compares the Bayesian model averaging approach with the stepwise procedures for selection of predictor variables in logistic regression using simulated data sets and the Framingham Heart Study data. The results show that in most cases Bayesian model averaging selects the correct model and out-performs stepwise approaches at predicting an event of interest.

5.
The recent successes of genome-wide association studies (GWAS) have revealed that many of the replicated findings have explained only a small fraction of the heritability of common diseases. One hypothesis that investigators have suggested is that higher order interactions between SNPs or SNPs and environmental risk factors may account for some of this missing heritability. Searching for these interactions poses great statistical and computational challenges. In this article, we propose a novel method that addresses these challenges by incorporating external biological knowledge into a fully Bayesian analysis. The method is designed to be scalable for high-dimensional search spaces (where it supports interactions of any order) because priors that use such knowledge focus the search in regions that are more biologically plausible and avoid having to enumerate all possible interactions. We provide several examples based on simulated data demonstrating how external information can enhance power, specificity, and effect estimates in comparison to conventional approaches based on maximum likelihood estimates. We also apply the method to data from a GWAS for breast cancer, revealing a set of interactions enriched for the Gene Ontology terms growth, metabolic process, and biological regulation.

6.
We consider situations, which are common in medical statistics, where we have a number of sets of response data, from different individuals, say, potentially under different conditions. A parametric model is defined for each set of data, giving rise to a set of random effects. Our goal here is to efficiently explore a range of possible ‘population’ models for the random effects, to select the most appropriate model. The range of possible models is potentially vast, because the random effects may depend on observed covariates, and there may be multiple credible ways of partitioning their variability. Here, we consider pharmacokinetic (PK) data on insulin aspart, a fast acting insulin analogue used in the treatment of diabetes. PK models are typically nonlinear (in their parameters), often complex and sometimes only available as a set of differential equations, with no closed‐form solution. Fitting such a model for just a single individual can be a challenging task. Fitting a joint model for all individuals can be even harder, even without the complication of an overarching model selection objective. We describe a two‐stage approach that decouples the population model for the random effects from the PK model applied to the response data but nevertheless fits the full, joint, hierarchical model, accounting fully for uncertainty. This allows us to repeatedly reuse results from a single analysis of the response data to explore various population models for the random effects. This greatly expedites not only model exploration but also cross‐validation for the purposes of model criticism. © 2015 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

7.
The clustering of proteins is of interest in cancer cell biology. This article proposes a hierarchical Bayesian model for protein (variable) clustering hinging on correlation structure. Starting from a multivariate normal likelihood, we enforce the clustering through prior modeling using angle-based unconstrained reparameterization of correlations and assume a truncated Poisson distribution (to penalize a large number of clusters) as prior on the number of clusters. The posterior distributions of the parameters are not in explicit form, so a reversible jump Markov chain Monte Carlo based technique is used to simulate the parameters from the posteriors. The end products of the proposed method are an estimated cluster configuration of the proteins (variables) along with the number of clusters. The Bayesian method is flexible enough to cluster the proteins as well as estimate the number of clusters. The performance of the proposed method has been substantiated with extensive simulation studies and one protein expression data set on hereditary disposition in breast cancer, where the proteins come from different pathways.

8.
Inference about the treatment effect in a crossover design has received much attention over time owing to the uncertainty in the existence of the carryover effect and its impact on the estimation of the treatment effect. Adding to this uncertainty is that the existence of the carryover effect and its size may depend on the presence of the treatment effect and its size. We consider estimation and hypothesis testing for the treatment effect in a two‐period crossover design, assuming a normally distributed response variable, and use an objective Bayesian approach to test the hypothesis about the treatment effect and to estimate its size when it exists while accounting for the uncertainty about the presence of the carryover effect as well as the treatment and period effects. We evaluate and compare the performance of the proposed approach with a standard frequentist approach using simulated and real data. Copyright © 2016 John Wiley & Sons, Ltd.

9.
In this paper a novel method for the monitoring of disease maps over time in a surveillance setting is described. The approach relies upon the use of a spatial model that is fitted to current spatial data and is smoothed with historical spatial estimates. The method of smoothing is a vector exponentially weighted moving average procedure. A simulation study with a range of scenarios is presented, followed by a case study of monitoring infectious disease spread.
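
The vector exponentially weighted moving average named in this abstract can be illustrated with a minimal Python sketch. It assumes a single scalar smoothing weight applied to every region and the common recursion Z_t = lam * X_t + (1 - lam) * Z_(t-1). The abstract specifies neither, so the function name `vector_ewma`, the parameter `lam`, and the optional `start` vector are all hypothetical.

```python
def vector_ewma(observations, lam=0.2, start=None):
    """Smooth a sequence of equal-length vectors with an EWMA.

    observations: one vector per time point (e.g. one estimate per region).
    lam: assumed scalar smoothing weight in (0, 1]; larger values weight
         the current map more heavily than the history.
    start: optional initial smoothed vector; defaults to the first observation.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    observations = [list(x) for x in observations]
    if not observations:
        return []
    z = list(start) if start is not None else observations[0]
    smoothed = []
    for x in observations:
        # Element-wise update: blend the current vector with the running history.
        z = [lam * xi + (1.0 - lam) * zi for xi, zi in zip(x, z)]
        smoothed.append(z)
    return smoothed
```

A small `lam` makes the smoothed map respond slowly to change, while `lam = 1` reproduces the current estimates unchanged. A full vector EWMA may use a separate weight per component, which this sketch does not cover.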

10.
In many diseases, Markov transition models are useful in describing transitions between discrete disease states. Often the probability of transitioning from one state to another varies widely across subjects. This heterogeneity is driven, in part, by a possibly unknown number of previous disease states and by potentially complex relationships between clinical data and these states. We propose use of Bayesian variable selection in Markov transition models to allow estimation of subject‐specific transition probabilities. Our approach simultaneously estimates the order of the Markov process and the transition‐specific covariate effects. The methods are assessed using simulation studies and applied to model disease‐state transition on the expanded disability status scale (EDSS) in multiple sclerosis (MS) patients from the Partners MS Center in Boston, MA. The proposed methodology is shown to accurately identify complex covariate–transition relationships in simulations and identifies a clinically significant interaction between relapse history and EDSS history in MS patients. Copyright © 2009 John Wiley & Sons, Ltd.

11.
Most disease association mapping algorithms are based on hypothesis testing procedures that test one variant at a time. Those methods lose power when the disease mutations are jointly tagged by multiple variants, or when gene-gene interactions exist. Nearby variants are also correlated, so procedures that ignore the dependence between variants will inevitably produce redundant results. With a large number of variants genotyped in current genome-wide disease association studies, simultaneous multivariant association mapping algorithms are strongly desired. We present a novel Bayesian method for automatic detection of multivariant joint association in genome-wide case-control studies. Our method has improved power and specificity over existing tools. We fit a joint probabilistic model to the entire data and identify disease variants simultaneously. The method dynamically accounts for the strong linkage disequilibrium (LD) between variants. As a result, only the primary disease variants will be identified, with all secondary associations due to LD effects filtered out. Our method better pinpoints the disease variants with improved resolution. The method is also computationally efficient for genome-wide studies. When applied to a real data set of inflammatory bowel disease (IBD) containing 401,473 variants in 4,720 individuals, our method detected all previously reported IBD loci in the same data, and recovered two missed loci. We further detected two novel interchromosome interactions. The first is between STAT3 and PARD6G, and the second is between DLG5 and an intergenic region at 5p14. We further validated the two interactions in an independent study.

12.
In this paper we propose a Bayesian modeling approach to the analysis of genome-wide association studies based on single nucleotide polymorphism (SNP) data. Our latent seed model combines various aspects of k-means clustering, hidden Markov models (HMMs) and logistic regression into a fully Bayesian model. It is fitted using the Markov chain Monte Carlo stochastic simulation method, with Metropolis-Hastings update steps. The approach is flexible, both in allowing different types of genetic models, and because it can be easily extended while remaining computationally feasible due to the use of fast algorithms for HMMs. It allows for inference primarily on the location of the causal locus and also on other parameters of interest. The latent seed model is used here to analyze three data sets, using both synthetic and real disease phenotypes with real SNP data, and shows promising results. Our method is able to correctly identify the causal locus in examples where single SNP analysis is both successful and unsuccessful at identifying the causal SNP.

13.
We are interested in developing integrative approaches for variable selection problems that incorporate external knowledge on a set of predictors of interest. In particular, we have developed an integrative Bayesian model uncertainty (iBMU) method, which formally incorporates multiple sources of data via a second‐stage probit model on the probability that any predictor is associated with the outcome of interest. Using simulations, we demonstrate that iBMU leads to an increase in power to detect true marginal associations over more commonly used variable selection techniques, such as least absolute shrinkage and selection operator and elastic net. In addition, iBMU leads to a more efficient model search algorithm over the basic BMU method even when the predictor‐level covariates are only modestly informative. The increase in power and efficiency of our method becomes more substantial as the predictor‐level covariates become more informative. Finally, we demonstrate the power and flexibility of iBMU for integrating both gene structure and functional biomarker information into a candidate gene study investigating over 50 genes in the brain reward system and their role with smoking cessation from the Pharmacogenetics of Nicotine Addiction and Treatment Consortium. Copyright © 2013 John Wiley & Sons, Ltd.

14.
BACKGROUND: One problem of interpreting population-based biomonitoring data is the reconstruction of corresponding external exposure in cases where no such data are available. OBJECTIVES: We demonstrate the use of a computational framework that integrates physiologically based pharmacokinetic (PBPK) modeling, Bayesian inference, and Markov chain Monte Carlo simulation to obtain a population estimate of environmental chloroform source concentrations consistent with human biomonitoring data. The biomonitoring data consist of chloroform blood concentrations measured as part of the Third National Health and Nutrition Examination Survey (NHANES III), and for which no corresponding exposure data were collected. METHODS: We used a combined PBPK and shower exposure model to consider several routes and sources of exposure: ingestion of tap water, inhalation of ambient household air, and inhalation and dermal absorption while showering. We determined posterior distributions for chloroform concentration in tap water and ambient household air using U.S. Environmental Protection Agency Total Exposure Assessment Methodology (TEAM) data as prior distributions for the Bayesian analysis. RESULTS: Posterior distributions for exposure indicate that 95% of the population represented by the NHANES III data had likely chloroform exposures ≤ 67 µg/L [corrected] in tap water and ≤ 0.02 µg/L in ambient household air. CONCLUSIONS: Our results demonstrate the application of computer simulation to aid in the interpretation of human biomonitoring data in the context of the exposure-health evaluation-risk assessment continuum. These results should be considered as a demonstration of the method and can be improved with the addition of more detailed data.

15.
We consider the problem of assessing new and existing technologies for their cost-effectiveness in the case where data on both costs and effects are available from a clinical trial, and we address it by means of the cost-effectiveness acceptability curve. The main difficulty in these analyses is that cost data usually exhibit highly skewed and heavy-tailed distributions so that it can be extremely difficult to produce realistic probabilistic models for the underlying population distribution, and in particular to model accurately the tail of the distribution, which is highly influential in estimating the population mean. Here, in order to integrate the uncertainty about the model into the analysis of cost data and into cost-effectiveness analyses, we consider an approach based on Bayesian model averaging: instead of choosing a single parametric model, we specify a set of plausible models for costs and estimate the mean cost with a weighted mean of its posterior expectations under each model, with weights given by the posterior model probabilities. The results are compared with those obtained with a semi-parametric approach that does not require any assumption about the distribution of costs.
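
The weighted mean described in this abstract can be written out in the standard Bayesian model averaging form. The notation is an assumption made for illustration and does not come from the abstract: $\mu$ is the population mean cost, $D$ the trial data, and $M_1, \dots, M_K$ the candidate parametric cost models.

```latex
\[
\hat{\mu} \;=\; \mathbb{E}[\mu \mid D]
        \;=\; \sum_{k=1}^{K} P(M_k \mid D)\, \mathbb{E}[\mu \mid M_k, D],
\qquad
P(M_k \mid D) \;=\; \frac{p(D \mid M_k)\, P(M_k)}{\sum_{j=1}^{K} p(D \mid M_j)\, P(M_j)} .
\]
```

The first expression is the weighted mean of per-model posterior expectations, and the second gives the posterior model probabilities that serve as the weights.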

16.
We propose a method to analyze haplotype effects using ideas derived from Bayesian spatial statistics. We assume that two haplotypes that are similar to one another in structure are likely to have similar risks, and define a distance metric to specify the appropriate level of closeness between the two haplotypes. Through the choice of distance metric, varying levels of population genetics theory can be incorporated into the modeling process, including some that allow estimation of the location of the disease causing mutation(s). This location can be estimated, along with the other parameters of the model, using Markov chain Monte Carlo (MCMC) estimation methods. We demonstrate the effectiveness of the model on two real datasets, a well-known dataset used to fine-map the gene for cystic fibrosis, and one used to localize the gene for Friedreich's ataxia.

17.
As biomarkers transformable by specific drug agents increasingly become available, so their usefulness also increases for monitoring compliance in clinical and prevention trials, and for subsequent monitoring in the general population if a treatment is found successful. Marker levels measured over the course of a treatment yield a longitudinal trajectory that is typically non-linear, with varying velocities during the phase-in and steady-state periods of treatment, followed by decays back to normal in the presence of non-compliance. There is often considerable between-individual variability both in the mean parameters of the trajectory and the variability over time. An example is the biomarker mean corpuscular volume (MCV), which increases by 20 per cent from the drug zidovudine (AZT), and has been used to monitor compliance to AZT. Using MCV data from a previous AIDS clinical trial as an example, we describe a non-linear hierarchical growth model suitable for biomarkers that exhibit sigmoidal and/or asymptotic growth behaviour and show how such models can be supplemented with a change-point to identify potential times of non-compliance. We perform a fully Bayesian analysis to obtain a variety of posterior summaries for the behaviour of the longitudinal trajectory and the times of non-compliance, and describe how to obtain predictions of non-compliance for new individuals.

18.
This study compares a typical heuristic algorithm with classical and Bayesian regression models in ascertaining the presence of acute bronchopulmonary disease events in lung transplant recipients. These models attempt to predict whether an epoch will end in an event, based on the preceding two weeks of data. The data consist of 150 two-week epochs of daily to biweekly spirometry and symptom covariates for 30 subjects over 60 subject-years. Seventy-five 'event' epochs end on a day when an acute bronchopulmonary disease event is documented in the medical record; 75 randomly selected 'non-event' epochs end on a day when no event is documented. The data are partitioned by randomly assigning 15 subjects for training and the remaining 15 subjects for testing. For cross-validation, a second random partition is generated from the same data set. The statistical models are trained and tested on both partitions. For the heuristic algorithm, its historical event classifications on the same test cases are used. Classification performance on both partitions of all models is compared using receiver operating characteristic curves, sensitivity and specificity, and a Shannon information score. Data partition did not appreciably affect statistical model performance. All statistical models, unlike the heuristic algorithm, performed significantly differently from chance (family significance < 0.05, Pearson independence chi-square, Bonferroni multiple correction), and better than the heuristic algorithm. The best models were Bayesian changepoint models. Through a clinically oriented discussion, a case classified by all of these algorithms is presented, suggesting the clinical usefulness of the Bayesian approach compared with the classical and heuristic approaches.

19.
We present a reversible jump Bayesian piecewise log-linear hazard model that extends the Bayesian piecewise exponential hazard to a continuous function of piecewise linear log hazards. A simulation study encompassing several different hazard shapes, accrual rates, censoring proportions, and sample sizes showed that the Bayesian piecewise linear log-hazard model estimated the true mean survival time and survival distributions better than the piecewise exponential hazard. Survival data from Wake Forest Baptist Medical Center are analyzed by both methods and the posterior results are compared.
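
For orientation, the piecewise exponential hazard that this abstract uses as its comparator can be sketched in Python. The sketch covers only that baseline (a hazard constant within each interval), not the authors' piecewise linear log-hazard model or their reversible jump sampler. The function name, argument names, and the convention that the last interval is open-ended are assumptions made for illustration.

```python
import math


def piecewise_exponential_survival(t, cut_points, hazards):
    """Survival probability S(t) = exp(-H(t)) under a piecewise constant hazard.

    cut_points: increasing interval start times, e.g. [0, 1, 3].
    hazards: one rate per interval; hazards[j] applies from cut_points[j]
             to the next cut point, and the last interval is open-ended.
    """
    if len(cut_points) != len(hazards):
        raise ValueError("need exactly one hazard rate per interval start")
    if t < cut_points[0]:
        raise ValueError("t lies before the first cut point")
    cumulative = 0.0
    for j, rate in enumerate(hazards):
        start = cut_points[j]
        end = cut_points[j + 1] if j + 1 < len(cut_points) else math.inf
        if t <= start:
            break
        # Accumulate hazard over the part of this interval that lies before t.
        cumulative += rate * (min(t, end) - start)
    return math.exp(-cumulative)
```

For example, `piecewise_exponential_survival(2.0, [0.0, 1.0, 3.0], [0.5, 0.2, 0.1])` accumulates a hazard of 0.5 over the first unit of time and 0.2 over the second, giving `exp(-0.7)`. The log-linear extension described in the abstract would replace the constant rate in each interval with a rate whose logarithm varies linearly in time.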

20.
The Bayesian dynamic survival model (BDSM), a time‐varying coefficient survival model from the Bayesian perspective, was proposed in the early 1990s but has not been widely used or discussed. In this paper, we describe the model structure of the BDSM and introduce two estimation approaches for BDSMs: the Markov Chain Monte Carlo (MCMC) approach and the linear Bayesian (LB) method. The MCMC approach estimates model parameters through sampling and is computationally intensive. With the newly developed geoadditive survival models and software BayesX, the BDSM is available for general applications. The LB approach is easier in terms of computations but it requires the prespecification of some unknown smoothing parameters. In a simulation study, we use the LB approach to show the effects of smoothing parameters on the performance of the BDSM and propose an ad hoc method for identifying appropriate values for those parameters. We also demonstrate the performance of the MCMC approach compared with the LB approach and a penalized partial likelihood method available in R packages. A gastric cancer trial is utilized to illustrate the application of the BDSM. Copyright © 2009 John Wiley & Sons, Ltd.
