Similar articles
20 similar articles found.
1.
Current technology allows investigators to obtain genotypes at multiple single nucleotide polymorphisms (SNPs) within a candidate locus. Many approaches have been developed for using such data in a test of association with disease, ranging from genotype-based to haplotype-based tests. We develop a new approach that involves two basic steps. In the first step, we use principal components (PCs) analysis to compute combinations of SNPs that capture the underlying correlation structure within the locus. The second step uses the PCs directly in a test of disease association. The PC approach captures linkage-disequilibrium information within a candidate region, but does not require the difficult computing implicit in a haplotype analysis. We demonstrate by simulation that the PC approach is typically as powerful as or more powerful than both genotype- and haplotype-based approaches. We also analyze association between respiratory symptoms in children and four SNPs in the Glutathione-S-Transferase P1 locus, based on data from the Children's Health Study. We observe stronger evidence of an association using the PC approach (p = 0.044) than using either a genotype-based (p = 0.13) or haplotype-based (p = 0.052) approach.
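
The two steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 80% variance threshold, the use of a score test rather than a fitted logistic regression, the function name `pc_score_test`, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def pc_score_test(genotypes, disease, var_explained=0.8):
    """Step 1: PCA of the SNP matrix. Step 2: score test of the retained PCs.

    The 0.8 threshold and the score test are illustrative assumptions.
    Returns (chi-square-type statistic, number of PCs used as df).
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(disease, dtype=float)
    Gc = G - G.mean(axis=0)                      # center each SNP column
    _, s, Vt = np.linalg.svd(Gc, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)        # cumulative variance explained
    k = int(np.searchsorted(frac, var_explained) + 1)
    Z = Gc @ Vt[:k].T                            # PC scores, n x k
    ybar = y.mean()
    score = Z.T @ (y - ybar)                     # score vector at the null
    info = ybar * (1.0 - ybar) * (Z.T @ Z)       # null information matrix
    stat = float(score @ np.linalg.solve(info, score))
    return stat, k

# Simulated data (hypothetical): 200 subjects, 4 SNPs, first two in LD.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(200, 4))
G[:, 1] = np.where(rng.random(200) < 0.8, G[:, 0], G[:, 1])
y = rng.binomial(1, 0.5, size=200)
stat, k = pc_score_test(G, y)
print(f"statistic = {stat:.3f} on {k} degrees of freedom")
```

Under the null hypothesis the statistic is approximately chi-square distributed with `k` degrees of freedom, so the test spends fewer degrees of freedom than a joint test of all SNPs whenever the SNPs are correlated.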

2.
We describe an association mapping approach that utilizes linkage disequilibrium (LD) maps in LD units (LDU). This method uses composite likelihood to combine information from all single marker tests, and applies a model with a parameter for the location of the causal polymorphism. Previous analyses of the poor drug metabolizer phenotype provided evidence of the substantial utility of LDU maps for disease gene association mapping. Using LDU locations for the 27 single nucleotide polymorphisms (SNPs) flanking the CYP2D6 gene on chromosome 22, the most common functional polymorphism within the gene was located 15 kb from its true location. Here, we examine the performance of this mapping approach by exploiting the high-density LDU map constructed from the HapMap data. Expressing the locations of the 27 SNPs in LDU from the HapMap LDU map, analysis yielded an estimated location that is only 0.3 kb away from the CYP2D6 gene. This supports the use of the high marker density HapMap-derived LDU map for association mapping even though it is derived from a much smaller number of individuals compared to the CYP2D6 sample. We also examine the performance of 2-SNP haplotypes. Using the same modelling procedures and composite likelihood as for single SNPs, the haplotype data provided much poorer localization compared to single SNP analysis. Haplotypes generate more autocorrelation through multiple inclusions of the same SNPs, which could inflate significance in association studies. The results of the present study demonstrate the great potential of the genome HapMap LDU maps for high-resolution mapping of complex phenotypes.

3.
We consider the analysis of multiple single nucleotide polymorphisms (SNPs) within a gene or region. The simplest analysis of such data is based on a series of single SNP hypothesis tests, followed by correction for multiple testing, but it is intuitively plausible that a joint analysis of the SNPs will have higher power, particularly when the causal locus may not have been observed. However, standard tests, such as a likelihood ratio test based on an unrestricted alternative hypothesis, tend to have large numbers of degrees of freedom and hence low power. This has motivated a number of alternative test statistics. Here we compare several of the competing methods, including the multivariate score test (Hotelling's test) of Chapman et al. ([2003] Hum. Hered. 56:18-31), Fisher's method for combining P-values, the minimum P-value approach, a Fourier-transform-based approach recently suggested by Wang and Elston ([2007] Am. J. Human Genet. 80:353-360) and a Bayesian score statistic proposed for microarray data by Goeman et al. ([2005] J. R. Stat. Soc. B 68:477-493). Some relationships between these methods are pointed out, and simulation results are given to show that the minimum P-value and the Goeman et al. ([2005] J. R. Stat. Soc. B 68:477-493) approaches work well over a range of scenarios. The Wang and Elston approach often performs poorly; we explain why, and show how its performance can be substantially improved.
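
Two of the combination rules named in this abstract, Fisher's method and the minimum P-value approach, can be sketched in a few lines. The sketch assumes independent tests, which SNPs in linkage disequilibrium generally are not, so in practice significance would more likely be assessed by permutation; the function names and the Šidák-style correction for the minimum P-value are choices made for illustration only.

```python
import math

def fisher_combined(p_values):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df
    when the k tests are independent (an assumption of this sketch)."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)   # X / 2
    # Closed-form chi-square survival function for even df = 2k.
    tail = sum(half_x**i / math.factorial(i) for i in range(k))
    return math.exp(-half_x) * tail

def min_p_sidak(p_values):
    """Minimum P-value with a Sidak correction, again assuming independence."""
    k = len(p_values)
    return 1.0 - (1.0 - min(p_values)) ** k

single_snp_p = [0.05, 0.05]          # hypothetical single-SNP P-values
print(fisher_combined(single_snp_p))
print(min_p_sidak(single_snp_p))
```

Fisher's method rewards several moderately small P-values, whereas the minimum P-value approach responds only to the single strongest signal, which is one reason their relative power differs across scenarios.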

4.
Recently, several authors have proposed the use of linear regression models in cost-effectiveness analysis. In this paper, by modelling costs and outcomes using patient and Health Centre covariates, we seek to identify the part of the cost or outcome difference that is not attributable to the treatment itself, but to the patients' condition or to characteristics of the Centres. Selection of the covariates to be included as predictors of effectiveness and cost is usually assumed by the researcher. This behaviour ignores the uncertainty associated with model selection and leads to underestimation of the uncertainty about quantities of interest. We propose the use of Bayesian model averaging as a mechanism to account for such uncertainty about the model. Data from a clinical trial are used to analyze the effect of incorporating model uncertainty, by comparing two highly active antiretroviral treatments applied to asymptomatic HIV patients. The joint posterior density of incremental effectiveness and cost and cost-effectiveness acceptability curves are proposed as decision-making measures.

5.
In association analyses, it is critical that informative single-nucleotide polymorphisms (SNPs) be selected for study and utilized appropriately. We sequenced 38 kb, including exons of ELAC2, promoter region and conserved upstream intergenic sequences. A comprehensive characterization of linkage disequilibrium (LD) structure and mutation history was performed using our principal components analysis (PCA) method and a phylogenetic analysis. We identified a complex pattern of LD structure consistent with the occurrence of both recombination and mutation events within ELAC2. Four overlapping and noncontiguous LD groups were defined. Eight tagging SNPs (tSNPs) were identified, accounting for over 90% of the genetic variation of the 19 total variants. We tested associations between familial early-onset prostate cancer (PRCA) and each variant independently and in haplotypes. We performed these tests using all 19 variants and the 8 tSNPs; the results using tSNP haplotypes accurately represent the association evidence for the full haplotypes. We observed increased evidence for association when SNPs were analyzed in haplotypes. The phylogenetic analysis indicated three haplotypes, clustered farthest from the root-node, all of which were found more often in cases than controls. These three haplotypes together showed the best evidence of association with familial, early-onset PRCA (P=0.0024; odds ratio=2.23; 95% CI, 1.33-3.74), indicating possible allelic heterogeneity. Our results suggest that 8 tSNPs are required to comprehensively assess associations in ELAC2, that haplotypes should be considered for analysis, and that a knowledge of mutation history may be helpful in parsing allelic heterogeneity and suggesting combinations of haplotypes to be tested.

6.
Systematic analysis of the genetic background of complex diseases using single nucleotide polymorphisms (SNPs) requires a tremendous amount of genotyping. To reduce the amount of genotyping necessary and hence the overall cost of a case-control study with SNPs, the genotyping is often performed in two stages. In the first, the DNA of all cases and all controls is mixed into two pools and genotyped for each SNP. The frequency of both alleles is determined in both pooled DNA samples. If different frequencies are observed in the pools of cases and controls, genotyping is performed individually in the second stage and analyzed conventionally. However, so far no well-founded algorithm is available to guide the decision on whether to genotype a SNP individually. In this report, an approach is introduced for the decision on individual genotyping based on the results from pooled DNA. The analysis is modeled as a decision process with the specific goal of deciding whether to genotype a specific SNP individually. For a given situation, the resulting decision criteria are intended to be optimal for those conducting the study. Different loss functions and decision rules are presented. Using Monte-Carlo simulations, we show that for a given situation, the genotyping rates and hence the costs can be reduced remarkably while maintaining acceptable overall error rates.

7.
Human apolipoprotein A-IV (APO A-IV) exhibits a common protein polymorphism detectable by isoelectric focusing (IEF) due to a single base substitution at codon 360 which replaces the frequently occurring glutamine residue (allele 1) with histidine (allele 2). Recently, sequence analysis of the APO A-IV coding region has revealed another common nucleotide substitution at codon 347 which converts the commonly present threonine residue (allele A) into serine (allele T). In order to investigate the extent of genetic variation at codon 347, we screened DNA samples from 192 unrelated individuals using a polymerase chain reaction based assay. The frequencies of the two alleles, A-IV*A and A-IV*T, were 0.81 and 0.19, respectively, with average heterozygosity 0.31. Genetic screening of the corresponding 192 plasma samples by IEF gave frequencies of 0.922 and 0.078 for the A-IV*1 and A-IV*2 alleles, respectively, at codon 360 with average heterozygosity 0.14. Genotype data at the two polymorphic sites were used to assign unequivocal haplotypes to all the 384 chromosomes. Of the expected four haplotypes (A1, T1, A2, and T2) only three were observed and their frequencies were 0.732 for A1, 0.190 for T1 and 0.078 for A2, with average heterozygosity 0.42. Although our data indicate significant linkage disequilibrium between the two sites (χ² with 1 df = 7.65, P < 0.006, standardized disequilibrium constant φ = -0.14) the degree of nonrandom association varied between alleles at the two sites. Based upon allele frequency data and variable linkage disequilibrium between alleles, we propose that the A2 and T1 haplotypes may have evolved from the parental A1 haplotype by two independent mutations.
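
The summary statistics quoted in this abstract can be recomputed from the three reported haplotype frequencies. The sketch below uses the standard definitions of expected heterozygosity, the disequilibrium coefficient D and its standardized (correlation) form; these formulas are assumed to match what the authors used, and the small gap between the recomputed chi-square and the reported 7.65 is presumably due to rounding of the published frequencies.

```python
import math

def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(f * f for f in freqs)

# Haplotype frequencies reported in the abstract (T2 was not observed).
hap = {"A1": 0.732, "T1": 0.190, "A2": 0.078, "T2": 0.0}
n_chromosomes = 384

p_A = hap["A1"] + hap["A2"]        # allele A at codon 347 -> 0.81
p_1 = hap["A1"] + hap["T1"]        # allele 1 at codon 360 -> 0.922

D = hap["A1"] - p_A * p_1          # disequilibrium coefficient
phi = D / math.sqrt(p_A * (1 - p_A) * p_1 * (1 - p_1))  # standardized D
chi2 = n_chromosomes * phi**2      # 1-df chi-square for LD

print(round(heterozygosity([p_A, 1 - p_A]), 2))   # codon 347
print(round(heterozygosity([p_1, 1 - p_1]), 2))   # codon 360
print(round(heterozygosity(hap.values()), 2))     # haplotypes
print(round(phi, 2), round(chi2, 2))              # about -0.14 and 7.6
```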

8.
Penalized regression methods offer an attractive alternative to single marker testing in genetic association analysis. Penalized regression methods shrink down to zero the coefficients of markers that have little apparent effect on the trait of interest, resulting in a parsimonious subset of what we hope are true pertinent predictors. Here we explore the performance of penalization in selecting SNPs as predictors in genetic association studies. The strength of the penalty can be chosen either to select a good predictive model (via methods such as computationally expensive cross validation), through maximum likelihood-based model selection criteria (such as the BIC), or to select a model that controls for type I error, as done here. We have investigated the performance of several penalized logistic regression approaches, simulating data under a variety of disease locus effect sizes and linkage disequilibrium patterns. We compared several penalties, including the elastic net, ridge, Lasso, MCP and the normal-exponential-γ shrinkage prior implemented in the hyperlasso software, to standard single locus analysis and simple forward stepwise regression. We examined how markers enter the model as penalties and P-value thresholds are varied, and report the sensitivity and specificity of each of the methods. Results show that penalized methods outperform single marker analysis, with the main difference being that penalized methods allow the simultaneous inclusion of a number of markers, and generally do not allow correlated variables to enter the model, producing a sparse model in which most of the identified explanatory markers are accounted for.
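
How a penalty shrinks coefficients exactly to zero can be illustrated with the Lasso. The sketch below is deliberately simplified: it fits a linear rather than a logistic model, uses plain coordinate descent, and runs on simulated genotypes, so it is not the procedure evaluated in the abstract. The names `soft_threshold` and `lasso_cd` and all data are hypothetical.

```python
import numpy as np

def soft_threshold(value, lam):
    """Shrink toward zero by lam; values within [-lam, lam] become exactly 0."""
    return np.sign(value) * max(abs(value) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with marker j's current contribution removed.
            partial = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial / n
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return beta

# Simulated data (hypothetical): 5 standardized markers, only the first causal.
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(300, 5)).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 2.0 * X[:, 0] + rng.normal(size=300)
y = y - y.mean()

lam_max = np.max(np.abs(X.T @ y)) / len(y)   # smallest penalty giving an empty model
print(lasso_cd(X, y, 0.1))
print(lasso_cd(X, y, 1.01 * lam_max))
```

Varying `lam` between zero and `lam_max` traces the order in which markers enter the model, which is the kind of behavior the abstract reports examining as penalties are varied.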

9.
Inference about the treatment effect in a crossover design has received much attention over time owing to the uncertainty in the existence of the carryover effect and its impact on the estimation of the treatment effect. Adding to this uncertainty is that the existence of the carryover effect and its size may depend on the presence of the treatment effect and its size. We consider estimation and hypothesis testing for the treatment effect in a two-period crossover design, assuming a normally distributed response variable, and use an objective Bayesian approach to test the hypothesis about the treatment effect and to estimate its size when it exists while accounting for the uncertainty about the presence of the carryover effect as well as the treatment and period effects. We evaluate and compare the performance of the proposed approach with a standard frequentist approach using simulated and real data. Copyright © 2016 John Wiley & Sons, Ltd.

10.
Meta-analysis has become a key component of well-designed genetic association studies due to the boost in statistical power achieved by combining results across multiple samples of individuals and the need to validate observed associations in independent studies. Meta-analyses of genetic association studies based on multiple SNPs and traits are subject to the same multiple testing issues as single-sample studies, but it is often difficult to adjust accurately for the multiple tests. Procedures such as Bonferroni may control the type-I error rate but will generally provide an overly harsh correction if SNPs or traits are correlated. Depending on study design, availability of individual-level data, and computational requirements, permutation testing may not be feasible in a meta-analysis framework. In this article, we present methods for adjusting for multiple correlated tests under several study designs commonly employed in meta-analyses of genetic association tests. Our methods are applicable to both prospective meta-analyses in which several samples of individuals are analyzed with the intent to combine results, and retrospective meta-analyses, in which results from published studies are combined, including situations in which (1) individual-level data are unavailable, and (2) different sets of SNPs are genotyped in different studies due to random missingness or two-stage design. We show through simulation that our methods accurately control the rate of type I error and achieve improved power over multiple testing adjustments that do not account for correlation between SNPs or traits.

11.
A Bayesian model-based method for multilocus association analysis of quantitative and qualitative (binary) traits is presented. The method selects a trait-associated subset of markers among candidates, and is equally applicable for analyzing wide chromosomal segments (genome scans) and small candidate regions. The method can be applied in situations involving missing genotype data. The number of trait loci, their marker positions, and the magnitudes of their gene effects (strengths of association) are all estimated simultaneously. The inference of parameters is based on their posterior distributions, which are obtained through Markov chain Monte Carlo simulations. The strengths of the approach are: 1) flexible use of oligogenic models with unknown number of loci, 2) performing the estimation of association jointly with model selection, and 3) avoidance of the multiple testing problem, which typically complicates the approaches based on association testing. The performance of the method was tested and compared to the multilocus conditional search procedure by analyzing two simulated data sets. We also applied the method to cystic fibrosis haplotype data (two-locus haplotypes), where gene position has already been identified. The method is implemented as a software package, which is freely available for research purposes under the name BAMA.

12.
We are interested in investigating the involvement of multiple rare variants within a given region by conducting analyses of individual regions with two goals: (1) to determine if regional rare variation in aggregate is associated with risk; and (2) conditional upon the region being associated, to identify specific genetic variants within the region that are driving the association. In particular, we seek a formal integrated analysis that achieves both of our goals. For rare variants with low minor allele frequencies, there is very little power to statistically test the null hypothesis of equal allele or genotype counts for each variant. Thus, genetic association studies are often limited to detecting association within a subset of the common genetic markers. However, it is very likely that associations exist for the rare variants that may not be captured by the set of common markers. Our framework aims at constructing a risk index based on multiple rare variants within a region. Our analytical strategy is novel in that we use a Bayesian approach to incorporate model uncertainty in the selection of variants to include in the index as well as the direction of the associated effects. Additionally, the approach allows for inference at both the group and variant-specific levels. Using a set of simulations, we show that our methodology has added power over other popular rare variant methods to detect global associations. In addition, we apply the approach to sequence data from the WECARE Study of second primary breast cancers.

13.
Logistic regression is the standard method for assessing predictors of diseases. In logistic regression analyses, a stepwise strategy is often adopted to choose a subset of variables. Inference about the predictors is then made based on the chosen model constructed of only those variables retained in that model. This method subsequently ignores both the variables not selected by the procedure, and the uncertainty due to the variable selection procedure. This limitation may be addressed by adopting a Bayesian model averaging approach, which selects a number of all possible such models, and uses the posterior probabilities of these models to perform all inferences and predictions. This study compares the Bayesian model averaging approach with the stepwise procedures for selection of predictor variables in logistic regression using simulated data sets and the Framingham Heart Study data. The results show that in most cases Bayesian model averaging selects the correct model and out-performs stepwise approaches at predicting an event of interest.
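
The idea of weighting models by their posterior probabilities can be sketched with the common BIC approximation, in which a model's posterior probability is taken to be proportional to exp(-BIC/2) under equal prior model probabilities. Whether the study used this approximation is not stated in the abstract, and the BIC values and variable names below are made up purely for illustration.

```python
import math

def bma_weights(bic_by_model):
    """Approximate posterior model probabilities from BIC values,
    assuming equal prior probabilities for all candidate models."""
    best = min(bic_by_model.values())
    raw = {m: math.exp(-0.5 * (b - best)) for m, b in bic_by_model.items()}
    total = sum(raw.values())
    return {m: w / total for m, w in raw.items()}

def inclusion_probabilities(weights):
    """Posterior probability that each variable appears in the model."""
    probs = {}
    for model, w in weights.items():
        for var in model:
            probs[var] = probs.get(var, 0.0) + w
    return probs

# Hypothetical BIC values for four candidate logistic models.
bic = {
    (): 210.0,
    ("age",): 204.0,
    ("chol",): 208.0,
    ("age", "chol"): 205.0,
}
weights = bma_weights(bic)
print(weights)
print(inclusion_probabilities(weights))
```

Unlike a stepwise search, which would report only the single best model, the averaged inclusion probabilities retain information from every candidate model in proportion to its support.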

14.
Mixed effects models have become very popular, especially for the analysis of longitudinal data. One challenge is how to build a good enough mixed effects model. In this paper, we suggest a systematic strategy for addressing this challenge and introduce easily implemented practical advice to build mixed effects models. A general discussion of the scientific strategies motivates the recommended five-step procedure for model fitting. The need to model both the mean structure (the fixed effects) and the covariance structure (the random effects and residual error) creates the fundamental flexibility and complexity. Some very practical recommendations help to conquer the complexity. Centering, scaling, and full-rank coding of all the predictor variables radically improve the chances of convergence, computing speed, and numerical accuracy. Applying computational and assumption diagnostics from univariate linear models to mixed model data greatly helps to detect and solve the related computational problems. The approach helps to fit more general covariance models, a crucial step in selecting a credible covariance model needed for defensible inference. A detailed demonstration of the recommended strategy is based on data from a published study of a randomized trial of a multicomponent intervention to prevent young adolescents' alcohol use. The discussion highlights a need for additional covariance and inference tools for mixed models. The discussion also highlights the need for improving how scientists and statisticians teach and review the process of finding a good enough mixed model. Copyright © 2009 John Wiley & Sons, Ltd.

15.
There is considerable biologic plausibility to the hypothesis that genetic variability in pathways involved in insulin signaling and energy homeostasis may modulate dietary risk associated with colorectal cancer. We utilized data from 2 population-based case-control studies of colon (n = 1,574 cases, 1,970 controls) and rectal (n = 791 cases, 999 controls) cancer to evaluate genetic variation in candidate SNPs identified from 9 genes in a candidate pathway: PDK1, RPS6KA1, RPS6KA2, RPS6KB1, RPS6KB2, PTEN, FRAP1 (mTOR), TSC1, TSC2, Akt1, PIK3CA, and PRKAG2 with dietary intake of total energy, carbohydrates, fat, and fiber. We employed SNP, haplotype, and multiple-gene analysis to evaluate associations. PDK1 interacted with dietary fat for both colon and rectal cancer and with dietary carbohydrates for colon cancer. Statistically significant interaction with dietary carbohydrates and rectal cancer was detected by haplotype analysis of PDK1. Evaluation of dietary interactions with multiple genes in this candidate pathway showed several interactions with pairs of genes: Akt1 and PDK1, PDK1 and PTEN, PDK1 and TSC1, and PRKAG2 and PTEN. Analyses show that genetic variation influences risk of colorectal cancer associated with diet and illustrate the importance of evaluating dietary interactions beyond the level of single SNPs or haplotypes when a biologically relevant candidate pathway is examined.

16.
We present a statistical model for allele-specific patterns of copy number polymorphisms (CNPs) in commercial single nucleotide polymorphism (SNP) array data. This model is based on the observation that fluorescent signal intensities tend to cluster into clouds of similar allele-specific copy number (ASCN) genotypes at each SNP locus. Because instrumental errors blur this clustering, our model allows cluster memberships to overlap each other, according to a Bayesian Gaussian mixture model (GMM). This approach is flexible, allowing for both absolute scale differences and X/Y scale imbalances of fluorescent signal intensities. The resulting model is also robust toward unobserved ASCN genotypes, which can be problematic for ordinary GMMs. We illustrated the utility of the model by applying it to commercial SNP array intensity data obtained from the Illumina HumanHap 610K platform. We retrieved more than 4,000 allele-specific CNPs, though 99% of them showed rather simple allele-specific CNP patterns with only a single aneuploid haplotype among the normal haplotypes. The genotyping accuracy was assessed by two approaches, quantitative PCR and replicated subjects. The results of both of these approaches demonstrated mean genotyping error rates of 1%. We also carried out a preliminary genome-wide association study of three hematological traits. The results suggest that the model could form the foundation for new, more effective statistical methods for the mapping of both disease genes and quantitative trait loci with genome-wide CNPs. The methods described in this work are implemented in a software package, PlatinumCNV, available on the Internet.

17.
The understanding of complex diseases and insights to improve their medical management may be achieved by deducing how specific haplotypes may jointly affect relative risk. In this paper we describe an ascertainment-adjusted likelihood-based method to estimate haplotype relative risks using pooled family data coming from the association and/or linkage studies that were used to identify specific haplotypes. Haplotype-based analysis tends to require a large number of parameters to capture all the information, which leads to efficiency problems. An adaptation of the Stochastic Expectation Maximization algorithm is used to infer haplotypes from genotypic data and to reduce the number of nuisance parameters for risk estimation. Using different simulations, we show that this method provides unbiased relative risk estimates even in case of departure from Hardy-Weinberg equilibrium.

18.
We describe a novel method for assessing the strength of disease association with single nucleotide polymorphisms (SNPs) in a candidate gene or small candidate region, and for estimating the corresponding haplotype relative risks of disease, using unphased genotype data directly. We begin by estimating the relative frequencies of haplotypes consistent with observed SNP genotypes. Under the Bayesian partition model, we specify cluster centres from this set of consistent SNP haplotypes. The remaining haplotypes are then assigned to the cluster with the "nearest" centre, where distance is defined in terms of SNP allele matches. Within a logistic regression modelling framework, each haplotype within a cluster is assigned the same disease risk, reducing the number of parameters required. Uncertainty in phase assignment is addressed by considering all possible haplotype configurations consistent with each unphased genotype, weighted in the logistic regression likelihood by their probabilities, calculated according to the estimated relative haplotype frequencies. We develop a Markov chain Monte Carlo algorithm to sample over the space of haplotype clusters and corresponding disease risks, allowing for covariates that might include environmental risk factors or polygenic effects. Application of the algorithm to SNP genotype data in an 890-kb region flanking the CYP2D6 gene illustrates that we can identify clusters of haplotypes with similar risk of poor drug metaboliser (PDM) phenotype, and can distinguish PDM cases carrying different high-risk variants. Further, the results of a detailed simulation study suggest that we can identify positive evidence of association for moderate relative disease risks with a sample of 1,000 cases and 1,000 controls.

19.
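Because many pathogenic bacteria are genetically very similar, distinguishing among them is very difficult. In recent years, as more and more complete bacterial genomes have been sequenced and analyzed, DNA tandem repeat sequences have increasingly attracted attention.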

20.
Variable selection is growing in importance with the advent of high throughput genotyping methods requiring analysis of hundreds to thousands of single nucleotide polymorphisms (SNPs) and the increased interest in using these genetic studies to better understand common, complex diseases. Up to now, the standard approach has been to analyze the genotypes for each SNP individually to look for an association with a disease. Alternatively, combinations of SNPs or haplotypes are analyzed for association. Another added complication in studying complex diseases or phenotypes is that genetic risk for the disease is often due to multiple SNPs in various locations on the chromosome with small individual effects that may have a collectively large effect on the phenotype. Hence, multi-locus SNP models, as opposed to single SNP models, may better capture the true underlying genotypic-phenotypic relationship. Thus, innovative methods for determining which SNPs to include in the model are needed. The goal of this article is to describe several methods currently available for variable and model selection using Bayesian approaches and to illustrate their application for genetic association studies using both real and simulated candidate gene data for a complex disease. In particular, Bayesian model averaging (BMA), stochastic search variable selection (SSVS), and Bayesian variable selection (BVS) using a reversible jump Markov chain Monte Carlo (MCMC) for candidate gene association studies are illustrated using a study of age-related macular degeneration (AMD) and simulated data.
