Similar articles
20 similar articles found (search time: 11 ms)
1.
Using genome-wide association studies to identify genetic variants contributing to disease has been highly successful, with many novel genetic predispositions identified and biological pathways revealed. Several pitfalls leading to spurious association or non-replication have been highlighted, ranging from population structure and automated genotype scoring of cases and controls to age-varying association. We describe an important yet unreported source of bias in case-control studies due to variations in chip technology between different commercial array releases. As cases are commonly genotyped with newer arrays and freely available control resources are frequently used for comparison, there exists an important potential for false associations that are robust to standard quality control and replication design.

2.
Errors in genotyping can greatly affect family-based association studies. If a Mendelian inconsistency is detected, the family is usually removed from the analysis. This reduces power, and may introduce bias. In addition, a large proportion of genotyping errors remain undetected, and these also reduce power. We present a Bayesian framework for performing association studies with SNP data on samples of trios consisting of parents with an affected offspring, while allowing for the presence of both detectable and undetectable genotyping errors. This framework also allows for the inclusion of missing genotypes. Associations between the SNP and disease were modelled in terms of the genotypic relative risks. The performances of the analysis methods were investigated under a variety of models for disease association and genotype error, looking at both power to detect association and precision of genotypic relative risk estimates. As expected, power to detect association decreased as genotyping error probability increased. Importantly, however, analyses allowing for genotyping error had similar power to standard analyses when applied to data without genotyping error. Furthermore, allowing for genotyping error yielded relative risk estimates that were approximately unbiased, together with 95% credible intervals giving approximately correct coverage. The methods were also applied to a real dataset: a sample of schizophrenia cases and their parents genotyped at SNPs in the dysbindin gene. The analysis methods presented here require no prior information on the genotyping error probabilities, and may be fitted in WinBUGS.

3.
The present study introduces new Haplotype Sharing Transmission/Disequilibrium Tests (HS-TDTs) that allow for random genotyping errors. We evaluate the type I error rate and power of the newly proposed tests under a variety of scenarios and perform a power comparison among the proposed tests, the HS-TDT and the single-marker TDT. The results indicate that the HS-TDT shows a significant increase in type I error when applied to data in which either Mendelian inconsistent trios are removed or Mendelian inconsistent markers are treated as missing genotypes, and the magnitude of the type I error increases both with an increase in sample size and with an increase in genotyping error rate. The results also show that a simple strategy, that is, merging each rare haplotype into the most similar common haplotype, can control the type I error inflation for a wide range of genotyping error rates, and after merging rare haplotypes, the power of the test is very similar to that without merging the rare haplotypes. Therefore, we conclude that a simple strategy may make the HS-TDT robust to genotyping errors. Our simulation results also show that this strategy may be applicable to other haplotype-based TDTs.

4.
Genome‐wide association (GWA) studies have proved to be extremely successful in identifying novel common polymorphisms contributing effects to the genetic component underlying complex traits. Nevertheless, one source of as‐yet undiscovered genetic determinants of complex traits is those mediated through the effects of rare variants. With the increasing availability of large‐scale re‐sequencing data for rare variant discovery, we have developed a novel statistical method for the detection of complex trait associations with these loci, based on searching for accumulations of minor alleles within the same functional unit. We have undertaken simulations to evaluate strategies for the identification of rare variant associations in population‐based genetic studies when data are available from re‐sequencing discovery efforts or from commercially available GWA chips. Our results demonstrate that methods based on accumulations of rare variants discovered through re‐sequencing offer substantially greater power than conventional analysis of GWA data, and thus provide an exciting opportunity for future discovery of genetic determinants of complex traits. Genet. Epidemiol. 34:188–193, 2010. © 2009 Wiley‐Liss, Inc.
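The accumulation idea above can be sketched as a generic burden test: collapse the rare variants of one functional unit into a per-individual minor-allele count and compare that count between cases and controls. This is a minimal illustration, not the authors' exact statistic; the function name, the 0/1 allele coding, and the two-sample z-test are my own choices.

```python
import numpy as np
from scipy.stats import norm

def burden_test(G_case, G_ctrl):
    """Generic burden sketch: sum rare minor alleles per individual
    within one functional unit (rows = individuals, columns = rare
    variants, entries = minor-allele counts), then compare the mean
    burden of cases vs controls with a two-sample z-test."""
    b1 = np.asarray(G_case, float).sum(axis=1)   # burden per case
    b0 = np.asarray(G_ctrl, float).sum(axis=1)   # burden per control
    se = np.sqrt(b1.var(ddof=1) / len(b1) + b0.var(ddof=1) / len(b0))
    z = (b1.mean() - b0.mean()) / se
    return z, 2 * norm.sf(abs(z))

# Toy data: 20 rare variants, cases carry minor alleles 4x as often.
rng = np.random.default_rng(0)
G_case = rng.binomial(1, 0.08, (200, 20))
G_ctrl = rng.binomial(1, 0.02, (200, 20))
z, p = burden_test(G_case, G_ctrl)
```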

5.
Current genome-wide association studies still heavily rely on a single-marker strategy, in which each single nucleotide polymorphism (SNP) is tested individually for association with a phenotype. Although methods and software packages that consider multimarker models have become available, they have been slow to become widely adopted and their efficacy in real data analysis is often questioned. Based on extensive simulations, here we endeavor to provide more insight into the performance of simple multimarker association tests as compared to single-marker tests. The results reveal both the power advantages and disadvantages of two-marker vs. single-marker tests. Power differentials depend on the correlation structure among tag SNPs, as well as that between tag SNPs and causal variants. A two-marker test performs relatively better than single-marker tests when the correlation of the two adjacent markers is high. However, using HapMap data, two-marker tests tended to have a greater chance of being less powerful than single-marker tests, due to constraints on the number of actual possible haplotypes in the HapMap data. Yet the average power difference was small whenever the single-marker test was more powerful, while there were many situations where the two-marker test was much more powerful. These findings can be useful to guide analyses of future studies.

6.
Han F, Pan W. Genetic Epidemiology 2010, 34(7):680-688
To detect genetic association with common and complex diseases, many statistical tests have been proposed for candidate-gene or genome-wide association studies with the case-control design. Due to linkage disequilibrium (LD), multi-marker association tests can gain power over single-marker tests with a Bonferroni multiple testing adjustment. Among the many existing multi-marker association tests, most target only one of many possible aspects of the distributional differences between the genotypes of cases and controls, such as allele frequency differences, while a few newer ones target two or three aspects, all of which can be implemented in logistic regression. In contrast to logistic regression, a genomic distance-based regression (GDBR) approach aims to detect some higher-order genotypic differences between cases and controls. A recent study has confirmed the high power of GDBR tests. To date, the popular logistic regression and the emerging GDBR approaches have been completely unrelated: one has had to choose between the two. In this article, we reformulate GDBR as logistic regression, opening an avenue to constructing other powerful tests while overcoming some limitations of GDBR. For example, asymptotic distributions can replace time-consuming permutations for deriving P-values, and covariates, including gene-gene interactions, can be easily incorporated. Importantly, this reformulation facilitates combining GDBR with other existing methods in a unified framework of logistic regression. In particular, we show that Fisher's P-value combining method can boost statistical power by incorporating information from allele frequencies, Hardy-Weinberg disequilibrium, LD patterns, and other higher-order interactions among multiple markers as captured by GDBR.

7.
The standard procedure to assess genetic equilibrium is a χ2 test of goodness‐of‐fit. As with any statistical procedure of that type, the null hypothesis is that the distribution underlying the data is in agreement with the model. Thus, a significant result indicates incompatibility of the observed data with the model, which is clearly at variance with the aim in the majority of applications: to exclude the existence of gross violations of the equilibrium condition. In current practice, we try to avoid this basic logical difficulty by increasing the significance bound for the P‐value (e.g. from 5 to 10%) and inferring compatibility of the data with Hardy-Weinberg Equilibrium (HWE) from an insignificant result. Unfortunately, such direct inversion of a statistical testing procedure fails to produce a valid test of the hypothesis of interest, namely, that the data are in sufficiently good agreement with the model under which the P‐value is calculated. We present a logically unflawed solution to the problem of establishing (approximate) compatibility of an observed genotype distribution with HWE. The test is available in one‐ and two‐sided versions. For both versions, we provide tools for exact power calculation. We demonstrate the merits of the new approach through comparison with the traditional χ2 goodness‐of‐fit test in 2×60 genotype distributions from 43 published genetic studies of complex diseases where departure from HWE was noted in either the case or control sample. In addition, we show that the new test is useful for the analysis of genome‐wide association studies. Genet. Epidemiol. 33:569–580, 2009. © 2009 Wiley‐Liss, Inc.
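For reference, the "standard procedure" this abstract argues against inverting is the one-marker χ2 goodness-of-fit test; a minimal sketch (function name mine):

```python
from scipy.stats import chi2

def hwe_chisq(n_AA, n_Aa, n_aa):
    """Classical chi-square goodness-of-fit test for Hardy-Weinberg
    equilibrium at one biallelic marker. A significant result only
    shows incompatibility with HWE; an insignificant one does NOT
    establish compatibility, which is the abstract's point."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)               # allele A frequency
    q = 1 - p
    exp = [n * p * p, 2 * n * p * q, n * q * q]   # HWE expectations
    obs = [n_AA, n_Aa, n_aa]
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    # df = 3 classes - 1 - 1 estimated allele frequency = 1
    return stat, chi2.sf(stat, df=1)
```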

8.
The purpose of this work is the development of linear trend tests that allow for error (LTT_ae), specifically incorporating double-sampling information on phenotypes and/or genotypes. We use a likelihood framework. Misclassification errors are estimated via double sampling. Unbiased estimates of penetrances and genotype frequencies are determined through application of the Expectation-Maximization algorithm. We perform simulation studies to evaluate false-positive rates for various genotype classification weights (recessive, dominant, additive). We compare simulated power between the LTT_ae and its genotypic test equivalent, the LRT_ae, in the presence of phenotype and genotype misclassification, to evaluate the power gains of the LTT_ae for multi-locus haplotype association with a dominant mode of inheritance. Finally, we apply the LTT_ae and a method without double-sample information (LTT_std) to double-sampled phenotype data for an actual Alzheimer's disease (AD) case-control study with ApoE genotypes. Simulation results suggest that the LTT_ae maintains correct false-positive rates in the presence of misclassification. For power simulations, the LTT_ae method is at least as powerful as the LRT_ae method, with a maximum power gain of 0.42 over the LRT_ae for certain parameter settings. For the AD data, the LTT_ae provides more significant evidence for association (permutation p=0.0522) than the LTT_std (permutation p=0.1684), owing to the observed phenotype misclassification. The LTT_ae statistic enables researchers to apply linear trend tests to case-control genetic data, increasing power to detect association in the presence of misclassification. If the disease mode of inheritance is known, LTT_ae methods are usually more powerful, because the statistic has fewer degrees of freedom.
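The double-sampling and EM machinery of the error-adjusted test is beyond a short sketch, but its unadjusted baseline (LTT_std, i.e. the standard Cochran-Armitage linear trend test) can be written compactly. The variance below is the exact finite-population (permutation) variance; the function name is mine.

```python
import numpy as np
from scipy.stats import chi2

def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage linear trend test on genotype counts.
    `weights` encodes the genetic model: (0,1,2) additive,
    (0,1,1) dominant, (0,0,1) recessive."""
    r = np.asarray(cases, float)      # genotype counts among cases
    s = np.asarray(controls, float)   # genotype counts among controls
    w = np.asarray(weights, float)
    n = r + s
    R, N = r.sum(), n.sum()
    wbar = np.sum(w * n) / N
    sigma2 = np.sum(n * (w - wbar) ** 2) / N
    # Under H0, cases are R draws without replacement from the N weights.
    var = R * sigma2 * (N - R) / (N - 1)
    stat = (np.sum(w * r) - R * wbar) ** 2 / var
    return stat, chi2.sf(stat, df=1)
```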

9.
In case‐control single nucleotide polymorphism (SNP) data, the allele frequency, Hardy-Weinberg disequilibrium, and linkage disequilibrium (LD) contrast tests are three distinct sources of information about genetic association. While all three tests are typically developed in a retrospective context, we show that prospective logistic regression models may be developed that correspond conceptually to the retrospective tests. This approach provides a flexible framework for conducting a systematic series of association analyses using unphased genotype data and any number of covariates. For a single-stage study, two single‐marker tests and four two‐marker tests are discussed. The true association models are derived, and they allow us to understand why a model with only a linear term will generally fit well for a SNP in weak LD with a causal SNP, whatever the disease model, but not for a SNP in high LD with a non‐additive disease SNP. We investigate the power of the association tests using real LD parameters from chromosome 11 in the HapMap CEU population data. Among the single‐marker tests, the allelic test has on average the most power in the case of an additive disease, but for dominant, recessive, and heterozygote disadvantage diseases, the genotypic test has the most power. Among the four two‐marker tests, the Allelic‐LD contrast test, which incorporates linear terms for two markers and their interaction term, provides the most reliable power overall for the cases studied. Therefore, our result supports incorporating an interaction term as well as linear terms in multi‐marker tests. Genet. Epidemiol. 34:67–77, 2010. © 2009 Wiley‐Liss, Inc.

10.
In anticipation of the availability of next‐generation sequencing data, there is increasing interest in investigating association between complex traits and rare variants (RVs). In contrast to association studies for common variants (CVs), due to the low frequencies of RVs, common wisdom suggests that existing statistical tests for CVs might not work, motivating the recent development of several new tests for analyzing RVs, most of which are based on the idea of pooling/collapsing RVs. However, there is a lack of evaluations of, and thus guidance on the use of, existing tests. Here we provide a comprehensive comparison of various statistical tests using simulated data. We consider both independent and correlated rare mutations, and representative tests for both CVs and RVs. As expected, if there are no or few non‐causal (i.e. neutral or non‐associated) RVs in a locus of interest while the effects of causal RVs on the trait are all (or mostly) in the same direction (i.e. either protective or deleterious, but not both), then the simple pooled association tests (without selecting RVs and their association directions) and a new test called kernel‐based adaptive clustering (KBAC) perform similarly and are most powerful; KBAC is more robust than simple pooled association tests in the presence of non‐causal RVs; however, as the number of non‐causal RVs increases and/or in the presence of opposite association directions, the winners are two methods originally proposed for CVs and a new test called the C‐alpha test proposed for RVs, each of which can be regarded as testing on a variance component in a random‐effects model. Interestingly, several methods based on sequential model selection (i.e. selecting causal RVs and their association directions), including two new methods proposed here, perform robustly and often have statistical power between those of the above two classes. Genet. Epidemiol. 35:606–619, 2011. © 2011 Wiley Periodicals, Inc.
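The "simple pooled association test" referred to above can be sketched as a collapsing test: sum minor-allele counts over all rare variants in the locus and compare cases with controls in a 2×2 allele-count table. This is my own minimal rendering of the pooling idea; KBAC and the C-alpha test are more involved.

```python
import numpy as np
from scipy.stats import chi2_contingency

def pooled_rv_test(G_case, G_ctrl):
    """Pool minor alleles across all rare variants of one locus and
    test the resulting 2x2 table (rows: cases/controls, columns:
    minor/major allele counts). G_* are individuals x variants
    genotype matrices with entries 0/1/2. Powerful when causal
    effects share one direction; a variance-component test such as
    C-alpha is preferred when directions are mixed."""
    G_case, G_ctrl = np.asarray(G_case), np.asarray(G_ctrl)
    m_case, m_ctrl = G_case.sum(), G_ctrl.sum()
    tot_case, tot_ctrl = 2 * G_case.size, 2 * G_ctrl.size  # total alleles
    table = [[m_case, tot_case - m_case],
             [m_ctrl, tot_ctrl - m_ctrl]]
    stat, p, _, _ = chi2_contingency(table)
    return stat, p
```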

11.
Case-control association studies using unrelated individuals may offer an effective approach for identifying genetic variants that have small to moderate disease risks. In general, two different strategies may be employed to establish associations between genotypes and phenotypes: (1) collecting individual genotypes or (2) quantifying allele frequencies in DNA pools. These two technologies have their respective advantages. Individual genotyping gathers more information, whereas DNA pooling may be more cost effective. Recent technological advances in DNA pooling have generated great interest in using DNA pooling in association studies. In this article, we investigate the impacts of errors in genotyping or measuring allele frequencies on the identification of genetic associations with these two strategies. We find that, with current technologies, compared to individual genotyping, a larger sample is generally required to achieve the same power using DNA pooling. We further consider the use of DNA pooling as a screening tool to identify candidate regions for follow-up studies. We find that the majority of the positive regions identified from DNA pooling results may represent false positives if measurement errors are not appropriately considered in the design of the study.

12.
Meta-analyses of genome-wide association studies require numerous study partners to conduct pre-defined analyses, and thus simple but efficient analysis plans. Potential differences between strata (e.g. men and women) are usually ignored, but often the question arises whether stratified analyses help to unravel the genetics of a phenotype or merely increase the burden of analyses. To decide whether or not to stratify, we compare general analytical power computations for the overall analysis with those of stratified analyses, considering quantitative trait analyses and two strata. We also relate the stratification problem to interaction modeling and illustrate the theoretical considerations with obesity and renal function genetics. We demonstrate that the overall analyses have better power than stratified analyses as long as the signals are pronounced in both strata with consistent effect direction. Stratified analyses are advantageous in the case of signals with zero (or very small) effect in one stratum and for signals with opposite effect directions in the two strata. Applying the joint test for a main SNP effect and SNP-stratum interaction beats both overall and stratified analyses regarding power, but involves more complex models. In summary, we recommend employing stratified analyses or the joint test to better understand the potential of strata-specific signals with opposite effect directions. Only after systematic genome-wide searches for opposite-effect-direction loci have been conducted will we know whether such signals exist and to what extent stratified analyses can depict loci that otherwise are missed.

13.
Neighboring common polymorphisms are often correlated (in linkage disequilibrium (LD)) as a result of shared ancestry. An association between a polymorphism and a disease trait may therefore be the indirect result of a correlated functional variant, and identifying the true causal variant(s) from an initial disease association is a major challenge in genetic association studies. Here, we present a method to estimate the sample size needed to discriminate between a functional variant of a given allele frequency and effect size, and other correlated variants. The sample size required to conduct such fine‐scale mapping is typically 1–4 times larger than required to detect the initial association. Association studies in populations with different LD patterns can substantially improve the power to isolate the causal variant. An online tool to perform these calculations is available at http://moya.srl.cam.ac.uk/ocac/FineMappingPowerCalculator.html. Genet. Epidemiol. 34:463–468, 2010. © 2010 Wiley‐Liss, Inc.

14.
Improving power in genome-wide association studies: weights tip the scale
The potential of genome-wide association analyses can only be realized when they have power to detect signals despite the detrimental effect of multiple testing. We develop a weighted multiple testing procedure that facilitates the input of prior information in the form of groupings of tests. For each group, a weight is estimated from the observed test statistics within the group. Differentially weighting groups improves the power to detect signals in likely groupings. The advantage of the grouped-weighting concept over fixed weights based on prior information is that it often leads to an increase in power even if many of the groupings are not correlated with the signal. Being data dependent, the procedure is remarkably robust to poor choices of groupings. Power is typically improved if one (or more) of the groups clusters multiple tests with signals, yet little power is lost when the groupings are totally random. If there is no apparent signal in a group, relative to a group that appears to have several tests with signals, the former group will be down-weighted relative to the latter. If no groups show apparent signals, then the weights will be approximately equal. The only restriction on the procedure is that the number of groups be small relative to the total number of tests performed.
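A toy version of the grouped-weighting idea: score each group by the average strength of its observed signals, normalize the weights to average one (which preserves the overall level), and apply a weighted Bonferroni threshold. The weighting rule below is a simplified stand-in for the authors' estimator, and the data-dependence caveats from the abstract apply.

```python
import numpy as np

def weighted_bonferroni(pvals, groups, alpha=0.05):
    """Group-weighted Bonferroni sketch: each test inherits a weight
    proportional to the mean -log10 P-value of its group; weights
    are normalized to mean 1, and test j is rejected when
    p_j <= alpha * w_j / m. Assumes all p-values are in (0, 1)."""
    pvals = np.asarray(pvals, float)
    groups = np.asarray(groups)
    score = -np.log10(pvals)
    w = np.empty_like(pvals)
    for grp in np.unique(groups):
        idx = groups == grp
        w[idx] = score[idx].mean()      # one weight per group
    w *= len(pvals) / w.sum()           # normalize: mean weight = 1
    return pvals <= alpha * w / len(pvals)
```

With one group holding the apparent signals, that group is up-weighted and the signal-free group is down-weighted, as the abstract describes.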

15.
Neuropsychological disorders have a biological basis rooted in brain function, and neuroimaging data are expected to better illuminate the complex genetic basis of neuropsychological disorders. Because they are biological measures, neuroimaging data avoid biases arising from clinical diagnostic criteria that are subject to human understanding and interpretation. A challenge with analyzing neuroimaging data is their high dimensionality and complex spatial relationships. To tackle this challenge, we introduce a novel distance covariance test that can assess the association between genetic markers and multivariate diffusion tensor imaging measurements, and analyze a genome‐wide association study (GWAS) dataset collected by the Pediatric Imaging, Neurocognition, and Genetics (PING) study. We also consider existing approaches as comparisons. Our results show that, after correcting for multiplicity, distance covariance tests of the multivariate phenotype yield significantly greater power at detecting genetic markers affecting brain structure than standard mass-univariate GWAS of individual neuroimaging biomarkers. Our results underscore the usefulness of the distance covariance for incorporating neuroimaging data in GWAS.
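The sample distance covariance itself is easy to sketch for in-memory data: double-center the two pairwise Euclidean distance matrices and average their elementwise product (Székely's estimator). The association test would then compare the statistic against a permutation null, which is omitted here; the function name is mine.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def distance_correlation(X, Y):
    """Sample distance correlation between multivariate samples
    X (n x p) and Y (n x q); zero in the population iff X and Y
    are independent, which is what makes it usable as a test
    statistic for marker-phenotype association."""
    def centered(M):
        M = np.asarray(M, float)
        if M.ndim == 1:
            M = M[:, None]
        D = squareform(pdist(M))        # pairwise Euclidean distances
        return D - D.mean(0) - D.mean(1)[:, None] + D.mean()
    A, B = centered(X), centered(Y)
    dcov2 = (A * B).mean()              # squared sample distance covariance
    dvar2 = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max(dcov2, 0.0) / dvar2)
```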

16.
Multivariate phenotypes are frequently encountered in genome‐wide association studies (GWAS). Such phenotypes contain more information than univariate phenotypes, but how best to exploit the information to increase the chance of detecting genetic variants with pleiotropic effects is not always clear. Moreover, when multivariate phenotypes contain a mixture of quantitative and qualitative measures, limited methods are applicable. In this paper, we first evaluate the approach originally proposed by O'Brien and by Wei and Johnson that combines the univariate test statistics, and then propose two extensions to that approach. The original and proposed approaches are applicable to a multivariate phenotype containing any type of components, including continuous, categorical and survival phenotypes, and applicable to samples consisting of families or unrelated individuals. Simulation results suggested that all methods had valid type I error rates. Our extensions had better power than O'Brien's method with heterogeneous means among the univariate test statistics, but were less powerful than O'Brien's with homogeneous means among the individual test statistics. All approaches showed a considerable increase in power in some cases, compared to testing each component of a multivariate phenotype individually. We apply all the methods to GWAS of serum uric acid levels and gout with 550,000 single nucleotide polymorphisms in the Framingham Heart Study. Genet. Epidemiol. 34:444–454, 2010. © 2010 Wiley‐Liss, Inc.
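O'Brien's combination of correlated univariate statistics is compact enough to sketch: with z the vector of univariate z-statistics and R their correlation matrix under H0, the combined statistic 1'R⁻¹z / sqrt(1'R⁻¹1) is again standard normal. A minimal rendering (function name mine; the extensions proposed in the paper are not reproduced):

```python
import numpy as np
from scipy.stats import norm

def obrien_combined(z, R):
    """O'Brien's combined test: weight correlated univariate
    z-statistics by the inverse of their null correlation matrix R;
    the combined statistic is standard normal under H0."""
    z = np.asarray(z, float)
    Ri = np.linalg.inv(np.asarray(R, float))
    one = np.ones_like(z)
    t = one @ Ri @ z / np.sqrt(one @ Ri @ one)
    return t, 2 * norm.sf(abs(t))
```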

17.
Promising findings from genetic association studies are commonly presented with two distinct figures: one gives the association study results and the other indicates linkage disequilibrium (LD) between genetic markers in the region(s) of interest. Fully interpreting the results of such studies requires synthesizing the information in these figures, which is generally done in a subjective and unsystematic manner. Here we present a method to formally combine association results and LD and display them in the same figure; we have developed a freely available web‐based application that can be used to generate figures to display the combined data. To demonstrate this approach we apply it to fine mapping data from the prostate cancer 8q24 loci. Combining these two sources of information in a single figure allows one to more clearly assess patterns of association, facilitating the interpretation of genome‐wide and fine mapping data and improving our ability to localize causal variants. Genet. Epidemiol. 33:599–603, 2009. © 2009 Wiley‐Liss, Inc.

18.
Emily M. Statistics in Medicine 2012, 31(21):2359-2373
Epistasis is often cited as the biological mechanism carrying the missing heritability in genome‐wide association studies. However, very few such studies have been reported in the literature; the low power of existing statistical methods is a potential explanation. Statistical procedures are also mainly based on the statistical definition of epistasis, which prevents the detection of SNP–SNP interactions under some classes of epistatic models. In this paper, we propose a new statistic, called IndOR for independence-based odds ratio, based on the biological definition of epistasis. We assume that epistasis modifies the dependency between the two causal SNPs, and we develop a Wald procedure to test this hypothesis. Our new statistic is compared with three statistical procedures in a large power study on simulated data sets. We use extensive simulations, based on 45 scenarios, to investigate the effect of three factors: the underlying disease model, the linkage disequilibrium, and the control‐to‐case ratio. We demonstrate that our new test has the ability to detect a wider range of epistatic models. Furthermore, our new statistical procedure is remarkably powerful when the two loci are linked and when the control‐to‐case ratio is higher than 1. The application of our new statistic to the Wellcome Trust Case Control Consortium data set on Crohn's disease supports our results on simulated data: our new test, IndOR, detects previously reported interactions with more power. Furthermore, a new combination of variants has been detected by our new test as significantly associated with Crohn's disease. Copyright © 2012 John Wiley & Sons, Ltd.

19.
Next-generation sequencing is widely used to study complex diseases because of its ability to identify both common and rare variants without prior single nucleotide polymorphism (SNP) information. Pooled sequencing of implicated target regions can lower costs and allow more samples to be analyzed, thus improving statistical power for disease-associated variant detection. Several methods for disease association tests of pooled data and for optimal pooling designs have been developed under certain assumptions of the pooling process, for example, equal/unequal contributions to the pool, sequencing depth variation, and error rate. However, these simplified assumptions may not portray the many factors affecting pooled sequencing data quality, such as PCR amplification during target capture and sequencing, reference allele preferential bias, and others. As a result, the properties of the observed data may differ substantially from those expected under the simplified assumptions. Here, we use real datasets from targeted sequencing of pooled samples, together with microarray SNP genotypes of the same subjects, to identify and quantify factors (biases and errors) affecting the observed sequencing data. Through simulations, we find that these factors have a significant impact on the accuracy of allele frequency estimation and the power of association tests. Furthermore, we develop a workflow protocol to incorporate these factors in data analysis to reduce the potential biases and errors in pooled sequencing data and to gain better estimation of allele frequencies. The workflow, Psafe, is available at http://bioinformatics.med.yale.edu/group/.
