Similar Documents
A total of 20 similar documents were found (search time: 78 ms).
1.
The phenomenon known as the winner's curse is a form of selection bias that affects estimates of genetic association. In genome-wide association studies (GWAS) the bias is exacerbated by the use of stringent selection thresholds and ranking over hundreds of thousands of single nucleotide polymorphisms (SNPs). We develop an improved multi-locus bootstrap point estimate and confidence interval, which accounts for both ranking- and threshold-selection bias in the presence of genome-wide SNP linkage disequilibrium structure. The bootstrap method easily adapts to various study designs and alternative test statistics as well as complex SNP selection criteria. The latter is demonstrated by our application to the Wellcome Trust Case Control Consortium findings, in which the selection criterion was the minimum of the p-values for the additive and genotypic genetic effect models. In contrast, existing likelihood-based bias-reduced estimators account for the selection criterion applied to an SNP as if it were the only one tested, and so are computationally simpler, but do not address ranking across SNPs. Our simulation studies show that the bootstrap bias-reduced estimates are usually closer to the true genetic effect than the likelihood estimates and are less variable, with a narrower confidence interval. Replication study sample size requirements computed from the bootstrap bias-reduced estimates are adequate 75-90 per cent of the time, compared to 53-60 per cent of the time for the likelihood method. The bootstrap methods are implemented in a user-friendly package able to provide point and interval estimation for both binary and quantitative phenotypes in large-scale GWAS.
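The authors' multi-locus bootstrap procedure is not reproduced here, but a minimal sketch of the underlying idea — re-applying the ranking-plus-threshold selection rule to each bootstrap resample and contrasting in-resample with out-of-resample estimates to gauge selection bias — might look as follows. All data, sample sizes, effect sizes and the p-value threshold are simulated, illustrative assumptions, not the WTCCC analysis or the authors' package.

```python
# Sketch of a bootstrap winner's-curse correction for the top-ranked SNP.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, m = 2000, 200                      # individuals, SNPs (illustrative)
maf = rng.uniform(0.1, 0.5, m)
X = rng.binomial(2, maf, size=(n, m)).astype(float)
beta = np.zeros(m); beta[0] = 0.15    # one weak causal SNP
y = X @ beta + rng.normal(size=n)

def per_snp_beta(Xs, ys):
    """Single-SNP regression slopes and p-values for every column of Xs."""
    Xc = Xs - Xs.mean(0); yc = ys - ys.mean()
    b = (Xc * yc[:, None]).sum(0) / (Xc ** 2).sum(0)
    resid_var = ((yc[:, None] - Xc * b) ** 2).sum(0) / (len(ys) - 2)
    se = np.sqrt(resid_var / (Xc ** 2).sum(0))
    p = 2 * stats.t.sf(np.abs(b / se), df=len(ys) - 2)
    return b, p

def select(b, p, alpha=1e-3):
    """Selection rule: the top-ranked SNP, provided it passes the threshold."""
    j = int(np.argmin(p))
    return j if p[j] < alpha else None

b_obs, p_obs = per_snp_beta(X, y)
winner = select(b_obs, p_obs)

bias = []
for _ in range(200):                  # bootstrap resamples
    idx = rng.integers(0, n, n)
    oob = np.setdiff1d(np.arange(n), idx)
    b_in, p_in = per_snp_beta(X[idx], y[idx])
    j = select(b_in, p_in)
    if j is None or len(oob) == 0:
        continue
    b_out, _ = per_snp_beta(X[oob][:, [j]], y[oob])
    bias.append(b_in[j] - b_out[0])   # in-sample minus out-of-sample estimate

if winner is not None and bias:
    corrected = b_obs[winner] - np.mean(bias)
    print(f"naive={b_obs[winner]:.3f}  bias-reduced={corrected:.3f}  true={beta[winner]:.3f}")
```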

2.
OBJECTIVES: Genotyping errors can induce biases in frequency estimates for haplotypes of single nucleotide polymorphisms (SNPs). Here, we considered the impact of SNP allele misclassification on haplotype odds ratio estimates from case-control studies of unrelated individuals. METHODS: We calculated bias analytically, using the haplotype counts expected in cases and controls under genotype misclassification. We evaluated the bias due to allele misclassification across a range of haplotype distributions using empirical haplotype frequencies within blocks of limited haplotype diversity. We also considered simple two- and three-locus haplotype distributions to understand the impact of haplotype frequency and number of SNPs on misclassification bias. RESULTS: We found that for common haplotypes (>5% frequency), realistic genotyping error rates (0.1-1% chance of miscalling an allele), and moderate relative risks (2-4), the bias was always towards the null and increased in magnitude with increasing error rate and increasing odds ratio. For common haplotypes, bias generally increased with increasing haplotype frequency, while for rare haplotypes, bias generally increased with decreasing frequency. When the chance of miscalling an allele was 0.5%, the median bias in haplotype-specific odds ratios for common haplotypes was generally small (<4% on the log odds ratio scale), but the bias for some individual haplotypes was larger (10-20%). Bias towards the null leads to a loss in power; the relative efficiency using a test statistic based upon misclassified haplotype data compared to a test based on the unobserved true haplotypes ranged from roughly 60% to 80%, and worsened with increasing haplotype frequency. CONCLUSIONS: The cumulative effect of small allele-calling errors across multiple loci can induce noticeable bias and reduce power in realistic scenarios. This has implications for the design of candidate gene association studies that utilize multi-marker haplotypes.
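A simplified, chromosome-level version of this kind of analytic bias calculation is sketched below: each allele of a two-SNP haplotype is miscalled independently with probability e, observed haplotype frequencies are obtained by applying the resulting misclassification matrix, and the haplotype-specific odds ratio is compared before and after misclassification. The frequencies, error rates and true odds ratio are illustrative assumptions, not values from the paper, which works at the genotype level.

```python
# Expected attenuation of a haplotype-specific odds ratio under allele miscalls.
import numpy as np
from itertools import product

haps = list(product([0, 1], repeat=2))          # 00, 01, 10, 11
f_ctrl = np.array([0.40, 0.25, 0.20, 0.15])     # control haplotype frequencies (assumed)
true_or = np.array([1.0, 1.0, 1.0, 2.0])        # haplotype '11' is the risk haplotype

# Case haplotype frequencies implied by the haplotype-specific odds ratios.
f_case = f_ctrl * true_or
f_case /= f_case.sum()

def misclass_matrix(e):
    """M[obs, true]: probability of observing haplotype `obs` given truth `true`."""
    M = np.zeros((4, 4))
    for i, obs in enumerate(haps):
        for j, tru in enumerate(haps):
            M[i, j] = np.prod([1 - e if o == t else e for o, t in zip(obs, tru)])
    return M

def hap_or(fc, f0, h):
    """Odds ratio for haplotype h versus all other haplotypes."""
    return (fc[h] / (1 - fc[h])) / (f0[h] / (1 - f0[h]))

for e in (0.001, 0.005, 0.01):
    M = misclass_matrix(e)
    or_obs = hap_or(M @ f_case, M @ f_ctrl, 3)
    print(f"error={e:.3f}  true OR=2.00  expected observed OR={or_obs:.3f}")
```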

3.
Genomewide association studies (GWAS) sometimes identify loci at which both the number and identities of the underlying causal variants are ambiguous. In such cases, statistical methods that model effects of multiple single‐nucleotide polymorphisms (SNPs) simultaneously can help disentangle the observed patterns of association and provide information about how those SNPs could be prioritized for follow‐up studies. Current multi‐SNP methods, however, tend to assume that SNP effects are well captured by additive genetics; yet when genetic dominance is present, this assumption translates to reduced power and faulty prioritizations. We describe a statistical procedure for prioritizing SNPs at GWAS loci that efficiently models both additive and dominance effects. Our method, LLARRMA‐dawg, combines a group LASSO procedure for sparse modeling of multiple SNP effects with a resampling procedure based on fractional observation weights. It estimates for each SNP the robustness of association with the phenotype both to sampling variation and to competing explanations from other SNPs. In producing an SNP prioritization that best identifies underlying true signals, we show the following: our method easily outperforms a single‐marker analysis; when additive‐only signals are present, our joint model for additive and dominance is equivalent to or only slightly less powerful than modeling additive‐only effects; and when dominance signals are present, even in combination with substantial additive effects, our joint model is unequivocally more powerful than a model assuming additivity. We also describe how performance can be improved through calibrated randomized penalization, and discuss how dominance in ungenotyped SNPs can be incorporated through either heterozygote dosage or multiple imputation.
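A heavily simplified stand-in for the resampling-and-penalization idea is sketched below: each SNP is coded with an additive and a dominance column, a LASSO is fitted to repeated subsamples, and SNPs are ranked by how often either of their columns is selected. LLARRMA-dawg itself uses a group LASSO over the (additive, dominance) pair and fractional observation weights; the plain LASSO, the subsampling scheme and the penalty value here are simplifications chosen only to illustrate the prioritization idea.

```python
# Resampling-based SNP prioritization with additive + dominance coding (simplified).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, m = 1000, 100
G = rng.binomial(2, 0.3, size=(n, m))           # genotypes 0/1/2
add = G.astype(float)                            # additive coding
dom = (G == 1).astype(float)                     # dominance (heterozygote) coding
X = np.hstack([add, dom])
X = (X - X.mean(0)) / (X.std(0) + 1e-12)

# SNP 0 has a purely dominant effect; SNP 1 a purely additive one (illustrative).
y = 0.5 * dom[:, 0] + 0.3 * add[:, 1] + rng.normal(size=n)

n_resample, sel_freq = 100, np.zeros(m)
for _ in range(n_resample):
    idx = rng.choice(n, size=n // 2, replace=False)   # subsample half the data
    fit = Lasso(alpha=0.02, max_iter=5000).fit(X[idx], y[idx] - y[idx].mean())
    coefs = fit.coef_.reshape(2, m)                   # rows: additive, dominance
    sel_freq += (np.abs(coefs) > 1e-8).any(axis=0)    # SNP counted if either column enters

rank = np.argsort(-sel_freq)
print("top SNPs by selection frequency:", rank[:5], sel_freq[rank[:5]] / n_resample)
```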

4.
Genome‐wide association studies (GWAS) provide an important approach for identifying common genetic variants that predispose to human disease. However, odds ratio (OR) estimates for the reported findings from GWAS discovery data are typically affected by a bias away from the null, sometimes referred to as the “winner's curse”. In addition, standard confidence intervals (CIs) may have coverage rates far from the desired level. We applied a bias reduction method to GWAS findings from several major complex human diseases, including breast cancer, colorectal cancer, lung cancer, prostate cancer, type I diabetes, and type II diabetes. We found that the simple bias correction procedure allows one to estimate bias‐adjusted ORs that have substantial consistency with ORs from subsequent replication studies, and that corresponding selection‐adjusted CIs appear to help quantify the uncertainty of the findings. Selection‐adjusted ORs and CIs can provide a reliable summary of GWAS data, and can help to choose single nucleotide polymorphisms for subsequent validation studies. Genet. Epidemiol. 34:78–91, 2010. © 2009 Wiley‐Liss, Inc.
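One common way to implement this kind of selection adjustment for a single reported finding is to maximize the likelihood of the observed estimate conditional on it having passed the significance threshold. The sketch below does this for a single log odds ratio; the threshold, the reported OR and its standard error are illustrative numbers, and the procedure only approximates, in spirit, the bias-reduction method applied in the paper.

```python
# Conditional-likelihood ("winner's curse") adjustment of a single log OR.
import numpy as np
from scipy import stats, optimize

def conditional_mle(beta_hat, se, alpha=5e-8):
    c = stats.norm.isf(alpha / 2)              # two-sided significance cutoff on the z scale

    def neg_cond_loglik(beta):
        # density of beta_hat ~ N(beta, se^2), conditional on |beta_hat| > c * se
        logf = stats.norm.logpdf(beta_hat, loc=beta, scale=se)
        p_select = (stats.norm.cdf(-c, loc=beta / se, scale=1.0)
                    + stats.norm.sf(c, loc=beta / se, scale=1.0))
        return -(logf - np.log(p_select))

    res = optimize.minimize_scalar(neg_cond_loglik,
                                   bounds=(beta_hat - 10 * se, beta_hat + 10 * se),
                                   method="bounded")
    return res.x

beta_hat, se = np.log(1.25), 0.04              # hypothetical discovery-scan OR = 1.25
beta_adj = conditional_mle(beta_hat, se)
print(f"naive OR = {np.exp(beta_hat):.3f},  selection-adjusted OR = {np.exp(beta_adj):.3f}")
```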

5.
In genome‐wide association studies (GWAS), it is a common practice to impute the genotypes of untyped single nucleotide polymorphisms (SNPs) by exploiting the linkage disequilibrium structure among SNPs. The use of imputed genotypes improves genome coverage and makes it possible to perform meta‐analysis combining results from studies genotyped on different platforms. A popular way of using imputed data is the “expectation‐substitution” method, which treats the imputed dosage as if it were the true genotype. In current practice, the estimates given by the expectation‐substitution method are usually combined using an inverse variance weighting (IVW) scheme in meta‐analysis. However, IVW is not optimal, as the estimates given by the expectation‐substitution method are generally biased. The optimal weight is, in fact, proportional to the inverse variance and the expected value of the effect size estimates. We show both theoretically and numerically that the bias of the estimates is very small under practical conditions of low effect sizes in GWAS. This finding validates the use of the expectation‐substitution method, and shows the inverse variance is a good approximation of the optimal weight. Through simulation, we compared the power of the IVW method with several methods including the optimal weight, the regular z‐score meta‐analysis and a recently proposed “imputation aware” meta‐analysis method (Zaitlen and Eskin [2010] Genet Epidemiol 34:537–542). Our results show that the performance of the inverse variance weight is always indistinguishable from the optimal weight and similar to or better than the other two methods. Genet. Epidemiol. 35:597–605, 2011. © 2011 Wiley Periodicals, Inc.
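The fixed-effect inverse-variance-weighted combination that the abstract argues is adequate for dosage-based estimates is straightforward to compute; a minimal sketch follows. The per-study log odds ratios and standard errors are made-up illustrative values, not data from any actual study.

```python
# Fixed-effect IVW meta-analysis of dosage-based ("expectation-substitution") estimates.
import numpy as np
from scipy import stats

beta = np.array([0.10, 0.14, 0.08, 0.12])   # per-study log-OR estimates for the imputed SNP dosage
se   = np.array([0.05, 0.07, 0.04, 0.06])   # per-study standard errors

w = 1.0 / se**2                              # inverse-variance weights
beta_meta = np.sum(w * beta) / np.sum(w)
se_meta = np.sqrt(1.0 / np.sum(w))
z = beta_meta / se_meta
p = 2 * stats.norm.sf(abs(z))

print(f"meta log-OR = {beta_meta:.4f} (SE {se_meta:.4f}), z = {z:.2f}, p = {p:.2e}")
```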

6.
A central issue in genome‐wide association (GWA) studies is assessing statistical significance while adjusting for multiple hypothesis testing. An equally important question is the statistical efficiency of the GWA design as compared to the traditional sequential approach in which genome‐wide linkage analysis is followed by region‐wise association mapping. Nevertheless, GWA is becoming more popular due in part to cost efficiency: commercially available 1M chips are nearly as inexpensive as a custom‐designed 10K chip. It is becoming apparent, however, that most of the ongoing GWA studies with 2,000–5,000 samples are in fact underpowered. As a means to improve power, we emphasize the importance of utilizing prior information such as results of previous linkage studies via a stratified false discovery rate (FDR) control. The essence of the stratified FDR control is to prioritize the genome and maintain power to interrogate candidate regions within the GWA study. These candidate regions can be defined as, but are by no means limited to, linkage‐peak regions. Furthermore, we theoretically unify the stratified FDR approach and the weighted P‐value method, and we show that stratified FDR can be formulated as a robust version of weighted FDR. Finally, we demonstrate the utility of the methods in two GWA datasets: Type 2 diabetes (FUSION) and an ongoing study of long‐term diabetic complications (DCCT/EDIC). The methods are implemented as a user‐friendly software package, SFDR. The same stratification framework can be readily applied to other types of studies, for example, using GWA results to improve the power of sequencing data analyses. Genet. Epidemiol. 34:107–118, 2010. © 2009 Wiley‐Liss, Inc.
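The core of the stratified FDR idea can be illustrated by applying the Benjamini–Hochberg procedure separately within a prioritized stratum (for example, SNPs under previous linkage peaks) and within the remaining SNPs, each at the same FDR level. The sketch below simulates p-values and a stratum assignment; it illustrates the general idea only and is not the authors' SFDR software.

```python
# Stratified versus unstratified Benjamini-Hochberg on simulated GWA p-values.
import numpy as np

def bh_reject(p, q=0.05):
    """Benjamini-Hochberg step-up procedure: boolean vector of rejections at FDR level q."""
    m = len(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresh
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

rng = np.random.default_rng(2)
m = 100_000
p = rng.uniform(size=m)                                      # null SNPs
in_peak = np.zeros(m, dtype=bool); in_peak[:2_000] = True    # candidate (linkage-peak) stratum
signal = rng.choice(2_000, 20, replace=False)                # 20 modest true signals inside the stratum
p[signal] = rng.uniform(0, 1e-4, size=20)

reject_overall = bh_reject(p)
reject_strat = np.zeros(m, dtype=bool)
reject_strat[in_peak] = bh_reject(p[in_peak])
reject_strat[~in_peak] = bh_reject(p[~in_peak])

print("signals detected, unstratified BH:", reject_overall[signal].sum())
print("signals detected, stratified BH:  ", reject_strat[signal].sum())
```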

7.
Confounding due to population stratification (PS) arises when differences in both allele and disease frequencies exist in a population of mixed racial/ethnic subpopulations. Genomic control, structured association, principal components analysis (PCA), and multidimensional scaling (MDS) approaches have been proposed to address this bias using genetic markers. However, confounding due to PS can also be due to non‐genetic factors. Propensity scores are widely used to address confounding in observational studies but have not been adapted to deal with PS in genetic association studies. We propose a genomic propensity score (GPS) approach to correct for bias due to PS that considers both genetic and non‐genetic factors. We compare the GPS method with PCA and MDS using simulation studies. Our results show that GPS can adequately adjust and consistently correct for bias due to PS. Under no/mild, moderate, and severe PS, GPS yielded estimates with bias close to 0 (mean = −0.0044, standard error = 0.0087). Under moderate or severe PS, the GPS method consistently outperforms the PCA method in terms of bias, coverage probability (CP), and type I error. Under moderate PS, the GPS method consistently outperforms the MDS method in terms of CP. PCA maintains relatively high power compared to both MDS and GPS methods under the simulated situations. GPS and MDS are comparable in terms of statistical properties such as bias, type I error, and power. The GPS method provides a novel and robust tool for obtaining less‐biased estimates of genetic associations that can consider both genetic and non‐genetic factors. Genet. Epidemiol. 33:679–690, 2009. © 2009 Wiley‐Liss, Inc.
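A toy simulation of the confounding scenario described above is sketched below: two subpopulations differ in allele frequency at a null test SNP, in a non-genetic exposure, and in baseline disease risk, so the unadjusted test is biased. The adjustment shown is the standard principal-components-plus-covariate correction that the paper uses as a comparator; the GPS construction itself is not reproduced here, and all parameter values are illustrative.

```python
# Null-SNP bias from population stratification, with and without PC + covariate adjustment.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, m = 4000, 500
pop = rng.integers(0, 2, n)                          # subpopulation label (unobserved in practice)

# Ancestry-informative background markers and a null test SNP with different
# allele frequencies in the two subpopulations.
freqs = np.where(pop[:, None] == 0, rng.uniform(0.1, 0.5, m), rng.uniform(0.2, 0.6, m))
G_bg = rng.binomial(2, freqs)
g_test = rng.binomial(2, np.where(pop == 0, 0.2, 0.4)).astype(float)

env = rng.normal(loc=pop * 1.0)                      # non-genetic factor that also differs by subpopulation
logit = -1.0 + 0.8 * pop + 0.3 * env                 # disease risk depends on subpopulation + environment,
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))        # but NOT on the test SNP

# Top principal components of the standardized background markers.
Z = (G_bg - G_bg.mean(0)) / (G_bg.std(0) + 1e-12)
pcs = np.linalg.svd(Z, full_matrices=False)[0][:, :5]

crude = sm.Logit(y, sm.add_constant(g_test)).fit(disp=0)
adj = sm.Logit(y, sm.add_constant(np.column_stack([g_test, pcs, env]))).fit(disp=0)
print(f"crude log-OR for null SNP:    {crude.params[1]:+.3f}")
print(f"adjusted log-OR for null SNP: {adj.params[1]:+.3f}  (should be near 0)")
```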

8.
M Hu, J M Lachin. Statistics in Medicine 2001, 20(22):3411-3428
A model fit by generalized estimating equations (GEE) has been used extensively for the analysis of longitudinal data in medical studies. To some extent, GEE tries to minimize a quadratic form of the residuals, and therefore is not robust in the sense that it, like least squares estimates, is sensitive to heavy-tailed distributions, contaminated distributions and extreme values. This paper describes the family of truncated robust estimating equations and its properties for the analysis of quantitative longitudinal data. Like GEE, the robust estimating equations aim to assess the covariate effects in the generalized linear model in the complete population of observations, but in a manner that is more robust to the influence of aberrant observations. A simulation study has been conducted to compare the finite-sample performance of GEE and the robust estimating equations under a variety of error distributions and data structures. It shows that the parameter estimates based on GEE and the robust estimating equations are approximately unbiased and the type I errors of Wald tests do not tend to be inflated. GEE is slightly more efficient with pure normal data, but the efficiency of GEE declines much more quickly than the robust estimating equations when the data become contaminated or have heavy tails, which makes the robust estimating equations advantageous with non-normal data. Both GEE and the robust estimating equations are applied to a longitudinal analysis of renal function in the Diabetes Control and Complications Trial (DCCT). For this application, GEE seems to be sensitive to the working correlation specification in that different working correlation structures may lead to different conclusions about the effect of intensive diabetes treatment. On the other hand, the robust estimating equations consistently conclude that the treatment effect is highly significant no matter which working correlation structure is used. The DCCT Research Group also demonstrated a significant effect using a mixed-effects longitudinal model.
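The GEE baseline discussed above is readily fitted with standard software; a minimal sketch on simulated longitudinal data with a few contaminated observations follows. The truncated robust estimating equations proposed in the paper are not part of standard libraries and are not implemented here; the data, effect sizes and contamination level are illustrative assumptions only.

```python
# GEE fit for simulated longitudinal data with an exchangeable working correlation.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n_subj, n_visits = 200, 5
subj = np.repeat(np.arange(n_subj), n_visits)
time = np.tile(np.arange(n_visits), n_subj)
treat = np.repeat(rng.integers(0, 2, n_subj), n_visits)      # subject-level treatment indicator

b_subj = np.repeat(rng.normal(0, 0.5, n_subj), n_visits)     # random intercepts induce within-subject correlation
y = 1.0 + 0.4 * treat + 0.1 * time + b_subj + rng.normal(0, 1, n_subj * n_visits)
outliers = rng.random(y.size) < 0.03                          # 3% heavily contaminated observations
y[outliers] += rng.normal(0, 10, outliers.sum())

df = pd.DataFrame({"y": y, "treat": treat, "time": time, "subj": subj})
gee = sm.GEE.from_formula("y ~ treat + time", groups="subj", data=df,
                          cov_struct=sm.cov_struct.Exchangeable(),
                          family=sm.families.Gaussian())
res = gee.fit()
print(res.params)   # with contamination, these estimates can be pulled away from the true values
```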

9.
The accuracy of gene localization, the reliability of locus-specific effect estimates, and the ability to replicate initial claims of linkage and/or association have emerged as major methodological concerns in genomewide studies of complex diseases and quantitative traits. To address the issue of multiple comparisons inherent in genomewide studies, the use of stringent criteria for assessing statistical significance has been generally acknowledged as a strategy to control type I error. However, the application of genomewide significance criteria does not take account of the selection bias introduced into parameter estimates, e.g., estimates of locus-specific effect size of disease/trait loci. Some have argued that reliable locus-specific parameter estimates can only be obtained in an independent sample. In this report, we examine statistical resampling techniques, including cross-validation and the bootstrap, applied to the initial sample to improve the estimation of locus-specific effects. We compare them with the naive method in which all data are used for both hypothesis testing and parameter estimation, as well as with the split-sample approach in which part of the data are reserved for estimation. Upward bias of the naive estimator and inadequacy of the split-sample approach are derived analytically under a simple quantitative trait model. Simulation studies of the resampling methods are performed for both the simple model and a more realistic genomewide linkage analysis. Our results suggest that cross-validation and bootstrap methods can substantially reduce the estimation bias, especially when the effect size is small or there is no genetic effect.
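The contrast between the naive estimator and a cross-validation estimator can be illustrated with a small quantitative-trait simulation: the top-ranked marker is selected and its effect re-estimated either in the same data (naive) or in the held-out fold of a K-fold split, with the fold-specific estimates averaged. The trait model, numbers of markers and samples, and effect size below are illustrative and much smaller than the paper's genome-wide linkage setting.

```python
# Naive versus K-fold cross-validation estimates of the top-ranked marker's effect.
import numpy as np

rng = np.random.default_rng(5)
n, m, K = 500, 300, 5
X = rng.binomial(2, 0.3, size=(n, m)).astype(float)
beta_true = np.zeros(m); beta_true[0] = 0.10          # one weak causal marker
y = X @ beta_true + rng.normal(size=n)

def top_marker_and_effect(Xs, ys):
    """Index of the marker with the largest |correlation| and its regression slope."""
    Xc = Xs - Xs.mean(0); yc = ys - ys.mean()
    b = (Xc * yc[:, None]).sum(0) / (Xc ** 2).sum(0)
    r = b * Xc.std(0) / yc.std()
    j = int(np.argmax(np.abs(r)))
    return j, b[j]

def slope(x, ys):
    xc = x - x.mean(); yc = ys - ys.mean()
    return (xc * yc).sum() / (xc ** 2).sum()

j_naive, b_naive = top_marker_and_effect(X, y)        # selection and estimation in the same data

folds = np.array_split(rng.permutation(n), K)
cv_estimates = []
for k in range(K):
    test = folds[k]
    train = np.concatenate([folds[i] for i in range(K) if i != k])
    j, _ = top_marker_and_effect(X[train], y[train])  # select in the training folds
    cv_estimates.append(slope(X[test, j], y[test]))   # estimate in the held-out fold

print(f"naive estimate for top marker: {b_naive:+.3f}")
print(f"{K}-fold CV estimate:          {np.mean(cv_estimates):+.3f}")
print(f"true effect of causal marker:  {beta_true[0]:+.3f}")
```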

10.
Genome‐wide association studies (GWAS) have been widely used to identify genetic effects on complex diseases or traits. Most currently used methods are based on separate single‐nucleotide polymorphism (SNP) analyses. Because this approach requires correction for multiple testing to avoid excessive false‐positive results, it suffers from reduced power to detect weak genetic effects under limited sample size. To increase the power to detect multiple weak genetic factors and reduce false‐positive results caused by multiple tests and dependence among test statistics, a modified forward multiple regression (MFMR) approach is proposed. Simulation studies show that MFMR has higher power than the Bonferroni and false discovery rate procedures for detecting moderate and weak genetic effects, and MFMR retains an acceptable false‐positive rate even if causal SNPs are correlated with many SNPs due to population stratification or other unknown reasons. Genet. Epidemiol. 33:518–525, 2009. © 2009 Wiley‐Liss, Inc.
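A plain forward multiple regression over SNPs is sketched below: at each step the SNP with the smallest p-value conditional on the SNPs already in the model is added, until no remaining SNP meets the entry threshold. The published MFMR procedure includes modifications not reproduced here, and the simulated data, entry threshold and effect sizes are purely illustrative.

```python
# Simplified forward multiple regression over SNPs with a p-value entry criterion.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n, m = 1000, 200
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
y = 0.25 * G[:, 0] + 0.20 * G[:, 1] + rng.normal(size=n)   # two weak causal SNPs

selected, remaining, p_enter = [], list(range(m)), 1e-4
while True:
    best_p, best_j = 1.0, None
    for j in remaining:
        X = sm.add_constant(G[:, selected + [j]])
        pval = sm.OLS(y, X).fit().pvalues[-1]              # p-value of the candidate SNP
        if pval < best_p:
            best_p, best_j = pval, j
    if best_j is None or best_p > p_enter:
        break
    selected.append(best_j)
    remaining.remove(best_j)

print("SNPs selected by forward regression:", selected)
```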

11.
Genome-wide association studies are carried out to identify unknown genes for a complex trait. Polymorphisms showing the most statistically significant associations are reported and followed up in subsequent confirmatory studies. In addition to the test of association, the statistical analysis provides point estimates of the relationship between the genotype and phenotype at each polymorphism, typically an odds ratio in case-control association studies. The statistical significance of the test and the estimator of the odds ratio are completely correlated. Selecting the most extreme statistics is equivalent to selecting the most extreme odds ratios. The value of the estimator, given the statistical significance, depends on the standard error of the estimator and the power of the study. This report shows that when power is low, estimates of the odds ratio from a genome-wide association study, or any large-scale association study, will be upwardly biased. Genome-wide association studies are often underpowered given the low alpha levels required to declare statistical significance and the small individual genetic effects known to characterize complex traits. Factors such as low allele frequency, inadequate sample size and weak genetic effects contribute to large standard errors in the odds ratio estimates, low power and upwardly biased odds ratios. Studies that have high power to detect an association with the true odds ratio will have little or no bias, regardless of the statistical significance threshold. The results have implications for the interpretation of genome-wide association analysis and the planning of subsequent confirmatory stages.
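The central point above is easy to reproduce in a small simulation: when power at the significance threshold is low, the odds ratios of the replicates that happen to reach significance are systematically overestimated. The sample sizes, allele frequency, true odds ratio and threshold below are illustrative choices only.

```python
# Upward bias of odds ratios among "significant" results under low power.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_cases = n_ctrls = 1000
maf, true_or, alpha = 0.2, 1.3, 5e-8                    # modest effect -> low power at 5e-8
p_case = maf * true_or / (maf * true_or + (1 - maf))    # case allele frequency under an allelic OR model

sig_or = []
for _ in range(20000):                          # repeated "discovery scans" of a single SNP
    a = rng.binomial(2 * n_cases, p_case)       # risk alleles in cases
    b = rng.binomial(2 * n_ctrls, maf)          # risk alleles in controls
    table = np.array([[a, 2 * n_cases - a], [b, 2 * n_ctrls - b]]) + 0.5
    log_or = np.log(table[0, 0] * table[1, 1] / (table[0, 1] * table[1, 0]))
    se = np.sqrt((1 / table).sum())
    if 2 * stats.norm.sf(abs(log_or / se)) < alpha:
        sig_or.append(np.exp(log_or))

print(f"true OR: {true_or}")
print(f"replicates reaching p<{alpha:g}: {len(sig_or)} of 20000 (low power)")
if sig_or:
    print(f"mean OR among significant replicates: {np.mean(sig_or):.2f} (upward bias)")
```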

12.
Errors in genotyping can greatly affect family-based association studies. If a Mendelian inconsistency is detected, the family is usually removed from the analysis. This reduces power, and may introduce bias. In addition, a large proportion of genotyping errors remain undetected, and these also reduce power. We present a Bayesian framework for performing association studies with SNP data on samples of trios consisting of parents with an affected offspring, while allowing for the presence of both detectable and undetectable genotyping errors. This framework also allows for the inclusion of missing genotypes. Associations between the SNP and disease were modelled in terms of the genotypic relative risks. The performances of the analysis methods were investigated under a variety of models for disease association and genotype error, looking at both power to detect association and precision of genotypic relative risk estimates. As expected, power to detect association decreased as genotyping error probability increased. Importantly, however, analyses allowing for genotyping error had similar power to standard analyses when applied to data without genotyping error. Furthermore, allowing for genotyping error yielded relative risk estimates that were approximately unbiased, together with 95% credible intervals giving approximately correct coverage. The methods were also applied to a real dataset: a sample of schizophrenia cases and their parents genotyped at SNPs in the dysbindin gene. The analysis methods presented here require no prior information on the genotyping error probabilities, and may be fitted in WinBUGS.

13.
Not accounting for interaction in association analyses may reduce the power to detect the variants involved. Using simulation, we investigate the power of different designs based on family‐based association tests to detect, under two‐locus models, the effect of disease‐causing variants among several hundred markers. This setting reflects realistic situations of exploration of linkage regions or of biological pathways. We define four strategies: (S1) single‐marker analysis of all Single Nucleotide Polymorphisms (SNPs), (S2) two‐marker analysis of all possible SNP pairs, (S3) lax preliminary selection of SNPs followed by a two‐marker analysis of all selected SNP pairs, (S4) stringent preliminary selection of SNPs, each being later paired with all the SNPs for two‐marker analysis. Strategy S2 is never the best design, except when there is an inversion of the gene effect (flip‐flop model). Testing individual SNPs (S1) is the most efficient when the two genes act multiplicatively. Designs S3 and S4 are the most powerful for nonmultiplicative models. Their respective powers depend on the level of symmetry of the model. Because the true genetic model is unknown, we cannot conclude that one design outperforms another. The optimal approach would be the two‐step strategy (S3 or S4) as it is often the most powerful, or the second best. Genet.

14.
Paul E. The Case Manager 2002, 13(2):78-81
Nutrition therapy has been the focus of diabetes management since before insulin was discovered.(1) Many theories and approaches have been recommended and reemerged over the years. Since the Diabetes Control and Complications Trial (DCCT) results were released in 1993, nutrition has been considered the most critical and pivotal component of diabetes care in achieving blood glucose goals. We have seen increased emphasis on individualized nutrition therapy and on the dietitian as a true partner in diabetes care, research, and management.(1) Advances in nutrition therapy now center on methods to improve behavioral change because it is the major challenge facing people with diabetes. Access to nutrition therapy and self-management training is critical to improve clinical outcomes and reduce health care costs otherwise spent on clinic visits, expensive medications, emergency room visits, and hospitalizations.(1)

15.
Genome‐wide association studies (GWAS) are now routinely imputed for untyped single nucleotide polymorphisms (SNPs) based on various powerful statistical algorithms for imputation trained on reference datasets. The use of predicted allele counts for imputed SNPs as the dosage variable is known to produce a valid score test for genetic association. In this paper, we investigate how to best handle imputed SNPs in various modern complex tests for genetic associations incorporating gene–environment interactions. We focus on case‐control association studies where inference for an underlying logistic regression model can be performed using alternative methods that rely to varying degrees on an assumption of gene–environment independence in the underlying population. As increasingly large‐scale GWAS are being performed through consortia efforts, where it is preferable to share only summary‐level information across studies, we also describe simple mechanisms for implementing score tests based on standard meta‐analysis of “one‐step” maximum‐likelihood estimates across studies. Applications of the methods in simulation studies and a dataset from a GWAS of lung cancer illustrate the ability of the proposed methods to maintain type‐I error rates for the underlying testing procedures. For analysis of imputed SNPs, as for typed SNPs, the retrospective methods can lead to considerable efficiency gain for modeling of gene–environment interactions under the assumption of gene–environment independence. Methods are made available for public use through the CGEN R software package.

16.
The results obtained from 85 antidiabetic centers enrolled in the DAI study are presented with regard to the external quality assessment scheme for glycohemoglobin. The materials were prepared by a laboratory in the network of reference laboratories of the International Federation of Clinical Chemistry (IFCC). A Diabetes Control and Complications Trial (DCCT)-traceable target value was assigned to each control. High-performance liquid chromatography (HPLC) methods for glycohemoglobin are used in 75% of the centers, immunochemical techniques in 21%, and fewer than 5% use affinity-chromatography-based methods. The data collected from the laboratories that completed the set of measurements show that 64% of the centers are well aligned to the DCCT system. The reproducibility of the methods varied between 3.7 and 5.8% (expressed as CV) and needs to be improved.

17.
Survival bias is difficult to detect and adjust for in case–control genetic association studies but can invalidate findings when only surviving cases are studied and survival is associated with the genetic variants under study. Here, we propose a design where one genotypes genetically informative family members (such as offspring, parents, and spouses) of deceased cases and incorporates that surrogate genetic information into a retrospective maximum likelihood analysis. We show that inclusion of genotype data from first‐degree relatives permits unbiased estimation of genotype association parameters. We derive closed‐form maximum likelihood estimates for association parameters under the widely used log‐additive and dominant association models. Our proposed design not only permits a valid analysis but also enhances statistical power by augmenting the sample with indirectly studied individuals. Gene variants associated with poor prognosis can also be identified under this design. We provide simulation results to assess performance of the methods. Copyright © 2016 John Wiley & Sons, Ltd.

18.
Even in large-scale genome-wide association studies (GWASs), only a fraction of the true associations are detected at the genome-wide significance level. When few or no associations reach the significance threshold, one strategy is to follow up on the most promising candidates, i.e. the single nucleotide polymorphisms (SNPs) with the smallest association-test P-values, by genotyping them in additional studies. In this communication, we propose an overall test for GWASs that analyzes the SNPs with the most promising P-values simultaneously and therefore allows an early assessment of whether the follow-up of the selected SNPs is likely promising. We theoretically derive the properties of the proposed overall test under the null hypothesis and assess its power based on simulation studies. An application to a GWAS for chronic obstructive pulmonary disease suggests that there are true association signals among the top SNPs and that an additional follow-up study is promising.
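A Monte Carlo sketch of an "overall" test on the most promising SNPs follows: the statistic is the sum of −log10 of the k smallest p-values, and its null distribution is obtained by repeatedly drawing independent uniform p-values. Independence across SNPs is assumed for simplicity, and the paper derives its test's properties analytically, so this sketch only illustrates the general idea; all numbers are simulated.

```python
# Monte Carlo overall test for the k smallest p-values of a scan.
import numpy as np

rng = np.random.default_rng(8)
m, k = 50_000, 50                                  # SNPs in the scan, top hits carried forward

# Observed p-values: mostly null, plus a handful of modest true signals that
# individually miss genome-wide significance.
p_obs = rng.uniform(size=m)
p_obs[:10] = rng.uniform(1e-6, 1e-4, size=10)

def top_k_stat(p, k):
    """Sum of -log10 of the k smallest p-values."""
    return -np.log10(np.partition(p, k)[:k]).sum()

t_obs = top_k_stat(p_obs, k)
null_stats = np.array([top_k_stat(rng.uniform(size=m), k) for _ in range(2000)])
p_overall = (1 + (null_stats >= t_obs).sum()) / (1 + len(null_stats))

print(f"overall p-value for the top {k} SNPs: {p_overall:.4f}")
```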

19.
Genome‐wide association studies (GWAS) of complex traits have generated many association signals for single nucleotide polymorphisms (SNPs). To understand the underlying causal genetic variant(s), focused DNA resequencing of targeted genomic regions is commonly used, yet the current cost of resequencing limits sample sizes for resequencing studies. Information from the large GWAS can be used to guide choice of samples for resequencing, such as the SNP genotypes in the targeted genomic region. Viewing the GWAS tag‐SNPs as imperfect surrogates for the underlying causal variants, yet expecting that the tag‐SNPs are correlated with the causal variants, a reasonable approach is a two‐phase case‐control design, with the GWAS serving as the first phase and the resequencing study serving as the second phase. Using stratified sampling based on both tag‐SNP genotypes and case‐control status, we explore the gains in power of a two‐phase design relative to randomly sampling cases and controls for resequencing (i.e., ignoring tag‐SNP genotypes). Simulation results show that stratified sampling based on both tag‐SNP genotypes and case‐control status is not likely to have lower power than stratified sampling based only on case‐control status, and can sometimes have substantially greater power. The gain in power depends on the amount of linkage disequilibrium between the tag‐SNP and causal variant alleles, as well as the effect size of the causal variant. Hence, the two‐phase design provides an efficient approach to follow up GWAS signals with DNA resequencing.
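The sampling step of such a two-phase design is simple to express in code: second-phase (resequencing) samples are drawn within strata defined jointly by case-control status and tag-SNP genotype, rather than by case-control status alone. The balanced allocation used below (equal numbers per stratum, where available) is just one simple choice, and the data are simulated; the paper studies the resulting power gains more generally.

```python
# Stratified selection of a second-phase resequencing sample from phase-1 GWAS data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
n, budget = 20_000, 1_200                       # phase-1 GWAS size, phase-2 sequencing budget
tag = rng.binomial(2, 0.25, n)                  # tag-SNP genotype from the GWAS
case = rng.binomial(1, 1 / (1 + np.exp(-(-2.2 + 0.3 * tag))))   # disease status (tag is associated)

df = pd.DataFrame({"tag": tag, "case": case})
strata = df.groupby(["case", "tag"])
per_stratum = budget // strata.ngroups          # equal allocation across the 2 x 3 strata

phase2_idx = np.concatenate([
    rng.choice(idx, size=min(per_stratum, len(idx)), replace=False)
    for _, idx in strata.indices.items()
])
print(df.loc[phase2_idx].groupby(["case", "tag"]).size())   # composition of the resequencing sample
```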

20.
It is well known that measurement error in the covariates of regression models generally causes bias in parameter estimates. Correction for such biases requires information concerning the measurement error, which is often in the form of internal validation or replication data. Regression calibration (RC) is a popular approach to correct for covariate measurement error, which involves predicting the true covariate using error‐prone measurements. Likelihood methods have previously been proposed as an alternative approach to estimate the parameters in models affected by measurement error, but have been relatively infrequently employed in medical statistics and epidemiology, partly because of computational complexity and concerns regarding robustness to distributional assumptions. We show how a standard random‐intercepts model can be used to obtain maximum likelihood (ML) estimates when the outcome model is linear or logistic regression under certain normality assumptions, when internal error‐prone replicate measurements are available. Through simulations we show that for linear regression, ML gives more efficient estimates than RC, although the gain is typically small. Furthermore, we show that RC and ML estimates remain consistent even when the normality assumptions are violated. For logistic regression, our implementation of ML is consistent if the true covariate is conditionally normal given the outcome, in contrast to RC. In simulations, this ML estimator showed less bias in situations where RC gives non‐negligible biases. Our proposal makes the ML approach to dealing with covariate measurement error more accessible to researchers, which we hope will improve its viability as a useful alternative to methods such as RC.  Copyright © 2009 John Wiley & Sons, Ltd.
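A minimal sketch of regression calibration with two internal replicate measurements per subject follows: the measurement-error variance is estimated from within-person differences, the error-prone covariate is replaced by its estimated conditional expectation given the replicate mean, and the outcome model is then fitted as usual. The random-intercepts ML alternative discussed above is not implemented here, and the simulated data and parameter values are illustrative.

```python
# Regression calibration for a linear outcome model with replicate error-prone measurements.
import numpy as np

rng = np.random.default_rng(10)
n, beta_true = 2000, 0.5
x = rng.normal(0, 1, n)                        # true covariate (unobserved)
w1 = x + rng.normal(0, 0.8, n)                 # replicate error-prone measurements
w2 = x + rng.normal(0, 0.8, n)
y = 1.0 + beta_true * x + rng.normal(0, 1, n)  # linear outcome model

wbar = (w1 + w2) / 2
sigma2_u = np.var(w1 - w2, ddof=1) / 2         # error variance of a single measurement
sigma2_wbar = sigma2_u / 2                     # error variance of the replicate mean
lam = (np.var(wbar, ddof=1) - sigma2_wbar) / np.var(wbar, ddof=1)   # reliability ratio
x_hat = wbar.mean() + lam * (wbar - wbar.mean())                    # E[x | wbar] under normality

def ols_slope(xv, yv):
    xc, yc = xv - xv.mean(), yv - yv.mean()
    return (xc * yc).sum() / (xc ** 2).sum()

print(f"true slope:                  {beta_true:.3f}")
print(f"naive slope (on w-bar):      {ols_slope(wbar, y):.3f}  (attenuated)")
print(f"regression-calibrated slope: {ols_slope(x_hat, y):.3f}")
```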
