Similar Literature
20 similar records found (search time: 15 ms)
1.
Case-control association studies using unrelated individuals may offer an effective approach for identifying genetic variants that have small to moderate disease risks. In general, two different strategies may be employed to establish associations between genotypes and phenotypes: (1) collecting individual genotypes or (2) quantifying allele frequencies in DNA pools. These two technologies have their respective advantages: individual genotyping gathers more information, whereas DNA pooling may be more cost effective. Recent technological advances in DNA pooling have generated great interest in its use in association studies. In this article, we investigate the impact of errors in genotyping or in measuring allele frequencies on the identification of genetic associations with these two strategies. We find that, with current technologies, a larger sample is generally required to achieve the same power with DNA pooling as with individual genotyping. We further consider the use of DNA pooling as a screening tool to identify candidate regions for follow-up studies. We find that the majority of the positive regions identified from DNA pooling results may represent false positives if measurement errors are not appropriately considered in the design of the study.
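The pooling-versus-individual-genotyping trade-off described above can be illustrated with a simple normal-approximation sample size calculation. This is a hedged sketch, not the paper's exact derivation: it assumes one pooled measurement per group whose error variance `sigma_e**2` does not shrink with sample size, which is why pooling requires a larger sample and can even render an effect undetectable.

```python
# Illustrative sketch (normal approximation; assumptions as stated above):
# per-group sample size needed to detect an allele-frequency difference
# p1 - p2 between cases and controls. Individual genotyping contributes
# sampling variance p(1-p)/(2n) per group; DNA pooling adds a fixed
# measurement-error variance sigma_e^2 to each pooled frequency estimate.
from statistics import NormalDist

def n_per_group(p1, p2, sigma_e=0.0, alpha=0.05, power=0.8):
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    signal = (p1 - p2) ** 2 / z ** 2
    error = 2 * sigma_e ** 2            # one error-prone pooled estimate per group
    if signal <= error:
        return float("inf")             # no sample size achieves the target power
    v = p1 * (1 - p1) / 2 + p2 * (1 - p2) / 2   # per-subject sampling variance
    return v / (signal - error)

n_ind = n_per_group(0.5, 0.4)                   # individual genotyping, ~192/group
n_pool = n_per_group(0.5, 0.4, sigma_e=0.02)    # pooling with 2% error, ~517/group
```

With these illustrative numbers a 2% pooling measurement error roughly triples the required sample, consistent with the qualitative conclusion of the abstract.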

2.
In many biomedical studies, covariates of interest may be measured with error. Frequently, however, quantiles of a continuous exposure variable are used as covariates in the regression analysis. Because of measurement error in the continuous exposure variable, its quantiles may be misclassified, and this misclassification can bias the estimated association between the exposure and the outcome. Adjusting for the misclassification is challenging when gold standard measurements are not available. In this paper, we develop two regression calibration estimators to reduce bias in effect estimation. The first estimator is normal likelihood-based; the second is linearization-based and provides a simple and practical correction. Finite sample performance is examined via a simulation study. We apply the methods to a four-arm randomized clinical trial that tested exercise and weight loss interventions in women aged 50–75 years. Copyright © 2015 John Wiley & Sons, Ltd.
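The general idea behind regression calibration can be sketched in the classical additive-error setting (note: this is a simplified illustration, not the paper's quantile-misclassification method): the error-prone measurement W = X + U is replaced by E[X | W], which removes the attenuation in the slope.

```python
# Sketch (classical additive measurement error, synthetic data):
# naive regression of Y on W attenuates the true slope by the reliability
# ratio lam = var_x / (var_x + var_u); regression calibration replaces W
# with E[X|W] = mean(W) + lam * (W - mean(W)) and recovers the slope.
import numpy as np

rng = np.random.default_rng(1)
n, var_u = 100_000, 1.0
x = rng.normal(size=n)                            # true exposure, var_x = 1
w = x + rng.normal(scale=np.sqrt(var_u), size=n)  # error-prone measurement
y = 2.0 * x + rng.normal(size=n)                  # true slope = 2.0

naive = np.polyfit(w, y, 1)[0]                    # ~ 2.0 * lam = 1.0 (attenuated)
lam = 1.0 / (1.0 + var_u)                         # reliability ratio (var_u assumed known)
x_hat = w.mean() + lam * (w - w.mean())           # calibrated exposure E[X|W]
calibrated = np.polyfit(x_hat, y, 1)[0]           # ~ 2.0 (attenuation removed)
```

Here the error variance is assumed known; in practice it would be estimated from replicate or validation data, which is where methods like those in the abstract come in.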

3.
In genetic association studies, mixed effects models have been widely used to detect pleiotropic effects, which occur when one gene affects multiple phenotypic traits. In particular, bivariate mixed effects models are useful for describing the association of a gene with a continuous trait and a binary trait. However, such models are inadequate for data with response mismeasurement, a characteristic that is often overlooked. It is well established that, in univariate settings, ignoring mismeasurement in variables usually results in biased estimation. In this paper, we consider a bivariate outcome vector containing a continuous component and a binary component, both subject to mismeasurement. We propose an induced likelihood approach and an EM algorithm to handle measurement error in the continuous response and misclassification in the binary response simultaneously. Simulation studies confirm that the proposed methods successfully remove the bias induced by the response mismeasurement.

4.
Genotyping errors can create a problem for the analysis of case-parents data because some families will exhibit genotypes that are inconsistent with Mendelian inheritance. The problem with correcting Mendelian inconsistent genotype errors by regenotyping or removing families in which they occur is that the remaining unidentified genotype errors can produce excess type I (false positive) error for some family-based tests for association. We address this problem by developing a likelihood ratio test (LRT) for association in a case-parents design that incorporates nuisance parameters for a general genotype error model. We extend the likelihood approach for a single SNP to include short haplotypes consisting of 2 or 3 SNPs. The extension to haplotypes is based on assumptions of random mating, multiplicative penetrances, and at most a single genotype error per family. For a single SNP, we found, using Monte Carlo simulation, that type I error rate can be controlled for a number of genotype error models at different error rates. Simulation results suggest the same is true for 2 and 3 SNPs. In all cases, power declined with increasing genotyping error rates. In the absence of genotyping errors, power was similar whether nuisance parameters for genotype error were included in the LRT or not. The LRT developed here does not require prior specification of a particular model for genotype errors and it can be readily computed using the EM algorithm. Consequently, this test may be generally useful as a test of association with case-parents data in which Mendelian inconsistent families are observed.

5.
The linkage between electronic health records (EHRs) and genotype data makes it plausible to study the genetic susceptibility of a wide range of disease phenotypes. Although EHR-derived phenotype data are subject to misclassification, they have proven useful for discovering susceptibility genes, particularly in the setting of phenome-wide association studies (PheWAS). It is essential to characterize discovered associations using gold standard phenotype data obtained by chart review. In this work, we propose a genotype-stratified case-control sampling strategy to select subjects for phenotype validation. We develop a closed-form maximum-likelihood estimator for the odds ratio parameters and a score statistic for testing genetic association using the combined validated and error-prone EHR-derived phenotype data, and assess the extent of power improvement provided by this approach. Compared with case-control sampling based only on EHR-derived phenotype data, our genotype-stratified strategy maintains nominal type I error rates and results in higher power for detecting associations. It also corrects the bias in the odds ratio parameter estimates and reduces the corresponding variance, especially when the minor allele frequency is small.

6.
We propose a cost-effective two-stage approach to investigate gene-disease associations when testing a large number of candidate markers using a case-control design. Under this approach, all the markers are genotyped and tested at stage 1 using a subset of affected cases and unaffected controls, and the most promising markers are genotyped on the remaining individuals and tested using all the individuals at stage 2. The sample size at stage 1 is chosen such that the power to detect the true markers of association is 1 − β1 at significance level α1. The most promising markers are tested at significance level α2 at stage 2. In contrast, a one-stage approach would evaluate and test all the markers on all the cases and controls to identify the markers significantly associated with the disease. The goal is to determine the two-stage parameters (α1, β1, α2) that minimize the cost of the study such that the desired overall significance is α and the desired power is close to 1 − β, the power of the one-stage approach. We provide analytic formulae to estimate the two-stage parameters. The properties of the two-stage approach are evaluated under various parametric configurations and compared with those of the corresponding one-stage approach. The optimal two-stage procedure does not depend on the signal of the markers associated with the disease. Further, when there is a large number of markers, the optimal procedure is not substantially influenced by the total number of markers associated with the disease. The results show that, compared to a one-stage approach, a two-stage procedure typically halves the cost of the study.
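The source of the cost saving can be sketched with a back-of-envelope genotyping-cost calculation. This is an illustrative simplification with hypothetical parameters, not the paper's optimization: it assumes a per-genotype unit cost, a fraction `pi_1` of subjects genotyped at stage 1, and approximates the fraction of markers carried to stage 2 by `alpha_1` (null markers dominate when true associations are rare).

```python
# Sketch (illustrative assumptions as stated above): expected genotyping
# cost of a two-stage design relative to genotyping all m markers on the
# full sample in one stage (one-stage cost normalized to 1).
def relative_cost(m, pi_1, alpha_1):
    stage1 = m * pi_1                  # all markers on a fraction pi_1 of subjects
    stage2 = m * alpha_1 * (1 - pi_1)  # only promising markers on the remainder
    return (stage1 + stage2) / m       # note m cancels: cost ratio = pi_1 + alpha_1*(1-pi_1)

cost_30 = relative_cost(m=500_000, pi_1=0.3, alpha_1=0.01)  # 0.307
cost_50 = relative_cost(m=500_000, pi_1=0.5, alpha_1=0.01)  # 0.505
```

With half the sample at stage 1 and 1% of markers carried forward, the cost ratio is about 0.5, matching the abstract's statement that the two-stage procedure typically halves the study cost.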

7.
Errors in genotyping can greatly affect family-based association studies. If a Mendelian inconsistency is detected, the family is usually removed from the analysis. This reduces power and may introduce bias. In addition, a large proportion of genotyping errors remain undetected, and these also reduce power. We present a Bayesian framework for performing association studies with SNP data on samples of trios consisting of parents with an affected offspring, while allowing for the presence of both detectable and undetectable genotyping errors. This framework also allows for the inclusion of missing genotypes. Associations between the SNP and disease were modelled in terms of the genotypic relative risks. The performance of the analysis methods was investigated under a variety of models for disease association and genotype error, looking at both power to detect association and precision of genotypic relative risk estimates. As expected, power to detect association decreased as genotyping error probability increased. Importantly, however, analyses allowing for genotyping error had similar power to standard analyses when applied to data without genotyping error. Furthermore, allowing for genotyping error yielded relative risk estimates that were approximately unbiased, together with 95% credible intervals giving approximately correct coverage. The methods were also applied to a real dataset: a sample of schizophrenia cases and their parents genotyped at SNPs in the dysbindin gene. The analysis methods presented here require no prior information on the genotyping error probabilities, and may be fitted in WinBUGS.

8.
We compared different ascertainment schemes for genetic association analysis: affected sib-pairs (ASPs), case-parent trios, and unrelated cases and controls. We found, with empirical type 1 diabetes data at four known disease loci, that studies based on case-parent trios and on unmatched cases and controls often gave higher odds ratio estimates and stronger significance test values than ASP designs. We used simulations and a simplified disease model involving two interacting loci, one of large effect and one smaller, to examine interaction models that could cause such an effect. The different ascertainment schemes were compared for power to detect an effect when only the locus of smaller effect was genotyped. ASPs showed the greatest power for association testing under most models of interaction, except under additive and certain epistatic crossover models, for which case/controls and case-parent trios did better. All ascertainment schemes gave unbiased estimates of log genotype relative risks (GRRs) under a multiplicative model. Under nonmultiplicative interactions, GRRs at the minor locus as estimated from ASPs could be biased upwards or downwards, resulting in either an increase or decrease in power compared to the case/control or trio design. For the four known type 1 diabetes loci, we observed decreased risks with ASPs, which could be due to additive interactions with the remaining susceptibility loci. Thus, the optimal ascertainment strategy in genetic association studies depends on the unknown underlying multilocus genetic model, and on whether the goal of the study is to detect an effect or to accurately estimate the resulting disease risks.

9.
Application of generalized linear mixed effects models to categorical repeated measures data
Luo Tian-E, Liu Gui-Fen. Chinese Journal of Health Statistics, 2007, 24(5): 486-487, 492
Objective: To explore modeling of categorical repeated measures data with generalized linear mixed effects models (GLMMs) and their implementation via the GLIMMIX macro in SAS 8.0. Methods: The ERROR and LINK statements of the GLIMMIX macro specify the distribution of the response variable and the link function; an appropriate variance-covariance structure is selected through the TYPE option of the REPEATED and RANDOM statements to model the correlation in the data; and model parameters are estimated by linearization-based pseudo-likelihood. Results: GLMMs extend generalized linear fixed effects models by introducing random effects. The response variable may follow any distribution in the exponential family (continuous distributions such as the normal, beta, and chi-square; discrete distributions such as the binomial, Poisson, and negative binomial); a link function connects the mean vector of the observations to the model parameters; and the variance-covariance structure can be chosen to match the characteristics of the repeated measures data. Conclusion: GLMMs are widely applicable and flexible to specify; they can model correlated data and data with non-constant variance, and provide objective and correct statistical conclusions.

10.
This paper addresses the issue of biases in cost measures used in economic evaluation studies. The basic measure of hospital costs used by most investigators is unit cost. Focusing on this measure, a set of criteria is identified that the basic measures must fulfil in order to approximate the marginal cost (MC) of a service for the relevant product in the representative site. Four distinct biases are then identified: a scale bias, a case mix bias, a methods bias, and a site selection bias, each of which reflects the divergence of the unit cost measure from the desired MC measure. Measures are proposed for several of these biases, and ways to correct them are suggested.

11.
Cost-effectiveness analysis (CEA) compares the costs and outcomes of two or more technologies. However, there is no consensus about which measure of effectiveness should be used in each analysis. Clinical researchers have to select an appropriate outcome for their purpose, and this choice can have dramatic consequences for the conclusions of their analysis. In this paper we present a Bayesian cost-effectiveness framework to carry out CEA when more than one measure is considered. In particular, we analyse the case in which two measures of effectiveness, one binary and the other continuous, are considered. Decision-making measures, such as the incremental cost-effectiveness ratio, incremental net benefit, and cost-effectiveness acceptability curves, are used to compare costs and one measure of outcome. We propose an extension of cost-effectiveness acceptability curves, namely the cost-effectiveness acceptability plane, as a suitable measure for decision making. The models were validated using data from two clinical trials. In the first, we compared four highly active antiretroviral treatments applied to asymptomatic HIV patients. As measures of effectiveness, we considered the percentage of patients with undetectable levels of viral load, and changes in quality of life, measured according to EuroQol. In the second clinical trial we compared three methadone maintenance programmes for opioid-addicted patients. In this case, the measures of effectiveness considered were quality of life, according to the Nottingham Health Profile, and adherence to the treatment, measured as the percentage of patients who participated in the whole treatment programme.

12.
Large-scale association analyses based on observational health care databases, such as electronic health records, have been a topic of increasing interest in the scientific community. However, challenges due to nonprobability sampling and phenotype misclassification associated with these data sources are often ignored in standard analyses, and the extent of the resulting bias is not well characterized. In this paper, we develop an analytic framework for characterizing the bias expected in disease-gene association studies based on electronic health records when disease status misclassification and the sampling mechanism are ignored. Through a sensitivity analysis approach, this framework can be used to obtain plausible values for parameters of interest given summary results from a standard analysis. We develop an online tool for performing this sensitivity analysis. Simulations demonstrate promising properties of the proposed method. We apply our approach to study bias in disease-gene association studies using electronic health record data from the Michigan Genomics Initiative, a longitudinal biorepository effort within the University of Michigan health system.

13.
Chen HY, Li M. Genetic Epidemiology, 2011, 35(8): 823-830
Extreme-value sampling designs, which sample subjects with extremely large or small quantitative trait values, are commonly used in genetic association studies. Samples in such designs are often treated as "cases" and "controls" and analyzed using logistic regression. Such a case-control analysis ignores the potential dose-response relationship between the quantitative trait and the underlying trait locus and thus may lead to loss of power in detecting genetic association. An alternative approach is to model the dose-response relationship by a linear regression model. However, parameter estimation from this model can be biased, which may lead to inflated type I errors. We propose a robust and efficient approach that takes into consideration both the biased sampling design and the potential dose-response relationship. Extensive simulations demonstrate that the proposed method is more powerful than the traditional logistic regression analysis and more robust than the linear regression analysis. We applied our method to the analysis of a candidate gene association study of high-density lipoprotein cholesterol (HDL-C) that includes study subjects with extremely high or low HDL-C levels. Using our method, we identified several SNPs showing stronger evidence of association with HDL-C than the traditional case-control logistic regression analysis. Our results suggest that it is important to appropriately model the quantitative trait and to adjust for the biased sampling when a dose-response relationship exists in extreme-value sampling designs.

14.
In clinical chemistry and medical research, there is often a need to calibrate the values obtained from an old or discontinued laboratory procedure to the values obtained from a new or currently used laboratory method. The objective of the calibration study is to identify a transformation that can be used to convert the test values of one laboratory measurement procedure into the values that would be obtained using another measurement procedure. However, in the presence of heteroscedastic measurement error, there is no good statistical method available for estimating the transformation. In this paper, we propose a set of statistical methods for a calibration study when the magnitude of the measurement error is proportional to the underlying true level. The corresponding sample size estimation method for conducting a calibration study is discussed as well. The proposed new method is theoretically justified and evaluated for its finite sample properties via an extensive numerical study. Two examples based on real data are used to illustrate the procedure. Copyright © 2014 John Wiley & Sons, Ltd.

15.
Pepe MS, Self SG, Prentice RL. Statistics in Medicine, 1989, 8(9): 1167-78; discussion 1179
The impact of covariate measurement errors on the estimation of relative risk regression parameters is discussed. First the dependence of the induced relative risk process on the cumulative baseline failure rate function is noted. Next induced relative risk models under some specific failure time and measurement error models are described, including the much simplified models that are appropriate under a 'rare disease' assumption. The presentation then turns to the joint estimation of relative risk parameters of primary interest along with measurement error parameters. A partial likelihood product is proposed for such estimation and asymptotic properties are indicated. Guidance is also presented as to the appropriate size of a 'validation' sample relative to the full cohort size. Finally some more general considerations are presented as to the usefulness and interpretation of deattenuated regression coefficients.

16.
Association studies of risk factors and complex diseases require careful assessment of potential confounding factors. Two-stage regression analysis, sometimes referred to as residual- or adjusted-outcome analysis, has been increasingly used in association studies of single nucleotide polymorphisms (SNPs) and quantitative traits. In this analysis, first, a residual outcome is calculated from a regression of the outcome variable on covariates, and then the relationship between the adjusted outcome and the SNP is evaluated by a simple linear regression of the adjusted outcome on the SNP. In this article, we examine the performance of this two-stage analysis as compared with multiple linear regression (MLR) analysis. Our findings show that when a SNP and a covariate are correlated, the two-stage approach results in a biased genotypic effect and loss of power. Bias is always toward the null and increases with the squared correlation (r²) between the SNP and the covariate. For example, for r² = 0, 0.1, and 0.5, two-stage analysis results in, respectively, 0, 10, and 50% attenuation in the SNP effect. As expected, MLR was always unbiased. Since individual SNPs often show little or no correlation with covariates, a two-stage analysis is expected to perform as well as MLR in many genetic studies; however, it produces considerably different results from MLR and may lead to incorrect conclusions when independent variables are highly correlated. While a useful alternative to MLR when the SNP and covariates are uncorrelated (r² ≈ 0), the two-stage approach has serious limitations, and its use as a simple substitute for MLR should be avoided. Genet. Epidemiol. 35: 592-596, 2011. © 2011 Wiley Periodicals, Inc.
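The attenuation described above is easy to reproduce by simulation. The sketch below uses synthetic data with a continuous stand-in for the SNP (a simplification of the genotype coding) and corr(g, z)² = 0.5, for which the residual-outcome slope should be attenuated by 50% while MLR remains unbiased.

```python
# Simulation sketch (synthetic data, continuous SNP surrogate):
# two-stage residual-outcome analysis vs. multiple linear regression (MLR)
# when the SNP g and covariate z are correlated with r^2 = 0.5.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
r = np.sqrt(0.5)                                     # corr(g, z), so r^2 = 0.5
z = rng.normal(size=n)                               # covariate
g = r * z + np.sqrt(1 - r**2) * rng.normal(size=n)   # SNP surrogate, corr r with z
y = 1.0 * g + 1.0 * z + rng.normal(size=n)           # true SNP effect = 1.0

# Stage 1: adjust y for z; Stage 2: regress the residual on g
resid = y - np.polyval(np.polyfit(z, y, 1), z)
b_two_stage = np.polyfit(g, resid, 1)[0]             # ~ 1.0 * (1 - r^2) = 0.5

# MLR: y ~ g + z fitted jointly
X = np.column_stack([np.ones(n), g, z])
b_mlr = np.linalg.lstsq(X, y, rcond=None)[0][1]      # ~ 1.0 (unbiased)
```

The two-stage slope recovers only (1 − r²) of the true effect, exactly the attenuation pattern reported in the abstract.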

17.
We use a simple lifetime utility maximization model to study the problem of medical resource allocation. This model leads to a welfare specification with a QALY (quality-adjusted life-year) component that captures an individual's preferences over both life expectancy and health status. The goal of medical cost-effectiveness analysis (CEA) is characterized as maximizing the QALY measure for a given total medical expenditure. We show that the CEA with such a goal has a longevity bias: the CEA-based division of a given total medical expenditure between extending life and improving health gives the former a larger share than is called for by welfare maximization.

18.
A key step in genomic studies is to assess high throughput measurements across millions of markers for each participant's DNA, either using microarrays or sequencing techniques. Accurate genotype calling is essential for downstream statistical analysis of genotype-phenotype associations, and next generation sequencing (NGS) has recently become a more common approach in genomic studies. How the accuracy of variant calling in NGS-based studies affects downstream association analysis has not, however, been studied using empirical data in which both microarrays and NGS were available. In this article, we investigate the impact of variant calling errors on the statistical power to identify associations between single nucleotides and disease, and on associations between multiple rare variants and disease. Both differential and nondifferential genotyping errors are considered. Our results show that the power of burden tests for rare variants is strongly influenced by the specificity in variant calling, but is rather robust with regard to sensitivity. By using the variant calling accuracies estimated from a substudy of a Cooperative Studies Program project conducted by the Department of Veterans Affairs, we show that the power of association tests is mostly retained with commonly adopted variant calling pipelines. An R package, GWAS.PC, is provided to accommodate power analysis that takes account of genotyping errors (http://zhaocenter.org/software/).

19.
Objective: To investigate the distribution of hepatitis C virus (HCV) genotypes in Jiangsu Province and their relationship with route of infection, liver function, and other factors. Methods: HCV genotyping was performed according to the Simmonds typing method on 505 hepatitis C patients with different routes of infection, and the relationships of age, sex, liver function, and route of infection with HCV genotype were analyzed. Results: Among the 505 HCV RNA-positive specimens, 8 (1.6%) were genotype 1a, 348 (68.9%) genotype 1b, 24 (4.8%) genotype 2, 40 (7.9%) genotype 3, 67 (13.3%) mixed genotype 1b/2, 13 (2.6%) mixed genotype 1b/3, 4 (0.8%) mixed genotype 2/3, and 1 (0.2%) mixed genotype 1a/1b. The distribution of HCV genotypes did not differ significantly by sex or age (all P > 0.05), and alanine aminotransferase (ALT) and aspartate aminotransferase (AST) levels did not differ significantly across genotypes (all P > 0.05). The genotype distribution did differ significantly by route of infection (χ² = 73.348, P < 0.001). Conclusion: Genotype 1b predominates among hepatitis C patients in Jiangsu Province; the prevalence of genotype 1a in mainland China appears to be expanding; and HCV genotype is associated with route of infection.

20.
Imaging technology and machine learning algorithms for disease classification set the stage for high-throughput phenotyping and promising new avenues for genome-wide association studies (GWAS). Despite emerging algorithms, there has been no successful application in GWAS so far. We frame machine learning-based phenotyping in genetic association analysis as a misclassification problem. To evaluate the opportunities and challenges, we performed a GWAS based on automatically classified age-related macular degeneration (AMD) in UK Biobank (images from 135,500 eyes; 68,400 persons). We quantified misclassification of automatically derived AMD in internal validation data (4,001 eyes; 2,013 persons) and developed a maximum likelihood approach (MLA) to account for it when estimating genetic association. We demonstrate in simulation studies that our MLA guards against bias and artifacts. By combining a GWAS on automatically derived AMD with our MLA in UK Biobank data, we were able to dissect true associations (ARMS2/HTRA1, CFH) from artifacts (near HERC2) and identified eye color as associated with the misclassification. On this example, we provide a proof of concept that a GWAS using machine learning-derived disease classification yields relevant results, and that misclassification needs to be accounted for in the analysis. These findings generalize to other phenotypes and emphasize the utility of genetic data for understanding the misclassification structure of machine learning algorithms.
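The core bias that such a maximum likelihood correction addresses can be worked through with a small closed-form example. This is an illustrative sketch with hypothetical numbers, assuming nondifferential misclassification with known sensitivity and specificity (the paper's MLA is more general): the observed case probability is p_obs = se·p + (1 − sp)·(1 − p), and plugging misclassified phenotypes into a naive analysis attenuates the odds ratio toward the null.

```python
# Sketch (hypothetical numbers, nondifferential misclassification):
# how phenotype misclassification with sensitivity se and specificity sp
# attenuates a true disease-genotype odds ratio in a naive analysis.
def attenuated_or(p0, true_or, se, sp):
    # true case probabilities in non-carriers (p0) and carriers (p1)
    odds0 = p0 / (1 - p0)
    odds1 = odds0 * true_or
    p1 = odds1 / (1 + odds1)
    # observed (misclassified) case probabilities: se*p + (1-sp)*(1-p)
    q0 = se * p0 + (1 - sp) * (1 - p0)
    q1 = se * p1 + (1 - sp) * (1 - p1)
    return (q1 / (1 - q1)) / (q0 / (1 - q0))

obs = attenuated_or(p0=0.10, true_or=1.5, se=0.8, sp=0.95)  # ~1.31, biased toward 1
```

Inverting the same relationship, p = (p_obs − (1 − sp)) / (se + sp − 1), is what allows a likelihood-based analysis with validation-estimated se and sp to recover the true association.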


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号