Similar Articles
1.
The propensity score is a subject's probability of treatment, conditional on observed baseline covariates. Conditional on the true propensity score, treated and untreated subjects have similar distributions of observed baseline covariates. Propensity‐score matching is a popular method of using the propensity score in the medical literature. Using this approach, matched sets of treated and untreated subjects with similar values of the propensity score are formed. Inferences about treatment effect made using propensity‐score matching are valid only if, in the matched sample, treated and untreated subjects have similar distributions of measured baseline covariates. In this paper we discuss the following methods for assessing whether the propensity score model has been correctly specified: comparing means and prevalences of baseline characteristics using standardized differences; ratios comparing the variance of continuous covariates between treated and untreated subjects; comparison of higher order moments and interactions; five‐number summaries; and graphical methods such as quantile–quantile plots, side‐by‐side boxplots, and non‐parametric density plots for comparing the distribution of baseline covariates between treatment groups. We describe methods to determine the sampling distribution of the standardized difference when the true standardized difference is equal to zero, thereby allowing one to determine the range of standardized differences that are plausible with the propensity score model having been correctly specified. We highlight the limitations of some previously used methods for assessing the adequacy of the specification of the propensity‐score model. In particular, methods based on comparing the distribution of the estimated propensity score between treated and untreated subjects are uninformative. Copyright © 2009 John Wiley & Sons, Ltd.
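For concreteness, the standardized difference discussed above is, for a continuous covariate, the difference in group means divided by the pooled standard deviation. A minimal sketch of this diagnostic and the companion variance ratio (variable names and simulated data are illustrative, not from the paper):

```python
import numpy as np

def standardized_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation;
    values near zero suggest good balance on this covariate."""
    m1, m0 = np.mean(x_treated), np.mean(x_control)
    v1, v0 = np.var(x_treated, ddof=1), np.var(x_control, ddof=1)
    return (m1 - m0) / np.sqrt((v1 + v0) / 2.0)

def variance_ratio(x_treated, x_control):
    """Ratio of sample variances between treated and untreated subjects;
    ratios far from 1 indicate imbalance in spread."""
    return np.var(x_treated, ddof=1) / np.var(x_control, ddof=1)

# Toy matched sample in which a residual mean shift of 0.2 SD remains.
rng = np.random.default_rng(0)
x_t = rng.normal(0.2, 1.0, size=500)
x_c = rng.normal(0.0, 1.0, size=500)
print(standardized_difference(x_t, x_c), variance_ratio(x_t, x_c))
```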

2.
Propensity scores have been used widely as a bias reduction method to estimate the treatment effect in nonrandomized studies. Since many covariates are generally included in the model for estimating the propensity scores, the proportion of subjects with at least one missing covariate could be large. While many methods have been proposed for propensity score‐based estimation in the presence of missing covariates, little has been published comparing the performance of these methods. In this article we propose a novel method called multiple imputation missingness pattern (MIMP) and compare it with the naive estimator (ignoring propensity score) and three commonly used methods of handling missing covariates in propensity score‐based estimation (separate estimation of propensity scores within each pattern of missing data, multiple imputation and discarding missing data) under different mechanisms of missing data and degree of correlation among covariates. Simulation shows that all adjusted estimators are much less biased than the naive estimator. Under certain conditions MIMP provides benefits (smaller bias and mean‐squared error) compared with existing alternatives. Copyright © 2009 John Wiley & Sons, Ltd.
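One of the comparator strategies named above, separate estimation of propensity scores within each pattern of missing data, is easy to sketch. The snippet below is an illustration under assumed column names, not the authors' MIMP implementation:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def pattern_propensity_scores(df, covariates, treatment_col):
    """Fit a separate propensity model within each missing-data pattern,
    using only the covariates observed in that pattern. Assumes each
    pattern contains both treated and untreated subjects."""
    ps = pd.Series(index=df.index, dtype=float)
    pattern = df[covariates].isna().astype(int).astype(str).agg("".join, axis=1)
    for pat, idx in df.groupby(pattern).groups.items():
        observed = [c for c, miss in zip(covariates, pat) if miss == "0"]
        sub = df.loc[idx]
        if observed:
            model = LogisticRegression(max_iter=1000)
            model.fit(sub[observed], sub[treatment_col])
            ps.loc[idx] = model.predict_proba(sub[observed])[:, 1]
        else:  # no covariates observed: fall back to the marginal rate
            ps.loc[idx] = sub[treatment_col].mean()
    return ps
```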

3.
Genetic association studies often collect data on multiple traits that are correlated. Discovery of genetic variants influencing multiple traits can lead to better understanding of the etiology of complex human diseases. Conventional univariate association tests may miss variants that have weak or moderate effects on individual traits. We propose several multivariate test statistics to complement univariate tests. Our framework covers both studies of unrelated individuals and family studies and allows any type/mixture of traits. We relate the marginal distributions of multivariate traits to genetic variants and covariates through generalized linear models without modeling the dependence among the traits or family members. We construct score‐type statistics, which are computationally fast and numerically stable even in the presence of covariates and which can be combined efficiently across studies with different designs and arbitrary patterns of missing data. We compare the power of the test statistics both theoretically and empirically. We provide a strategy to determine genome‐wide significance that properly accounts for the linkage disequilibrium (LD) of genetic variants. The application of the new methods to the meta‐analysis of five major cardiovascular cohort studies identifies a new locus (HSCB) that is pleiotropic for the four traits analyzed.

4.
Funnel plots are widely used to visualize grouped data, for example, in institutional comparison. This paper extends the concept to a multi‐level setting, displaying one level at a time, adjusted for the other levels, as well as for covariates at all levels. These level‐adjusted funnel plots are based on a Markov chain Monte Carlo fit of a random effects model, translating the estimated model parameters to predicted marginal expectations. Working within the estimation framework, we accommodate outlying institutions using heavy‐tailed random effects distributions. We also develop computationally efficient methods to compute predicted probabilities in the case of dichotomous outcome data and various random effect distributions. We apply the method to a data set on prophylactic antibiotics in gallstone surgery. Copyright © 2014 John Wiley & Sons, Ltd.
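As background, the single-level funnel plot that this paper generalizes plots each institution's observed proportion against its caseload, with control limits that narrow as precision grows. A hedged sketch with simulated data (the level adjustment itself requires the MCMC model fit described above):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = rng.integers(30, 400, size=40)     # cases per institution (simulated)
p0 = 0.15                              # overall target proportion
y = rng.binomial(n, p0)                # events per institution

ns = np.linspace(n.min(), n.max(), 200)
for z in (1.96, 3.09):                 # approx. 95% and 99.8% limits
    half = z * np.sqrt(p0 * (1 - p0) / ns)
    plt.plot(ns, p0 + half, "k--", lw=0.8)
    plt.plot(ns, np.clip(p0 - half, 0, 1), "k--", lw=0.8)
plt.scatter(n, y / n, s=12)
plt.axhline(p0, color="k", lw=0.8)
plt.xlabel("Number of cases (precision)")
plt.ylabel("Observed proportion")
plt.show()
```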

5.
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case‐control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS), are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non‐linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yields unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.

6.
Meta‐analysis is now an essential tool for genetic association studies, allowing researchers to combine large studies and greatly accelerating the pace of genetic discovery. Although the standard meta‐analysis methods perform equivalently to the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with various case–control ratios. Here, we investigate the power loss of the standard meta‐analysis methods for unbalanced studies, and further propose novel meta‐analysis methods that perform equivalently to the joint analysis under both balanced and unbalanced settings. We derive improved meta‐score‐statistics that can accurately approximate the joint‐score‐statistics with combined individual‐level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In the simulated gene‐level association studies under unbalanced settings, our method recovered up to 85% of the power loss caused by the standard methods. We further showed the power gain of our methods in gene‐level tests with 26 unbalanced studies of age‐related macular degeneration. In addition, we took the meta‐analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta‐analyzing multi‐ethnic samples. In summary, our improved meta‐score‐statistics with corrections for population stratification can be used to construct both single‐variant and gene‐level association studies, providing a useful framework for ensuring well‐powered, convenient, cross‐study analyses.
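For orientation, the conventional fixed-effect meta-analysis combines per-study estimates by inverse-variance weighting; the sketch below shows only this standard baseline, not the proposed meta-score-statistics:

```python
import numpy as np

def inverse_variance_meta(betas, ses):
    """Conventional fixed-effect meta-analysis of per-study effect
    estimates; under unbalanced case-control ratios this standard
    approach can lose power relative to a joint analysis."""
    betas, ses = np.asarray(betas), np.asarray(ses)
    w = 1.0 / ses**2
    beta = np.sum(w * betas) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return beta, se, beta / se  # combined estimate, SE, z-statistic
```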

7.
Recently, there has been increased interest in evaluating extended haplotypes in p53 as risk factors for cancer. An allele-specific polymerase chain reaction (PCR) method, confirmed by restriction analysis, has been used to determine absolute extended haplotypes in diploid genomes. We describe statistical analyses for comparing cases and controls, or comparing different ethnic groups with respect to haplotypes composed of several biallelic loci, especially in the presence of other covariates. Tests based on cross-tabulating all possible genotypes by disease state can have limited power due to the large number of possible genotypes. Tests based simply on cross-tabulating all possible haplotypes by disease state cannot be extended to account for other variables measured on the individual. We propose imposing an assumption of additivity upon the haplotype-based analysis. This yields a logistic regression in which the outcome is case or control, and the predictor variables include the number of copies (0, 1, or 2) of each haplotype, as well as other explanatory variables. In a case-control study, the model can be constructed so that each coefficient gives the log odds ratio for disease for an individual with a single copy of the suspect haplotype and another copy of the most common haplotype, relative to an individual with two copies of the most common haplotype. We illustrate the method with published data on p53 and breast cancer. The method can also be applied to any polymorphic system, whether multiple alleles at a single locus or multiple haplotypes over several loci. Genet. Epidemiol. 15:173–181, 1998. © 1998 Wiley-Liss, Inc.
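The additive coding described above maps directly onto a standard logistic regression. A hedged sketch with simulated data (haplotype counts and effect sizes are made up; the most common haplotype is the omitted reference category):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
X = pd.DataFrame({
    "hap2": rng.binomial(2, 0.20, n),  # copies (0, 1, or 2) of haplotype 2
    "hap3": rng.binomial(2, 0.10, n),  # copies of haplotype 3
    "age": rng.normal(50, 10, n),      # an additional explanatory variable
})
logit_p = -1.0 + 0.5 * X["hap2"] + 0.01 * X["age"]
y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(np.exp(fit.params))  # exp(coefficient) = odds ratio per haplotype
                           # copy, relative to the most common haplotype
```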

8.
A novel semiparametric regression model is developed for evaluating the covariate‐specific accuracy of a continuous medical test or biomarker. Ideally, studies designed to estimate or compare medical test accuracy will use a separate, flawless gold‐standard procedure to determine the true disease status of sampled individuals. We treat this as a special case of the more complicated and increasingly common scenario in which disease status is unknown because a gold‐standard procedure does not exist or is too costly or invasive for widespread use. To compensate for missing data on disease status, covariate information is used to discriminate between diseased and healthy units. We thus model the probability of disease as a function of 'disease covariates'. In addition, we model test/biomarker outcome data to depend on 'test covariates', which provides researchers the opportunity to quantify the impact of covariates on the accuracy of a medical test. We further model the distributions of test outcomes using flexible semiparametric classes. An important new theoretical result demonstrating model identifiability under mild conditions is presented. The modeling framework can be used to obtain inferences about covariate‐specific test accuracy and the probability of disease based on subject‐specific disease and test covariate information. The value of the model is illustrated using multiple simulation studies and data on the age‐adjusted ability of soluble epidermal growth factor receptor – a ubiquitous serum protein – to serve as a biomarker of lung cancer in men. SAS code for fitting the model is provided. Copyright © 2015 John Wiley & Sons, Ltd.

9.
The change in area under the curve (ΔAUC), the integrated discrimination improvement (IDI), and net reclassification index (NRI) are commonly used measures of risk prediction model performance. Some authors have reported good validity of associated methods of estimating their standard errors (SE) and constructing confidence intervals, whereas others have questioned their performance. To address these issues, we unite the ΔAUC, IDI, and three versions of the NRI under the umbrella of the U‐statistics family. We rigorously show that the asymptotic behavior of ΔAUC, NRIs, and IDI fits the asymptotic distribution theory developed for U‐statistics. We prove that the ΔAUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ΔAUC, NRIs, or IDI. In the former case, SE formulas proposed in the literature are equivalent to SE formulas obtained from U‐statistics theory if we ignore adjustment for estimated parameters. We use the Sukhatme–Randles–deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ΔAUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three‐category NRI should always be adjusted for estimated parameters. These results allow us to define when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead when comparing nested models. We also use the U‐statistic theory to develop a new SE estimate of ΔAUC. Copyright © 2017 John Wiley & Sons, Ltd.
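The U-statistic framing is concrete: with n_1 diseased and n_0 non-diseased subjects and model-based risk scores X_i and Y_j, the empirical AUC is the two-sample U-statistic

```latex
\widehat{\mathrm{AUC}}
  = \frac{1}{n_1 n_0}\sum_{i=1}^{n_1}\sum_{j=1}^{n_0}
    \left[\mathbf{1}(X_i > Y_j) + \tfrac{1}{2}\,\mathbf{1}(X_i = Y_j)\right],
\qquad
\Delta\mathrm{AUC} = \widehat{\mathrm{AUC}}_{\mathrm{new}} - \widehat{\mathrm{AUC}}_{\mathrm{old}},
```

which is why the asymptotic distribution theory for U-statistics applies, except in the degenerate nested-null case noted above.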

10.
Multiple linear regression is commonly used to test for association between genetic variants and continuous traits and estimate genetic effect sizes. Confounding variables are controlled for by including them as additional covariates. An alternative technique that is increasingly used is to regress out covariates from the raw trait and then perform regression analysis with only the genetic variants included as predictors. In the case of single-variant analysis, this adjusted trait regression (ATR) technique is known to be less powerful than the traditional technique when the genetic variant is correlated with the covariates. We extend previous results for single-variant tests by deriving exact relationships between the single-variant score, Wald, likelihood-ratio, and F test statistics and their ATR analogs. We also derive the asymptotic power of ATR analogs of the multiple-variant score and burden tests. We show that the maximum power loss of the ATR analog of the multiple-variant score test is completely characterized by the canonical correlations between the set of genetic variants and the set of covariates. Further, we show that for both single- and multiple-variant tests, the power loss for ATR analogs increases with increasing stringency of Type 1 error control (i.e., smaller significance levels) and increasing correlation (or canonical correlations) between the genetic variant (or multiple variants) and covariates. We recommend using ATR only when the maximum canonical correlation between variants and covariates is low, as is typically true.
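To see the attenuation concretely, the following simulation (illustrative effect sizes, a continuous genotype dosage) contrasts the traditional joint regression with the ATR two-step procedure:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
c = rng.normal(size=n)                       # covariate
g = rng.binomial(2, 0.3, n) + 0.5 * c        # genotype dosage correlated with c
y = 0.2 * g + 0.4 * c + rng.normal(size=n)   # continuous trait

# Traditional analysis: genotype and covariate in the same model.
joint = sm.OLS(y, sm.add_constant(np.column_stack([g, c]))).fit()

# Adjusted trait regression (ATR): residualize the trait on the covariate,
# then regress the residuals on the genotype alone.
resid = sm.OLS(y, sm.add_constant(c)).fit().resid
atr = sm.OLS(resid, sm.add_constant(g)).fit()

print(joint.params[1], atr.params[1])  # the ATR estimate is attenuated
```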

11.
The risk‐adjusted Bernoulli cumulative sum (CUSUM) chart developed by Steiner et al. (2000) is an increasingly popular tool for monitoring clinical and surgical performance. In practice, however, the use of a fixed control limit for the chart leads to a quite variable in‐control average run length performance for patient populations with different risk score distributions. To overcome this problem, we determine simulation‐based dynamic probability control limits (DPCLs) patient‐by‐patient for the risk‐adjusted Bernoulli CUSUM charts. By maintaining the probability of a false alarm at a constant level conditional on no false alarm for previous observations, our risk‐adjusted CUSUM charts with DPCLs have consistent in‐control performance at the desired level with approximately geometrically distributed run lengths. Our simulation results demonstrate that our method does not rely on any information or assumptions about the patients' risk distributions. The use of DPCLs for risk‐adjusted Bernoulli CUSUM charts allows each chart to be designed for the corresponding particular sequence of patients for a surgeon or hospital. Copyright © 2015 John Wiley & Sons, Ltd.
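The chart being controlled is the risk-adjusted Bernoulli CUSUM of Steiner et al. (2000), which accumulates patient-specific log-likelihood-ratio weights. A minimal sketch of the chart statistic itself (the paper's simulation-based DPCLs are not reproduced here):

```python
import numpy as np

def risk_adjusted_cusum(outcomes, risks, odds_ratio=2.0):
    """Risk-adjusted Bernoulli CUSUM in the style of Steiner et al. (2000).
    outcomes: 0/1 adverse-event indicators in patient order;
    risks: pre-operative adverse-event probabilities (risk scores);
    odds_ratio: the deterioration in odds the chart is tuned to detect."""
    s, path = 0.0, []
    for y, p in zip(outcomes, risks):
        if y == 1:
            w = np.log(odds_ratio / (1 - p + odds_ratio * p))
        else:
            w = np.log(1.0 / (1 - p + odds_ratio * p))
        s = max(0.0, s + w)   # signal when s crosses the control limit
        path.append(s)
    return np.array(path)
```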

12.
Association tests based on multi-marker haplotypes may be more powerful than those based on single markers. The existing association tests based on multi-marker haplotypes include Pearson's χ2 test, which tests for the difference of haplotype distributions in cases and controls, and haplotype-similarity based methods, which compare the average similarity among cases with that of the controls. In this article, we propose new association tests based on haplotype similarities. These new tests compare the average similarities within cases and controls with the average similarity between cases and controls. These methods can be applied to either phase-known or phase-unknown data. We compare the performance of the proposed methods with Pearson's χ2 test and the existing similarity-based tests by simulation studies under a variety of scenarios and by analyzing a real data set. The simulation results show that, in most cases, the new proposed methods are more powerful than both Pearson's χ2 test and the existing similarity-based tests. In one extreme case where the disease mutant was induced at a very rare haplotype (…)
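The contrast the new tests compute can be written compactly: average within-case and within-control similarity versus average case-control similarity. A schematic sketch given a precomputed pairwise haplotype-similarity matrix (in practice significance would be assessed by permutation; this is not the authors' code):

```python
import numpy as np

def similarity_contrast(S, is_case):
    """Average within-group haplotype similarity minus the average
    similarity between cases and controls, given a symmetric
    pairwise similarity matrix S."""
    is_case = np.asarray(is_case, dtype=bool)
    S_cc = S[np.ix_(is_case, is_case)]
    S_uu = S[np.ix_(~is_case, ~is_case)]
    S_cu = S[np.ix_(is_case, ~is_case)]
    n1, n0 = is_case.sum(), (~is_case).sum()
    within_cases = (S_cc.sum() - np.trace(S_cc)) / (n1 * (n1 - 1))
    within_ctrls = (S_uu.sum() - np.trace(S_uu)) / (n0 * (n0 - 1))
    return (within_cases + within_ctrls) / 2.0 - S_cu.mean()
```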

13.
It is naive and incorrect to use the proportions of successful operations to compare the performance of surgeons because the patients' risk profiles are different. In this paper, we explore the use of risk‐adjusted procedures to compare the performance of surgeons. One such risk‐adjusted statistic is the standardized mortality ratio (SMR), which measures the performance of a surgeon adjusted for the risks of patients assuming the average performance of a group of surgeons. Unlike the traditional SMR, which is defined based on a population, this SMR is a random variable. Thus, all existing results for the traditional SMR are not valid unless the sample is large enough to be considered a population. In this paper, we develop two risk‐adjusted procedures for comparing the performance of surgeons. The asymptotic distributions of the test statistics are derived. We also use the bootstrap procedure to estimate finite‐sample distributions. Both the probability of type I error and the power of these procedures are investigated. Copyright © 2017 John Wiley & Sons, Ltd.
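The SMR in question has a simple form: observed deaths over the deaths expected if the surgeon performed at the group average,

```latex
\mathrm{SMR} = \frac{O}{E}
  = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} \hat{p}_i},
```

where y_i indicates death of patient i and \hat{p}_i is that patient's predicted mortality risk under the average performance of the surgeon group. Because the \hat{p}_i are themselves estimated from data, this SMR is a random variable, which is exactly why the classical population-based results do not carry over.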

14.
Genome‐wide genetic association studies typically start with univariate statistical tests of each marker. In principle, this single‐SNP scanning is statistically straightforward—the testing is done with standard methods (e.g. χ2 tests, regression) that have been well studied for decades. However, a number of different tests and testing procedures can be used. In a case‐control study, one can use a 1 df allele‐based test, a 1 or 2 df genotype‐based test, or a compound procedure that combines two or more of these statistics. Additionally, most of the tests can be performed with or without covariates included in the model. While there are a number of statistical papers that make power comparisons among subsets of these methods, none has comprehensively tackled the question of which of the methods in common use is best suited to univariate scanning in a genome‐wide association study. In this paper, we consider a wide variety of realistic test procedures, and first compare the power of the different procedures to detect a single locus under different genetic models. We then address the question of whether or when it is a good idea to include covariates in the analysis. We conclude that the most commonly used approach to handle covariates—modeling covariate main effects but not interactions—is almost never a good idea. Finally, we consider the performance of the statistics in a genome scan context. Genet. Epidemiol. 34: 246–253, 2010. © 2009 Wiley‐Liss, Inc.
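The 1 df allele-based and 2 df genotype-based tests amount to chi-square tests on different contingency tables. A toy illustration with made-up counts (the allele-based test implicitly assumes Hardy-Weinberg equilibrium):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Genotype counts (AA, Aa, aa) for cases and controls (hypothetical).
genotypes = np.array([[120, 230, 150],
                      [160, 240, 100]])
chi2_g, p_g, df_g, _ = chi2_contingency(genotypes)   # 2 df genotype test

# Allele counts: each AA contributes two A alleles, each Aa one of each.
alleles = np.array([[2 * 120 + 230, 2 * 150 + 230],
                    [2 * 160 + 240, 2 * 100 + 240]])
chi2_a, p_a, df_a, _ = chi2_contingency(alleles)     # 1 df allele test
print((df_g, p_g), (df_a, p_a))
```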

15.
Missing outcome data is a crucial threat to the validity of treatment effect estimates from randomized trials. The outcome distributions of participants with missing and observed data are often different, which increases bias. Causal inference methods may aid in reducing the bias and improving efficiency by incorporating baseline variables into the analysis. In particular, doubly robust estimators incorporate two nuisance parameters: the outcome regression and the missingness mechanism (i.e., the probability of missingness conditional on treatment assignment and baseline variables), to adjust for differences in the observed and unobserved groups that can be explained by observed covariates. To consistently estimate the treatment effect, one of these nuisance parameters must be consistently estimated. Traditionally, nuisance parameters are estimated using parametric models, which often precludes consistency, particularly in moderate to high dimensions. Recent research on missing data has focused on data‐adaptive estimation to help achieve consistency, but the large sample properties of such methods are poorly understood. In this article, we discuss a doubly robust estimator that is consistent and asymptotically normal under data‐adaptive estimation of the nuisance parameters. We provide a formula for an asymptotically exact confidence interval under minimal assumptions. We show that our proposed estimator has smaller finite‐sample bias compared to standard doubly robust estimators. We present a simulation study demonstrating the enhanced performance of our estimators in terms of bias, efficiency, and coverage of the confidence intervals. We present the results of an illustrative example: a randomized, double‐blind phase 2/3 trial of antiretroviral therapy in HIV‐infected persons.
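The doubly robust construction described here is the usual augmented inverse-probability-weighted estimator. For one treatment arm, with R_i indicating that the outcome was observed:

```latex
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}\left[
  \hat{m}(X_i) + \frac{R_i}{\hat{\pi}(X_i)}\bigl(Y_i - \hat{m}(X_i)\bigr)
\right],
```

where \hat{m}(x) is the outcome regression and \hat{\pi}(x) the estimated probability of being observed given treatment assignment and baseline variables; \hat{\mu} is consistent if either nuisance estimator is consistent, which is the double robustness the article builds on.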

16.
Assessing goodness-of-fit in logistic regression models can be problematic, in that commonly used deviance or Pearson chi-square statistics do not have approximate chi-square distributions, under the null hypothesis of no lack of fit, when continuous covariates are modelled. We present two easy to implement test statistics similar to the deviance and Pearson chi-square tests that are appropriate when continuous covariates are present. The methodology uses an approach similar to that incorporated by the Hosmer and Lemeshow goodness-of-fit test in that observations are classified into distinct groups according to fitted probabilities, allowing sufficient cell sizes for chi-square testing. The major difference is that the proposed tests perform this grouping within the cross-classification of all categorical covariates in the model and, in some situations, allow for a more powerful assessment of where model predicted and observed counts may differ. A variety of simulations are performed comparing the proposed tests to the Hosmer-Lemeshow test.
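For reference, the classical Hosmer-Lemeshow statistic that these tests modify groups subjects by fitted probability and compares observed with expected counts; the paper's variants perform this grouping within the cross-classification of the categorical covariates. A compact sketch of the classical version:

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, groups=10):
    """Classical Hosmer-Lemeshow test: sort by fitted probability,
    split into roughly equal groups, and compare observed and
    expected event counts on a chi-square with groups - 2 df."""
    order = np.argsort(p_hat)
    y, p_hat = np.asarray(y)[order], np.asarray(p_hat)[order]
    stat = 0.0
    for y_g, p_g in zip(np.array_split(y, groups),
                        np.array_split(p_hat, groups)):
        o, e, n = y_g.sum(), p_g.sum(), len(y_g)
        pbar = e / n
        stat += (o - e) ** 2 / (n * pbar * (1 - pbar))
    return stat, chi2.sf(stat, groups - 2)
```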

17.
Missing covariates in regression analysis are a pervasive problem in medical, social, and economic research. We study empirical-likelihood confidence regions for unconstrained and constrained regression parameters in a nonignorable covariate-missing data problem. For an assumed conditional mean regression model, we assume that some covariates are fully observed but other covariates are missing for some subjects. By exploitation of a probability model of missingness and a working conditional score model from a semiparametric perspective, we build a system of unbiased estimating equations, where the number of equations exceeds the number of unknown parameters. Based on the proposed estimating equations, we introduce unconstrained and constrained empirical-likelihood ratio statistics to construct empirical-likelihood confidence regions for the underlying regression parameters without and with constraints. We establish the asymptotic distributions of the proposed empirical-likelihood ratio statistics. Simulation results show that the proposed empirical-likelihood methods have a better finite-sample performance than other competitors in terms of coverage probability and interval length. Finally, we apply the proposed empirical-likelihood methods to the analysis of a data set from the US National Health and Nutrition Examination Survey.
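The empirical-likelihood ratio used here has the standard estimating-equation form. With unbiased estimating functions g(Z_i; θ) (more equations than parameters, as in the over-identified system above),

```latex
R(\theta) = \max\left\{ \prod_{i=1}^{n} n p_i \;:\;
  p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1,\
  \sum_{i=1}^{n} p_i\, g(Z_i;\theta) = 0 \right\},
```

and a confidence region collects the θ for which -2 log R(θ), suitably calibrated, stays below the appropriate chi-square quantile.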

18.
Two important contributors to missing heritability are believed to be rare variants and gene‐environment interaction (GXE). Thus, detecting GXE where G is a rare haplotype variant (rHTV) is a pressing problem. Haplotype analysis is usually the natural second step to follow up on a genomic region that is implicated to be associated through single nucleotide variants (SNV) analysis. Further, rHTV can tag associated rare SNV and provide greater power to detect them than popular collapsing methods. Recently we proposed Logistic Bayesian LASSO (LBL) for detecting rHTV association with case–control data. LBL shrinks the unassociated (especially common) haplotypes toward zero so that an associated rHTV can be identified with greater power. Here, we incorporate environmental factors and their interactions with haplotypes in LBL. As LBL is based on retrospective likelihood, this extension is not trivial. We model the joint distribution of haplotypes and covariates given the case‐control status. We apply the approach (LBL‐GXE) to the Michigan, Mayo, AREDS, Pennsylvania Cohort Study on Age‐related Macular Degeneration (AMD). LBL‐GXE detects interaction of a specific rHTV in CFH gene with smoking. To the best of our knowledge, this is the first time in the AMD literature that an interaction of smoking with a specific (rather than pooled) rHTV has been implicated. We also carry out simulations and find that LBL‐GXE has reasonably good powers for detecting interactions with rHTV while keeping the type I error rates well controlled. Thus, we conclude that LBL‐GXE is a useful tool for uncovering missing heritability.

19.
Propensity score matching is often used in observational studies to create treatment and control groups with similar distributions of observed covariates. Typically, propensity scores are estimated using logistic regressions that assume linearity between the logistic link and the predictors. We evaluate the use of generalized additive models (GAMs) for estimating propensity scores. We compare logistic regressions and GAMs in terms of balancing covariates using simulation studies with artificial and genuine data. We find that, when the distributions of covariates in the treatment and control groups overlap sufficiently, using GAMs can improve overall covariate balance, especially for higher-order moments of distributions. When the distributions in the two groups overlap insufficiently, GAM more clearly reveals this fact than logistic regression does. We also demonstrate via simulation that matching with GAMs can result in larger reductions in bias when estimating treatment effects than matching with logistic regression.
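The linearity assumption being relaxed can be illustrated with penalized splines as a stand-in for the GAM smoothers (scikit-learn's SplineTransformer is an illustrative substitute here, not the estimator used in the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 2))                  # two baseline covariates
logit = 0.5 * X[:, 0] ** 2 - X[:, 1]            # nonlinear true model
z = rng.binomial(1, 1 / (1 + np.exp(-logit)))   # treatment indicator

# Linear-logit propensity scores versus smooth (spline-based) ones.
linear_ps = LogisticRegression().fit(X, z).predict_proba(X)[:, 1]
smooth = make_pipeline(SplineTransformer(degree=3, n_knots=8),
                       LogisticRegression(max_iter=1000))
smooth_ps = smooth.fit(X, z).predict_proba(X)[:, 1]
# Either score can now be used for matching; covariate balance,
# including higher-order moments, is then checked in the matched sample.
```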

20.
By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that the Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT‐O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), the Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT‐O. In practice, it is not known whether rare variants or common variants in a gene region are disease related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT‐O on real neural tube defects and Hirschsprung's disease datasets. The Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT‐O in the real data analysis. Our methods can be used in either gene‐disease genome‐wide/exome‐wide association studies or candidate gene analyses.
