Similar articles
 20 similar articles found.
1.
Propensity-score methods are increasingly being used to reduce the impact of treatment-selection bias in the estimation of treatment effects using observational data. Commonly used propensity-score methods include covariate adjustment using the propensity score, stratification on the propensity score, and propensity-score matching. Empirical and theoretical research has demonstrated that matching on the propensity score eliminates a greater proportion of baseline differences between treated and untreated subjects than does stratification on the propensity score. However, the analysis of propensity-score-matched samples requires statistical methods appropriate for matched-pairs data. We critically evaluated 47 articles that were published between 1996 and 2003 in the medical literature and that employed propensity-score matching. We found that only two of the articles reported the balance of baseline characteristics between treated and untreated subjects in the matched sample and used correct statistical methods to assess the degree of imbalance. Thirteen (28 per cent) of the articles explicitly used statistical methods appropriate for the analysis of matched data when estimating the treatment effect and its statistical significance. Common errors included using the log-rank test to compare Kaplan-Meier survival curves in the matched sample, using Cox regression, logistic regression, chi-squared tests, t-tests, and Wilcoxon rank sum tests in the matched sample, thereby failing to account for the matched nature of the data. We provide guidelines for the analysis and reporting of studies that employ propensity-score matching.
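
As an illustration of a method that does respect the matched nature of the data, McNemar's test compares a binary outcome across 1:1 matched pairs using only the discordant pairs. The sketch below is a minimal example under stated assumptions and is not taken from the article: the function name `mcnemar_test` and the outcome data are hypothetical, and the uncorrected chi-squared form of the statistic is assumed.

```python
import math


def mcnemar_test(treated, control):
    """McNemar's test for 1:1 matched pairs with binary (0/1) outcomes.

    treated[i] and control[i] are the outcomes of the two members of pair i.
    Returns (chi-squared statistic, two-sided p-value on 1 degree of freedom).
    """
    if len(treated) != len(control):
        raise ValueError("each treated subject needs exactly one matched control")
    # Only discordant pairs carry information about the treatment effect.
    b = sum(1 for t, c in zip(treated, control) if t == 1 and c == 0)
    c_count = sum(1 for t, c in zip(treated, control) if t == 0 and c == 1)
    if b + c_count == 0:
        return 0.0, 1.0
    stat = (b - c_count) ** 2 / (b + c_count)
    # Upper tail of a chi-squared(1) variable: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical matched sample: 15 pairs (1, 0), 5 pairs (0, 1),
# 10 concordant pairs (1, 1) and 20 concordant pairs (0, 0).
treated_outcomes = [1] * 15 + [0] * 5 + [1] * 10 + [0] * 20
control_outcomes = [0] * 15 + [1] * 5 + [1] * 10 + [0] * 20
statistic, p_value = mcnemar_test(treated_outcomes, control_outcomes)
print(statistic, p_value)
```

The concordant pairs do not enter the statistic at all, which is the sense in which a test for independent samples (such as an ordinary chi-squared test) analyses a different quantity.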

2.
Propensity‐score matching is frequently used to estimate the effect of treatments, exposures, and interventions when using observational data. An important issue when using propensity‐score matching is how to estimate the standard error of the estimated treatment effect. Accurate variance estimation permits construction of confidence intervals that have the advertised coverage rates and tests of statistical significance that have the correct type I error rates. There is disagreement in the literature as to how standard errors should be estimated. The bootstrap is a commonly used resampling method that permits estimation of the sampling variability of estimated parameters. Bootstrap methods are rarely used in conjunction with propensity‐score matching. We proposed two different bootstrap methods for use with propensity‐score matching without replacement and examined their performance with a series of Monte Carlo simulations. The first method involved drawing bootstrap samples from the matched pairs in the propensity‐score‐matched sample. The second method involved drawing bootstrap samples from the original sample and estimating the propensity score separately in each bootstrap sample and creating a matched sample within each of these bootstrap samples. The former approach was found to result in estimates of the standard error that were closer to the empirical standard deviation of the sampling distribution of estimated effects. © 2014 The Authors. Statistics in Medicine Published by John Wiley & Sons, Ltd.
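
The first bootstrap method described above, resampling matched pairs as units, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `paired_bootstrap_se`, the number of replicates, the seed, and the outcome data are all hypothetical, and the effect measure is assumed to be the mean within-pair difference.

```python
import random
import statistics


def paired_bootstrap_se(treated, control, n_boot=2000, seed=0):
    """Bootstrap standard error of the mean within-pair difference.

    Whole matched pairs are resampled with replacement, so the
    pairing is preserved in every bootstrap sample.
    """
    if len(treated) != len(control):
        raise ValueError("treated and control must hold one outcome per matched pair")
    rng = random.Random(seed)
    diffs = [t - c for t, c in zip(treated, control)]
    n = len(diffs)
    estimates = []
    for _ in range(n_boot):
        # Draw n pairs with replacement and re-estimate the effect.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n)
    return statistics.stdev(estimates)


# Hypothetical binary outcomes for 10 matched pairs.
treated_y = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
control_y = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
effect = sum(t - c for t, c in zip(treated_y, control_y)) / len(treated_y)
se = paired_bootstrap_se(treated_y, control_y)
print(effect, se)
```

The second method in the abstract would instead resample the original, unmatched subjects and repeat propensity-score estimation and matching inside every replicate; that is not shown here.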

3.
Propensity-score matching is a popular analytic method to remove the effects of confounding due to measured baseline covariates when using observational data to estimate the effects of treatment. Time-to-event outcomes are common in medical research. Competing risks are outcomes whose occurrence precludes the occurrence of the primary time-to-event outcome of interest. All non-fatal outcomes and all cause-specific mortality outcomes are potentially subject to competing risks. There is a paucity of guidance on the conduct of propensity-score matching in the presence of competing risks. We describe how both relative and absolute measures of treatment effect can be obtained when using propensity-score matching with competing risks data. Estimates of the relative effect of treatment can be obtained by using cause-specific hazard models in the matched sample. Estimates of absolute treatment effects can be obtained by comparing cumulative incidence functions (CIFs) between matched treated and matched control subjects. We conducted a series of Monte Carlo simulations to compare the empirical type I error rate of different statistical methods for testing the equality of CIFs estimated in the matched sample. We also examined the performance of different methods to estimate the marginal subdistribution hazard ratio. We recommend that a marginal subdistribution hazard model that accounts for the within-pair clustering of outcomes be used to test the equality of CIFs and to estimate subdistribution hazard ratios. We illustrate the described methods by using data on patients discharged from hospital with acute myocardial infarction to estimate the effect of discharge prescribing of statins on cardiovascular death.

4.
Propensity-score matching has been used widely in observational studies to balance confounders across treatment groups. However, whether matched-pairs analyses should be used as a primary approach is still in debate. We compared the statistical power and type 1 error rate for four commonly used methods of analyzing propensity-score–matched samples with continuous outcomes: (1) an unadjusted mixed-effects model, (2) an unadjusted generalized estimating method, (3) simple linear regression, and (4) multiple linear regression. Multiple linear regression had the highest statistical power among the four competing methods. We also found that the degree of intraclass correlation within matched pairs depends on the dissimilarity between the coefficient vectors of confounders in the outcome and treatment models. Multiple linear regression is superior to the unadjusted matched-pairs analyses for propensity-score–matched data.

5.
Many observational studies estimate causal effects using methods based on matching on the propensity score. Full matching on the propensity score is an effective and flexible method for utilizing all available data and for creating well‐balanced treatment and control groups. An important component of the full matching algorithm is the decision about whether to impose a restriction on the maximum ratio of controls matched to each treated subject. Despite the possible effect of this restriction on subsequent inferences, this issue has not been examined. We used a series of Monte Carlo simulations to evaluate the effect of imposing a restriction on the maximum ratio of controls matched to each treated subject when estimating risk differences. We considered full matching both with and without a caliper restriction. When using full matching with a caliper restriction, the imposition of a subsequent constraint on the maximum ratio of the number of controls matched to each treated subject had no effect on the quality of inferences. However, when using full matching without a caliper restriction, the imposition of a constraint on the maximum ratio of the number of controls matched to each treated subject tended to result in an increase in bias in the estimated risk difference. However, this increase in bias tended to be accompanied by a corresponding decrease in the sampling variability of the estimated risk difference. We illustrate the consequences of these restrictions using observational data to estimate the effect of medication prescribing on survival following hospitalization for a heart attack.

6.
There is an increasing interest in the use of propensity score methods to estimate causal effects in observational studies. However, recent systematic reviews have demonstrated that propensity score methods are inconsistently used and frequently poorly applied in the medical literature. In this study, we compared the following propensity score methods for estimating the reduction in all-cause mortality due to statin therapy for patients hospitalized with acute myocardial infarction: propensity-score matching, stratification using the propensity score, covariate adjustment using the propensity score, and weighting using the propensity score. We used propensity score methods to estimate both adjusted treatment effects and the absolute and relative risk reduction in all-cause mortality. We also examined the use of statistical hypothesis testing, standardized differences, box plots, non-parametric density estimates, and quantile-quantile plots to assess residual confounding that remained after stratification or matching on the propensity score. Estimates of the absolute reduction in 3-year mortality ranged from 2.1 to 4.5 per cent, while estimates of the relative risk reduction ranged from 13.3 to 17.0 per cent. Adjusted estimates of the reduction in the odds of 3-year death varied from 15 to 24 per cent across the different propensity score methods.

7.
Propensity score methods are increasingly being used to estimate the effects of treatments on health outcomes using observational data. There are four methods for using the propensity score to estimate treatment effects: covariate adjustment using the propensity score, stratification on the propensity score, propensity‐score matching, and inverse probability of treatment weighting (IPTW) using the propensity score. When outcomes are binary, the effect of treatment on the outcome can be described using odds ratios, relative risks, risk differences, or the number needed to treat. Several clinical commentators suggested that risk differences and numbers needed to treat are more meaningful for clinical decision making than are odds ratios or relative risks. However, there is a paucity of information about the relative performance of the different propensity‐score methods for estimating risk differences. We conducted a series of Monte Carlo simulations to examine this issue. We examined bias, variance estimation, coverage of confidence intervals, mean‐squared error (MSE), and type I error rates. A doubly robust version of IPTW had superior performance compared with the other propensity‐score methods. It resulted in unbiased estimation of risk differences, treatment effects with the lowest standard errors, confidence intervals with the correct coverage rates, and correct type I error rates. Stratification, matching on the propensity score, and covariate adjustment using the propensity score resulted in minor to modest bias in estimating risk differences. Estimators based on IPTW had lower MSE compared with other propensity‐score methods. Differences between IPTW and propensity‐score matching may reflect that these two methods estimate the average treatment effect and the average treatment effect for the treated, respectively. Copyright © 2010 John Wiley & Sons, Ltd.
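
To make the IPTW idea concrete, a risk difference can be estimated by weighting each treated subject by 1/e and each untreated subject by 1/(1 - e), where e is the subject's propensity score. The sketch below is a minimal illustration with several assumptions of its own: the propensity scores are taken as already estimated, the weights are normalized within each arm, and the function name `iptw_risk_difference` and the data are hypothetical. It is the plain weighted estimator, not the doubly robust version that the abstract found to perform best.

```python
def iptw_risk_difference(treatment, outcome, propensity):
    """Risk difference (treated minus untreated) using normalized IPTW weights.

    treatment[i] is 1 (treated) or 0 (untreated), outcome[i] is 0/1,
    and propensity[i] is the estimated probability of treatment.
    """
    weight_treated = weight_control = 0.0
    events_treated = events_control = 0.0
    for z, y, e in zip(treatment, outcome, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if z == 1:
            w = 1.0 / e  # weight for a treated subject
            weight_treated += w
            events_treated += w * y
        else:
            w = 1.0 / (1.0 - e)  # weight for an untreated subject
            weight_control += w
            events_control += w * y
    if weight_treated == 0.0 or weight_control == 0.0:
        raise ValueError("both treatment arms must be represented")
    return events_treated / weight_treated - events_control / weight_control


# Hypothetical data: four subjects with known propensity scores.
z = [1, 1, 0, 0]
y = [1, 0, 0, 0]
e = [0.8, 0.2, 0.8, 0.2]
rd = iptw_risk_difference(z, y, e)
print(rd)
```

Weights of this form target the average treatment effect in the whole population, which matches the abstract's remark that IPTW and matching estimate different quantities.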

8.
OBJECTIVES: The propensity score is the probability of treatment conditional on observed variables. Conditioning on the propensity score results in unbiased estimation of the expected difference in observed responses to two treatments. The performance of propensity-score methods for estimating relative risks has not been studied. STUDY DESIGN AND SETTING: Monte Carlo simulations were used to assess the performance of matching, stratification, and covariate adjustment using the propensity score to estimate relative risks. RESULTS: Matching on the propensity score and stratification on the quintiles of the propensity score resulted in estimates of relative risk with similar mean squared error (MSE). Propensity-score matching resulted in estimates with less bias, whereas stratification on the propensity score resulted in estimates with greater precision. Including only variables associated with the outcome or including only the true confounders in the propensity-score model resulted in estimates with lower MSE than did including all variables associated with treatment or all measured variables in the propensity-score model. CONCLUSIONS: When estimating relative risks, propensity-score matching resulted in estimates with less bias than did stratification on the quintiles of the propensity score, but stratification on the quintiles of the propensity score resulted in estimates with greater precision.

9.
In causal studies without random assignment of treatment, causal effects can be estimated using matched treated and control samples, where matches are obtained using estimated propensity scores. Propensity score matching can reduce bias in treatment effect estimators in cases where the matched samples have overlapping covariate distributions. Despite its application in many applied problems, there is no universally employed approach to interval estimation when using propensity score matching. In this article, we present and evaluate approaches to interval estimation when using propensity score matching.

10.
This paper considers a model for the difference of two proportions in a paired or matched design of clinical trials, case-control studies and also sensitivity comparison studies of two laboratory tests. This model includes a parameter indicating both interpatient variability of response probabilities and their correlation. Under the proposed model, we derive a one-sided test for equivalence based upon the efficient score. Equivalence is defined here as not more than 100Δ per cent inferior. McNemar's test for significance is shown to be a special case of the proposed test. Further, a score-based confidence interval for the difference of two proportions is derived. One of the features of these methods is applicability to the 2×2 table with off-diagonal zero cells; all the McNemar type tests and confidence intervals published so far cannot apply to such data. A Monte Carlo simulation study shows that the proposed test has empirical significance levels closer to the nominal α-level than the other tests recently proposed and further that the proposed confidence interval has better empirical coverage probability than those of the four published methods. © 1998 John Wiley & Sons, Ltd.

11.
Nonrandomized studies of treatments from electronic healthcare databases are critical for producing the evidence necessary for making informed treatment decisions, but often rely on comparing rates of events observed in a small number of patients. In addition, studies constructed from electronic healthcare databases, for example, administrative claims data, often adjust for many, possibly hundreds, of potential confounders. Despite the importance of maximizing efficiency when there are many confounders and few observed outcome events, there has been relatively little research on the relative performance of different propensity score methods in this context. In this paper, we compare a wide variety of propensity‐based estimators of the marginal relative risk. In contrast to prior research that has focused on specific statistical methods in isolation of other analytic choices, we instead consider a method to be defined by the complete multistep process from propensity score modeling to final treatment effect estimation. Propensity score model estimation methods considered include ordinary logistic regression, Bayesian logistic regression, lasso, and boosted regression trees. Methods for utilizing the propensity score include pair matching, full matching, decile strata, fine strata, regression adjustment using one or two nonlinear splines, inverse propensity weighting, and matching weights. We evaluate methods via a ‘plasmode’ simulation study, which creates simulated datasets on the basis of a real cohort study of two treatments constructed from administrative claims data. Our results suggest that regression adjustment and matching weights, regardless of the propensity score model estimation method, provide lower bias and mean squared error in the context of rare binary outcomes. Copyright © 2017 John Wiley & Sons, Ltd.

12.
Saeki H, Tango T. Statistics in Medicine. 2011;30(28):3313-3327
The efficacy of diagnostic procedures is generally evaluated on the basis of the results from multiple raters. However, there are few adequate methods of performing non-inferiority tests with confidence intervals to compare the accuracies (sensitivities or specificities) when multiple raters are considered. We propose new statistical methods for comparing the accuracies of two diagnostic procedures in a non-inferiority trial, on the basis of the results from multiple independent raters who are also independent of the study centers. We consider a study design in which each patient is subjected to two diagnostic procedures and all images are read by all raters. By assuming a multinomial distribution for matched-pair categorical data arising from the study design, we derive a score-based full menu, that is, a non-inferiority test, confidence interval and sample size formula, for inference of the difference in correlated proportions between the two diagnostic procedures. We conduct Monte Carlo simulation studies to examine the validity of the proposed methods, which showed that the proposed test has a size closer to the nominal significance level than a Wald-type test and that the proposed confidence interval has better empirical coverage probability than a Wald-type confidence interval. We illustrate the proposed methods with data from a study of diagnostic procedures for the diagnosis of oesophageal carcinoma infiltrating the tracheobronchial tree.

13.
The propensity score, the probability of exposure to a specific treatment conditional on observed variables, is increasingly being used in observational studies. Creating strata in which subjects are matched on the propensity score allows one to balance measured variables between treated and untreated subjects. There is an ongoing controversy in the literature as to which variables to include in the propensity score model. Some advocate including those variables that predict treatment assignment, while others suggest including all variables potentially related to the outcome, and still others advocate including only variables that are associated with both treatment and outcome. We provide a case study of the association between drug exposure and mortality to show that including a variable that is related to treatment, but not outcome, does not improve balance and reduces the number of matched pairs available for analysis. In order to investigate this issue more comprehensively, we conducted a series of Monte Carlo simulations of the performance of propensity score models that contained variables related to treatment allocation, or variables that were confounders for the treatment-outcome pair, or variables related to outcome or all variables related to either outcome or treatment or neither. We compared the use of these different propensity score models in matching and stratification in terms of the extent to which they balanced variables. We demonstrated that all propensity score models balanced measured confounders between treated and untreated subjects in a propensity-score matched sample. However, including only the true confounders or the variables predictive of the outcome in the propensity score model resulted in a substantially larger number of matched pairs than did using the treatment-allocation model.
Stratifying on the quintiles of any propensity score model resulted in residual imbalance between treated and untreated subjects in the upper and lower quintiles. Greater balance between treated and untreated subjects was obtained after matching on the propensity score than after stratifying on the quintiles of the propensity score. When a confounding variable was omitted from any of the propensity score models, then matching or stratifying on the propensity score resulted in residual imbalance in prognostically important variables between treated and untreated subjects. We considered four propensity score models for estimating treatment effects: the model that included only true confounders; the model that included all variables associated with the outcome; the model that included all measured variables; and the model that included all variables associated with treatment selection. Reduction in bias when estimating a null treatment effect was equivalent for all four propensity score models when propensity score matching was used. Reduction in bias was marginally greater for the first two propensity score models than for the last two propensity score models when stratification on the quintiles of the propensity score model was employed. Furthermore, omitting a confounding variable from the propensity score model resulted in biased estimation of the treatment effect. Finally, the mean squared error for estimating a null treatment effect was lower when either of the first two propensity score models was used compared to when either of the last two propensity score models was used.

14.
A stratified matched‐pair study is often designed to adjust for a confounding effect or the effect of different trials/centers/groups in modern medical studies. The relative risk is one of the most frequently used indices in comparing efficiency of two treatments in clinical trials. In this paper, we propose seven confidence interval estimators for the common relative risk and three simultaneous confidence interval estimators for the relative risks in stratified matched‐pair designs. The performance of the proposed methods is evaluated with respect to their type I error rates, powers, coverage probabilities, and expected widths. Our empirical results show that the percentile bootstrap confidence interval and bootstrap‐resampling‐based Bonferroni simultaneous confidence interval behave satisfactorily for small to large sample sizes in the sense that (i) their empirical coverage probabilities can be well controlled around the pre‐specified nominal confidence level with reasonably shorter confidence widths; and (ii) the empirical type I error rates of their associated test statistics are generally closer to the pre‐specified nominal level with larger powers. They are hence recommended. Two real examples from clinical laboratory studies are used to illustrate the proposed methodologies. Copyright © 2009 John Wiley & Sons, Ltd.

15.
Paired dichotomous data may arise in clinical trials such as pre-/post-test comparison studies and equivalence trials. Reporting parameter estimates (e.g. odds ratio, rate difference and rate ratio) along with their associated confidence interval estimates becomes a necessity in many medical journals. Various asymptotic confidence interval estimators have long been developed for differences in correlated binary proportions. Nevertheless, these asymptotic methods may have poor coverage properties in small samples. In this article, we investigate several alternative confidence interval estimators for the difference between binomial proportions based on small-sample paired data. Specifically, we consider exact and approximate unconditional confidence intervals for rate difference via inverting a score test. The exact unconditional confidence interval guarantees the coverage probability, and it is recommended if strict control of coverage probability is required. However, the exact method tends to be overly conservative and computationally demanding. Our empirical results show that the approximate unconditional score confidence interval estimators based on inverting the score test demonstrate reasonably good coverage properties even in small-sample designs, and yet they are relatively easy to implement computationally. We illustrate the methods using real examples from a pain management study and a cancer study.

16.
Propensity‐score matching is increasingly being used to reduce the confounding that can occur in observational studies examining the effects of treatments or interventions on outcomes. We used Monte Carlo simulations to examine the following algorithms for forming matched pairs of treated and untreated subjects: optimal matching, greedy nearest neighbor matching without replacement, and greedy nearest neighbor matching without replacement within specified caliper widths. For each of the latter two algorithms, we examined four different sub‐algorithms defined by the order in which treated subjects were selected for matching to an untreated subject: lowest to highest propensity score, highest to lowest propensity score, best match first, and random order. We also examined matching with replacement. We found that (i) nearest neighbor matching induced the same balance in baseline covariates as did optimal matching; (ii) when at least some of the covariates were continuous, caliper matching tended to induce balance on baseline covariates that was at least as good as the other algorithms; (iii) caliper matching tended to result in estimates of treatment effect with less bias compared with optimal and nearest neighbor matching; (iv) optimal and nearest neighbor matching resulted in estimates of treatment effect with negligibly less variability than did caliper matching; (v) caliper matching had amongst the best performance when assessed using mean squared error; (vi) the order in which treated subjects were selected for matching had at most a modest effect on estimation; and (vii) matching with replacement did not have superior performance compared with caliper matching without replacement. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
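
Greedy nearest neighbor matching without replacement within a caliper can be sketched as below. This is a minimal illustration rather than the algorithm as implemented in the study: the function name `greedy_caliper_match` and the scores are hypothetical, treated subjects are processed in the order supplied (so the caller chooses among the orderings the abstract lists), and the caliper is assumed to be applied directly on the propensity-score scale.

```python
def greedy_caliper_match(treated_ps, control_ps, caliper):
    """Greedy 1:1 nearest neighbor matching without replacement.

    Treated subjects are processed in the order given. Each is paired
    with the closest still-unmatched control whose propensity score lies
    within `caliper`; a treated subject with no such control is left
    unmatched. Returns a list of (treated_index, control_index) pairs.
    """
    available = set(range(len(control_ps)))
    pairs = []
    for i, score in enumerate(treated_ps):
        best = None  # (distance, control index) of the best candidate so far
        for j in available:
            distance = abs(score - control_ps[j])
            if distance > caliper:
                continue
            # Ties on distance are broken by the lower control index.
            if best is None or (distance, j) < best:
                best = (distance, j)
        if best is not None:
            pairs.append((i, best[1]))
            available.remove(best[1])  # without replacement
    return pairs


# Hypothetical propensity scores.
treated_scores = [0.30, 0.62, 0.90]
control_scores = [0.28, 0.33, 0.60, 0.50]
matches = greedy_caliper_match(treated_scores, control_scores, caliper=0.05)
print(matches)
```

Sorting `treated_scores` ascending or descending, or shuffling them, before the call reproduces the "lowest to highest", "highest to lowest", and "random order" variants; "best match first" would need a different loop and is not covered by this sketch.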

17.
Onur Baser, PhD. Value in Health. 2006;9(6):377-385
OBJECTIVE: A large number of possible techniques are available when conducting matching procedures, yet coherent guidelines for selecting the most appropriate application do not yet exist. In this article we evaluate several matching techniques and provide a suggested guideline for selecting the best technique. METHODS: The main purpose of a matching procedure is to reduce selection bias by increasing the balance between the treatment and control groups. The following approach, consisting of five quantifiable steps, is proposed to check for balance: 1) Using two-sample t-statistics to compare the means of the treatment and control groups for each explanatory variable; 2) Comparing the mean difference as a percentage of the average standard deviations; 3) Comparing percent reduction of bias in the means of the explanatory variables before and after matching; 4) Comparing treatment and control density estimates for the explanatory variables; and 5) Comparing the density estimates of the propensity scores of the control units with those of the treated units. We investigated seven different matching techniques and how they performed with regard to the five proposed steps. Moreover, we estimated the average treatment effect with multivariate analysis and compared the results with the estimates of propensity score matching techniques. The Medstat MarketScan Data Base provided data for use in empirical examples of the utility of several matching methods. We conducted nearest neighborhood matching (NNM) analyses in seven ways: replacement, 2 to 1 matching, Mahalanobis matching (MM), MM with caliper, kernel matching, radius matching, and the stratification method. RESULTS: Comparing techniques according to the above criteria revealed that the choice of matching has significant effects on outcomes. Patients with asthma are compared with patients without asthma and cost of illness ranged from 2040 dollars to 4463 dollars depending on the type of matching.
After matching, we looked for insignificant differences or larger P-values in the mean values (criterion 1); low mean differences as a percentage of the average standard deviation (criterion 2); 100% reduction in bias in the means of explanatory variables (criterion 3); and insignificant differences when comparing the density estimates of the treatment and control groups (criterion 4 and criterion 5). Mahalanobis matching with caliper yielded better results according to all five criteria (Mean = 4463 dollars, SD = 3252 dollars). We also applied multivariate analysis over the matched sample. This decreased the deviation in cost of illness estimates more than threefold (Mean = 4456 dollars, SD = 996 dollars). CONCLUSION: Sensitivity analysis of the matching techniques is especially important because none of the proposed methods in the literature is a priori superior to the others. The suggested joint consideration of propensity score matching and multivariate analysis offers an approach to assessing the robustness of the estimates.

18.
The propensity score is a subject's probability of treatment, conditional on observed baseline covariates. Conditional on the true propensity score, treated and untreated subjects have similar distributions of observed baseline covariates. Propensity‐score matching is a popular method of using the propensity score in the medical literature. Using this approach, matched sets of treated and untreated subjects with similar values of the propensity score are formed. Inferences about treatment effect made using propensity‐score matching are valid only if, in the matched sample, treated and untreated subjects have similar distributions of measured baseline covariates. In this paper we discuss the following methods for assessing whether the propensity score model has been correctly specified: comparing means and prevalences of baseline characteristics using standardized differences; ratios comparing the variance of continuous covariates between treated and untreated subjects; comparison of higher order moments and interactions; five‐number summaries; and graphical methods such as quantile–quantile plots, side‐by‐side boxplots, and non‐parametric density plots for comparing the distribution of baseline covariates between treatment groups. We describe methods to determine the sampling distribution of the standardized difference when the true standardized difference is equal to zero, thereby allowing one to determine the range of standardized differences that are plausible with the propensity score model having been correctly specified. We highlight the limitations of some previously used methods for assessing the adequacy of the specification of the propensity‐score model. In particular, methods based on comparing the distribution of the estimated propensity score between treated and untreated subjects are uninformative. Copyright © 2009 John Wiley & Sons, Ltd.
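
Two of the balance diagnostics named above, the standardized difference for a continuous covariate and the variance ratio, can be computed as in the sketch below. The formula assumed here is the common one, (mean_treated - mean_control) / sqrt((var_treated + var_control) / 2), with sample variances; the function names and the covariate values are hypothetical.

```python
import math
import statistics


def standardized_difference(x_treated, x_control):
    """Standardized difference in means of a continuous covariate.

    The difference in means is divided by the square root of the
    average of the two sample variances, so it does not depend on
    the unit of measurement.
    """
    mean_diff = statistics.mean(x_treated) - statistics.mean(x_control)
    pooled_sd = math.sqrt(
        (statistics.variance(x_treated) + statistics.variance(x_control)) / 2
    )
    return mean_diff / pooled_sd


def variance_ratio(x_treated, x_control):
    """Ratio of the sample variances (treated over untreated)."""
    return statistics.variance(x_treated) / statistics.variance(x_control)


# Hypothetical ages in a matched sample.
age_treated = [62, 67, 72, 77, 82]
age_control = [58, 64, 70, 76, 82]
d = standardized_difference(age_treated, age_control)
vr = variance_ratio(age_treated, age_control)
print(d, vr)
```

Unlike a t-test p-value, neither quantity shrinks simply because matching reduced the sample size, which is one reason such measures are preferred for assessing balance in a matched sample.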

19.
Cross‐country comparisons of costs and quality between hospitals are often made at the macro level. The goal of this study was to explore methods to compare micro‐level data from hospitals in different health care systems. To do so, we developed a multi‐level framework in combination with a propensity score matching technique using similarly structured data for patients receiving treatment for acute myocardial infarction in German and US Veterans Health Administration hospitals. Our case study shows important differences in results between multi‐level regressions based on matched and unmatched samples. We conclude that propensity score matching techniques are an appropriate way to deal with the usual baseline imbalances across the samples from different countries. Multi‐level models are recommended to account for the clustered structure of the data when patient‐level data from different hospitals and health care systems are compared. The results provide an important justification for exploring new ways of performing health system comparisons. Copyright © 2010 John Wiley & Sons, Ltd.

20.
The estimation of a confidence interval for attributable risk from the logistic model based on data from case-control studies is a problem for which an accepted solution is lacking. Two methods, one based on the delta method and one bootstrap on the population base, have been described but their accuracy has not been compared. We present two other methods, one based on a jack-knife approach and the other using a bootstrap on two samples (cases and controls). The four methods are compared in a simulation study. The four methods are also applied to a case-control study on risk factors for preterm delivery; the confidence intervals are obtained assuming normality and by logarithmic transformation. When attributable risk is not smooth (for example, when exposure prevalence is low) both the jack-knife and the delta method tend to fail. If attributable risk is close to zero or one, normality cannot be assumed and log-transformed confidence intervals must be used. Finally, the extension to matched studies is analysed using a case-control study on risk factors of cutaneous malignant melanoma. In this situation, the population-based bootstrap is not available.
