Similar articles
20 similar articles found
1.
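OBJECTIVE: To evaluate how well a multiple imputation method based on the expectation-maximization with bootstrapping (EMB) algorithm imputes missing quantitative variables in cross-sectional health examination data, and to provide a basis for choosing an appropriate multiple imputation method for such data. METHODS: Using measured cross-sectional health examination data, EMB multiple imputation was applied with the Amelia II package in R 3.5.0 to the health examination records of 1,634 employees who underwent routine check-ups at the Health Examination Center of Xijing Hospital, Xi'an, Shaanxi Province, between January and December 2013. RESULTS: For cross-sectional quantitative health examination data, under three missing-at-random scenarios with single-variable missing rates of < 10%, 20%, and 70%, EMB multiple imputation reduced the estimation error relative to listwise deletion in every case. On the same data, imputation quality varied with the number of imputations; m = 10 was a suitable number for this dataset. Probability density plots before and after imputation showed that, at m = 10, the density curves of the imputed values agreed well with those of the observed values. Overimputation diagnostic plots further showed that, at m = 10, the 90% CIs of most observations for each variable included the line of best fit and were narrow. The multivariable regression models built separately on the two analysis datasets produced by listwise deletion and by EMB multiple imputation retained different variables. CONCLUSION: For quantitative variables missing at random at different missing rates, EMB multiple imputation outperformed listwise deletion; the optimal number of imputations differs across datasets with missing values.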

2.
Cholesterol, coronary heart disease, and stroke in the Asia Pacific region   Total citations: 16, self-citations: 0, citations by others: 16
BACKGROUND: Cholesterol levels in many Asian countries are rising. Predictions of the likely effects of this on the incidence of cardiovascular diseases have mostly relied on data from Western populations. Whether the associations between total cholesterol and cardiovascular diseases are similar in Asia is not established. METHODS: The Asia Pacific Cohort Studies Collaboration (APCSC) is an individual-participant data meta-analysis of prospective studies from the Asia-Pacific region. Cox models were applied to the combined data from 29 cohorts to estimate the region-, sex-, and age-specific hazard ratios of major cardiovascular diseases by the fifths of total cholesterol. RESULTS: At baseline, the age/sex-adjusted mean value of total cholesterol was higher in Australia and New Zealand (ANZ) (5.52 ± 1.05 mmol/l) than in Asia (4.87 ± 1.05 mmol/l). During 2 million person-years of follow-up among 352 033 individuals, 4841 cardiovascular deaths were recorded. The association of total cholesterol with coronary heart disease and stroke was similar in Asian and ANZ cohorts. Overall, each 1-mmol/l higher level of total cholesterol was associated with 35% (95% CI: 26-44%) increased risk of coronary death, 25% (95% CI: 13-40%) increased risk of fatal or non-fatal ischaemic stroke, and 20% (95% CI: 8-30%) decreased risk of fatal haemorrhagic stroke. CONCLUSIONS: In both Asian and non-Asian populations in the Asia-Pacific region, total cholesterol is similarly strongly associated with the risk of CHD and ischaemic, but not haemorrhagic, stroke. Rising population-wide levels of cholesterol would be expected to contribute to a substantial increase in the overall burden of cardiovascular diseases in this region.
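
The per-unit hazard ratio quoted in this abstract can be rescaled to other cholesterol differences. The following is a minimal sketch, assuming the log-hazard is linear in total cholesterol (an assumption of this illustration, not something the abstract states); the function name `scale_hazard_ratio` is hypothetical.

```python
import math


def scale_hazard_ratio(hr_per_unit: float, delta: float) -> float:
    """Hazard ratio for a `delta`-unit difference, assuming log-linearity."""
    return math.exp(delta * math.log(hr_per_unit))


# Figures quoted in the abstract.
hr_coronary_per_mmol = 1.35        # coronary death, per 1 mmol/l higher cholesterol
anz_minus_asia = 5.52 - 4.87       # difference in baseline means, mmol/l

# Implied hazard ratio for a difference the size of the ANZ-Asia gap
# (only valid under the log-linearity assumption above).
hr_for_gap = scale_hazard_ratio(hr_coronary_per_mmol, anz_minus_asia)
```

Under that assumption, a difference smaller than 1 mmol/l gives a hazard ratio between 1 and the per-unit value.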

3.
Cigarette smoking is becoming increasingly common in Asia while quitting remains rare, in part because of a lack of knowledge about the risks of smoking. This study compared the risk of death from lung cancer associated with smoking habits in Australia and New Zealand and in Asia by using data from the Asia Pacific Cohort Studies Collaboration: 31 studies involving 480,125 individuals. Cox regression models were used. The hazard ratios for lung cancer mortality associated with current smoking were, for men, 2.48 (95% confidence interval (CI): 1.99, 3.11) in Asia versus 9.87 (95% CI: 6.04, 16.12) in Australia and New Zealand; p for homogeneity <0.0001. For women, the corresponding estimates were 2.35 (95% CI: 1.29, 4.28) in Asia versus 19.33 (95% CI: 10.0, 37.3) in Australia and New Zealand; p for homogeneity <0.0001. Quitting was beneficial in both regions; the hazard ratios for former compared with current smokers were 0.69 (95% CI: 0.53, 0.92) in Asia and 0.30 (95% CI: 0.22, 0.41) in Australia and New Zealand. The lesser effect in Asia was partly explained by the smaller number of cigarettes smoked and the shorter duration of follow-up in Asian studies. These results suggest that tobacco control policies in Asia should not solely concentrate on preventing the uptake of smoking but also attend to cessation.

4.
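OBJECTIVE: To briefly introduce the use of multivariate imputation by chained equations (MICE) in R and to evaluate its imputation performance. METHODS: The imputation workflow was illustrated with real data, and MICE was compared with common methods for handling missing data (deletion, mean/mode imputation, and regression imputation). RESULTS: At a missing rate of 10%, the estimates from MICE and the common methods did not differ appreciably; for every method, the relative error of the regression coefficient estimates for the three types of variables was about 10%. As the missing rate rose (20%, 40%), the relative error of the coefficient estimates increased for all methods, but for MICE it stayed at roughly 10% to 20% for the three types of variables; MICE performed best and most stably, followed by regression imputation, with deletion and mean/mode imputation performing poorly. At a missing rate of 50%, the estimation error for all three variable types was already large, and no method imputed well. CONCLUSION: MICE is easier to use than other multiple imputation software and, compared with common missing-data methods, makes fuller use of the information in incomplete records and reflects the surveyed reality more accurately; it merits wider use in practice.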

5.
BACKGROUND AND OBJECTIVES: To illustrate the effects of different methods for handling missing data--complete case analysis, missing-indicator method, single imputation of unconditional and conditional mean, and multiple imputation (MI)--in the context of multivariable diagnostic research aiming to identify potential predictors (test results) that independently contribute to the prediction of disease presence or absence. METHODS: We used data from 398 subjects from a prospective study on the diagnosis of pulmonary embolism. Various diagnostic predictors or tests had (varying percentages of) missing values. For each method of handling these missing values, we fitted a diagnostic prediction model using multivariable logistic regression analysis. RESULTS: The receiver operating characteristic curve area for all diagnostic models was above 0.75. The predictors in the final models based on the complete case analysis, and after using the missing-indicator method, were very different compared to the other models. The models based on MI did not differ much from the models derived after using single conditional and unconditional mean imputation. CONCLUSION: In multivariable diagnostic research, complete case analysis and the use of the missing-indicator method should be avoided, even when data are missing completely at random. MI methods are known to be superior to single imputation methods. For our example study, the single imputation methods performed equally well, but this was most likely because of the low overall number of missing values.
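
For readers unfamiliar with the missing-indicator method that this abstract advises against, the data preparation step can be sketched as follows. This is only an illustration of what the method does to a predictor column; the function name and the fill value of 0.0 are assumptions, and the abstract does not describe its implementation.

```python
def missing_indicator(values, fill=0.0):
    """Return (filled values, 0/1 indicator) for one predictor column.

    Missing entries (None) are replaced by a fixed `fill` value and flagged
    in a separate indicator column that is then added to the regression.
    """
    filled = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator


# Hypothetical test results with two missing entries.
d_dimer = [0.4, None, 1.2, None, 0.8]
d_dimer_filled, d_dimer_missing = missing_indicator(d_dimer)
```

Both returned columns would enter the prediction model as predictors, which is why the selected predictors can differ from those of an imputation-based model.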

6.
National data from the Asia-Pacific region suggest that stroke accounts for over 10% of female deaths. With general aging in the region, and longer life expectancies for women than men, action is required to maintain recent improvements in female death rates from stroke. However, local data on incidence and risk factors for stroke amongst women are scarce. Data from 214,032 women in the Asia Pacific Cohort Studies Collaboration were thus used to investigate the risk factors for stroke in the region. Raised systolic blood pressure and diabetes were found to be key risk factors for both ischemic (IS) and hemorrhagic (HS) stroke. After adjustment for other risk factors, every extra 10 mmHg systolic blood pressure increased risk of IS by 36% and HS by 69%, whilst diabetes increased risk of IS by 170% and HS by 147%. Smoking was also an important risk factor for IS and HS; risk was reduced by quitting.

7.
The purpose of this paper was to illustrate the influence of missing data on the results of longitudinal statistical analyses [i.e., MANOVA for repeated measurements and Generalised Estimating Equations (GEE)] and to illustrate the influence of using different imputation methods to replace missing data. Besides a complete dataset, four incomplete datasets were considered: two datasets with 10% missing data and two datasets with 25% missing data. In both situations missingness was considered independent and dependent on observed data. Imputation methods were divided into cross-sectional methods (i.e., mean of series, hot deck, and cross-sectional regression) and longitudinal methods (i.e., last value carried forward, longitudinal interpolation, and longitudinal regression). In addition, the multiple imputation method was applied and discussed. The analyses were performed on a particular (observational) longitudinal dataset, with particular missing data patterns and imputation methods. The results of this illustration show that when MANOVA for repeated measurements is used, imputation methods are highly recommended (because MANOVA, as implemented in the software used, applies listwise deletion of cases with a missing value). When GEE analysis was applied, imputation methods were not necessary. When imputation methods were used, longitudinal imputation methods were often preferable to cross-sectional imputation methods, in that the point estimates and standard errors were closer to the estimates derived from the complete dataset. Furthermore, this study showed that the theoretically more valid multiple imputation method did not lead to different point estimates than the simpler (longitudinal) imputation methods. However, the estimated standard errors appeared to be theoretically more adequate, because they reflect the uncertainty in estimation caused by missing values.

8.
The aims of this study were to obtain the most recent representative data for the prevalence of diabetes in adult populations in the World Health Organisation's South-East Asia and Western Pacific regions and to quantify the contribution of diabetes to the burden of mortality from cardiovascular diseases in these regions. Previous reports indicate that there are 83 million individuals with diabetes in the Asia-Pacific region, but since many of the country-specific estimates were not from nationally representative studies, this figure may not accurately reflect the current burden of diabetes. Information on the prevalence of diabetes was obtained by searching Medline and government health websites. Data were available from 12 countries representing 78% of the total population of the Asia-Pacific region. Six of 10 countries with complete data reported a prevalence of diabetes exceeding those estimates currently cited by the World Health Organization; three of which have also already exceeded the World Health Organization projections for 2030. In the 12 countries in the region with nationally representative data, the prevalence of diabetes ranged from 2.6% to 15.1%. Hazard ratios from the Asia Pacific Cohort Studies Collaboration were used to calculate population attributable fractions for diabetes for fatal cardiovascular diseases in the region. Population attributable fractions ranged from 2% to 12% for coronary heart disease, 1% to 6% for haemorrhagic stroke, and 2% to 11% for ischaemic stroke. Accurate estimates of the prevalence of diabetes are of great importance and standard methods are needed for periodic surveillance across the Asia-Pacific region and elsewhere.
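
The abstract does not say which formula was used to turn prevalence and hazard ratios into population attributable fractions. One common choice is Levin's formula, PAF = p(HR - 1) / (1 + p(HR - 1)), sketched below as an assumption. The two prevalences are the extremes quoted in the abstract; the hazard ratio of 2.0 is a hypothetical placeholder, not a value from the study.

```python
def attributable_fraction(prevalence: float, hazard_ratio: float) -> float:
    """Levin's formula: PAF = p(HR - 1) / (1 + p(HR - 1))."""
    excess = prevalence * (hazard_ratio - 1.0)
    return excess / (1.0 + excess)


hypothetical_hr = 2.0  # placeholder only; the abstract does not report the hazard ratios

# Lowest and highest diabetes prevalence quoted in the abstract.
paf_low = attributable_fraction(0.026, hypothetical_hr)
paf_high = attributable_fraction(0.151, hypothetical_hr)
```

The fraction grows with both the prevalence and the hazard ratio, so the between-country spread in prevalence alone produces a spread in attributable fractions.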

9.
Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within‐study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse‐variance weighted meta‐analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between‐study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse‐variance weighted meta‐analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta‐analysis, rather than meta‐analyzing each of the multiple imputations and then combining the meta‐analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
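
The order of operations recommended in this abstract (Rubin's rules within each study first, inverse-variance weighting across studies second) can be sketched as follows. This is a minimal illustration that assumes a fixed-effect meta-analysis and scalar estimates; the function names and all numbers are hypothetical.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m imputed-data estimates of one parameter with Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled_estimate, total


def inverse_variance_meta(study_results):
    """Fixed-effect inverse-variance meta-analysis of (estimate, variance) pairs."""
    weights = [1 / var for _, var in study_results]
    estimate = sum(w * est for w, (est, _) in zip(weights, study_results)) / sum(weights)
    return estimate, 1 / sum(weights)


# Hypothetical per-imputation results (m = 3) for two studies.
study_a = rubin_pool([0.50, 0.54, 0.46], [0.010, 0.011, 0.009])
study_b = rubin_pool([0.30, 0.34, 0.32], [0.020, 0.019, 0.021])

# Step 2: combine the study-level pooled results.
meta_estimate, meta_variance = inverse_variance_meta([study_a, study_b])
```

The alternative the abstract argues against would call `inverse_variance_meta` once per imputation and apply `rubin_pool` to the resulting meta-analysis estimates.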

10.
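In a study of medical expenditures and their determinants among students (kindergarten through university) enrolled in Urban Resident Basic Medical Insurance in Taiyuan in 2012, the dependent variable was found to suffer from both random non-response bias (missing at random) and selection bias (missing not at random). This study therefore proposes a two-stage strategy that combines multiple imputation with a sample selection model to correct both biases at once. In the worked example, two-stage sampling and a questionnaire survey yielded 1,190 valid records, with 2.52% of the dependent variable missing not at random and 7.14% missing at random. In the first stage, the complete data were used to multiply impute the values missing at random; in the second stage, a sample selection model was applied to the imputed data to correct for the values missing not at random while fitting the multivariable model. A simulation study with 1,000 two-stage corrections compared four multiple imputation methods and found that, for this combination of missing proportions, predictive mean matching combined with the sample selection model gave the best correction. In the final analysis, the factors affecting annual medical expenditure among insured students in Taiyuan were respondent type, gross annual household income, ability to bear the level of medical costs, chronic disease, visits to community health services or private clinics, hospital outpatient visits, hospitalization, having needed but not received hospitalization, self-treatment, and the acceptable out-of-pocket share of medical costs. The two-stage correction combining multiple imputation with a sample selection model can thus effectively correct random non-response bias and selection bias in the dependent variable of survey data.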

11.
Objective: We compared popular methods to handle missing data with multiple imputation (a more sophisticated method that preserves data). Study Design and Setting: We used data of 804 patients with a suspicion of deep venous thrombosis (DVT). We studied three covariates to predict the presence of DVT: d-dimer level, difference in calf circumference, and history of leg trauma. We introduced missing values (missing at random) ranging from 10% to 90%. The risk of DVT was modeled with logistic regression for the three methods, that is, complete case analysis, exclusion of d-dimer level from the model, and multiple imputation. Results: Multiple imputation showed less bias in the regression coefficients of the three variables and more accurate coverage of the corresponding 90% confidence intervals than complete case analysis and dropping d-dimer level from the analysis. Multiple imputation showed unbiased estimates of the area under the receiver operating characteristic curve (0.88) compared with complete case analysis (0.77) and when the variable with missing values was dropped (0.65). Conclusion: As this study shows that simple methods to deal with missing data can lead to seriously misleading results, we advise to consider multiple imputation. The purpose of multiple imputation is not to create data, but to prevent the exclusion of observed data.

12.
In healthcare cost-effectiveness analysis, probability distributions are typically skewed and missing data are frequent. Bootstrap and multiple imputation are well-established resampling methods for handling skewed and missing data. However, it is not clear how these techniques should be combined. This paper addresses combining multiple imputation and bootstrap to obtain confidence intervals of the mean difference in outcome for two independent treatment groups. We assessed statistical validity and efficiency of 10 candidate methods and applied these methods to a clinical data set. Single imputation nested in the bootstrap percentile method (with added noise to reflect the uncertainty of the imputation) emerged as the method with the best statistical properties. However, this method can require extensive computation times, and the lack of standard software makes it inaccessible to a larger group of researchers. Using a standard unpaired t-test with standard multiple imputation without bootstrap appears to be a robust alternative with acceptable statistical performance for which standard multiple imputation software is available.
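
The "single imputation nested in the bootstrap percentile method" named in this abstract can be sketched roughly as below. The imputation step here is a random hot-deck draw from the observed values of each resample, chosen only because it is short and stochastic; it is a stand-in, not the imputation model used in the paper. All names and data are hypothetical.

```python
import random
from statistics import mean


def resample(values, rng):
    """Bootstrap resample that retains at least one observed value."""
    while True:
        draw = rng.choices(values, k=len(values))
        if any(v is not None for v in draw):
            return draw


def impute_hot_deck(values, rng):
    """Replace each missing entry (None) with a randomly drawn observed value."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


def bootstrap_ci_mean_difference(group_a, group_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(a) - mean(b), imputing inside every resample."""
    rng = random.Random(seed)
    differences = []
    for _ in range(n_boot):
        a = impute_hot_deck(resample(group_a, rng), rng)
        b = impute_hot_deck(resample(group_b, rng), rng)
        differences.append(mean(a) - mean(b))
    differences.sort()
    lower_index = int(n_boot * alpha / 2)
    upper_index = n_boot - 1 - lower_index
    return differences[lower_index], differences[upper_index]


# Hypothetical outcomes for two treatment groups, None marks a missing value.
costs_a = [10, 11, None, 12, 13, 9, None, 10]
costs_b = [1, 2, None, 3, 2, 1, 2, None]
ci_low, ci_high = bootstrap_ci_mean_difference(costs_a, costs_b)
```

Because imputation is repeated inside every bootstrap replicate, the cost scales with `n_boot`, which is consistent with the computation-time concern raised in the abstract.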

13.
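张熙, 李济宾, 张晋昕. 《中国卫生统计》 (Chinese Journal of Health Statistics), 2012, 29(3): 318-320, 324
OBJECTIVE: To use simulation to impute consecutive missing values in periodic time series, comparing an imputation method based on periodicity information (the periodic imputation method) with spline interpolation. METHODS: Consecutive missing values were simulated in both simulated and real time series, and the two methods were compared across different numbers of consecutive missing values. Imputation error was quantified with NRMSE and RMSE. RESULTS: Except at a consecutive-missing length of 10 [illegible in source], the imputation error of the periodic method was smaller than that of spline interpolation as the number of consecutive missing values increased. The error of the periodic method showed no marked fluctuation over consecutive gaps of 5 to 30 values and stayed low throughout, whereas the error of spline imputation rose markedly as the gap lengthened. CONCLUSION: For time series with a definite periodicity, the periodic imputation method imputes consecutive missing data better than spline interpolation; its error is stable and changes little as the gap length grows.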
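
The two error measures named in this abstract can be computed as in the sketch below. RMSE is standard; NRMSE has several conventions, and dividing by the range of the true values is an assumption made here, since the abstract does not say which normalisation was used.

```python
import math


def rmse(true_values, imputed_values):
    """Root mean squared error between true and imputed values."""
    squared = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse(true_values, imputed_values):
    """RMSE divided by the range of the true values (assumed normalisation)."""
    value_range = max(true_values) - min(true_values)
    return rmse(true_values, imputed_values) / value_range


# Hypothetical held-out values from a gap and the values an imputer produced.
held_out = [1.0, 2.0, 3.0]
imputed = [1.0, 2.0, 5.0]
gap_rmse = rmse(held_out, imputed)
gap_nrmse = nrmse(held_out, imputed)
```

In a simulation like the one described, these would be evaluated only over the positions that were artificially set to missing.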

14.
We propose a transition model for analysing data from complex longitudinal studies. Because missing values are practically unavoidable in large longitudinal studies, we also present a two-stage imputation method for handling general patterns of missing values on both the outcome and the covariates by combining multiple imputation with stochastic regression imputation. Our model is a time-varying auto-regression on the past innovations (residuals), and it can be used in cases where general dynamics must be taken into account, and where the model selection is important. The entire estimation process was carried out using available procedures in statistical packages such as SAS and S-PLUS. To illustrate the viability of the proposed model and the two-stage imputation method, we analyse data collected in an epidemiological study that focused on various factors relating to childhood growth. Finally, we present a simulation study to investigate the behaviour of our two-stage imputation procedure.

15.
We consider a study‐level meta‐analysis with a normally distributed outcome variable and possibly unequal study‐level variances, where the object of inference is the difference in means between a treatment and control group. A common complication in such an analysis is missing sample variances for some studies. A frequently used approach is to impute the weighted (by sample size) mean of the observed variances (mean imputation). Another approach is to include only those studies with variances reported (complete case analysis). Both mean imputation and complete case analysis are only valid under the missing‐completely‐at‐random assumption, and even then the inverse variance weights produced are not necessarily optimal. We propose a multiple imputation method employing gamma meta‐regression to impute the missing sample variances. Our method takes advantage of study‐level covariates that may be used to provide information about the missing data. Through simulation studies, we show that multiple imputation, when the imputation model is correctly specified, is superior to competing methods in terms of confidence interval coverage probability and type I error probability when testing a specified group difference. Finally, we describe a similar approach to handling missing variances in cross‐over studies. Copyright © 2016 John Wiley & Sons, Ltd.
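
The baseline "mean imputation" approach described in this abstract, in which missing sample variances are replaced by the sample-size-weighted mean of the observed variances, can be sketched as follows. The function name and numbers are hypothetical; the gamma meta-regression method that the paper proposes is not shown.

```python
def impute_variances(sample_sizes, variances):
    """Fill missing variances (None) with the sample-size-weighted mean
    of the variances that were reported."""
    observed = [(n, v) for n, v in zip(sample_sizes, variances) if v is not None]
    weighted_mean = sum(n * v for n, v in observed) / sum(n for n, _ in observed)
    return [weighted_mean if v is None else v for v in variances]


# Hypothetical studies; the second did not report its sample variance.
sizes = [10, 30, 20]
reported = [4.0, None, 1.0]
completed = impute_variances(sizes, reported)
```

Every study with a missing variance receives the same value, which is one reason the resulting inverse-variance weights need not be optimal.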

16.
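OBJECTIVE: To compare several imputation methods suitable for longitudinal missing data and to select the best one for imputing missing values in Alzheimer's disease follow-up data. METHODS: For longitudinal data with continuous variables missing at random, random datasets with missing proportions of 10%, 20%, 30%, 40%, and 50% were simulated and imputed using last observation carried forward (LOCF), Markov chain Monte Carlo (MCMC), and fully conditional specification (FCS). Imputation quality was compared using indicators of unbiasedness and efficiency, and the best-performing method was applied to systolic blood pressure and Montreal Cognitive Assessment (MoCA) scores in an Alzheimer's disease follow-up study. RESULTS: (1) When the time variable was ignored and several continuous variables with missing values were handled, MCMC had a clear advantage at every missing rate; LOCF was reasonably effective at low missing rates and is simple to apply; FCS performed poorly throughout. When missingness was severe, above 40%, no method imputed well. (2) When MCMC was used to impute the missing Alzheimer's disease follow-up data, imputation of both systolic blood pressure and MoCA scores was best with 3 imputations. CONCLUSION: To obtain the best results, both the imputation method and an appropriate number of imputations need to be considered when handling missing data.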
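
Of the three methods compared in this abstract, LOCF is simple enough to sketch in a few lines. The example below is an illustration for a single subject's visit series; the function name and the blood pressure readings are hypothetical, and leaving leading missing values unfilled is a choice made here.

```python
def locf(series):
    """Last observation carried forward for one subject's visit series.

    Missing visits (None) take the most recent observed value; missing
    values before the first observation are left as None.
    """
    filled = []
    last_seen = None
    for value in series:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical systolic blood pressure readings over six visits.
sbp_visits = [None, 120, None, None, 130, None]
sbp_filled = locf(sbp_visits)
```

Because every gap is filled with a flat line, LOCF understates change over time as gaps lengthen, which fits the abstract's finding that it is only reasonable at low missing rates.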

17.
BACKGROUND: Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question of interest, or whether potential gains may be offset by bias from a poorly fitting imputation model, particularly as the amount of missing data increases. METHODS: Simulated datasets (n = 1000) drawn from a synthetic population were used to explore information recovery from multiple imputation in estimating the coefficient of a binary exposure variable when various proportions of data (10-90%) were set missing at random in a highly-skewed continuous covariate or in the binary exposure. Imputation was performed using multivariate normal imputation (MVNI), with a simple or zero-skewness log transformation to manage non-normality. Bias, precision, mean-squared error and coverage for a set of regression parameter estimates were compared between multiple imputation and complete case analyses. RESULTS: For missingness in the continuous covariate, multiple imputation produced less bias and greater precision for the effect of the binary exposure variable, compared with complete case analysis, with larger gains in precision with more missing data. However, even with only moderate missingness, large bias and substantial under-coverage were apparent in estimating the continuous covariate's effect when skewness was not adequately addressed. For missingness in the binary covariate, all estimates had negligible bias but gains in precision from multiple imputation were minimal, particularly for the coefficient of the binary exposure. CONCLUSIONS: Although multiple imputation can be useful if covariates required for confounding adjustment are missing, benefits are likely to be minimal when data are missing in the exposure variable of interest. Furthermore, when there are large amounts of missingness, multiple imputation can become unreliable and introduce bias not present in a complete case analysis if the imputation model is not appropriate. Epidemiologists dealing with missing data should keep in mind the potential limitations as well as the potential benefits of multiple imputation. Further work is needed to provide clearer guidelines on effective application of this method.

18.
Multiple imputation is commonly used to impute missing covariates in the Cox semiparametric regression setting. It fills in each missing value with plausible values, via a Gibbs sampling procedure, specifying an imputation model for each missing variable. This imputation method is implemented in several software packages that offer imputation models steered by the shape of the variable to be imputed, but all these imputation models make an assumption of linearity on covariate effects. However, this assumption is not often verified in practice, as the covariates can have a nonlinear effect. Such a linear assumption can lead to a misleading conclusion because the imputation model should be constructed to reflect the true distributional relationship between the missing values and the observed values. To estimate nonlinear effects of continuous time invariant covariates in the imputation model, we propose a method based on B‐spline functions. To assess the performance of this method, we conducted a simulation study, where we compared the multiple imputation method using a Bayesian splines imputation model with multiple imputation using a Bayesian linear imputation model in a survival analysis setting. We evaluated the proposed method on the motivating data set collected in HIV‐infected patients enrolled in an observational cohort study in Senegal, which contains several incomplete variables. We found that our method performs well in estimating the hazard ratio compared with the linear imputation methods, when data are missing completely at random, or missing at random. Copyright © 2013 John Wiley & Sons, Ltd.

19.
Several approaches exist for handling missing covariates in the Cox proportional hazards model. Multiple imputation (MI) is relatively easy to implement with various software available and results in consistent estimates if the imputation model is correct. On the other hand, the fully augmented weighted estimators (FAWEs) recover a substantial proportion of the efficiency and have the doubly robust property. In this paper, we compare the FAWEs and the MI through a comprehensive simulation study. For the MI, we consider the multiple imputation by chained equation and focus on two imputation methods: Bayesian linear regression imputation and predictive mean matching. Simulation results show that the imputation methods can be rather sensitive to model misspecification and may have large bias when the censoring time depends on the missing covariates. In contrast, the FAWEs allow the censoring time to depend on the missing covariates and, owing to the doubly robust property, are remarkably robust as long as either the conditional expectations or the selection probability is correctly specified. The comparison suggests that the FAWEs show the potential for being a competitive and attractive tool for tackling the analysis of survival data with missing covariates. Copyright © 2010 John Wiley & Sons, Ltd.
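
Predictive mean matching, one of the two imputation methods this abstract considers, can be illustrated with the simplified sketch below. It uses one predictor, an ordinary least-squares fit, and a single imputation; a full implementation would also draw the regression parameters from their posterior and repeat the process to create multiple imputed datasets. Names and data are hypothetical.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    mean_x, mean_y = mean(x), mean(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


def pmm_impute(x, y, k=3, seed=0):
    """Fill missing y (None) with an observed y taken from one of the k donors
    whose predicted value is closest to the prediction for the missing case."""
    rng = random.Random(seed)
    donors = [(a, b) for a, b in zip(x, y) if b is not None]
    intercept, slope = fit_line([a for a, _ in donors], [b for _, b in donors])
    completed = []
    for a, b in zip(x, y):
        if b is not None:
            completed.append(b)
            continue
        target = intercept + slope * a
        nearest = sorted(donors, key=lambda d: abs(intercept + slope * d[0] - target))[:k]
        completed.append(rng.choice(nearest)[1])
    return completed


# Hypothetical covariate x (fully observed) and covariate y with two gaps.
x_values = [1, 2, 3, 4, 5, 6]
y_values = [1.1, 2.0, None, 4.2, None, 5.9]
y_completed = pmm_impute(x_values, y_values)
```

Every imputed value is a value that was actually observed, so imputations stay within the observed range of the variable.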

20.
BACKGROUND AND METHODOLOGY: Late 'age at menopause' is a recognised risk factor for postmenopausal breast cancer and is also associated with decreased use of hormone replacement therapy (HRT). When investigating the association between HRT use and breast cancer risk it is therefore necessary to adjust for the potential confounder, 'age at menopause'. 'Age at menopause', however, cannot be determined for women with a hysterectomy and ovarian conservation. Using data on 13 357 postmenopausal women in whom 396 cases of invasive breast cancer were diagnosed during 9 years of follow-up from the Melbourne Collaborative Cohort Study, we compared the estimates of relative risk of HRT use for breast cancer for three different methods of dealing with missing data: complete-case analysis, single imputation and multiple imputation. RESULTS: 'Age at menopause' was missing for 17% of the data. Both HRT use and 'age at menopause' were significant risk factors for breast cancer, although 'age at menopause' only marginally confounded the estimates of risk for HRT. Women with 'age at menopause' missing did not represent a random sample of the population. Complete-case analyses resulted in higher estimates of the risk associated with HRT use compared with the different methods of imputation. DISCUSSION AND CONCLUSIONS: We recommend that analyses investigating the association between HRT and breast cancer should present the results in two ways: excluding women with 'age at menopause' missing and including the women using multiple imputation. For both methods, estimates of risk, with and without the adjustment of 'age at menopause', should be given.
