Similar Articles
20 similar articles found.
1.
In the logistic regression analysis of a small-sized case-control study on Alzheimer's disease, some of the risk factors exhibited missing values, motivating the use of multiple imputation. Usually, Rubin's rules (RR) for combining point estimates and variances would then be used to estimate (symmetric) confidence intervals (CIs), on the assumption that the regression coefficients are distributed normally. Yet this assumption is rarely tested, with or without transformation. In analyses of small, sparse, or nearly separated data sets, such symmetric CIs may not be reliable. Thus, alternatives to RR have been considered, for example, Bayesian sampling methods, but not yet those that combine profile likelihoods, particularly penalized profile likelihoods, which can remove first-order bias and guarantee convergence of parameter estimation. To fill the gap, we consider the combination of penalized likelihood profiles (CLIP) by expressing them as posterior cumulative distribution functions (CDFs) obtained via a chi-squared approximation to the penalized likelihood ratio statistic. CDFs from multiple imputations can then easily be averaged into a combined CDF, CDFc, allowing confidence limits for a parameter β at level 1 − α to be identified as those β* and β** that satisfy CDFc(β*) = α/2 and CDFc(β**) = 1 − α/2. We demonstrate that the CLIP method outperforms RR in analyzing both simulated data and data from our motivating example. CLIP can also be useful as a confirmatory tool, should it show that the simpler RR are adequate for extended analysis. We also compare the performance of CLIP to Bayesian sampling methods using Markov chain Monte Carlo. CLIP is available in the R package logistf. Copyright © 2013 John Wiley & Sons, Ltd.
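This abstract states the combining rule explicitly, so a compact numerical sketch may help. The code below is not the authors' implementation (the abstract points to the R package logistf for that); it uses ordinary, unpenalized logistic profile likelihoods, a deliberately crude imputation step, and invented data and variable names, purely to show how per-imputation profile CDFs can be averaged into CDFc and inverted at α/2 and 1 − α/2.

```r
## Minimal sketch of the CLIP idea: average per-imputation profile-likelihood
## CDFs and invert the average to obtain confidence limits.  For brevity this
## uses ordinary (unpenalized) logistic profile likelihoods via glm(); the
## method summarized above uses Firth-penalized likelihoods (R package logistf).
## All data, variable names, and the crude imputation step are illustrative.
set.seed(1)
n <- 80
x <- rnorm(n); z <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1 * x + 0.5 * z))
x[sample(n, 15)] <- NA                        # introduce some missing values

m <- 5                                        # number of imputations
imputed <- lapply(seq_len(m), function(k) {   # crude normal-draw imputation
  xi <- x
  xi[is.na(xi)] <- rnorm(sum(is.na(xi)), mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
  data.frame(y = y, x = xi, z = z)
})

## Per-imputation posterior CDF for beta_x via the signed root of the
## profile likelihood ratio statistic (chi-squared / normal approximation).
profile_cdf <- function(dat) {
  full <- glm(y ~ x + z, family = binomial, data = dat)
  bhat <- coef(full)["x"]
  dev0 <- deviance(full)
  function(b) {
    restricted <- glm(y ~ z + offset(b * x), family = binomial, data = dat)
    lr <- deviance(restricted) - dev0
    pnorm(sign(b - bhat) * sqrt(max(lr, 0)))
  }
}
cdfs  <- lapply(imputed, profile_cdf)
cdf_c <- function(b) mean(vapply(cdfs, function(f) f(b), numeric(1)))

## Confidence limits: solve CDF_c(b) = alpha/2 and 1 - alpha/2.
alpha <- 0.05
lower <- uniroot(function(b) cdf_c(b) - alpha / 2,       c(-5, 5))$root
upper <- uniroot(function(b) cdf_c(b) - (1 - alpha / 2), c(-5, 5))$root
c(lower = lower, upper = upper)
```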

2.

Background

Multiple imputation has become very popular as a general-purpose method for handling missing data. The validity of multiple-imputation-based analyses relies on the use of an appropriate model to impute the missing values. Despite the widespread use of multiple imputation, there are few guidelines available for checking imputation models.

Analysis

In this paper, we provide an overview of currently available methods for checking imputation models. These include graphical checks and numerical summaries, as well as simulation-based methods such as posterior predictive checking. These model checking techniques are illustrated using an analysis affected by missing data from the Longitudinal Study of Australian Children.

Conclusions

As multiple imputation becomes further established as a standard approach for handling missing data, it will become increasingly important that researchers employ appropriate model checking approaches to ensure that reliable results are obtained when using this method.
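To make the kinds of checks listed above concrete, here is a small, hedged sketch: after imputing a toy data set with the mice package, it compares numerical summaries of observed versus imputed values and draws the standard density overlay. The data, variable names, and choice of imputation method are invented for illustration and are not from the Longitudinal Study of Australian Children.

```r
## Illustrative imputation-model checks with the mice package: numerical
## summaries comparing observed vs imputed values, plus a graphical overlay.
## The toy data and variable names are assumptions made for this example.
library(mice)
set.seed(2)
n <- 200
dat <- data.frame(age = rnorm(n, 50, 10))
dat$sbp <- 110 + 0.6 * dat$age + rnorm(n, 0, 8)
dat$sbp[runif(n) < plogis((dat$age - 50) / 10) * 0.4] <- NA   # MAR-type missingness

imp <- mice(dat, m = 5, method = "norm", printFlag = FALSE)

## Numerical check: compare means/SDs of observed and imputed sbp values.
obs  <- dat$sbp[!is.na(dat$sbp)]
imps <- unlist(imp$imp$sbp)          # imputed values across the m imputations
round(rbind(observed = c(mean = mean(obs),  sd = sd(obs)),
            imputed  = c(mean = mean(imps), sd = sd(imps))), 1)

## Graphical check: observed (blue) vs imputed (red) densities.
densityplot(imp, ~ sbp)
```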

3.
Recovery beds for cardiovascular surgical patients in the intensive care unit (ICU) and progressive care unit (PCU) are costly hospital resources that require effective management. This case study reports on the development and use of a discrete-event simulation model used to predict minimum bed needs to achieve the high patient service level demanded at Mayo Clinic. In addition to bed predictions that incorporate surgery growth and new recovery protocols, the model was used to explore the effects of smoothing surgery schedules and transferring long-stay patients from the ICU. The model projected bed needs that were 30% lower than those from the traditional bed-planning approach, and the options explored by the practice could substantially reduce the number of beds required.

4.
Combining information from multiple data sources can enhance estimates of health-related measures by using one source to supply information that is lacking in another, assuming the former has accurate and complete data. However, little research has been conducted on combining methods when each source might be imperfect, for example, subject to measurement errors and/or missing data. In a multisite study of hospice use by late-stage cancer patients, this variable was available from patients' abstracted medical records, which may be considerably underreported because of incomplete acquisition of these records. Therefore, data for Medicare-eligible patients were supplemented with their Medicare claims, which contained information on hospice use that may also be subject to underreporting, albeit to a lesser degree. In addition, both sources suffered from missing data because of unit nonresponse from medical record abstraction and sample undercoverage for Medicare claims. We treat the true hospice-use status of these patients as a latent variable and propose to multiply impute it using information from both data sources, borrowing strength from each. We characterize the complete-data model as a product of an 'outcome' model for the probability of hospice use and a 'reporting' model for the probability of underreporting from both sources, adjusting for other covariates. Assuming that the reports of hospice use from both sources are missing at random and that underreporting is conditionally independent across the two sources, we develop a Bayesian multiple imputation algorithm and conduct multiple imputation analyses of patient hospice use in demographic and clinical subgroups. The proposed approach yields more sensible results than alternative methods in our example. Our model is also related to dual system estimation in population censuses and dual exposure assessment in epidemiology. Copyright © 2014 John Wiley & Sons, Ltd.

5.

Background

When an outcome variable is missing not at random (MNAR: probability of missingness depends on outcome values), estimates of the effect of an exposure on this outcome are often biased. We investigated the extent of this bias and examined whether the bias can be reduced through incorporating proxy outcomes obtained through linkage to administrative data as auxiliary variables in multiple imputation (MI).

Methods

Using data from the Avon Longitudinal Study of Parents and Children (ALSPAC) we estimated the association between breastfeeding and IQ (continuous outcome), incorporating linked attainment data (proxies for IQ) as auxiliary variables in MI models. Simulation studies explored the impact of varying the proportion of missing data (from 20 to 80%), the correlation between the outcome and its proxy (0.1–0.9), the strength of the missing data mechanism, and having a proxy variable that was incomplete.

Results

Incorporating a linked proxy for the missing outcome as an auxiliary variable reduced bias and increased efficiency in all scenarios, even when 80% of the outcome was missing. Using an incomplete proxy was similarly beneficial. High correlations (> 0.5) between the outcome and its proxy substantially reduced the missing information. Consistent with this, ALSPAC analysis showed inclusion of a proxy reduced bias and improved efficiency. Gains with additional proxies were modest.

Conclusions

In longitudinal studies with loss to follow-up, incorporating proxies for the study outcome, obtained via linkage to external sources of data, as auxiliary variables in MI models can give practically important bias reduction and efficiency gains when the study outcome is MNAR.
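As a hedged sketch of the strategy described above, the code below simulates an outcome with outcome-dependent missingness plus a linked proxy, and simply includes the proxy as a column in the data passed to mice() so that it enters the imputation model as an auxiliary variable while the analysis model ignores it. All variable names, effect sizes, and the missingness mechanism are invented; this is not the ALSPAC analysis.

```r
## Sketch: using a linked proxy of a partially missing outcome as an
## auxiliary variable in multiple imputation.  All names and values are
## illustrative assumptions.
library(mice)
set.seed(3)
n <- 1000
exposure <- rbinom(n, 1, 0.5)
iq     <- 100 + 3 * exposure + rnorm(n, 0, 15)   # outcome (to be made missing)
proxy  <- iq + rnorm(n, 0, 10)                   # linked proxy, correlation ~0.8
p_miss <- plogis(-0.5 - 0.05 * (iq - 100))       # lower outcomes more likely missing (MNAR-type)
iq[runif(n) < p_miss] <- NA

dat <- data.frame(iq = iq, exposure = exposure, proxy = proxy)
coef(lm(iq ~ exposure, data = dat))              # complete-case estimate, for comparison

## The proxy enters the imputation model simply by being a column of the
## data passed to mice(); the analysis model itself ignores it.
imp <- mice(dat, m = 20, printFlag = FALSE, seed = 3)
fit <- with(imp, lm(iq ~ exposure))
summary(pool(fit))
```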

6.
S-T Wang, L-Y Lin, M-L Yu. Public Health 1998;112(2):129-132
This paper presents a SAS macro for a simulation study comparing a new variant of hot-deck imputation with mean imputation for missing values, in which a simple algorithm proposed by Bebbington (Applied Statistics, 1975) for carrying out simple random sampling without replacement is employed to draw repeated random samples efficiently. A simulated example of drawing repeated random samples from a regional survey of obesity in school children is used to demonstrate the SAS macro.
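Bebbington's (Applied Statistics, 1975) algorithm is simple enough to restate in a few lines. The original implementation above is a SAS macro; the base-R rendering below, together with a one-line hot-deck imputer for contrast with mean imputation, is our own illustrative translation of the usual description of the algorithm: scan the population once and select unit i with probability (sample slots remaining)/(units remaining).

```r
## Bebbington's (1975) sequential algorithm for simple random sampling
## without replacement: scan units 1..N once, selecting unit i with
## probability (n - selected_so_far) / (N - i + 1).  Illustrative R
## translation; the paper's macro is in SAS.
srswor_bebbington <- function(N, n) {
  stopifnot(n <= N)
  keep <- integer(0)
  for (i in seq_len(N)) {
    if (runif(1) < (n - length(keep)) / (N - i + 1)) keep <- c(keep, i)
    if (length(keep) == n) break
  }
  keep
}

## Hot-deck style imputation: replace each missing value with a randomly
## drawn observed donor value (mean imputation would use mean(x, na.rm = TRUE)).
hot_deck <- function(x) {
  x[is.na(x)] <- sample(x[!is.na(x)], sum(is.na(x)), replace = TRUE)
  x
}

set.seed(4)
srswor_bebbington(N = 20, n = 5)   # e.g. sample 5 of 20 school children
hot_deck(c(1.2, NA, 0.8, NA, 1.5))
```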

7.
8.
Multiple imputation can be a good solution for handling missing data if data are missing at random. However, this assumption is often difficult to verify. We describe an application of multiple imputation that makes this assumption plausible. This procedure requires contacting a random sample of subjects with incomplete data to fill in the missing information, and then adjusting the imputation model to incorporate the new data. Simulations with missing data that were decidedly not missing at random showed, as expected, that the method restored the original beta coefficients, whereas other methods of dealing with missing data failed. Using a dataset with real missing data, we found that different approaches to imputation produced moderately different results. Simulations suggest that filling in 10% of the data that were initially missing is sufficient for imputation in many epidemiologic applications and should produce approximately unbiased results, provided there is a high response on follow-up from the subsample of those with some originally missing data. This response can probably be achieved if this data collection is planned as an initial approach to dealing with the missing data, rather than at later stages, after further attempts that leave only data that are very difficult to complete.

9.
In this paper, we consider fitting semiparametric additive hazards models for case-cohort studies using a multiple imputation approach. In a case-cohort study, main exposure variables are measured only on selected subjects, but other covariates are often available for the whole cohort. We consider this a special case of a covariate that is missing by design. We propose to employ a popular incomplete-data method, multiple imputation, for estimation of the regression parameters in additive hazards models. For the imputation models, an imputation modeling procedure based on rejection sampling is developed. A simpler imputation model that can naturally be applied to a general missing-at-random situation is also considered and compared with the rejection sampling method via extensive simulation studies. In addition, misspecification of the imputation model is investigated. The proposed procedures are illustrated using a cancer data example. Copyright © 2015 John Wiley & Sons, Ltd.

10.
Missing (censored) death times for lung candidates in urgent need of transplant are a signpost of success for allocation policy makers. However, statisticians analyzing these data must properly account for dependent censoring as the sickest patients are removed from the waitlist. Multiple imputation allows the creation of complete data sets that can be used for a variety of standard analyses in this setting. We propose an approach to multiply impute lung candidate outcomes that incorporates (i) time-varying factors predicting removal from the waitlist and (ii) estimates of transplant urgency via restricted mean models. The measures of transplant urgency and benefit for individual patient profiles are discussed in the context of lung allocation score modeling in the USA. Marginal survival estimates in the event that a transplant does not occur are also provided. Simulations suggest that the proposed imputation method gives attractive results when compared with existing methods. Copyright © 2014 John Wiley & Sons, Ltd.

11.
We propose a propensity score-based multiple imputation (MI) method to handle missing data resulting from drop-outs and/or intermittent skipped visits in longitudinal clinical trials with binary responses. The estimation and inferential properties of the proposed method are contrasted via simulation with those of the commonly used complete-case (CC) and generalized estimating equations (GEE) methods. Three key results are noted. First, if data are missing completely at random, MI can be notably more efficient than the CC and GEE methods. Second, with small samples, GEE often fails due to 'convergence problems', but MI is free of that problem. Finally, if the data are missing at random, while the CC and GEE methods yield results with moderate to large bias, MI generally yields results with negligible bias. A numerical example with real data is provided for illustration.

12.
Ruan PK, Gray RJ. Statistics in Medicine 2008;27(27):5709-5724
We describe a non-parametric multiple imputation method that recovers the missing potential censoring information from competing risks failure times for the analysis of cumulative incidence functions. The method can be applied in settings of stratified analyses, time-varying covariates, weighted analysis of case-cohort samples, and clustered survival data analysis, where no currently available methods can be readily implemented. The method uses Kaplan-Meier imputation of the censoring times to form an imputed data set, so that cumulative incidence can be analyzed using techniques and software developed for ordinary right-censored survival data. We discuss the methodology and show, from both simulations and real data examples, that the method yields valid estimates and performs well. The method can be easily implemented with available software and only a minor programming requirement (for the imputation step). It provides a practical, alternative analysis tool for otherwise complicated analyses of the cumulative incidence of competing risks data.
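The central device, drawing a potential censoring time from the Kaplan-Meier estimate of the censoring distribution conditional on it exceeding the subject's observed time, can be sketched as follows. This is a simplified illustration on invented data (single stratum, no covariates, crude tail handling), not the authors' software; in practice several imputed data sets would be generated and analysed.

```r
## Sketch of Kaplan-Meier imputation of potential censoring times for
## subjects failing from a competing cause, so that cumulative incidence can
## be analysed with standard right-censored survival tools.  Simplified and
## illustrative; one imputed data set shown.
library(survival)
set.seed(5)
n  <- 300
t1 <- rexp(n, 0.10)                   # cause-1 failure times
t2 <- rexp(n, 0.05)                   # competing-cause failure times
cc <- rexp(n, 0.08)                   # censoring times
time   <- pmin(t1, t2, cc)
status <- ifelse(cc <= pmin(t1, t2), 0, ifelse(t1 <= t2, 1, 2))  # 0 = censored

## Kaplan-Meier estimate of the censoring distribution G(t) = P(C > t)
km_cens <- survfit(Surv(time, as.numeric(status == 0)) ~ 1)
G <- stepfun(km_cens$time, c(1, km_cens$surv))

## Draw an imputed censoring time from G conditional on C > t (inverse transform)
impute_cens <- function(t) {
  u <- runif(1) * G(t)                       # uniform on (0, G(t))
  later <- km_cens$time[km_cens$surv <= u]
  if (length(later)) min(later) else max(km_cens$time, t)   # crude tail handling
}

## One imputed data set: competing-cause failures get imputed censoring times
imp_time   <- time
imp_status <- as.integer(status == 1)
idx <- which(status == 2)
imp_time[idx] <- vapply(time[idx], impute_cens, numeric(1))
head(data.frame(time, status, imp_time, imp_status))
```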

13.
Multiple imputation (MI) has become popular for analyses with missing data in medical research. The standard implementation of MI is based on the assumption of data being missing at random (MAR). However, for missing data generated by missing-not-at-random (MNAR) mechanisms, MI performed assuming MAR might not be satisfactory. For an incomplete variable in a given data set, its corresponding population marginal distribution might also be available in an external data source. We show how this information can be readily utilised in the imputation model to calibrate inference to the population by incorporating an appropriately calculated offset termed the "calibrated-δ adjustment." We describe the derivation of this offset from the population distribution of the incomplete variable and show how, in applications, it can be used to closely (and often exactly) match the post-imputation distribution to the population level. Through analytic and simulation studies, we show that our proposed calibrated-δ adjustment MI method can give the same inference as standard MI when data are MAR, and can produce more accurate inference under two general MNAR mechanisms. The method is used to impute missing ethnicity data in a type 2 diabetes prevalence case study using UK primary care electronic health records, where it results in scientifically relevant changes in inference for non-White ethnic groups compared with standard MI. Calibrated-δ adjustment MI represents a pragmatic approach for utilising available population-level information in a sensitivity analysis to explore potential departures from the MAR assumption.
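The mechanics can be conveyed with a deliberately simplified sketch: fit the usual MAR imputation model for an incomplete binary variable, solve for a single offset δ that makes the expected post-imputation prevalence match a known population value, and impute with that offset added to the linear predictor. The toy data, the assumed population prevalence of 0.27, and the single-offset, single-imputation shortcut are illustrative assumptions; the paper derives the adjustment properly within the MI framework.

```r
## Sketch of a "calibrated-delta" style offset for imputing an incomplete
## binary variable so that the post-imputation marginal prevalence matches a
## known population value.  Toy setting; all numbers are assumptions.
set.seed(6)
n   <- 2000
age <- rnorm(n, 50, 10)
eth <- rbinom(n, 1, plogis(-1 + 0.02 * (age - 50)))            # incomplete binary variable
obs <- runif(n) < plogis(1 - 0.4 * eth - 0.01 * (age - 50))    # MNAR-type observation
eth_obs <- ifelse(obs, eth, NA)

pop_prev <- 0.27                    # assumed known population prevalence

## Step 1: MAR imputation model fitted to the observed cases
dat0   <- data.frame(eth_obs, age)
fit    <- glm(eth_obs ~ age, family = binomial, data = dat0[obs, ])
lp_mis <- predict(fit, newdata = data.frame(age = age[!obs]), type = "link")

## Step 2: calibrate delta so that the expected overall prevalence
## (observed part + imputed part) matches the population value
target <- function(delta) {
  (sum(eth_obs[obs]) + sum(plogis(lp_mis + delta))) / n - pop_prev
}
delta <- uniroot(target, c(-5, 5))$root

## Step 3: impute with the calibrated offset (one imputation shown;
## repeat with fresh draws for multiple imputation)
eth_imp <- eth_obs
eth_imp[!obs] <- rbinom(sum(!obs), 1, plogis(lp_mis + delta))
c(delta = delta, imputed_prevalence = mean(eth_imp))
```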

14.
Incomplete data are generally a challenge to the analysis of most large studies. The current gold standard to account for missing data is multiple imputation, and more specifically multiple imputation with chained equations (MICE). Numerous studies have been conducted to illustrate the performance of MICE for missing covariate data. The results show that the method works well in various situations. However, less is known about its performance in more complex models, specifically when the outcome is multivariate as in longitudinal studies. In current practice, the multivariate nature of the longitudinal outcome is often neglected in the imputation procedure, or only the baseline outcome is used to impute missing covariates. In this work, we evaluate the performance of MICE using different strategies to include a longitudinal outcome into the imputation models and compare it with a fully Bayesian approach that jointly imputes missing values and estimates the parameters of the longitudinal model. Results from simulation and a real data example show that MICE requires the analyst to correctly specify which components of the longitudinal process need to be included in the imputation models in order to obtain unbiased results. The full Bayesian approach, on the other hand, does not require the analyst to explicitly specify how the longitudinal outcome enters the imputation models. It performed well under different scenarios. Copyright © 2016 John Wiley & Sons, Ltd.

15.
Multiple imputation (MI) has increasingly received attention as a flexible tool to resolve missing data problems in both observational and controlled studies. Our goal has been to develop a valid and efficient MI procedure for the Diabetes Prediction and Prevention Nutrition Study, in which the diet of a cohort of newborn children with HLA-DQB1-conferred susceptibility to type 1 diabetes is repeatedly measured by 3-day food records over early childhood. The estimation of risk is based on a nested case-control design set up within the cohort. We have used an iterative procedure known as the fully conditional specification (FCS) to generate appropriate values for the missing dietary data, here playing the role of time-dependent covariates. Our method extends the standard FCS to repeated measurements settings with the possibility of non-monotone missingness patterns by being doubly iterative over the follow-up time of the individuals. In addition, our proposed procedure is nonparametric in the sense that the variables can have distributions deviating strongly from normality: it makes use of quantile normal scores to transform to normality, performs imputations, and transforms back to the original scale. By the use of a moving time window and stepwise regression procedures, the two-fold FCS method operates well with a large number of variables, each measured repeatedly over time. Extensive simulation studies demonstrate that the procedure, together with the proposed transformations and variable selection methods, provides tools for valid and efficient statistical inference in the nested case-control setting, and its applications extend beyond that. Copyright © 2009 John Wiley & Sons, Ltd.
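One ingredient that is easy to show in isolation is the quantile normal-scores step: map observed values to standard-normal quantiles via their ranks, impute on that approximately normal scale, and map results back through the empirical quantiles. The pair of helpers below is a bare-bones sketch of that transform with invented data; it is not the two-fold FCS procedure itself.

```r
## Bare-bones sketch of the quantile normal-scores transform used so that
## strongly non-normal variables can be imputed on an (approximately) normal
## scale and then mapped back.  Not the two-fold FCS procedure itself.
to_normal_scores <- function(x) {
  ok <- !is.na(x)
  z  <- rep(NA_real_, length(x))
  z[ok] <- qnorm(rank(x[ok]) / (sum(ok) + 1))   # Van der Waerden style scores
  z
}
from_normal_scores <- function(z, x_obs) {
  ## map normal-scale values back via the empirical quantiles of observed data
  quantile(x_obs, probs = pnorm(z), type = 8, names = FALSE)
}

set.seed(7)
x      <- rexp(100)                  # a skewed dietary-intake style variable
z      <- to_normal_scores(x)
x_back <- from_normal_scores(z, x)   # round-trips approximately to the original
summary(x - x_back)
```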

16.
Reproductive hormone levels are highly variable among premenopausal women during the menstrual cycle. Accurate timing of hormone measurement is essential, especially when investigating day- or phase-specific effects. The BioCycle Study used daily urine home fertility monitors to help detect the luteinising hormone (LH) surge in order to schedule visits within biologically relevant windows of hormonal variability. However, as the LH surge is brief and cycles vary in length, relevant hormonal changes may not align with scheduled visits even when fertility monitors are used. Using the monitor data, measurements were reclassified according to the biological phase of the menstrual cycle into more accurate cycle-phase categories. Longitudinal multiple imputation methods were applied after reclassification if no visit occurred during a given menstrual cycle phase. Reclassified cycles had more clearly defined hormonal profiles, with higher mean peak hormone levels (up to 141%) and reduced variability (up to 71%). We demonstrate the importance of realigning visits to biologically relevant windows when assessing phase- or day-specific effects and the feasibility of applying longitudinal multiple imputation methods. Our method has applications in settings where missing data may occur over time, where daily blood sampling for hormonal measurements is not feasible, and in other areas where timing is essential.

17.
Estimating the infection rate of schistosomiasis japonica using multiple imputation
Objective: To evaluate the stool-examination infection rate of schistosomiasis at surveillance sites. Methods: One site was randomly selected from the 12 national surveillance sites at which residents are first screened by serological testing and then confirmed by stool examination. A standardized questionnaire covering factors closely related to schistosome infection was administered to residents, and multiple imputation was used to estimate the stool-examination infection rate. Results: The stool-examination infection rate among residents at the site was about 20%, and roughly 8% of the 415 residents who were negative on the indirect hemagglutination assay (IHA) had been missed. Conclusion: The stool-examination infection rate of schistosomiasis among residents was underestimated by about 5%.

18.
19.
20.
Objective: Missing data are a pervasive problem, often leading to bias in complete records analysis (CRA). Multiple imputation (MI) via chained equations is one solution, but its use in the presence of interactions is not straightforward. Study Design and Setting: We simulated data with outcome Y dependent on binary explanatory variables X and Z and their interaction XZ. Six scenarios were simulated (Y continuous and binary, each with no interaction, a weak interaction, and a strong interaction), under five missing data mechanisms. We use directed acyclic graphs to identify when CRA and MI would each be unbiased. We evaluate the performance of CRA, MI without interactions, MI including all interactions, and stratified imputation. We also illustrate these methods using a simple example from the National Child Development Study (NCDS). Results: MI excluding interactions is invalid and resulted in biased estimates and low coverage. When the XZ interaction was zero, MI excluding interactions gave unbiased estimates but overcoverage. MI including interactions and stratified MI gave equivalent, valid inference in all cases. In the NCDS example, MI excluding interactions incorrectly concluded there was no evidence for an important interaction. Conclusions: Epidemiologists carrying out MI should ensure that their imputation model(s) are compatible with their analysis model.
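A hedged sketch of one of the valid strategies identified above, imputing separately within strata of Z so that the imputation model is compatible with an analysis model containing an X-by-Z interaction, is given below, with Rubin's rules applied by hand to the interaction coefficient. The simulated data, effect sizes, and missingness mechanism are invented and are unrelated to the NCDS example.

```r
## Sketch of stratified imputation: impute within levels of Z so the
## imputation model respects the X-by-Z interaction in the analysis model.
## Rubin's rules are applied by hand to keep the example self-contained.
library(mice)
set.seed(8)
n <- 1000
x <- rbinom(n, 1, 0.5); z <- rbinom(n, 1, 0.5)
y <- 1 + 0.5 * x + 0.5 * z + 1.0 * x * z + rnorm(n)   # strong interaction
y[runif(n) < plogis(-1 + x)] <- NA                    # missingness depends on x (MAR)
dat <- data.frame(y = y, x = x, z = z)

m <- 10
## Impute within each stratum of z (z is constant within a stratum, so it is
## dropped from the imputation data), then stack the completed data sets.
strata   <- split(dat, dat$z)
imp_list <- lapply(strata, function(d)
  mice(d[, c("y", "x")], m = m, method = "norm", printFlag = FALSE))

fits <- lapply(seq_len(m), function(k) {
  completed <- do.call(rbind, Map(function(im, d) cbind(complete(im, k), z = d$z),
                                  imp_list, strata))
  lm(y ~ x * z, data = completed)
})

## Rubin's rules for the interaction coefficient: pooled estimate and total variance
Q <- sapply(fits, function(f) coef(f)["x:z"])
U <- sapply(fits, function(f) vcov(f)["x:z", "x:z"])
qbar <- mean(Q); B <- var(Q); Tvar <- mean(U) + (1 + 1 / m) * B
c(interaction = qbar, se = sqrt(Tvar))
```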

