Similar Literature
 20 similar articles found (search time: 31 ms)
1.
Standard measures of crude association in the context of a cross-sectional study are the risk difference, relative risk and odds ratio as derived from a 2×2 table. Most such studies are subject to missing data on disease, exposure, or both, introducing bias into the usual complete-case analysis. We describe several scenarios distinguished by the manner in which missing data arise, and for each we adjust the natural multinomial likelihood to properly account for missing data. The situations presented allow for increasing levels of generality with regard to the missing data mechanism. The final case, quite conceivable in epidemiologic studies, assumes that the probability of missing exposure depends on true exposure and disease status, as well as upon whether disease status is missing (and conversely for the probability of missing disease information). When parameters relating to the missing data process are inestimable without strong assumptions, we propose maximum likelihood analysis subsequent to collecting supplemental data in the spirit of a validation study. Analytical results give insight into the bias inherent in complete-case analysis for each scenario, and numerical results illustrate the performance of likelihood-based point and interval estimates in the most general case. Adjustment for potential confounders via stratified analysis is also discussed.
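A minimal sketch (not from the paper; all counts hypothetical) of the three crude measures from a 2×2 table, plus how complete-case analysis drops records with missing exposure or disease:

```python
import numpy as np

# Hypothetical 2x2 table: rows = exposure (1/0), columns = disease (1/0).
a, b = 40, 160   # exposed: diseased, non-diseased
c, d = 20, 180   # unexposed: diseased, non-diseased

risk1, risk0 = a / (a + b), c / (c + d)
risk_difference = risk1 - risk0            # RD
relative_risk = risk1 / risk0              # RR
odds_ratio = (a * d) / (b * c)             # OR
print(f"RD={risk_difference:.3f}  RR={relative_risk:.2f}  OR={odds_ratio:.2f}")

# Complete-case analysis simply drops records with missing exposure or disease;
# if missingness depends on true exposure/disease status, the retained cell
# counts are distorted and the crude measures above become biased.
records = np.array([[1, 1], [1, 0], [0, 1], [np.nan, 1], [1, np.nan], [0, 0]])
complete_cases = records[~np.isnan(records).any(axis=1)]
print("complete cases retained:", len(complete_cases), "of", len(records))
```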

2.
We studied bias due to missing exposure data in the proportional hazards regression model when using complete-case analysis (CCA). Eleven missing data scenarios were considered: one with missing completely at random (MCAR), four missing at random (MAR), and six non-ignorable missingness scenarios, with a variety of hazard ratios, censoring fractions, missingness fractions and sample sizes. When missingness was MCAR or dependent only on the exposure, there was negligible bias (2-3 per cent) that was similar to the difference between the estimate in the full data set with no missing data and the true parameter. In contrast, substantial bias occurred when missingness was dependent on outcome or both outcome and exposure. For models with hazard ratio of 3.5, a sample size of 400, 20 per cent censoring and 40 per cent missing data, the relative bias for the hazard ratio ranged between 7 per cent and 64 per cent. We observed important differences in the direction and magnitude of biases under the various missing data mechanisms. For example, in scenarios where missingness was associated with longer or shorter follow-up, the biases were notably different, although both mechanisms are MAR. The hazard ratio was underestimated (with larger bias) when missingness was associated with longer follow-up and overestimated (with smaller bias) when associated with shorter follow-up. If it is known that missingness is associated with a less frequently observed outcome or with both the outcome and exposure, CCA may result in an invalid inference and other methods for handling missing data should be considered.
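To make this type of scenario concrete, here is a minimal sketch (assuming the third-party lifelines package and its CoxPHFitter interface) of one simulated dataset in which the exposure is missing more often for subjects with an observed event, followed by a complete-case Cox fit; variable names, hazard ratio and missingness probabilities are illustrative, not taken from the study:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter  # assumed available

rng = np.random.default_rng(1)
n, true_log_hr = 2000, np.log(3.5)

x = rng.binomial(1, 0.5, n)                          # binary exposure
t = rng.exponential(1.0 / np.exp(true_log_hr * x))   # event times under a PH model
c = rng.exponential(2.0, n)                          # censoring times
time, event = np.minimum(t, c), (t <= c).astype(int)
df = pd.DataFrame({"x": x.astype(float), "time": time, "event": event})

# Outcome-dependent missingness in the exposure: subjects with an observed
# event are more likely to have x missing.
p_miss = np.where(event == 1, 0.55, 0.25)
df.loc[rng.random(n) < p_miss, "x"] = np.nan

cca = df.dropna()                                    # complete-case analysis
fit = CoxPHFitter().fit(cca, duration_col="time", event_col="event")
print("true log HR:", round(true_log_hr, 3),
      " complete-case estimate:", round(fit.params_["x"], 3))
```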

3.
OBJECTIVE: To illustrate methods for handling incomplete data in health research. METHODS: Two strategies for handling missing data are presented: complete-case analysis and imputations. The imputations used were mean imputations, regression imputations, and multiple imputations. These strategies are illustrated in the context of logistic regression through an example using data from the "Second Cuban national survey on risk factors and non communicable disease", carried out in 2001. RESULTS: The results obtained via mean and regression imputation were similar. The odds ratios were overestimated by 10%. The results of complete-case analysis showed the greatest difference from the reference odds ratios, with a variation of between 2% and 65%. The three methods distorted the relationship between age and hypertension. Multiple imputation produced estimates closest to the reference estimates, with a variation of less than 16%. This was the only procedure preserving the relationship between age and hypertension. CONCLUSIONS: Selecting methods for handling missing data is difficult, since the same procedure can give precise estimations in certain circumstances and not in others. Complete-case analysis should be used with caution due to the substantial loss of information it produces. Mean and regression imputations produce unreliable estimates under missing at random (MAR) mechanisms.
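A minimal sketch, using statsmodels and hypothetical hypertension-style data, of how complete-case analysis and single mean imputation can give different odds ratios for the same exposure; the variable names and coefficients are illustrative assumptions, and a full multiple-imputation analysis would pool estimates across several imputed datasets (e.g., with statsmodels' MICE utilities and Rubin's rules):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 5000
age = rng.normal(50, 10, n)
x = rng.binomial(1, 0.4, n)                            # risk factor of interest
logit = -4 + 0.05 * age + 0.7 * x
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))          # e.g. hypertension
df = pd.DataFrame({"y": y, "age": age, "x": x.astype(float)})

# MAR mechanism: the risk factor is more often missing in older subjects.
df.loc[rng.random(n) < 1 / (1 + np.exp(-(age - 55) / 5)), "x"] = np.nan

def fit_or(d):
    m = sm.Logit(d["y"], sm.add_constant(d[["age", "x"]])).fit(disp=0)
    return np.exp(m.params["x"])                       # odds ratio for x

cca = df.dropna()                                      # complete-case analysis
mean_imp = df.fillna({"x": df["x"].mean()})            # single mean imputation
print("reference OR:", round(np.exp(0.7), 2))
print("complete-case OR:", round(fit_or(cca), 2))
print("mean-imputation OR:", round(fit_or(mean_imp), 2))
```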

4.
Missing data are common in longitudinal studies and can occur in the exposure of interest. There has been little work assessing the impact of missing data in marginal structural models (MSMs), which are used to estimate the effect of an exposure history on an outcome when time‐dependent confounding is present. We design a series of simulations based on the Framingham Heart Study data set to investigate the impact of missing data in the primary exposure of interest in a complex, realistic setting. We use a standard application of MSMs to estimate the causal odds ratio of a specific activity history on outcome. We report and discuss the results of four missing data methods, under seven possible missing data structures, including scenarios in which an unmeasured variable predicts missing information. In all missing data structures, we found that a complete case analysis, where all subjects with missing exposure data are removed from the analysis, provided the least bias. An analysis that censored individuals at the first occasion of missing exposure and includes a censorship model as well as a propensity model when creating the inverse probability weights also performed well. The presence of an unmeasured predictor of missing data only slightly increased bias, except in the situation in which the exposure had a large impact on missing data and the unmeasured variable had a large impact on both missing data and the outcome. A discussion of the results is provided using causal diagrams, showing the usefulness of drawing such diagrams before conducting an analysis. Copyright © 2009 John Wiley & Sons, Ltd.
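A minimal single-time-point sketch of the stabilized inverse probability weights that underlie an MSM fit, using statsmodels and scikit-learn; in the longitudinal setting the weights are products over visits and are combined with inverse-probability-of-censoring weights when subjects are censored at the first missing exposure. All names and parameter values are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 4000
L = rng.normal(size=n)                                 # confounder (one wave shown)
A = rng.binomial(1, 1 / (1 + np.exp(-L)))              # exposure depends on L
Y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.5 * A + 0.8 * L))))
df = pd.DataFrame({"L": L, "A": A, "Y": Y})

# Stabilized inverse-probability-of-treatment weights:
#   numerator = marginal P(A = a), denominator = P(A = a | L).
p_a = df["A"].mean()
den = sm.Logit(df["A"], sm.add_constant(df[["L"]])).fit(disp=0).predict()
sw = np.where(df["A"] == 1, p_a / den, (1 - p_a) / (1 - den))

# In a full MSM these weights would be multiplied over time and by analogous
# weights from a censorship (missingness) model.
msm = LogisticRegression(C=1e6).fit(df[["A"]], df["Y"], sample_weight=sw)
print("weighted log-odds ratio for A:", round(msm.coef_[0, 0], 2))
```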

5.

Background

Missing data often cause problems in longitudinal cohort studies with repeated follow-up waves. Research in this area has focussed on analyses with missing data in repeated measures of the outcome, from which participants with missing exposure data are typically excluded. We performed a simulation study to compare complete-case analysis with multiple imputation (MI) for dealing with missing data in an analysis of the association between waist circumference, measured at two waves, and the risk of colorectal cancer (a completely observed outcome).

Methods

We generated 1,000 datasets of 41,476 individuals with values of waist circumference at waves 1 and 2 and times to the events of colorectal cancer and death to resemble the distributions of the data from the Melbourne Collaborative Cohort Study. Three proportions of missing data (15%, 30% and 50%) were imposed on waist circumference at wave 2 using three missing data mechanisms: Missing Completely at Random (MCAR) and two covariate-dependent Missing at Random (MAR) scenarios, one realistic and one more extreme. We assessed the impact of missing data on two epidemiological analyses: 1) the association between change in waist circumference between waves 1 and 2 and the risk of colorectal cancer, adjusted for waist circumference at wave 1; and 2) the association between waist circumference at wave 2 and the risk of colorectal cancer, not adjusted for waist circumference at wave 1.

Results

We observed very little bias for complete-case analysis or MI under all missing data scenarios, and the resulting coverage of interval estimates was near the nominal 95% level. MI showed gains in precision when waist circumference was included as a strong auxiliary variable in the imputation model.

Conclusions

This simulation study, based on data from a longitudinal cohort study, demonstrates that there is little gain in performing MI compared to a complete-case analysis in the presence of up to 50% missing data for the exposure of interest when the data are MCAR, or missing dependent on covariates. MI will result in some gain in precision if a strong auxiliary variable that is not in the analysis model is included in the imputation model.
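A minimal sketch of imposing the MCAR and covariate-dependent MAR mechanisms described above on a wave-2 exposure; the variable names (wc1, wc2, age), distributions and missingness models are hypothetical, not the Melbourne Collaborative Cohort Study values:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
n = 10000
wc1 = rng.normal(92, 12, n)                       # waist circumference, wave 1
wc2 = wc1 + rng.normal(1.5, 5, n)                 # waist circumference, wave 2
age = rng.normal(55, 8, n)
df = pd.DataFrame({"wc1": wc1, "wc2": wc2, "age": age})

# MCAR: every subject has the same probability of a missing wave-2 measurement.
mcar = df.copy()
mcar.loc[rng.random(n) < 0.30, "wc2"] = np.nan

# Covariate-dependent MAR: missingness in wc2 depends only on fully observed
# covariates (age and wave-1 waist circumference), not on wc2 itself.
lin = -1.5 + 0.03 * (age - 55) + 0.02 * (wc1 - 92)
mar = df.copy()
mar.loc[rng.random(n) < 1 / (1 + np.exp(-lin)), "wc2"] = np.nan

print("MCAR missing %:", round(mcar["wc2"].isna().mean() * 100, 1))
print("MAR  missing %:", round(mar["wc2"].isna().mean() * 100, 1))
# A complete-case Cox model and an MI analysis (with wc1 as a strong auxiliary
# variable in the imputation model) could then be compared on these datasets.
```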

6.
In case–control studies, it is common for a categorical exposure variable to be misclassified. It is also common for exposure status to be informatively missing for some individuals, in that the probability of missingness may be related to exposure. Procedures for addressing the bias due to misclassification via validation data have been extensively studied, and related methods have been proposed for dealing with informative missingness based on supplemental sampling of some of those with missing data. In this paper, we introduce study designs and analytic procedures for dealing with both problems simultaneously in a 2×2 analysis. Results based on convergence in probability illustrate that the combined effects of missingness and misclassification, even when the latter is non-differential, can lead to naïve exposure odds ratio estimates that are inflated or on the wrong side of the null. The motivating example comes from a case–control study of the association between low birth weight and the diagnosis of breast cancer later in life, where self-reported birth weight for some women is supplemented by accurate information from birth certificates. Copyright © 2006 John Wiley & Sons, Ltd.
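A minimal numeric sketch of the kind of convergence-in-probability calculation described above: given assumed true exposure prevalences in cases and controls, non-differential sensitivity and specificity, and exposure-dependent observation probabilities, it computes the large-sample complete-case (naïve) odds ratio. All probabilities are hypothetical:

```python
# True exposure prevalences among cases and controls (hypothetical values).
p_exp_case, p_exp_ctrl = 0.30, 0.20
true_or = (p_exp_case / (1 - p_exp_case)) / (p_exp_ctrl / (1 - p_exp_ctrl))

# Non-differential misclassification of reported exposure.
sens, spec = 0.85, 0.90
# Informative missingness: probability that exposure is observed at all,
# depending on true exposure status (taken equal in cases and controls here).
p_obs_exposed, p_obs_unexposed = 0.60, 0.90

def observed_odds(p_exp):
    """Large-sample odds of classified-exposed among subjects with observed data."""
    obs_exposed = p_exp * p_obs_exposed
    obs_unexposed = (1 - p_exp) * p_obs_unexposed
    classified_exposed = obs_exposed * sens + obs_unexposed * (1 - spec)
    classified_unexposed = obs_exposed * (1 - sens) + obs_unexposed * spec
    return classified_exposed / classified_unexposed

naive_or = observed_odds(p_exp_case) / observed_odds(p_exp_ctrl)
print("true OR:", round(true_or, 2), " naive complete-case OR:", round(naive_or, 2))
```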

7.
Missing covariate data are common in observational studies of time to an event, especially when covariates are repeatedly measured over time. Failure to account for the missing data can lead to bias or loss of efficiency, especially when the data are non-ignorably missing. Previous work has focused on the case of fixed covariates rather than those that are repeatedly measured over the follow-up period; hence, we present here a selection model that allows for proportional hazards regression with time-varying covariates when some covariates may be non-ignorably missing. We develop a fully Bayesian model and obtain posterior estimates of the parameters via the Gibbs sampler in WinBUGS. We illustrate our model with an analysis of post-diagnosis weight change and survival after breast cancer diagnosis in the Long Island Breast Cancer Study Project follow-up study. Our results indicate that post-diagnosis weight gain is associated with lower all-cause and breast cancer-specific survival among women diagnosed with new primary breast cancer. Our sensitivity analysis showed only slight differences between models with different assumptions on the missing data mechanism, yet the complete-case analysis yielded markedly different results.

8.
Longitudinal studies with repeated measures are often subject to non-response. Methods currently employed to alleviate the difficulties caused by missing data are typically unsatisfactory, especially when the cause of the missingness is related to the outcomes. We present an approach for incomplete categorical data in the repeated measures setting that allows missing data to depend on other observed outcomes for a study subject. The proposed methodology also allows a broader examination of study findings through interpretation of results in the framework of the set of all possible test statistics that might have been observed had no data been missing. The proposed approach consists of the following general steps. First, we generate all possible sets of missing values and form a set of possible complete data sets. We then weight each data set according to clearly defined assumptions and apply an appropriate statistical test procedure to each data set, combining the results to give an overall indication of significance. We make use of the EM algorithm and a Bayesian prior in this approach. While not restricted to the one-sample case, the proposed methodology is illustrated for one-sample data and compared to the common complete-case and available-case analysis methods.
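A simplified sketch of the enumerate-weight-combine idea for one-sample binary data: every completion of the missing values is generated, weighted under an assumed probability for a missing response (a fixed prior here, rather than the EM/Bayesian weighting used in the paper), and an exact test is applied to each completed dataset. The data values are hypothetical and scipy is assumed available:

```python
import itertools
from scipy import stats

# One-sample binary responses; None marks a missing value (hypothetical data).
y = [1, 1, 0, 1, None, 1, 0, None, 1, 1]
missing_idx = [i for i, v in enumerate(y) if v is None]

# Assumed probability that a missing response would have been 1; this is the
# missing-data assumption that can be varied in sensitivity analyses.
p_one = 0.5

results = []
for fill in itertools.product([0, 1], repeat=len(missing_idx)):
    complete = list(y)
    for idx, val in zip(missing_idx, fill):
        complete[idx] = val
    weight = 1.0
    for val in fill:                      # weight of this completed dataset
        weight *= p_one if val == 1 else 1 - p_one
    # Exact binomial test of H0: P(response = 1) = 0.5 on the completed data.
    pval = stats.binomtest(sum(complete), len(complete), 0.5).pvalue
    results.append((weight, pval))

overall = sum(w * p for w, p in results)  # weighted combination across completions
print("weighted overall p-value:", round(overall, 3))
```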

9.
The purpose of this paper is to raise awareness of missing data when concentration indices are used to evaluate health-related inequality. Concentration indices are most commonly calculated using individual-level survey data. Incomplete data are a pervasive problem faced by most applied researchers who use survey data. The default analysis method in most statistical software packages is complete-case analysis, which excludes any case in which any variable is missing. If the data are not missing completely at random, the calculated concentration indices are likely to be biased, which may lead to inappropriate policy recommendations. In this paper, I use both a case study and a simulation study to show how complete-case analysis may lead to biases in the estimation of concentration indices. A possible solution to correct such biases is proposed.
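A minimal sketch of the standard concentration index (twice the covariance between health and the fractional income rank, divided by mean health), computed on full data and on complete cases when missingness is related to income; the data are simulated and entirely hypothetical:

```python
import numpy as np
import pandas as pd

def concentration_index(health, income):
    """Standard concentration index: 2 * cov(health, income rank) / mean(health)."""
    rank = pd.Series(income).rank(method="average").values / len(income)
    h = np.asarray(health, dtype=float)
    return 2 * np.cov(h, rank, bias=True)[0, 1] / h.mean()

rng = np.random.default_rng(5)
n = 5000
income = rng.lognormal(10, 0.5, n)
income_rank = pd.Series(income).rank().values / n          # fractional income rank
health = 0.3 + 0.4 * income_rank + rng.normal(0, 0.1, n)   # richer -> healthier

full_ci = concentration_index(health, income)

# Missingness related to income: lower-income respondents more often skip the
# health item, so complete-case analysis changes who contributes to the index.
observed = rng.random(n) < 0.3 + 0.7 * income_rank
cc_ci = concentration_index(health[observed], income[observed])
print("full-data CI:", round(full_ci, 3), " complete-case CI:", round(cc_ci, 3))
```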

10.
BACKGROUND: Most systematic reviewers aim to perform an intention-to-treat meta-analysis, including all randomized participants from each trial. This is not straightforward in practice: reviewers must decide how to handle missing outcome data in the contributing trials. OBJECTIVE: To investigate methods of allowing for uncertainty due to missing data in a meta-analysis. STUDY DESIGN AND SETTING: The Cochrane Library was surveyed to assess current use of imputation methods. We developed a methodology for incorporating uncertainty, with weights assigned to trials based on uncertainty interval widths. The uncertainty interval for a trial incorporates both sampling error and the potential impact of missing data. We evaluated the performance of this method using simulated data. RESULTS: The survey showed that complete-case analysis is commonly considered alongside best-worst case analysis. Best-worst case analysis gives an interval for the treatment effect that includes all of the uncertainty due to missing data. Unless there are few missing data, this interval is very wide. Simulations show that the uncertainty method consistently has better power and narrower interval widths than best-worst case analysis. CONCLUSION: The uncertainty method performs consistently better than best-worst case imputation and should be considered along with complete-case analysis whenever missing data are a concern.
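A minimal arithmetic sketch of a best-worst case interval for one trial with missing binary outcomes; the counts are hypothetical, and the uncertainty method described above would down-weight trials with wide intervals rather than carry the full best-worst range into the meta-analysis:

```python
# One trial with a binary outcome (hypothetical counts):
# events / analysed / randomized per arm; the gap is missing outcome data.
treat = dict(events=30, analysed=90, randomized=100)
ctrl = dict(events=40, analysed=85, randomized=100)

def risk(events, n):
    return events / n

# Complete-case risk difference.
cc_rd = risk(treat["events"], treat["analysed"]) - risk(ctrl["events"], ctrl["analysed"])

# Best case for treatment: no missing treated patient had an event and every
# missing control patient did; the worst case reverses the assumption.
best_rd = (risk(treat["events"], treat["randomized"])
           - risk(ctrl["events"] + ctrl["randomized"] - ctrl["analysed"], ctrl["randomized"]))
worst_rd = (risk(treat["events"] + treat["randomized"] - treat["analysed"], treat["randomized"])
            - risk(ctrl["events"], ctrl["randomized"]))

print("complete-case RD:", round(cc_rd, 3))
print("best-worst case interval:", round(best_rd, 3), "to", round(worst_rd, 3))
# The width of this interval reflects the extra uncertainty due to missing data.
```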

11.
Many clinical or prevention studies involve missing or censored outcomes. Maximum likelihood (ML) methods provide a conceptually straightforward approach to estimation when the outcome is partially missing. Methods of implementing ML methods range from the simple to the complex, depending on the type of data and the missing-data mechanism. Simple ML methods for ignorable missing-data mechanisms (when data are missing at random) include complete-case analysis, complete-case analysis with covariate adjustment, survival analysis with covariate adjustment, and analysis via propensity-to-be-missing scores. More complex ML methods for ignorable missing-data mechanisms include the analysis of longitudinal dropouts via a marginal model for continuous data or a conditional model for categorical data. A moderately complex ML method for categorical data with a saturated model and either ignorable or nonignorable missing-data mechanisms is a perfect fit analysis, an algebraic method involving closed-form estimates and variances. A complex and flexible ML method with categorical data and either ignorable or nonignorable missing-data mechanisms is the method of composite linear models, a matrix method requiring specialized software. Except for the method of composite linear models, which can involve challenging matrix specifications, the implementation of these ML methods ranges in difficulty from easy to moderate.

12.
The treatment of missing data in comparative effectiveness studies with right-censored outcomes and time-varying covariates is challenging because of the multilevel structure of the data. In particular, the performance of an accessible method like multiple imputation (MI) under an imputation model that ignores the multilevel structure is unknown and has not been compared to complete-case (CC) and single imputation methods that are most commonly applied in this context. Through an extensive simulation study, we compared statistical properties among CC analysis, last value carried forward, mean imputation, the use of missing indicators, and MI-based approaches with and without auxiliary variables under an extended Cox model when the interest lies in characterizing relationships between non-missing time-varying exposures and right-censored outcomes. MI demonstrated favorable properties under a moderate missing-at-random condition (absolute bias <0.1) and outperformed CC and single imputation methods, even when the MI method did not account for correlated observations in the imputation model. The performance of MI decreased with increasing complexity such as when the missing data mechanism involved the exposure of interest, but was still preferred over other methods considered and performed well in the presence of strong auxiliary variables. We recommend considering MI that ignores the multilevel structure in the imputation model when data are missing in a time-varying confounder, incorporating variables associated with missingness in the MI models as well as conducting sensitivity analyses across plausible assumptions.

13.
T. Sato, Statistics in Medicine, 1991, 10(7): 1037-1042
Liang gave an extension of the Mantel-Haenszel estimating procedure for a common odds ratio to logistic regression models. It is applicable to case-control studies with multiple exposure levels, which yield K 2×J tables. This paper provides variance and covariance estimators for Liang's estimating functions in the K 2×J tables case, which are consistent under both sparse-data and large-strata asymptotics, and proposes an approximate confidence interval method for the common odds ratios.
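For reference, a minimal sketch of the classical Mantel-Haenszel common odds ratio for K 2×2 strata, the quantity whose extension to K 2×J tables is discussed above; the stratum counts are hypothetical:

```python
import numpy as np

# K stratified 2x2 tables (hypothetical counts): each row is [a, b, c, d],
# where a/b are exposed cases/controls and c/d are unexposed cases/controls.
strata = np.array([
    [12, 20,  8, 40],
    [ 9, 15,  6, 30],
    [20, 35, 10, 50],
], dtype=float)

a, b, c, d = strata.T
n = strata.sum(axis=1)

# Classical Mantel-Haenszel common odds ratio across the K strata; Liang's
# estimating equations and Sato's variance estimators generalize this to
# multiple exposure levels (K 2xJ tables).
or_mh = (a * d / n).sum() / (b * c / n).sum()
print("Mantel-Haenszel common OR:", round(or_mh, 2))
```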

14.
Objective: The missing indicator method (MIM) and complete-case analysis (CC) are frequently used to handle missing confounder data. Using empirical data, we demonstrated the degree and direction of bias in the effect estimate when using these methods compared with multiple imputation (MI). Study Design and Setting: From a cohort study, we selected an exposure (marital status), outcome (depression), and confounders (age, sex, and income). Missing values in "income" were created according to different patterns of missingness: missing values were created completely at random and depending on exposure and outcome values. Percentages of missing values ranged from 2.5% to 30%. Results: When missing values were completely random, MIM gave an overestimation of the odds ratio, whereas CC and MI gave unbiased results. MIM and CC gave under- or overestimations when missing values depended on observed values. Magnitude and direction of bias depended on how the missing values were related to exposure and outcome. Bias increased with increasing percentage of missing values. Conclusion: MIM should not be used in handling missing confounder data because it gives unpredictable bias of the odds ratio even with small percentages of missing values. CC can be used when missing values are completely random, but it results in a loss of statistical power.
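A minimal sketch contrasting complete-case analysis with the missing indicator method for a confounder with values missing completely at random; the variable names mirror the abstract (marital status, depression, income), but the data, coefficients and missingness fraction are simulated assumptions:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 6000
income = rng.normal(0, 1, n)                            # confounder with missing values
married = rng.binomial(1, 1 / (1 + np.exp(-0.8 * income)))           # exposure
depressed = rng.binomial(1, 1 / (1 + np.exp(-(-1 - 0.5 * married + 0.6 * income))))
df = pd.DataFrame({"depressed": depressed, "married": married, "income": income})
df.loc[rng.random(n) < 0.20, "income"] = np.nan         # 20% missing, completely at random

def exposure_or(d, cols):
    m = sm.Logit(d["depressed"], sm.add_constant(d[cols])).fit(disp=0)
    return np.exp(m.params["married"])

# Complete-case analysis: drop subjects with a missing confounder value.
cc_or = exposure_or(df.dropna(), ["married", "income"])

# Missing indicator method: fill the confounder with 0 and add an indicator.
mim = df.assign(income_missing=df["income"].isna().astype(int),
                income=df["income"].fillna(0))
mim_or = exposure_or(mim, ["married", "income", "income_missing"])

print("complete-case OR:", round(cc_or, 2), " missing-indicator OR:", round(mim_or, 2))
```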

15.
Phenotyping, i.e., identification of patients possessing a characteristic of interest, is a fundamental task for research conducted using electronic health records. However, challenges to this task include imperfect sensitivity and specificity of clinical codes and inconsistent availability of more detailed data such as laboratory test results. Despite these challenges, most existing electronic health records–derived phenotypes are rule-based, consisting of a series of Boolean arguments informed by expert knowledge of the disease of interest and its coding. The objective of this paper is to introduce a Bayesian latent phenotyping approach that accounts for imperfect data elements and missing-not-at-random (MNAR) missingness patterns and that can be used when no gold-standard data are available. We conducted simulation studies to compare alternative phenotyping methods under different patterns of missingness and applied these approaches to a cohort of 68 265 children at elevated risk for type 2 diabetes mellitus (T2DM). In simulation studies, the latent class approach had similar sensitivity to a rule-based approach (95.9% vs 91.9%) while substantially improving specificity (99.7% vs 90.8%). In the PEDSnet cohort, we found that biomarkers and clinical codes were strongly associated with latent T2DM status. The latent T2DM class was also strongly predictive of missingness in biomarkers. Glucose was missing in 83.4% of patients (odds ratio for latent T2DM status = 0.52) while hemoglobin A1c was missing in 91.2% (odds ratio for latent T2DM status = 0.03), suggesting MNAR missingness. The latent phenotype approach may substantially improve on rule-based phenotyping.

16.
This paper argues that the use of the odds ratio parameter in epidemiology needs to be considered with a view to the specific study design and the types of exposure and disease data at hand. Frequently, the odds ratio measure is used instead of the risk ratio or the incidence-proportion ratio in cohort studies, or as an estimate for the incidence-density ratio in case-referent studies. Therefore, the analyses of epidemiologic data have produced biased estimates and the presentation of results has been misleading. However, the odds ratio can be relinquished as an effect measure for these study designs, and the application of the case-base sampling approach permits the incidence ratio and difference measures to be estimated without any untenable assumptions. For the Poisson regression, the odds ratio is not a parameter of interest; only the risk or rate ratio and difference are relevant. For the conditional logistic regression in matched case-referent studies, the odds ratio remains useful, but only when it is interpreted as an estimate of the incidence-density ratio. Thus the odds ratio should, in general, give way to the incidence ratio and difference as the measures of choice for exposure effect in epidemiology.
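A short arithmetic illustration of why the odds ratio overstates the incidence-proportion (risk) ratio when the outcome is common; the incidences are hypothetical:

```python
# Hypothetical cumulative incidences in a cohort study: 40% among the exposed
# and 20% among the unexposed.
r1, r0 = 0.40, 0.20
risk_ratio = r1 / r0                                   # 2.00
odds_ratio = (r1 / (1 - r1)) / (r0 / (1 - r0))         # 2.67
print(f"risk ratio = {risk_ratio:.2f}, odds ratio = {odds_ratio:.2f}")
```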

17.
Parental cigarette smoking and the risk of spontaneous abortion.
Although cigarette smoking is often considered a risk factor for spontaneous abortion, the epidemiologic literature is actually inconsistent. Therefore, the authors examined maternal and paternal smoking and maternal passive smoke exposure using data from a large case-control study of spontaneous abortion (626 cases and 1,300 controls) conducted in Santa Clara County, California, in 1986 and 1987. No excess risk of spontaneous abortion was seen in the 1% of women who smoked an average of more than 20 cigarettes per day in the first trimester. Moderate smokers (11-20 cigarettes per day) had a slightly elevated crude odds ratio of 1.3 (95% confidence interval 0.9-1.9), which was close to unity after adjustment for covariates. Paternal smoking showed a slight crude elevation for moderate and heavy smoking, but no association after adjustment. In contrast, maternal exposure to environmental tobacco smoke for 1 hour or more per day was associated with spontaneous abortion, even after adjustment (odds ratio = 1.5, 95% confidence interval 1.2-1.9). For both maternal direct and environmental exposure, the association appeared to be stronger in second-trimester abortions. Several studies have found stronger associations of smoking with late versus early abortions, perhaps reflecting smoking-associated placental insufficiency and fetal hypoxia.

18.
Sick building syndrome (SBS) is an increasingly common health problem for workers in modern office buildings. It is characterized by irritation of mucous membranes and the skin and general malaise. The impact of environmental tobacco smoke (ETS) exposure and overtime work on these symptoms remains unclear. The authors examined these relations using data from a 1998 cross-sectional survey of 1,281 municipal employees who worked in a variety of buildings in a Japanese city. Logistic regression was used to estimate the odds ratio for symptoms typical of SBS while adjusting for potential confounders. Among nonsmokers, the odds ratio for the association between study-defined SBS and 4 hours of ETS exposure per day was 2.7 (95% confidence interval: 1.6, 4.8), and for most symptom categories, odds ratios increased with increasing hours of ETS exposure. Working overtime for 30 or more hours per month was also associated with SBS symptoms, but the crude odds ratio of 3.0 for SBS (95% confidence interval: 1.8, 5.0) was reduced by 21% after adjustment for variables associated with overtime work and by 49% after further adjustment for perceived work overload. These results suggest that both ETS exposure and extensive amounts of overtime work contribute to the development of SBS symptoms and that the association between overtime and SBS can be explained substantially by the work environment and personal lifestyle correlated with overtime.

19.
Background: We previously developed an approach to address the impact of missing participant data in meta-analyses of continuous variables in trials that used the same measurement instrument. We extend this approach to meta-analyses including trials that use different instruments to measure the same construct. Methods: We reviewed the available literature, conducted an iterative consultative process, and developed an approach involving a complete-case analysis complemented by sensitivity analyses that apply a series of increasingly stringent assumptions about results in patients with missing continuous outcome data. Results: Our approach involves choosing the reference measurement instrument; converting scores from different instruments to the units of the reference instrument; developing four successively more stringent imputation strategies for addressing missing participant data; calculating a pooled mean difference for the complete-case analysis and imputation strategies; calculating the proportion of patients who experienced an important treatment effect; and judging the impact of the imputation strategies on the confidence in the estimate of effect. We applied our approach to an example systematic review of respiratory rehabilitation for chronic obstructive pulmonary disease. Conclusions: Our extended approach provides quantitative guidance for addressing missing participant data in systematic reviews of trials using different instruments to measure the same construct.

20.
Epidemiologic research often aims to estimate the association between a binary exposure and a binary outcome, while adjusting for a set of covariates (e.g., confounders). When data are clustered, as in, for instance, matched case-control studies and co-twin-control studies, it is common to use conditional logistic regression. In this model, all cluster-constant covariates are absorbed into a cluster-specific intercept, whereas cluster-varying covariates are adjusted for by explicitly adding these as explanatory variables to the model. In this paper, we propose a doubly robust estimator of the exposure-outcome odds ratio in conditional logistic regression models. This estimator protects against bias in the odds ratio estimator due to misspecification of the part of the model that contains the cluster-varying covariates. The doubly robust estimator uses two conditional logistic regression models for the odds ratio, one prospective and one retrospective, and is consistent for the exposure-outcome odds ratio if at least one of these models is correctly specified, not necessarily both. We demonstrate the properties of the proposed method by simulations and by re-analyzing a publicly available dataset from a matched case-control study on induced abortion and infertility.
