Found 20 similar records (search time: 15 ms)
1.
We consider regression analysis of a disease outcome in relation to longitudinal data which are observations from a random effects model. The covariate variables of interest are the values of the underlying trajectory at some time points, which may be fixed or subject-specific. Because the underlying random coefficients are unknown, the covariates in the primary model are generally unobserved. In addition, measurements are often not observed at the time points of interest. A motivating example for our model is the effect of age at adiposity rebound and the associated body mass index on the risk of adult obesity. The adiposity rebound is the time point at which the trajectory of a child's body fatness declines to a minimum. This general error-in-timing problem applies to analyses in which time-dependent marker variables follow a polynomial model and the effect of a local maximum or minimum point is of interest. It can be seen that directly applying estimated covariates, possibly obtained from estimated time points, may lead to bias. Estimation procedures based on expected estimating equations, regression calibration and simulation extrapolation are applied to this problem.
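To make the error-in-timing issue concrete, here is a minimal sketch (not the authors' estimator) of the naive plug-in approach the abstract warns about: fit a child-specific quadratic BMI trajectory and read off the rebound age as its stationary point. All data and names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def naive_rebound(ages, bmi):
    """Fit a child-specific quadratic and plug in its estimated minimum."""
    b2, b1, b0 = np.polyfit(ages, bmi, deg=2)      # highest degree first
    t_star = -b1 / (2.0 * b2)                      # estimated rebound age
    bmi_star = np.polyval([b2, b1, b0], t_star)    # estimated BMI at rebound
    return t_star, bmi_star

# One simulated child: true rebound at age 5.5 years.
ages = np.array([3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
bmi = 15.0 + 0.2 * (ages - 5.5) ** 2 + rng.normal(0.0, 0.3, ages.size)
print(naive_rebound(ages, bmi))
```

The plug-in rebound age and BMI inherit the sampling error of the fitted coefficients, which is the source of the bias that expected estimating equations, regression calibration, and simulation extrapolation aim to correct.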
2.
Randomized clinical trials increasingly collect daily data, frequently using electronic diaries. Such data are usually summarized into an 'intermediate' continuous outcome (such as the mean of the daily values in a period before a scheduled clinic visit). These are in turn often summarized further into a binary outcome, for example, indicating whether the intermediate continuous outcome has improved by a prespecified amount from randomization. This article compares and contrasts statistical approaches for analyzing such binary outcomes when the underlying study is subject to dropout, so that some of the underlying diary data are missing. Such analysis involves rigorous rules for the derivation of outcomes, a thorough data exploration for the selection of covariates, and an elucidation of the missingness mechanism. The investigated statistical methods for treatment-effect analysis are based on direct modeling and on multiple imputation, and are applied to the binary outcome, the intermediate continuous outcome, or the daily diary data. These are compared on the basis of criteria for inferences at prespecified times during follow-up. We show that multiple-imputation methods are particularly well adapted to our context and that imputing the daily diary data, rather than the derived outcomes, makes best use of the available information. The data set which motivated our investigation comes from a placebo-controlled clinical trial to assess the effect on pain of a new compound.
3.
Regression calibration (RC) is a popular method for estimating regression coefficients when one or more continuous explanatory variables, X, are measured with error. In this method, the mismeasured covariate, W, is substituted by the expectation E(X|W), based on the assumption that the error in the measurement of X is non-differential. Using simulations, we compare three versions of RC with two other 'substitution' methods, moment reconstruction (MR) and imputation (IM), neither of which relies on the non-differential error assumption. We investigate studies that have an internal calibration sub-study. For RC, we consider (i) the usual version of RC, (ii) RC applied only to the 'marker' information in the calibration study, and (iii) an 'efficient' version (ERC) in which the estimators (i) and (ii) are combined. Our results show that ERC is preferable when there is non-differential measurement error. Under this condition, there are cases where ERC is less efficient than MR or IM, but such cases rarely occur in epidemiology. We show that the efficiency gain of usual RC and ERC over the other methods can sometimes be dramatic. The usual version of RC carries similar efficiency gains to ERC over MR and IM, but becomes unstable as measurement error becomes large, leading to bias and poor precision. When differential measurement error does pertain, MR and IM have considerably less bias than RC, but can have much larger variance. We demonstrate our findings with an analysis of dietary fat intake and mortality in a large cohort study.
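A minimal sketch of the usual version of RC under classical, non-differential error, assuming an internal calibration sub-study in which both X and W are observed (all values below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n_main, n_cal = 2000, 300

# Calibration sub-study: both the true X and the error-prone W are observed.
x_cal = rng.normal(0, 1, n_cal)
w_cal = x_cal + rng.normal(0, 0.8, n_cal)        # W = X + U, classical error

# Main study: only W and the outcome Y are observed.
x_main = rng.normal(0, 1, n_main)
w_main = x_main + rng.normal(0, 0.8, n_main)
y_main = 1.0 + 2.0 * x_main + rng.normal(0, 1, n_main)

# Step 1: estimate the calibration function E(X | W) from the sub-study.
slope, intercept = np.polyfit(w_cal, x_cal, 1)

# Step 2: substitute calibrated values into the outcome regression.
x_hat = intercept + slope * w_main
beta_naive = np.polyfit(w_main, y_main, 1)[0]    # attenuated toward zero
beta_rc = np.polyfit(x_hat, y_main, 1)[0]        # approximately 2
print(beta_naive, beta_rc)
```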
4.
Data collected in many epidemiological or clinical research studies are often contaminated with measurement errors that may be of classical or Berkson error type. The measurement error may also be a combination of both classical and Berkson errors, and failure to account for both could lead to unreliable inference in many situations. We consider regression analysis in generalized linear models when some covariates are prone to a mixture of Berkson and classical errors, and calibration data are available only for some subjects in a subsample. We propose an expected estimating equation approach to accommodate both errors in generalized linear regression analyses. The proposed method can consistently estimate the classical and Berkson error variances based on the available data, without knowing the mixture percentage. We investigated its finite-sample performance numerically. Our method is illustrated by an application to real data from an HIV vaccine study. Copyright © 2013 John Wiley & Sons, Ltd.
5.
It is known that measurement error leads to bias in assessing exposure effects, which can, however, be corrected if independent replicates are available. For expensive replicates, two-stage (2S) studies that produce data 'missing by design' may be preferred over a single-stage (1S) study, because in the second stage, measurement of replicates is restricted to a sample of first-stage subjects. Motivated by an occupational study on the acute effect of carbon black exposure on respiratory morbidity, we compare the performance of several bias-correction methods for both designs in a simulation study: an instrumental variable method (EVROS IV) based on grouping strategies, which has been recommended especially when measurement error is large; regression calibration; and simulation extrapolation. For the 2S design, the problem of 'missing' data was either ignored or handled by multiple imputation. In both 1S and 2S designs, in the case of small or moderate measurement error, regression calibration was shown to be the preferred approach in terms of root mean square error. For 2S designs, regression calibration as implemented in Stata is not recommended, in contrast to our own implementation of the method; the 'problematic' implementation was, however, substantially improved by the use of multiple imputation. The EVROS IV method, under a good/fairly good grouping, outperforms the regression calibration approach in both design scenarios when exposure mismeasurement is severe. In both 1S and 2S designs with moderate or large measurement error, simulation extrapolation failed badly to correct for the bias. Copyright © 2012 John Wiley & Sons, Ltd.
6.
Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non-normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non-parametric transformation, with imputation on the raw scale and with predictive mean matching (PMM), when imputing non-normal data. We generated data from a range of non-normal distributions, and set 50% of values to be missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero-skewness log, Box–Cox or non-parametric transformation, and using PMM with both type 1 and type 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and its association with a fully observed outcome. We also compared results from these approaches in an analysis of depression and anxiety symptoms in parents of very preterm compared with term-born infants. The results provide novel empirical evidence that the decision regarding how to impute a non-normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear on the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non-linear, it may be important to transform the variable to capture this relationship accurately. A useful alternative is to impute the variable using PMM with type 1 matching. Copyright © 2016 John Wiley & Sons, Ltd.
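A minimal sketch of PMM for a skewed incomplete variable, in the spirit of type 1 matching (donor cases scored with the least-squares fit, recipients with a perturbed parameter draw); this is one imputation only, and a full MI analysis would repeat it M times. All values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def pmm_impute_once(x, y, observed, k=5):
    """One PMM imputation of y given x; 'observed' flags complete cases."""
    X = np.column_stack([np.ones_like(x), x])
    Xo, yo = X[observed], y[observed]
    beta_hat, *_ = np.linalg.lstsq(Xo, yo, rcond=None)
    sigma2 = np.sum((yo - Xo @ beta_hat) ** 2) / (observed.sum() - 2)
    beta_star = rng.multivariate_normal(beta_hat,
                                        sigma2 * np.linalg.inv(Xo.T @ Xo))
    donor_pred = Xo @ beta_hat                   # donors scored with beta_hat
    y_imp = y.copy()
    for i in np.flatnonzero(~observed):
        p = X[i] @ beta_star                     # recipient scored with draw
        donors = np.argsort(np.abs(donor_pred - p))[:k]  # k closest donors
        y_imp[i] = yo[rng.choice(donors)]        # borrow an observed value
    return y_imp

x = rng.normal(size=200)
y = np.exp(1.0 + 0.5 * x + rng.normal(0, 0.3, 200))     # skewed outcome
observed = rng.random(200) > 0.5                         # ~50% MCAR missing
y_completed = pmm_impute_once(x, y, observed)
```

Because imputed values are borrowed from observed donors, PMM never produces values outside the observed range, which is what makes it attractive for non-normal variables.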
7.
Zhiguo Li. Statistics in Medicine 2017, 36(3): 403-415
In sequential multiple assignment randomized trials, longitudinal outcomes may be the most important outcomes of interest, because trials of this type are usually conducted in chronic diseases or conditions. We propose a weighted generalized estimating equation (GEE) approach to analyzing data from such trials for comparing two adaptive treatment strategies based on generalized linear models. Although the randomization probabilities are known, we consider estimated weights, in which the randomization probabilities are replaced by their empirical estimates, and prove that the resulting weighted GEE estimator is more efficient than the estimator with true weights. The variance of the weighted GEE estimator is estimated by an empirical sandwich estimator. The time variable in the model can be linear, piecewise linear, or of a more complicated form. This flexibility is important because, in the adaptive treatment setting, the treatment changes over time and, hence, a single linear trend over the whole period of study may not be realistic. Simulation results show that the weighted GEE estimators of regression coefficients are consistent regardless of the specification of the correlation structure of the longitudinal outcomes. The weighted GEE method is then applied to data from the Clinical Antipsychotic Trials of Intervention Effectiveness.
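A minimal sketch of a weighted GEE fit with empirically estimated inverse-probability weights, using statsmodels. This is illustrative only: a simple two-arm longitudinal setting rather than a full sequential multiple assignment design, and the weight construction is an assumption for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, t = 200, 4
subj = np.repeat(np.arange(n), t)
time = np.tile(np.arange(t), n)
trt = np.repeat(rng.integers(0, 2, n), t)            # randomized arm
u = np.repeat(rng.normal(0, 0.5, n), t)              # subject-level effect
y = 1.0 + 0.5 * time + 0.8 * trt + u + rng.normal(0, 1, n * t)

# Estimated weights: the known randomization probability (0.5) is replaced
# by its empirical estimate from the realized assignments.
p_hat = trt.mean()
w = np.where(trt == 1, 1.0 / p_hat, 1.0 / (1.0 - p_hat))

exog = sm.add_constant(np.column_stack([time, trt]))
gee = sm.GEE(y, exog, groups=subj, weights=w,
             cov_struct=sm.cov_struct.Exchangeable())
res = gee.fit()               # robust (sandwich) covariance is the default
print(res.params, res.bse)
```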
8.
Yizhuo Wang, Susan Murray. Statistics in Medicine 2024, 43(6): 1170-1193
This research introduces a multivariate τ-inflated beta regression (τ-IBR) modeling approach for the analysis of censored recurrent event data that is particularly useful when there is a mixture of (a) individuals who are generally less susceptible to recurrent events and (b) heterogeneity in the duration of event-free periods amongst those who experience events. The modeling approach is applied to a restructured version of the recurrent event data that consists of censored longitudinal times-to-first-event in τ-length follow-up windows that potentially overlap. Multiple imputation (MI) and expectation-solution (ES) approaches appropriate for censored data are developed as part of the model-fitting process. A suite of useful analysis outputs is provided from the τ-IBR model, including parameter estimates that help interpret the mixture of (a) and (b) event times in the data, estimates of mean τ-restricted event-free duration in a τ-length follow-up window based on a patient's covariate profile, and heat maps of raw τ-restricted event-free durations observed in the data, with censored observations augmented via averages across MI datasets. Simulations indicate good statistical performance of the proposed τ-IBR approach to modeling censored recurrent event data. An example is given based on the Azithromycin for Prevention of COPD Exacerbations Trial.
9.
We consider the estimation of the regression of an outcome Y on a covariate X, where X is unobserved, but a variable W that measures X with error is observed. A calibration sample that measures pairs of values of X and W is also available; we consider calibration samples where Y is measured (internal calibration) and where it is not (external calibration). One common approach for measurement error correction is regression calibration (RC), which substitutes the unknown values of X with predictions from the regression of X on W estimated from the calibration sample. An alternative approach is to multiply impute the missing values of X given Y and W based on an imputation model, and then use multiple imputation (MI) combining rules for inferences. Most current work assumes that the measurement error of W has a constant variance, whereas in many situations the variance varies as a function of X. We consider extensions of the RC and MI methods that allow for heteroscedastic measurement error, and compare them by simulation. The MI method is shown to provide better inferences in this setting. We also illustrate the proposed methods using a data set from the BioCycle study.
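A minimal sketch of the MI approach with Rubin's combining rules, assuming an internal calibration sample and, for simplicity, a homoscedastic error in the imputation draw (the paper's extension lets the variance depend on X). A fully "proper" MI would also draw the imputation-model parameters from a posterior; that step is skipped here. All values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, M = 500, 20

x = rng.normal(0, 1, n)
w = x + rng.normal(0, 0.7, n)               # error-prone surrogate for X
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)
internal = rng.random(n) < 0.2              # calibration sample observes X
x_obs = np.where(internal, x, np.nan)

# Imputation model for X given (Y, W), fit on the calibration sample.
Z = np.column_stack([np.ones(n), y, w])
coef, *_ = np.linalg.lstsq(Z[internal], x[internal], rcond=None)
res_sd = (x[internal] - Z[internal] @ coef).std(ddof=3)

betas, variances = [], []
for _ in range(M):
    x_m = x_obs.copy()
    miss = np.isnan(x_m)
    x_m[miss] = Z[miss] @ coef + rng.normal(0, res_sd, miss.sum())
    X = np.column_stack([np.ones(n), x_m])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    betas.append(b[1])
    variances.append(e @ e / (n - 2) * np.linalg.inv(X.T @ X)[1, 1])

# Rubin's rules: total variance = within + (1 + 1/M) * between.
q_bar = np.mean(betas)
T = np.mean(variances) + (1 + 1 / M) * np.var(betas, ddof=1)
print(q_bar, np.sqrt(T))
```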
10.
Jules Brice Tchatchueng Mbougua, Christian Laurent, Ibra Ndoye, Eric Delaporte, Henri Gwet, Nicolas Molinari. Statistics in Medicine 2013, 32(26): 4651-4665
Multiple imputation is commonly used to impute missing covariates in the Cox semiparametric regression setting. It fills in each missing value with a set of plausible values, via a Gibbs sampling procedure, specifying an imputation model for each missing variable. This imputation method is implemented in several software packages that offer imputation models steered by the shape of the variable to be imputed, but all of these imputation models assume that covariate effects are linear. However, this assumption is often not satisfied in practice, as covariates can have nonlinear effects. Such a linearity assumption can lead to misleading conclusions, because the imputation model should be constructed to reflect the true distributional relationship between the missing values and the observed values. To estimate nonlinear effects of continuous time-invariant covariates in the imputation model, we propose a method based on B-spline functions. To assess the performance of this method, we conducted a simulation study in which we compared multiple imputation using a Bayesian spline imputation model with multiple imputation using a Bayesian linear imputation model in a survival analysis setting. We evaluated the proposed method on the motivating data set, collected from HIV-infected patients enrolled in an observational cohort study in Senegal, which contains several incomplete variables. We found that our method performs well in estimating hazard ratios compared with the linear imputation methods when data are missing completely at random or missing at random. Copyright © 2013 John Wiley & Sons, Ltd.
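A minimal sketch of the basic building block: a B-spline basis for a continuous covariate that can replace the linear term in an imputation regression (using patsy's bs(); the Bayesian Cox-model imputation itself is not reproduced here, and the variable name is hypothetical).

```python
import numpy as np
from patsy import dmatrix

rng = np.random.default_rng(5)
age = rng.uniform(20, 70, 300)

# Cubic B-spline basis with 4 degrees of freedom for 'age'; these columns
# replace the single linear 'age' term in the imputation model, so the
# imputed values can reflect a nonlinear covariate effect.
basis = dmatrix("bs(age, df=4, degree=3)", {"age": age},
                return_type="dataframe")
print(basis.shape)   # (300, 5): intercept plus 4 spline columns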
11.
Louise Marston, Janet L. Peacock, Keming Yu, Peter Brocklehurst, Sandra A. Calvert, Anne Greenough, Neil Marlow. Paediatric and Perinatal Epidemiology 2009, 23(4): 380-392
Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, 18% multiple; n = 176, 9% multiple; n = 10,098, 3% multiple; n = 1585, 8% multiple) were analysed.
With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS, 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE, 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except Gauss–Hermite maximum likelihood multilevel modelling (ML GH, 'xtlogit' in Stata), gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling.
We conclude that generalised least squares multilevel modelling (ML GLS, 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE, 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that this is accounted for in the analysis, using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children in which there are few multiples), there appears to be less need to adjust for clustering.
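A minimal sketch of that recommendation, with hypothetical data: an ordinary logistic fit that ignores the clusters, next to a GEE fit of the same model whose standard errors account for the small birth clusters.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
sizes = rng.choice([1, 1, 1, 2, 3], size=400)       # mostly singleton births
cluster = np.repeat(np.arange(sizes.size), sizes)
u = np.repeat(rng.normal(0, 1, sizes.size), sizes)   # shared birth effect
x = rng.normal(size=cluster.size)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * x + u))))

X = sm.add_constant(x)
naive = sm.Logit(y, X).fit(disp=0)                   # ignores the clusters
gee = sm.GEE(y, X, groups=cluster, family=sm.families.Binomial(),
             cov_struct=sm.cov_struct.Exchangeable()).fit()
print(naive.bse)   # unadjusted standard errors
print(gee.bse)     # cluster-adjusted (sandwich) standard errors
```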
12.
Multiple imputation (MI) is one of the most popular methods of dealing with missing data, and its use has been increasing rapidly in medical studies. Although MI is appealing in practice, since ordinary statistical methods can be applied to the completed data set once the missing values are imputed, the method of imputation remains problematic. If the missing values are imputed from some parametric model, the validity of the imputation is not ensured, and the final estimate for a parameter of interest can be biased unless the parametric model is correctly specified. Nonparametric methods have also been proposed for MI, but it is not straightforward to produce imputation values from nonparametrically estimated distributions. In this paper, we propose a new method for MI that yields a consistent (or asymptotically unbiased) final estimate even if the imputation model is misspecified. The key idea is to use an imputation model from which the imputation values are easily produced, and to make a proper correction in the likelihood function after imputation, using as a weight the density ratio between the imputation model and the true conditional density function for the missing variable. Although the conditional density must be nonparametrically estimated, it is not used for the imputation. The performance of our method is evaluated through both theory and simulation studies. A real data analysis is also conducted to illustrate our method, using the Duke Cardiac Catheterization Coronary Artery Disease Diagnostic Dataset.
13.
We propose a flexible model for correlated medical cost data with several appealing features. First, the mean function is partially linear. Second, the distributional form for the response is not specified. Third, the covariance structure of the correlated medical costs has a semiparametric form. We use extended generalized estimating equations to estimate all parameters of interest simultaneously. B-splines are used to estimate unknown functions, and a modification of the Akaike information criterion is proposed for selecting knots in the spline bases. We apply the model to correlated medical costs in the Medical Expenditure Panel Survey dataset. Simulation studies are conducted to assess the performance of our method. Copyright © 2015 John Wiley & Sons, Ltd.
14.
Statistics in Medicine 2017, 36(6): 1014-1028
Breast cancers are clinically heterogeneous based on tumor markers. The National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program provides baseline data on these tumor markers for reporting cancer burden and trends over time in the US general population. These tumor markers, however, are often prone to missing observations. In particular, estrogen receptor (ER) status, a key biomarker in the study of breast cancer, has been collected since 1992 but historically was not well-reported, with missingness rates as high as 25% in early years. Previous methods used to correct estimates of breast cancer incidence or ER-related odds or prevalence ratios for unknown ER status have relied on a missing-at-random (MAR) assumption. In this paper, we explore the sensitivity of these key estimates to departures from MAR. We develop a predictive mean matching procedure that can be used to multiply impute missing ER status under either an MAR or a missing not at random assumption and apply it to the SEER breast cancer data (1992–2012). The imputation procedure uses the predictive power of the rich set of covariates available in the SEER registry while also allowing us to investigate the impact of departures from MAR. We find some differences in inference under the two assumptions, although the magnitude of differences tends to be small. For the types of analyses typically of primary interest, we recommend imputing SEER breast cancer biomarkers under an MAR assumption, given the small apparent differences under MAR and missing not at random assumptions. Copyright © 2016 John Wiley & Sons, Ltd.
15.
Ruth H. Keogh, Pamela A. Shaw, Paul Gustafson, Raymond J. Carroll, Veronika Deffner, Kevin W. Dodd, Helmut Küchenhoff, Janet A. Tooze, Michael P. Wallace, Victor Kipnis, Laurence S. Freedman. Statistics in Medicine 2020, 39(16): 2197-2231
Measurement error and misclassification of variables frequently occur in epidemiology and involve variables important to public health. Their presence can strongly impact the results of statistical analyses involving such variables. However, investigators commonly fail to pay attention to the biases resulting from such mismeasurement. We provide, in two parts, an overview of the types of error that occur, their impacts on analytic results, and statistical methods to mitigate the biases that they cause. In this first part, we review different types of measurement error and misclassification, emphasizing the classical, linear, and Berkson models and the concepts of nondifferential and differential error. We describe the impacts of these types of error in covariates and in outcome variables on various analyses, including estimation and testing in regression models and estimation of distributions. We outline the types of ancillary studies required to provide information about such errors and discuss the implications of covariate measurement error for study design. Methods for ascertaining sample size requirements are outlined, both for ancillary studies designed to provide information about measurement error and for main studies where the exposure of interest is measured with error. We describe two of the simpler methods, regression calibration and simulation extrapolation (SIMEX), that adjust for bias in regression coefficients caused by measurement error in continuous covariates, and illustrate their use through examples drawn from the Observing Protein and Energy Nutrition (OPEN) dietary validation study. Finally, we review software available for implementing these methods. The second part of the article deals with more advanced topics.
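A minimal SIMEX sketch for a linear model, assuming the classical error variance is known (illustrative; the quadratic extrapolant is one common choice, and all values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
n, sigma_u = 1000, 0.7
x = rng.normal(0, 1, n)
w = x + rng.normal(0, sigma_u, n)             # classical measurement error
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)

# Simulation step: add extra error of variance lambda * sigma_u^2 and refit.
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
beta_lam = [
    np.mean([np.polyfit(w + rng.normal(0, np.sqrt(lam) * sigma_u, n), y, 1)[0]
             for _ in range(50)])
    for lam in lambdas
]

# Extrapolation step: send lambda to -1, i.e., zero measurement error.
beta_simex = np.polyval(np.polyfit(lambdas, beta_lam, 2), -1.0)
print(beta_simex)    # close to the true slope of 2
```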
16.
Kaifeng Lu. Statistics in Medicine 2020, 39(27): 4025-4036
The standard multiple imputation technique focuses on parameter estimation. In this study, we describe a method for conducting score tests following multiple imputation. As an important application, we use the Cochran-Mantel-Haenszel (CMH) test as a score test and compare the proposed multiple imputation method with a method based on the Wilson-Hilferty transformation of the CMH statistic. We show that the proposed multiple imputation method preserves the nominal significance level for three types of alternative hypotheses, whereas that based on the Wilson-Hilferty transformation inflates type I error for the “row means differ” and “general association” alternative hypotheses. Moreover, we find that this type I error inflation worsens as the amount of missing data increases.
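For reference, the Wilson-Hilferty transformation maps a chi-square statistic with k degrees of freedom to an approximate standard normal deviate via a cube root; a minimal sketch follows (the MI combining step itself is not reproduced here).

```python
import numpy as np
from scipy import stats

def wilson_hilferty(chi2, df):
    """Cube-root normal approximation to a chi-square statistic."""
    return ((chi2 / df) ** (1 / 3) - (1 - 2 / (9 * df))) / np.sqrt(2 / (9 * df))

chi2 = 7.8
for k in (1, 2, 4):
    z = wilson_hilferty(chi2, k)
    # The normal tail area approximates the exact chi-square p-value.
    print(k, z, stats.norm.sf(z), stats.chi2.sf(chi2, k))
```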
17.
It is common for longitudinal clinical trials to face problems of item non-response, unit non-response, and drop-out. In this paper, we compare two alternative methods of handling multivariate incomplete data across a baseline assessment and three follow-up time points in a multi-centre randomized controlled trial of a disease management programme for late-life depression. One approach combines hot-deck (HD) multiple imputation using a predictive mean matching method for item non-response and the approximate Bayesian bootstrap for unit non-response. A second method is based on a multivariate normal (MVN) model using PROC MI in SAS software V8.2. These two methods are contrasted with a last observation carried forward (LOCF) technique and available-case (AC) analysis in a simulation study where replicate analyses are performed on subsets of the originally complete cases. Missing-data patterns were simulated to be consistent with missing-data patterns found in the originally incomplete cases, and observed complete data means were taken to be the targets of estimation. Not surprisingly, the LOCF and AC methods had poor coverage properties for many of the variables evaluated. Multiple imputation under the MVN model performed well for most variables but produced less than nominal coverage for variables with highly skewed distributions. The HD method consistently produced close to nominal coverage, with interval widths that were roughly 7 per cent larger on average than those produced from the MVN model.
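A minimal sketch of the approximate Bayesian bootstrap used here for unit non-response: for each imputation, resample the respondents with replacement to form a donor pool, then draw the missing units from that pool (illustrative; the paired hot-deck/PMM step for item non-response is omitted, and all values are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(8)

def abb_impute(values, observed, M=10):
    """Approximate Bayesian bootstrap: M completed copies of 'values'."""
    donors = values[observed]
    n_miss = int((~observed).sum())
    completed = []
    for _ in range(M):
        pool = rng.choice(donors, size=donors.size, replace=True)        # step 1
        filled = values.copy()
        filled[~observed] = rng.choice(pool, size=n_miss, replace=True)  # step 2
        completed.append(filled)
    return completed

y = rng.normal(10, 2, 120)
observed = rng.random(120) > 0.3          # ~30% unit non-response
print(np.mean([d.mean() for d in abb_impute(y, observed)]))
```

The initial resampling of donors is what injects the between-imputation variability that a simple hot deck alone would miss.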
18.
Thomas R. Belin, Ming-yi Hu, Alexander S. Young, Oscar Grusky. Health Services & Outcomes Research Methodology 2000, 1(1): 7-22
When data analysis tools require that every variable be observed on each case, then missing items on a subset of variables force investigators either to leave potentially interesting variables out of analysis models or to include these variables but drop incomplete cases from the analysis. For example, in a study considered here, mental health patients were interviewed at two time points about a variety of topics that reflect successful adaptation to outpatient treatment, such as support from family and friends and avoidance of legal problems, although not all patients were successfully interviewed at the second time point. In a previous analysis of these data, logistic regression models were developed to relate baseline patient characteristics and recent treatment cost history to binary outcomes capturing aspects of adaptation. In these models, years of education was omitted as a covariate because it was incompletely observed at baseline. Here, we carry out analyses that include information from partially observed cases. Specifically, we use a multivariate model to produce multiple plausible imputed values for each missing item, and we combine results from separate logistic regression analyses on the completed data sets using the multiple imputation inference technique. Although the majority of inferences about specific regression coefficients paralleled those from the original study, some differences are noted. We discuss the implications of having flexible analysis tools for incomplete data in health services research and comment on issues related to model choice.
19.
Xiaodong Li, Jingchen Liu, Naihua Duan, Huiping Jiang, Ragy Girgis, Jeffrey Lieberman. Statistics in Medicine 2014, 33(12): 2030-2047
Missing data are ubiquitous in longitudinal studies. In this paper, we propose an imputation procedure to handle dropouts in longitudinal studies. By taking advantage of the monotone missing pattern resulting from dropouts, our imputation procedure can be carried out sequentially, which substantially reduces the computational complexity. In addition, at each step of the sequential imputation, we set up a model selection mechanism that chooses between a parametric model and a nonparametric model to impute each missing observation. Unlike usual model selection procedures, which aim to find a single model that fits the entire data set well, our model selection procedure is customized to find a suitable model for the prediction of each missing observation. Copyright © 2014 John Wiley & Sons, Ltd.
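A minimal sketch of sequential imputation under a monotone dropout pattern: impute visit by visit, regressing each visit on the (by then complete) history among subjects still observed. The paper's per-observation choice between a parametric and a nonparametric model is omitted here, and all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(9)
n, T = 300, 4
Y = np.empty((n, T))
Y[:, 0] = rng.normal(0, 1, n)
for t in range(1, T):
    Y[:, t] = 0.8 * Y[:, t - 1] + rng.normal(0, 0.6, n)
dropout = rng.integers(2, T + 1, n)        # first missing visit (T = none)
for i, d in enumerate(dropout):
    Y[i, d:] = np.nan                      # monotone missingness

# Impute visit by visit: regress visit t on the earlier visits among
# subjects still observed at t, then fill in the dropouts (one imputation).
for t in range(1, T):
    obs = ~np.isnan(Y[:, t])
    X = np.column_stack([np.ones(n), Y[:, :t]])   # history complete by now
    beta, *_ = np.linalg.lstsq(X[obs], Y[obs, t], rcond=None)
    sd = (Y[obs, t] - X[obs] @ beta).std(ddof=t + 1)
    miss = ~obs
    Y[miss, t] = X[miss] @ beta + rng.normal(0, sd, int(miss.sum()))
```

Because each visit is imputed before the next one is modeled, every regression at step t sees a complete history, which is exactly what the monotone pattern buys.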
20.
Longitudinal binomial data are frequently generated from multiple questionnaires and assessments in various scientific settings, and such binomial data are often overdispersed. The standard generalized linear mixed effects model may severely underestimate the standard errors of the estimated regression parameters in such cases and hence potentially bias the statistical inference. In this paper, we propose a longitudinal beta-binomial model for overdispersed binomial data and estimate the regression parameters under a probit model using the generalized estimating equation method. A hybrid algorithm combining Fisher scoring and the method of moments is implemented for the computation. Extensive simulation studies are conducted to demonstrate the validity of the proposed method. Finally, the proposed method is applied to analyze functional impairment in subjects at risk of Huntington disease, using data from a multisite observational study of prodromal Huntington disease. Copyright © 2016 John Wiley & Sons, Ltd.
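A minimal sketch of the overdispersion the paper targets: beta-binomial counts have variance m·p·(1−p)·[1 + (m−1)ρ] with ρ = 1/(α+β+1), so a method-of-moments estimate of ρ recovers the extra-binomial variation (illustrative only; this is not the paper's probit-GEE fit, and all values are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(10)
n, m = 500, 10                      # subjects, binomial denominator
a, b = 2.0, 3.0                     # Beta(a, b): mean 0.4, rho = 1/6
p = rng.beta(a, b, n)               # subject-specific success probability
y = rng.binomial(m, p)              # overdispersed binomial counts

mu_hat = y.mean() / m
binom_var = m * mu_hat * (1 - mu_hat)
# Beta-binomial variance: m*mu*(1-mu) * (1 + (m-1)*rho), rho = 1/(a+b+1).
rho_hat = (y.var(ddof=1) / binom_var - 1) / (m - 1)
print(mu_hat, rho_hat)              # rho_hat should be near 1/6 ≈ 0.167
```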