Similar Articles (20 results)
1.
Missing covariate values are prevalent in regression applications. While an array of methods has been developed for estimating parameters in regression models with missing covariate data for a variety of response types, minimal attention has been given to validation of the response model and influence diagnostics. Previous research has mainly focused on estimating residuals for observations with missing covariates using expected values, after which specialized techniques are needed to conduct proper inference. We suggest a multiple imputation strategy that allows standard methods for residual analyses to be applied to the imputed data sets or to a stacked data set. We demonstrate the suggested multiple imputation method by analyzing the Sleep in Mammals data in the context of a linear regression model and the New York Social Indicators Status data with a logistic regression model.
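A minimal sketch of the stacked-imputation workflow described above, using scikit-learn's IterativeImputer and statsmodels on simulated data (the variable names, the choice of five imputations, and the diagnostic shown are illustrative, not taken from the paper):

```python
# Sketch: multiple imputation followed by standard residual analysis
# on the stacked data set. A covariate with missing values is imputed
# M times; the analysis model is then fitted to the stacked copies and
# ordinary residual diagnostics are applied.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1 + 2 * x1 - x2 + rng.normal(size=n)
x2[rng.random(n) < 0.3] = np.nan            # covariate values go missing
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

M = 5                                       # number of imputations
cols = ["y", "x1", "x2"]
stacked = []
for m in range(M):
    imp = IterativeImputer(sample_posterior=True, random_state=m)
    filled = pd.DataFrame(imp.fit_transform(df[cols]), columns=cols)
    filled["imputation"] = m
    stacked.append(filled)
stacked = pd.concat(stacked, ignore_index=True)

# Fit the analysis model to the stacked data and inspect residuals
# with the usual tools (here, externally studentized residuals).
fit = sm.OLS(stacked["y"], sm.add_constant(stacked[["x1", "x2"]])).fit()
resid = fit.get_influence().resid_studentized_external
print("largest |studentized residual|:", np.abs(resid).max())
```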

2.
The potential for bias due to misclassification error in regression analysis is well understood by statisticians and epidemiologists. Assuming little or no available data for estimating misclassification probabilities, investigators sometimes seek to gauge the sensitivity of an estimated effect to variations in the assumed values of those probabilities. We present an intuitive and flexible approach to such a sensitivity analysis, assuming an underlying logistic regression model. For outcome misclassification, we argue that a likelihood-based analysis is the cleanest and most preferable approach. In the case of covariate misclassification, we combine observed data on the outcome, the error-prone binary covariate of interest, and other covariates measured without error, together with investigator-supplied values for the sensitivity and specificity parameters, to produce corresponding positive and negative predictive values. These values serve as estimated weights to be used in fitting the model of interest to an appropriately defined expanded data set using standard statistical software. Jackknifing provides a convenient tool for incorporating uncertainty in the estimated weights into valid standard errors to accompany log odds ratio estimates obtained from the sensitivity analysis. Examples illustrate the flexibility of this unified strategy, and simulations suggest that it performs well relative to a maximum likelihood approach carried out via numerical optimization. Copyright © 2010 John Wiley & Sons, Ltd.
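A deliberately simplified sketch of the predictive-value weighting idea for a misclassified binary covariate: here the predictive values are computed marginally, whereas the paper's weights also condition on the outcome and the error-free covariates, and its standard errors come from jackknifing rather than the naive fit below. All data and parameter values are simulated placeholders:

```python
# Sketch: sensitivity analysis for a misclassified binary covariate.
# Investigator-supplied sensitivity (Se) and specificity (Sp) are
# converted to predictive values, which weight an expanded data set
# in a standard logistic regression fit.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
x = rng.binomial(1, 0.3, n)                      # true exposure (unobserved)
y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 1.2 * x))))
se_true, sp_true = 0.85, 0.90
xstar = np.where(x == 1, rng.binomial(1, se_true, n),
                 rng.binomial(1, 1 - sp_true, n))  # error-prone version

Se, Sp = 0.85, 0.90                              # assumed values to vary
p_star = xstar.mean()
pi = (p_star + Sp - 1) / (Se + Sp - 1)           # implied true prevalence
ppv = Se * pi / p_star                           # P(X=1 | X*=1)
npv = Sp * (1 - pi) / (1 - p_star)               # P(X=0 | X*=0)

# Expanded data set: every record appears once with x=1 and once with
# x=0, weighted by the estimated probability of that true value.
df = pd.DataFrame({"y": y, "xstar": xstar})
w1 = np.where(df["xstar"] == 1, ppv, 1 - npv)
expanded = pd.concat([df.assign(x=1, w=w1),
                      df.assign(x=0, w=1 - w1)], ignore_index=True)

fit = sm.GLM(expanded["y"], sm.add_constant(expanded["x"]),
             family=sm.families.Binomial(),
             freq_weights=expanded["w"]).fit()
print("PVW-adjusted log OR estimate:", fit.params["x"])
```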

3.
In a recent paper (Weller EA, Milton DK, Eisen EA, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference 2007; 137: 449-461), the authors discussed fitting logistic regression models when a scalar main explanatory variable is measured with error by several surrogates, that is, a situation with more surrogates than variables measured with error. They compared two methods of adjusting for measurement error that use a regression calibration approximate model as if it were exact. One is the standard regression calibration approach, which consists of substituting an estimated conditional expectation of the true covariate given the observed data into the logistic regression. The other is a novel two-stage approach in which the logistic regression is first fitted to the multiple surrogates and a linear combination of the estimated slopes is then formed as the estimate of interest. Applying estimated asymptotic variances for both methods in a single data set, with some sensitivity analysis, the authors asserted the superiority of their two-stage approach. We investigate this claim in some detail. A troubling aspect of the proposed two-stage method is that, unlike standard regression calibration and a natural form of maximum likelihood, the resulting estimates are not invariant to reparameterization of nuisance parameters in the model. We show, however, that, under the regression calibration approximation, the two-stage method is asymptotically equivalent to a maximum likelihood formulation, and is therefore in theory superior to standard regression calibration. However, our extensive finite-sample simulations in the practically important parameter space where the regression calibration model provides a good approximation failed to uncover such superiority of the two-stage method. We also discuss extensions to different data structures. Copyright © 2012 John Wiley & Sons, Ltd.

4.
It is well known that measurement error in the covariates of regression models generally causes bias in parameter estimates. Correction for such biases requires information concerning the measurement error, which is often in the form of internal validation or replication data. Regression calibration (RC) is a popular approach to correct for covariate measurement error, which involves predicting the true covariate using error-prone measurements. Likelihood methods have previously been proposed as an alternative approach to estimate the parameters in models affected by measurement error, but have been relatively infrequently employed in medical statistics and epidemiology, partly because of computational complexity and concerns regarding robustness to distributional assumptions. We show how a standard random-intercepts model can be used to obtain maximum likelihood (ML) estimates when the outcome model is linear or logistic regression under certain normality assumptions, when internal error-prone replicate measurements are available. Through simulations we show that for linear regression, ML gives more efficient estimates than RC, although the gain is typically small. Furthermore, we show that RC and ML estimates remain consistent even when the normality assumptions are violated. For logistic regression, our implementation of ML is consistent if the true covariate is conditionally normal given the outcome, in contrast to RC. In simulations, this ML estimator showed less bias in situations where RC gives non-negligible biases. Our proposal makes the ML approach to dealing with covariate measurement error more accessible to researchers, which we hope will improve its viability as a useful alternative to methods such as RC. Copyright © 2009 John Wiley & Sons, Ltd.
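For context, a minimal sketch of the regression calibration comparator with two error-prone replicates per subject, assuming classical measurement error (simulated data; the paper's ML alternative via a random-intercepts model is not shown):

```python
# Sketch: classical regression calibration with two replicate
# measurements per subject. The true covariate x is never observed;
# its conditional expectation given the replicate mean is plugged in.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(0.0, 1.0, n)                 # true covariate
w1 = x + rng.normal(0.0, 0.7, n)            # replicate 1
w2 = x + rng.normal(0.0, 0.7, n)            # replicate 2
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n)

wbar = (w1 + w2) / 2
var_u = np.mean((w1 - w2) ** 2) / 2         # within-person error variance
var_x = wbar.var(ddof=1) - var_u / 2        # between minus averaged error
lam = var_x / (var_x + var_u / 2)           # reliability of wbar

x_hat = wbar.mean() + lam * (wbar - wbar.mean())  # E[X | wbar]
naive = sm.OLS(y, sm.add_constant(wbar)).fit()
rc = sm.OLS(y, sm.add_constant(x_hat)).fit()
print("naive slope:", naive.params[1])      # attenuated toward 0
print("RC slope:   ", rc.params[1])         # approximately 0.5
```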

5.
This paper addresses the modelling of missing covariate data with the logistic regression model. The aim of this paper is to evaluate the properties of an efficient score for logistic regression in a two-phase design. Simulation studies show that the efficient score is more efficient than two other pseudo-likelihood methods when the correlation between the missing covariate and its surrogate is high or the sampling proportion is small. These methods are illustrated with data from the National Wilms Tumor Study Group. Results from the example confirm the simulation study findings with the exception that the pseudo-likelihood approach produces more reliable estimates than the weighted pseudo-likelihood approach.

6.
Purpose
Despite the growing popularity of propensity score (PS) methods in ethnic disparities studies, many researchers lack a clear understanding of when to use PS methods in place of conventional regression models. One such scenario is presented here: when the relationship between ethnicity and primary care utilization is both confounded and modified by socioeconomic status. Here, standard regression fails to produce an overall disparity estimate, whereas PS methods can, through the choice of a reference sample (RS) to which the effect estimate is generalized.

Methods
Using data from the National Alcohol Surveys, ethnic disparities between Whites and Hispanics in access to primary care were estimated using PS methods (PS stratification and weighting), standard logistic regression, and the marginal effects from logistic regression models incorporating effect modification.

Results
Whites, Hispanics, and the combined White/Hispanic sample were used separately as the RS. The two strategies utilizing PS generated disparity estimates different from those of standard logistic regression, but similar to the marginal odds ratios from logistic regression with ethnicity-by-covariate interactions included in the model.

Conclusions
When effect modification is present, PS estimates are comparable with marginal estimates from regression models incorporating effect modification. The estimation process requires a priori hypotheses to guide selection of the RS.
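A small sketch of the reference-sample idea on simulated data (not the National Alcohol Surveys): the propensity score weights the comparison group to the covariate distribution of the chosen RS, here the Hispanic sample, so a single overall disparity estimate exists even in the presence of effect modification:

```python
# Sketch: PS weighting with an explicit reference sample (RS).
# With the Hispanic sample as RS, Hispanics keep weight 1 and Whites
# receive PS odds weights, matching them to the Hispanic SES
# distribution before access rates are compared.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 4000
ses = rng.normal(size=n)                         # socioeconomic status
hisp = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.8 * ses))))
# Outcome model with ethnicity-by-SES effect modification:
p_access = 1 / (1 + np.exp(-(0.2 - 0.4 * hisp + 0.5 * ses
                             + 0.3 * hisp * ses)))
access = rng.binomial(1, p_access)
df = pd.DataFrame({"hisp": hisp, "ses": ses, "access": access})

ps = LogisticRegression().fit(df[["ses"]], df["hisp"]).predict_proba(
    df[["ses"]])[:, 1]
w = np.where(df["hisp"] == 1, 1.0, ps / (1 - ps))  # RS = Hispanics

mask_h = df["hisp"] == 1
rate_h = np.average(df.loc[mask_h, "access"], weights=w[mask_h.values])
rate_w = np.average(df.loc[~mask_h, "access"], weights=w[~mask_h.values])
print("PS-weighted disparity (RS = Hispanics):", rate_h - rate_w)
```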

7.
We apply a novel technique to detect significant covariates in linkage analysis using a logistic regression approach. An overall test of linkage is first performed to determine whether there is significant perturbation from the expected 50% sharing under the hypothesis of no linkage; if the overall test is significant, the importance of the individual covariate is assessed. In addition, association analyses were performed. These methods were applied to simulated data from multiple populations, and detected correct marker linkages and associations. No population heterogeneity was detected. These methods have the advantages of using all sib pairs and of providing a formal test for heterogeneity across populations.

8.
Relating time-varying biomarkers of Alzheimer's disease to time-to-event using a Cox model is complicated by the fact that Alzheimer's disease biomarkers are sparsely collected, typically only at study entry; this is problematic since Cox regression with time-varying covariates requires observation of the covariate process at all failure times. The analysis might be simplified by using study entry as the time origin and treating the time-varying covariate measured at study entry as a fixed baseline covariate. In this paper, we first derive conditions under which using an incorrect time origin of study entry results in consistent estimation of regression parameters when the time-varying covariate is continuous and fully observed. We then derive conditions under which treating the time-varying covariate as fixed at study entry results in consistent estimation. We provide methods for estimating the regression parameter when a functional form can be assumed for the time-varying biomarker, which is measured only at study entry. We demonstrate our analytical results in a simulation study and apply our methods to data from the Rush Religious Orders Study and Memory and Aging Project and data from the Alzheimer's Disease Neuroimaging Initiative.

9.
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination.
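A sketch of the resulting power calculation for logistic regression, under assumed inputs: the two group probabilities differ by slope × 2 × SD(covariate) on the log-odds scale, with the common intercept chosen so the overall expected event rate is preserved, after which standard two-sample machinery applies:

```python
# Sketch: power for logistic regression via the approximately
# equivalent two-sample problem -- two equal groups whose log-odds
# differ by slope * 2 * SD(covariate), holding the overall expected
# number of events fixed.
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

beta = 0.4          # slope per unit of the covariate (log odds ratio)
sd_x = 1.0          # SD of the covariate
p_bar = 0.3         # overall response probability
n_total = 400

delta = beta * sd_x  # half the log-odds difference between groups
c = brentq(lambda c: (expit(c - delta) + expit(c + delta)) / 2 - p_bar,
           -10, 10)  # keeps the expected event count unchanged
p1, p2 = expit(c - delta), expit(c + delta)

es = proportion_effectsize(p2, p1)
power = NormalIndPower().power(effect_size=es, nobs1=n_total / 2,
                               alpha=0.05, ratio=1.0)
print(f"p1={p1:.3f}, p2={p2:.3f}, approx power={power:.3f}")
```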

10.
Measurement error is common in epidemiological and biomedical studies. When biomarkers are measured in batches or groups, measurement error is potentially correlated within each batch or group. In regression analysis, most existing methods are not applicable in the presence of batch-specific measurement error in predictors. We propose a robust conditional likelihood approach to account for batch-specific error in predictors when the batch effect is additive and the predominant source of error, which requires no assumptions on the distribution of the measurement error. Although a regression model with batch as a categorical covariate yields the same parameter estimates as the proposed conditional likelihood approach for linear regression, this result does not hold in general for all generalized linear models, in particular, logistic regression. Our simulation studies show that the conditional likelihood approach achieves better finite sample performance than the regression calibration approach or a naive approach without adjustment for measurement error. In the case of logistic regression, our proposed approach is shown to also outperform the regression approach with batch as a categorical covariate. In addition, we also examine a 'hybrid' approach combining the conditional likelihood method and the regression calibration method, which is shown in simulations to achieve good performance in the presence of both batch-specific and measurement-specific errors. We illustrate our method by using data from a colorectal adenoma study. Copyright © 2012 John Wiley & Sons, Ltd.

11.
Methods of estimation and inference about survival distributions based on length-biased samples are well-established. Comparatively little attention has been given to the assessment of covariate effects in the context of length-biased samples, but prevalent cohort studies often have this objective. We show that, like the survival distribution, the covariate distribution from a prevalent cohort study is length-biased, and that this distribution may contain parametric information about covariate effects on the survival time. As a result, a likelihood based on the joint distribution of the survival time and the covariates yields estimates of covariate effects which are at least as efficient as estimates arising from a traditional likelihood which conditions on covariate values in the length-biased sample. We also investigate the empirical bias of estimators arising from a joint likelihood when the population covariate distribution is misspecified. The asymptotic relative efficiencies and empirical biases under model misspecification are assessed for both proportional hazards and accelerated failure time models. The various methods considered are applied in an illustrative analysis of risk factors for death following onset of dementia using data collected in the Canadian Study of Health and Aging.

12.
Population prevalence rates of dementia using stratified sampling have previously been estimated using two methods: standard weighted estimates and a logistic model-based approach. An earlier study described this application of the model-based approach and reported a small computer simulation comparing the performance of this estimator to the standard weighted estimator. In this article we use large-scale computer simulations based on data from the recently completed Kame survey of prevalent dementia in the Japanese-American residents of King County, Washington, to describe the performance of these estimators. We found that the standard weighted estimator was unbiased. This estimator performed well for a sample design with proportional allocation, but performed poorly for a sample design that included large strata that were lightly sampled. The logistic model-based estimator performed consistently well for all sample designs considered in terms of the extent of variability in estimation, although some modest bias was observed.
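A toy sketch contrasting the two estimators on a synthetic roster (the strata, sampling fractions, and risk model are illustrative, not the Kame design): the standard weighted estimator combines stratum prevalences with population shares, while the model-based estimator averages logistic predictions over the full roster:

```python
# Sketch: standard weighted vs. logistic model-based prevalence
# estimation under stratified sampling with disproportionate
# allocation.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
N = 10000
age = rng.uniform(65, 95, N)
p = 1 / (1 + np.exp(-(-9 + 0.1 * age)))      # true dementia risk by age
roster = pd.DataFrame({"age": age, "dem": rng.binomial(1, p),
                       "stratum": pd.cut(age, [64, 75, 85, 96],
                                         labels=False)})

frac = {0: 0.05, 1: 0.10, 2: 0.50}           # disproportionate sampling
sample = pd.concat(g.sample(frac=frac[s], random_state=s)
                   for s, g in roster.groupby("stratum"))

# Standard weighted estimator: stratum prevalences x population shares.
shares = roster["stratum"].value_counts(normalize=True)
p_strat = sample.groupby("stratum")["dem"].mean()
print("weighted estimate:", float((shares * p_strat).sum()))

# Model-based estimator: fit a logistic model to the sample, then
# average predicted probabilities over the full roster.
fit = smf.logit("dem ~ age", data=sample).fit(disp=0)
print("model-based estimate:", fit.predict(roster).mean())
print("true prevalence:", roster["dem"].mean())
```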

13.
Purpose
Propensity scores (PSs), a powerful bias-reduction tool, can balance treatment groups on measured covariates in nonexperimental studies. We demonstrate the use of multiple PS estimation methods to optimize covariate balance.

Methods
We used secondary data from 1292 adults with nonpsychotic major depressive disorder in the Sequenced Treatment Alternatives to Relieve Depression trial (2001-2004). After initial citalopram treatment failed, patient preference influenced assignment to medication augmentation (n = 565) or switch (n = 727). To reduce selection bias, we used boosted classification and regression trees (BCART) and logistic regression iteratively to identify two potentially optimal PSs. We assessed and compared covariate balance.

Results
After iterative selection of interaction terms to minimize imbalance, logistic regression yielded better balance than BCART (average standardized absolute mean difference across 47 covariates: 0.03 vs. 0.08, matching; 0.02 vs. 0.05, weighting).

Conclusions
Comparing multiple PS estimates is a pragmatic way to optimize balance. Logistic regression remains valuable for this purpose. Simulation studies are needed to compare PS models under varying conditions. Such studies should consider more flexible estimation methods, such as logistic models with automated selection of interactions or hybrid models using main-effects logistic regression instead of a constant log-odds as the initial model for BCART.
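A rough sketch of this kind of comparison, with scikit-learn's gradient-boosted trees standing in for BCART (an assumption, not the authors' implementation) and balance summarized as the average absolute standardized mean difference after inverse-probability weighting:

```python
# Sketch: comparing two propensity-score models on covariate balance.
# Balance metric: average |standardized mean difference| across
# covariates after ATE (inverse-probability-of-treatment) weighting.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n, k = 2000, 5
X = rng.normal(size=(n, k))
t = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] * X[:, 2]))))

def avg_abs_smd(X, t, w):
    """Average |standardized mean difference| across columns, with the
    group means weighted and the pooled SD taken unweighted."""
    smds = []
    for j in range(X.shape[1]):
        m1 = np.average(X[t == 1, j], weights=w[t == 1])
        m0 = np.average(X[t == 0, j], weights=w[t == 0])
        s = np.sqrt((X[t == 1, j].var() + X[t == 0, j].var()) / 2)
        smds.append(abs(m1 - m0) / s)
    return np.mean(smds)

for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("boosted trees", GradientBoostingClassifier())]:
    ps = np.clip(model.fit(X, t).predict_proba(X)[:, 1], 0.01, 0.99)
    w = np.where(t == 1, 1 / ps, 1 / (1 - ps))   # ATE (IPT) weights
    print(f"{name:14s} balance: {avg_abs_smd(X, t, w):.3f}")
```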

14.

Objective

To illustrate the use of ensemble tree-based methods (random forest classification [RFC] and bagging) for propensity score estimation and to compare these methods with logistic regression, in the context of evaluating the effect of physical and occupational therapy on preschool motor ability among very low birth weight (VLBW) children.

Data Source

We used secondary data from the Early Childhood Longitudinal Study Birth Cohort (ECLS-B) between 2001 and 2006.

Study Design

We estimated the predicted probability of treatment using tree-based methods and logistic regression (LR). We then modeled the exposure-outcome relation using weighted LR models while considering covariate balance and precision for each propensity score estimation method.

Principal Findings

Among approximately 500 VLBW children, therapy receipt was associated with moderately improved preschool motor ability. Overall, ensemble methods produced the best covariate balance (Mean Squared Difference: 0.03–0.07) and the most precise effect estimates compared to LR (Mean Squared Difference: 0.11). The overall magnitude of the effect estimates was similar between RFC and LR estimation methods.

Conclusion

Propensity score estimation using RFC and bagging produced better covariate balance with increased precision compared to LR. Ensemble methods are a useful alternative to logistic regression to control confounding in observational studies.

15.
Cai B, Small DS, Ten Have TR. Statistics in Medicine 2011; 30(15): 1809-1824
We present closed-form expressions of asymptotic bias for the causal odds ratio from two estimation approaches of instrumental variable logistic regression: (i) the two-stage predictor substitution (2SPS) method and (ii) the two-stage residual inclusion (2SRI) approach. Under the 2SPS approach, the first stage model yields the predicted value of treatment as a function of an instrument and covariates, and in the second stage model for the outcome, this predicted value replaces the observed value of treatment as a covariate. Under the 2SRI approach, the first stage is the same, but the residual term of the first stage regression is included in the second stage regression, retaining the observed treatment as a covariate. Our bias assessment is for a different context from that of Terza (J. Health Econ. 2008; 27(3):531-543), who focused on the causal odds ratio conditional on the unmeasured confounder, whereas we focus on the causal odds ratio among compliers under the principal stratification framework. Our closed-form bias results show that the 2SPS logistic regression generates asymptotically biased estimates of this causal odds ratio when there is no unmeasured confounding and that this bias increases with increasing unmeasured confounding. The 2SRI logistic regression is asymptotically unbiased when there is no unmeasured confounding, but when there is unmeasured confounding, there is bias and it increases with increasing unmeasured confounding. The closed-form bias results provide guidance for using these IV logistic regression methods. Our simulation results are consistent with our closed-form analytic results under different combinations of parameter settings.
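A compact sketch of the two estimators on simulated data with an unmeasured confounder (names and parameter values are illustrative; a linear probability model is used for the first stage for simplicity, and the naive second-stage standard errors are not valid and are omitted):

```python
# Sketch: 2SPS vs. 2SRI instrumental-variable logistic regression.
# 2SPS substitutes the first-stage prediction for treatment;
# 2SRI keeps observed treatment and adds the first-stage residual.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000
z = rng.binomial(1, 0.5, n)                 # instrument
u = rng.normal(size=n)                      # unmeasured confounder
d = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.0 * z + 0.8 * u))))
y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.7 * d + 0.8 * u))))

# First stage (shared): treatment on instrument, linear for simplicity.
Z = sm.add_constant(z.astype(float))
stage1 = sm.OLS(d.astype(float), Z).fit()
d_hat = stage1.fittedvalues
resid = d - d_hat

# 2SPS: replace observed treatment with its first-stage prediction.
X_2sps = sm.add_constant(d_hat)
b_2sps = sm.GLM(y, X_2sps, family=sm.families.Binomial()).fit().params[1]

# 2SRI: keep observed treatment, add the first-stage residual.
X_2sri = sm.add_constant(np.column_stack([d, resid]))
b_2sri = sm.GLM(y, X_2sri, family=sm.families.Binomial()).fit().params[1]
print(f"2SPS log OR: {b_2sps:.3f}   2SRI log OR: {b_2sri:.3f}")
```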

16.
A standard analysis of the Framingham Heart Study data is a generalized person-years approach in which risk factors or covariates are measured every two years, with follow-up between these measurement times to observe the occurrence of events such as cardiovascular disease. Observations over multiple intervals are pooled into a single sample and a logistic regression is employed to relate the risk factors to the occurrence of the event. We show that this pooled logistic regression is close to the time-dependent covariate Cox regression analysis. Numerical examples covering a variety of sample sizes and proportions of events display the closeness of this relationship in situations typical of the Framingham Study. A proof of the relationship and the necessary conditions are given in the Appendix.
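A small sketch of the pooled approach on simulated person-interval data: follow-up is split into discrete intervals, one record per subject-interval is stacked into a single sample, and interval-specific intercepts play the role of the baseline hazard; with rare events per interval the covariate coefficient approximates the Cox log hazard ratio (all names and values are illustrative):

```python
# Sketch: pooled logistic regression as an approximation to Cox
# regression with time-dependent covariates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n, n_intervals = 500, 10
records = []
for i in range(n):
    risk = rng.normal()                        # baseline risk factor
    for j in range(n_intervals):
        x = risk + 0.1 * j + rng.normal(0, 0.2)  # covariate re-measured
        p_event = 1 / (1 + np.exp(-(-4.0 + 0.6 * x)))
        event = rng.binomial(1, p_event)
        records.append({"id": i, "interval": j, "x": x, "event": event})
        if event:                              # follow-up ends at the event
            break
pooled = pd.DataFrame(records)

# Pooled logistic: event indicator per interval regressed on the
# covariate, with interval as a factor standing in for the baseline
# hazard.
fit = smf.logit("event ~ x + C(interval)", data=pooled).fit(disp=0)
print("pooled logistic log-HR estimate:", fit.params["x"])
```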

17.
We discuss Bayesian estimation of a logistic regression model with an unknown threshold limiting value (TLV). In these models it is assumed that a covariate has no effect on the response below a certain unknown TLV. The estimation of these models in a Bayesian context by Markov chain Monte Carlo (MCMC) methods is considered, with focus on the TLV. We extend the model by accounting for measurement error in the covariate. The Bayesian solution is compared with the likelihood solution proposed by Küchenhoff and Carroll using a data set concerning the relationship between dust concentration in the workplace and the occurrence of chronic bronchitis.
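A bare-bones sketch of MCMC for one common TLV formulation, logit P(y = 1 | x) = a + b·max(x − tau, 0), using a random-walk Metropolis sampler; the priors, proposal scales, and data are illustrative, and the paper's extension to covariate measurement error is omitted:

```python
# Sketch: Metropolis sampling for a logistic regression with an
# unknown threshold limiting value tau, below which the covariate
# has no effect on the response.
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(9)
n = 800
x = rng.uniform(0, 10, n)                     # e.g., dust concentration
a0, b0, tau0 = -2.0, 0.8, 4.0
y = rng.binomial(1, expit(a0 + b0 * np.maximum(x - tau0, 0)))

def log_post(a, b, tau):
    if not (x.min() < tau < x.max()):
        return -np.inf                        # flat prior on observed range
    eta = a + b * np.maximum(x - tau, 0)
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    return loglik - (a**2 + b**2) / (2 * 10**2)  # vague normal priors

theta = np.array([0.0, 0.1, 5.0])             # initial (a, b, tau)
lp = log_post(*theta)
draws = []
for it in range(20000):
    prop = theta + rng.normal(0, [0.15, 0.08, 0.25])
    lp_prop = log_post(*prop)
    if np.log(rng.random()) < lp_prop - lp:   # Metropolis accept step
        theta, lp = prop, lp_prop
    if it >= 5000:                            # discard burn-in
        draws.append(theta.copy())
draws = np.array(draws)
print("posterior means (a, b, tau):", draws.mean(axis=0).round(2))
```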

18.
We consider the joint modelling of longitudinal and event time data. The longitudinal data are irregularly collected and the event times are subject to right censoring. Most methods described in the literature are quite complex and do not belong to the standard statistical tools. We propose a more practical approach using Cox regression with time-dependent covariates. Since the longitudinal data are observed irregularly, we have to account for differences in observation frequency between individual patients. Therefore, the time elapsed since the last observation (TEL) is added to the model. TEL and its interaction with the time-dependent covariate show a strong effect on the hazard. The latter indicates that older recordings have less impact than recent recordings. Pros and cons of this methodology are discussed and a simulation study is performed to study the effect of TEL on the hazard. The fitted Cox model serves as a starting point for the prediction of a patient's future events. Our method is applied to a study on chronic myeloid leukaemia (CML) with longitudinal white blood cell counts (WBC) as the time-dependent covariate and the patient's death as the event.
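A sketch of this model using lifelines' CoxTimeVaryingFitter on simulated long-format data: the last observed biomarker value is carried forward, TEL is the time since that observation, and a WBC × TEL interaction captures the decaying impact of older recordings (the data-generating process is an illustrative stand-in for the CML study):

```python
# Sketch: Cox regression with a time-dependent covariate (WBC), the
# time elapsed since its last observation (TEL), and their
# interaction, fitted to long-format (start, stop] records.
import numpy as np
import pandas as pd
from lifelines import CoxTimeVaryingFitter

rng = np.random.default_rng(10)
rows, horizon = [], 5.0
for pid in range(300):
    visits = np.concatenate(([0.0], np.cumsum(rng.exponential(1.0, 8))))
    visits = visits[visits < horizon]         # irregular measurement times
    wbc = 10 + np.cumsum(rng.normal(0, 1, len(visits)))
    grid = np.unique(np.concatenate((visits,
                                     np.arange(0.25, horizon, 0.25),
                                     [horizon])))
    for start, stop in zip(grid[:-1], grid[1:]):
        k = np.searchsorted(visits, start, side="right") - 1
        tel = start - visits[k]               # time since last WBC reading
        z = wbc[k] - 10
        haz = 0.06 * np.exp(0.25 * z - 0.10 * tel * z)  # recency-weighted
        event = int(rng.random() < 1 - np.exp(-haz * (stop - start)))
        rows.append({"id": pid, "start": start, "stop": stop,
                     "wbc": wbc[k], "tel": tel, "wbc_tel": wbc[k] * tel,
                     "event": event})
        if event:
            break
df = pd.DataFrame(rows)

ctv = CoxTimeVaryingFitter()
ctv.fit(df, id_col="id", event_col="event", start_col="start",
        stop_col="stop")
print(ctv.summary[["coef", "p"]])
```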

19.

Background

In molecular epidemiology studies biospecimen data are collected, often with the purpose of evaluating the synergistic effect of a biomarker and another factor on an outcome. Typically, biomarker data are collected on only a proportion of subjects eligible for study, leading to a missing data problem. Missing data methods, however, are not customarily incorporated into analyses. Instead, complete-case (CC) analyses are performed, which can result in biased and inefficient estimates.

Methods

Through simulations, we characterized the performance of CC methods when interaction effects are estimated. We also investigated whether standard multiple imputation (MI) could improve estimation over CC methods when the data are not missing at random (NMAR) and auxiliary information may or may not exist.

Results

CC analyses were shown to result in considerable bias and efficiency loss. While MI reduced bias and increased efficiency over CC methods under specific conditions, it too resulted in biased estimates depending on the strength of the auxiliary data available and the nature of the missingness. In particular, CC performed better than MI when extreme values of the covariate were more likely to be missing, while MI outperformed CC when missingness of the covariate related to both the covariate and outcome. MI always improved performance when strong auxiliary data were available. In a real study, MI estimates of interaction effects were attenuated relative to those from a CC approach.

Conclusions

Our findings suggest the importance of incorporating missing data methods into the analysis. If the data are MAR, standard MI is a reasonable method. Auxiliary variables may make this assumption more reasonable even if the data are NMAR. Under NMAR we emphasize caution when using standard MI and recommend it over CC only when strong auxiliary data are available. MI, with the missing data mechanism specified, is an alternative when the data are NMAR. In all cases, it is recommended to take advantage of MI's ability to account for the uncertainty of these assumptions.

20.
In order to adjust individual-level covariate effects for confounding due to unmeasured neighborhood characteristics, we have recently developed conditional pseudolikelihood methods to estimate the parameters of a proportional odds model for clustered ordinal outcomes with complex survey data. The methods require sampling design joint probabilities for each within-neighborhood pair. In the present article, we develop a similar methodology for a baseline category logit model for clustered multinomial outcomes and for a loglinear model for clustered count outcomes. All of the estimators and asymptotic sampling distributions we present can be conveniently computed using standard logistic regression software for complex survey data, such as sas proc surveylogistic. We demonstrate validity of the methods theoretically and also empirically by using simulations. We apply the new method for clustered multinomial outcomes to data from the 2008 Florida Behavioral Risk Factor Surveillance System survey in order to investigate disparities in frequency of dental cleaning, both unadjusted and adjusted for confounding by neighborhood. Copyright © 2012 John Wiley & Sons, Ltd.
