1.
When using 'intent-to-treat' approaches to compare outcomes between groups in clinical trials, analysts face a decision regarding how to account for missing observations. Most model-based approaches can be summarized as a process whereby the analyst makes assumptions about the distribution of the missing data in an attempt to obtain unbiased estimates that are based on functions of the observed data. Although pointed out by Rubin as often leading to biased estimates of variances, an alternative approach that continues to appear in the applied literature is to use fixed-value imputation of means for missing observations. The purpose of this paper is to provide illustrations of how several fixed-value mean imputation schemes can be formulated in terms of general linear models that characterize the means of distributions of missing observations in terms of the means of the distributions of observed data. We show that several fixed-value imputation strategies will result in estimated intervention effects that correspond to maximum likelihood estimates obtained under analogous assumptions. If the missing data process has been correctly characterized, hypothesis tests based on variances estimated using maximum likelihood techniques asymptotically have the correct size. In contrast, hypothesis tests performed using the uncorrected variance, obtained by applying standard complete-data formulae to singly imputed data, can provide either conservative or anticonservative results. Surprisingly, under several non-ignorable non-response scenarios, maximum likelihood based analyses can yield equivalent hypothesis tests to those obtained when analysing only the observed data.
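To make the variance point above concrete, here is a minimal sketch (hypothetical numbers, Python standard library only) of why fixed-value mean imputation followed by complete-data formulae understates variability:

```python
import statistics

# Hypothetical numbers for illustration only (not from the paper).
observed = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4]   # outcomes actually measured
n_missing = 4                                # participants with no outcome

# Fixed-value mean imputation: fill every missing outcome with the
# observed group mean.
group_mean = statistics.mean(observed)
imputed = observed + [group_mean] * n_missing

# The point estimate of the mean is unchanged, but the 'uncorrected'
# sample variance from the singly imputed data is deflated: imputed
# values add no squared deviation while inflating the denominator.
var_observed = statistics.variance(observed)
var_imputed = statistics.variance(imputed)
print(var_observed, var_imputed)
```

The same sum of squared deviations is divided by a larger degrees-of-freedom term, so tests built on the uncorrected variance can be anticonservative, as the abstract notes.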

2.
Wu H, Wu L. Statistics in Medicine 2002, 21(5): 753-771
Non-linear mixed-effects models are powerful tools for modelling HIV viral dynamics. In AIDS clinical trials, the viral load measurements for each subject are often sparse. In such cases, linearization procedures are usually used for inferences. Under such linearization procedures, however, standard covariate selection methods based on the approximate likelihood, such as the likelihood ratio test, may not be reliable. In order to identify significant host factors for HIV dynamics, in this paper we consider two alternative approaches for covariate selection: one is based on individual non-linear least squares estimates and the other is based on individual empirical Bayes estimates. Our simulation study shows that, if the within-individual data are sparse and the between-individual variation is large, the two alternative covariate selection methods are more reliable than the likelihood ratio test, and the more powerful method based on individual empirical Bayes estimates is especially preferable. We also consider the missing data in covariates. The commonly used missing data methods may lead to misleading results. We recommend a multiple imputation method to handle missing covariates. A real data set from an AIDS clinical trial is analysed based on various covariate selection methods and missing data methods.

3.
We propose the use of centile estimates which are based on the fitting of appropriate densities by maximum likelihood. In the case of cross-sectional centile estimation, we show that this approach will generally lead to more precise estimates than would result from the use of non-parametric centile estimates. When longitudinal data are available or a series of cross-sectional data at different time points, the maximum likelihood approach can be used to simultaneously fit densities to each cross-section, subject to constraints (for example, smoothness constraints) on the parameters. The variances of these centile estimates are readily obtained and missing values and unequally spaced records are easily accommodated. We illustrate the procedure by means of an application using the Johnson family of densities to a study of weight gain in pregnancy.

4.
In evaluating prognostic factors by means of regression models, missing values in the covariate data are a frequent complication. There exist statistical tools to analyse such incomplete data in an efficient manner, and in this paper we make use of the traditional maximum likelihood principle. As well as an analysis including the incompletely measured covariates, such tools also allow further strategies of data analysis. For example, we can use surrogate variables to improve the prediction of missing values or we can try to investigate a questionable 'missing at random' assumption. We discuss these techniques using the example of a clinical study where one important covariate is missing for about half the subjects. Additionally we consider two further issues: evaluation of differences between estimates from a complete case analysis and analyses using all subjects and assessment of the predictive value of missing values. © 1997 by John Wiley & Sons, Ltd.

5.
PURPOSE: We describe the impact that missing data may have on model selection for longitudinal multivariate data. METHODS: Maximum likelihood was used to fit several models to ultrasonographic measurements from the Asymptomatic Carotid Artery Progression Study (ACAPS). Graphical techniques were used to examine evidence concerning the underlying missing data mechanisms associated with each model. RESULTS: Using statistical methodology that addressed missing data substantially increased the statistical efficiency of our analysis of ultrasonographic data. Only complex models that included segment-specific parameterizations for longitudinal correlations appeared to allow missing data to be assumed to occur at random. CONCLUSION: Ignoring the nature of missing data in conducting statistical analyses can have serious consequences when missingness is not rare. It may be necessary to fit models of high dimension with maximum likelihood techniques to address missing data appropriately; however, these approaches may also improve statistical efficiency.

6.
Missing data in longitudinal studies   Cited by 11 (0 self-citations, 11 by others)
When observations are made repeatedly over time on the same experimental units, unbalanced patterns of observations are a common occurrence. This complication makes standard analyses more difficult or inappropriate to implement, means loss of efficiency, and may introduce bias into the results as well. Some possible approaches to dealing with missing data include complete case analyses, univariate analyses with adjustments for variance estimates, two-step analyses, and likelihood based approaches. Likelihood approaches can be further categorized as to whether or not an explicit model is introduced for the non-response mechanism. This paper will review the use of likelihood based analyses for longitudinal data with missing responses, both from the point of view of ease of implementation and appropriateness in view of the non-response mechanism. Models for both measured and dichotomous outcome data will be discussed. The appropriateness of some non-likelihood based analyses is briefly considered.

7.
It is common to have missing genotypes in practical genetic studies, but the exact underlying missing data mechanism is generally unknown to the investigators. Although some statistical methods can handle missing data, they usually assume that genotypes are missing at random, that is, at a given marker, different genotypes and different alleles are missing with the same probability. These include those methods on haplotype frequency estimation and haplotype association analysis. However, it is likely that this simple assumption does not hold in practice, yet few studies to date have examined the magnitude of the effects when this simplifying assumption is violated. In this study, we demonstrate that the violation of this assumption may lead to serious bias in haplotype frequency estimates, and haplotype association analysis based on this assumption can induce both false-positive and false-negative evidence of association. To address this limitation in the current methods, we propose a general missing data model to characterize missing data patterns across a set of two or more markers simultaneously. We prove that haplotype frequencies and missing data probabilities are identifiable if and only if there is linkage disequilibrium between these markers under our general missing data model. Simulation studies on the analysis of haplotypes consisting of two single nucleotide polymorphisms illustrate that our proposed model can reduce the bias both for haplotype frequency estimates and association analysis due to incorrect assumption on the missing data mechanism. Finally, we illustrate the utilities of our method through its application to a real data set.

8.
The recent successes of genome-wide association studies (GWAS) have revealed that many of the replicated findings have explained only a small fraction of the heritability of common diseases. One hypothesis that investigators have suggested is that higher order interactions between SNPs or SNPs and environmental risk factors may account for some of this missing heritability. Searching for these interactions poses great statistical and computational challenges. In this article, we propose a novel method that addresses these challenges by incorporating external biological knowledge into a fully Bayesian analysis. The method is designed to be scalable for high-dimensional search spaces (where it supports interactions of any order) because priors that use such knowledge focus the search in regions that are more biologically plausible and avoid having to enumerate all possible interactions. We provide several examples based on simulated data demonstrating how external information can enhance power, specificity, and effect estimates in comparison to conventional approaches based on maximum likelihood estimates. We also apply the method to data from a GWAS for breast cancer, revealing a set of interactions enriched for the Gene Ontology terms growth, metabolic process, and biological regulation.

9.
Standard measures of crude association in the context of a cross-sectional study are the risk difference, relative risk and odds ratio as derived from a 2×2 table. Most such studies are subject to missing data on disease, exposure, or both, introducing bias into the usual complete-case analysis. We describe several scenarios distinguished by the manner in which missing data arise, and for each we adjust the natural multinomial likelihood to properly account for missing data. The situations presented allow for increasing levels of generality with regard to the missing data mechanism. The final case, quite conceivable in epidemiologic studies, assumes that the probability of missing exposure depends on true exposure and disease status, as well as upon whether disease status is missing (and conversely for the probability of missing disease information). When parameters relating to the missing data process are inestimable without strong assumptions, we propose maximum likelihood analysis subsequent to collecting supplemental data in the spirit of a validation study. Analytical results give insight into the bias inherent in complete-case analysis for each scenario, and numerical results illustrate the performance of likelihood-based point and interval estimates in the most general case. Adjustment for potential confounders via stratified analysis is also discussed.
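As a reminder of the complete-data baseline these adjustments start from, the three crude measures can be computed directly from the cell counts of a 2×2 table; the counts below are hypothetical:

```python
def crude_association(a, b, c, d):
    """Crude association measures from a complete-data 2x2 table.

    Cell counts: a = exposed cases, b = exposed non-cases,
                 c = unexposed cases, d = unexposed non-cases.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "risk_difference": risk_exposed - risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
    }

# Hypothetical counts for illustration.
m = crude_association(30, 70, 10, 90)
print(m)
```

When cells are incomplete, plugging complete-case counts into these formulas is exactly the biased analysis the paper's likelihood adjustments are designed to correct.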

10.
Studies to detect genetic association with disease can be family-based, often using families with multiple affected members, or population based, as in population-based case-control studies. If data on both study types are available from the same population, it is useful to combine them to improve power to detect genetic associations. Two aspects of the data need to be accommodated, the sampling scheme and potential residual correlations among family members. We propose two approaches for combining data from a case-control study and a family study that collected families with multiple cases. In the first approach, we view a family as the sampling unit and specify the joint likelihood for the family members using a two-level mixed effects model to account for random familial effects and for residual genetic correlations among family members. The ascertainment of the families is accommodated by conditioning on the ascertainment event. The individuals in the case-control study are treated as families of size one, and their unconditional likelihood is combined with the conditional likelihood for the families. This approach yields subject specific maximum likelihood estimates of covariate effects. In the second approach, we view an individual as the sampling unit. The sampling scheme is accommodated using two-phase sampling techniques, marginal covariate effects are estimated, and correlations among family members are accounted for in the variance calculations. The models are compared in simulations. Data from a case-control and a family study from north-eastern Italy on melanoma and a low-risk melanoma-susceptibility gene, MC1R, are used to illustrate the approaches.

11.
In studies of older adults, researchers often recruit proxy respondents, such as relatives or caregivers, when study participants cannot provide self-reports (e.g., because of illness). Proxies are usually only sought to report on behalf of participants with missing self-reports; thus, either a participant self-report or proxy report, but not both, is available for each participant. Furthermore, the missing-data mechanism for participant self-reports is not identifiable and may be nonignorable. When exposures are binary and participant self-reports are conceptualized as the gold standard, substituting error-prone proxy reports for missing participant self-reports may produce biased estimates of outcome means. Researchers can handle this data structure by treating the problem as one of misclassification within the stratum of participants with missing self-reports. Most methods for addressing exposure misclassification require validation data, replicate data, or an assumption of nondifferential misclassification; other methods may result in an exposure misclassification model that is incompatible with the analysis model. We propose a model that makes none of the aforementioned requirements and still preserves model compatibility. Two user-specified tuning parameters encode the exposure misclassification model. Two proposed approaches estimate outcome means standardized for (potentially) high-dimensional covariates using multiple imputation followed by propensity score methods. The first method is parametric and uses maximum likelihood to estimate the exposure misclassification model (i.e., the imputation model) and the propensity score model (i.e., the analysis model); the second method is nonparametric and uses boosted classification and regression trees to estimate both models. We apply both methods to a study of elderly hip fracture patients. Copyright © 2014 John Wiley & Sons, Ltd.

12.
For longitudinal binary data with non-monotone non-ignorably missing outcomes over time, a full likelihood approach is complicated algebraically, and with many follow-up times, maximum likelihood estimation can be computationally prohibitive. As alternatives, two pseudo-likelihood approaches have been proposed that use minimal parametric assumptions. One formulation requires specification of the marginal distributions of the outcome and missing data mechanism at each time point, but uses an 'independence working assumption,' i.e. an assumption that observations are independent over time. Another method avoids having to estimate the missing data mechanism by formulating a 'protective estimator.' In simulations, these two estimators can be very inefficient, both for estimating time trends in the first case and for estimating both time-varying and time-stationary effects in the second. In this paper, we propose the use of the optimal weighted combination of these two estimators, and in simulations we show that the optimal weighted combination can be much more efficient than either estimator alone. Finally, the proposed method is used to analyze data from two longitudinal clinical trials of HIV-infected patients. Copyright © 2010 John Wiley & Sons, Ltd.
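The idea of an optimal weighted combination can be sketched with inverse-variance weights. Note that this sketch assumes the two estimators are uncorrelated, whereas the paper's optimal weights would also need to account for their covariance; the function name and numbers are illustrative:

```python
def inverse_variance_combine(est1, var1, est2, var2):
    """Combine two unbiased estimators of the same quantity with
    inverse-variance weights, assuming they are uncorrelated.
    Under that assumption, this weighting minimises the variance
    of the combined estimator."""
    w1 = (1.0 / var1) / (1.0 / var1 + 1.0 / var2)
    combined = w1 * est1 + (1.0 - w1) * est2
    combined_var = 1.0 / (1.0 / var1 + 1.0 / var2)
    return combined, combined_var

# Hypothetical estimates: a noisy estimator and a more precise one.
est, var = inverse_variance_combine(0.50, 0.04, 0.60, 0.01)
print(est, var)
```

The combined variance is never larger than that of either input estimator, which is the sense in which the weighted combination dominates each pseudo-likelihood estimator alone.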

13.
PURPOSE: The aim of this research was to examine, in an exploratory manner, whether cross-sectional multiple imputation generates valid parameter estimates for a latent growth curve model in a longitudinal data set with nonmonotone missingness. METHODS: A simulated longitudinal data set of N = 5000 was generated and consisted of a continuous dependent variable, assessed at three measurement occasions and a categorical time-invariant independent variable. Missing data had a nonmonotone pattern and the proportion of missingness increased from the initial to the final measurement occasion (5%–20%). Three methods were considered to deal with missing data: listwise deletion, full-information maximum likelihood, and multiple imputation. A latent growth curve model was specified and analysis of variance was used to compare parameter estimates between the full data set and missing data approaches. RESULTS: Multiple imputation resulted in significantly lower slope variance compared with the full data set. There were no differences in any parameter estimates between the multiple imputation and full-information maximum likelihood approaches. CONCLUSIONS: This study suggested that in longitudinal studies with nonmonotone missingness, cross-sectional imputation at each time point may be viable and produces estimates comparable with those obtained with full-information maximum likelihood. Future research pursuing the validity of this method is warranted.
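A toy version of cross-sectional multiple imputation, wave by wave, might look like the following. This is a hot-deck sketch with simple pooling of point estimates; the function name and data are hypothetical, and a real analysis would use a proper imputation model and Rubin's rules for the variances:

```python
import random
import statistics

def mi_hotdeck_wave_means(waves, m=20, seed=1):
    """Toy cross-sectional multiple imputation: at each wave, fill each
    missing value (None) with a random draw from that wave's observed
    values (hot-deck), repeat m times, and pool the wave means by
    averaging over the m completed data sets."""
    rng = random.Random(seed)
    pooled = []
    for wave in waves:
        observed = [v for v in wave if v is not None]
        draws = []
        for _ in range(m):
            completed = [v if v is not None else rng.choice(observed)
                         for v in wave]
            draws.append(statistics.mean(completed))
        pooled.append(statistics.mean(draws))
    return pooled

# Two hypothetical waves, each with one missing observation.
means = mi_hotdeck_wave_means([[1.0, 2.0, None], [2.0, None, 4.0]])
print(means)
```

Because each wave is imputed separately, cross-wave associations are not used, which is exactly the simplification whose validity the study above set out to probe.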

14.
We describe a novel Bayesian approach to estimate acquisition and clearance rates for many competing subtypes of a pathogen in a susceptible–infected–susceptible model. The inference relies on repeated measurements of the current status of being a non-carrier (susceptible) or a carrier (infected) of one of the nq > 1 subtypes. We typically collect the measurements with sampling intervals that may not catch the true speed of the underlying dynamics. We tackle the problem of incompletely observed data with Bayesian data augmentation, which integrates over possible carriage histories, allowing the data to contain intermittently missing values, complete dropouts of study subjects, or inclusion of new study subjects during the follow-up. We investigate the performance of the described method through simulations by using two different mixing groups (family and daycare) and different sampling intervals. For comparison, we describe crude maximum likelihood-based estimates derived directly from the observations. We apply the estimation algorithm to data about transmission of Streptococcus pneumonia in Bangladeshi families. The computationally intensive Bayesian approach is a valid method to account for incomplete observations, and we found that it performs generally better than the simple crude method, in particular with large amount of missing data. Copyright © 2012 John Wiley & Sons, Ltd.

15.
When describing longitudinal binary response data, it may be desirable to estimate the cumulative probability of at least one positive response by some time point. For example, in phase I and II human immunodeficiency virus (HIV) vaccine trials, investigators are often interested in the probability of at least one vaccine-induced CD8+ cytotoxic T-lymphocyte (CTL) response to HIV proteins at different times over the course of the trial. In this setting, traditional estimates of the cumulative probabilities have been based on observed proportions. We show that if the missing data mechanism is ignorable, the traditional estimator of the cumulative success probabilities is biased and tends to underestimate a candidate vaccine's ability to induce CTL responses. As an alternative, we propose applying standard optimization techniques to obtain maximum likelihood estimates of the response profiles and, in turn, the cumulative probabilities of interest. Comparisons of the empirical and maximum likelihood estimates are investigated using data from simulations and HIV vaccine trials. We conclude that maximum likelihood offers a more accurate method of estimation, which is especially important in the HIV vaccine setting as cumulative CTL responses will likely be used as a key criterion for large scale efficacy trial qualification.
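If per-visit response probabilities were available (for example, from maximum likelihood fits), the cumulative probability of at least one positive response follows from the complement of "no response at any visit". The sketch below assumes independence across visits, a simplification relative to the paper's full response-profile model; the probabilities are hypothetical:

```python
def cumulative_response_prob(per_visit_probs):
    """P(at least one positive response by the k-th visit), computed as
    1 minus the running product of per-visit non-response probabilities.
    Assumes visits are independent -- a simplification for illustration."""
    cumulative = []
    p_none_so_far = 1.0
    for p in per_visit_probs:
        p_none_so_far *= (1.0 - p)
        cumulative.append(1.0 - p_none_so_far)
    return cumulative

# Hypothetical per-visit response probabilities.
probs = cumulative_response_prob([0.2, 0.3, 0.4])
print(probs)
```

The cumulative probability is non-decreasing over visits, which is why estimating it from raw observed proportions with missing visits tends to understate it.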

16.
Cure models for clustered survival data have the potential for broad applicability. In this paper, we consider the mixture cure model with random effects and propose several estimation methods based on Gaussian quadrature, rejection sampling, and importance sampling to obtain the maximum likelihood estimates of the model for clustered survival data with a cure fraction. The methods are flexible to accommodate various correlation structures. A simulation study demonstrates that the maximum likelihood estimates of parameters in the model tend to have smaller biases and variances than the estimates obtained from the existing methods. We apply the model to a study of tonsil cancer patients clustered by treatment centers to investigate the effect of covariates on the cure rate and on the failure time distribution of the uncured patients. The maximum likelihood estimates of the parameters demonstrate strong correlation among the failure times of the uncured patients and weak correlation among cure statuses in the same center.

17.
A significant source of missing data in longitudinal epidemiologic studies on elderly individuals is death. It is generally believed that these data missing by death are non-ignorable for likelihood-based inference. Inference based on data only from surviving participants in the study may lead to biased results. In this paper we model both the probability of disease and the probability of death using shared random effect parameters. We also propose to use the Laplace approximation for obtaining an approximate likelihood function so that high dimensional integration over the distributions of the random effect parameters is not necessary. Parameter estimates can be obtained by maximizing the approximate log-likelihood function. Data from a longitudinal dementia study will be used to illustrate the approach. A small simulation is conducted to compare parameter estimates from the proposed method to the 'naive' method, in which the missing data are assumed to be missing at random.

18.
In this paper we compare several methods for estimating population disease prevalence from data collected by two-phase sampling when there is non-response at the second phase. The traditional weighting type estimator requires the missing completely at random assumption and may yield biased estimates if the assumption does not hold. We review two approaches and propose one new approach to adjust for non-response assuming that the non-response depends on a set of covariates collected at the first phase: an adjusted weighting type estimator using estimated response probability from a response model; a modelling type estimator using predicted disease probability from a disease model; and a regression type estimator combining the adjusted weighting type estimator and the modelling type estimator. These estimators are illustrated using data from an Alzheimer's disease study in two populations.
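The adjusted weighting-type estimator can be sketched as inverse-probability weighting: each phase-two responder's design weight is divided by an estimated response probability from a response model fitted on phase-one covariates. The function name, weights, and response probabilities below are illustrative, not taken from the study:

```python
def adjusted_weighted_prevalence(y, sampling_weight, response_prob):
    """Adjusted weighting-type prevalence estimate over phase-two
    responders: y[i] is the disease indicator, sampling_weight[i] the
    phase-two design weight, and response_prob[i] an estimated
    probability of responding (e.g. from a logistic response model).
    Each responder's weight is inflated by 1 / response_prob[i]."""
    num = den = 0.0
    for yi, wi, ri in zip(y, sampling_weight, response_prob):
        adj = wi / ri
        num += adj * yi
        den += adj
    return num / den

# Hypothetical responders: cases with low response probability get
# up-weighted, offsetting their under-representation.
prev = adjusted_weighted_prevalence(
    y=[1, 0, 1, 0],
    sampling_weight=[2.0, 2.0, 4.0, 4.0],
    response_prob=[0.8, 0.8, 0.5, 0.5],
)
print(prev)
```

If response probabilities were constant (the missing-completely-at-random case), the adjustment would cancel and this would reduce to the traditional weighting estimator.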

19.
Analysis of a major multi-site epidemiologic study of heart disease has required estimation of the pairwise correlation of several measurements across sub-populations. Because the measurements from each sub-population were subject to sampling variability, the Pearson product moment estimator of these correlations produces biased estimates. This paper proposes a model that takes into account within and between sub-population variation, provides algorithms for obtaining maximum likelihood estimates of these correlations and discusses several approaches for obtaining interval estimates. © 1997 John Wiley & Sons, Ltd.

20.
Incomplete and unbalanced multivariate data often arise in longitudinal studies due to missing or unequally-timed repeated measurements and/or the presence of time-varying covariates. A general approach to analysing such data is through maximum likelihood analysis using a linear model for the expected responses, and structural models for the within-subject covariances. Two important advantages of this approach are: (1) the generality of the model allows the analyst to consider a wider range of models than were previously possible using classical methods developed for balanced and complete data, and (2) maximum likelihood estimates obtained from incomplete data are often preferable to other estimates such as those obtained from complete cases from the standpoint of bias and efficiency. A variety of applications of the model are discussed, including univariate and multivariate analysis of incomplete repeated measures data, analysis of growth curves with missing data using random effects and time-series models, and applications to unbalanced longitudinal data.
