Similar Articles
20 similar articles found.
1.
In case-control studies, the crude odds ratio derived from a 2 x 2 table and the common odds ratio adjusted for stratification variables are staple measures of exposure-disease association. While missing exposure data are encountered in the majority of such studies, formal attempts to deal with them are rare, and a complete-case analysis is the norm. Furthermore, the probability that exposure is missing may depend on true exposure status, so the missing-at-random assumption is often unreasonable. In this paper, the authors present an adjustment to the usual product binomial likelihood to properly account for missing data. Estimation of model parameters without restrictive assumptions requires an additional data collection effort akin to a validation study. Closed-form results are provided to facilitate point and confidence interval estimation of crude and common odds ratios after properly accounting for informatively missing data. Simulations assess performance of the likelihood-based estimates and inferences, and they display the potential for bias in complete-case analyses. An example is presented to illustrate the approach.
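For reference, the complete-case baseline that the paper improves on is the textbook crude odds ratio with a Wald interval. A minimal sketch follows (cell counts hypothetical; the paper's missing-data adjustment is not implemented here):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 2 x 2 table from a case-control study (complete cases only):
#             exposed  unexposed
# cases        a = 40    b = 60
# controls     c = 25    d = 75
a, b, c, d = 40.0, 60.0, 25.0, 75.0

log_or = np.log((a * d) / (b * c))       # crude log odds ratio
se = np.sqrt(1/a + 1/b + 1/c + 1/d)      # Wald standard error on the log scale
z = norm.ppf(0.975)                      # 1.96 for a 95% interval

print(f"OR = {np.exp(log_or):.2f}, "
      f"95% CI = ({np.exp(log_or - z*se):.2f}, {np.exp(log_or + z*se):.2f})")
```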

2.
We explore the ‘reassessment’ design in a logistic regression setting, where a second wave of sampling is applied to recover a portion of the missing data on a binary exposure and/or outcome variable. We construct a joint likelihood function based on the original model of interest and a model for the missing data mechanism, with emphasis on non‐ignorable missingness. The estimation is carried out by numerical maximization of the joint likelihood function with close approximation of the accompanying Hessian matrix, using sharable programs that take advantage of general optimization routines in standard software. We show how likelihood ratio tests can be used for model selection and how they facilitate direct hypothesis testing for whether missingness is at random. Examples and simulations are presented to demonstrate the performance of the proposed method. Copyright © 2015 John Wiley & Sons, Ltd.
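The abstract's shareable programs are not reproduced here, but the overall recipe — maximize a joint likelihood for the outcome model and a non-ignorable missingness model with a general-purpose optimizer, then compare nested fits with a likelihood ratio test for missingness at random — can be sketched on simulated data. All model choices below (logistic outcome and missingness models, a Bernoulli exposure) are illustrative assumptions, and identification of non-ignorable models rests entirely on such parametric assumptions:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(1)

# Simulated data: binary exposure X, binary outcome Y, and an indicator R
# of whether X was observed; R depends on X itself (non-ignorable).
n = 2000
x = rng.binomial(1, 0.4, n)
y = rng.binomial(1, expit(-1.0 + 0.8 * x))
r = rng.binomial(1, expit(1.5 - 1.0 * x))

def neg_loglik(theta):
    b0, b1, a0, a1, lpx = theta
    px = expit(lpx)                      # marginal P(X = 1)

    def joint(xi):                       # P(X = xi, Y = y, R = r) per record
        p_x = np.where(xi == 1, px, 1.0 - px)
        p_y = np.where(y == 1, expit(b0 + b1 * xi), 1.0 - expit(b0 + b1 * xi))
        p_r = np.where(r == 1, expit(a0 + a1 * xi), 1.0 - expit(a0 + a1 * xi))
        return p_x * p_y * p_r

    # observed X: evaluate at the recorded value; missing X: sum it out
    lik = np.where(r == 1, joint(x), joint(0) + joint(1))
    return -np.sum(np.log(lik))

fit = minimize(neg_loglik, np.zeros(5), method="BFGS")
print(f"log OR estimate: {fit.x[1]:.3f} (simulated truth 0.8)")

# Likelihood ratio test of MAR: refit with the X-coefficient of the
# missingness model pinned at zero and compare the two optima.
mar = lambda t: neg_loglik(np.array([t[0], t[1], t[2], 0.0, t[3]]))
fit0 = minimize(mar, np.zeros(4), method="BFGS")
print(f"LR statistic for H0 (missing at random): {2 * (fit0.fun - fit.fun):.2f}")
```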

3.
Cheng KF, Lin WJ. Statistics in Medicine, 2005, 24(21): 3289-3310
Association analysis of genetic polymorphisms has been mostly performed in a case-control setting in connection with the traditional logistic regression analysis. However, in a case-control study, subjects are recruited according to their disease status and their past exposures are determined. Thus the natural model for making inference is the retrospective model. In this paper, we discuss some retrospective models and give maximum likelihood estimators of exposure effects and estimators of asymptotic variances, when the frequency distribution of exposures in controls contains information about the parameters of interest. Two situations concerning the control population are considered in this paper: (a) the control population or its subpopulations are in Hardy-Weinberg equilibrium; and (b) genetic and environmental factors are independent in the control population. Using the concept of asymptotic relative efficiency, we show the precision advantages of such retrospective analysis over the traditional prospective analysis. Maximum likelihood estimates and variance estimates under retrospective models are simple to compute and thus can be applied in many practical applications. We present one real example to illustrate our methods.

4.
The authors extend previous results on nondifferential exposure misclassification to the situation in which multilevel exposure and covariables are both misclassified. They show that if misclassification is nondifferential and the predictive value matrices are independent of other predictor variables, it is possible to recover the true relative risks as a function of the biased estimates and the misclassification matrices alone. If the covariable is a confounder, the true relative risks may be recovered from the apparent relative risks derived from misclassified data and the misclassification matrix for the exposure variable with respect to its surrogate. If the covariable is an effect modifier, the true relative risk matrix may be recovered from the apparent relative risk matrix and misclassification matrices for both the exposure variable with respect to its surrogate and the covariable with respect to its surrogate. By varying the misclassification matrices, the sensitivity of published relative risk estimates to different patterns of misclassification can be analyzed. If it is not possible to design a study protocol that is free of misclassification, choosing surrogate variables whose predictive value is constant with respect to other predictors appears to be a desirable design objective.
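A minimal numeric sketch of the matrix idea for a three-level exposure, using a single nondifferential classification matrix (all numbers hypothetical; the paper's predictive-value formulation and covariable corrections are not reproduced here):

```python
import numpy as np

# Hypothetical classification matrix for a three-level exposure:
# m[i, j] = P(classified level = j | true level = i); rows sum to 1.
m = np.array([[0.90, 0.08, 0.02],
              [0.10, 0.82, 0.08],
              [0.03, 0.07, 0.90]])

# Observed (misclassified) exposure counts among cases and controls.
cases_obs    = np.array([120.0, 60.0, 20.0])
controls_obs = np.array([200.0, 70.0, 30.0])

# Under nondifferential misclassification, E[observed] = m.T @ true,
# so expected true counts are recovered by solving the linear system.
cases_true    = np.linalg.solve(m.T, cases_obs)
controls_true = np.linalg.solve(m.T, controls_obs)

def ors(cases, controls):
    # odds ratios relative to the lowest exposure level
    return (cases / controls) / (cases[0] / controls[0])

print("apparent ORs: ", np.round(ors(cases_obs, controls_obs), 2))
print("corrected ORs:", np.round(ors(cases_true, controls_true), 2))
```

Note that with noisy counts or a near-singular classification matrix the solved "true" counts can be implausible (even negative), which is one reason to treat this as a sensitivity analysis rather than a definitive correction.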

5.
Association analysis of genetic polymorphisms has been mostly performed in a case-control setting with unrelated affected subjects compared with unrelated unaffected subjects. In this paper, we present a Bayesian method for analyzing such case-control data when the population is in Hardy-Weinberg equilibrium. Our Bayesian method depends on an informative prior, which is the retrospective likelihood based on historical data raised to a power a. By modeling the retrospective likelihood properly, different prior information about the studied population can be incorporated into the specification of the prior. The scalar a is a precision parameter quantifying the heterogeneity between current and historical data. A guide value for a is discussed in this paper. The informative prior and posterior distributions are proper under very general conditions. Therefore, our method can be applied in most case-control studies. Further, for assessing gene-environment interactions, our approach naturally leads to a Bayesian model depending only on the case data when genotype and environmental factors are independent in the population. Thus our approach can be applied to case-only studies. A real example is used to illustrate the applications of our method.
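The power-prior construction — current likelihood times the historical likelihood raised to a — is easiest to see in a conjugate toy case. The sketch below uses a beta-binomial model for a single allele-carrier frequency, not the paper's retrospective genotype likelihood; all counts and the value of a are hypothetical:

```python
import numpy as np
from scipy.stats import beta

# Power prior: posterior(p) ∝ L(p; current) * L(p; historical)^a * Beta(1, 1).
x0, n0 = 30, 100     # historical data: carriers among controls (hypothetical)
x1, n1 = 22, 60      # current data (hypothetical)
a = 0.5              # precision parameter: weight given to historical data

# Raising a binomial likelihood to the power a just scales its effective
# successes and failures, so the posterior remains a beta distribution.
post = beta(1 + x1 + a * x0, 1 + (n1 - x1) + a * (n0 - x0))
print(f"posterior mean = {post.mean():.3f}, "
      f"95% credible interval = {np.round(post.interval(0.95), 3)}")
```

Setting a = 0 discards the historical data entirely, while a = 1 pools it with the current data at full weight.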

6.
Consider a case-control study designed to investigate the possible association between development of a particular disease and the value of a putative risk factor measured on an ordinal scale. Let E denote a subject's true risk factor value and let E* denote a subject's recorded risk factor value. Misclassification bias occurs if conclusions reached regarding the relationship between disease status and E* do not also apply to the relationship between disease status and E. We propose a model for the conditional probability distribution of E* given E. We show how the model may be used to investigate misclassification bias in a validation study where measurements of E* and E are available for both cases and controls and apply the methods developed to data from a test-retest study of recall bias in the context of screening for hypertension. We also consider a situation where the validation study is carried out on a subset of the subjects within a larger case-control study. In that case, values for E* are available for all subjects but values for E are available only for those subjects included in the validation study. We show how correct likelihood-based inference concerning association between disease status and risk factor value may be carried out using all of the available data. A Monte Carlo study shows how the inclusion of a validation study leads to a correction of recall bias problems at the cost of an increased standard error for the estimated association parameter.

7.
Misclassification is a long‐standing statistical problem in epidemiology. In many real studies, either an exposure or a response variable or both may be misclassified. As such, potential threats to the validity of the analytic results (e.g., estimates of odds ratios) that stem from misclassification are widely discussed in the literature. Much of the discussion has been restricted to the nondifferential case, in which misclassification rates for a particular variable are assumed not to depend on other variables. However, complex differential misclassification patterns are common in practice, as we illustrate here using bacterial vaginosis and Trichomoniasis data from the HIV Epidemiology Research Study (HERS). Therefore, clear illustrations of valid and accessible methods that deal with complex misclassification are still in high demand. We formulate a maximum likelihood (ML) framework that allows flexible modeling of misclassification in both the response and a key binary exposure variable, while adjusting for other covariates via logistic regression. The approach emphasizes the use of internal validation data in order to evaluate the underlying misclassification mechanisms. Data‐driven simulations show that the proposed ML analysis outperforms less flexible approaches that fail to appropriately account for complex misclassification patterns. The value and validity of the method are further demonstrated through a comprehensive analysis of the HERS example data. Copyright © 2015 John Wiley & Sons, Ltd.

8.
Large-scale genome-wide association scans on massive numbers of cases and controls are archived in publicly available genetic databases, for example, the Database of Genotypes and Phenotypes ( https://www.ncbi.nlm.nih.gov/gap/ ). These databases offer unprecedented opportunities to study genetic effects. Yet, the set of nongenetic variables in these databases is often limited. From the statistical literature, we know that omitting a continuous variable from a logistic regression model can result in biased estimates of odds ratios (OR), even when the omitted and the included variables are independent. We are interested in assessing what information is needed to recover the bias in the OR estimate of genotype due to omitting a continuous variable in settings where the actual values of the omitted variable are not available. We derive two estimating procedures that can recover the degree of bias, based on either a conditional density of the omitted variable given the disease status and the genotype, or the known distribution of the omitted variable and the frequency of the disease in the population. Importantly, our derivations show that omitting a continuous variable can result in either under- or over-estimation of the genetic effects. We performed extensive simulation studies to examine bias, variability, false-positive rate, and power in the model that omits a continuous variable. We illustrate the application with two genome-wide studies of Alzheimer's disease.
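The omitted-variable phenomenon is easy to reproduce by simulation. In the sketch below (coefficients and sample size are arbitrary choices), the continuous covariate Z is independent of the genotype G, yet dropping Z from the logistic model changes the estimated log odds ratio for G. This illustrates the bias the abstract describes, not the authors' correction procedures; in this particular configuration the marginal estimate is attenuated:

```python
import numpy as np
import statsmodels.api as sm
from scipy.special import expit

rng = np.random.default_rng(7)

# Simulate a genotype G and an independent continuous covariate Z; the
# disease model includes both, but the analysis model omits Z.
n = 50_000
g = rng.binomial(1, 0.3, n)
z = rng.normal(0.0, 1.0, n)
y = rng.binomial(1, expit(-2.0 + 0.5 * g + 1.0 * z))

full    = sm.Logit(y, sm.add_constant(np.column_stack([g, z]))).fit(disp=0)
reduced = sm.Logit(y, sm.add_constant(g)).fit(disp=0)

print("true conditional log OR for G: 0.50")
print(f"full model estimate:      {full.params[1]:.3f}")
print(f"Z omitted (biased):       {reduced.params[1]:.3f}")
```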

9.
Many pharmacoepidemiologic case-control studies have to rely on what their subjects relate about the drugs to which they have been exposed and the durations of exposure. There is often good reason to suppose that not all exposures are actually reported and to suspect reporting rates may differ between cases and controls. We introduce two procedures designed to determine the extent of underreporting of exposures. These procedures make use of data from the case-control study itself, as well as sales, demographic and market research data for a reference population to which study subjects belong. We apply these procedures to data from the International Primary Pulmonary Hypertension Study (IPPHS) linking anorexigens with PPH. We show that exposures to the anorectic agent dexfenfluramine beginning in or before 1989 were highly significantly underrepresented in the data for IPPHS controls, relative to exposures beginning after 1989 (P < 0.01); there is no corresponding evidence for relative underrepresentation of early exposure for IPPHS cases. However, data on control exposures from 1990 to 1992 are consistent with the hypothesis that these exposures were not underreported to the IPPHS. Subject to certain key modeling assumptions and the availability of some supplemental data, it is possible to investigate the extent of underreporting of exposure in a pharmacoepidemiologic case-control study and in particular to determine if study results are likely to have been affected by recall bias.

10.
In this paper we consider longitudinal studies in which the outcome to be measured over time is binary, and the covariates of interest are categorical. In longitudinal studies it is common for the outcomes and any time-varying covariates to be missing due to missed study visits, resulting in non-monotone patterns of missingness. Moreover, the reasons for missed visits may be related to the specific values of the response and/or covariates that should have been obtained, i.e. missingness is non-ignorable. With non-monotone non-ignorable missing response and covariate data, a full likelihood approach is quite complicated, and maximum likelihood estimation can be computationally prohibitive when there are many occasions of follow-up. Furthermore, the full likelihood must be correctly specified to obtain consistent parameter estimates. We propose a pseudo-likelihood method for jointly estimating the covariate effects on the marginal probabilities of the outcomes and the parameters of the missing data mechanism. The pseudo-likelihood requires specification of the marginal distributions of the missingness indicator, outcome, and possibly missing covariates at each occasion, but avoids making assumptions about the joint distribution of the data at two or more occasions. Thus, the proposed method can be considered semi-parametric. The proposed method is an extension of the pseudo-likelihood approach in Troxel et al. to handle binary responses and possibly missing time-varying covariates. The method is illustrated using data from the Six Cities study, a longitudinal study of the health effects of air pollution.

11.
The use of complex sampling in population-based case-control studies is becoming more common. Although most single nucleotide polymorphism-based association studies with complex sampling account for the design complications, many haplotype-based genetic association studies with complex sampling tend to ignore them when estimating haplotype frequencies, regression coefficients, or both. In this article, we develop innovative one-step and two-step statistical methods that account for the design complications in haplotype-based association studies when cases and/or controls are sampled with complex sampling. Attracted by the efficiency advantage of the retrospective method, we explore the assumptions of Hardy-Weinberg equilibrium and gene-environment independence in the underlying population. Results of our simulation studies demonstrate superior performance of the proposed methods over selected existing methods under various complex sampling designs. An application of the proposed methods is illustrated using a population-based case-control study of kidney cancer.

12.
Genetic association studies are a powerful tool to detect genetic variants that predispose to human disease. Once an associated variant is identified, investigators are also interested in estimating the effect of the identified variant on disease risk. Estimates of the genetic effect based on new association findings tend to be upwardly biased due to a phenomenon known as the “winner's curse.” Overestimation of genetic effect size in initial studies may cause follow‐up studies to be underpowered and so to fail. In this paper, we quantify the impact of the winner's curse on the allele frequency difference and odds ratio estimators for one‐ and two‐stage case‐control association studies. We then propose an ascertainment‐corrected maximum likelihood method to reduce the bias of these estimators. We show that overestimation of the genetic effect by the uncorrected estimator decreases as the power of the association study increases and that the ascertainment‐corrected method reduces absolute bias and mean square error unless power to detect association is high. Genet. Epidemiol. 33:453–462, 2009. © 2009 Wiley‐Liss, Inc.
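A one-stage, normal-approximation version of the ascertainment correction can be sketched directly: condition the likelihood of the reported estimate on its having exceeded the significance threshold. The reported effect, standard error, and threshold below are hypothetical, and this is a simplified stand-in for the paper's one- and two-stage case-control estimators:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# A study reports log OR = 0.35 (SE 0.12) and was published because the
# test statistic passed |z| > c. All values are hypothetical.
b_hat, se, c = 0.35, 0.12, 2.5

def neg_cond_loglik(beta):
    # density of the estimate, conditional on having passed the threshold:
    # f(b_hat; beta) / P(|b_hat/se| > c; beta)
    log_f = norm.logpdf(b_hat, loc=beta, scale=se)
    p_sig = norm.sf(c - beta / se) + norm.cdf(-c - beta / se)
    return -(log_f - np.log(p_sig))

fit = minimize_scalar(neg_cond_loglik, bounds=(-2.0, 2.0), method="bounded")
print(f"naive estimate:                   {b_hat:.3f}")
print(f"ascertainment-corrected estimate: {fit.x:.3f}")
```

Consistent with the abstract, the correction shrinks the estimate most when the observed statistic barely clears the threshold (low power) and has little effect when it clears it by a wide margin.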

13.
Note on linkage analysis when the mode of transmission is unknown
A major difficulty in a linkage analysis arises from the necessity of specifying the mode of inheritance prior to analysis. For a complex disease, such as those encountered in psychiatric illnesses, the mode of inheritance is generally not known in advance. Consequently, some estimation procedure is often combined with linkage analysis to circumvent this. We discuss several precautions that should be taken when using traditional statistical testing methods: correction of the likelihood for the method of sampling families and the computation of the lod score. We analyze simulated data with pedigrees selected under a sampling scheme approximating single ascertainment. In this situation, the severity of the above problems is attenuated.
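For orientation, the lod score itself is a base-10 likelihood ratio against the no-linkage value θ = 1/2. A toy computation for fully phase-known meioses (counts hypothetical; none of the ascertainment or mode-of-inheritance complications discussed above are modeled):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy lod-score computation: r recombinants out of n informative,
# phase-known meioses, with a binomial likelihood in the recombination
# fraction theta restricted to [0, 1/2].
r, n = 3, 20   # hypothetical counts

def log10_lik(theta):
    return r * np.log10(theta) + (n - r) * np.log10(1.0 - theta)

fit = minimize_scalar(lambda t: -log10_lik(t), bounds=(1e-6, 0.5),
                      method="bounded")
theta_hat = fit.x
lod = log10_lik(theta_hat) - log10_lik(0.5)   # lod = log10 L(theta_hat)/L(1/2)
print(f"theta_hat = {theta_hat:.3f}, lod = {lod:.2f}")
```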

14.
Rice K. Statistics in Medicine, 2003, 22(20): 3177-3194
We consider analysis of matched case-control studies where a binary exposure is potentially misclassified, and there may be a variety of matching ratios. The parameter of interest is the ratio of odds of case exposure to control exposure. By extending the conditional model for perfectly classified data via a random effects or Bayesian formulation, we obtain estimates and confidence intervals for the misclassified case which reduce back to standard analytic forms as the error probabilities reduce to zero. Several examples are given, highlighting different analytic phenomena. In a simulation study, using mixed matching ratios, the coverage of the intervals is found to be good, although point estimates are slightly biased on the log scale. Extensions of the basic model are given allowing for uncertainty in the knowledge of misclassification rates, and the inclusion of prior information about the parameter of interest.

15.
We show that under the null hypothesis of no linkage the maximum likelihood estimator of the recombination fraction converges to 1/2 even when the trait-related parameter values in the likelihood function are misspecified. Furthermore, we show that under the null hypothesis of no linkage, but with misspecified trait-related parameter values, the negative of twice the natural logarithm of the likelihood ratio statistic still has a limiting chi-square distribution with 1 degree of freedom.

16.
While there is an extensive amount of literature covering prospective designs for phase I trials, the methodology for analyzing these data is limited. Prospective designs select the maximum tolerated dose (MTD) through a dose escalation scheme based on a model or on empirical rules. For example, the '3 + 3' method (standard method: SM) assigns patients in cohorts of three and expands to six if one toxicity is observed. It has been shown previously that the MTD chosen by the SM may be low, possibly leading to a non-efficacious dose. Additionally, when deviation from the original trial design occurs, the rules for determining MTD might not be applicable. We hypothesize that a retrospective analysis would suggest an MTD that is more accurate than the one obtained by the SM. A weighted Continual Reassessment Method (CRM-w) has been suggested (Biometrics 2005; 61:749-756) for analyzing data obtained from designs other than the prospective Continual Reassessment Method (CRM). However, CRM-w has not been evaluated in trials that follow the SM design. In this study, we propose a method to analyze completed phase I trials and possibly confirm or amend the recommended phase II dose, based on a constrained maximum likelihood estimation (CMLE). A comparison of CRM-w, isotonic regression, and CMLE in analyzing simulated SM trials shows that CMLE more accurately selects the true MTD than SM, and is better than or comparable to isotonic regression and CRM-w. Confidence intervals around the toxicity probabilities at each dose level are estimated using the cumulative toxicity data. Programming code is included.
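Of the comparators, the isotonic-regression analysis is the simplest to sketch: pool-adjacent-violators smooths the observed toxicity rates into a monotone dose-toxicity curve, and an MTD is read off against a target rate. The counts, target, and MTD rule below are hypothetical illustrations, not the authors' CMLE:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical data from a completed '3+3' trial: patients treated and
# dose-limiting toxicities observed at each of five dose levels.
n_pat = np.array([3, 6, 6, 9, 6])
n_tox = np.array([0, 2, 1, 3, 3])
raw = n_tox / n_pat

# Pool-adjacent-violators fit, weighting each dose level by its sample
# size, enforces the monotone dose-toxicity assumption.
iso = IsotonicRegression(increasing=True)
p_hat = iso.fit_transform(np.arange(len(raw)), raw, sample_weight=n_pat)

target = 0.30   # e.g., MTD = highest dose with estimated P(toxicity) <= 0.30
mtd = int(np.max(np.flatnonzero(p_hat <= target)))
print("raw toxicity rates: ", np.round(raw, 2))
print("isotonic rates:     ", np.round(p_hat, 2))
print(f"estimated MTD: dose level {mtd + 1}")
```

Here the raw rates are non-monotone (levels 2 and 3 violate ordering), and the isotonic fit pools them before the MTD is selected.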

17.
Epidemiologic researchers often explore effect modification in case-control studies on more than one statistical scale, an approach that one expects would increase the rate of false-positive findings of interaction. For example, researchers have measured effect modification by using both a multiplicative interaction coefficient (M) in a logistic regression model and a measure of interaction on the additive scale such as the interaction coefficient from an additive relative risk regression model (A). We performed computer simulations to investigate the degree to which type I error may be inflated when statistical interactions are evaluated by using both M and A. The overall type I error rate was often greater than 5% when both tests were performed together. These results provide empiric evidence of the limited validity of a common approach to assessing etiologic effect modification. When the scale has not been specified before analysis, interaction hypothesis tests of effect modification should be interpreted particularly cautiously. Researchers are not justified in choosing the interaction test with the lowest P value.

18.
Kraft et al. [2005] proposed a method for matched haplotype-based association studies and compared the performances of six analytic strategies for estimating the odds ratio parameters using a conditional likelihood function. Zhang et al. [2006] modified the conditional likelihood and proposed a new method for matched haplotype-based association studies. The main assumptions of Zhang et al. were that the disease was rare, the population was in Hardy-Weinberg equilibrium (HWE), and the haplotypes were independent of the covariates and matching variable(s). In this article, we modify the estimation procedure proposed by Zhang et al. and introduce a fixation index so that the assumption of HWE is relaxed. Using the Wald test, we compare the current modified method with the procedure developed by Kraft et al. through simulations. The results show that the modified method is uniformly more powerful than that described in Kraft et al. Furthermore, the results indicate that the modified method is quite robust to the rare disease assumption.

19.
Quantitative traits (QT) are an important focus of human genetic studies both because of interest in the traits themselves and because of their role as risk factors for many human diseases. For large-scale QT association studies including genome-wide association studies, investigators usually focus on genetic loci showing significant evidence for SNP-QT association, and genetic effect size tends to be overestimated as a consequence of the winner's curse. In this paper, we study the impact of the winner's curse on QT association studies in which the genetic effect size is parameterized as the slope in a linear regression model. We demonstrate by analytical calculation that the overestimation in the regression slope estimate decreases as power increases. To reduce the ascertainment bias, we propose a three-parameter maximum likelihood method and then simplify this to a one-parameter method by excluding nuisance parameters. We show that both methods reduce the bias when power to detect association is low or moderate, and that the one-parameter model generally results in smaller variance in the estimate.

20.
Haplotype-based analyses are thought to play a major role in the study of common complex diseases. This has led to the development of a variety of statistical methods for detecting disease-haplotype associations from case-control study data. However, haplotype phase is often uncertain when only genotype data is available. Methods that account for haplotype ambiguity by modeling the distribution of haplotypes can, if this distribution is misspecified, lead to substantial bias in parameter estimates even when complete genotype data is available. Here we study estimators that can be derived from score functions of appropriate likelihoods. We use the efficient score approach to estimation in the presence of nuisance parameters to derive novel estimators that are robust to the haplotype distribution. We establish key relationships between estimators and study their empirical performance via simulation.
