Similar Articles
20 similar articles found.
1.
In this paper, we consider fitting semiparametric additive hazards models for case‐cohort studies using a multiple imputation approach. In a case‐cohort study, main exposure variables are measured only on some selected subjects, but other covariates are often available for the whole cohort. We consider this a special case of a covariate missing by design. We propose to employ a popular incomplete-data method, multiple imputation, for estimation of the regression parameters in additive hazards models. For imputation models, an imputation modeling procedure based on rejection sampling is developed. A simple imputation model that can naturally be applied to a general missing‐at‐random situation is also considered and compared with the rejection sampling method via extensive simulation studies. In addition, misspecification of the imputation model is investigated. The proposed procedures are illustrated using a cancer data example. Copyright © 2015 John Wiley & Sons, Ltd.
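As a concrete anchor for the final step of any such multiple-imputation analysis, here is a minimal sketch of Rubin's rules for combining the per-imputation estimates of the additive-hazards regression coefficients. The function name and array layout are ours, and the per-imputation fits are assumed to come from an additive-hazards routine (e.g., Lin–Ying estimating equations) not shown here:

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine M per-imputation estimates via Rubin's rules.

    estimates : (M, p) array, one row of fitted regression
                coefficients per imputed data set.
    variances : (M, p) array of the corresponding sampling variances.
    Returns the pooled estimate and its total variance.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = estimates.shape[0]
    q_bar = estimates.mean(axis=0)          # pooled point estimate
    u_bar = variances.mean(axis=0)          # within-imputation variance
    b = estimates.var(axis=0, ddof=1)       # between-imputation variance
    t = u_bar + (1.0 + 1.0 / m) * b         # Rubin's total variance
    return q_bar, t
```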

2.
The case–cohort study design has often been used in studies of a rare disease or for a common disease with some biospecimens needing to be preserved for future studies. A case–cohort study design consists of a random sample, called the subcohort, and all or a portion of the subjects with the disease of interest. One advantage of the case–cohort design is that the same subcohort can be used for studying multiple diseases. Stratified random sampling is often used for the subcohort. Additive hazards models are often preferred in studies where the risk difference, instead of relative risk, is of main interest. Existing methods do not use the available covariate information fully. We propose a more efficient estimator by making full use of available covariate information for the additive hazards model with data from a stratified case–cohort design with rare (the traditional situation) and non‐rare (the generalized situation) diseases. We propose an estimating equation approach with a new weight function. The proposed estimators are shown to be consistent and asymptotically normally distributed. Simulation studies show that the proposed method using all available information leads to an efficiency gain, and that stratification of the subcohort improves efficiency when the strata are highly correlated with the covariates. Our proposed method is applied to data from the Atherosclerosis Risk in Communities study. Copyright © 2015 John Wiley & Sons, Ltd.
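For orientation, a sketch of the classical weights that such estimating equations generalize: cases weighted 1 at their event time, non-case subcohort members weighted by the inverse of their stratum's sampling fraction (Borgan-style stratified case-cohort weighting). The interface is illustrative, and the paper's proposed weight function, which exploits the full-cohort covariate information, is more elaborate than this baseline:

```python
import numpy as np

def stratified_cc_weights(is_case, in_subcohort, stratum, frac):
    """Classical stratified case-cohort weights: cases get weight 1,
    non-case subcohort members get 1 / (their stratum's subcohort
    sampling fraction), and cohort members outside the sample get 0.

    frac : dict mapping stratum label -> subcohort sampling fraction.
    """
    is_case = np.asarray(is_case, dtype=bool)
    in_subcohort = np.asarray(in_subcohort, dtype=bool)
    inv = np.array([1.0 / frac[s] for s in stratum])
    return np.where(is_case, 1.0, np.where(in_subcohort, inv, 0.0))
```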

3.
The case–cohort (CC) study design has usually been used for risk factor assessment in epidemiologic studies or disease prevention trials for rare diseases. The sample size/power calculation for a stratified CC (SCC) design has not been addressed before. This article derives such a result based on a stratified test statistic. Simulation studies show that the proposed test for the SCC design utilizing small subcohort sampling fractions is valid and efficient for situations where the disease rate is low. Furthermore, optimization of sampling in the SCC design is discussed and compared with proportional and balanced sampling techniques. An epidemiological study is provided to illustrate the sample size calculation under the SCC design. Copyright © 2014 John Wiley & Sons, Ltd.
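The paper derives an analytic sample-size formula that we do not reproduce here; as a hedged sketch, the generic simulation loop below is how one would verify any such calculation, with `simulate_study` and `test_stat` as user-supplied placeholders rather than anything from the paper:

```python
def simulated_power(n_sims, simulate_study, test_stat, alpha=0.05):
    """Monte Carlo power estimate: simulate_study() draws one data
    set under the alternative (e.g., a stratified case-cohort sample
    with a given subcohort sampling fraction), and test_stat()
    returns the two-sided p-value of the design's test statistic.
    """
    rejections = sum(test_stat(simulate_study()) <= alpha
                     for _ in range(n_sims))
    return rejections / n_sims
```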

4.
Poor measurement of explanatory variables occurs frequently in observational studies. Error‐prone observations may lead to biased estimation and loss of power in detecting the impact of explanatory variables on the response. We consider misclassified binary exposure in the context of case–control studies, assuming the availability of validation data to inform the magnitude of the misclassification. A Bayesian adjustment to correct the misclassification is investigated. Simulation studies show that the Bayesian method can have advantages over non‐Bayesian counterparts, particularly in the face of a rare exposure, small validation sample sizes, and uncertainty about whether exposure misclassification is differential or non‐differential. The method is illustrated via application to several real studies. Copyright © 2010 John Wiley & Sons, Ltd.
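A simplified Monte Carlo version of the idea (not the paper's full Bayesian model): draw sensitivity and specificity from the Beta posteriors implied by the validation data, apply the standard matrix-method correction to the error-prone 2×2 table, and return posterior-style draws of the corrected odds ratio. Non-differential misclassification and flat Beta(1, 1) priors are assumed, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrected_or_draws(a_star, b_star, c_star, d_star,
                       val_tp, val_fn, val_tn, val_fp,
                       n_draws=10_000):
    """Monte Carlo draws of the misclassification-corrected odds
    ratio. (a*, b*) are exposed/unexposed counts among cases under
    the error-prone measure; (c*, d*) likewise among controls.
    Validation counts give Beta posteriors for sensitivity (se)
    and specificity (sp), assumed non-differential.
    """
    se = rng.beta(val_tp + 1, val_fn + 1, n_draws)   # Beta(1,1) priors
    sp = rng.beta(val_tn + 1, val_fp + 1, n_draws)

    def correct(x_star, n):
        # invert p* = se*p + (1 - sp)*(1 - p) for the true prevalence p
        p_star = x_star / n
        return (p_star - (1 - sp)) / (se - (1 - sp))

    p1 = correct(a_star, a_star + b_star)            # P(exposed | case)
    p0 = correct(c_star, c_star + d_star)            # P(exposed | control)
    ok = (p1 > 0) & (p1 < 1) & (p0 > 0) & (p0 < 1)   # keep admissible draws
    return (p1[ok] / (1 - p1[ok])) / (p0[ok] / (1 - p0[ok]))
```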

5.
The stereotype regression model for categorical outcomes, proposed by Anderson (J. Roy. Statist. Soc. B 1984; 46:1–30), is nested between the baseline‐category logits model and the adjacent‐category logits model with proportional odds structure. The stereotype model is more parsimonious than the ordinary baseline‐category (or multinomial logistic) model due to a product representation of the log‐odds‐ratios in terms of a common parameter corresponding to each predictor and category‐specific scores. The model can be used for both ordered and unordered outcomes. For ordered outcomes, the stereotype model allows more flexibility than the popular proportional odds model in capturing highly subjective ordinal scales that do not result from categorization of a single latent variable but are inherently multi‐dimensional in nature. As pointed out by Greenland (Statist. Med. 1994; 13:1665–1677), an additional advantage of the stereotype model is that it provides unbiased and valid inference under outcome‐stratified sampling, as in case–control studies. In addition, for matched case–control studies, the stereotype model is amenable to the classical conditional likelihood principle, whereas there is no reduction due to sufficiency under the proportional odds model. In spite of these attractive features, the model has seen relatively little application because of issues with maximum likelihood estimation and likelihood‐based testing due to non‐linearity and lack of identifiability of the parameters. We present a comprehensive Bayesian inference and model comparison procedure for this class of models as an alternative to the classical frequentist approach. We illustrate our methodology by analyzing data from The Flint Men's Health Study, a case–control study of prostate cancer in African‐American men aged 40–79 years. We use clinical staging of prostate cancer in terms of Tumor, Node, and Metastasis (TNM) as the categorical response of interest. Copyright © 2009 John Wiley & Sons, Ltd.
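In symbols (notation ours), the product representation referred to above writes the stereotype model for a (K+1)-category outcome as

```latex
\log \frac{P(Y = k \mid x)}{P(Y = 0 \mid x)} \;=\; \alpha_k + \phi_k\, \beta^{\top} x,
\qquad k = 1, \dots, K,
```

with identifiability constraints such as \(\phi_0 = 0\) and \(\phi_1 = 1\). Because \((\phi_k, \beta)\) and \((c\,\phi_k, \beta / c)\) yield the same likelihood, the parameters are not identified without such constraints, which is precisely the estimation difficulty the abstract mentions.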

6.
In genetic association studies, it is becoming increasingly imperative to have large sample sizes to identify and replicate genetic effects. To achieve these sample sizes, many research initiatives are encouraging the collaboration and combination of several existing matched and unmatched case–control studies. Thus, it is becoming more common to compare multiple sets of controls with the same case group, or multiple case groups, to validate or confirm a positive or negative finding. Usually, a naive approach of fitting separate models for each case–control comparison is used to make inference about disease–exposure association. However, this approach does not make use of all the observed data and hence can lead to inconsistent results. The problem is compounded when a common case group is used in each case–control comparison. An alternative to fitting separate models is to use a polytomous logistic model, but this model does not combine matched and unmatched case–control data. Thus, we propose a polytomous logistic regression approach based on a latent group indicator and a conditional likelihood to do a combined analysis of matched and unmatched case–control data. We use simulation studies to evaluate the performance of the proposed method and a case–control study of multiple myeloma and Interleukin‐6 as an example. Our results indicate that the proposed method leads to a more efficient homogeneity test and a pooled estimate with smaller standard error. Copyright © 2010 John Wiley & Sons, Ltd.

7.
Joint effects of genetic and environmental factors have been increasingly recognized in the development of many complex human diseases. Despite the popularity of case‐control and case‐only designs, longitudinal cohort studies that can capture time‐varying outcome and exposure information have long been recommended for gene–environment (G × E) interactions. To date, literature on sampling designs for longitudinal studies of G × E interaction is quite limited. We therefore consider designs that can prioritize a subsample of the existing cohort for retrospective genotyping on the basis of currently available outcome, exposure, and covariate data. In this work, we propose stratified sampling based on summaries of individual exposures and outcome trajectories and develop a full conditional likelihood approach for estimation that adjusts for the biased sample. We compare the performance of our proposed design and analysis with combinations of different sampling designs and estimation approaches via simulation. We observe that the full conditional likelihood provides improved estimates for the G × E interaction and joint exposure effects over uncorrected complete‐case analysis, and the exposure‐enriched, outcome‐trajectory‐dependent design outperforms other designs in terms of estimation efficiency and power for detection of the G × E interaction. We also illustrate our design and analysis using data from the Normative Aging Study, an ongoing longitudinal cohort study initiated by the Veterans Administration in 1963. Copyright © 2017 John Wiley & Sons, Ltd.
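A toy sketch of the exposure-enriched, outcome-trajectory-dependent sampling idea: summarize each subject's exposure by its mean and the outcome trajectory by a least-squares slope, stratify on tertiles of each summary, and oversample the extreme strata. Column names, tertile cut-points, and the 3:1 oversampling are all our own illustrative choices, not the paper's:

```python
import numpy as np
import pandas as pd

def select_for_genotyping(df, n_select, rng=None):
    """Select subjects for retrospective genotyping by stratified,
    outcome- and exposure-dependent sampling.
    df columns assumed: id, time, exposure, outcome.
    """
    rng = rng or np.random.default_rng(0)

    def slope(g):
        # least-squares slope of the outcome trajectory over time
        return np.polyfit(g["time"].to_numpy(), g["outcome"].to_numpy(), 1)[0]

    summ = df.groupby("id").apply(
        lambda g: pd.Series({"x_mean": g["exposure"].mean(),
                             "y_slope": slope(g)}))
    # cross-classify tertiles of exposure mean and outcome slope -> 9 strata
    summ["stratum"] = (pd.qcut(summ["x_mean"], 3, labels=False) * 3
                       + pd.qcut(summ["y_slope"], 3, labels=False))
    # oversample the two "corner" strata (low-low and high-high)
    probs = summ["stratum"].map(lambda s: 3.0 if s in (0, 8) else 1.0)
    probs = probs / probs.sum()
    return rng.choice(summ.index, size=n_select, replace=False, p=probs)
```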

8.
Survival bias is difficult to detect and adjust for in case–control genetic association studies but can invalidate findings when only surviving cases are studied and survival is associated with the genetic variants under study. Here, we propose a design where one genotypes genetically informative family members (such as offspring, parents, and spouses) of deceased cases and incorporates that surrogate genetic information into a retrospective maximum likelihood analysis. We show that inclusion of genotype data from first‐degree relatives permits unbiased estimation of genotype association parameters. We derive closed‐form maximum likelihood estimates for association parameters under the widely used log‐additive and dominant association models. Our proposed design not only permits a valid analysis but also enhances statistical power by augmenting the sample with indirectly studied individuals. Gene variants associated with poor prognosis can also be identified under this design. We provide simulation results to assess performance of the methods. Copyright © 2016 John Wiley & Sons, Ltd.

9.
We propose a semiparametric odds ratio model that extends Umbach and Weinberg's approach of exploiting a gene–environment association model for efficiency gains in case–control designs to both discrete and continuous data. We directly model the gene–environment association in the control population to avoid estimating the intercept in the disease risk model, which is inherently difficult because these sampling designs carry little information on that parameter. We propose a novel permutation‐based approach to eliminate the high‐dimensional nuisance parameters in the matched case–control design. The proposed approach reduces to conditional logistic regression when the model for the gene–environment association is unrestricted. Simulation studies demonstrate good performance of the proposed approach. We apply the proposed approach to a study of gene–environment interaction in coronary artery disease. Copyright © 2013 John Wiley & Sons, Ltd.

10.
The predictiveness curve is a graphical tool that characterizes the population distribution of Risk(Y)=P(D=1|Y), where D denotes a binary outcome such as occurrence of an event within a specified time period and Y denotes predictors. A wider distribution of Risk(Y) indicates better performance of a risk model in the sense that making treatment recommendations is easier for more subjects. Decisions are more straightforward when a subject's risk is deemed to be high or low. Methods have been developed to estimate predictiveness curves from cohort studies. However, early phase studies to evaluate novel risk prediction markers typically employ case–control designs. Here, we present semiparametric and nonparametric methods for evaluating a continuous risk prediction marker that accommodate case–control data. Small sample properties are investigated through simulation studies. The semiparametric methods are substantially more efficient than their nonparametric counterparts under a correctly specified model. We generalize them to settings where multiple prediction markers are involved. Applications to prostate cancer risk prediction markers illustrate methods for comparing the risk prediction capacities of markers and for evaluating the increment in performance gained by adding a marker to a baseline risk model. We propose a modified Hosmer–Lemeshow test for case–control study data to assess calibration of the risk model, a natural complement to this graphical tool. Copyright © 2010 John Wiley & Sons, Ltd.
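A minimal cohort-data sketch of estimating the predictiveness curve: fit a risk model for Risk(Y) = P(D=1|Y), then plot the sorted fitted risks against their sample quantile levels. Under case–control sampling the fitted risks would additionally need recalibration to the known population prevalence, which is part of what the paper's methods handle; the logistic working model here is only illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def predictiveness_curve(y, marker):
    """Estimate the predictiveness curve from cohort data.
    y : binary outcome array; marker : 1-D numpy array of Y values.
    Returns (v, r) where r is the estimated risk at quantile level v.
    """
    x = marker.reshape(-1, 1)
    risk = LogisticRegression().fit(x, y).predict_proba(x)[:, 1]
    r = np.sort(risk)                                # sorted fitted risks
    v = (np.arange(1, len(r) + 1) - 0.5) / len(r)    # quantile levels
    return v, r                                      # plot r against v
```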

11.
Using both simulated and real datasets, we compared two approaches for estimating absolute risk from nested case‐control (NCC) data and demonstrated the feasibility of using the NCC design for estimating absolute risk. In contrast to previously published results, we successfully demonstrated not only that data from a matched NCC study can be used to unbiasedly estimate absolute risk but also that matched studies give better statistical efficiency and classify subjects into more appropriate risk categories. Our result has implications for studies that aim to develop or validate risk prediction models. In addition to the traditional full cohort study and case‐cohort study, researchers designing these studies now have the option of performing an NCC study with huge potential savings in cost and resources. Detailed explanations on how to obtain the absolute risk estimates under the proposed approach are given. Copyright © 2016 John Wiley & Sons, Ltd.

12.
In unmatched case–control studies, the area under the receiver operating characteristic (ROC) curve (AUC) may be used to measure how well a variable discriminates between cases and controls. The AUC is sometimes used in matched case–control studies by ignoring matching, but it then lacks interpretation because it is not based on an estimate of the ROC for the population of interest. We introduce an alternative measure of discrimination: the concordance of risk factors conditional on the matching factors. Parametric and non‐parametric estimators are given for different matching scenarios and applied to real data from breast and lung cancer case–control studies. Diagnostic plots to verify the constancy of discrimination over matching factors are demonstrated. The proposed measure is simple to use and interpret, is more efficient than unmatched AUC statistics, and may be applied to compare the conditional discrimination performance of risk factors. Copyright © 2014 John Wiley & Sons, Ltd.
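A sketch of the conditional-concordance idea: compare each case only with the controls in its own matched set, counting ties as 1/2, which is the matched analogue of the AUC's pairwise concordance. Column names are illustrative, and the paper's parametric estimators are not shown:

```python
import numpy as np
import pandas as pd

def matched_concordance(df):
    """Concordance of a risk factor conditional on the matching:
    within each matched set, the proportion of case-control pairs
    in which the case's value exceeds the control's (ties = 1/2).
    df columns assumed: set_id, is_case (0/1), x (risk factor).
    """
    num = den = 0.0
    for _, g in df.groupby("set_id"):
        cases = g.loc[g["is_case"] == 1, "x"].to_numpy()
        ctrls = g.loc[g["is_case"] == 0, "x"].to_numpy()
        for xc in cases:
            num += np.sum(xc > ctrls) + 0.5 * np.sum(xc == ctrls)
            den += len(ctrls)
    return num / den
```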

13.
Analysing the determinants and consequences of hospital‐acquired infections involves the evaluation of large cohorts. Infected patients in the cohort are often rare for specific pathogens, because most of the patients admitted to the hospital are discharged or die without such an infection. Death and discharge are events that compete with acquiring an infection, because these individuals are no longer at risk of getting a hospital‐acquired infection. Therefore, the data are best analysed with an extended survival model, the extended illness‐death model. A common problem in cohort studies is the costly collection of covariate values. In order to make efficient use of data from infected as well as uninfected patients, we propose a tailored case‐cohort approach for the extended illness‐death model. The basic idea of the case‐cohort design is to use only a random sample of the full cohort, referred to as the subcohort, plus all cases, namely the infected patients. Thus, covariate values are only obtained for a small part of the full cohort. The method is based on existing and established methods and is used to perform regression analysis in adapted Cox proportional hazards models. We propose estimation of all cause‐specific cumulative hazards and transition probabilities in an extended illness‐death model based on case‐cohort sampling. As an example, we apply the methodology to infection with a specific pathogen using a large cohort from Spanish hospital data. The obtained results of the case‐cohort design are compared with the results in the full cohort to investigate the performance of the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.

14.
The case–control study is a simple and useful method to characterize the effect of a gene, the effect of an exposure, as well as the interaction between the two. The control‐free case‐only study is an even simpler design if interest centers on the gene–environment interaction only. It requires the sometimes plausible assumption that the gene under study is independent of exposures among the non‐diseased in the study population. The Hardy–Weinberg equilibrium is also sometimes reasonable to assume. This paper presents an easy‐to‐implement approach for analyzing case–control and case‐only studies under the above dual assumptions. The proposed approach, 'conditional logistic regression with counterfactuals', offers the flexibility for complex modeling yet remains well within the reach of practicing epidemiologists. When the dual assumptions are met, conditional logistic regression with counterfactuals is unbiased and has the correct type I error rates. It also results in smaller variances and achieves higher power compared with the conventional analysis (unconditional logistic regression). Copyright © 2010 John Wiley & Sons, Ltd.
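For context, the classical case-only estimator that these dual assumptions license (a simpler relative of the paper's counterfactual conditional-logistic approach): under gene–environment independence, the multiplicative G × E interaction odds ratio can be estimated by the G–E odds ratio among cases alone. Names and the example numbers are ours:

```python
import numpy as np

def case_only_interaction_or(cases_g_e):
    """Case-only estimate of the multiplicative G x E interaction:
    under gene-environment independence in the source population,
    the interaction odds ratio equals the G-E odds ratio computed
    among cases alone. cases_g_e is the 2x2 table of (G, E) counts
    among cases: [[g0e0, g0e1], [g1e0, g1e1]].
    """
    (a, b), (c, d) = np.asarray(cases_g_e, dtype=float)
    return (a * d) / (b * c)

# e.g. case_only_interaction_or([[100, 40], [30, 36]]) -> 3.0
```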

15.
In this paper, we propose nonlinear distance‐odds models investigating elevated odds around point sources of exposure, under a matched case‐control design where there are subtypes within cases. We consider models analogous to the polychotomous logit models and adjacent‐category logit models for categorical outcomes and extend them to the nonlinear distance‐odds context. We consider multiple point sources as well as covariate adjustments. We evaluate maximum likelihood, profile likelihood, iteratively reweighted least squares, and a hierarchical Bayesian approach using Markov chain Monte Carlo techniques under these distance‐odds models. We compare these methods using an extensive simulation study and show that with multiple parameters and a nonlinear model, Bayesian methods have advantages in terms of estimation stability, precision, and interpretation. We illustrate the methods by analyzing Medicaid claims data corresponding to the pediatric asthma population in Detroit, Michigan, from 2004 to 2006. Copyright © 2012 John Wiley & Sons, Ltd.
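The abstract does not reproduce the model form. One common nonlinear distance-odds specification in the point-source literature, given here only for intuition (the paper's exact parameterization may differ), is

```latex
\theta(d) \;=\; 1 + \alpha \exp\!\left(-\beta\, d^{2}\right),
```

where \(\theta(d)\) is the disease odds at distance \(d\) from the source relative to the background odds, \(\alpha\) the excess odds at the source, and \(\beta\) the spatial decay rate. The nonlinearity in \((\alpha, \beta)\) is what motivates the comparison of estimation methods above.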

16.
We propose a Bayesian adjustment for the misclassification of a binary exposure variable in a matched case–control study. The method admits a priori knowledge about both the misclassification parameters and the exposure–disease association. The standard Dirichlet prior distribution for a multinomial model is extended to allow separation of prior assertions about the exposure–disease association from assertions about other parameters. The method is applied to a study of occupational risk factors for new‐onset adult asthma. Copyright © 2009 John Wiley & Sons, Ltd.

17.
Biomarkers are often measured over time in epidemiological studies and clinical trials for better understanding of the mechanisms of disease. In large cohort studies, case‐cohort sampling provides a cost‐effective way to collect expensive biomarker data for revealing the relationship between biomarker trajectories and time to event. However, biomarker measurements are often limited by the sensitivity and precision of a given assay, resulting in data that are censored at detection limits and prone to measurement errors. Additionally, the occurrence of an event of interest may preclude biomarkers from being further evaluated. Inappropriate handling of these types of data can lead to biased conclusions. Under a classical case‐cohort design, we propose a modified likelihood‐based approach to accommodate these special features of longitudinal biomarker measurements in accelerated failure time models. The maximum likelihood estimators based on the full likelihood function are obtained by the Gaussian quadrature method. We evaluate the performance of our case‐cohort estimator and compare its relative efficiency to the full‐cohort estimator through simulation studies. The proposed method is further illustrated using data from a biomarker study of sepsis among patients with community‐acquired pneumonia. Copyright © 2015 John Wiley & Sons, Ltd.

18.
Countermatching designs can provide more efficient estimates than simple matching or case–cohort designs in certain situations, such as when good surrogate variables for an exposure of interest are available. We extend pseudolikelihood estimation for the Cox model under countermatching designs to models where time‐varying covariates are considered. We also implement pseudolikelihood with calibrated weights to improve efficiency in nested case–control designs in the presence of time‐varying variables. A simulation study is carried out, which considers four different scenarios: a binary time‐dependent variable, a continuous time‐dependent variable, and each of these with interactions. Simulation results show that pseudolikelihood with calibrated weights under countermatching offers large gains in efficiency compared with the case–cohort design. Pseudolikelihood with calibrated weights yielded more efficient estimators than plain pseudolikelihood estimators, and estimators were more efficient under countermatching than under case–cohort sampling for the situations considered. The methods are illustrated using the Colorado Plateau uranium miners cohort. Furthermore, we present a general method to generate survival times with time‐varying covariates. Copyright © 2016 John Wiley & Sons, Ltd.
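On the final point, a minimal sketch of generating survival times with a time-varying covariate by inverting the cumulative hazard, the standard approach in this setting (see, e.g., Austin, 2012, Statistics in Medicine). For simplicity every subject's binary covariate switches on at the same time `t_switch`; that simplification is ours, not the paper's general method:

```python
import numpy as np

rng = np.random.default_rng(1)

def sim_survival_tv(n, lam0, beta, t_switch):
    """Simulate event times under hazard lam0 * exp(beta * Z(t)),
    with Z(t) = 1{t >= t_switch}, by inverting the cumulative
    hazard H(T) = -log(U) for U ~ Uniform(0, 1).
    """
    target = -np.log(rng.uniform(size=n))   # required cumulative hazard
    h_at_switch = lam0 * t_switch           # H accumulated before switch
    return np.where(
        target < h_at_switch,
        target / lam0,                                   # event before switch
        t_switch + (target - h_at_switch) / (lam0 * np.exp(beta)))
```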

19.
Missing data are common in longitudinal studies due to drop‐out, loss to follow‐up, and death. Likelihood‐based mixed effects models for longitudinal data give valid estimates when the data are missing at random (MAR). This assumption, however, is not testable without further information. In some studies, there is additional information available in the form of an auxiliary variable known to be correlated with the missing outcome of interest. Availability of such auxiliary information provides us with an opportunity to test the MAR assumption. If the MAR assumption is violated, such information can be utilized to reduce or eliminate bias when the missing data process depends on the unobserved outcome through the auxiliary information. We compare two methods of utilizing the auxiliary information: joint modeling of the outcome of interest and the auxiliary variable, and multiple imputation (MI). Simulation studies are performed to examine the two methods. The likelihood‐based joint modeling approach is consistent and most efficient when correctly specified. However, mis‐specification of the joint distribution can lead to biased results. MI is slightly less efficient than a correct joint modeling approach and can also be biased when the imputation model is mis‐specified, though it is more robust to mis‐specification of the imputation distribution when all the variables affecting the missing data mechanism and the missing outcome are included in the imputation model. An example is presented from a dementia screening study. Copyright © 2009 John Wiley & Sons, Ltd.
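A sketch of the inclusive MI strategy discussed above (putting the auxiliary variable into the imputation model), using scikit-learn's IterativeImputer as a stand-in for a full MI engine; the paper's own implementation is not specified here, and with sample_posterior=True each pass yields one stochastic completion:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def impute_with_auxiliary(outcome, auxiliary, covariates, m=20, seed=0):
    """Multiple imputation of a partially missing outcome (NaNs),
    including an auxiliary variable correlated with it in the
    imputation model. Returns m completed copies of the outcome.
    outcome, auxiliary : 1-D arrays; covariates : (n, p) array.
    """
    x = np.column_stack([outcome, auxiliary, covariates])
    draws = []
    for k in range(m):
        imp = IterativeImputer(sample_posterior=True, random_state=seed + k)
        draws.append(imp.fit_transform(x)[:, 0])   # column 0 = outcome
    return draws
```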

20.
Multivariable analysis of proteomics data using standard statistical models is hindered by the presence of incomplete data. We faced this issue in a nested case–control study of 135 incident cases of myocardial infarction and 135 pair‐matched controls from the Framingham Heart Study Offspring cohort. Plasma protein markers (K = 861) were measured on the case–control pairs (N = 135), and the majority of proteins had missing expression values for a subset of samples. In the setting of many more variables than observations (K ≫ N), we explored and documented the feasibility of multiple imputation approaches along with subsequent analysis of the imputed data sets. Initially, we selected proteins with complete expression data (K = 261) and randomly masked some values as the basis of simulation to tune the imputation and analysis process. We randomly shuffled proteins into several bins, performed multiple imputation within each bin, and followed up with stepwise selection using conditional logistic regression within each bin. This process was repeated hundreds of times. We determined the optimal method of multiple imputation, number of proteins per bin, and number of random shuffles using several performance statistics. We then applied this method to 544 proteins with incomplete expression data (≤40% missing values), from which we identified a panel of seven proteins that were jointly associated with myocardial infarction. © 2015 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
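A skeleton of the bin, impute, and select pipeline described above, with `impute_fn` and `select_fn` as placeholders for the multiple-imputation routine and the stepwise conditional-logistic selection; the paper's simulation-based tuning of the number of bins and shuffles is not shown:

```python
import numpy as np

def bin_impute_select(protein_df, n_bins, impute_fn, select_fn, rng):
    """One shuffle of the pipeline: randomly assign protein columns
    to bins, multiply impute within each bin, then run stepwise
    selection per bin and collect the surviving proteins.

    impute_fn : callable taking a protein sub-frame, returning a
                list of completed data sets (one per imputation).
    select_fn : callable taking those completed sets, returning the
                names of proteins retained by stepwise selection.
    """
    cols = rng.permutation(protein_df.columns.to_numpy())
    bins = np.array_split(cols, n_bins)
    kept = []
    for b in bins:
        completed = impute_fn(protein_df[list(b)])
        kept.extend(select_fn(completed))
    return kept
```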
