Similar documents
20 similar documents found (search time: 578 ms)
1.
The case–control study is a simple and useful method for characterizing the effect of a gene, the effect of an exposure, and the interaction between the two. The case-only study, which dispenses with controls, is an even simpler design when interest centres on the gene–environment interaction alone. It requires the sometimes plausible assumption that the gene under study is independent of exposures among the non-diseased in the study population. Hardy–Weinberg equilibrium is also sometimes reasonable to assume. This paper presents an easy-to-implement approach for analyzing case–control and case-only studies under these dual assumptions. The proposed approach, 'conditional logistic regression with counterfactuals', offers the flexibility for complex modeling yet remains well within the reach of practicing epidemiologists. When the dual assumptions are met, conditional logistic regression with counterfactuals is unbiased and has the correct type I error rate. It also yields smaller variances and higher power than the conventional analysis (unconditional logistic regression). Copyright © 2010 John Wiley & Sons, Ltd.
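The case-only idea above can be illustrated numerically: under gene–environment independence among the non-diseased, the G–E odds ratio computed among cases alone estimates the multiplicative interaction odds ratio. A minimal sketch with hypothetical case counts, using the plain cross-product ratio rather than the paper's conditional-logistic formulation:

```python
import numpy as np

def case_only_interaction_or(cases_ge):
    """Case-only estimator of the multiplicative gene-environment
    interaction odds ratio.

    cases_ge: 2x2 array of CASE counts, rows = gene (0/1),
    columns = exposure (0/1). Valid only under gene-environment
    independence in the source population (the 'dual assumption'
    discussed in the abstract).
    """
    n = np.asarray(cases_ge, dtype=float)
    # The G-E odds ratio among cases equals the interaction OR
    return (n[1, 1] * n[0, 0]) / (n[1, 0] * n[0, 1])

# Hypothetical counts: 200 G-E- cases, 60 G-E+, 50 G+E-, 45 G+E+
table = [[200, 60], [50, 45]]
print(case_only_interaction_or(table))  # (45*200)/(50*60) = 3.0
```

Note that no control data enter the calculation at all; that is the efficiency appeal of the design, and also why the independence assumption is load-bearing.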

2.
Case–control studies are particularly prone to selection bias, which can affect odds ratio estimation. Approaches to discovering and adjusting for selection bias have been proposed in the literature using graphical and heuristic tools as well as more complex statistical methods. The approach we propose is based on a survey-weighting method termed Bayesian post-stratification and follows from the conditional independences that characterise selection bias. We use our approach to perform a selection-bias sensitivity analysis, using ancillary data sources that describe the target case–control population to re-weight the odds ratio estimates obtained from the study. The method is applied to two case–control studies: the first investigating the association between exposure to electromagnetic fields and acute lymphoblastic leukaemia in children, and the second investigating the association between maternal occupational exposure to hairspray and a congenital anomaly in male babies called hypospadias. In both studies, our method showed that the odds ratios were only moderately sensitive to selection bias. Copyright © 2013 John Wiley & Sons, Ltd.

3.
In case-control studies, exposure assessments are almost always error-prone. In the absence of a gold standard, two or more assessment approaches are often used to classify people with respect to exposure. Each imperfect assessment tool may lead to misclassification of exposure assignment; the exposure misclassification may or may not be differential with respect to case status; and the errors in exposure classification under the different approaches may or may not be independent (conditional on the true exposure status). Although methods have been proposed to study diagnostic accuracy in the absence of a gold standard, these methods are infrequently used in case-control studies to correct exposure misclassification that is simultaneously differential and dependent. In this paper, we propose a Bayesian method to estimate the measurement-error-corrected exposure-disease association, accounting for both differential and dependent misclassification. The performance of the proposed method is investigated in simulation studies, which show that the approach works well, and in an application to a case-control study assessing the association between asbestos exposure and mesothelioma. Copyright © 2013 John Wiley & Sons, Ltd.
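As a simple point of contrast with the Bayesian machinery described above, the classical matrix-method correction for nondifferential misclassification, with sensitivity and specificity assumed known, can be sketched in a few lines. This is a point-estimate-only analogue with no uncertainty propagation, and all counts below are hypothetical:

```python
def correct_counts(obs_exposed, total, se, sp):
    """Back-correct an observed exposed count for nondifferential
    exposure misclassification with assumed sensitivity (se) and
    specificity (sp): obs = se*true + (1-sp)*(total-true), solved
    for the true count."""
    return (obs_exposed - (1.0 - sp) * total) / (se + sp - 1.0)

def corrected_or(case_exp, case_tot, ctrl_exp, ctrl_tot, se, sp):
    """Misclassification-corrected odds ratio from 2x2 margins."""
    a = correct_counts(case_exp, case_tot, se, sp)   # true exposed cases
    c = correct_counts(ctrl_exp, ctrl_tot, se, sp)   # true exposed controls
    b, d = case_tot - a, ctrl_tot - c
    return (a * d) / (b * c)

# Hypothetical data: 60/100 cases and 25/100 controls test exposed,
# se = 0.90, sp = 0.95. Crude OR = (60*75)/(40*25) = 4.5;
# the corrected OR is larger (~5.96), as misclassification
# typically biases toward the null.
print(corrected_or(60, 100, 25, 100, 0.9, 0.95))
```

The paper's method goes further by allowing the misclassification to be differential and dependent across assessment tools, which this simple back-calculation cannot handle.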

4.
Matched case-control designs are commonly used to control for potential confounding factors in genetic epidemiology studies, especially epigenetic studies of DNA methylation. Compared with unmatched case-control studies with high-dimensional genomic or epigenetic data, there have been few variable selection methods for matched sets. In an earlier paper, we proposed a penalized logistic regression model with a network-based penalty for the analysis of unmatched DNA methylation data. However, for the matched designs popular in epigenetic studies, which compare DNA methylation between tumor and adjacent non-tumor tissues or between pre-treatment and post-treatment conditions, applying ordinary logistic regression while ignoring the matching is known to introduce serious bias in estimation. In this paper, we develop a penalized conditional logistic model using the network-based penalty that encourages a grouping effect of (1) linked Cytosine-phosphate-Guanine (CpG) sites within a gene or (2) linked genes within a genetic pathway, for the analysis of matched DNA methylation data. In our simulation studies, we demonstrate the superiority of the conditional logistic model over the unconditional logistic model in high-dimensional variable selection problems for matched case-control data. We further investigate the benefits of utilizing biological group or graph information for matched case-control data. We apply the proposed method to a genome-wide DNA methylation study of hepatocellular carcinoma (HCC), in which we investigated the DNA methylation levels of tumor and adjacent non-tumor tissues from HCC patients using the Illumina Infinium HumanMethylation27 BeadChip. Several new CpG sites and genes known to be related to HCC were identified that had been missed by the standard method in the original paper. Copyright © 2012 John Wiley & Sons, Ltd.

5.
In many large prospective cohorts, expensive exposure measurements cannot be obtained for all individuals. Exposure-disease association studies are therefore often based on nested case-control or case-cohort studies, in which complete information is obtained only for sampled individuals. However, the full cohort may contain a large amount of information on cheaply available covariates, and possibly on a surrogate of the main exposure(s), which typically goes unused. We view the nested case-control or case-cohort study plus the remainder of the cohort as a full-cohort study with missing data. Hence, we propose using multiple imputation (MI) to utilise information in the full cohort when data from the sub-studies are analysed. We use the fully observed data to fit the imputation models. We consider using approximate imputation models and also using rejection sampling to draw imputed values from the true distribution of the missing values given the observed data. Simulation studies show that using MI to utilise full-cohort information in the analysis of nested case-control and case-cohort studies can result in important gains in efficiency, particularly when a surrogate of the main exposure is available in the full cohort. In simulations, this method outperforms counter-matching in nested case-control studies and a weighted analysis for case-cohort studies, both of which use some full-cohort information. Approximate imputation models perform well except when there are interactions or non-linear terms in the outcome model, where imputation using rejection sampling works well. Copyright © 2013 John Wiley & Sons, Ltd.
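Whatever imputation model is used, the final step of an MI analysis like the one described above combines the per-imputation results with Rubin's rules. A minimal sketch of that pooling step (the imputation and analysis models themselves are omitted):

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    estimates, variances: length-M sequences of point estimates and
    their estimated variances from M imputed datasets.
    Returns (pooled estimate, total variance).
    """
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    m = len(est)
    qbar = est.mean()              # pooled point estimate
    ubar = var.mean()              # within-imputation variance
    b = est.var(ddof=1)            # between-imputation variance
    t = ubar + (1.0 + 1.0 / m) * b # total variance
    return qbar, t

# Hypothetical log hazard ratios from M = 3 imputed datasets
q, t = rubin_pool([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(q, t)  # 1.0, 0.0933...
```

The between-imputation term `b` is what carries the extra uncertainty due to the missing exposure data; ignoring it (e.g. by single imputation) understates the standard error.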

6.
Investigators interested in whether a disease aggregates in families often collect case-control family data, which consist of disease status and covariate information for members of families selected via case or control probands. Here, we focus on the use of case-control family data to investigate the relative contributions to the disease of additive genetic effects (A), shared family environment (C), and unique environment (E). We describe an ACE model for binary family data; this structural equation model, which has been described previously, combines a general-family extension of the classic ACE twin model with a (possibly covariate-specific) liability-threshold model for binary outcomes. We then introduce our contribution, a likelihood-based approach to fitting the model to singly ascertained case-control family data. The approach, which involves conditioning on the proband's disease status and also setting prevalence equal to a prespecified value that can be estimated from the data, makes it possible to obtain valid estimates of the A, C, and E variance components from case-control (rather than only from population-based) family data. In fact, simulation experiments suggest that our approach to fitting yields approximately unbiased estimates of the A, C, and E variance components, provided that certain commonly made assumptions hold. Further, when our approach is used to fit the ACE model to Austrian case-control family data on depression, the resulting estimate of heritability is very similar to those from previous analyses of twin data. Genet. Epidemiol. 34: 238–245, 2010. © 2009 Wiley-Liss, Inc.

7.
Genome-wide association studies (GWAS) require considerable investment, so researchers often study multiple traits collected on the same set of subjects to maximize return. However, many GWAS have adopted a case-control design; improperly accounting for case-control ascertainment can lead to biased estimates of association between markers and secondary traits. We show that under the null hypothesis of no marker-secondary trait association, naïve analyses that ignore ascertainment or stratify on case-control status have proper type I error rates except when both the marker and the secondary trait are independently associated with disease risk. Under the alternative hypothesis, these methods are unbiased when the secondary trait is not associated with disease risk. We also show that inverse-probability-of-sampling-weighted (IPW) regression provides unbiased estimates of marker-secondary trait association. We use simulation to quantify the type I error, power, and bias of the naïve and IPW methods. IPW regression has appropriate type I error in all situations we consider but has lower power than the naïve analyses. The bias of the naïve analyses is small provided the marker is independent of disease risk. Considering that the majority of tested markers in a GWAS are not associated with disease risk, naïve analyses provide valid tests of, and nearly unbiased estimates of, marker-secondary trait association. Care must be taken when there is evidence that both the secondary trait and the tested marker are associated with the primary disease, a situation we illustrate with an analysis of the relationship between a marker in FGFR2 and mammographic density in a breast cancer case-control sample. Genet. Epidemiol. 33:717–728, 2009. © 2009 Wiley-Liss, Inc.
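For a continuous secondary trait, the IPW regression referred to above reduces to weighted least squares with weights equal to the inverse sampling probabilities (e.g. near 1 for cases, much larger for the undersampled controls). A minimal numpy sketch of the estimating step; in a real analysis the weights would come from the study's known or estimated sampling fractions:

```python
import numpy as np

def ipw_ols(y, x, weights):
    """Inverse-probability-of-sampling-weighted least squares for a
    continuous secondary trait y regressed on a marker coding x.
    weights[i] = 1 / P(subject i sampled)."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.asarray(weights, dtype=float)
    # Solve the weighted normal equations (X'WX) beta = X'Wy
    xtwx = X.T @ (w[:, None] * X)
    xtwy = X.T @ (w * y)
    return np.linalg.solve(xtwx, xtwy)  # [intercept, slope]

# Sanity check on exact linear data: any weights recover the line
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 3.0 * x
print(ipw_ols(y, x, [1, 2, 1, 2]))  # intercept 2, slope 3
```

The power loss noted in the abstract comes directly from the variability of these weights: highly unequal weights inflate the variance of the weighted estimator.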

8.
Using both simulated and real datasets, we compared two approaches for estimating absolute risk from nested case-control (NCC) data and demonstrated the feasibility of using the NCC design for estimating absolute risk. In contrast to previously published results, we successfully demonstrated not only that data from a matched NCC study can be used to unbiasedly estimate absolute risk but also that matched studies give better statistical efficiency and classify subjects into more appropriate risk categories. Our result has implications for studies that aim to develop or validate risk prediction models. In addition to the traditional full cohort study and case-cohort study, researchers designing these studies now have the option of performing a NCC study with huge potential savings in cost and resources. Detailed explanations on how to obtain the absolute risk estimates under the proposed approach are given. Copyright © 2016 John Wiley & Sons, Ltd.

9.
We propose a semiparametric odds ratio model that extends Umbach and Weinberg's approach of exploiting a gene–environment association model for efficiency gains in case–control designs to both discrete and continuous data. We directly model the gene–environment association in the control population to avoid estimating the intercept in the disease-risk model, which is inherently difficult because these sampling designs carry little information about that parameter. We propose a novel permutation-based approach to eliminate the high-dimensional nuisance parameters in the matched case–control design. The proposed approach reduces to conditional logistic regression when the model for the gene–environment association is unrestricted. Simulation studies demonstrate good performance of the proposed approach. We apply the proposed approach to a study of gene–environment interaction in coronary artery disease. Copyright © 2013 John Wiley & Sons, Ltd.

10.
One of the main perceived advantages of a case-cohort design over a nested case-control design in an epidemiologic study is the ability to evaluate, with the same subcohort, outcomes other than the primary outcome of interest. In this paper, we show that valid inferences about secondary outcomes can also be achieved in nested case-control studies by using the inclusion probability weighting method in combination with an approximate jackknife standard error that can be computed using existing software. Simulation studies demonstrate that when the sample size is sufficient, this approach yields valid type I error and coverage rates for the analysis of secondary outcomes in nested case-control designs. Interestingly, the statistical power of the nested case-control design was comparable with that of the case-cohort design when the primary and secondary outcomes were positively correlated. The proposed method is illustrated with data from the Cardiovascular Health Study cohort, examining the association between C-reactive protein levels and the incidence of congestive heart failure. Copyright © 2014 John Wiley & Sons, Ltd.
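The jackknife standard error mentioned above can be sketched from leave-one-out estimates. This is the generic delete-one jackknife, not the software-specific approximation the paper relies on:

```python
import numpy as np

def jackknife_se(loo_estimates):
    """Jackknife standard error from the n leave-one-out estimates
    of a statistic: sqrt((n-1)/n * sum((theta_i - theta_bar)^2))."""
    t = np.asarray(loo_estimates, dtype=float)
    n = len(t)
    return np.sqrt((n - 1) / n * np.sum((t - t.mean()) ** 2))

# Toy example with three leave-one-out estimates
print(jackknife_se([1.0, 2.0, 3.0]))  # sqrt(4/3) ≈ 1.155
```

In the weighted-analysis setting of the abstract, each `theta_i` would be the weighted regression estimate recomputed with subject i removed; the approximate version avoids the n refits.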

11.
We examine the impact of nondifferential outcome misclassification on odds ratios estimated from pair-matched case-control studies and propose a Bayesian model to adjust these estimates for misclassification bias. The model relies on access to a validation subgroup with confirmed outcome status for all case-control pairs as well as prior knowledge about the positive and negative predictive value of the classification mechanism. We illustrate the model's performance on simulated data and apply it to a database study examining the presence of ten morbidities in the prodromal phase of multiple sclerosis.

12.
Familial aggregation and the role of genetic and environmental factors can be investigated through family studies analysed using the liability-threshold model. The liability-threshold model ignores the timing of events, including the age of disease onset and right censoring, which can lead to estimates that are difficult to interpret and potentially biased. We incorporate the time aspect into the liability-threshold model for case-control-family data, following the approach that has been applied in the twin setting. Thus, the data are considered as arising from a competing risks setting, and inverse probability of censoring weights are used to adjust for right censoring. In the case-control-family setting, recognising the existence of competing events is highly relevant to the sampling of control probands. Because of the presence of multiple family members who may be censored at different ages, the estimation of inverse probability of censoring weights is not as straightforward as in the twin setting and requires care. We propose to employ a composite likelihood conditioning on proband status that markedly simplifies adjustment for right censoring. We assess the proposed approach using simulation studies and apply it in the analysis of two Danish register-based case-control-family studies: one on cancer diagnosed in childhood and adolescence, and one on early-onset breast cancer. Copyright © 2017 John Wiley & Sons, Ltd.

13.
The self-controlled case series (SCCS) method is an alternative to study designs such as cohort and case-control methods and is used to investigate potential associations between the timing of vaccine or other drug exposures and adverse events. It requires information only on cases, individuals who have experienced the adverse event at least once, and automatically controls for all fixed confounding variables that could modify the true association between exposure and adverse event. Time-varying confounders such as age, on the other hand, are not automatically controlled and must be allowed for explicitly. The original SCCS method used step functions to represent risk periods (windows of exposed time) and age effects. Hence, exposure risk periods and/or age groups have to be prespecified, and a poor choice of group boundaries may lead to biased estimates. In this paper, we propose a nonparametric SCCS method in which both age and exposure effects are represented simultaneously by spline functions. To avoid numerical integration of the product of these two spline functions in the SCCS likelihood, we define the first, second, and third integrals of I-splines based on the definition of integrals of M-splines. Simulation studies show that the new method performs well. We apply the new method to data on pediatric vaccines. Copyright © 2017 John Wiley & Sons, Ltd.
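In the step-function special case described above, with a single risk window and no age adjustment, the SCCS relative incidence has a closed form: the ratio of the event rate inside the exposed window to the rate outside it. A toy sketch with hypothetical counts (the paper's spline-based method replaces exactly these step functions):

```python
def sccs_relative_incidence(events_in_risk, risk_days,
                            events_baseline, baseline_days):
    """Crude SCCS relative incidence for one exposure risk window,
    ignoring age effects: rate in the exposed window divided by the
    rate in the remaining observation time. Each case contributes
    both exposed and unexposed person-time, so fixed confounders
    cancel out of this ratio."""
    rate_exposed = events_in_risk / risk_days
    rate_baseline = events_baseline / baseline_days
    return rate_exposed / rate_baseline

# Hypothetical: 3 adverse events in a 30-day post-vaccination window,
# 2 events in 300 days of baseline time
print(sccs_relative_incidence(3, 30, 2, 300))  # (3/30)/(2/300) = 15.0
```

The bias from a poorly chosen window boundary is visible here: misallocating even one event between the numerator and denominator changes the estimate substantially when counts are small.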

14.
Estimation of marginal causal effects from case-control data has two complications: (i) confounding, because the exposure under study is not randomized, and (ii) bias from the case-control sampling scheme. In this paper, we study estimators of the marginal causal odds ratio that address these issues for matched and unmatched case-control designs by utilizing the known prevalence of being a case. The estimators are implemented in simulations in which their finite-sample properties are studied, and approximations of their variances are derived with the delta method. We also illustrate the methods by analyzing the effect of low birth weight on the risk of type 1 diabetes mellitus, using data from the Swedish Childhood Diabetes Register, a nationwide population-based incidence register. Copyright © 2013 John Wiley & Sons, Ltd.
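Correcting complication (ii) with a known prevalence amounts to reweighting the case-control sample back to the target population: cases receive weight p/q1 and controls (1-p)/q0, where p is the known prevalence and q1, q0 are the sample case and control fractions. A minimal sketch with a hypothetical 1:1 sample; the weights would then feed into whichever causal estimator handles complication (i):

```python
def population_weights(is_case, prevalence):
    """Design weights mapping a case-control sample to the target
    population when the case prevalence is known. Cases get
    prevalence/q1 and controls (1-prevalence)/q0, where q1 and q0
    are the sample case and control fractions."""
    n = len(is_case)
    q1 = sum(is_case) / n
    q0 = 1.0 - q1
    return [prevalence / q1 if d else (1.0 - prevalence) / q0
            for d in is_case]

# Hypothetical 1:1 sample of 50 cases and 50 controls, prevalence 10%
w = population_weights([1] * 50 + [0] * 50, 0.1)
print(w[0], w[-1])  # cases 0.2, controls 1.8
# The weighted case fraction equals the prevalence:
print(sum(wi for wi, d in zip(w, [1] * 50 + [0] * 50) if d) / sum(w))
```

This is only the sampling correction; the confounding correction (e.g. standardization over measured confounders) is applied on top of these weights.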

15.
For binary or categorical response models, most goodness-of-fit statistics are based on partitioning the subjects into groups or regions and comparing the observed and predicted responses in these regions using a statistic referred to a suitable chi-squared distribution. Existing strategies create this partition based on the predicted response probabilities, or propensity scores, from the fitted model. In this paper, we follow a retrospective approach, borrowing the notion of balancing scores used in causal inference, to inspect the conditional distribution of the predictors given the propensity scores in each category of the response and thereby assess model adequacy. This diagnostic can be used under both prospective and retrospective sampling designs, and it may detect general forms of misspecification. We first present simple graphical and numerical summaries that can be used in a binary logistic model. We then generalize the tools to propose model diagnostics for the proportional odds model. We illustrate the methods with simulation studies and two data examples: (i) a case-control study of the association between cumulative lead exposure and Parkinson's disease in the Boston, Massachusetts, area and (ii) a cohort study of biomarkers possibly associated with diabetes, from the VA Normative Aging Study. Copyright © 2013 John Wiley & Sons, Ltd.

16.
An outcome-dependent sampling (ODS) scheme is a cost-effective way to conduct a study. For a study with a continuous primary outcome, an ODS scheme can be implemented in which the expensive exposure is measured only on a simple random sample plus supplemental samples selected from the two tails of the primary outcome variable. Given the tremendous cost invested in collecting the primary exposure information, investigators often wish to use the available data to study the relationship between a secondary outcome and the obtained exposure variable. This is referred to as secondary analysis. Secondary analysis under ODS designs can be tricky, as the ODS sample is not a random sample from the general population. In this article, we use inverse probability weighted and augmented inverse probability weighted estimating equations to analyze the secondary outcome for data obtained from the ODS design. We make no parametric assumptions on the primary and secondary outcomes and specify only the form of the regression mean models, thus allowing an arbitrary error distribution. Our approach is robust to second- and higher-order moment misspecification, and it leads to more precise estimates of the parameters by effectively using all available participants. Through simulation studies, we show that the proposed estimator is consistent and asymptotically normal. Data from the Collaborative Perinatal Project are analyzed to illustrate our method.

17.
Genetic association studies of obstetric complications may genotype case and control mothers, their respective newborns, or both the mothers and their children. The relatively high prevalence of many obstetric complications and the availability of both maternal and offspring genotype data have motivated the study of new methods for testing for deviations from Hardy-Weinberg equilibrium (HWE). We propose four novel test statistics, each of which uses a different type of data: (1) a test using maternal case-control genotype data, (2) a test using offspring genotype data, (3) a combination of the first and second tests, and (4) a test based on the joint classification of case-control maternal-child genotype data. The selection of case and control mothers (and thus their children) is accounted for by weighting both maternal and child contributions to the test statistics with sampling probabilities. Our tests therefore do not require that the phenotype be rare, as is the case for HWE tests using only controls, and are particularly suitable for genetic association studies of relatively common complications such as premature birth. The third and fourth tests utilize both maternal and child genotype data and appropriately account for the correlation between maternal and child genotypes. On the basis of extensive simulation studies comparing the type I error and power of the proposed tests, we recommend the third, combined test statistic for routine use in the analysis of case-control studies of mother-child pairs. Genet. Epidemiol. 33:539–548, 2009. © 2009 Wiley-Liss, Inc.
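The baseline that tests (1)-(4) above build on is the classical one-degree-of-freedom Pearson test for HWE from a single genotype table. A minimal sketch of that building block; the paper's statistics additionally weight the maternal and child contributions by sampling probabilities, which this sketch omits:

```python
def hwe_chisq(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic (1 df) for Hardy-Weinberg
    equilibrium from genotype counts n_AA, n_AB, n_BB of a
    biallelic marker (e.g. among control mothers)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(hwe_chisq(25, 50, 25))  # exact HWE proportions -> 0.0
print(hwe_chisq(30, 40, 30))  # heterozygote deficit -> 4.0
```

Compared against the chi-squared(1) critical value 3.84, the second hypothetical table would be declared out of HWE at the 5% level.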

18.
There has been increasing interest in identifying genes within the human genome that influence multiple diverse phenotypes. In the presence of pleiotropy, joint testing of these phenotypes is not only biologically meaningful but also statistically more powerful than univariate analysis of each separate phenotype accounting for multiple testing. Although many cross-phenotype association tests exist, the majority of such methods assume samples composed of unrelated subjects and therefore are not applicable to family-based designs, including the valuable case-parent trio design. In this paper, we describe a robust gene-based association test of multiple phenotypes collected in a case-parent trio study. Our method is based on the kernel distance covariance (KDC) method, where we first construct a similarity matrix for multiple phenotypes and a similarity matrix for genetic variants in a gene; we then test the dependency between the two similarity matrices. The method is applicable to either common variants or rare variants in a gene, and resulting tests from the method are by design robust to confounding due to population stratification. We evaluated our method through simulation studies and observed that the method is substantially more powerful than standard univariate testing of each separate phenotype. We also applied our method to phenotypic and genotypic data collected in case-parent trios as part of the Genetics of Kidneys in Diabetes (GoKinD) study and identified a genome-wide significant gene demonstrating cross-phenotype effects that was not identified using standard univariate approaches.

19.
Analysis of population-based case–control studies with complex sampling designs is challenging because the sample selection probabilities (and, therefore, the sample weights) depend on the response variable and covariates. Commonly, the design-consistent (weighted) estimators of the parameters of the population regression model are obtained by solving (sample) weighted estimating equations. Weighted estimators, however, are known to be inefficient when the weights are highly variable, as is typical for case–control designs. In this paper, we propose two alternative estimators that have higher efficiency and smaller finite sample bias compared with the weighted estimator. Both methods incorporate the information included in the sample weights by modeling the sample expectation of the weights conditional on design variables. We discuss benefits and limitations of each of the two proposed estimators, emphasizing efficiency and robustness. We compare the finite sample properties of the two new estimators and traditionally used weighted estimators with the use of simulated data under various sampling scenarios. We apply the methods to the U.S. Kidney Cancer Case-Control Study to identify risk factors. Published 2012. This article is a US Government work and is in the public domain in the USA.

20.
Joint effects of genetic and environmental factors have been increasingly recognized in the development of many complex human diseases. Despite the popularity of case-control and case-only designs, longitudinal cohort studies that can capture time-varying outcome and exposure information have long been recommended for gene–environment (G × E) interactions. To date, the literature on sampling designs for longitudinal studies of G × E interaction is quite limited. We therefore consider designs that can prioritize a subsample of an existing cohort for retrospective genotyping on the basis of currently available outcome, exposure, and covariate data. In this work, we propose stratified sampling based on summaries of individual exposures and outcome trajectories, and we develop a full conditional likelihood approach for estimation that adjusts for the biased sample. We compare the performance of our proposed design and analysis with combinations of different sampling designs and estimation approaches via simulation. We observe that the full conditional likelihood provides improved estimates of the G × E interaction and joint exposure effects over an uncorrected complete-case analysis, and that the exposure-enriched, outcome-trajectory-dependent design outperforms the other designs in terms of estimation efficiency and power for detecting the G × E interaction. We also illustrate our design and analysis using data from the Normative Aging Study, an ongoing longitudinal cohort study initiated by the Veterans Administration in 1963. Copyright © 2017 John Wiley & Sons, Ltd.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号