Similar Articles
20 similar articles found.
1.
We propose a semiparametric odds ratio model that extends Umbach and Weinberg's approach of exploiting a gene–environment association model for efficiency gains in case–control designs to both discrete and continuous data. We directly model the gene–environment association in the control population to avoid estimating the intercept in the disease risk model, which is inherently difficult because these sampling designs carry little information about that parameter. We propose a novel permutation‐based approach to eliminate the high‐dimensional nuisance parameters in the matched case–control design. The proposed approach reduces to conditional logistic regression when the model for the gene–environment association is unrestricted. Simulation studies demonstrate good performance of the proposed approach. We apply the proposed approach to a study of gene–environment interaction in coronary artery disease. Copyright © 2013 John Wiley & Sons, Ltd.

2.
In matched case‐crossover studies, it is generally accepted that the covariates on which a case and its associated controls are matched cannot exert a confounding effect on independent predictors included in the conditional logistic regression model. This is because any stratum effect is removed by conditioning on the fixed numbers of cases and controls within each stratum. Hence, the conditional logistic regression model cannot detect any effects associated with the matching covariates by stratum. However, some matching covariates, such as time, often act as effect modifiers, and ignoring this leads to incorrect statistical estimation and prediction. Therefore, we propose three approaches to evaluate effect modification by time: a parametric approach, a semiparametric penalized approach, and a semiparametric Bayesian approach. Our parametric approach is a two‐stage method that uses conditional logistic regression in the first stage and then fits a polynomial regression in the second stage. Our semiparametric penalized and Bayesian approaches are one‐stage approaches developed using regression splines. These semiparametric one‐stage approaches allow us not only to detect the parametric relationship between the predictor and binary outcomes but also to evaluate nonparametric relationships between the predictor and time. We demonstrate the advantages of our semiparametric one‐stage approaches using both a simulation study and an epidemiological example: a 1‐4 bidirectional case‐crossover study of childhood aseptic meningitis and drinking water turbidity. We also provide statistical inference for the semiparametric Bayesian approach using Bayes factors. Copyright © 2016 John Wiley & Sons, Ltd.
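The claim that conditioning removes any stratum effect can be checked directly on the standard conditional logistic likelihood for one 1:M matched set. The following is a minimal Python sketch of that textbook likelihood, not the authors' two‐stage or spline methods; the function name `matched_set_loglik` and the explicit `stratum_intercept` argument are hypothetical, added only to show that the intercept cancels.

```python
import math
from typing import Sequence


def matched_set_loglik(beta: Sequence[float],
                       case_x: Sequence[float],
                       control_xs: Sequence[Sequence[float]],
                       stratum_intercept: float = 0.0) -> float:
    """Conditional log-likelihood contribution of one 1:M matched set.

    Each member's linear predictor is alpha_s + beta . x, where alpha_s is the
    stratum (matched-set) intercept. Conditioning on the set containing exactly
    one case gives exp(eta_case) / sum_j exp(eta_j), in which alpha_s cancels.
    """
    def eta(x: Sequence[float]) -> float:
        return stratum_intercept + sum(b * xi for b, xi in zip(beta, x))

    etas = [eta(case_x)] + [eta(x) for x in control_xs]
    m = max(etas)  # log-sum-exp trick for numerical stability
    log_denom = m + math.log(sum(math.exp(e - m) for e in etas))
    return etas[0] - log_denom
```

Summing this over matched sets gives the full conditional log‐likelihood; changing `stratum_intercept` leaves the value unchanged, which is why the matching covariates' own effects cannot be estimated from it.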

3.
In genetic association studies, it is becoming increasingly imperative to have large sample sizes to identify and replicate genetic effects. To achieve these sample sizes, many research initiatives are encouraging collaboration and the combination of several existing matched and unmatched case–control studies. Thus, it is becoming more common to compare multiple sets of controls with the same case group, or multiple case groups, to validate or confirm a positive or negative finding. Usually, a naive approach of fitting separate models for each case–control comparison is used to make inference about the disease–exposure association. However, this approach does not use all the observed data and hence can lead to inconsistent results. The problem is compounded when a common case group is used in each case–control comparison. An alternative to fitting separate models is a polytomous logistic model, but this model does not combine matched and unmatched case–control data. We therefore propose a polytomous logistic regression approach based on a latent group indicator and a conditional likelihood to perform a combined analysis of matched and unmatched case–control data. We use simulation studies to evaluate the performance of the proposed method, and a case–control study of multiple myeloma and interleukin‐6 as an example. Our results indicate that the proposed method leads to a more efficient homogeneity test and a pooled estimate with a smaller standard error. Copyright © 2010 John Wiley & Sons, Ltd.

4.
Matched case‐control designs are commonly used to control for potential confounding factors in genetic epidemiology studies, especially epigenetic studies of DNA methylation. Compared with unmatched case‐control studies with high‐dimensional genomic or epigenetic data, few variable selection methods have been developed for matched sets. In an earlier paper, we proposed a penalized logistic regression model with a network‐based penalty for the analysis of unmatched DNA methylation data. However, matched designs are popular in epigenetic studies that compare DNA methylation between tumor and adjacent non‐tumor tissues or between pre‐treatment and post‐treatment conditions, and applying ordinary logistic regression that ignores the matching is known to cause serious bias in estimation. In this paper, we develop a penalized conditional logistic model using a network‐based penalty that encourages a grouping effect of (1) linked cytosine‐phosphate‐guanine (CpG) sites within a gene or (2) linked genes within a genetic pathway, for the analysis of matched DNA methylation data. In our simulation studies, we demonstrate the superiority of the conditional logistic model over the unconditional logistic model in high‐dimensional variable selection problems for matched case‐control data. We further investigate the benefits of using biological group or graph information for matched case‐control data. We apply the proposed method to a genome‐wide DNA methylation study of hepatocellular carcinoma (HCC), in which we investigated the DNA methylation levels of tumor and adjacent non‐tumor tissues from HCC patients using the Illumina Infinium HumanMethylation27 BeadChip. Several new CpG sites and genes known to be related to HCC were identified that had been missed by the standard method in the original paper. Copyright © 2012 John Wiley & Sons, Ltd.
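To make the "grouping effect" concrete, here is a minimal Python sketch of one common network‐based penalty: a lasso term plus a normalized graph‐Laplacian smoothing term over linked features. The exact penalty used in the paper may differ; this form, and the name `network_penalty`, are assumptions for illustration. In a penalized conditional logistic model, this quantity would be added to the negative conditional log‐likelihood.

```python
import math
from typing import Dict, List, Tuple


def network_penalty(beta: List[float],
                    edges: List[Tuple[int, int]],
                    lam1: float, lam2: float) -> float:
    """Lasso term plus a normalized graph-Laplacian smoothing term (assumed form).

    Each edge (u, v) links two features, e.g. CpG sites in the same gene.
    The smoothing term sums (beta_u/sqrt(d_u) - beta_v/sqrt(d_v))**2 over edges,
    where d is the node degree; it is small when linked features have similar
    degree-scaled coefficients, which encourages them to be selected together.
    """
    degree: Dict[int, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    l1 = sum(abs(b) for b in beta)
    smooth = sum(
        (beta[u] / math.sqrt(degree[u]) - beta[v] / math.sqrt(degree[v])) ** 2
        for u, v in edges
    )
    return lam1 * l1 + lam2 * smooth
```

For a chain of three CpG sites (edges 0–1 and 1–2), coefficients proportional to the square root of each node's degree incur no smoothing cost at all, whereas selecting only one site of the chain is penalized.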

5.
Case–control studies are particularly prone to selection bias, which can affect odds ratio estimation. Approaches to discovering and adjusting for selection bias have been proposed in the literature, using graphical and heuristic tools as well as more complex statistical methods. The approach we propose is based on a survey‐weighting method termed Bayesian post‐stratification and follows from the conditional independences that characterise selection bias. We use our approach to perform a selection bias sensitivity analysis, using ancillary data sources that describe the target case–control population to re‐weight the odds ratio estimates obtained from the study. The method is applied to two case–control studies, the first investigating the association between exposure to electromagnetic fields and acute lymphoblastic leukaemia in children, and the second investigating the association between maternal occupational exposure to hairspray and a congenital anomaly in male babies called hypospadias. In both case–control studies, our method showed that the odds ratios were only moderately sensitive to selection bias. Copyright © 2013 John Wiley & Sons, Ltd.

6.
A number of new study designs have appeared in which the exposure distribution of a case series is compared with an exposure distribution representing a complete theoretical population or distribution. These designs include the case‐genotype study, the case‐crossover study, and the case‐specular study. This paper describes a unified likelihood‐based approach to the analysis of such studies and discusses extensions of these methods when a control group is available. The approach clarifies certain assumptions implicit in the methods and helps contrast these assumptions with those underlying ordinary case‐control studies. There are several reasons to expect discrepancies between ordinary case‐control estimates and case‐distribution estimates; for example, case‐distribution estimates can be more sensitive to exposure misclassification. Some discrepancies are illustrated in an application to case‐specular data on wire codes and childhood cancer. Copyright © 1999 John Wiley & Sons, Ltd.

7.
Genotype data from genome‐wide association studies (GWAS) are now routinely imputed for untyped single nucleotide polymorphisms (SNPs) using various powerful statistical imputation algorithms trained on reference datasets. Using the predicted allele counts of imputed SNPs as the dosage variable is known to produce a valid score test for genetic association. In this paper, we investigate how best to handle imputed SNPs in various modern complex tests for genetic association that incorporate gene–environment interactions. We focus on case‐control association studies, where inference for an underlying logistic regression model can be performed using alternative methods that rely to varying degrees on an assumption of gene–environment independence in the underlying population. Because increasingly large‐scale GWAS are being performed through consortium efforts, in which it is preferable to share only summary‐level information across studies, we also describe simple mechanisms for implementing score tests based on standard meta‐analysis of “one‐step” maximum‐likelihood estimates across studies. Applications of the methods in simulation studies and to a GWAS dataset on lung cancer illustrate the ability of the proposed methods to maintain type I error rates for the underlying testing procedures. For analysis of imputed SNPs, as for typed SNPs, the retrospective methods can lead to considerable efficiency gains in modeling gene–environment interactions under the assumption of gene–environment independence. The methods are available for public use through the CGEN R software package.
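The "predicted allele count" used as the dosage variable is the expected number of copies of the coded allele given the imputation posterior probabilities. A minimal Python sketch, assuming genotypes labelled AA/AB/BB with B as the coded allele (the function name `expected_dosage` is hypothetical):

```python
def expected_dosage(p_aa: float, p_ab: float, p_bb: float) -> float:
    """Expected count of allele B given imputed genotype probabilities.

    dosage = 0 * P(AA) + 1 * P(AB) + 2 * P(BB), a value in [0, 2].
    """
    if abs(p_aa + p_ab + p_bb - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p_ab + 2.0 * p_bb
```

The resulting value replaces the 0/1/2 genotype count in the regression or score test, so a confidently imputed SNP behaves like a typed one, while an uncertain one is shrunk toward its expected value.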

8.
In unmatched case–control studies, the area under the receiver operating characteristic (ROC) curve (AUC) may be used to measure how well a variable discriminates between cases and controls. The AUC is sometimes used in matched case–control studies by ignoring the matching, but it then lacks a clear interpretation because it is not based on an estimate of the ROC curve for the population of interest. We introduce an alternative measure of discrimination: the concordance of risk factors conditional on the matching factors. Parametric and non‐parametric estimators are given for different matching scenarios and applied to real data from breast and lung cancer case–control studies. Diagnostic plots to verify the constancy of discrimination over matching factors are demonstrated. The proposed measure is simple, easy to use and interpret, more efficient than unmatched AUC statistics, and may be applied to compare the conditional discrimination performance of risk factors. Copyright © 2014 John Wiley & Sons, Ltd.
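Concordance conditional on the matching factors means that each case is compared only with its own matched controls rather than with every control in the study. The Python sketch below is one simple non‐parametric version, assuming a single risk score per subject. The paper's estimators may weight matched sets differently, and the name `matched_concordance` is hypothetical.

```python
from typing import Sequence, Tuple


def matched_concordance(sets: Sequence[Tuple[float, Sequence[float]]]) -> float:
    """Concordance computed only within matched sets.

    Each element is (case_score, [control_scores, ...]). A case "wins" a
    comparison when its score exceeds its matched control's; ties count 1/2.
    """
    wins = 0.0
    comparisons = 0
    for case_score, controls in sets:
        for c in controls:
            comparisons += 1
            if case_score > c:
                wins += 1.0
            elif case_score == c:
                wins += 0.5
    if comparisons == 0:
        raise ValueError("no case-control comparisons")
    return wins / comparisons
```

For example, three 1:1 pairs in which the case scores higher, lower, and equal give a concordance of 0.5, the value expected when the risk factor carries no information beyond the matching factors.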

9.
The stereotype regression model for categorical outcomes, proposed by Anderson (J. Roy. Statist. Soc. B 1984; 46:1–30), is nested between the baseline‐category logits model and the adjacent‐category logits model with proportional odds structure. The stereotype model is more parsimonious than the ordinary baseline‐category (or multinomial logistic) model because the log‐odds ratios are represented as a product of a common parameter for each predictor and category‐specific scores. The model can be used for both ordered and unordered outcomes. For ordered outcomes, the stereotype model allows more flexibility than the popular proportional odds model in capturing highly subjective ordinal scaling that does not result from categorization of a single latent variable but is inherently multidimensional. As pointed out by Greenland (Statist. Med. 1994; 13:1665–1677), an additional advantage of the stereotype model is that it provides unbiased and valid inference under outcome‐stratified sampling, as in case–control studies. In addition, for matched case–control studies, the stereotype model is amenable to the classical conditional likelihood principle, whereas there is no reduction due to sufficiency under the proportional odds model. Despite these attractive features, the model has been applied relatively rarely because of problems with maximum likelihood estimation and likelihood‐based testing caused by non‐linearity and lack of identifiability of the parameters. We present a comprehensive Bayesian inference and model comparison procedure for this class of models as an alternative to the classical frequentist approach. We illustrate our methodology by analyzing data from the Flint Men's Health Study, a case–control study of prostate cancer in African‐American men aged 40–79 years. We use clinical staging of prostate cancer in terms of tumors, nodes, and metastasis as the categorical response of interest. Copyright © 2009 John Wiley & Sons, Ltd.
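The product representation is easiest to see by writing out category probabilities. In the Python sketch below, category 0 is the baseline and, for k ≥ 1, log P(Y=k|x)/P(Y=0|x) = α_k + φ_k β·x, so every category shares one coefficient vector β scaled by a category score φ_k. This is a minimal illustration with assumed notation; identifiability constraints on φ (for example, fixing one score to 1) are left to the caller, and the name `stereotype_probs` is hypothetical.

```python
import math
from typing import List, Sequence


def stereotype_probs(x: Sequence[float], beta: Sequence[float],
                     alphas: Sequence[float], phis: Sequence[float]) -> List[float]:
    """Category probabilities under the stereotype logit model.

    Baseline category 0 has alpha_0 = phi_0 = 0. For k >= 1:
        log P(Y=k|x) / P(Y=0|x) = alpha_k + phi_k * (beta . x).
    A multinomial logit would instead use a separate beta_k per category.
    """
    if len(alphas) != len(phis):
        raise ValueError("need one alpha and one phi per non-baseline category")
    lin = sum(b * xi for b, xi in zip(beta, x))
    logits = [0.0] + [a + p * lin for a, p in zip(alphas, phis)]
    m = max(logits)  # subtract the max before exponentiating for stability
    expo = [math.exp(v - m) for v in logits]
    z = sum(expo)
    return [e / z for e in expo]
```

With K categories and p predictors, this uses p + 2(K − 1) parameters (before constraints) instead of the multinomial model's (p + 1)(K − 1). The product φ_k β is also what makes the likelihood non‐linear and, without constraints, unidentifiable.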

10.
Detection of gene–gene interactions is one of the most important challenges in genome‐wide case–control studies. Besides traditional logistic regression analysis, entropy‐based methods have recently attracted significant attention. Among entropy‐based methods, interaction information is one of the most promising measures, with many desirable properties. Although both logistic regression and interaction information have been used in several genome‐wide association studies, the relationship between them has not been thoroughly investigated theoretically. The present paper attempts to fill this gap. We show that, although certain connections between the two methods exist, in general they refer to two different concepts of dependence, and looking for interactions in these two senses leads to different approaches to interaction detection. We introduce an ordering between interaction measures and specify conditions for independent and dependent genes under which interaction information is a more discriminative measure than logistic regression. Moreover, we show that for so‐called perfect distributions these measures are equivalent. Numerical experiments illustrate the theoretical findings, indicating that interaction information and its modified version are more universal tools for detecting various types of interaction than logistic regression and linkage disequilibrium measures.
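Interaction information for two loci X1, X2 and disease status Y can be computed from the joint distribution as II = I((X1,X2);Y) − I(X1;Y) − I(X2;Y). The Python sketch below uses that definition, in bits, with the sign convention that positive values indicate synergy. Some authors use the opposite sign, and the helper names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Joint = Dict[Tuple[Hashable, Hashable, Hashable], float]


def _entropy(p: Dict[Tuple, float]) -> float:
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def _marginal(joint: Joint, idx: Tuple[int, ...]) -> Dict[Tuple, float]:
    out: Dict[Tuple, float] = defaultdict(float)
    for key, prob in joint.items():
        out[tuple(key[i] for i in idx)] += prob
    return out


def interaction_information(joint: Joint) -> float:
    """II(X1;X2;Y) = I((X1,X2);Y) - I(X1;Y) - I(X2;Y), in bits.

    joint maps (x1, x2, y) tuples to probabilities summing to 1.
    """
    def h(idx: Tuple[int, ...]) -> float:
        return _entropy(_marginal(joint, idx))

    def mi(a: Tuple[int, ...], b: Tuple[int, ...]) -> float:
        return h(a) + h(b) - h(a + b)  # I(A;B) = H(A) + H(B) - H(A,B)

    return mi((0, 1), (2,)) - mi((0,), (2,)) - mi((1,), (2,))
```

A purely epistatic "XOR" disease model, in which neither locus is marginally associated with Y but the pair determines it, gives II = 1 bit. This is the kind of pattern that main‐effect screening misses.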

11.
The behavior of the conditional logistic estimator is analyzed under a causal model for two‐arm experimental studies with possible non‐compliance, in which the effect of the treatment is measured by a binary response variable. We show that, when non‐compliance can be observed only in the treatment arm, the effect (measured on the logit scale) of the treatment on compliers and that of the control on non‐compliers can be identified and consistently estimated under mild conditions. The same does not hold for the effect of the control on compliers. A simple correction of the conditional logistic estimator is then proposed, which considerably reduces the bias in estimating this quantity and the causal effect of the treatment over control on compliers. The result is a two‐step estimator, on the basis of which we can also set up a Wald test for the hypothesis of no causal effect of the treatment. The asymptotic properties of the estimator are studied by exploiting the general theory of maximum likelihood estimation of misspecified models. Finite‐sample properties of the estimator and of the related Wald test are studied by simulation. The extension of the approach to the case of missing responses is also outlined. The approach is illustrated by an application to a dataset derived from a study on the efficacy of a training course on breast self‐examination practice. Copyright © 2010 John Wiley & Sons, Ltd.

12.
Measurement error is common in epidemiological and biomedical studies. When biomarkers are measured in batches or groups, measurement error is potentially correlated within each batch or group. In regression analysis, most existing methods are not applicable in the presence of batch‐specific measurement error in predictors. We propose a robust conditional likelihood approach to account for batch‐specific error in predictors when the batch effect is additive and is the predominant source of error; the approach requires no assumptions on the distribution of the measurement error. Although a regression model with batch as a categorical covariate yields the same parameter estimates as the proposed conditional likelihood approach for linear regression, this result does not hold in general for all generalized linear models, in particular logistic regression. Our simulation studies show that the conditional likelihood approach achieves better finite‐sample performance than the regression calibration approach or a naive approach without adjustment for measurement error. For logistic regression, our proposed approach also outperforms the regression approach with batch as a categorical covariate. In addition, we examine a 'hybrid' approach combining the conditional likelihood method and the regression calibration method, which simulations show to perform well in the presence of both batch‐specific and measurement‐specific errors. We illustrate our method using data from a colorectal adenoma study. Copyright © 2012 John Wiley & Sons, Ltd.
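For the linear‐regression case, the abstract notes that treating batch as a categorical covariate gives the same estimates as the conditional likelihood. Both amount to centering within each batch, which is why an additive batch shift in the predictor cancels. A minimal Python sketch of that linear case only (the logistic conditional likelihood differs; the name `within_batch_slope` is hypothetical):

```python
from collections import defaultdict
from typing import Hashable, Sequence


def within_batch_slope(x: Sequence[float], y: Sequence[float],
                       batch: Sequence[Hashable]) -> float:
    """Least-squares slope of y on x after centering both within each batch.

    Equivalent to a linear model with a separate intercept per batch, so an
    additive batch-specific error in x (x_obs = x + b_batch) cancels out.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # batch -> [sum x, sum y, n]
    for xi, yi, g in zip(x, y, batch):
        s = sums[g]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    num = den = 0.0
    for xi, yi, g in zip(x, y, batch):
        sx, sy, n = sums[g]
        dx = xi - sx / n
        dy = yi - sy / n
        num += dx * dy
        den += dx * dx
    if den == 0:
        raise ValueError("no within-batch variation in x")
    return num / den
```

Shifting every predictor value in batch "a" by +10 and in batch "b" by −3 leaves the slope unchanged, whereas a pooled regression ignoring batch would be distorted by those offsets.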

13.
With challenges in data harmonization and environmental heterogeneity across data sources, meta‐analysis of gene–environment interaction studies often involves subtle statistical issues. In this paper, we study the effect of environmental covariate heterogeneity (within and between cohorts) on two approaches to fixed‐effect meta‐analysis: standard inverse‐variance weighted meta‐analysis and a meta‐regression approach. Akin to the results in Simmonds and Higgins (2007), we obtain analytic efficiency results for both methods under certain assumptions. The relative efficiency of the two methods depends on the ratio of within‐cohort to between‐cohort variability of the environmental covariate. We propose an adaptively weighted estimator (AWE), lying between meta‐analysis and meta‐regression, for the interaction parameter. The AWE retains the full efficiency of joint analysis using individual‐level data under certain natural assumptions. Lin and Zeng (2010a, b) showed that a multivariate inverse‐variance weighted estimator retains the full efficiency of joint analysis using individual‐level data if the estimates, with full covariance matrices for all the common parameters, are pooled across all studies. We show that our work is consistent with Lin and Zeng (2010a, b). Without sacrificing much efficiency, the AWE uses only univariate summary statistics from each study and bypasses issues with sharing individual‐level data or full covariance matrices across studies. We compare the performance of the methods both analytically and numerically. The methods are illustrated through a meta‐analysis of the interaction between single nucleotide polymorphisms in the FTO gene and body mass index on high‐density lipoprotein cholesterol, using data from a set of eight studies of type 2 diabetes.
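The "standard inverse‐variance weighted meta‐analysis" that serves as one of the two baselines pools study estimates with weights w_i = 1/SE_i². The Python sketch below shows only this univariate fixed‐effect estimator; the paper's adaptively weighted estimator is not reproduced here, and the name `ivw_fixed_effect` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_fixed_effect(estimates: Sequence[float],
                     std_errors: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted fixed-effect pooled estimate and its SE.

    pooled = sum(w_i * b_i) / sum(w_i), with w_i = 1 / se_i**2,
    SE(pooled) = 1 / sqrt(sum(w_i)).
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need matching, non-empty inputs")
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)
```

For two interaction estimates of 0.2 (SE 0.1) and 0.4 (SE 0.2), the weights are 100 and 25, giving a pooled estimate of 0.24 with SE ≈ 0.089.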

14.
In many large prospective cohorts, expensive exposure measurements cannot be obtained for all individuals. Exposure–disease association studies are therefore often based on nested case–control or case–cohort studies in which complete information is obtained only for sampled individuals. However, the full cohort may contain a large amount of information on cheaply available covariates, and possibly a surrogate of the main exposure(s), which typically goes unused. We view the nested case–control or case–cohort study plus the remainder of the cohort as a full‐cohort study with missing data. Hence, we propose using multiple imputation (MI) to utilise information in the full cohort when data from the sub‐studies are analysed. We use the fully observed data to fit the imputation models. We consider using approximate imputation models and also using rejection sampling to draw imputed values from the true distribution of the missing values given the observed data. Simulation studies show that using MI to utilise full‐cohort information in the analysis of nested case–control and case–cohort studies can result in important gains in efficiency, particularly when a surrogate of the main exposure is available in the full cohort. In simulations, this method outperforms counter‐matching in nested case–control studies and a weighted analysis for case–cohort studies, both of which use some full‐cohort information. Approximate imputation models perform well except when there are interactions or non‐linear terms in the outcome model, in which case imputation using rejection sampling works well. Copyright © 2013 John Wiley & Sons, Ltd.
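After the m imputed full‐cohort datasets are analysed, the estimates must be combined. The abstract does not spell out this step. Assuming the usual Rubin's rules, a minimal Python sketch looks like this (the name `rubins_rules` is hypothetical):

```python
import statistics
from typing import Sequence, Tuple


def rubins_rules(estimates: Sequence[float],
                 variances: Sequence[float]) -> Tuple[float, float]:
    """Combine m completed-data analyses; returns (pooled estimate, total variance)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)          # pooled point estimate
    within = statistics.fmean(variances)         # W: mean within-imputation variance
    between = statistics.variance(estimates)     # B: between-imputation variance (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between   # T = W + (1 + 1/m) B
    return q_bar, total
```

The between‐imputation term B is what propagates uncertainty about the imputed exposures. The efficiency gains reported in the abstract come from W shrinking when the full cohort informs the imputation.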

15.
In case‐control single nucleotide polymorphism (SNP) data, the allele frequency, Hardy–Weinberg disequilibrium, and linkage disequilibrium (LD) contrast tests are three distinct sources of information about genetic association. While all three tests are typically developed in a retrospective context, we show that prospective logistic regression models may be developed that correspond conceptually to the retrospective tests. This approach provides a flexible framework for conducting a systematic series of association analyses using unphased genotype data and any number of covariates. For a single‐stage study, two single‐marker tests and four two‐marker tests are discussed. We derive the true association models, which allow us to understand why a model with only a linear term will generally fit well for a SNP in weak LD with a causal SNP, whatever the disease model, but not for a SNP in high LD with a non‐additive disease SNP. We investigate the power of the association tests using real LD parameters from chromosome 11 in the HapMap CEU population data. Among the single‐marker tests, the allelic test has on average the most power for an additive disease, but for dominant, recessive, and heterozygote disadvantage diseases, the genotypic test has the most power. Among the four two‐marker tests, the Allelic‐LD contrast test, which incorporates linear terms for two markers and their interaction term, provides the most reliable power overall for the cases studied. Therefore, our results support incorporating an interaction term as well as linear terms in multi‐marker tests. Genet. Epidemiol. 34:67–77, 2010. © 2009 Wiley‐Liss, Inc.
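The retrospective allelic test compares allele counts between cases and controls in a 2×2 table, with each person contributing two alleles. A minimal Python sketch of the Pearson chi‐square version (1 df), assuming genotype counts are given as (AA, AB, BB); the name `allelic_chi2` is hypothetical:

```python
from typing import Tuple


def allelic_chi2(case_genotypes: Tuple[int, int, int],
                 control_genotypes: Tuple[int, int, int]) -> float:
    """Pearson chi-square (1 df) for the 2x2 table of allele counts.

    Genotype counts are (AA, AB, BB). Each person contributes two alleles,
    which are treated as independent observations in the table.
    """
    def alleles(g: Tuple[int, int, int]) -> Tuple[int, int]:
        aa, ab, bb = g
        return 2 * aa + ab, ab + 2 * bb  # (A count, B count)

    table = [alleles(case_genotypes), alleles(control_genotypes)]
    n = sum(sum(row) for row in table)
    col = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in col:
        raise ValueError("monomorphic marker: one allele is never observed")
    chi2 = 0.0
    for row in table:
        r = sum(row)
        for j in range(2):
            expected = r * col[j] / n
            chi2 += (row[j] - expected) ** 2 / expected
    return chi2
```

Its prospective counterpart, as described in the abstract, is a logistic model with a single linear (allele‐count) term for the SNP.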

16.
Genome‐wide association studies are helping to dissect the etiology of complex diseases. Although case‐control association tests are generally more powerful than family‐based association tests, population stratification can lead to spurious disease‐marker associations or mask a true association. Several methods have been proposed to match cases and controls prior to genotyping, using family information or epidemiological data, or using genotype data for a modest number of genetic markers. Here, we describe a genetic similarity score matching (GSM) method for efficient matched analysis of cases and controls in a genome‐wide or large‐scale candidate gene association study. GSM comprises three steps: (1) calculating similarity scores for pairs of individuals using the genotype data; (2) matching sets of cases and controls based on the similarity scores so that matched cases and controls have similar genetic backgrounds; and (3) using conditional logistic regression to perform association tests. Through computer simulation, we show that GSM correctly controls false‐positive rates and improves power to detect true disease‐predisposing variants. We compare GSM with genomic control using computer simulations and find improved power using GSM. We suggest that initial matching of cases and controls prior to genotyping, combined with careful re‐matching after genotyping, is a method of choice for genome‐wide association studies. Genet. Epidemiol. 33:508–517, 2009. © 2009 Wiley‐Liss, Inc.
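Steps (1) and (2) of GSM can be illustrated with a simple similarity score and a matching rule. The Python sketch below assumes an identity‐by‐state (IBS) similarity on 0/1/2 genotype codes and greedy 1:1 matching. Both choices are assumptions; the paper's actual score and matching algorithm may differ, and the function names are hypothetical. The matched sets would then feed step (3), conditional logistic regression.

```python
from typing import List, Sequence


def ibs_similarity(g1: Sequence[int], g2: Sequence[int]) -> float:
    """Mean identity-by-state over SNPs; genotypes coded as 0/1/2 allele counts."""
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and equal length")
    return sum(2 - abs(a - b) for a, b in zip(g1, g2)) / (2.0 * len(g1))


def greedy_match(cases: List[Sequence[int]],
                 controls: List[Sequence[int]]) -> List[int]:
    """Match each case, in order, to its most similar still-unused control (1:1)."""
    if len(controls) < len(cases):
        raise ValueError("need at least as many controls as cases")
    available = set(range(len(controls)))
    matches = []
    for case in cases:
        # sorted() makes tie-breaking deterministic (lowest control index wins)
        best = max(sorted(available), key=lambda j: ibs_similarity(case, controls[j]))
        available.remove(best)
        matches.append(best)
    return matches
```

Greedy matching is order‐dependent. An optimal assignment method would minimize total dissimilarity instead, at higher computational cost.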

17.
Genetic heterogeneity, which may manifest at the population level as different frequencies of a specific disease susceptibility allele in different subsets of patients, is a common problem for candidate gene and genome‐wide association studies of complex human diseases. The ordered subset analysis (OSA) was originally developed as a method to reduce genetic heterogeneity in the context of family‐based linkage studies. Here, we have extended a previously proposed method (OSACC) for applying the OSA methodology to case‐control datasets. We have evaluated the type I error and power of different OSACC permutation tests with an extensive simulation study. Case‐control datasets were generated under two different models by which continuous clinical or environmental covariates may influence the relationship between susceptibility genotypes and disease risk. Our results demonstrate that, under some disease models, OSACC is more powerful than the commonly used trend test and a previously proposed joint test of main genetic and gene‐environment interaction effects. An additional unique benefit of OSACC is its ability to identify a more informative subset of cases that may be subjected to more detailed molecular analysis, such as DNA sequencing of selected genomic regions to detect functional variants in linkage disequilibrium with the associated polymorphism. The OSACC‐identified covariate threshold may also improve the power of an additional dataset to replicate previously reported associations that may only be detectable in a fraction of the original and replication datasets. In summary, we have demonstrated that OSACC is a useful method for improving SNP association signals in genetically heterogeneous datasets. Genet. Epidemiol. 34:407–417, 2010. © 2010 Wiley‐Liss, Inc.

18.
BACKGROUND: A design combining both related and unrelated controls, named the case-combined-control design, was recently proposed to increase the power for detecting gene-environment (GxE) interaction. Under a conditional analytic approach, the case-combined-control design appeared to be more efficient and feasible than a classical case-control study for detecting interaction involving rare events. METHODS: We now propose an unconditional analytic strategy to further increase the power for detecting GxE interactions. This strategy allows the estimation of the GxE interaction and exposure (E) main effects under certain assumptions (e.g. no correlation in E between siblings and the same exposure frequency in both control groups). Only the genetic (G) main effect cannot be estimated, because its estimate is biased. RESULTS: Using simulations, we show that unconditional logistic regression analysis is often more efficient than conditional analysis for detecting GxE interaction, particularly for a rare gene and strong effects. The unconditional analysis is also at least as efficient as the conditional analysis when the gene is common and the main and joint effects of E and G are small. CONCLUSIONS: Under the required assumptions, the unconditional analysis retains more information than the conditional analysis, in which only discordant case-control pairs are informative, and therefore yields more precise estimates of the odds ratios.
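The point that only discordant pairs are informative in a conditional analysis is clearest in the simplest setting: 1:1 matched pairs with a binary exposure. Here the conditional maximum‐likelihood odds ratio is n10/n01, the ratio of case‐exposed/control‐unexposed pairs to the reverse. The Python sketch below covers only that simple case, not the case‐combined‐control design, and the name `matched_pair_or` is hypothetical.

```python
from typing import Sequence, Tuple


def matched_pair_or(pairs: Sequence[Tuple[int, int]]) -> float:
    """Conditional MLE of the odds ratio for 1:1 matched pairs, binary exposure.

    Each pair is (case_exposed, control_exposed) coded 0/1. Concordant pairs,
    (1, 1) and (0, 0), drop out of the conditional likelihood entirely.
    """
    n10 = sum(1 for case, ctrl in pairs if case == 1 and ctrl == 0)
    n01 = sum(1 for case, ctrl in pairs if case == 0 and ctrl == 1)
    if n01 == 0:
        raise ValueError("no pairs with only the control exposed; OR is unbounded")
    return n10 / n01
```

Adding any number of concordant pairs leaves the estimate unchanged. This is the information the unconditional analysis can recover under its stronger assumptions.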

19.
20.
Investigators interested in whether a disease aggregates in families often collect case‐control family data, which consist of disease status and covariate information for members of families selected via case or control probands. Here, we focus on the use of case‐control family data to investigate the relative contributions to the disease of additive genetic effects (A), shared family environment (C), and unique environment (E). We describe an ACE model for binary family data; this structural equation model, which has been described previously, combines a general‐family extension of the classic ACE twin model with a (possibly covariate‐specific) liability‐threshold model for binary outcomes. We then introduce our contribution, a likelihood‐based approach to fitting the model to singly ascertained case‐control family data. The approach, which involves conditioning on the proband's disease status and also setting prevalence equal to a prespecified value that can be estimated from the data, makes it possible to obtain valid estimates of the A, C, and E variance components from case‐control (rather than only from population‐based) family data. In fact, simulation experiments suggest that our approach yields approximately unbiased estimates of the A, C, and E variance components, provided that certain commonly made assumptions hold. Further, when our approach is used to fit the ACE model to Austrian case‐control family data on depression, the resulting estimate of heritability is very similar to those from previous analyses of twin data. Genet. Epidemiol. 34:238–245, 2010. © 2009 Wiley‐Liss, Inc.
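The two ingredients named in the abstract, a general‐family ACE covariance structure and a liability threshold fixed by a prespecified prevalence, can be sketched as follows. The sketch assumes a standard‐normal liability (so a + c + e = 1 and heritability equals a), kinship coefficients supplied by the caller, and a 0/1 indicator of a shared household. It is not the authors' ascertainment‐corrected likelihood, and the function names are hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence


def liability_threshold(prevalence: float) -> float:
    """Threshold t on a standard-normal liability such that P(L > t) = prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be in (0, 1)")
    return NormalDist().inv_cdf(1.0 - prevalence)


def ace_liability_cov(kinship: Sequence[Sequence[float]],
                      shared_env: Sequence[Sequence[int]],
                      a: float, c: float) -> List[List[float]]:
    """Liability covariance matrix for one family under an ACE model.

    Liability variance is fixed at 1 = a + c + e. Off-diagonal entries are
    2 * kinship_ij * a (additive genetic) plus c when i and j share the
    family environment (shared_env_ij == 1).
    """
    e = 1.0 - a - c
    if min(a, c, e) < 0:
        raise ValueError("need a, c >= 0 and a + c <= 1")
    n = len(kinship)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                cov[i][j] = 1.0
            else:
                cov[i][j] = 2.0 * kinship[i][j] * a + c * shared_env[i][j]
    return cov
```

For two full siblings (kinship 0.25) sharing a household, with a = 0.4 and c = 0.2, the liability correlation is 0.5 × 0.4 + 0.2 = 0.4. A disease affecting 2.5% of the population corresponds to a threshold of about 1.96 standard deviations.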


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417