Similar articles
 20 similar articles found (search time: 125 ms)
1.
We compare the asymptotic relative efficiency (ARE) of different study designs for estimating gene and gene-environment interaction effects using matched case-control data. In the sampling schemes considered, cases are selected differentially based on their family history of disease. Controls are selected either from unrelated subjects or from among the case's unaffected siblings and cousins. Parameters are estimated using weighted conditional logistic regression, where the likelihood contributions for each subject are weighted by the fraction of cases sampled sharing the same family history. Results showed that compared to random sampling, over-sampling cases with a positive family history increased the efficiency for estimating the main effect of a gene for sib-control designs (103-254% ARE) and decreased efficiency for cousin-control and population-control designs (68-94% ARE and 67-84% ARE, respectively). Population controls and random sampling of cases were most efficient for a recessive gene or a dominant gene with a relative risk less than 9. For estimating gene-environment interactions, over-sampling positive-family-history cases again led to increased efficiency using sib controls (111-180% ARE) and decreased efficiency using population controls (68-87% ARE). Using case-cousin pairs, the results differed based on the genetic model and the size of the interaction effect; biased sampling was only slightly more efficient than random sampling for large interaction effects under a dominant gene model (relative risk ratio = 8, 106% ARE). Overall, the most efficient study design for studying gene-environment interaction was the case-sib-control design with over-sampling of positive-family-history cases.

2.
Family-based designs protect analyses of genetic effects from bias that is due to population stratification. Investigators have assumed that this robustness extends to assessments of gene-environment interaction. Unfortunately, this assumption fails for the common scenario in which the genotyped variant is related to risk through linkage with a causative allele. Bias also plagues other methods of assessment of gene-environment interaction. When testing against multiplicative joint effects, the case-only design offers excellent power, but it is invalid if genotype and exposure are correlated in the population. The authors describe 4 mechanisms that produce genotype-exposure dependence: exposure-related genetic population stratification, effects of family history on behavior, genotype effects on exposure, and selective attrition. They propose a sibling-augmented case-only (SACO) design that protects against the first 2 mechanisms and is therefore valid for studying young-onset disease in which genotype does not influence exposure. A SACO design allows the ascertainment of genotype and exposure for cases and exposure for 1 or more unaffected siblings selected randomly. Conditional logistic regression permits assessment of exposure effects and gene-environment interactions. Via simulations, the authors compare the likelihood-based inference on interactions using the SACO design with that based on other designs. They also show that robust analyses of interactions using tetrads or disease-discordant sibling pairs are equivalent to analyses using the SACO design.

3.
BACKGROUND: The case-only study for investigating gene-environment interactions provides increased statistical efficiency over case-control analyses. This design has been criticized for being susceptible to bias arising from non-independence between the genetic and environmental factors in the population. Given that independence is critical to the validity of case-only estimates of interaction, researchers frequently use controls to evaluate whether the independence assumption is tenable, as advised in the literature. Our work investigates to what extent this approach is appropriate and how non-independence can be accounted for in case-only analyses. METHODS: We provide a formula in epidemiological terms that illustrates the relationship between the gene-environment association measured among controls and the gene-environment association in the source population. Using this formula, we conducted sensitivity analyses to describe the circumstances in which controls can be used as a proxy for the source population when evaluating gene-environment independence. Lastly, we generated hypothetical cohort data to examine whether multivariable modelling approaches can be used to control for non-independence. RESULTS: Our sensitivity analyses show that controls should not be used to evaluate gene-environment independence in the population, even when the baseline risk of disease is low (i.e. 1%), and the interaction and independent effects are moderate (i.e. risk ratio = 2). When the factors are associated, it is possible to remove bias arising from non-independence using standard statistical multivariable techniques in case-only analyses. CONCLUSIONS: Even when the disease risk is low, evaluation of gene-environment independence in controls does not provide a consistent test for bias in the case-only study. Given that control for non-independence is possible when the source of the non-independence can be conceptualized, the case-only design may still be a useful epidemiological tool for examining gene-environment interactions.

4.
Large prospective cohorts originally assembled to study environmental risk factors are increasingly exploited to study gene-environment interactions. Given the cost of genetic studies in large samples, being able to select a subsample for genotyping that contains most of the information from the cohort would lead to substantial savings. We consider nested case-control and case-cohort sampling designs with and without stratification and compare their efficiency relative to the entire cohort for estimating the effects of genetic and environmental risk factors and their interactions. Asymptotic calculations show that the relative efficiencies of the case-cohort and nested case-control designs implementing the same sampling stratification are similar over a range of scenarios for the relationships among genes, environmental exposures, and disease status. Sampling equal numbers of exposed and unexposed subjects improves efficiency when the exposure is rare. The case-cohort designs had a slight advantage in simulations of sampling designs within the Framingham Offspring Study, using the interaction between apolipoprotein E and smoking on the risk of coronary heart disease as an example. It was possible to estimate the interaction effect with precision close to that of the full cohort when using case-cohort or nested case-control samples containing fewer than half the subjects of the cohort.

5.
Novel epidemiologic study designs are often required to assess gene-environment interaction. A design using only cases, without controls, is one of several approaches that have been proposed as more efficient alternatives to the typical random sampling of cases and controls. However, it has not been pointed out that a case-only analysis estimates a different interaction parameter than does a traditional case-control analysis: The latter typically estimates departure from multiplicative population odds or rate ratios, depending on the method of control selection, while the former estimates departure from multiplicative risk ratios if genotype and environmental exposure are not associated in the population. These parameters are approximately equal if the disease risk is small at all levels of the study variables. The authors quantify the impact of allowing for higher disease risk among gene carriers, a relevant situation when the gene under study is highly penetrant. Their findings show that the cross-product ratio computed from case-only data may be substantially smaller than the odds ratio computed from case-control data and may therefore underestimate either the population odds or the rate ratio. Thus, to avoid misinterpretation of interaction parameters estimated from case-only data, the definition of multiplicative interaction should be made explicit.
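The distinction drawn in this abstract can be illustrated numerically. The sketch below is not the authors' calculation; it is a minimal illustration that assumes genotype and exposure are independent in the population, and every risk value and prevalence in it is hypothetical. It compares the expected cross-product ratio among cases (which equals the ratio of risk ratios) with the interaction measured on the odds scale.

```python
from itertools import product


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def case_only_cross_product(risk, p_g, p_e):
    """Expected cross-product ratio in the genotype-by-exposure table of cases.

    Assumes genotype (g) and exposure (e) are independent in the population,
    so the expected share of cases in cell (g, e) is P(g) * P(e) * risk[(g, e)].
    """
    cases = {}
    for g, e in product((0, 1), repeat=2):
        pg = p_g if g else 1.0 - p_g
        pe = p_e if e else 1.0 - p_e
        cases[(g, e)] = pg * pe * risk[(g, e)]
    return (cases[(1, 1)] * cases[(0, 0)]) / (cases[(1, 0)] * cases[(0, 1)])


def odds_ratio_interaction(risk):
    """Departure from multiplicative odds ratios, computed from the true risks."""
    return (odds(risk[(1, 1)]) * odds(risk[(0, 0)])) / (
        odds(risk[(1, 0)]) * odds(risk[(0, 1)])
    )


# Hypothetical disease risks indexed by (genotype, exposure).
rare_disease = {(0, 0): 0.001, (1, 0): 0.003, (0, 1): 0.002, (1, 1): 0.009}
penetrant_gene = {(0, 0): 0.01, (1, 0): 0.30, (0, 1): 0.02, (1, 1): 0.90}

for label, risk in (("rare disease", rare_disease), ("penetrant gene", penetrant_gene)):
    co = case_only_cross_product(risk, p_g=0.1, p_e=0.3)
    cc = odds_ratio_interaction(risk)
    print(f"{label}: case-only = {co:.3f}, odds-scale interaction = {cc:.3f}")
```

With the rare-disease risks the two measures nearly coincide (both close to 1.5), whereas with the highly penetrant hypothetical gene the case-only ratio stays at 1.5 while the odds-scale interaction is about 10.4, matching the direction of the discrepancy the abstract describes.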

6.
A method is described for estimating excess relative risks of a disease from familial factors. Beginning with population-based series of cases and controls, a cohort of each subject's relatives is formed and checked for disease against a population-based registry. The disease experience of the cohort formed from each subject's relatives is summarized as a kinship-weighted familial standardized incidence ratio (FSIR). The FSIRs are used as exposure estimates in conditional linear excess relative risk models, which may be used not only to screen for significant familial disease aggregations, but also to estimate relative risks, population attributable risks, and gene-environment interactions. The method is demonstrated on 4083 breast cancer cases from Utah and a set of matched controls. ©1995 Wiley-Liss, Inc.

7.
Cheng KF. Statistics in Medicine 2006;25(18):3093-3109
Given the biomedical interest in gene-environment interactions along with the difficulties inherent in gathering genetic data from controls, epidemiologists need methodologies that can increase precision of estimating interactions while minimizing the genotyping of controls. To achieve this purpose, many epidemiologists have suggested using a case-only design. In this paper, we present a maximum likelihood method for making inference about gene-environment interactions using case-only data. The probability of disease development is described by a logistic risk model. Thus the interactions are model parameters measuring the departure of joint effects of exposure and genotype from multiplicative odds ratios. We extend the typical inference method derived under the assumption of independence between genotype and exposure to that under a more general assumption of conditional independence. Our maximum likelihood method can be applied to analyse both categorical and continuous environmental factors, and generalized to make inference about gene-gene-environment interactions. Moreover, the application of this method can be reduced to simply fitting a multinomial logistic model when we have case-only data. As a consequence, the maximum likelihood estimates of interactions and likelihood ratio tests for hypotheses concerning interactions can be easily computed. The methodology is illustrated through an example based on a study about the joint effects of XRCC1 polymorphisms and smoking on bladder cancer. We also give two simulation studies to show that the proposed method is reliable in finite samples.

8.
The authors consider issues that should be weighed when designing a retrospective study in which a focus of interest is the joint role of genetic and environmental factors in causing a disease. In place of the classical case-control design, in which controls are sampled from the same population that gives rise to the cases, one could study cases only. The case-only approach can be usefully extended by genotyping the two biologic parents of each case and in effect letting the parental genotype data provide the genetic control. Alternatively, one could carry out a case-control study in which the controls are siblings or cousins of the cases and inference is based on within-family parameters. The authors compare and contrast the parameters that can be estimated and the assumptions that must be made when each of these designs is used. The investigator must also consider certain practical issues, such as the availability of parents or sibling controls.

9.
Although genetic association studies using unrelated individuals may be subject to bias caused by population stratification, alternative methods that are robust to population stratification such as family-based association designs may be less powerful. Recently, various statistical methods robust to population stratification were proposed for association studies, using unrelated individuals to identify associations between candidate markers and traits of interest (both qualitative and quantitative). Here, we propose a semiparametric test for association (SPTA). SPTA controls for population stratification through a set of genomic markers by first deriving a genetic background variable for each sampled individual through his/her genotypes at a series of independent markers, and then modeling the relationship between trait values, genotypic scores at the candidate marker, and genetic background variables through a semiparametric model. We assume that the exact form of relationship between the trait value and the genetic background variable is unknown and estimated through smoothing techniques. We evaluate the performance of SPTA through simulations both with discrete subpopulation models and with continuous admixture population models. The simulation results suggest that our procedure has a correct type I error rate in the presence of population stratification and is more powerful than statistical association tests for family-based association designs in all the cases considered. Moreover, SPTA is more powerful than the Quantitative Similarity-Based Association Test (QSAT) developed by us under continuous admixture populations, and the number of independent markers needed by SPTA to control for population stratification is substantially smaller than that required by QSAT.

10.
As medical applications for cluster randomization designs become more common, investigators look for guidance on optimal methods for estimating the effect of group-based interventions over time. This study examines two distinct cluster randomization designs: (1) the repeated cross-sectional design in which centres are followed over time but patients change, and (2) the longitudinal design in which individual patients are followed over time within treatment clusters. Simulations of each study design stipulated a multiplicative treatment effect (on the log odds scale), between 5 and 15 clusters in each of two treatment arms, and follow-up over two time periods. Estimation options included linear mixed effects models using restricted maximum likelihood (REML), generalized estimating equations (GEE), mixed effects logistic regression using both penalized quasi likelihood (PQL) and numerical integration, and Bayesian Monte Carlo analysis. For the repeated cross-sectional designs, most methods performed well in terms of bias and coverage when clusters were numerous (30) and variability across clusters of baseline risk and treatment effect was modest. With few clusters (two groups of five) and higher variability, only the Bayesian methods maintained coverage. In the longitudinal designs, the common methods of REML, GEE, or PQL performed poorly when compared to numerical integration, while Bayesian methods demonstrated less bias and better coverage for estimates of both log odds ratios and risk differences. The performance of common statistical tools for the analysis of cluster randomization designs depends heavily on the precise design, the number of clusters, and the variability of baseline outcomes and treatment effects across centres.

11.
In diseases caused by deleterious gene mutations, knowledge of age-specific cumulative risks is necessary for medical management of mutation carriers. When pedigrees are ascertained through several affected persons, ascertainment bias can be corrected by using a retrospective likelihood. This likelihood is a function of the genotypes of pedigree members given their phenotypes and provides unbiased estimates of penetrance without modeling the selection process, provided that selection is independent of genotypes. However, since mutation testing is offered only to relatives of mutation carriers, the genotypes of family members are available only in mutated families and selection does depend on genotype. In the present study, we quantified the bias due to selection on genotype using simulations. We found that this bias depended on the true penetrance value: the lower the penetrance, the higher the bias (risk by age 80 estimated to be 46% for a true penetrance value of 20%). When age of onset is added to the selection criteria, as is usually done, we showed that the bias was even higher. We modified the conditioning in the retrospective likelihood to obtain what we call the "genotype restricted likelihood" (GRL). Using simulations, we show that this method provided unbiased parameter estimates under all the selection designs considered.

12.
The case-only study and family-based study are two popular study designs for detecting gene-environment interactions. It is well known that the case-only analysis is efficient, but its validity relies crucially on the assumption of gene-environment independence in the study population. In contrast, the family-based analysis is robust to the violation of such an assumption, but is less efficient. We propose a two-stage study design for detecting gene-environment interactions, where a case-only study is performed at the first stage, and a case-parent/case-sibling study is performed at the second stage on a random subsample of the first-stage case sample as well as their parents/unaffected siblings. Statistical inference procedures are developed for the proposed two-stage study designs, which not only preserve the robustness property of the family-based analysis, but also utilize information from the case-only analysis to enhance estimation efficiency and testing power. Simulation results reveal both the robustness and efficiency of the proposed strategies.

13.
Studies which compare cases to disease-free siblings are useful for assessing association between a genetic locus and a phenotypic trait, as they eliminate the possibility of confounding by population stratification. Many analytic methods for such family-based studies are based on a binary disease model. However, complex diseases have variable age at onset. Consequently, binary-outcome methods can be inefficient or biased. We review methods for analysing censored age-at-onset data from family studies, including stratified Cox regression and genotype-decomposition regression, an unstratified procedure which regresses age-at-onset on between- and within-family genotype components. We also introduce a retrospective likelihood for censored age-at-onset data, which requires an external estimate of the baseline hazard. Stratified Cox regression does not use controls who have not attained the age of their case sibling(s), potentially leading to a loss of efficiency. Both genotype-decomposition regression and the retrospective likelihood use these younger controls. We assess the performance of these methods via simulation studies. Stratified Cox regression and the retrospective likelihood have appropriate type I error rates in almost all situations studied; genotype-decomposition regression is often anti-conservative. Away from the null, confidence intervals for the relative risk derived from stratified Cox regression are anti-conservative when the disease is rare and case-rich families are sampled. The retrospective likelihood is more efficient than stratified Cox regression and its confidence intervals have correct coverage when the disease is rare or the estimate of the baseline hazard is reasonably accurate. These results suggest that when estimating genotype relative risks is the principal analytic goal, stratified Cox regression is appropriate as long as the disease is common; when the disease is rare, the retrospective likelihood may be more appropriate.

14.
Biomarkers are often measured with error due to imperfect lab conditions or temporal variability within subjects. Using an internal reliability sample of the biomarker, we develop a parametric bias‐correction approach for estimating a variety of diagnostic performance measures including sensitivity, specificity, the Youden index with its associated optimal cut‐point, positive and negative predictive values, and positive and negative diagnostic likelihood ratios when the biomarker is subject to measurement error. We derive the asymptotic properties of the proposed likelihood‐based estimators and show that they are consistent and asymptotically normally distributed. We propose confidence intervals for these estimators and confidence bands for the receiver operating characteristic curve. We demonstrate through extensive simulations that the proposed approach removes the bias due to measurement error and outperforms the naïve approach (which ignores the measurement error) in both point and interval estimation. We also derive the asymptotic bias of naïve estimates and discuss conditions in which naïve estimates of the diagnostic measures are biased toward estimates produced when the biomarker is ineffective (i.e., when sensitivity equals 1 − specificity) or are anticonservatively biased. The proposed method has broad biomedical applications and is illustrated using a biomarker study in Alzheimer's disease. We recommend collecting an internal reliability sample during the biomarker discovery phase in order to adequately evaluate the performance of biomarkers with careful adjustment for measurement error. Copyright © 2013 John Wiley & Sons, Ltd.
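For readers unfamiliar with the Youden index named in this abstract, the sketch below shows the standard empirical definition, J = sensitivity + specificity − 1, maximized over candidate cut-points. It corresponds to the naïve estimate that ignores measurement error, not the bias-corrected estimator the authors propose. The function name, the biomarker values, and the convention that higher values indicate disease are all assumptions made for illustration.

```python
def youden_index(diseased, healthy):
    """Return (J, cut_point) maximizing sensitivity + specificity - 1.

    A subject is classified as positive when the biomarker value is at or
    above the cut-point. Candidate cut-points are the observed values; ties
    in J are resolved in favor of the smallest cut-point.
    """
    best_j, best_cut = float("-inf"), None
    for cut in sorted(set(diseased) | set(healthy)):
        sensitivity = sum(x >= cut for x in diseased) / len(diseased)
        specificity = sum(x < cut for x in healthy) / len(healthy)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_j, best_cut = j, cut
    return best_j, best_cut


# Hypothetical biomarker measurements.
diseased = [3.1, 4.0, 4.5, 5.2]
healthy = [1.0, 2.2, 3.0, 4.2]

j, cut = youden_index(diseased, healthy)
print(f"Youden index = {j:.2f} at cut-point {cut}")
```

A biomarker with no discriminating ability gives sensitivity equal to 1 − specificity at every cut-point, so J is 0; this is the "ineffective" benchmark the abstract refers to.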

15.
The case-only study is a convenient approach and provides increased statistical efficiency in detecting gene-environment interactions. The validity of a case-only study hinges on one well-recognized assumption: The susceptibility genotypes and the environmental exposures of interest are independent in the population. Otherwise, the study will be biased. The authors show that hidden stratification in the study population could also ruin a case-only study. They derive the formulas for population stratification bias. The bias involves three terms: 1) the coefficient of variation of the exposure prevalence odds, 2) the coefficient of variation of the genotype frequency odds, and 3) the correlation coefficient between the exposure prevalence odds and the genotype frequency odds. The authors perform simulations to investigate the magnitude of bias over a wide range of realistic scenarios. It is found that the estimated interaction effect is frequently biased by more than 5%. For a rarer gene and a rarer exposure, the bias becomes even larger (>30%). Because of the potentially large bias, researchers conducting case-only studies should use the boundary formula presented in this paper to make more prudent interpretations of their results, or they should use stratified analysis or a modeling approach to adjust for population stratification bias in their studies.

16.
The effect of selection bias has not been well evaluated in epidemiologic studies which focus on familial aggregation. The authors illustrate this type of bias for a reconstructed cohort study. With the reconstructed cohort design, cases and controls are first selected from the population and their relatives form the exposed and unexposed cohorts, respectively. The recurrence risk ratio (RRR) is calculated to assess and measure familial aggregation. The way information from relatives is used affects the estimate of RRR, and the authors show that a traditional method used in epidemiologic studies can yield a severely biased estimate of the RRR. However, this traditional approach can give approximately unbiased estimates under special conditions. A novel selection approach is proposed which yields an unbiased estimate of RRR. In conclusion, when relatives are identified through cases or controls, they should be included and counted in the study cohorts each time a case or control is selected, even if they or other family members have already been included.

17.
Family-based case-control studies are popularly used to study the effect of genes and gene-environment interactions in the etiology of rare complex diseases. We consider methods for the analysis of such studies under the assumption that genetic susceptibility (G) and environmental exposures (E) are distributed independently of each other within families in the source population. Conditional logistic regression, the traditional method of analysis of the data, fails to exploit the independence assumption and hence can be inefficient. Alternatively, one can estimate the multiplicative interaction between G and E more efficiently using cases only, but the required population-based G-E independence assumption is very stringent. In this article, we propose a novel conditional likelihood framework for exploiting the within-family G-E independence assumption. This approach leads to a simple and yet highly efficient method of estimating interaction and various other risk parameters of scientific interest. Moreover, we show that the same paradigm also leads to a number of alternative and even more efficient methods for analysis of family-based case-control studies when parental genotype information is available on the case-control study participants. Based on these methods, we evaluate different family-based study designs by examining their relative efficiencies to each other and their efficiencies compared to a population-based case-control design of unrelated subjects. These comparisons reveal important design implications. Extensions of the methodologies for dealing with complex family studies are also discussed.

18.
With the increasing availability of genetic data, many studies of quantitative traits focus on hypotheses related to candidate genes, and also gene-environment (G x E) and gene-gene (G x G) interactions. In a population-based sample, estimates and tests of candidate gene effects can be biased by ethnic confounding, also known as population stratification bias. This paper demonstrates that even a modest degree of ethnic confounding can lead to unacceptably high type I error rates for tests of genetic effects. The parent-offspring trio design is reviewed, and several forms of the quantitative transmission disequilibrium test (QTDT) are summarized. A variation of the QTDT (QTDTM) is described that is based on a linear regression model with multiple intercepts, one per parental mating type. This and other models are expanded to allow testing of G x E and G x G interactions. A method for computing required sample sizes using direct computations is described. Sample size requirements for tests of genetic main effects and G x E and G x G interactions are compared across various QTDT approaches to infer their efficiencies relative to one another. The QTDTM is found to meet or exceed the efficiency of other QTDT approaches. For example, the QTDTM is approximately 3% more efficient than the QTDT of Rabinowitz ([1997] Hum. Hered. 47:342-350) for testing a genetic main effect, but can be as much as twice as efficient for testing G x E interaction, and three times more efficient for testing G x G interaction.

19.
Accumulated evidence from searching for candidate gene-disease associations of complex diseases can offer some insights as the field moves toward discovery-oriented approaches with massive genome-wide testing. Meta-analyses of 50 non-human lymphocyte antigen gene-disease associations with documented overall statistical significance (752 studies) show summary odds ratios with a median of 1.43 (interquartile range, 1.28-1.65). Many different biases may operate in this field, for both single studies and meta-analyses, and these biases could invalidate some of these seemingly "validated" associations. Studies with a sample size of >500 show a median odds ratio of only 1.15. The median sample size required to detect the observed summary effects in each population addressed in the 752 studies is estimated to be 3,535 (interquartile range, 1,936-9,119 for cases and controls combined). These estimates are steeply inflated in the presence of modest bias. Population heterogeneity, as well as gene-gene and gene-environment interactions, could steeply increase these estimates and may be difficult to address even by very large biobanks and observational cohorts. The one visible solution is for a large number of teams to join forces on the same research platforms. These collaborative studies ideally should be designed up front to also assess more complex gene-gene and gene-environment interactions.

20.
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case‐control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS) are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non‐linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yields unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
