Similar articles
 Found 20 similar articles; search took 15 ms
1.
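Applying multifactor dimensionality reduction to analyze gene-gene interactions   Total citations: 3 (self-citations: 6, citations by others: 3)
Objective: To introduce the use of multifactor dimensionality reduction (MDR) for analyzing gene-gene interactions in case-control studies in genetic epidemiology. Methods: The basic steps, principles, and characteristics of MDR are outlined, and a worked study example illustrates how to run an MDR analysis with software in a case-control study. Results: Compared with traditional statistical methods, MDR is a nonparametric, genetic-model-free approach to analyzing interactions. Both theoretical and applied studies indicate that it has good power for detecting interactions, and it has been applied successfully in studies of sporadic breast cancer, atrial fibrillation, essential hypertension, and other diseases. Conclusion: MDR can be used to analyze gene-gene interactions in case-control studies and offers advantages that traditional statistical methods cannot match.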

2.
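Advances in the application of multifactor dimensionality reduction to gene-gene interaction analysis   Total citations: 1 (self-citations: 1, citations by others: 1)
Multifactor dimensionality reduction (MDR) is a recently developed method for analyzing interactions. A "factor" is a variable in the interaction study (such as a genotype or an environmental exposure), and the "dimension" is the number of factors in the multifactor combination under study (such as the number of genotypes). The method models disease susceptibility as a classification (high risk versus low risk) and treats the multiple factors in a study as a single multifactor combination (a genotype combination).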
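The high-risk versus low-risk classification of multifactor combinations can be sketched in a few lines of Python. This is a minimal illustration, not the published software: the function name `mdr_risk_labels`, the use of the overall case:control ratio as the default cut-off, and the handling of cells without controls are all assumptions made for the example.

```python
from collections import Counter

def mdr_risk_labels(genotypes, status, threshold=None):
    """Label each multifactor combination as 'high' or 'low' risk.

    genotypes: list of tuples, one tuple of factor levels per subject
    status:    list of 0/1 values (1 = case, 0 = control), same length
    threshold: case:control ratio cut-off; defaults to the overall ratio
               (an assumption of this sketch)
    """
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    if threshold is None:
        threshold = sum(cases.values()) / sum(controls.values())
    labels = {}
    for combo in set(cases) | set(controls):
        n_case, n_ctrl = cases[combo], controls[combo]
        # A cell with no controls is treated as high risk (assumption).
        ratio = float("inf") if n_ctrl == 0 else n_case / n_ctrl
        labels[combo] = "high" if ratio >= threshold else "low"
    return labels

# Hypothetical two-factor data: three cases and three controls.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
status = [1, 1, 1, 0, 0, 0]
labels = mdr_risk_labels(genotypes, status)
```

Pooling the cells into two groups is what reduces the multifactor combination to a single two-level variable, which can then be evaluated like any other binary classifier.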

3.
4.
5.
Multifactor Dimensionality Reduction (MDR) was developed to detect genetic polymorphisms that present an increased risk of disease. Cross-validation (CV) is an important part of the MDR algorithm, as it prevents over-fitting and allows the predictive ability of a model to be evaluated. CV is a computationally intensive step in the MDR algorithm. Traditionally, MDR has been implemented using 10-fold CV. In order to reduce computation time and therefore allow MDR analysis to be applied to larger datasets, we evaluated the possibility of eliminating or reducing the number of CV intervals used for analysis. We found that eliminating CV made final model selection impossible, but that reducing the number of CV intervals from ten to five caused no loss of power, thereby reducing the computation time of the algorithm by half. The validity of this reduction was confirmed with data from an Alzheimer's disease (AD) study.
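The fold bookkeeping behind the CV intervals can be sketched as follows. This is an illustrative sketch only: the helper names `kfold_indices` and `train_test_splits` are hypothetical, the MDR model search that would run inside each interval is omitted, and the random dealing of subjects into folds is an assumption rather than the procedure used in the study.

```python
import random

def kfold_indices(n_subjects, k, seed=0):
    """Shuffle subject indices and deal them into k nearly equal folds."""
    indices = list(range(n_subjects))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_subjects, k, seed=0):
    """Yield one (train, test) pair of index lists per CV interval."""
    folds = kfold_indices(n_subjects, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

five_fold = list(train_test_splits(100, 5))
ten_fold = list(train_test_splits(100, 10))
```

Each candidate model is fitted once per interval, so five intervals require half as many fits as ten, which is the source of the halved computation time reported above.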

6.
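Objective: To investigate the association between DNA repair gene polymorphisms and sporadic colorectal cancer in the Han population of southern China, and to assess the feasibility of applying multifactor dimensionality reduction (MDR) to the analysis of gene-gene and gene-environment interactions in multifactorial disease. Methods: In a population-based case-control study, PCR-RFLP was used to genotype common single nucleotide polymorphisms (SNPs) of the DNA repair system (OGG1 Ser326Cys; XRCC1 Arg194Trp, Arg280His, and Arg399Gln; XPD Lys751Gln; and XRCC3 Thr241Met) in 206 colorectal cancer cases and 845 healthy controls. Results: In the analysis of individual characteristics, age was positively associated with colorectal cancer: compared with the younger age group (≥42 to <61 years), the older age group (≥61 years) had a statistically significant increase in colorectal cancer risk (adjusted OR = 2.04, 95% CI: 1.49-2.80). A family history of cancer was likewise positively and significantly associated with colorectal cancer (adjusted OR = 1.51, 95% CI: 1.05-2.17). Allele and genotype frequencies of each SNP did not differ significantly between controls and cases (P > 0.05). MDR screening of gene-gene and gene-environment interaction models identified a best model containing four factors: age group, history of alcohol drinking, XRCC1 Arg194Trp, and OGG1 Ser326Cys (mean testing accuracy = 0.616, cross-validation consistency = 10/10, P = 0.011). With the selected low-risk combinations as the reference, logistic regression showed that the high-risk combinations carried a statistically significant increase in colorectal cancer risk (OR = 2.72, 95% CI: 1.66-4.47). Conclusion: The genetic influence of DNA repair gene polymorphisms on sporadic colorectal cancer risk in the Chinese population is consistent with low penetrance, and these polymorphisms may act jointly with environmental factors in complex ways.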
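The reported odds ratios and confidence intervals can be given a quick consistency check. The sketch below assumes the intervals are Wald intervals, symmetric on the log-odds scale, which the abstract does not state; the function names are hypothetical. Under that assumption the point estimate should sit at the geometric midpoint of the interval, and the standard error of the log odds ratio can be recovered from the interval width.

```python
import math

def geometric_midpoint(lower, upper):
    """Point estimate implied by an interval symmetric on the log scale."""
    return math.sqrt(lower * upper)

def log_or_se_from_ci(lower, upper, z=1.959964):
    """Standard error of log(OR) implied by a 95% Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# (reported OR, lower limit, upper limit) taken from the abstract above
reported = [(2.04, 1.49, 2.80), (1.51, 1.05, 2.17), (2.72, 1.66, 4.47)]
midpoints = [geometric_midpoint(lo, hi) for _, lo, hi in reported]
standard_errors = [log_or_se_from_ci(lo, hi) for _, lo, hi in reported]
```

All three midpoints agree with the reported odds ratios to within rounding, so the published intervals are at least consistent with a log-scale Wald construction.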

7.
We compared different ascertainment schemes for genetic association analysis: affected sib-pairs (ASPs), case-parent trios, and unrelated cases and controls. We found, with empirical type 1 diabetes data at four known disease loci, that studies based on case-parent trios and on unmatched cases and controls often gave higher odds ratio estimates and stronger significance test values than ASP designs. We used simulations and a simplified disease model involving two interacting loci, one of large effect and one smaller, to examine interaction models that could cause such an effect. The different ascertainment schemes were compared for power to detect an effect when only the locus of smaller effect was genotyped. ASPs showed the greatest power for association testing under most models of interaction except under additive and certain epistatic crossover models, for which case/controls and case-parent trios did better. All ascertainment schemes gave an unbiased estimation of log genotype relative risks (GRRs) under a multiplicative model. Under nonmultiplicative interactions, GRRs at the minor locus as estimated from ASPs could be biased upwards or downwards, resulting in either an increase or decrease in power compared to the case/control or trio design. For the four known type 1 diabetes loci, we observed decreased risks with ASPs, which could be due to additive interactions with the remaining susceptibility loci. Thus, the optimal ascertainment strategy in genetic association studies depends on the unknown underlying multilocus genetic model, and on whether the goal of the study is to detect an effect or to accurately estimate the resulting disease risks.

8.
As genetic epidemiology looks beyond mapping single disease susceptibility loci, interest in detecting epistatic interactions between genes has grown. The dimensionality and comparisons required to search the epistatic space and the inference for a significant result pose challenges for testing epistatic disease models. The multifactor dimensionality reduction–pedigree disequilibrium test (MDR-PDT) was developed to test for multilocus models in pedigree data. In the present study we rigorously tested MDR-PDT with new cross-validation (CV) (both 5- and 10-fold) and omnibus model selection algorithms by simulating a range of heritabilities, odds ratios, minor allele frequencies, sample sizes, and numbers of interacting loci. Power was evaluated using 100, 500, and 1,000 families, with minor allele frequencies 0.2 and 0.4 and broad-sense heritabilities of 0.005, 0.01, 0.03, 0.05, and 0.1 for 2- and 3-locus purely epistatic penetrance models. We also compared the prediction error (PE) measure of effect with a predicted matched odds ratio (MOR) for final model selection and testing. We report that the CV procedure is valid with the permutation test, MDR-PDT performs similarly with 5- and 10-fold CV, and that the MOR is more powerful than PE as the fitness metric for MDR-PDT. Genet. Epidemiol. 34: 194–199, 2010. © 2009 Wiley-Liss, Inc.

9.
Estimation and testing of genetic effects (genotype relative risks) are often performed conditionally on parental genotypes, using data from case-parent trios. This strategy avoids having to estimate nuisance parameters such as parental mating type frequencies, and also avoids generating spurious results due to confounding causes of association such as population stratification. For effects at a single locus, the resulting analysis is equivalent to matched case/control analysis via conditional logistic regression, using the case and three "pseudocontrols" derived from the untransmitted parental alleles. We previously showed that a similar approach can be used for analyzing genotype and haplotype effects at a set of closely linked loci, but with a required adjustment to the conditioning argument that results in varying numbers of pseudocontrols, depending on the disease model that is to be fitted. Here we extend this method to include the analysis of epistatic effects (gene-gene interactions) at unlinked loci, to include parent-of-origin effects at one or more loci, and to allow additional incorporation of gene-environment interactions. The conditional logistic approach provides a natural and flexible framework for incorporating these additional effects. By relaxing the conditioning on parental genotypes to allow exchangeability of parental genotypes, we show how the power of this approach can be increased when studying parent-of-origin effects. Simulations suggest that there is limited power to distinguish between parent-of-origin effects and effects due to interaction between genotypes of mother and child.
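For the single-locus case, the construction of the three pseudocontrols can be sketched as below. This is a minimal illustration under stated assumptions: the function name `pseudocontrols` and the tuple representation of genotypes are hypothetical, and the adjusted conditioning the authors describe for linked loci, epistasis, and parent-of-origin effects is not attempted here.

```python
from itertools import product

def pseudocontrols(father, mother, child):
    """Return the three pseudocontrol genotypes for one case-parent trio.

    father, mother, child: 2-tuples of allele labels at a single locus.
    Genotypes are unordered, so each is normalized by sorting its alleles.
    """
    case = tuple(sorted(child))
    # The four genotypes the parents could have transmitted (one allele each).
    possible = [tuple(sorted((f, m))) for f, m in product(father, mother)]
    if case not in possible:
        raise ValueError("child genotype is inconsistent with the parents")
    # Drop one copy of the observed genotype; the rest are the pseudocontrols.
    possible.remove(case)
    return possible

# Hypothetical trio: both parents heterozygous, child homozygous for "A".
controls = pseudocontrols(("A", "a"), ("A", "a"), ("A", "A"))
```

The case and its three pseudocontrols form one matched set, which is the unit a conditional logistic regression would then be fitted to.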

10.
We present the “sumLINK” statistic (the sum of multipoint LOD scores for the subset of pedigrees with nominally significant linkage evidence at a given locus) as an alternative to common methods to identify susceptibility loci in the presence of heterogeneity. We also suggest the “sumLOD” statistic (the sum of positive multipoint LOD scores) as a companion to the sumLINK. sumLINK analysis identifies genetic regions of extreme consistency across pedigrees without regard to negative evidence from unlinked or uninformative pedigrees. Significance is determined by an innovative permutation procedure based on genome shuffling that randomizes linkage information across pedigrees. This procedure for generating the empirical null distribution may be useful for other linkage-based statistics as well. Using 500 genome-wide analyses of simulated null data, we show that the genome shuffling procedure results in the correct type 1 error rates for both the sumLINK and sumLOD. The power of the statistics was tested using 100 sets of simulated genome-wide data from the alternative hypothesis from GAW13. Finally, we illustrate the statistics in an analysis of 190 aggressive prostate cancer pedigrees from the International Consortium for Prostate Cancer Genetics, where we identified a new susceptibility locus. We propose that the sumLINK and sumLOD are ideal for collaborative projects and meta-analyses, as they do not require any sharing of identifiable data between contributing institutions. Further, loci identified with the sumLINK have good potential for gene localization via statistical recombinant mapping, as, by definition, several linked pedigrees contribute to each peak. Genet. Epidemiol. 33:628–636, 2009. © 2009 Wiley-Liss, Inc.
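The two statistics follow directly from their definitions and can be sketched as below. The abstract does not state the cut-off for "nominally significant" linkage evidence, so the default of 0.588 (roughly a one-sided pointwise p of 0.05) is an assumption of this sketch, as are the function names and the example LOD scores. The genome-shuffling permutation procedure used to assess significance is not shown.

```python
def sum_lod(lod_scores):
    """Sum of the positive per-pedigree multipoint LOD scores at one locus."""
    return sum(score for score in lod_scores if score > 0)

def sum_link(lod_scores, threshold=0.588):
    """Sum of LOD scores over pedigrees with nominally significant linkage.

    threshold: assumed cut-off for nominal significance (not from the abstract).
    """
    return sum(score for score in lod_scores if score >= threshold)

# Hypothetical per-pedigree LOD scores at a single locus.
lods = [1.2, -0.5, 0.3, 0.7, -2.0]
locus_sum_lod = sum_lod(lods)
locus_sum_link = sum_link(lods)
```

Because both functions discard negative scores, unlinked or uninformative pedigrees cannot cancel the evidence contributed by linked ones, which is the property the abstract emphasizes.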

11.
Adjusting for publication bias in the presence of heterogeneity   Total citations: 5 (self-citations: 0, citations by others: 5)
It is known that the existence of publication bias can influence the conclusions of a meta-analysis. Some methods have been developed to deal with publication bias, but issues remain. One particular method called 'trim and fill' is designed to adjust for publication bias. The method, which is intuitively appealing and comprehensible by non-statisticians, is based on a simple and popular graphical tool called the funnel plot. We present a simulation study designed to evaluate the behaviour of this method. Our results indicate that when the studies are heterogeneous (that is, when they estimate different effects), trim and fill may inappropriately adjust for publication bias where none exists. We found that trim and fill may spuriously adjust for non-existent bias if (i) the variability among studies causes some precisely estimated studies to have effects far from the global mean or (ii) an inverse relationship between treatment efficacy and sample size is introduced by the studies' a priori power calculations. The results suggest that the funnel plot itself is inappropriate for heterogeneous meta-analyses. Selection modelling is an alternative method warranting further study. It performed better than trim and fill in our simulations, although its frequency of convergence varied, depending on the simulation parameters.

12.
In genetic mapping of complex traits, scored haplotypes are likely to represent only a subset of all causal polymorphisms. At the extreme of this scenario, observed polymorphisms are not themselves functional, and are only linked to causal ones via linkage disequilibrium (LD). We demonstrate that, because of such incomplete knowledge of the underlying genetic mechanism, the variance of a trait may differ between the scored haplotypes. Thus, unequal variances between haplotypes may be indicative of additional functional polymorphisms affecting the trait. Methods accounting for such haplotype-specific variance may also provide increased power to detect complex associations. We suggest ways to estimate and test these haplotypic variance contrasts, and incorporate them into the haplotypic tests for association. We further extend this approach to data with unknown gametic phase via likelihood-based simultaneous estimation of haplotypic effects and their frequencies. We find that our approach provides additional power, especially under the following types of models: (a) where scored and unobserved variants interact epistatically with each other; and (b) under heterogeneity models, where multiple unobserved mutations are linked to non-functional observed polymorphisms via LD. An illustrative example of the method's usefulness is discussed, using an analysis of multilocus effects within the catechol-O-methyltransferase gene.

13.
Path analytic models are useful tools in quantitative nursing research. They allow researchers to hypothesize causal inferential paths and test the significance of these paths both directly and indirectly through a mediating variable. A standard statistical method in the path analysis literature is to treat the variables as having a normal distribution and to estimate paths using several least squares regression equations. The parameters corresponding to the direct paths have point and interval estimates based on normal distribution theory. Indirect paths are a product of the direct path from the independent variable to the mediating variable and the direct path from the mediating variable to the dependent variable. However, in the case of non-normal distributions, the point and interval estimates of the indirect path become much more difficult to obtain. We address the issue of calculating indirect point and interval estimates in the case of non-normally distributed data. Our substantive application is a nursing home research problem in which the path analysis of interest involves variables with normal, Bernoulli, or Poisson distributions. Additionally, one of the Poisson variables is observed with error. This paper addresses point and interval estimation of indirect paths for variables with non-normal distributions in the presence of missing data and measurement error. We handle these difficulties from a fully Bayesian point of view. We present our substantive path analysis motivated from a nursing home structure, process, and outcomes model. Our results focus on the impact that job turnover in nursing homes has on nursing home outcomes.
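The standard normal-theory calculation that the abstract describes as the baseline, in which the indirect path is the product of two least-squares slopes, can be sketched as below. This is not the fully Bayesian method the paper proposes. The function names and the example data are hypothetical, and interval estimation is left out.

```python
def _centered_cross(u, v):
    """Sum of cross-products of u and v about their means."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))

def indirect_effect(x, m, y):
    """Least-squares estimate of the indirect path x -> m -> y.

    Returns (a, b, a * b), where a is the slope of m on x and
    b is the slope of y on m after adjusting for x.
    """
    sxx, smm = _centered_cross(x, x), _centered_cross(m, m)
    sxm = _centered_cross(x, m)
    sxy, smy = _centered_cross(x, y), _centered_cross(m, y)
    a = sxm / sxx
    # Partial slope of y on m from the two-predictor normal equations.
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a, b, a * b

# Hypothetical data in which y depends on the mediator m and directly on x.
x = [0, 1, 2, 3, 4, 5]
m = [0.5, 1.5, 4.5, 6.0, 8.5, 9.5]
y = [3 * mi + xi for mi, xi in zip(m, x)]
a, b, ab = indirect_effect(x, m, y)
```

With normally distributed variables the product `a * b` has well-studied approximate standard errors; the abstract's point is that this breaks down for Bernoulli or Poisson variables, which motivates the Bayesian treatment.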

14.
Targeted therapies for cancers are sometimes only effective in a subset of patients with a particular biomarker status. In clinical development, the biomarker status is typically determined by an investigational-use-only/laboratory-developed test. A market-ready test (MRT) is developed later to meet regulatory requirements and for future commercial use. In the USA, clinical validation of the MRT, showing the efficacy and safety profile of the targeted therapy in the biomarker subgroups determined by the MRT, is needed for pre-market approval. One of the major challenges in carrying out clinical validation is that the biomarker status per MRT is often missing for many subjects. In this paper, we treat biomarker status as a missing covariate and develop a novel pattern mixture model in the setting of a proportional hazards model for the time-to-event outcome variable. We specify a multinomial regression model for the missing biomarker statuses, and develop an expectation–maximization algorithm by the Method of Weights (Ibrahim, Journal of the American Statistical Association, 1990) to estimate the parameters in the regression model. We use Louis' formula (Louis, Journal of the Royal Statistical Society. Series B, 1982) to obtain standard error estimates. We examine the performance of our method in extensive simulation studies and apply our method to a clinical trial in metastatic colorectal cancer. Copyright © 2017 John Wiley & Sons, Ltd.

15.
Some states' death certificate forms include a diabetes yes/no check box that enables policy makers to investigate the change in heart disease mortality rates by diabetes status. Because the check boxes are sometimes unmarked, a method accounting for missing data is needed when estimating heart disease mortality rates by diabetes status. Using North Dakota's data (1992–2003), we generate the posterior distribution of diabetes status to estimate diabetes status among those with heart disease and an unmarked check box using Monte Carlo methods. Combining this estimate with the number of death certificates with known diabetes status provides a numerator for heart disease mortality rates. Denominators for rates were estimated from the North Dakota Behavioral Risk Factor Surveillance System. Accounting for missing data, age-adjusted heart disease mortality rates (per 1,000) among women with diabetes were 8.6 during 1992–1998 and 6.7 during 1999–2003. Among men with diabetes, rates were 13.0 during 1992–1998 and 10.0 during 1999–2003. The Bayesian approach accounted for the uncertainty due to missing diabetes status as well as the uncertainty in estimating the populations with diabetes. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of CDC.

16.
We consider the problem of model-based clustering in the presence of many correlated, mixed continuous, and discrete variables, some of which may have missing values. Discrete variables are treated with a latent continuous variable approach, and the Dirichlet process is used to construct a mixture model with an unknown number of components. Variable selection is also performed to identify the variables that are most influential for determining cluster membership. The work is motivated by the need to cluster patients thought to potentially have autism spectrum disorder on the basis of many cognitive and/or behavioral test scores. There are a modest number of patients (486) in the data set along with many (55) test score variables (many of which are discrete valued and/or missing). The goal of the work is to (1) cluster these patients into similar groups to help identify those with similar clinical presentation and (2) identify a sparse subset of tests that inform the clusters in order to eliminate unnecessary testing. The proposed approach compares very favorably with other methods via simulation of problems of this type. The results of the autism spectrum disorder analysis suggested 3 clusters to be most likely, while only 4 test scores had high (>0.5) posterior probability of being informative. This will result in much more efficient and informative testing. The need to cluster observations on the basis of many correlated, continuous/discrete variables with missing values is a common problem in the health sciences as well as in many other disciplines.

17.
We propose a transition model for analysing data from complex longitudinal studies. Because missing values are practically unavoidable in large longitudinal studies, we also present a two-stage imputation method for handling general patterns of missing values on both the outcome and the covariates by combining multiple imputation with stochastic regression imputation. Our model is a time-varying auto-regression on the past innovations (residuals), and it can be used in cases where general dynamics must be taken into account, and where the model selection is important. The entire estimation process was carried out using available procedures in statistical packages such as SAS and S-PLUS. To illustrate the viability of the proposed model and the two-stage imputation method, we analyse data collected in an epidemiological study that focused on various factors relating to childhood growth. Finally, we present a simulation study to investigate the behaviour of our two-stage imputation procedure.

18.
Imputation and inference (or analysis) models that cannot be true simultaneously are frequently used in practice when missing outcomes are present. In these situations, the conclusions can be misleading depending on how “different” the implicit inference model, induced by the imputation model, is from the inference model actually used. We introduce model-based compatibility (MBC) and compare two MBC approaches to a non-MBC approach and explore the inferential validity of the latter in a simple case. In addition, we evaluate more complex cases through a series of simulation studies. Overall, we recommend caution when making inferences using a non-MBC analysis and point out when the inferential “cost” is the largest.

19.
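Ping Z, Liu L, Guo X. 《卫生研究》 (Journal of Hygiene Research) 2011, 40(4): 472-473, 477
Objective: To use multifactor dimensionality reduction to analyze the interaction between plasma selenium level and the D12S304 locus, and to study its relationship with Kashin-Beck disease. Methods: A case-control design was used to compare plasma selenium levels and D12S304 genotypes between groups, and multifactor dimensionality reduction was used to analyze the interaction between the two. Results: Plasma selenium was lower in cases than in controls, genotypes did not differ between groups, and no interaction between plasma selenium and genotype was found. Conclusion: No interaction between plasma selenium and the D12S304 locus has been found so far; a larger sample or other loci may be needed to further explore the role of gene-environment interaction in the development of Kashin-Beck disease.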

20.
Three types of nonrandom sampling of family data are described, and appropriate maximum likelihood methods are proposed for each. The three types arise depending on whether the selection of probands, based on truncation, is applied directly to the phenotypic distribution, to the distribution of a correlated trait, or to the liability distribution of an associated disease. Family data ascertained through random and nonrandom sampling can be analyzed together in a unified approach. Results of a Monte Carlo study are presented that demonstrate the utility of the proposed methods. In particular, likelihood ratio tests of null hypotheses are shown to be distributed as chi-square, even in samples as small as 50 families (with variable sibship size).
