Similar Articles
1.
The simulated sequence data for the Genetic Analysis Workshop 12 were analyzed using data mining techniques provided by SAS Enterprise Miner™ Release 4.0 in addition to traditional statistical tests for linkage and association of genetic markers with disease status. We examined two ways of combining these approaches to make use of the covariate data along with the genotypic data. The result of incorporating data mining techniques with more classical methods is an improvement in the analysis, both by correctly classifying the affection status of more individuals and by locating more single nucleotide polymorphisms related to the disease, relative to analyses that use classical methods alone. © 2001 Wiley‐Liss, Inc.

2.
We compared two joint likelihood approaches, with complete (L1) or without (L2) linkage disequilibrium, under different ascertainment schemes, for the genetic analysis of the disease trait and marker gene 1 in replicate 42. Joint likelihoods were computed without a correction for the selection scheme. For the different sampling schemes we explored, our results suggest that L1 is a more powerful approach than L2 for detecting major gene and covariate effects as well as for accurately identifying gene×covariate interaction effects in a common and complex disease such as the Genetic Analysis Workshop 12 MG6 simulated trait. © 2001 Wiley‐Liss, Inc.

3.
Using simulated data from GAW 12, problem 2, we further develop a novel technique to detect and use significant covariates in linkage analysis. The method, first introduced by Rice et al. [Genet Epidemiol 17(Suppl. 1):S691–5, 1999], uses logistic regression to model perturbation in sharing as a function of covariate levels. The original method allows the use of all sib pairs (concordant affected, concordant unaffected, and discordant). Here we extend this method to include cousin pairs in the analysis. © 2001 Wiley‐Liss, Inc.

4.
When analyzing the relation between genetic sequence information and disease traits, false‐positive associations can arise due to multiple comparisons and population stratification. In an attempt to address these issues, we incorporate into a conventional analytic model higher‐level (or “prior”) models that use additional information to improve estimates while allowing for differing population structures. We apply this hierarchical model to simulated data from the Genetic Analysis Workshop 12. We focus on the effects of common candidate gene sequence variants on quantitative risk factor 5 (Q5) levels. In particular, we compare the regression coefficients (and 95% confidence intervals) obtained from conventional (one‐stage) analyses with the corresponding results from the hierarchical analyses. When examining either the marry‐ins or all subjects in the general and isolate populations, the conventional model detected numerous sites in candidate genes 1–5 and 7 that had statistically significant regression coefficients (alpha level = 0.05). In contrast, our hierarchical model detected associations almost exclusively for variants in candidate gene 2, which is the causal gene for Q5. © 2001 Wiley‐Liss, Inc.
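As a rough illustration of how a higher-level ("prior") model pulls noisy one-stage coefficients toward a prior, the sketch below uses the simple semi-Bayes (normal–normal) combination for a single coefficient. The function name `semi_bayes_shrink`, the fixed prior mean, and the known prior variance are our assumptions; the authors' hierarchical model builds its second stage from additional variant-level information and is not reproduced here.

```python
import math

def semi_bayes_shrink(beta_hat, var_hat, prior_mean, prior_var):
    """Hypothetical helper: combine a conventional (one-stage) estimate with a
    normal prior N(prior_mean, prior_var) by precision weighting.

    Returns the posterior (shrunken) estimate, its variance, and a 95% interval.
    """
    w_data = 1.0 / var_hat     # precision of the one-stage estimate
    w_prior = 1.0 / prior_var  # precision of the second-stage (prior) model
    post_var = 1.0 / (w_data + w_prior)
    post_mean = post_var * (w_data * beta_hat + w_prior * prior_mean)
    half_width = 1.96 * math.sqrt(post_var)
    return post_mean, post_var, (post_mean - half_width, post_mean + half_width)

# A weakly supported site: estimate 0.8 with variance 0.16, prior centred at 0.
mean, var, ci = semi_bayes_shrink(0.8, 0.16, 0.0, 0.04)
```

With these hypothetical inputs the posterior mean is 0.16, so an association that looks sizeable in the one-stage fit is pulled strongly toward the prior; the tighter the prior variance, the stronger the shrinkage.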

5.
Relative-risk regression models are presented for studies of the association of genetic markers with disease status when the study design uses affected cases and their parents, with or without unaffected sibs. These models generalize the “haplotype relative-risk” method and allow for censored unaffected sibs; in this sense, these models resemble proportional hazards models that are commonly used in survival analysis. A critical distinction between these models and the usual Cox proportional hazards model is that the frequencies of the genotypes of the cases are compared to controls based on Mendelian expectations, and not simply to the genotypes of the sib controls who are at risk of disease. These models allow modeling of the contribution of specific alleles to the relative risk of disease, as well as interactions of allelic effects with environmental risk factors. To demonstrate the application of these models, we have fit them to the binary affection status of the Problem 2 data set. Four candidate-gene loci were found to have a significant association with affection status, after allowance for relative risks that decrease with age; two of these associations correctly identified two of the major gene loci, and the other two were false-positive associations. © 1995 Wiley-Liss, Inc.

6.
In this paper we propose a Bayesian modeling approach to the analysis of genome-wide association studies based on single nucleotide polymorphism (SNP) data. Our latent seed model combines various aspects of k-means clustering, hidden Markov models (HMMs) and logistic regression into a fully Bayesian model. It is fitted using the Markov chain Monte Carlo stochastic simulation method, with Metropolis-Hastings update steps. The approach is flexible, both in allowing different types of genetic models, and because it can be easily extended while remaining computationally feasible due to the use of fast algorithms for HMMs. It allows for inference primarily on the location of the causal locus and also on other parameters of interest. The latent seed model is used here to analyze three data sets, using both synthetic and real disease phenotypes with real SNP data, and shows promising results. Our method is able to correctly identify the causal locus in examples where single SNP analysis is both successful and unsuccessful at identifying the causal SNP.

7.
In association studies of complex traits, fixed‐effect regression models are usually used to test for association between traits and major gene loci. In recent years, variance‐component tests based on mixed models have been developed for region‐based genetic variant association tests. In the mixed models, association is tested against the null hypothesis of zero variance via a sequence kernel association test (SKAT), its optimal unified test (SKAT‐O), and a combined sum test of rare and common variant effects (SKAT‐C). Although some comparison studies have evaluated the performance of mixed and fixed models, there has been no systematic analysis to determine when the mixed models perform better and when the fixed models perform better. Here we evaluated, based on extensive simulations, the performance of the fixed and mixed model statistics, using genetic variants located in 3, 6, 9, 12, and 15 kb simulated regions. We compared the performance of three models: (i) mixed models that lead to SKAT, SKAT‐O, and SKAT‐C, (ii) traditional fixed‐effect additive models, and (iii) fixed‐effect functional regression models. To evaluate the type I error rates of the tests of fixed models, we generated genotype data by two methods: (i) using all variants, (ii) using only rare variants. We found that the fixed‐effect tests either accurately control the type I error rate or have low false positive rates. We performed simulation analyses to compare power for two scenarios: (i) all causal variants are rare, (ii) some causal variants are rare and some are common. Either one or both of the fixed‐effect models performed better than or similarly to the mixed models except when (1) the region sizes are 12 and 15 kb and (2) effect sizes are small. Therefore, the assumption of mixed models could be satisfied and SKAT/SKAT‐O/SKAT‐C could perform better if the number of causal variants is large and each causal variant contributes a small amount to the traits (i.e., polygenes).
In major gene association studies, we argue that the fixed‐effect models perform better than or similarly to mixed models in most cases because some variants should affect the traits relatively strongly. In practice, it makes sense to perform analyses with both the fixed‐ and mixed‐effect models and to compare the results, and this can be readily done using our R code and the SKAT packages.
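To make the variance-component idea concrete, the sketch below computes a SKAT-style score statistic for a quantitative trait with no covariates, so the null model is intercept-only and the residuals are y_i − ȳ. It uses Q = Σ_j w_j (Σ_i g_ij r_i)², with the default weighting commonly described for SKAT (√w_j equal to the Beta(1, 25) density at the minor allele frequency). The function name `skat_q` is hypothetical, and the p-value, which in SKAT comes from a mixture of χ² distributions, is deliberately left to the SKAT R package mentioned above.

```python
def skat_q(genotypes, y, weights=None):
    """Hypothetical sketch of the SKAT score statistic (no covariates).

    genotypes: list of per-subject lists of minor-allele counts (0/1/2).
    y: quantitative trait values, one per subject.
    weights: optional per-variant weights w_j; by default
             sqrt(w_j) = Beta(MAF_j; 1, 25) density = 25 * (1 - MAF_j) ** 24.
    """
    n = len(y)
    p = len(genotypes[0])
    ybar = sum(y) / n
    resid = [yi - ybar for yi in y]  # residuals under the intercept-only null model
    if weights is None:
        weights = []
        for j in range(p):
            maf = sum(row[j] for row in genotypes) / (2 * n)
            maf = min(maf, 1 - maf)  # fold to the minor allele
            weights.append((25 * (1 - maf) ** 24) ** 2)
    q = 0.0
    for j in range(p):
        score_j = sum(genotypes[i][j] * resid[i] for i in range(n))  # per-variant score
        q += weights[j] * score_j ** 2
    return q
```

Because each variant enters only through its squared score, effects of opposite sign do not cancel, which is the property that lets variance-component tests do well when many variants each have small effects in either direction.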

8.
Objectives: The objective of this study is to understand the characteristics of households that treat their water in the home. In promoting home water treatment, there may be valuable lessons to be learnt from countries with many home water treatment users.

9.
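Objective: To explore the relationship between hip circumference and metabolic syndrome (MS) among Mongolian residents and to provide a scientific basis for research on the etiology, prevention, and treatment of MS. Methods: Using cluster sampling, 2,534 Mongolian residents from 32 natural villages in Chaolutu Sumu, Horqin Left Back Banner, and Gurbanhua Sumu, Naiman Banner, Tongliao City, Inner Mongolia Autonomous Region, underwent a questionnaire survey, physical measurements, and biochemical tests. Results: The prevalence of MS among the 2,534 Mongolian residents surveyed was 26.4%, and the standardized prevalence was 17.1%; the prevalence of MS in men and women was 12.9% and 26.4%, respectively. Multivariate unconditional logistic regression showed that female sex, age ≥ 30 years, log-transformed C-reactive protein (LnCRP) ≥ 1.372 mg/L, hip circumference ≥ 88 cm, and body mass index (BMI) ≥ 24 kg/m² were independent risk factors for MS among Mongolian residents. Participants were divided into groups with no MS components, 1–2 MS components, and ≥ 3 MS components; univariate unordered multinomial logistic regression showed that the risk of MS increased progressively with hip circumference. After further adjustment for age, sex, BMI, and CRP, and compared with hip circumference < 88 cm, the ORs (95% CI) for having 1–2 MS components versus none at hip circumferences of 88 to < 92, 92 to < 97, and ≥ 97 cm were 1.191 (0.884–1.605), 1.709 (1.232–2.370), and 2.646 (1.538–4.551), respectively, with a dose–response relationship (χ² for trend = 18.046, P < 0.001); the corresponding ORs (95% CI) for having ≥ 3 MS components versus none were 2.112 (1.332–3.349), 4.910 (3.084–7.820), and 10.931 (5.746–20.796), also with a dose–response relationship (χ² for trend = 73.709, P < 0.001). Conclusion: Hip circumference is a continuous, independent risk factor for MS among Mongolian residents.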

10.
Calibration is one of the main properties that must be satisfied by any predictive model. Overcoming the limitations of many approaches developed so far, a study has recently proposed the calibration belt as a graphical tool to identify ranges of probability where a model based on dichotomous outcomes miscalibrates. In this new approach, the relation between the logits of the probability predicted by a model and of the event rates observed in a sample is represented by a polynomial function, whose coefficients are fitted and whose degree is fixed by a series of likelihood‐ratio tests. We propose here a test associated with the calibration belt and show how the algorithm used to select the polynomial degree affects the distribution of the test statistic. We calculate its exact distribution and confirm its validity via a numerical simulation. Starting from this distribution, we finally reappraise the procedure to construct the calibration belt and illustrate an application in the medical context. Copyright © 2014 John Wiley & Sons, Ltd.
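In our reading of the description above, the calibration model relates the logit of the observed event probability to powers of the logit of the predicted probability p̂, and successive likelihood-ratio statistics decide whether to add another power. The notation below (degree m, log-likelihood ℓ, statistic D) is ours; the exact stopping rule and significance level belong to the original method.

```latex
\operatorname{logit}\Pr(Y = 1 \mid \hat{p}) \;=\; \sum_{k=0}^{m} \beta_k \bigl[\operatorname{logit}(\hat{p})\bigr]^{k},
\qquad
D_{m+1} \;=\; 2\Bigl[\ell\bigl(\hat{\boldsymbol{\beta}}^{(m+1)}\bigr) - \ell\bigl(\hat{\boldsymbol{\beta}}^{(m)}\bigr)\Bigr].
```

Perfect calibration corresponds to β₀ = 0, β₁ = 1, and β_k = 0 for k ≥ 2. Starting from m = 1, the degree is increased while D_{m+1} exceeds its χ²₁ critical value; because m is chosen from the data in this way, a naive χ² reference distribution for the final calibration test is no longer exact, which is the issue the abstract addresses.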

11.
A number of tests for linkage and association with qualitative traits have been developed, the best known being the transmission/disequilibrium test (TDT). For quantitative traits, various extensions of the TDT have been suggested. The quantitative trait approach we propose is based on extending the log-linear model for case-parent trio data (Weinberg et al. [1998] Am. J. Hum. Genet. 62:969-978). Like the log-linear approach for qualitative traits, our proposed polytomous logistic approach for quantitative traits allows for population admixture by conditioning on parental genotypes. In simulations, the proposed test showed good power and robustness compared with other methods under various scenarios of the genotype effect, distribution of the quantitative trait, and population stratification. In addition, missing parental genotype data can be accommodated through an expectation-maximization (EM) algorithm approach. The EM approach allows recovery of most of the power lost due to incomplete trios.
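For reference, the qualitative-trait TDT that this abstract builds on can be sketched in a few lines: among heterozygous parents, b transmit the allele of interest and c transmit the other allele, and the McNemar-type statistic (b − c)²/(b + c) is compared with a χ² distribution on 1 degree of freedom. This is the classic TDT, not the authors' polytomous logistic method, and the function name `tdt` is ours.

```python
import math

def tdt(b, c):
    """Classic transmission/disequilibrium test.

    b: heterozygous parents transmitting the allele of interest to an affected child.
    c: heterozygous parents transmitting the other allele.
    Returns the chi-square statistic (1 df) and its upper-tail p-value.
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

stat, p = tdt(30, 10)
```

Only heterozygous parents are informative, which is why conditioning on parental genotypes protects against population admixture.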

12.
The bank vole (Clethrionomys glareolus) is the natural reservoir of Puumala virus (PUUV), a species in the genus Hantavirus. PUUV is the etiologic agent of nephropathia epidemica, a mild form of hemorrhagic fever with renal syndrome. Factors that influence hantavirus transmission within host populations are not well understood. We evaluated a number of factors associated with increased PUUV infection in bank voles captured in a region of northern Sweden where the virus is endemic. Logistic regression identified four factors that together correctly predicted 80% of the outcomes: age, body mass index, population phase during sampling (increase, peak, or decline/low), and gender. This analysis highlights the importance of population demography in the successful circulation of hantavirus. The chance of infection was greatest during the peak of the population cycle, implying that the likelihood of exposure to hantavirus increases with increasing population density.

13.
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher‐level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within‐cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population‐average effect of covariates measured at the subject and cluster level, in contrast to the within‐cluster or cluster‐specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster‐level covariates. We describe the variance partition coefficient and the median odds ratio, which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R2 measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
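Two of the measures named above can be computed directly from the estimated random-intercept variance σ² of a multilevel logistic model. The sketch uses the widely used formulas MOR = exp(√(2σ²) · Φ⁻¹(0.75)) and the latent-variable VPC = σ² / (σ² + π²/3), where π²/3 is the variance of the standard logistic distribution; we assume these match the definitions in the article, and the function names are ours.

```python
import math
from statistics import NormalDist

def median_odds_ratio(sigma2):
    """MOR from the cluster-level random-intercept variance sigma2 (log-odds scale)."""
    return math.exp(math.sqrt(2 * sigma2) * NormalDist().inv_cdf(0.75))

def variance_partition_coefficient(sigma2):
    """Latent-variable VPC: share of total latent variance lying between clusters,
    taking the subject-level variance as pi^2 / 3."""
    return sigma2 / (sigma2 + math.pi ** 2 / 3)

mor = median_odds_ratio(0.5)
vpc = variance_partition_coefficient(0.5)
```

For a hypothetical σ² of 0.5, the MOR is roughly 1.96 and the VPC roughly 0.13: moving a patient between two randomly chosen hospitals would, in median, change the odds of the outcome about two-fold, even though only about 13% of the latent variation lies between hospitals.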

14.
This research was conducted to examine the effect of model choice on the epidemiologic interpretation of occupational cohort data. Three multiplicative models commonly employed in the analysis of occupational cohort studies (proportional hazards, Poisson, and logistic regression) were used to analyze data from a historical cohort study of workers exposed to formaldehyde. Samples were taken from this dataset to create a number of predetermined scenarios for comparing the models, varying study size, outcome frequency, strength of risk factors, and follow-up length. The Poisson and proportional hazards models yielded nearly identical relative risk estimates and confidence intervals in all situations except when confounding by age could not be closely controlled in the Poisson analysis. Logistic regression findings were more variable, with risk estimates differing most from the proportional hazards results when the outcome was common or the relative risk was strong. The logistic model also provided less precise estimates than the other two. Thus, although logistic regression was the easiest model to implement, it should be used in occupational cohort studies only when the outcome is rare (5% or less) and the relative risk is less than ∼2. Even then, the proportional hazards and Poisson models are better choices. Selecting between these two can be based on convenience in most circumstances. Am. J. Ind. Med. 33:33–47, 1998. © 1998 Wiley-Liss, Inc.
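One reason the logistic results diverge when the outcome is common is that logistic regression estimates an odds ratio, which approximates the relative risk only for rare outcomes. The 2×2 sketch below makes this visible; the counts are hypothetical and are not taken from the formaldehyde cohort.

```python
def rr_and_or(a, b, c, d):
    """2x2 table: exposed (a cases, b non-cases), unexposed (c cases, d non-cases).
    Returns (relative risk, odds ratio)."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    return relative_risk, odds_ratio

# Rare outcome (1% vs 0.5% risk): OR is close to RR.
rare = rr_and_or(10, 990, 5, 995)
# Common outcome (40% vs 20% risk): same RR of 2, but OR is about 2.67.
common = rr_and_or(400, 600, 200, 800)
```

Both tables have a relative risk of 2, yet the odds ratio is about 2.01 in the rare case and about 2.67 in the common case, consistent with the recommendation to restrict logistic regression to rare outcomes and modest relative risks.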

15.
Genome‐wide association studies (GWAS) have been successful in finding numerous new risk variants for complex diseases, but the results almost exclusively rely on single‐marker scans. Methods that can analyze joint effects of many variants in GWAS data are still being developed and trialed. To evaluate the performance of such methods it is essential to have a GWAS data simulator that can rapidly simulate a large number of samples, and capture key features of real GWAS data such as linkage disequilibrium (LD) among single‐nucleotide polymorphisms (SNPs) and joint effects of multiple loci (multilocus epistasis). In the current study, we combine techniques for specifying high‐order epistasis among risk SNPs with an existing program GWAsimulator [Li and Li, 2008] to achieve rapid whole‐genome simulation with accurate modeling of complex interactions. We considered various approaches to specifying interaction models including the following: departure from product of marginal effects for pairwise interactions, product terms in logistic regression models for low‐order interactions, and penetrance tables conforming to marginal effect constraints for high‐order interactions or prescribing known biological interactions. Methods for conversion among different model specifications are developed using the penetrance table as the fundamental characterization of disease models. The new program, called simGWA, is capable of efficiently generating large samples of GWAS data with high precision. We show that data simulated by simGWA are faithful to template LD structures, and conform to prespecified disease models with (or without) interactions.
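Because the penetrance table is the common currency between model specifications, one conversion is easy to sketch: a logistic model with a product term for two SNPs (additive 0/1/2 coding) maps directly onto a 3×3 table of disease probabilities. The function names and coefficient values below are hypothetical and are not simGWA code; the reverse conversion, and tables for higher-order interactions, need more machinery than shown here.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def penetrance_table(b0, b1, b2, b12):
    """3x3 penetrance table P(disease | g1, g2) for two SNPs coded 0/1/2,
    from logit P = b0 + b1*g1 + b2*g2 + b12*g1*g2 (hypothetical model)."""
    return [[expit(b0 + b1 * g1 + b2 * g2 + b12 * g1 * g2) for g2 in range(3)]
            for g1 in range(3)]

# Low baseline risk, weak marginal effects, positive interaction.
table = penetrance_table(-3.0, 0.2, 0.2, 0.5)
```

Row g1 and column g2 of the result give the penetrance for that two-locus genotype, which a simulator can then use to assign affection status to simulated individuals.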

16.
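Objective: To analyze the effect of brain-derived neurotrophic factor (BDNF) on the onset and prognosis of depression comorbid with type 2 diabetes, and to determine whether changes in BDNF level are a factor common to type 2 diabetes and depression. Methods: Data were collected from 45 patients with depression and type 2 diabetes (DDM group) and 41 patients with depression without type 2 diabetes (NDDM group), with data from 45 healthy individuals undergoing routine physical examinations serving as the control group. Psychological scale scores, glucose metabolism indicators, physical examination indicators, diabetic complications, and BDNF levels before and after treatment were collected for each group, and logistic regression models were constructed to examine whether serum BDNF level is a factor associated with depression and type 2 diabetes. Results: In both the DDM and NDDM groups, BDNF levels after treatment were higher than before treatment, and the differences were statistically significant (P < 0.001), indicating that serum BDNF levels rose markedly with antidepressant treatment. BDNF level was a protective factor for depression (OR 0.782, 95% CI 0.702–0.872): the higher the BDNF level, the lower the risk of depression. Age, fasting plasma glucose (FPG), and BDNF level were factors associated with depression comorbid with type 2 diabetes; of these, age and FPG were negatively correlated, and serum BDNF was a protective factor (OR 0.835, 95% CI 0.736–0.948). Conclusion: Higher serum BDNF concentrations can reduce the risk of depression and of depression comorbid with type 2 diabetes.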

17.
In recent decades, interest in prediction models has increased substantially, but approaches to synthesize evidence from previously developed models have failed to keep pace. As a result, researchers ignore potentially useful past evidence when developing a novel prediction model with individual participant data (IPD) from their population of interest. We aimed to evaluate approaches to aggregate previously published prediction models with new data. We consider the situation in which models are reported in the literature with predictors similar to those available in an IPD dataset. We adopt a two‐stage method and explore three approaches to calculate a synthesis model, relying on the principles of multivariate meta‐analysis. The first approach employs a naive pooling strategy, whereas the others account for within‐study and between‐study covariance. These approaches are applied to a collection of 15 datasets of patients with traumatic brain injury, and to five previously published models for predicting deep venous thrombosis. Here, we illustrate how the generally unrealistic assumption of consistency in the availability of evidence across included studies can be relaxed. Results from the case studies demonstrate that aggregation yields prediction models with improved discrimination and calibration in a vast majority of scenarios, and results in equivalent performance (compared with the standard approach) in a small minority of situations. The proposed aggregation approaches are particularly useful when few participant data are at hand. Assessing the degree of heterogeneity between IPD and literature findings remains crucial to determine the optimal approach in aggregating previous evidence into new prediction models. Copyright © 2012 John Wiley & Sons, Ltd.

18.
We carried out a discriminant analysis with identity by descent (IBD) at each marker as inputs, and the sib pair type (affected‐affected versus affected‐unaffected) as the output. Using simple logistic regression for this discriminant analysis, we illustrate the importance of comparing models with different numbers of parameters. Such model comparisons are best carried out using either the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). When AIC (or BIC) stepwise variable selection was applied to the German Asthma data set, a group of markers was selected that provides the best fit to the data (assuming an additive effect). Interestingly, these 25–26 markers were not identical to those with the highest (in magnitude) single‐locus lod scores. © 2001 Wiley‐Liss, Inc.
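The point about comparing models with different numbers of parameters can be shown with the standard definitions AIC = 2k − 2 log L and BIC = k ln n − 2 log L, where lower is better. The helper names, model labels, and log-likelihood values below are hypothetical and only illustrate the penalty; they are not results from the asthma data.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion (lower is better)."""
    return k * math.log(n) - 2 * log_lik

def best_model(models, n, criterion="aic"):
    """models: dict of name -> (maximized log-likelihood, number of parameters)."""
    def score(item):
        log_lik, k = item[1]
        return aic(log_lik, k) if criterion == "aic" else bic(log_lik, k, n)
    return min(models.items(), key=score)[0]

# Hypothetical fits: the larger model has a slightly higher likelihood.
models = {"2 markers": (-120.0, 3), "5 markers": (-118.5, 6)}
choice_aic = best_model(models, n=200, criterion="aic")
choice_bic = best_model(models, n=200, criterion="bic")
```

Here the 5-marker model fits slightly better but not by enough to pay for three extra parameters (AIC 246 versus 249), so both criteria prefer the smaller model; BIC penalizes extra parameters more heavily once ln n exceeds 2.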

19.
A two‐step process was used to find loci contributing to the qualitative disease phenotype in the Genetic Analysis Workshop (GAW) 12 simulated data. The first step used parametric linkage analysis with a limited number of dominant and recessive models to detect linkage to chromosomal regions. Subsequently, a subset of the simulated biallelic sequence polymorphisms was used for transmission/disequilibrium tests and to build haplotypes to fine map the disease‐predisposing polymorphism(s). A haplotype strongly associated with the disease phenotype, whose proximal end was within 39 base pairs of the functional allele for simulated major gene 6, was identified in the isolated population. © 2001 Wiley‐Liss, Inc.

20.
Predicting the probability of the occurrence of a binary outcome or condition is important in biomedical research. While assessing discrimination is an essential issue in developing and validating binary prediction models, less attention has been paid to methods for assessing model calibration. Calibration refers to the degree of agreement between observed and predicted probabilities and is often assessed by testing for lack‐of‐fit. The objective of our study was to examine the ability of graphical methods to assess the calibration of logistic regression models. We examined lack of internal calibration, which was related to misspecification of the logistic regression model, and external calibration, which was related to an overfit model or to shrinkage of the linear predictor. We conducted an extensive set of Monte Carlo simulations with a locally weighted least squares regression smoother (i.e., the loess algorithm) to examine the ability of graphical methods to assess model calibration. We found that loess‐based methods were able to provide evidence of moderate departures from linearity and indicate omission of a moderately strong interaction. Misspecification of the link function was harder to detect. Visual patterns were clearer with higher sample sizes, higher incidence of the outcome, or higher discrimination. Loess‐based methods were also able to identify the lack of calibration in external validation samples when an overfit regression model had been used. In conclusion, loess‐based smoothing methods are adequate tools to graphically assess calibration and merit wider application. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd
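A bare-bones version of the smoother behind a loess calibration plot is sketched below: for each predicted probability, a straight line is fitted to its nearest neighbours with tricube weights, and the fitted value estimates the observed event rate at that point. This is our simplified local-linear smoother without robustness iterations; the function name `loess_fit` and the default span of 0.75 are assumptions, and packaged loess implementations offer further options (for example, local quadratic fits).

```python
import math

def loess_fit(x, y, frac=0.75):
    """Local linear regression with tricube weights (no robustness iterations).

    x: predicted probabilities; y: observed 0/1 outcomes (or any response).
    frac: fraction of points in each local neighbourhood (the span).
    Returns the smoothed value at each x[i].
    """
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))  # neighbourhood size
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to k-th nearest point
        sw = swx = swy = swxx = swxy = 0.0
        for xi, yi in zip(x, y):
            u = abs(xi - x0) / h
            w = (1 - u ** 3) ** 3 if u < 1 else 0.0  # tricube weight
            sw += w
            swx += w * xi
            swy += w * yi
            swxx += w * xi * xi
            swxy += w * xi * yi
        denom = sw * swxx - swx ** 2
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate neighbourhood: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted
```

To draw a calibration curve, one would pass the model's predicted probabilities as `x` and the observed binary outcomes as `y`, then plot the smoothed values against `x`; departures from the 45° line indicate miscalibration over that range of predicted risk.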

