Similar Literature
20 similar documents retrieved.
1.
Accurate genetic prediction of quantitative traits related to complex disease risk would have potential clinical impact, so investigation of statistical methodology to improve predictive performance is important. We compare a simple approach of polygenic scores using top ranking single nucleotide polymorphisms (SNPs) to a set of shrinkage models, namely Ridge Regression, Lasso and Hyper‐Lasso. These penalised regression methods analyse all genotyped SNPs simultaneously, potentially including much larger sets of SNPs in the models, not only those with the smallest P values. We compare the accuracy of these models for predicting low‐density lipoprotein (LDL) and high‐density lipoprotein (HDL) cholesterol, two lipid traits of clinical relevance, in the Whitehall II and British Women's Health and Heart Study cohorts, using SNPs from the HumanCVD BeadChip. For gene scores, the most accurate predictions arise from multivariate weighted scores and include only a small number of SNPs, identified as top hits on the HumanCVD BeadChip. Furthermore, there was little benefit from including external results from published sets of SNPs. We found that shrinkage approaches rarely improved significantly on gene score results. Genetic predictive performance is trait specific, depending on the heritability and genetic architecture of the trait, and is limited by the training data sample size. Our results for lipid traits suggest no current benefit of more complex methods over existing gene score methods. Instead, the most important choice for the prediction model is the number of SNPs and selection of the most predictive SNPs to include. However, further comparisons, in larger samples and for other phenotypes, would still be of interest.
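
As a rough illustration of the kind of comparison described above, the sketch below contrasts a weighted gene score built from top-ranked SNPs with ridge and lasso regression fitted on all SNPs. The genotype data, effect sizes, and penalty values are simulated and hypothetical; this is not the authors' code or data.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p, n_causal = 2000, 500, 20                        # hypothetical sample/SNP counts
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)   # additive 0/1/2 genotype coding
beta = np.zeros(p)
beta[:n_causal] = rng.normal(0, 0.2, n_causal)
y = X @ beta + rng.normal(0, 1.0, n)                  # quantitative trait, e.g. a lipid level

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Gene score: keep the k most strongly associated SNPs in the training data and
# weight each by its marginal (univariate) regression coefficient.
k = 20
r = np.array([np.corrcoef(X_tr[:, j], y_tr)[0, 1] for j in range(p)])
top = np.argsort(-np.abs(r))[:k]
w = np.array([np.polyfit(X_tr[:, j], y_tr, 1)[0] for j in top])
gene_score = X_te[:, top] @ w

def r2(pred, obs):
    return np.corrcoef(pred, obs)[0, 1] ** 2

print("weighted gene score R^2:", r2(gene_score, y_te))
print("ridge R^2:", r2(Ridge(alpha=100.0).fit(X_tr, y_tr).predict(X_te), y_te))
print("lasso R^2:", r2(Lasso(alpha=0.05, max_iter=10000).fit(X_tr, y_tr).predict(X_te), y_te))
```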

2.
Neural networks are capable of analyzing multiple loci simultaneously in order to identify patterns of loci involved in complex traits. © 1997 Wiley-Liss, Inc.

3.
This paper proposes a general model, based on what is known about the nature of (complex) systems, of how systems—in particular, health care systems—respond to attempted change. Inferences are drawn from a critical literature review and reinterpretation of two primary studies. The two fundamental system‐change approaches are “stipulation” and “stimulation”: stip(ulation) attempts to elicit a specific response from the system; stim(ulation) encourages the system to generate diverse responses. Each has a unique strength: stip's is precision, the ability to directly impact the desired outcome and only that outcome; stim's is resonance, the ability to take advantage of behavior already present within the system. Each approach's inherent strength is its complement's inherent weakness; thus, stip and stim often clash if attempted simultaneously but can reinforce each other if applied in alternation. Opposite patterns (the “stip‐stim spiral” vs “stip‐stim stalemate”) are observed to underpin successful vs failed system change: The crucial difference is whether decision‐makers respond to a need for precision/resonance by strengthening the appropriate approach (stipulation/stimulation, respectively), or merely by weakening its complement. With further validation, the model has the potential to yield a more fundamental understanding of why system‐change efforts fail and how they can succeed.

4.
A central goal of medical genetics is to accurately predict complex disease from genotypes. Here, we present a comprehensive analysis of simulated and real data using lasso and elastic‐net penalized support‐vector machine models, a mixed‐effects linear model, a polygenic score, and unpenalized logistic regression. In simulation, the sparse penalized models achieved lower false‐positive rates and higher precision than the other methods for detecting causal SNPs. The common practice of prefiltering SNP lists for subsequent penalized modeling was examined and shown to substantially reduce the ability to recover the causal SNPs. Using genome‐wide SNP profiles across eight complex diseases within cross‐validation, lasso and elastic‐net models achieved substantially better predictive ability in celiac disease, type 1 diabetes, and Crohn's disease, and had equivalent predictive ability in the rest, with the results in celiac disease strongly replicating between independent datasets. We investigated the effect of linkage disequilibrium on the predictive models, showing that the penalized methods leverage this information to their advantage, compared with methods that assume SNP independence. Our findings show that sparse penalized approaches are robust across different disease architectures, producing as good as or better phenotype predictions and variance explained. This has fundamental ramifications for the selection and future development of methods to genetically predict human disease.
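
As an illustrative sketch only, the block below fits a sparse penalised classifier to simulated case/control SNP data, using scikit-learn's elastic-net logistic regression as a stand-in for the penalised support-vector machine models described in the abstract; all data, sample sizes, and penalty settings are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n, p = 1500, 1000
X = rng.binomial(2, 0.25, size=(n, p)).astype(float)   # genotypes coded 0/1/2
beta = np.zeros(p)
beta[:15] = 0.4                                         # 15 hypothetical causal SNPs
logits = X @ beta
logits -= logits.mean()
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))          # case/control status

# Sparse penalised classifier fitted on all SNPs at once (no univariate prefiltering).
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=0.1, max_iter=5000)
auc = cross_val_score(enet, X, y, cv=5, scoring="roc_auc")
print("cross-validated AUC per fold:", np.round(auc, 3))

enet.fit(X, y)
print("SNPs retained (non-zero coefficients):", int(np.sum(enet.coef_ != 0)))
```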

5.
Polygenic risk scores (PRSs) are a method to summarize the additive trait variance captured by a set of SNPs, and can increase the power of set‐based analyses by leveraging public genome‐wide association study (GWAS) datasets. PRS aims to assess the genetic liability to some phenotype on the basis of polygenic risk for the same or different phenotype estimated from independent data. We propose the application of PRSs as a set‐based method with an additional component of adjustment for linkage disequilibrium (LD), with potential extension of the PRS approach to analyze biologically meaningful SNP sets. We call this method POLARIS: POlygenic Ld‐Adjusted RIsk Score. POLARIS identifies the LD structure of SNPs using spectral decomposition of the SNP correlation matrix and replaces the individuals' SNP allele counts with LD‐adjusted dosages. Using a raw genotype dataset together with SNP effect sizes from a second independent dataset, POLARIS can be used for set‐based analysis. MAGMA is an alternative set‐based approach employing principal component analysis to account for LD between markers in a raw genotype dataset. We used simulations, both with simple constructed and real LD‐structure, to compare the power of these methods. POLARIS shows more power than MAGMA applied to the raw genotype dataset only, but less or comparable power to combined analysis of both datasets. POLARIS has the advantages that it produces a risk score per person per set using all available SNPs, and aims to increase power by leveraging the effect sizes from the discovery set in a self‐contained test of association in the test dataset.
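
The sketch below illustrates the general idea of an LD-adjusted risk score, decorrelating genotypes with the eigendecomposition of the SNP correlation matrix before combining them with external effect sizes. It is a rough reading of the approach, not the exact POLARIS formula; the ridge term, simulated genotypes, and effect sizes are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 800, 50
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
beta_ext = rng.normal(0, 0.1, p)                # effect sizes from an "external" GWAS

Xc = X - X.mean(axis=0)
R = np.corrcoef(Xc, rowvar=False)               # SNP-by-SNP LD (correlation) matrix

# Spectral decomposition of the LD matrix; a small ridge term keeps the inverse stable.
evals, evecs = np.linalg.eigh(R + 1e-3 * np.eye(p))
R_inv = evecs @ np.diag(1.0 / evals) @ evecs.T

X_adj = Xc @ R_inv                              # LD-adjusted "dosages"
ld_adjusted_score = X_adj @ beta_ext            # one score per person per SNP set
naive_prs = Xc @ beta_ext                       # unadjusted polygenic score

print(ld_adjusted_score[:5])
print(naive_prs[:5])
```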

6.
In association studies of complex traits, fixed‐effect regression models are usually used to test for association between traits and major gene loci. In recent years, variance‐component tests based on mixed models were developed for region‐based genetic variant association tests. In the mixed models, the association is tested by a null hypothesis of zero variance via a sequence kernel association test (SKAT), its optimal unified test (SKAT‐O), and a combined sum test of rare and common variant effect (SKAT‐C). Although there are some comparison studies to evaluate the performance of mixed and fixed models, there is no systematic analysis to determine when the mixed models perform better and when the fixed models perform better. Here we evaluated, based on extensive simulations, the performance of the fixed and mixed model statistics, using genetic variants located in 3, 6, 9, 12, and 15 kb simulated regions. We compared the performance of three models: (i) mixed models that lead to SKAT, SKAT‐O, and SKAT‐C, (ii) traditional fixed‐effect additive models, and (iii) fixed‐effect functional regression models. To evaluate the type I error rates of the tests of fixed models, we generated genotype data by two methods: (i) using all variants, (ii) using only rare variants. We found that the fixed‐effect tests accurately control or have low false positive rates. We performed simulation analyses to compare power for two scenarios: (i) all causal variants are rare, (ii) some causal variants are rare and some are common. Either one or both of the fixed‐effect models performed better than or similarly to the mixed models except when (1) the region sizes are 12 and 15 kb and (2) effect sizes are small. Therefore, the assumption of mixed models could be satisfied and SKAT/SKAT‐O/SKAT‐C could perform better if the number of causal variants is large and each causal variant contributes a small amount to the traits (i.e., polygenes). In major gene association studies, we argue that the fixed‐effect models perform better than or similarly to mixed models in most cases because some variants are expected to have relatively large effects on the traits. In practice, it makes sense to perform analysis with both the fixed and mixed effect models and to compare them, and this can be readily done using our R code and the SKAT packages.
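
For orientation, the following minimal sketch shows the "traditional fixed-effect additive model" style of region test: the trait is regressed jointly on all variants in a region and the null that every variant coefficient is zero is assessed with an F-test. The SKAT-family variance-component tests are not reproduced here (the SKAT R package implements them); genotypes, region size, and covariates are simulated and hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, m = 1000, 25                                   # subjects, variants in the region
maf = rng.uniform(0.005, 0.05, m)                 # mostly rare variants
G = rng.binomial(2, maf, size=(n, m)).astype(float)
age = rng.normal(50, 10, n)                       # a covariate
y = 0.5 * G[:, 0] + 0.4 * G[:, 1] + 0.02 * age + rng.normal(0, 1, n)

covars = sm.add_constant(age)
full = sm.OLS(y, np.column_stack([covars, G])).fit()     # covariates + all variants
reduced = sm.OLS(y, covars).fit()                        # covariates only
f_stat, p_value, df_diff = full.compare_f_test(reduced)  # joint test of the region
print(f"fixed-effect joint F = {f_stat:.2f}, P = {p_value:.3g}, df = {df_diff}")
```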

7.
Penetrance‐based linkage analysis and variance component linkage analysis are two methods that are widely used to localize genes influencing quantitative traits. Using computer programs PAP and SOLAR as representative software implementations, we have conducted an empirical comparison of both methods' power to map quantitative trait loci in extended, randomly ascertained pedigrees, using simulated data. Two‐point linkage analyses were conducted on several quantitative traits of different genetic and environmental etiology using both programs, and the lod scores were compared. The two methods appear to have similar power when the underlying quantitative trait locus is diallelic, with one or the other method being slightly more powerful depending on the characteristics of the quantitative trait and the quantitative trait locus. In the case of a multiallelic quantitative trait locus, however, the variance component approach has much greater power. These findings suggest that one should give careful thought to the likely allelic architecture of the quantitative trait to be analyzed when choosing between these two analytical approaches. It may be the case in general that linkage methods which explicitly or implicitly rely on the assumption of a diallelic trait locus fare poorly when this assumption is incorrect. © 2001 Wiley‐Liss, Inc.

8.
Polygenic prediction using genome‐wide SNPs can provide high prediction accuracy for complex traits. Here, we investigate the question of how to account for genetic ancestry when conducting polygenic prediction. We show that the accuracy of polygenic prediction in structured populations may be partly due to genetic ancestry. However, we hypothesized that explicitly modeling ancestry could improve polygenic prediction accuracy. We analyzed three GWAS of hair color (HC), tanning ability (TA), and basal cell carcinoma (BCC) in European Americans (sample size from 7,440 to 9,822) and considered two widely used polygenic prediction approaches: polygenic risk scores (PRSs) and best linear unbiased prediction (BLUP). We compared polygenic prediction without correction for ancestry to polygenic prediction with ancestry as a separate component in the model. In 10‐fold cross‐validation using the PRS approach, the R² for HC increased by 66% (from 0.0456 to 0.0755; P < 10⁻¹⁶), the R² for TA increased by 123% (from 0.0154 to 0.0344; P < 10⁻¹⁶), and the liability‐scale R² for BCC increased by 68% (from 0.0138 to 0.0232; P < 10⁻¹⁶) when explicitly modeling ancestry, which prevents ancestry effects from entering into each SNP effect and being overweighted. Surprisingly, explicitly modeling ancestry produces a similar improvement when using the BLUP approach, which fits all SNPs simultaneously in a single variance component and causes ancestry to be underweighted. We validate our findings via simulations, which show that the differences in prediction accuracy will increase in magnitude as sample sizes increase. In summary, our results show that explicitly modeling ancestry can be important in both PRS and BLUP prediction.
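
The hedged sketch below conveys what "modeling ancestry as a separate component" can look like in practice: the predictive R² of a polygenic risk score alone is compared with PRS plus genotype-derived principal components, a common representation of ancestry. The data-generating model, the use of the true effect sizes as "external" PRS weights, and the number of PCs are all simulation assumptions, not the study's analysis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n, p = 1200, 300
ancestry = rng.normal(0, 1, n)                        # hypothetical ancestry axis
af = 0.25 + 0.05 * np.tanh(ancestry)[:, None]         # allele frequencies drift with ancestry
X = rng.binomial(2, af, size=(n, p)).astype(float)
beta = rng.normal(0, 0.05, p)                         # "external" SNP weights
y = X @ beta + 0.5 * ancestry + rng.normal(0, 1, n)   # trait with an ancestry component

prs = X @ beta                                        # polygenic risk score
pcs = PCA(n_components=5).fit_transform(X - X.mean(axis=0))   # ancestry PCs from genotypes

lm = LinearRegression()
r2_prs = cross_val_score(lm, prs[:, None], y, cv=10, scoring="r2").mean()
r2_both = cross_val_score(lm, np.column_stack([prs, pcs]), y, cv=10, scoring="r2").mean()
print(f"PRS only:   R^2 = {r2_prs:.3f}")
print(f"PRS + PCs:  R^2 = {r2_both:.3f}")
```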

9.

Background

The U.S. Environmental Protection Agency is facing large challenges in managing environmental chemicals with increasingly complex requirements for assessing risk that push the limits of our current approaches. To address some of these challenges, the National Research Council (NRC) developed a new vision for toxicity testing. Although the report focused only on toxicity testing, it recognized that exposure science will play a crucial role in a new risk-based framework.

Objective

In this commentary we expand on the important role of exposure science in a fully integrated system for risk assessment. We also elaborate on the exposure research needed to achieve this vision.

Discussion

Exposure science, when applied in an integrated systems approach for risk assessment, can be used to inform and prioritize toxicity testing, describe risks, and verify the outcomes of testing. Exposure research in several areas will be needed to achieve the NRC vision. For example, models are needed to screen chemicals based on exposure. Exposure, dose–response, and biological pathway models must be developed and linked. Advanced computational approaches are required for dose reconstruction. Monitoring methods are needed that easily measure exposure, internal dose, susceptibility, and biological outcome. Finally, population monitoring studies are needed to interpret toxicity test results in terms of real-world risk.

Conclusion

This commentary is a call for the exposure community to step up to the challenge by developing a predictive science with the knowledge and tools for moving into the 21st century.

10.
The performances of four published risk prediction systems for sudden infant death syndrome (SIDS) were compared for the 34 cases of SIDS and 48 explained deaths among a cohort of all births in the U.K. during one week in April 1970. With cut-points for the scores which would include about 20 per cent of the population, the sensitivities of the scoring methods ranged from identifying 40 per cent of the explained deaths to 70 per cent of SIDS. The highest sensitivities were achieved with the Sheffield 'at birth' system and a system based on both data from Oxford and general observations from the literature, with the latter system providing the most powerful predictor of SIDS for the study sample.

11.
Functional linear models are developed in this paper for testing associations between quantitative traits and genetic variants, which can be rare variants or common variants or the combination of the two. By treating multiple genetic variants of an individual in a human population as a realization of a stochastic process, the genome of an individual in a chromosome region is a continuum of sequence data rather than discrete observations. The genome of an individual is viewed as a stochastic function that contains both linkage and linkage disequilibrium (LD) information of the genetic markers. By using techniques of functional data analysis, both fixed and mixed effect functional linear models are built to test the association between quantitative traits and genetic variants adjusting for covariates. After extensive simulation analysis, it is shown that the F‐distributed tests of the proposed fixed effect functional linear models have higher power than those of the sequence kernel association test (SKAT) and its optimal unified test (SKAT‐O) for three scenarios in most cases: (1) the causal variants are all rare, (2) the causal variants are both rare and common, and (3) the causal variants are common. The superior performance of the fixed effect functional linear models is most likely due to their optimal utilization of both genetic linkage and LD information of multiple genetic variants in a genome and similarity among different individuals, while SKAT and SKAT‐O only model the similarities and pairwise LD but do not model linkage and higher order LD information sufficiently. In addition, the proposed fixed effect models generate accurate type I error rates in simulation studies. We also show that the functional kernel score tests of the proposed mixed effect functional linear models are preferable in candidate gene analysis and small sample problems. The methods are applied to analyze three biochemical traits in data from the Trinity Students Study.
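
A hedged sketch of the fixed-effect functional linear model idea follows: the genetic effect function beta(t) along the region is expanded in a small basis, so the region test reduces to an F-test of the basis coefficients in an ordinary linear model. For simplicity the sketch uses a Legendre polynomial basis rather than the spline or Fourier bases typically used for such models, and all data are simulated; it is not the authors' implementation.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, m = 1000, 40
pos = np.sort(rng.uniform(0, 1, m))                  # variant positions scaled to [0, 1]
maf = rng.uniform(0.005, 0.3, m)
G = rng.binomial(2, maf, size=(n, m)).astype(float)
y = 0.4 * G[:, 5] + 0.3 * G[:, 6] + rng.normal(0, 1, n)

# Expand beta(t) in K basis functions of position, so the genotype function enters
# the model through K "functional" covariates Z = G @ B.
K = 5
B = np.polynomial.legendre.legvander(2 * pos - 1, K - 1)   # m x K basis matrix
Z = G @ B                                                  # n x K covariates

full = sm.OLS(y, sm.add_constant(Z)).fit()
reduced = sm.OLS(y, np.ones((n, 1))).fit()
f_stat, p_value, df = full.compare_f_test(reduced)
print(f"functional F-test: F = {f_stat:.2f}, P = {p_value:.3g}")
```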

12.
We develop a new genetic prediction method, smooth‐threshold multivariate genetic prediction, using single nucleotide polymorphisms (SNPs) data in genome‐wide association studies (GWASs). Our method consists of two stages. At the first stage, unlike the usual discontinuous SNP screening as used in the gene score method, our method continuously screens SNPs based on the output from standard univariate analysis for marginal association of each SNP. At the second stage, the predictive model is built by a generalized ridge regression simultaneously using the screened SNPs with SNP weight determined by the strength of marginal association. Continuous SNP screening by the smooth thresholding not only makes prediction stable but also leads to a closed form expression of generalized degrees of freedom (GDF). The GDF leads to Stein's unbiased risk estimation (SURE), which enables data‐dependent choice of the optimal SNP screening cutoff without using cross‐validation. Our method is very rapid because the computationally expensive genome‐wide scan is required only once, in contrast to the penalized regression methods including lasso and elastic net. Simulation studies that mimic real GWAS data with quantitative and binary traits demonstrate that the proposed method outperforms the gene score method and genomic best linear unbiased prediction (GBLUP), and shows comparable or sometimes improved performance relative to the lasso and elastic net, which are known to have good predictive ability but heavy computational cost. Application to whole‐genome sequencing (WGS) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) shows that the proposed method has higher predictive power than the gene score and GBLUP methods.
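
A rough two-stage sketch in the spirit of this abstract is shown below (it is not the authors' estimator): stage 1 screens SNPs continuously by marginal association strength from a single genome-wide scan; stage 2 fits a generalized ridge regression in which weakly associated SNPs are shrunk more heavily. The paper's smooth-threshold function and SURE-based cutoff selection are not reproduced; the weight function and penalty value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 1000, 400
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
beta = np.zeros(p)
beta[:10] = 0.3
y = X @ beta + rng.normal(0, 1, n)

Xs = (X - X.mean(axis=0)) / X.std(axis=0)               # standardise genotypes
yc = y - y.mean()

# Stage 1: one genome-wide scan of marginal correlations as the association strength.
r = Xs.T @ yc / n
w = (np.abs(r) / np.abs(r).max()) ** 2                  # continuous SNP weights in (0, 1]
w = np.maximum(w, 1e-4)                                 # no SNP is discarded outright

# Stage 2: generalized ridge regression; SNP j is penalised by lam / w_j, so weakly
# associated SNPs are shrunk harder but still enter the model.
lam = 50.0
b = np.linalg.solve(Xs.T @ Xs + lam * np.diag(1.0 / w), Xs.T @ yc)
pred = Xs @ b
print("in-sample R^2:", np.corrcoef(pred, yc)[0, 1] ** 2)
```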

13.
In this paper, extensive simulations are performed to compare two statistical methods to analyze multiple correlated quantitative phenotypes: (1) approximate F‐distributed tests of multivariate functional linear models (MFLM) and additive models of multivariate analysis of variance (MANOVA), and (2) Gene Association with Multiple Traits (GAMuT) for association testing of high‐dimensional genotype data. It is shown that approximate F‐distributed tests of MFLM and MANOVA have higher power and are more appropriate for major gene association analysis (i.e., scenarios in which some genetic variants have relatively large effects on the phenotypes); GAMuT has higher power and is more appropriate for analyzing polygenic effects (i.e., effects from a large number of genetic variants each of which contributes a small amount to the phenotypes). MFLM and MANOVA are very flexible and can be used to perform association analysis for (i) rare variants, (ii) common variants, and (iii) a combination of rare and common variants. Although GAMuT was designed to analyze rare variants, it can be applied to analyze a combination of rare and common variants and it performs well when (1) the number of genetic variants is large and (2) each variant contributes a small amount to the phenotypes (i.e., polygenes). MFLM and MANOVA are fixed effect models that perform well for major gene association analysis. GAMuT can be viewed as an extension of sequence kernel association tests (SKAT). Both GAMuT and SKAT are more appropriate for analyzing polygenic effects and they perform well not only in the rare variant case, but also in the case of a combination of rare and common variants. Data analyses of European cohorts and the Trinity Students Study are presented to compare the performance of the two methods.

14.
Although transmission disequilibrium tests (TDT) and the FBAT statistic are robust against population substructure, they have reduced statistical power compared with fully efficient tests that are not guarded against confounding due to population substructure. This has often limited the application of transmission disequilibrium tests/FBATs to candidate gene analysis, because, in a genome‐wide association study, population substructure can be adjusted for by approaches such as genomic control and EIGENSTRAT. Here, we provide new statistical methods for the analysis of quantitative and dichotomous phenotypes in extended families. Although the approach utilizes the polygenic model to maximize efficiency, it still preserves robustness to non‐normality and misspecified covariance structures. In addition, the proposed method performs better than the existing methods for dichotomous phenotypes, and the new transmission disequilibrium test for candidate gene analysis is more efficient than FBAT statistics. Copyright © 2013 John Wiley & Sons, Ltd.
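
For context, the classic allele-counting TDT that these family-based methods build on is simple to compute: among heterozygous parents, the number of times the test allele is transmitted (b) to an affected offspring is compared with the number of times it is not (c), giving a McNemar-type chi-square statistic. The counts below are hypothetical.

```python
from scipy.stats import chi2

b, c = 78, 46                       # hypothetical transmissions vs non-transmissions
tdt = (b - c) ** 2 / (b + c)        # classic TDT statistic, chi-square with 1 df under the null
p_value = chi2.sf(tdt, df=1)
print(f"TDT = {tdt:.2f}, P = {p_value:.4f}")
```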

15.
Many statistical methods have been proposed in recent years to test for genetic linkage and association between genetic markers and traits of interest through unrelated nuclear families. However, most of these methods are not valid tests of association in the presence of linkage when some of the nuclear families are related. As a result, related nuclear families in large pedigrees cannot be included in a single analysis to test for linkage disequilibrium. Recently, Martin et al. [Am J Hum Genet 67:146–54, 2000] proposed the pedigree disequilibrium test (PDT) to test for linkage and association in general pedigrees for qualitative traits. In this article, we develop a similar quantitative pedigree disequilibrium test (QPDT) to test for linkage and association in general pedigrees for quantitative traits. We apply both the PDT and the QPDT to analyze the sequence data from the seven candidate genes in the simulated data sets in the Genetic Analysis Workshop 12. © 2001 Wiley‐Liss, Inc.

16.
Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene‐based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well‐controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and the sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power to the Cox SKAT LRT, except when 50% of causal variants have negative effects, 50% have positive effects, and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than the Cox BT LRT. The models and related test statistics can be useful in whole‐genome and whole‐exome association studies. An age‐related macular degeneration dataset was analyzed as an example.

17.
Complex adaptive systems theory and agent‐based modeling are introduced to explore the dynamic evolution of solid waste management systems. An agent‐based simulation model of solid waste management system evolution is presented, which can correctly describe the evolution and development trends of the waste management system; the agents involved in the model and their interaction relationships are also designed.
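
The toy sketch below shows what an agent-based simulation of this kind can look like in outline: household agents generate waste and adapt their recycling behaviour to what they observe around them, and the system-level recycling rate evolves over time. Every agent rule and parameter here is a hypothetical illustration, not the model described in the paper.

```python
import random

class Household:
    """A household agent that generates waste and adapts its recycling behaviour."""
    def __init__(self, recycle_prob):
        self.recycle_prob = recycle_prob

    def step(self):
        waste = random.uniform(0.5, 2.0)        # kg of waste generated this step
        recycled = waste if random.random() < self.recycle_prob else 0.0
        return waste, recycled

    def adapt(self, observed_rate, pressure=0.05):
        # Simple interaction rule: drift toward the system-wide recycling rate.
        self.recycle_prob += pressure * (observed_rate - self.recycle_prob)

random.seed(0)
agents = [Household(random.uniform(0.1, 0.4)) for _ in range(200)]
for t in range(50):
    results = [a.step() for a in agents]
    total = sum(w for w, _ in results)
    recycled = sum(r for _, r in results)
    rate = recycled / total
    for a in agents:
        a.adapt(rate)
    if t % 10 == 0:
        print(f"step {t:2d}: system recycling rate = {rate:.2f}")
```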

18.
Next generation sequencing technology has enabled the paradigm shift in genetic association studies from the common disease/common variant to common disease/rare‐variant hypothesis. Analyzing individual rare variants is known to be underpowered; therefore association methods have been developed that aggregate variants across a genetic region, which for exome sequencing is usually a gene. The foreseeable widespread use of whole genome sequencing poses new challenges in statistical analysis. It calls for new rare‐variant association methods that are statistically powerful, robust against high levels of noise due to inclusion of noncausal variants, and yet computationally efficient. We propose a simple and powerful statistic that combines the disease‐associated P‐values of individual variants using a weight that is the inverse of the expected standard deviation of the allele frequencies under the null. This approach, dubbed the Sigma‐P method, is extremely robust to the inclusion of a high proportion of noncausal variants and is also powerful when both detrimental and protective variants are present within a genetic region. The performance of the Sigma‐P method was tested using simulated data based on realistic population demographic and disease models and its power was compared to several previously published methods. The results demonstrate that this method generally outperforms other rare‐variant association methods over a wide range of models. Additionally, sequence data on the ANGPTL family of genes from the Dallas Heart Study were tested for associations with nine metabolic traits and both known and novel putative associations were uncovered using the Sigma‐P method.
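
To make the weighting idea concrete, the sketch below combines per-variant P-values with weights equal to the inverse of the expected standard deviation of the allele-frequency estimate under the null, using a weighted Stouffer Z combination. The combination rule and its null distribution are assumptions for illustration, not the published Sigma-P statistic; the counts and P-values are simulated.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n_subjects, m = 2000, 30
maf = rng.uniform(0.001, 0.05, m)                      # variant allele frequencies
pvals = rng.uniform(0, 1, m)                           # per-variant association P-values
pvals[:3] = [1e-4, 5e-4, 2e-3]                         # a few hypothetical signals

sd_freq = np.sqrt(maf * (1 - maf) / (2 * n_subjects))  # SD of the allele-frequency estimate
w = 1.0 / sd_freq                                      # rarer variants receive larger weight

z = norm.isf(pvals)                                    # per-variant Z scores
combined_z = np.sum(w * z) / np.sqrt(np.sum(w ** 2))   # weighted Stouffer combination
combined_p = norm.sf(combined_z)
print(f"combined Z = {combined_z:.2f}, P = {combined_p:.3g}")
```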

19.
Predicting the occurrence of an adverse event over time is an important issue in clinical medicine. Clinical prediction models and associated points‐based risk‐scoring systems are popular statistical methods for summarizing the relationship between a multivariable set of patient risk factors and the risk of the occurrence of an adverse event. Points‐based risk‐scoring systems are popular amongst physicians as they permit a rapid assessment of patient risk without the use of computers or other electronic devices. The use of such points‐based risk‐scoring systems facilitates evidence‐based clinical decision making. There is a growing interest in cause‐specific mortality and in non‐fatal outcomes. However, when considering these types of outcomes, one must account for competing risks whose occurrence precludes the occurrence of the event of interest. We describe how points‐based risk‐scoring systems can be developed in the presence of competing events. We illustrate the application of these methods by developing risk‐scoring systems for predicting cardiovascular mortality in patients hospitalized with acute myocardial infarction. Code in the R statistical programming language is provided for the implementation of the described methods. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
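
As a small worked illustration of the points-assignment step common to such systems, the sketch below converts regression coefficients into integer points by dividing each by a "points constant" and rounding. The coefficients are hypothetical stand-ins for log-hazard-ratio estimates that would come from a competing-risks or cause-specific model fitted elsewhere; neither that fit nor the mapping from total points back to predicted risk (which requires the fitted baseline cumulative incidence) is reproduced here.

```python
# Hypothetical log-hazard-ratio coefficients and the units they apply to.
coefs = {"age_per_10y": 0.45, "diabetes": 0.60, "prior_MI": 0.85, "sbp_per_20mmHg": 0.30}
B = 0.30  # "points constant": the coefficient magnitude worth exactly 1 point

# Points for one unit of each risk factor.
points = {name: int(round(beta / B)) for name, beta in coefs.items()}
print(points)  # e.g. {'age_per_10y': 2, 'diabetes': 2, 'prior_MI': 3, 'sbp_per_20mmHg': 1}

# A patient's total score is the sum of points over the risk-factor units they carry.
patient = {"age_per_10y": 3, "diabetes": 1, "prior_MI": 0, "sbp_per_20mmHg": 2}
total = sum(points[k] * v for k, v in patient.items())
print("total points:", total)
```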

20.
The primary circulating form of vitamin D is 25‐hydroxy vitamin D (25(OH)D), a modifiable trait linked with a growing number of chronic diseases. In addition to environmental determinants of 25(OH)D, including dietary sources and skin ultraviolet B (UVB) exposure, twin‐ and family‐based studies suggest that genetics contribute substantially to vitamin D variability, with heritability estimates ranging from 43% to 80%. Genome‐wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) located in four gene regions associated with 25(OH)D. These SNPs collectively explain only a fraction of the heritability in 25(OH)D estimated by twin‐ and family‐based studies. Using 25(OH)D concentrations and GWAS data on 5,575 subjects drawn from five cohorts, we hypothesized that genome‐wide data, in the form of (1) a polygenic score comprised of hundreds or thousands of SNPs that do not individually reach GWAS significance, or (2) a linear mixed model for genome‐wide complex trait analysis, would explain variance in measured circulating 25(OH)D beyond that explained by known genome‐wide significant 25(OH)D‐associated SNPs. GWAS‐identified SNPs explained 5.2% of the variation in circulating 25(OH)D in these samples, and there was little evidence that additional markers significantly improved predictive ability. On average, a polygenic score comprised of GWAS‐identified SNPs explained a larger proportion of variation in circulating 25(OH)D than scores comprised of thousands of SNPs that were, on average, nonsignificant. Employing a linear mixed model for genome‐wide complex trait analysis explained little additional variability (range 0–22%). The absence of a significant polygenic effect in this relatively large sample suggests an oligogenetic architecture for 25(OH)D.
