Similar articles
 Found 20 similar articles (search time: 14 ms)
1.
Family‐based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single marker based, which test one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEEs) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P‐values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP‐SNP interactions. Simulation studies show that the proposed method properly controls for type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single marker based minimum P‐value GEE test for an SNP‐set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.

2.
A goal of association analysis is to determine whether variation in a particular candidate region or gene is associated with liability to complex disease. To evaluate such candidates, ubiquitous Single Nucleotide Polymorphisms (SNPs) are useful. It is critical, however, to select a set of SNPs that are in substantial linkage disequilibrium (LD) with all other polymorphisms in the region. Whether there is an ideal statistical framework to test such a set of ‘tag SNPs’ for association is unknown. Compared to tests for association based on frequencies of haplotypes, recent evidence suggests tests for association based on linear combinations of the tag SNPs (Hotelling T² test) are more powerful. Following this logical progression, we asked whether single‐locus tests would prove generally more powerful than the regression‐based tests. We answer this question by investigating four inferential procedures: the maximum of a series of test statistics corrected for multiple testing by the Bonferroni procedure, T_B, or by permutation of case‐control status, T_P; a procedure that tests the maximum of a smoothed curve fitted to the series of test statistics, T_S; and the Hotelling T² procedure, which we call T_R. These procedures are evaluated by simulating data like that from human populations, including realistic levels of LD and realistic effects of alleles conferring liability to disease. We find that power depends on the correlation structure of SNPs within a gene, the density of tag SNPs, and the placement of the liability allele. The clearest pattern emerges between power and the number of SNPs selected. When a large fraction of the SNPs within a gene are tested, and multiple SNPs are highly correlated with the liability allele, T_S has better power. Using a SNP selection scheme that optimizes power but also requires a substantial number of SNPs to be genotyped (roughly 10–20 SNPs per gene), power of T_P is generally superior to that for the other procedures, including T_R. Finally, when a SNP selection procedure that targets a minimal number of SNPs per gene is applied, the average performances of T_P and T_R are indistinguishable. Genet. Epidemiol. © 2005 Wiley‐Liss, Inc.

3.
Kernel machine learning methods, such as the SNP‐set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single‐SNP analysis methods, these methods are designed to examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and are able to identify sets of SNPs that are associated with the trait of interest. However, as with many multi‐SNP testing approaches, kernel machine testing can draw conclusions only at the SNP‐set level, and does not directly indicate which SNP(s) in the identified set are actually driving the association. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs, adapt the KNIFE procedure to genetic association studies, and propose an approach to identify driver SNPs after the application of SKAT to gene set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel. The proposed approach provides practically useful utilities to prioritize SNPs, and fills the gap between SNP set analysis and biological functional studies. Both simulation studies and a real data application are used to demonstrate the proposed approach.

4.
Accurate genetic prediction of quantitative traits related to complex disease risk would have potential clinical impact, so investigation of statistical methodology to improve predictive performance is important. We compare a simple approach of polygenic scores using top ranking single nucleotide polymorphisms (SNPs) to a set of shrinkage models, namely Ridge Regression, Lasso and Hyper‐Lasso. These penalised regression methods analyse all genotyped SNPs simultaneously, potentially including much larger sets of SNPs in the models, not only those with the smallest P values. We compare the accuracy of these models for predicting low‐density lipoprotein (LDL) and high‐density lipoprotein (HDL) cholesterol, two lipid traits of clinical relevance, in the Whitehall II and British Women's Health and Heart Study cohorts, using SNPs from the HumanCVD BeadChip. For gene scores, the most accurate predictions arise from multivariate weighted scores and include only a small number of SNPs, identified as top hits by the HumanCVD BeadChip. Furthermore, there was little benefit from including external results from published sets of SNPs. We found that shrinkage approaches rarely improved significantly on gene score results. Genetic predictive performance is trait specific, depending on the heritability and genetic architecture of the trait, and is limited by the training data sample size. Our results for lipid traits suggest no current benefit of more complex methods over existing gene score methods. Instead, the most important choice for the prediction model is the number of SNPs and selection of the most predictive SNPs to include. However, further comparisons, in larger samples and for other phenotypes, would still be of interest.

5.
We propose a two-stage approach to analyze genome-wide association data in order to identify a set of promising single-nucleotide polymorphisms (SNPs). In stage one, we select a list of top signals from single SNP analyses by controlling false discovery rate. In stage two, we use the least absolute shrinkage and selection operator (LASSO) regression to reduce false positives. The proposed approach was evaluated using simulated quantitative traits based on genome-wide SNP data on 8,861 Caucasian individuals from the Atherosclerosis Risk in Communities (ARIC) Study. Our first stage, targeted at controlling false negatives, yields better power than using a Bonferroni-corrected significance level. The LASSO regression reduces the number of significant SNPs in stage two: it reduces false-positive SNPs and, because of linkage disequilibrium, also reduces true-positive SNPs at simulated causal loci. Interestingly, the LASSO regression preserves the power from stage one, i.e., the number of causal loci detected from the LASSO regression in stage two is almost the same as in stage one, while reducing false positives further. Real data on systolic blood pressure in the ARIC study were analyzed using our two-stage approach, which identified two significant SNPs, one of which was reported to be genome-significant in a meta-analysis containing a much larger sample size. On the other hand, a single SNP association scan did not yield any significant results.
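
Note on entry 5: the abstract does not say which false discovery rate procedure is used in stage one. As a rough illustration only, the sketch below assumes the common Benjamini-Hochberg step-up rule; the function name and the example P-values are hypothetical, and the LASSO stage is not shown.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level q.

    Assumption: Benjamini-Hochberg step-up rule (not confirmed by the abstract).
    """
    m = len(p_values)
    # Indices of the P-values ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k whose P-value satisfies p_(k) <= q * k / m.
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff_rank])


# Hypothetical single-SNP P-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(example_p, q=0.05)
```

With these ten hypothetical P-values, a Bonferroni threshold of 0.05 / 10 = 0.005 would keep only the first SNP, while the step-up rule keeps the first two, which is the kind of power gain the abstract attributes to stage one.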

6.
In spite of the tremendous success of genome-wide association studies (GWAS) in identifying genetic variants associated with complex traits and common diseases, many more are yet to be discovered. Hence, it is always desirable to improve the statistical power of GWAS. In parallel with the intensive efforts to integrate GWAS with functional annotations or other omic data, we propose leveraging other published GWAS summary data to boost statistical power for a new/focus GWAS; the traits of the published GWAS may or may not be genetically correlated with the target trait of the new GWAS. Building on weighted hypothesis testing with a solid theoretical foundation, we develop a novel and effective method to construct single-nucleotide polymorphism (SNP)-specific weights based on 22 published GWAS data sets with various traits, detecting sometimes dramatically increased numbers of significant SNPs and independent loci as compared to the standard/unweighted analysis. For example, by integrating a schizophrenia GWAS summary data set with 19 other GWAS summary data sets of nonschizophrenia traits, our new method identified 1,585 genome-wide significant SNPs mapping to 15 linkage disequilibrium-independent loci, largely exceeding 818 significant SNPs in 13 independent loci identified by the standard/unweighted analysis; furthermore, using a later and larger schizophrenia GWAS summary data set as the validation data, 1,423 (out of 1,585) significant SNPs identified by the weighted analysis, compared to 705 (out of 818) by the unweighted analysis, were confirmed, while all 15 and 13 independent loci were also confirmed. Similar conclusions were reached with lipids and Alzheimer's disease (AD) traits. We conclude that the proposed approach is simple and cost-effective to improve GWAS power.

7.
Haplotype‐based association studies have been proposed as a powerful comprehensive approach to identify causal genetic variation underlying complex diseases. Data comparisons within families offer the additional advantage of dealing naturally with complex sources of noise, confounding and population stratification. Two problems encountered when investigating associations between haplotypes and a continuous trait using data from sibships are (i) the need to define within‐sibship comparisons for sibships of size greater than two and (ii) the difficulty of resolving the joint distribution of haplotype pairs within sibships in the absence of parental genotypes. We therefore propose first a method of orthogonal transformation of both outcomes and exposures that allows the decomposition of between‐ and within‐sibship regression effects when sibship size is greater than two. We conducted a simulation study, which confirmed that analysis using all members of a sibship is statistically more powerful than methods based on cross‐sectional analysis or using subsets of sib‐pairs. Second, we propose a simple permutation approach to avoid errors of inference due to the within‐sibship correlation of any errors in haplotype assignment. These methods were applied to investigate the association between mammographic density (MD), a continuously distributed and heritable risk factor for breast cancer, and single nucleotide polymorphisms (SNPs) and haplotypes from the VDR gene using data from a study of 430 twins and sisters. We found evidence of association between MD and a 4‐SNP VDR haplotype. In conclusion, our proposed method retains the benefits of the between‐ and within‐pair analysis for pairs of siblings and can be implemented in standard software. Genet. Epidemiol. 34: 309–318, 2010. © 2009 Wiley‐Liss, Inc.

8.
Genome-wide expression quantitative trait loci (eQTLs) mapping explores the relationship between gene expression and DNA variants, such as single-nucleotide polymorphisms (SNPs), to understand the genetic basis of human diseases. Due to the large number of genes and SNPs that need to be assessed, current methods for eQTL mapping often suffer from low detection power, especially for identifying trans-eQTLs. In this paper, we propose the idea of performing SNP ranking based on the higher criticism statistic, a summary statistic developed in large-scale signal detection. We illustrate how the HC-based SNP ranking can effectively prioritize eQTL signals over noise, greatly reduce the burden of joint modeling, and improve the power for eQTL mapping. Numerical results in simulation studies demonstrate the superior performance of our method compared to existing methods. The proposed method is also evaluated in HapMap eQTL data analysis and the results are compared to a database of known eQTLs.

9.
Genome‐wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) associated with complex traits. However, the genetic heritability of most of these traits remains unexplained. To help guide future studies, we address the crucial question of whether future GWAS can detect new SNP associations and explain additional heritability given the new availability of larger GWAS SNP arrays, imputation, and reduced genotyping costs. We first describe the pairwise and imputation coverage of all SNPs in the human genome by commercially available GWAS SNP arrays, using the 1000 Genomes Project as a reference. Next, we describe the findings from 6 years of GWAS of 172 chronic diseases, calculating the power to detect each of them while taking array coverage and sample size into account. We then calculate the power to detect these SNP associations under different conditions using improved coverage and/or sample sizes. Finally, we estimate the percentages of SNP associations and heritability previously detected and detectable by future GWAS under each condition. Overall, we estimated that previous GWAS have detected less than one‐fifth of all GWAS‐detectable SNPs underlying chronic disease. Furthermore, increasing sample size has a much larger impact than increasing coverage on the potential of future GWAS to detect additional SNP‐disease associations and heritability.

10.
Recent studies have shown that quantitative phenotypes may be influenced not only by multiple single nucleotide polymorphisms (SNPs) within a gene but also by the interaction between SNPs at unlinked genes. We propose a new statistical approach that can detect gene‐gene interactions at the allelic level which contribute to the phenotypic variation in a quantitative trait. By testing for the association of allelic combinations at multiple unlinked loci with a quantitative trait, we can detect the SNP allelic interaction whether or not it can be detected as a main effect. Our proposed method assigns a score to unrelated subjects according to their allelic combination inferred from observed genotypes at two or more unlinked SNPs, and then tests for the association of the allelic score with a quantitative trait. To investigate the statistical properties of the proposed method, we performed a simulation study to estimate type I error rates and power and demonstrated that this allelic approach achieves greater power than the more commonly used genotypic approach to test for gene‐gene interaction. As an example, the proposed method was applied to data obtained as part of a candidate gene study of sodium retention by the kidney. We found that this method detects an interaction between the calcium‐sensing receptor gene (CaSR), the chloride channel gene (CLCNKB) and the Na, K, 2Cl cotransporter gene (CLC12A1) that contributes to variation in diastolic blood pressure. Genet. Epidemiol. 2009. © 2008 Wiley‐Liss, Inc.

11.
Adequate control of type I error rates will be necessary in the increasing genome‐wide search for interactive effects on complex traits. After observing unexpected variability in type I error rates from SNP‐by‐genome interaction scans, we sought to characterize this variability and test the ability of heteroskedasticity‐consistent standard errors to correct it. We performed 81 SNP‐by‐genome interaction scans using a product‐term model on quantitative traits in a sample of 1,053 unrelated European Americans from the NHLBI Family Heart Study, and additional scans on five simulated datasets. We found that the interaction‐term genomic inflation factor (lambda) showed inflation and deflation that varied with sample size and allele frequency; that similar lambda variation occurred in the absence of population substructure; and that lambda was strongly related to heteroskedasticity but not to minor non‐normality of phenotypes. Heteroskedasticity‐consistent standard errors narrowed the range of lambda, with HC3 outperforming HC0, but in individual scans tended to create new P‐value outliers related to sparse two‐locus genotype classes. We explain the lambda variation as a result of non‐independence of test statistics coupled with stochastic biases in test statistics due to a failure of the test to reach asymptotic properties. We propose that one way to interpret lambda is by comparison to an empirical distribution generated from data simulated under the null hypothesis and without population substructure. We further conclude that the interaction‐term lambda should not be used to adjust test statistics and that heteroskedasticity‐consistent standard errors come with limitations that may outweigh their benefits in this setting.
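
Note on entry 11: the abstract does not define lambda. The sketch below assumes the usual definition of the genomic inflation factor for 1-degree-of-freedom tests: the median observed chi-square statistic divided by the median of the chi-square distribution with 1 degree of freedom (about 0.4549). The function name and the input values are hypothetical.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Genomic inflation factor for a list of 1-df chi-square statistics.

    Assumption: lambda = median(observed) / median(expected under the null).
    """
    if not chi2_stats:
        raise ValueError("at least one test statistic is required")
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical interaction-term statistics, for illustration only.
lam = genomic_inflation([0.10, 0.50, 2.00])
```

Values above 1 indicate inflation and values below 1 indicate deflation. The abstract's own recommendation is to interpret lambda against an empirical null distribution rather than to use it to adjust test statistics.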

12.
Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of the genetic factors in different groups of individuals (genetic heterogeneity). We introduce a novel concept of custom correlation coefficient (CCC) between single nucleotide polymorphisms (SNPs) that addresses genetic heterogeneity by measuring subset correlations autonomously. It is used to develop a 3‐step process to identify candidate multi‐SNP patterns: (1) pairwise (SNP–SNP) correlations are computed using CCC; (2) clusters of so‐correlated SNPs are identified; and (3) frequencies of these clusters in disease cases and controls are compared to identify disease‐associated multi‐SNP patterns. This method identified 42 candidate multi‐SNP associations with hypertensive heart disease (HHD), among which one cluster of 22 SNPs (six genes) included 13 in SLC8A1 (aka NCX1, an essential component of cardiac excitation‐contraction coupling) and another of 32 SNPs had 29 from a different segment of SLC8A1. While allele frequencies show little difference between cases and controls, the cluster of 22 associated alleles was found in 20% of controls but no cases and the other in 3% of controls but 20% of cases. These suggest that both protective and risk effects on HHD could be exerted by combinations of variants in different regions of SLC8A1, modified by variants from other genes. The results demonstrate that this new correlation metric identifies disease‐associated multi‐SNP patterns overlooked by commonly used correlation measures. Furthermore, computation time using CCC is a small fraction of that required by other methods, thereby enabling the analyses of large GWAS datasets.

13.
Introduction: Genetic discoveries are validated through the meta‐analysis of genome‐wide association scans in large international consortia. Because environmental variables may interact with genetic factors, investigation of differing genetic effects for distinct levels of an environmental exposure in these large consortia may yield additional susceptibility loci undetected by main effects analysis. We describe a method of joint meta‐analysis (JMA) of SNP and SNP by Environment (SNP × E) regression coefficients for use in gene‐environment interaction studies. Methods: In testing SNP × E interactions, one approach uses a two degree of freedom test to identify genetic variants that influence the trait of interest. This approach detects both main and interaction effects between the trait and the SNP. We propose a method to jointly meta‐analyze the SNP and SNP × E coefficients using multivariate generalized least squares. This approach provides confidence intervals of the two estimates, a joint significance test for SNP and SNP × E terms, and a test of homogeneity across samples. Results: We present a simulation study comparing this method to four other methods of meta‐analysis and demonstrate that the JMA performs better than the others when both main and interaction effects are present. Additionally, we implemented our methods in a meta‐analysis of the association between SNPs from the type 2 diabetes‐associated gene PPARG and log‐transformed fasting insulin levels and interaction by body mass index in a combined sample of 19,466 individuals from five cohorts. Genet. Epidemiol. 35:11–18, 2011. © 2010 Wiley‐Liss, Inc.
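
Note on entry 13: the abstract gives no formulas. As a hedged sketch of what a fixed-effect multivariate generalized least squares pooling of (SNP, SNP × E) coefficients could look like, the code below weights each cohort's coefficient pair by its inverse covariance matrix and forms a Wald-type joint statistic with 2 degrees of freedom. The fixed-effect assumption, the Wald form of the test, the function names, and the input values are all assumptions made for illustration.

```python
import math


def inv2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular covariance matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def joint_meta_analysis(estimates, covariances):
    """Pool per-cohort (b_snp, b_snp_x_e) pairs by inverse-covariance weighting.

    Returns (pooled estimates, pooled covariance, joint Wald statistic, P-value).
    """
    w = [[0.0, 0.0], [0.0, 0.0]]  # running sum of inverse covariance matrices
    u = [0.0, 0.0]                # running sum of inverse covariance times estimate
    for b, v in zip(estimates, covariances):
        vi = inv2(v)
        for r in range(2):
            for c in range(2):
                w[r][c] += vi[r][c]
            u[r] += vi[r][0] * b[0] + vi[r][1] * b[1]
    pooled_cov = inv2(w)
    pooled = [pooled_cov[r][0] * u[0] + pooled_cov[r][1] * u[1] for r in range(2)]
    # Wald statistic b' * inverse(pooled_cov) * b, where inverse(pooled_cov) is w.
    stat = sum(pooled[r] * w[r][c] * pooled[c] for r in range(2) for c in range(2))
    # Upper tail of the chi-square distribution with 2 df is exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return pooled, pooled_cov, stat, p_value


# Two hypothetical cohorts with identical estimates and covariances.
pooled, pooled_cov, stat, p_value = joint_meta_analysis(
    [(0.2, 0.1), (0.2, 0.1)],
    [[[0.02, 0.0], [0.0, 0.02]], [[0.02, 0.0], [0.0, 0.02]]],
)
```

In this toy input the pooled estimates equal the per-cohort estimates while each variance is halved, as expected when two equally precise cohorts are combined.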

14.
Recently, large scale genome‐wide association study (GWAS) meta‐analyses have boosted the number of known signals for some traits into the tens and hundreds. Typically, however, variants are only analysed one‐at‐a‐time. This complicates the ability of fine‐mapping to identify a small set of SNPs for further functional follow‐up. We describe a new and scalable algorithm, joint analysis of marginal summary statistics (JAM), for the re‐analysis of published marginal summary statistics under joint multi‐SNP models. The correlation is accounted for according to estimates from a reference dataset, and models and SNPs that best explain the complete joint pattern of marginal effects are highlighted via an integrated Bayesian penalized regression framework. We provide both enumerated and Reversible Jump MCMC implementations of JAM and present some comparisons of performance. In a series of realistic simulation studies, JAM demonstrated identical performance to various alternatives designed for single region settings. In multi‐region settings, where the only multivariate alternative involves stepwise selection, JAM offered greater power and specificity. We also present an application to real published results from MAGIC (meta‐analysis of glucose and insulin related traits consortium), a GWAS meta‐analysis of more than 15,000 people. We re‐analysed several genomic regions that produced multiple significant signals with glucose levels 2 hr after oral stimulation. Through joint multivariate modelling, JAM was able to formally rule out many SNPs, and for one gene, ADCY5, suggests that an additional SNP, which transpired to be more biologically plausible, should be followed up with equal priority to the reported index.

15.
16.
Although type 2 diabetes (T2D) results from metabolic defects in insulin secretion and insulin sensitivity, most of the genetic risk loci identified to date relate to insulin secretion. We reported that T2D loci influencing insulin sensitivity may be identified through interactions with insulin secretion loci, thereby leading to T2D. Here, we hypothesize that joint testing of variant main effects and interaction effects with an insulin secretion locus increases power to identify genetic interactions leading to T2D. We tested this hypothesis with an intronic MTNR1B SNP, rs10830963, which is associated with acute insulin response to glucose, a dynamic measure of insulin secretion. rs10830963 was tested for interaction and joint (main + interaction) effects with genome‐wide data in African Americans (2,452 cases and 3,772 controls) from five cohorts. Genome‐wide genotype data (Affymetrix Human Genome 6.0 array) were imputed to a 1000 Genomes Project reference panel. T2D risk was modeled using logistic regression with rs10830963 dosage, age, sex, and principal component as predictors. Joint effects were captured using the Kraft two degrees of freedom test. Genome‐wide significant (P < 5 × 10⁻⁸) interaction with MTNR1B and joint effects were detected for CMIP intronic SNP rs17197883 (P_interaction = 1.43 × 10⁻⁸; P_joint = 4.70 × 10⁻⁸). CMIP variants have been nominally associated with T2D, fasting glucose, and adiponectin in individuals of East Asian ancestry, with high‐density lipoprotein, and with waist‐to‐hip ratio adjusted for body mass index in Europeans. These data support the hypothesis that additional genetic factors contributing to T2D risk, including insulin sensitivity loci, can be identified through interactions with insulin secretion loci.

17.
We study the link between two quality measures of SNP (single nucleotide polymorphism) data in genome‐wide association (GWA) studies, that is, per SNP call rates (CR) and p‐values for testing Hardy–Weinberg equilibrium (HWE). The aim is to improve these measures by applying methods based on realized randomized p‐values, the false discovery rate and estimates for the proportion of false hypotheses. While exact non‐randomized conditional p‐values for testing HWE cannot be recommended for estimating the proportion of false hypotheses, their realized randomized counterparts should be used. P‐values corresponding to the asymptotic unconditional chi‐square test lead to reasonable estimates only if SNPs with low minor allele frequency are excluded. We provide an algorithm to compute the probability that SNPs violate HWE given the observed CR, which yields an improved measure of data quality. The proposed methods are applied to SNP data from the KORA (Cooperative Health Research in the Region of Augsburg, Southern Germany) 500 K project, a GWA study in a population‐based sample genotyped by Affymetrix GeneChip 500 K arrays using the calling algorithm BRLMM 1.4.0. We show that all SNPs with CR = 100 per cent are nearly in perfect HWE, which suggests that the population meets the conditions required for HWE, at least for these SNPs. Moreover, we show that the proportion of SNPs not being in HWE increases with decreasing CR. We conclude that using a single threshold for judging HWE p‐values without taking the CR into account is problematic. Instead we recommend a stratified analysis with respect to CR. Copyright © 2010 John Wiley & Sons, Ltd.
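
Note on entry 17: the asymptotic unconditional chi-square test mentioned in this abstract can be sketched as follows for a biallelic SNP. This is only the textbook 1-degree-of-freedom goodness-of-fit test, not the realized randomized p-values the authors recommend; the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Asymptotic 1-df chi-square test of Hardy-Weinberg equilibrium.

    Takes the three genotype counts and returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical genotype counts exactly at Hardy-Weinberg proportions.
stat_eq, p_eq = hwe_chi_square(25, 50, 25)
# Hypothetical counts with no heterozygotes at all.
stat_dev, p_dev = hwe_chi_square(50, 0, 50)
```

Because the approximation relies on reasonably large expected counts in every genotype class, it degrades for rare alleles, which is consistent with the abstract's caveat about excluding SNPs with low minor allele frequency.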

18.
A number of investigators have proposed regression methods for testing linkage between a phenotypic trait and a genetic marker with sib‐pair observations. Xu et al. [Am J Hum Genet 67:1025–8, 2000] studied a unified method for testing linkage, which tends to be more powerful than existing procedures. Often there are multiple traits, which are linked to a common set of genetic markers. In this paper, we present a simple generalization of the unified test to combine information from multiple traits optimally. We use the simulated Genetic Analysis Workshop 12 data to illustrate this methodology and show the advantage of using the combined tests over the single‐trait tests. For the four quantitative traits (Q1,...,Q4) studied, our linkage results suggest that major loci affecting Q1 and Q2 localize at or near markers D02G172, D19G032, and D09G122, while loci affecting Q3 and Q4 localize at or near markers D09G122 and D17G051. © 2001 Wiley‐Liss, Inc.

19.
We explore an approach that allows us to consider a trait for which we wish to determine the optimal subset of markers out of a set of p ≥ 3 candidate markers being considered in a linkage analysis. The most effective analysis would find the model that only includes the q markers closest to the q major genes which determine the trait. Finding this optimal model using classical “frequentist” multiple regression techniques would require consideration of all 2^p possible subsets. We apply the work of George and McCulloch [J Am Stat Assoc 88:881–9, 1993], who have developed a Bayesian approach to optimal subset selection regression, to a modification of the Haseman‐Elston linkage statistic [Elston et al., Genet Epidemiol 19:1–17, 2000] in the analysis of the two quantitative traits simulated in Problem 2. The results obtained using this Bayesian method are compared to those obtained using (1) multiple regression and (2) the modified Haseman‐Elston method (single variable regression analysis). We note upon doing this that for both Q1 and Q2, (1) we have extremely low power with all methods using the samples as given and have to resort to combining several simulated samples in order to have power of 50%, (2) the multivariate analysis does not have greater power than the univariate analysis for these traits, and (3) the Bayesian approach identifies the correct model more frequently than the frequentist approaches but shows no clear advantage over the multivariate approach. © 2001 Wiley‐Liss, Inc.
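
Note on entry 19: to make the 2^p count concrete, the sketch below enumerates every subset of a small list of candidate markers. It illustrates only the size of the exhaustive search that the abstract contrasts with the Bayesian approach; the marker names are hypothetical and no model fitting is shown.

```python
from itertools import combinations


def all_marker_subsets(markers):
    """Yield every subset of the candidate markers, including the empty set."""
    for size in range(len(markers) + 1):
        for subset in combinations(markers, size):
            yield subset


# Three hypothetical markers give 2**3 = 8 candidate models.
subsets = list(all_marker_subsets(["M1", "M2", "M3"]))
```

The count doubles with each added marker, so 20 candidate markers already imply more than a million models to fit.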

20.
Recent research has revealed loci that display variance heterogeneity through various means such as biological disruption, linkage disequilibrium (LD), gene‐by‐gene (G × G), or gene‐by‐environment interaction. We propose a versatile likelihood ratio test that allows joint testing for mean and variance heterogeneity (LRTMV) or either effect alone (LRTM or LRTV) in the presence of covariates. Using extensive simulations for our method and others, we found that all parametric tests were sensitive to nonnormality regardless of any trait transformations. Coupling our test with the parametric bootstrap solves this issue. Using simulations and empirical data from a known mean‐only functional variant, we demonstrate how LD can produce variance‐heterogeneity loci (vQTL) in a predictable fashion based on differential allele frequencies, high D′, and relatively low r² values. We propose that a joint test for mean and variance heterogeneity is more powerful than a variance‐only test for detecting vQTL. This takes advantage of loci that also have mean effects without sacrificing much power to detect variance only effects. We discuss using vQTL as an approach to detect G × G interactions and also how vQTL are related to relationship loci, and how both can generate prior hypotheses for each other and reveal the relationships between traits and possibly between components of a composite trait.
