Similar articles
 20 similar articles found (search time: 15 ms)
1.
Population substructure can lead to confounding in tests for genetic association, and failure to adjust properly can result in spurious findings. Here we address this issue of confounding by considering the impact of global ancestry (average ancestry across the genome) and local ancestry (ancestry at a specific chromosomal location) on regression parameters and relative power in ancestry‐adjusted and ‐unadjusted models. We examine theoretical expectations under different scenarios for population substructure, applying different regression models, verifying and generalizing using simulations, and exploring the findings in real‐world admixed populations. We show that admixture does not lead to confounding when the trait locus is tested directly in a single admixed population. However, if there is more complex population structure or a marker locus in linkage disequilibrium (LD) with the trait locus is tested, both global and local ancestry can be confounders. Additionally, we show the genotype parameters of adjusted and unadjusted models all provide tests for LD between the marker and trait locus, but in different contexts. The local ancestry adjusted model tests for LD in the ancestral populations, while tests using the unadjusted and the global ancestry adjusted models depend on LD in the admixed population(s), which may be enriched due to different ancestral allele frequencies. Practically, this implies that global‐ancestry adjustment should be used for screening, but local‐ancestry adjustment may better inform fine mapping and provide better effect estimates at trait loci.

2.
In case‐control single nucleotide polymorphism (SNP) data, the allele frequency, Hardy‐Weinberg disequilibrium, and linkage disequilibrium (LD) contrast tests are three distinct sources of information about genetic association. While all three tests are typically developed in a retrospective context, we show that prospective logistic regression models may be developed that correspond conceptually to the retrospective tests. This approach provides a flexible framework for conducting a systematic series of association analyses using unphased genotype data and any number of covariates. For a single stage study, two single‐marker tests and four two‐marker tests are discussed. The true association models are derived and they allow us to understand why a model with only a linear term will generally fit well for a SNP in weak LD with a causal SNP, whatever the disease model, but not for a SNP in high LD with a non‐additive disease SNP. We investigate the power of the association tests using real LD parameters from chromosome 11 in the HapMap CEU population data. Among the single‐marker tests, the allelic test has on average the most power in the case of an additive disease, but for dominant, recessive, and heterozygote disadvantage diseases, the genotypic test has the most power. Among the four two‐marker tests, the Allelic‐LD contrast test, which incorporates linear terms for two markers and their interaction term, provides the most reliable power overall for the cases studied. Therefore, our result supports incorporating an interaction term as well as linear terms in multi‐marker tests. Genet. Epidemiol. 34:67–77, 2010. © 2009 Wiley‐Liss, Inc.

3.
Polygenic risk scores (PRSs) are a method to summarize the additive trait variance captured by a set of SNPs, and can increase the power of set‐based analyses by leveraging public genome‐wide association study (GWAS) datasets. PRS aims to assess the genetic liability to some phenotype on the basis of polygenic risk for the same or a different phenotype estimated from independent data. We propose the application of PRSs as a set‐based method with an additional component of adjustment for linkage disequilibrium (LD), with potential extension of the PRS approach to analyze biologically meaningful SNP sets. We call this method POLARIS: POlygenic Ld‐Adjusted RIsk Score. POLARIS identifies the LD structure of SNPs using spectral decomposition of the SNP correlation matrix and replaces the individuals' SNP allele counts with LD‐adjusted dosages. Using a raw genotype dataset together with SNP effect sizes from a second independent dataset, POLARIS can be used for set‐based analysis. MAGMA is an alternative set‐based approach employing principal component analysis to account for LD between markers in a raw genotype dataset. We used simulations, both with simple constructed and real LD‐structure, to compare the power of these methods. POLARIS shows more power than MAGMA applied to the raw genotype dataset only, but less or comparable power to combined analysis of both datasets. POLARIS has the advantages that it produces a risk score per person per set using all available SNPs, and aims to increase power by leveraging the effect sizes from the discovery set in a self‐contained test of association in the test dataset.

4.
Meta‐analysis of genome‐wide association studies (GWAS) has achieved great success in detecting loci underlying human diseases. Incorporating GWAS results from diverse ethnic populations for meta‐analysis, however, remains challenging because of the possible heterogeneity across studies. Conventional fixed‐effects (FE) or random‐effects (RE) methods may not be most suitable to aggregate multiethnic GWAS results because of violation of the homogeneous effect assumption across studies (FE) or low power to detect signals (RE). Three recently proposed methods, modified RE (RE‐HE) model, binary‐effects (BE) model and a Bayesian approach (Meta‐analysis of Transethnic Association [MANTRA]), show increased power over FE and RE methods while incorporating heterogeneity of effects when meta‐analyzing trans‐ethnic GWAS results. We propose a two‐stage approach to account for heterogeneity in trans‐ethnic meta‐analysis in which we clustered studies with cohort‐specific ancestry information prior to meta‐analysis. We compare this to a no‐prior‐clustering (crude) approach, evaluating type I error and power of these two strategies, in an extensive simulation study to investigate whether the two‐stage approach offers any improvements over the crude approach. We find that the two‐stage approach and the crude approach for all five methods (FE, RE, RE‐HE, BE, MANTRA) provide well‐controlled type I error. However, the two‐stage approach shows increased power for BE and RE‐HE, and similar power for MANTRA and FE compared to their corresponding crude approach, especially when there is heterogeneity across the multiethnic GWAS results. These results suggest that prior clustering in the two‐stage approach can be an effective and efficient intermediate step in meta‐analysis to account for the multiethnic heterogeneity.
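
The contrast between the conventional FE and RE estimators named in this abstract can be illustrated with a minimal sketch. It assumes per-study effect estimates and standard errors are already available, uses textbook inverse-variance weighting for FE, and uses the DerSimonian-Laird moment estimator for the between-study variance in RE. The function names and the example numbers are hypothetical; RE‐HE, BE, MANTRA, and the two‐stage clustering step are not implemented here.

```python
import math


def fixed_effects(betas, ses):
    """Inverse-variance weighted (fixed-effects) pooled estimate and its SE."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    return pooled, math.sqrt(1.0 / total)


def random_effects(betas, ses):
    """DerSimonian-Laird random-effects estimate: (pooled, se, tau2)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_fe, _ = fixed_effects(betas, ses)
    # Cochran's Q measures heterogeneity around the fixed-effects estimate.
    q = sum(w * (b - pooled_fe) ** 2 for w, b in zip(weights, betas))
    c = total - sum(w ** 2 for w in weights) / total
    k = len(betas)
    # Between-study variance, truncated at zero; zero when undefined (k < 2).
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    re_total = sum(re_weights)
    pooled = sum(w * b for w, b in zip(re_weights, betas)) / re_total
    return pooled, math.sqrt(1.0 / re_total), tau2


# Hypothetical example: two studies with clearly different effects.
betas, ses = [0.0, 0.4], [0.1, 0.1]
fe_beta, fe_se = fixed_effects(betas, ses)
re_beta, re_se, tau2 = random_effects(betas, ses)
print(fe_beta, fe_se, re_beta, re_se, tau2)
```

In this toy case both estimators return the same pooled effect, but the RE standard error is much larger because the estimated between-study variance is added to every study's variance, which is one way to see the loss of power under heterogeneity that the abstract attributes to RE.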

5.
Genotype-based association test for general pedigrees: the genotype-PDT   Total citations: 11, self-citations: 0, citations by others: 11
Many family-based tests of linkage disequilibrium (LD) are based on counts of alleles rather than genotypes. However, allele-based tests may not detect interactions among alleles at a single locus that are apparent when examining associations with genotypes. Family-based tests of LD based on genotypes have been developed, but they are typically valid as tests of association only in families with a single affected individual. To take advantage of families with multiple affected individuals, we propose the genotype-pedigree disequilibrium test (geno-PDT) to test for LD between marker locus genotypes and disease. Unlike previous tests for genotypic association, the geno-PDT is valid in general pedigrees. Simulations to compare the power of the allele-based PDT and geno-PDT reveal that under an additive model, the allele-based PDT is more powerful, but that the geno-PDT can have greater power when the genetic model is recessive or dominant. Perhaps the most important property of the geno-PDT is the ability to test for association with particular genotypes, which can reveal underlying patterns of association at the genotypic level. These genotype-specific tests can be used to suggest possible underlying genetic models that are consistent with the pattern of genotypic association. This is illustrated through an application to a candidate gene analysis of the MLLT3 gene in families with Alzheimer disease. The geno-PDT approach for testing genotypes in general family data provides a useful tool for identifying genes in complex disease, and partitioning individual genotype contributions will help to dissect the influence of genotype on risk.

6.
The case-control study has been and continues to be one of the most popular designs in epidemiology. More recently, this design has been adopted to test for candidate genes when searching for disease genetic etiology. In this report, we present a multipoint linkage disequilibrium (LD) mapping approach with the focus on estimating the location of the target trait locus. It builds upon a representation, which shows that the difference between a case and a control in probabilities of carrying the target allele of a marker is proportional to that of the trait locus and that the proportionality factor is simply a measure of LD between the trait locus and the marker. Our method has the desired properties that (1) there is no need to specify phases of genotypic data with multiple markers, (2) it provides an estimate of location of the disease locus along with sampling uncertainty to help investigators to narrow chromosomal regions, and (3) a single test statistic is provided to test for LD in the framed region rather than testing the hypothesis one marker at a time. Our simulation work suggests that the proposed method performs well in terms of bias and coverage probability. Extension of the proposed method to account for confounding and genetic heterogeneity is discussed. We apply the proposed method to a published case-control data set for cystic fibrosis.

7.
Admixture mapping is potentially a powerful method for mapping genes for complex human diseases, when the disease frequency due to a particular disease-susceptible gene is different between founding populations of different ethnicity. The method tests for association of the allele ancestry with the disease. Since the markers used to define ancestral populations are not fully informative for the ancestry status, a direct test of such association is not possible. In this report, we develop a unified hidden Markov model (HMM) framework for estimating the unobserved ancestry haplotypes across a chromosomal region based on marker haplotype or genotype data. The HMM efficiently utilizes all the marker data to infer the latent ancestry states at the putative disease locus. In this HMM modelling framework, we develop a likelihood test for association of allele ancestry and the disease risk based on case-control data. Existence of such association may imply linkage between the candidate locus and the disease locus. We evaluate by simulations how several factors affect the power of admixture mapping, including sample size, ethnicity relative risk, marker density, and the different admixture dynamics. Our simulation results indicate correct type 1 error rates of the proposed likelihood ratio tests and great impact of marker density on the power. The simulation results also indicate that the methods work well for the admixed populations derived from both hybrid-isolation and continuous gene-flowing models. Finally, we observed that the genotype-based HMM has power very similar to that of the haplotype-based HMM when the haplotypes are known and the set of markers is highly informative.

8.
9.
Genome‐wide association studies (GWAS) have led to the discovery of over 200 single nucleotide polymorphisms (SNPs) associated with type 2 diabetes mellitus (T2DM). Additionally, East Asians develop T2DM at a higher rate, younger age, and lower body mass index than their European ancestry counterparts. The reason behind this occurrence remains elusive. With comprehensive searches through the National Human Genome Research Institute (NHGRI) GWAS catalog literature, we compiled a database of 2,800 ancestry‐specific SNPs associated with T2DM and 70 other related traits. Manual data extraction was necessary because the GWAS catalog reports statistics such as odds ratio and P‐value, but does not consistently include ancestry information. Currently, many statistics are derived by combining initial and replication samples from study populations of mixed ancestry. Analysis of all‐inclusive data can be misleading, as not all SNPs are transferable across diverse populations. We used ancestry data to construct ancestry‐specific human phenotype networks (HPN) centered on T2DM. Quantitative and visual analysis of network models reveals the genetic disparities between ancestry groups. Of the 27 phenotypes in the East Asian HPN, six phenotypes were unique to the network, revealing the underlying ancestry‐specific nature of some SNPs associated with T2DM. We studied the relationship between T2DM and five phenotypes unique to the East Asian HPN to generate new interaction hypotheses in a clinical context. The genetic differences found in our ancestry‐specific HPNs suggest different pathways are involved in the pathogenesis of T2DM among different populations. Our study underlines the importance of ancestry in the development of T2DM and its implications in pharmacogenetics and personalized medicine.

10.
The family-based admixture mapping test (AMT) identifies disease-related genes using family data from admixed individuals with the disease of interest (cases). The cases' genotypes at a set of markers are used to infer their DNA ancestry as it varies in blocks along the chromosomes. The test compares the cases' inferred ancestries to those expected from their family histories. Deviation between observed and expected ancestries in a region suggests the presence of a disease gene. We use a likelihood-based development of the AMT to compare it with the transmission disequilibrium test (TDT) as applied to admixed populations. The two tests have a common framework but differ significantly when the disease locus is untyped. The TDT infers disease-locus genotypes using the markers with which it is in linkage disequilibrium (LD). In contrast, the AMT infers disease locus ancestries using those of its linked markers. Thus, TDT power depends on LD between disease and marker loci, while AMT power depends on the lengths of the ancestry blocks containing the disease locus. We compare the power of the two tests when applied to cases with descent from two ancestral populations. The AMT outperforms the TDT when case marker ancestries are correctly specified and LD between disease and marker loci is less than one-third its maximal value (Delta' < 1/3). However, the TDT performs better in the presence of uncertain marker ancestries, even for weak LD between disease and marker loci (Delta' = 0.1). These findings have implications for the design of studies using admixed populations.

11.
Parent‐of‐origin effects have been pointed out to be one plausible source of the heritability that was unexplained by genome‐wide association studies. Here, we consider a case‐control mother‐child pair design for studying parent‐of‐origin effects of offspring genes on neonatal/early‐life disorders or pregnancy‐related conditions. In contrast to the standard case‐control design, the case‐control mother‐child pair design contains valuable parental information and therefore permits powerful assessment of parent‐of‐origin effects. Suppose the region under study is in Hardy‐Weinberg equilibrium, inheritance is Mendelian at the diallelic locus under study, there is random mating in the source population, and the SNP under study is not related to risk for the phenotype under study because of linkage disequilibrium (LD) with other SNPs. Using a maximum likelihood method that simultaneously assesses likely parental sources and estimates effect sizes of the two offspring genotypes, we investigate the extent of power increase for testing parent‐of‐origin effects through the incorporation of genotype data for adjacent markers that are in LD with the test locus. Our method does not need to assume the outcome is rare because it exploits supplementary information on phenotype prevalence. Analysis with simulated SNP data indicates that incorporating genotype data for adjacent markers greatly helps recover the parent‐of‐origin information. This recovery can sometimes substantially improve statistical power for detecting parent‐of‐origin effects. We demonstrate our method by examining parent‐of‐origin effects of the gene PPARGC1A on low birth weight using data from 636 mother‐child pairs in the Jerusalem Perinatal Study.

12.
Genome‐wide association studies (GWAS) have been successful in finding numerous new risk variants for complex diseases, but the results almost exclusively rely on single‐marker scans. Methods that can analyze joint effects of many variants in GWAS data are still being developed and trialed. To evaluate the performance of such methods it is essential to have a GWAS data simulator that can rapidly simulate a large number of samples, and capture key features of real GWAS data such as linkage disequilibrium (LD) among single‐nucleotide polymorphisms (SNPs) and joint effects of multiple loci (multilocus epistasis). In the current study, we combine techniques for specifying high‐order epistasis among risk SNPs with an existing program GWAsimulator [Li and Li, 2008] to achieve rapid whole‐genome simulation with accurate modeling of complex interactions. We considered various approaches to specifying interaction models including the following: departure from product of marginal effects for pairwise interactions, product terms in logistic regression models for low‐order interactions, and penetrance tables conforming to marginal effect constraints for high‐order interactions or prescribing known biological interactions. Methods for conversion among different model specifications are developed using penetrance table as the fundamental characterization of disease models. The new program, called simGWA, is capable of efficiently generating large samples of GWAS data with high precision. We show that data simulated by simGWA are faithful to template LD structures, and conform to prespecified disease models with (or without) interactions.

13.
Genome‐wide association studies (GWAS) offer an excellent opportunity to identify the genetic variants underlying complex human diseases. Successful utilization of this approach requires a large sample size to identify single nucleotide polymorphisms (SNPs) with subtle effects. Meta‐analysis is a cost‐efficient means to achieve large sample size by combining data from multiple independent GWAS; however, results from studies performed on different populations can be variable due to various reasons, including varied linkage disequilibrium structures as well as gene‐gene and gene‐environment interactions. Nevertheless, one should expect effects of the SNP to be more similar between similar populations than between populations with quite different genetic and environmental backgrounds. Prior information on populations of GWAS is often not considered in current meta‐analysis methods, rendering such analyses less optimal for detecting association. This article describes a test that improves meta‐analysis to incorporate variable heterogeneity among populations. The proposed method is remarkably simple in computation and hence can be performed in a rapid fashion in the setting of GWAS. Simulation results demonstrate the validity and higher power of the proposed method over conventional methods in the presence of heterogeneity. As a demonstration, we applied the test to real GWAS data to identify SNPs associated with circulating insulin‐like growth factor I concentrations.

14.
We present a Bayesian semiparametric model for the meta-analysis of candidate gene studies with a binary outcome. Such studies often report results from association tests for different, possibly study-specific and non-overlapping genetic markers in the same genetic region. Meta-analyses of the results at each marker in isolation are seldom appropriate as they ignore the correlation that may exist between markers due to linkage disequilibrium (LD) and cannot assess the relative importance of variants at each marker. Also such marker-wise meta-analyses are restricted to only those studies that have typed the marker in question, with a potential loss of power. A better strategy is one which incorporates information about the LD between markers so that any combined estimate of the effect of each variant is corrected for the effect of other variants, as in multiple regression. Here we develop a Bayesian semiparametric model which models the observed genotype group frequencies conditional on the case/control status and uses pairwise LD measurements between markers as prior information to make posterior inference on adjusted effects. The approach allows borrowing of strength across studies and across markers. The analysis is based on a mixture of Dirichlet processes model as the underlying semiparametric model. Full posterior inference is performed through Markov chain Monte Carlo algorithms. The approach is demonstrated on simulated and real data.

15.
A genome‐wide association study (GWAS) correlates marker and trait variation in a study sample. Each subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here, we assume that subjects are randomly collected unrelated individuals and that trait values are normally distributed or can be transformed to normality. Over the past decade, geneticists have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies presents unique computational challenges. Penalized regression with the ℓ1 penalty (LASSO) or the minimax concave penalty (MCP) is capable of selecting a handful of associated SNPs from millions of potential SNPs. Unfortunately, model selection can be corrupted by false positives and false negatives, obscuring the genetic underpinning of a trait. Here, we compare LASSO and MCP penalized regression to iterative hard thresholding (IHT). On GWAS regression data, IHT is better at model selection and comparable in speed to both methods of penalized regression. This conclusion holds for both simulated and real GWAS data. IHT fosters parallelization and scales well in problems with large numbers of causal markers. Our parallel implementation of IHT accommodates SNP genotype compression and exploits multiple CPU cores and graphics processing units (GPUs). This allows statistical geneticists to leverage commodity desktop computers in GWAS analysis and to avoid supercomputing. Availability: Source code is freely available at https://github.com/klkeys/IHT.jl.
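
For readers unfamiliar with iterative hard thresholding, the core update can be sketched as a gradient step on the least-squares loss followed by projection onto vectors with at most k nonzero entries. The sketch below is a textbook fixed-step version in plain Python, not the IHT.jl implementation: the function names, the fixed `step` parameter, and the toy data are assumptions made for illustration, and the abstract does not specify the step-size rule, genotype compression, or parallelization that the authors use.

```python
def hard_threshold(b, k):
    """Keep the k entries of b with largest absolute value; zero the rest."""
    order = sorted(range(len(b)), key=lambda j: abs(b[j]), reverse=True)
    keep = set(order[:k])
    return [bj if j in keep else 0.0 for j, bj in enumerate(b)]


def iht(X, y, k, step, iters=100):
    """Fixed-step IHT for min ||y - X b||^2 subject to at most k nonzeros in b.

    X is a list of rows, y a list of responses. The step size is assumed
    small enough for the iteration to be stable (hypothetical parameter).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        # Residual r = y - X b.
        resid = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        # Negative gradient direction (up to a factor of 2): X^T r.
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step, then projection onto the k-sparse set.
        b = hard_threshold([b[j] + step * grad[j] for j in range(p)], k)
    return b


# Toy example with an identity design: the two largest effects are kept.
X = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
y = [3.0, 0.1, -2.0, 0.05]
print(iht(X, y, k=2, step=1.0))
```

Unlike LASSO or MCP, which shrink coefficients through a penalty, this scheme enforces the sparsity level directly through k, so the retained coefficients are not shrunk toward zero.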

16.
Current genome-wide association studies (GWAS) often involve populations that have experienced recent genetic admixture. Genotype data generated from these studies can be used to test for association directly, as in a non-admixed population. As an alternative, these data can be used to infer chromosomal ancestry, and thus allow for admixture mapping. We quantify the contribution of allele-based and ancestry-based association testing under a family design, and demonstrate that the two tests can provide non-redundant information. We propose a joint testing procedure, which efficiently integrates the two sources of information. The efficiencies of the allele, ancestry and combined tests are compared in the context of a GWAS. We discuss the impact of population history and provide guidelines for future design and analysis of GWAS in admixed populations.

17.
Single nucleotide polymorphisms (SNPs) are becoming widely used as genotypic markers in genetic association studies of common, complex human diseases. For such association screens, a crucial part of study design is determining what SNPs to prioritize for genotyping. We present a novel power-based algorithm to select a subset of tag SNPs for genotyping from a map of available SNPs. Blocks of markers in strong linkage disequilibrium (LD) are identified, and SNPs are selected to represent each block such that power to detect disease association with an underlying disease allele in LD with block members is preserved; all markers outside of blocks are also included in the tagging subset. A key, novel element of this method is that it incorporates information about the phase of LD observed among marker pairs to retain markers likely to be in coupling phase with an underlying disease locus, thus increasing power compared to a phase-blind approach. Power calculations illustrate important issues regarding LD phase and make clear the advantages of our approach to SNP selection. We apply our algorithm to genotype data from the International HapMap Consortium and demonstrate that considerable reduction in SNP genotyping may be attained while retaining much of the available power for a disease association screen. We also demonstrate that these tag SNPs effectively represent underlying variants not included in the LD analysis and SNP selection, by using leave-one-out tests to show that most (approximately 90%) of the "untyped" variants lying in blocks are in coupling-phase LD with a tag SNP. Additional performance tests using the HapMap ENCyclopedia of DNA Elements (ENCODE) regions show that the method compares well with the popular r² bin tagging method. This work is a concrete example of how empirical LD phase may be used to benefit study design.

18.
Family‐based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single marker based, which test one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEEs) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P‐values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP‐SNP interactions. Simulation studies show that the proposed method properly controls for type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single marker based minimum P‐value GEE test for an SNP‐set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.

19.
The National Human Genome Research Institute's catalog of published genome‐wide association studies (GWAS) lists over 10,000 genetic variants collectively associated with over 800 human diseases or traits. Most of these GWAS have been conducted in European‐ancestry populations. Findings gleaned from these studies have led to identification of disease‐associated loci and biologic pathways involved in disease etiology. In multiple instances, these genomic findings have led to the development of novel medical therapies or evidence for prescribing a given drug as the appropriate treatment for a given individual beyond phenotypic appearances or socially defined constructs of race or ethnicity. Such findings have implications for populations throughout the globe and GWAS are increasingly being conducted in more diverse populations. A major challenge for investigators seeking to follow up genomic findings between diverse populations is discordant patterns of linkage disequilibrium (LD). We provide an overview of common measures of LD and opportunities for their use in novel methods designed to address challenges associated with following up GWAS conducted in European‐ancestry populations in African‐ancestry populations or, more generally, between populations with discordant LD patterns. We detail the strengths and weaknesses associated with different approaches. We also describe application of these strategies in follow‐up studies of populations with concordant LD patterns (replication) or discordant LD patterns (transferability) as well as fine‐mapping studies. We review application of these methods to a variety of traits and diseases.
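
The abstract does not say which LD measures it covers; as an illustration, the sketch below computes three widely used pairwise measures (D, D', and r²) for two biallelic loci from a haplotype frequency and the two allele frequencies. The function name, the signed convention for D', and the example frequencies for the two populations are assumptions made for this example and are not taken from the article.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for alleles A and B at two biallelic loci.

    p_ab is the frequency of the A-B haplotype; p_a and p_b are the
    frequencies of alleles A and B. D' is returned with the sign of D.
    """
    d = p_ab - p_a * p_b
    # Largest magnitude D can take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies for the same SNP pair in two populations.
print(ld_measures(0.45, 0.5, 0.5))  # strong LD in population 1
print(ld_measures(0.30, 0.5, 0.5))  # weaker LD in population 2
```

Comparing the outputs for the two hypothetical populations shows how the same marker pair can tag a variant well in one population and poorly in another, which is the kind of discordance the abstract identifies as a challenge for follow-up studies.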

20.
Traditional genome‐wide association studies (GWASs) usually focus on single‐marker analysis, which only assesses marginal effects. Pathway analysis, on the other hand, considers biological pathway gene marker hierarchical structure and therefore provides additional insights into the genetic architecture underlying complex diseases. Recently, a number of methods for pathway analysis have been proposed to assess the significance of a biological pathway from a collection of single‐nucleotide polymorphisms. In this study, we propose a novel approach for pathway analysis that assesses the effects of genes using the sequence kernel association test and the effects of pathways using an extended adaptive rank truncated product statistic. It has been increasingly recognized that complex diseases are caused by both common and rare variants. We propose a new weighting scheme for genetic variants across the whole allelic frequency spectrum to be analyzed together without any form of frequency cutoff for defining rare variants. The proposed approach is flexible. It is applicable to both binary and continuous traits, and incorporating covariates is easy. Furthermore, it can be readily applied to GWAS data, exome‐sequencing data, and deep resequencing data. We evaluate the new approach on data simulated under comprehensive scenarios and show that it has the highest power in most of the scenarios while maintaining the correct type I error rate. We also apply our proposed methodology to data from a study of the association between bipolar disorder and candidate pathways from Wellcome Trust Case Control Consortium (WTCCC) to show its utility.
