Similar articles
20 similar articles found
1.
Population stratification has long been recognized as an issue in genetic association studies because unrecognized population stratification can lead to both false‐positive and false‐negative findings and can obscure true association signals if not appropriately corrected. This issue can be even worse in rare variant association analyses because rare variants often demonstrate stronger and potentially different patterns of stratification than common variants. To correct for population stratification in genetic association studies, we proposed a novel method to Test the effect of an Optimally Weighted combination of variants in Admixed populations (TOWA), in which the analytically derived optimal weights can be calculated from existing phenotype and genotype data. TOWA up‐weights rare variants and variants that have strong associations with the phenotype. Additionally, it can adjust for the direction of the association and allows for local ancestry differences among study subjects. Extensive simulations show that the type I error rate of TOWA is under control in the presence of population stratification and that it is more powerful than existing methods. We have also applied TOWA to a real sequencing data set. Our simulation studies and real data analysis results indicate that TOWA is a useful tool for rare variant association analyses in admixed populations.

2.
Population substructure can lead to confounding in tests for genetic association, and failure to adjust properly can result in spurious findings. Here we address this issue of confounding by considering the impact of global ancestry (average ancestry across the genome) and local ancestry (ancestry at a specific chromosomal location) on regression parameters and relative power in ancestry‐adjusted and ‐unadjusted models. We examine theoretical expectations under different scenarios for population substructure, applying different regression models, verifying and generalizing using simulations, and exploring the findings in real‐world admixed populations. We show that admixture does not lead to confounding when the trait locus is tested directly in a single admixed population. However, if there is more complex population structure or a marker locus in linkage disequilibrium (LD) with the trait locus is tested, both global and local ancestry can be confounders. Additionally, we show that the genotype parameters of adjusted and unadjusted models all provide tests for LD between the marker and trait locus, but in different contexts. The local ancestry adjusted model tests for LD in the ancestral populations, while tests using the unadjusted and the global ancestry adjusted models depend on LD in the admixed population(s), which may be enriched due to different ancestral allele frequencies. Practically, this implies that global‐ancestry adjustment should be used for screening, but local‐ancestry adjustment may better inform fine mapping and provide better effect estimates at trait loci.

3.
Association analysis using admixed populations poses both challenges and opportunities for disease mapping. By developing some explicit results for the variance of an allele of interest conditional on either local or global ancestry, and by simulation of recently admixed genomes, we evaluate power and false‐positive rates under a variety of scenarios concerning linkage disequilibrium (LD) and the presence of unmeasured variants. Pairwise LD patterns were compared between admixed and nonadmixed populations using the HapMap phase 3 data. Based on the above, we showed that as follows:

4.
During the last decade genome-wide association studies have proven to be a powerful approach to identifying disease-causing variants. However, for admixed populations, most current methods for association testing are based on the assumption that the effect of a genetic variant is the same regardless of its ancestry. This is a reasonable assumption for a causal variant but may not hold for the genetic variants that are tested in genome-wide association studies, which are usually not causal. The effects of noncausal genetic variants depend on how strongly their presence correlates with the presence of the causal variant, which may vary between ancestral populations because of different linkage disequilibrium patterns and allele frequencies. Motivated by this, we here introduce a new statistical method for association testing in recently admixed populations, where the effect size is allowed to depend on the ancestry of a given allele. Our method does not rely on accurate inference of local ancestry, yet using simulations we show that in some scenarios it gives a substantial increase in statistical power to detect associations. In addition, the method allows for testing for a difference in effect size between ancestral populations, which can be used to help determine if a given genetic variant is causal. We demonstrate the usefulness of the method on data from the Greenlandic population.

5.
Current genome-wide association studies (GWAS) often involve populations that have experienced recent genetic admixture. Genotype data generated from these studies can be used to test for association directly, as in a non-admixed population. As an alternative, these data can be used to infer chromosomal ancestry, and thus allow for admixture mapping. We quantify the contribution of allele-based and ancestry-based association testing under a family design, and demonstrate that the two tests can provide non-redundant information. We propose a joint testing procedure, which efficiently integrates the two sources of information. The efficiencies of the allele, ancestry and combined tests are compared in the context of a GWAS. We discuss the impact of population history and provide guidelines for future design and analysis of GWAS in admixed populations.

6.
Admixture mapping is potentially a powerful method for mapping genes for complex human diseases when the disease frequency due to a particular disease-susceptibility gene differs between founding populations of different ethnicity. The method tests for association of the allele ancestry with the disease. Since the markers used to define ancestral populations are not fully informative for the ancestry status, a direct test of such association is not possible. In this report, we develop a unified hidden Markov model (HMM) framework for estimating the unobserved ancestry haplotypes across a chromosomal region based on marker haplotype or genotype data. The HMM efficiently utilizes all the marker data to infer the latent ancestry states at the putative disease locus. In this HMM modelling framework, we develop a likelihood ratio test for association of allele ancestry and the disease risk based on case-control data. Existence of such association may imply linkage between the candidate locus and the disease locus. We evaluate by simulations how several factors affect the power of admixture mapping, including sample size, ethnicity relative risk, marker density, and the different admixture dynamics. Our simulation results indicate correct type I error rates of the proposed likelihood ratio tests and a great impact of marker density on the power. The simulation results also indicate that the methods work well for admixed populations derived from both hybrid-isolation and continuous gene-flow models. Finally, we observed that the genotype-based HMM performs very similarly in power to the haplotype-based HMM when the haplotypes are known and the set of markers is highly informative.
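The idea of inferring latent ancestry states from marker data can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes a single haplotype, two ancestral populations, known allele frequencies in each, and a constant per-interval probability of re-drawing ancestry. The function name and the parameters `admix` and `switch` are hypothetical.

```python
def ancestry_posteriors(alleles, freqs, admix=0.5, switch=0.1):
    """Forward-backward posteriors P(ancestry = population 0 | all alleles).

    alleles: observed alleles (0 or 1) along one haplotype
    freqs:   per-marker (p0, p1), frequency of allele 1 in population 0 and 1
             (assumed strictly between 0 and 1)
    admix:   prior probability of population-0 ancestry (hypothetical)
    switch:  probability that ancestry is re-drawn between adjacent markers
             (hypothetical; a real model would tie this to genetic distance)
    """
    prior = [admix, 1.0 - admix]
    # Stay in the same state with probability 1 - switch, else re-draw from the prior.
    trans = [[(1 - switch) * (i == j) + switch * prior[j] for j in range(2)]
             for i in range(2)]

    def emit(t, k):
        p = freqs[t][k]
        return p if alleles[t] == 1 else 1.0 - p

    n = len(alleles)
    # Forward pass, normalised at each marker to avoid underflow.
    fwd = []
    for t in range(n):
        if t == 0:
            row = [prior[k] * emit(0, k) for k in range(2)]
        else:
            row = [emit(t, k) * sum(fwd[-1][i] * trans[i][k] for i in range(2))
                   for k in range(2)]
        total = sum(row)
        fwd.append([x / total for x in row])
    # Backward pass, also normalised.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        row = [sum(trans[k][j] * emit(t + 1, j) * bwd[t + 1][j] for j in range(2))
               for k in range(2)]
        total = sum(row)
        bwd[t] = [x / total for x in row]
    # Combine and normalise per marker.
    post = []
    for t in range(n):
        w = [fwd[t][k] * bwd[t][k] for k in range(2)]
        post.append(w[0] / sum(w))
    return post


hap = [1, 1, 0, 1]
marker_freqs = [(0.9, 0.1)] * 4  # made-up, highly informative markers
print([round(p, 3) for p in ancestry_posteriors(hap, marker_freqs)])
```

Because neighbouring markers share information through the transition probabilities, a single discordant allele does not necessarily flip the inferred ancestry, which is the sense in which an HMM uses all marker data at once.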

7.
Family‐based designs enriched with affected subjects and disease‐associated variants can increase statistical power for identifying functional rare variants. However, few rare variant analysis approaches are available for time‐to‐event traits in family designs, and none of them is applicable to the X chromosome. We developed novel pedigree‐based burden and kernel association tests for time‐to‐event outcomes with right censoring for pedigree data, referred to as FamRATS (family‐based rare variant association tests for survival traits). Cox proportional hazard models were employed to relate a time‐to‐event trait with rare variants with flexibility to encompass all ranges and collapsing of multiple variants. In addition, robustness to violations of the proportional hazard assumption was investigated for the proposed tests and four existing tests, including the conventional population‐based Cox proportional model and the burden, kernel, and sum of squares statistic (SSQ) tests for family data. The proposed tests can be applied to large‐scale whole‐genome sequencing data. They are appropriate for practical use under a wide range of misspecified Cox models, as well as for population‐based, pedigree‐based, or hybrid designs. In our extensive simulation study and data example, we showed that the proposed kernel test is the most powerful and robust choice among the proposed burden test and the four existing rare variant survival association tests. When applied to the Diabetes Heart Study, the proposed tests found that exome variants of the JAK1 gene on chromosome 1 showed the most significant association with age at onset of type 2 diabetes in the exome‐wide analysis.

8.
With the development of sequencing technologies, the direct testing of rare variant associations has become possible. Many statistical methods for detecting associations between rare variants and complex diseases have recently been developed, most of which are population‐based methods for unrelated individuals. A limitation of population‐based methods is that spurious associations can occur when there is population structure. For rare variants, this problem can be more serious, because the spectrum of rare variation can be very different in diverse populations and because no methods currently exist to control for population stratification in population‐based rare variant association tests. A solution to the problem of population stratification is to use family‐based association tests, which use family members to control for population stratification. In this article, we propose a novel test for Testing the Optimally Weighted combination of variants based on data of Parents and Affected Children (TOW‐PAC). TOW‐PAC is a family‐based association test that tests the combined effect of rare and common variants in a genomic region, and is robust to the directions of the effects of causal variants. Simulation studies confirm that, for rare variant associations, family‐based association tests are robust to population stratification, although population‐based association tests can be seriously confounded by population stratification. The results of power comparisons show that the power of TOW‐PAC increases with the number of affected children in each family and that TOW‐PAC based on multiple affected children per family is more powerful than TOW based on unrelated individuals.

9.
Recent studies showed that population substructure (PS) can have a more complex impact on rare variant tests and that similarity‐based collapsing tests (e.g., SKAT) may suffer more severely from PS than burden‐based tests. In this work, we evaluate the performance of SKAT coupled with principal components (PC) or variance components (VC) based PS correction methods. We consider confounding effects caused by PS including stratified populations, admixed populations, and spatially distributed nongenetic risk; we investigate which types of variants (e.g., common, less frequent, rare, or all variants) should be used to effectively control for confounding effects. We found that (i) PC‐based methods can account for confounding effects in most scenarios except for admixture, although the number of sufficient PCs depends on the PS complexity and the type of variants used. (ii) PCs based on all variants (i.e., common + less frequent + rare) tend to require equal or fewer sufficient PCs and often achieve higher power than PCs based on other variant types. (iii) VC‐based methods can effectively adjust for confounding in all scenarios (even for admixture), though the type of variants that should be used to construct the VC may vary. (iv) VC based on all variants works consistently in all scenarios, though its power may sometimes be lower than VC based on other variant types. Given that the best‐performing method and which variants to use depend on the underlying unknown confounding mechanisms, a robust strategy is to perform SKAT analyses using VC‐based methods based on all variants.

10.
Genetic association studies in admixed populations may be biased if individual ancestry varies within the population and the phenotype of interest is associated with ancestry. However, recently admixed populations also offer potential benefits in association studies since markers informative for ancestry may be in linkage disequilibrium across large distances. In particular, the enhanced LD in admixed populations may be used to identify alleles that underlie a genetically determined difference in a phenotype between two ancestral populations. Asthma is known to have different prevalence and severity among ancestrally distinct populations. We investigated several asthma-related phenotypes in two ancestrally admixed populations: Mexican Americans and Puerto Ricans. We used ancestry informative markers to estimate the individual ancestry of 181 Mexican American asthmatics and 181 Puerto Rican asthmatics and tested whether individual ancestry is associated with any of these phenotypes independently of known environmental factors. We found an association between higher European ancestry and more severe asthma as measured both by forced expiratory volume in 1 second (r=-0.21, p=0.005) and by a clinical assessment of severity among Mexican Americans (OR: 1.55; 95% CI 1.25 to 1.93). We found no significant associations between ancestry and severity or drug responsiveness among Puerto Ricans. These results suggest that asthma severity may be influenced by genetic factors differentiating Europeans and Native Americans in Mexican Americans, although differing results for Puerto Ricans require further investigation.

11.
Imputation in admixed populations is an important problem but challenging due to the complex linkage disequilibrium (LD) pattern. The emergence of large reference panels such as that from the 1000 Genomes Project enables more accurate imputation in general, and in particular for admixed populations and for uncommon variants. To efficiently benefit from these large reference panels, one key issue to consider in modern genotype imputation frameworks is the selection of effective reference panels. In this work, we consider a number of methods for effective reference panel construction inside a hidden Markov model and specific to each target individual. These methods fall into two categories: identity‐by‐state (IBS) based and ancestry‐weighted approaches. We evaluated the performance on individuals from recently admixed populations. Our target samples include 8,421 African Americans and 3,587 Hispanic Americans from the Women's Health Initiative, which allow assessment of imputation quality for uncommon variants. Our experiments include both large and small reference panels; large, medium, and small target samples; and genome regions of varying levels of LD. We also include BEAGLE and IMPUTE2 for comparison. Experimental results with a large reference panel suggest that our novel piecewise IBS method yields consistently higher imputation quality than other methods/software. The advantage is particularly noteworthy among uncommon variants, where we observe up to 5.1% information gain with the difference being highly significant (Wilcoxon signed rank test P‐value < 0.0001). Our work is the first that considers various sensible approaches for imputation in admixed populations and presents a comprehensive comparison.

12.
Genetic association studies in admixed populations allow us to gain deeper understanding of the genetic architecture of human diseases and traits. However, population stratification, complicated linkage disequilibrium (LD) patterns, and the complex interplay of allelic and ancestry effects on phenotypic traits pose challenges in such analyses. These issues may lead to detecting spurious associations and/or result in reduced statistical power. Fortunately, if handled appropriately, these same challenges provide unique opportunities for gene mapping. To address these challenges and to take these opportunities, we propose a robust and powerful two‐step testing procedure, Local Ancestry Adjusted Allelic (LAAA) association. In the first step, LAAA robustly captures associations due to allelic effect, ancestry effect, and interaction effect, allowing detection of effect heterogeneity across ancestral populations. In the second step, LAAA identifies the source of association, namely allelic, ancestry, or the combination. By jointly modeling allele, local ancestry, and ancestry‐specific allelic effects, LAAA is highly powerful in capturing the presence of interaction between ancestry and allele effect. We evaluated the validity and statistical power of LAAA through simulations over a broad spectrum of scenarios. We further illustrated its usefulness by application to the Candidate Gene Association Resource (CARe) African American participants for association with hemoglobin levels. We were able to replicate loci previously identified by independent groups that would have been missed in CARe without joint testing. Moreover, the loci for which LAAA detected potential effect heterogeneity were replicated among African Americans from the Women's Health Initiative study. LAAA is freely available at https://yunliweb.its.unc.edu/LAAA .

13.
The role played by epistasis between alleles at unlinked loci in shaping population fitness has been debated for many years and the existing evidence has been mainly accumulated from model organisms. In model organisms, fitness epistasis can be systematically inferred by detecting nonindependence of genotypic values between loci in a population and confirmed through examining the number of offspring produced in two‐locus genotype groups. No systematic study has been conducted to detect epistasis of fitness in humans owing to experimental constraints. In this study, we developed a novel method to detect fitness epistasis by testing the correlation between local ancestries on different chromosomes in an admixed population. We inferred local ancestry across the genome in 16,252 unrelated African Americans and systematically examined the pairwise correlations between the genomic regions on different chromosomes. Our analysis revealed a pair of genomic regions on chromosomes 4 and 6 that show significant local ancestry correlation (P‐value = 4.01 × 10⁻⁸) that can be potentially attributed to fitness epistasis. However, we also observed substantial local ancestry correlation that cannot be explained by systemic ancestry inference bias. To our knowledge, this study is the first to systematically examine evidence of fitness epistasis across the human genome.
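The core quantity in this approach, a correlation between local ancestries at two unlinked regions, can be sketched as follows. This is only an illustration with made-up data: the coding of local ancestry as 0, 1, or 2 copies of one ancestry is an assumption, and the sketch ignores adjustment for genome-wide ancestry, which a real analysis would need to consider.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical local ancestry dosages (copies of one ancestry) for eight
# individuals at a region on one chromosome and a region on another.
region_a = [0, 1, 2, 1, 0, 2, 1, 1]
region_b = [0, 1, 2, 2, 0, 1, 1, 1]
print(f"local ancestry correlation: {pearson(region_a, region_b):.3f}")
```

Under independent assortment, regions on different chromosomes would be expected to show little correlation once genome-wide ancestry is accounted for, which is why a strong residual correlation is of interest.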

14.
There is an emerging interest in sequencing‐based association studies of multiple rare variants. Most association tests suggested in the literature involve collapsing rare variants with or without weighting. Recently, a variance‐component score test [sequence kernel association test (SKAT)] was proposed to address the limitations of collapsing methods. Although SKAT was shown to outperform most of the alternative tests, its applications and power might be restricted and influenced by missing genotypes. In this paper, we suggest a new method based on testing whether the fraction of causal variants in a region is zero. The new association test, T_REM, is derived from a random‐effects model and allows for missing genotypes, and the choice of weighting function is not required when common and rare variants are analyzed simultaneously. We performed simulations to study the type I error rates and power of four competing tests under various conditions on the sample size, genotype missing rate, variant frequency, effect directionality, and the number of non‐causal rare variants and/or causal common variants. The simulation results showed that T_REM was a valid test and less sensitive to the inclusion of non‐causal rare variants and/or low‐effect common variants or to the presence of missing genotypes. When the effects were more consistent in the same direction, T_REM also had better power performance. Finally, an application to the Shanghai Breast Cancer Study showed that rare causal variants at the FGFR2 gene were detected by T_REM and SKAT, but T_REM produced more consistent results for different sets of rare and common variants. Copyright © 2013 John Wiley & Sons, Ltd.

15.
Recent meta-analyses of European ancestry subjects show strong evidence for association between smoking quantity and multiple genetic variants on chromosome 15q25. This meta-analysis extends the examination of association between distinct genes in the CHRNA5-CHRNA3-CHRNB4 region and smoking quantity to Asian and African American populations to confirm and refine specific reported associations. Association results for a dichotomized cigarettes smoked per day phenotype in 27 datasets (European ancestry [N = 14,786], Asian [N = 6,889], and African American [N = 10,912], for a total of 32,587 smokers) were meta-analyzed by population and results were compared across all three populations. We demonstrate association between smoking quantity and markers in the chromosome 15q25 region across all three populations, and narrow the region of association. Of the variants tested, only rs16969968 is associated with smoking (P < 0.01) in each of these three populations (odds ratio [OR] = 1.33, 95% CI = 1.25-1.42, P = 1.1 × 10⁻¹⁷ in meta-analysis across all population samples). Additional variants displayed a consistent signal in both European ancestry and Asian datasets, but not in African Americans. The observed consistent association of rs16969968 with heavy smoking across multiple populations, combined with its known biological significance, suggests rs16969968 is most likely a functional variant that alters risk for heavy smoking. We interpret additional association results that differ across populations as providing evidence for additional functional variants, but we are unable to further localize the source of this association. Using the cross-population study paradigm provides valuable insights to narrow regions of interest and inform future biological experiments.
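As a rough check on how an odds ratio, its confidence interval, and a P-value relate, the sketch below recovers a standard error and a two-sided normal P-value from the reported OR and 95% CI for rs16969968. It assumes a Wald-type interval on the log odds ratio scale (an assumption; the abstract does not state how the interval was computed). Because the published values are rounded, the result can only approximate the reported P-value.

```python
import math


def z_and_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover log(OR), its standard error, the z statistic, and a
    two-sided normal P-value from an odds ratio and its 95% CI,
    assuming a symmetric Wald interval on the log scale."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return log_or, se, z, p


log_or, se, z, p = z_and_p_from_or_ci(1.33, 1.25, 1.42)
print(f"log(OR) = {log_or:.3f}, SE = {se:.4f}, z = {z:.2f}, P = {p:.1e}")
```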

16.
Confounding due to population substructure is always a concern in genetic association studies. Although methods have been proposed to adjust for population stratification in the context of common variation, it is unclear how well these approaches will work when interrogating rare variation. Family‐based association tests can be constructed that are robust to population stratification. For example, when considering a quantitative trait, a linear model can be used that decomposes genetic effects into between‐ and within‐family components and a test of the within‐family component is robust to population stratification. However, this within‐family test ignores between‐family information potentially leading to a loss of power. Here, we propose a family‐based two‐stage rare‐variant test for quantitative traits. We first construct a weight for each variant within a gene, or other genetic unit, based on score tests of between‐family effect parameters. These weights are then used to combine variants using score tests of within‐family effect parameters. Because the between‐family and within‐family tests are orthogonal under the null hypothesis, this two‐stage approach can increase power while still maintaining validity. Using simulation, we show that this two‐stage test can significantly improve power while correctly maintaining type I error. We further show that the two‐stage approach maintains the robustness to population stratification of the within‐family test and we illustrate this using simulations reflecting samples composed of continental and closely related subpopulations.
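The between‐ and within‐family decomposition mentioned above can be sketched in a few lines. The version below uses the family mean genotype as the between‐family component and each member's deviation from that mean as the within‐family component. This coding is an assumption for illustration (the abstract does not specify it), and the function name and data are hypothetical.

```python
from statistics import mean


def decompose_by_family(genotypes_by_family):
    """Split genotypes into a between-family part (the family mean)
    and a within-family part (each member's deviation from that mean)."""
    between, within = {}, {}
    for family_id, genotypes in genotypes_by_family.items():
        family_mean = mean(genotypes)
        between[family_id] = family_mean
        within[family_id] = [g - family_mean for g in genotypes]
    return between, within


# Made-up minor-allele counts for the members of three families.
families = {"fam1": [0, 1, 2], "fam2": [1, 1], "fam3": [2, 0]}
between, within = decompose_by_family(families)
for family_id in families:
    print(family_id, between[family_id], within[family_id])
```

Within each family the deviations sum to zero, so the within‐family component carries no information about differences in family means, which is what makes a test based on it insensitive to stratification between families.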

17.
We describe a novel method for inferring the local ancestry of admixed individuals from dense genome‐wide single nucleotide polymorphism data. The method, called MULTIMIX, allows multiple source populations, models population linkage disequilibrium between markers and is applicable to datasets in which the sample and source populations are either phased or unphased. The model is based upon a hidden Markov model of switches in ancestry between consecutive windows of loci. We model the observed haplotypes within each window using a multivariate normal distribution with parameters estimated from the ancestral panels. We present three methods to fit the model—Markov chain Monte Carlo sampling, the Expectation Maximization algorithm, and a Classification Expectation Maximization algorithm. The performance of our method on individuals simulated to be admixed with European and West African ancestry shows it to be comparable to HAPMIX, the ancestry calls of the two methods agreeing at 99.26% of loci across the three parameter groups. In addition to it being faster than HAPMIX, it is also found to perform well over a range of extent of admixture in a simulation involving three ancestral populations. In an analysis of real data, we estimate the contribution of European, West African and Native American ancestry to each locus in the Mexican samples of HapMap, giving estimates of ancestral proportions that are consistent with those previously reported.

18.
Recent advancements in next‐generation DNA sequencing technologies have made it plausible to study the association of rare variants with complex diseases. Due to the low frequency, rare variants need to be aggregated in association tests to achieve adequate power with reasonable sample sizes. Hierarchical modeling/kernel machine methods have gained popularity among many available methods for testing a set of rare variants collectively. Here, we propose a new score statistic based on a hierarchical model by additionally modeling the distribution of rare variants under the case‐control study design. Results from extensive simulation studies show that the proposed method strikes a balance between robustness and power and outperforms several popular rare‐variant association tests. We demonstrate the performance of our method using the Dallas Heart Study.
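The simplest form of aggregation is an unweighted burden score: for each individual, count the minor alleles carried across all rare variants in a region. The sketch below shows that generic idea only, not the hierarchical score statistic proposed in the abstract. The function name, the frequency cutoff, and the toy data are hypothetical.

```python
def burden_scores(genotypes, maf_threshold=0.01):
    """Return per-individual burden scores and the indices of the variants used.

    genotypes: one list per individual of minor-allele counts (0, 1, or 2),
               with one entry per variant
    maf_threshold: variants with 0 < MAF <= threshold are treated as rare
                   (hypothetical cutoff)
    """
    n_individuals = len(genotypes)
    n_variants = len(genotypes[0])
    # Sample minor-allele frequency of each variant.
    maf = [sum(row[j] for row in genotypes) / (2 * n_individuals)
           for j in range(n_variants)]
    rare = [j for j in range(n_variants) if 0 < maf[j] <= maf_threshold]
    scores = [sum(row[j] for j in rare) for row in genotypes]
    return scores, rare


# Toy data: four individuals, three variants. With so few individuals the
# smallest possible non-zero frequency is 1/8, so a loose cutoff is used.
toy = [[0, 1, 0], [1, 1, 0], [0, 2, 0], [0, 1, 0]]
scores, rare = burden_scores(toy, maf_threshold=0.2)
print("rare variant indices:", rare)
print("burden scores:", scores)
```

The resulting score can then be used as a single predictor in a regression on the trait, which is how aggregation trades per-variant resolution for power.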

19.
Genome‐wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype‐phenotype associations. As the cost of sequencing continues to drop, assembling large samples in global populations is becoming increasingly feasible. Sequencing studies interrogate not only common variants, as was true for genotyping‐based GWAS, but variation across the full allele frequency spectrum, yielding many more (independent) statistical tests. We sought to empirically determine genome‐wide significance thresholds for various analysis scenarios. Using whole‐genome sequence data, we simulated sequencing‐based disease studies of varying sample size and ancestry. We determined that future sequencing efforts in >2,000 samples of European, Asian, or admixed ancestry should set genome‐wide significance at approximately P = 5 × 10⁻⁹, and studies of African samples should apply a more stringent genome‐wide significance threshold of P = 1 × 10⁻⁹. Adoption of a revised multiple test correction will be crucial in avoiding irreproducible claims of association.
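To put these thresholds in context, a Bonferroni-style threshold equals the family-wise error rate divided by the number of independent tests. The sketch below back-calculates the number of independent tests each threshold would imply, assuming a family-wise error rate of 0.05. That rate is an assumption here; the abstract describes an empirical, simulation-based procedure and does not state it.

```python
def implied_independent_tests(threshold, alpha=0.05):
    """Number of independent tests for which a Bonferroni correction at
    family-wise error rate `alpha` would give the stated threshold."""
    return alpha / threshold


for label, threshold in [
    ("European, Asian, or admixed ancestry", 5e-9),
    ("African ancestry", 1e-9),
]:
    n_tests = implied_independent_tests(threshold)
    print(f"{label}: P = {threshold:.0e} implies about {n_tests:.1e} independent tests")
```

Under this reading, the stricter threshold for African samples corresponds to roughly five times as many effectively independent tests.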

20.
By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT‐O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT‐O. In practice, it is not known whether rare variants or common variants in a gene region are disease related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT‐O on real neural tube defects and Hirschsprung's disease datasets. Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT‐O in the real data analysis. Our methods can be used in either gene‐disease genome‐wide/exome‐wide association studies or candidate gene analyses.
