Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Genome‐wide association studies are helping to dissect the etiology of complex diseases. Although case‐control association tests are generally more powerful than family‐based association tests, population stratification can lead to spurious disease‐marker association or mask a true association. Several methods have been proposed to match cases and controls prior to genotyping, using family information or epidemiological data, or using genotype data for a modest number of genetic markers. Here, we describe a genetic similarity score matching (GSM) method for efficient matched analysis of cases and controls in a genome‐wide or large‐scale candidate gene association study. GSM comprises three steps: (1) calculating similarity scores for pairs of individuals using the genotype data; (2) matching sets of cases and controls based on the similarity scores so that matched cases and controls have similar genetic background; and (3) using conditional logistic regression to perform association tests. Through computer simulation we show that GSM correctly controls false‐positive rates and improves power to detect true disease predisposing variants. We compare GSM to genomic control using computer simulations, and find improved power using GSM. We suggest that initial matching of cases and controls prior to genotyping combined with careful re‐matching after genotyping is a method of choice for genome‐wide association studies. Genet. Epidemiol. 33:508–517, 2009. © 2009 Wiley‐Liss, Inc.
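The first two GSM steps can be illustrated with a small sketch. It is not the authors' implementation: the identity-by-state (IBS) score, the greedy one-to-one pairing rule, and the names `ibs_similarity` and `greedy_match` are all assumptions chosen for illustration, and step (3), conditional logistic regression on the matched sets, is not shown.

```python
def ibs_similarity(g1, g2):
    """Mean identity-by-state sharing (0 to 1) between two genotype
    vectors coded as 0/1/2 copies of a reference allele (assumed coding)."""
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and equal length")
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))


def greedy_match(cases, controls):
    """Pair each case with its most similar unused control.
    Hypothetical greedy rule; the abstract does not specify the matching algorithm."""
    unused = dict(controls)
    pairs = {}
    for case_id, case_genotype in cases.items():
        if not unused:
            break  # more cases than controls: remaining cases stay unmatched
        best = max(unused, key=lambda cid: ibs_similarity(case_genotype, unused[cid]))
        pairs[case_id] = best
        del unused[best]
    return pairs


# Toy data: four markers per individual.
cases = {"case1": [0, 1, 2, 2], "case2": [2, 2, 0, 0]}
controls = {"ctrlA": [2, 2, 0, 1], "ctrlB": [0, 1, 2, 1], "ctrlC": [1, 1, 1, 1]}
pairs = greedy_match(cases, controls)
print(pairs)
```

With real data the score would be computed over many thousands of markers, and a greedy pass like this one depends on the order in which cases are visited; an optimal matching algorithm would avoid that dependence.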

2.
Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multidimensional scaling (MDS), and model‐based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC‐AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC‐AiR utilizes genome‐screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC‐AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC‐AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC‐AiR provides better prediction of ancestry in a variety of structure settings than using 10 (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC‐AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness.
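The idea of fitting principal components on an unrelated subset and then placing the remaining individuals on the same axes can be sketched as follows. This is a simplified illustration and not the PC‐AiR algorithm: the unrelated subset is assumed to be given, prediction is done by plain projection onto the fitted axis, only the leading axis is found (by power iteration), and all function names are hypothetical.

```python
import math


def column_means(rows):
    """Per-marker mean genotype in the reference (unrelated) subset."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def top_axis(rows, means, iters=100):
    """Leading principal axis of the centered reference genotypes,
    found by power iteration on X^T X (no external libraries needed)."""
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    p = len(means)
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iters):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]           # X v
        v = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            raise ValueError("no variation along the start vector in the reference subset")
        v = [x / norm for x in v]
    return v


def project(rows, means, axis):
    """Score individuals on an axis fitted elsewhere, using the reference means."""
    return [sum((x - m) * w for x, m, w in zip(row, means, axis)) for row in rows]


# Toy genotypes (0/1/2) at three markers; two ancestry groups in the reference.
unrelated = [[0, 0, 2], [0, 1, 2], [2, 2, 0], [2, 1, 0]]
relatives = [[0, 0, 2], [2, 2, 1]]  # individuals excluded from the PCA fit

means = column_means(unrelated)
axis = top_axis(unrelated, means)
ref_scores = project(unrelated, means, axis)
rel_scores = project(relatives, means, axis)
print(ref_scores, rel_scores)
```

The sign of a principal axis is arbitrary, so only the relative positions of the scores are meaningful. Because the relatives never enter the fit, their family structure cannot distort the axis itself.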

3.
Genome-wide association studies (GWAS) routinely apply principal component analysis (PCA) to infer population structure within a sample to correct for confounding due to ancestry. GWAS implementation of PCA uses tens of thousands of single-nucleotide polymorphisms (SNPs) to infer structure, despite the fact that only a small fraction of such SNPs provides useful information on ancestry. The identification of this reduced set of ancestry-informative markers (AIMs) from a GWAS has practical value; for example, researchers can genotype the AIM set to correct for potential confounding due to ancestry in follow-up studies that utilize custom SNP or sequencing technology. We propose a novel technique to identify AIMs from genome-wide SNP data using sparse PCA. The procedure uses penalized regression methods to identify those SNPs in a genome-wide panel that significantly contribute to the principal components while encouraging SNPs that provide negligible loadings to vanish from the analysis. We found that sparse PCA leads to negligible loss of ancestry information compared to traditional PCA of genome-wide SNP data. We further demonstrate the value of sparse PCA for AIM selection using real data from the International HapMap Project and a genome-wide study of inflammatory bowel disease. We have implemented our approach in open-source R software for public use.

4.
In cancer observational studies, differences between groups on confounding variables may have a significant effect on results when examining health outcomes. This study demonstrates the utility of propensity score matching to balance a non-cancer and cancer cohort of older adults on multiple relevant covariates. This approach matches cases to controls on a single indicator, the propensity score, rather than multiple variables. Results indicated that propensity score matching is an efficient and useful way to create a matched case-control study out of a large cohort study, and allows confidence in the strength of the observed outcomes of the study.
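How several covariates collapse into a single matching indicator can be shown with a minimal sketch: a logistic model of group membership is fitted, and its predicted probability is the propensity score. The covariates, the toy data, the plain gradient-ascent fit, and the names `fit_logistic` and `propensity` are illustrative assumptions; the study's actual covariates and software are not stated in the abstract.

```python
import math


def fit_logistic(X, y, lr=0.5, iters=5000):
    """Fit intercept and coefficients by gradient ascent on the mean log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            feats = [1.0] + list(row)  # prepend the intercept term
            p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(w, feats))))
            for j, f in enumerate(feats):
                grad[j] += (label - p) * f
        w = [a + lr * g / len(X) for a, g in zip(w, grad)]
    return w


def propensity(w, row):
    """Predicted probability of belonging to the cancer cohort."""
    z = w[0] + sum(a * b for a, b in zip(w[1:], row))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical covariates: standardized age, comorbidity indicator (0/1).
X = [[-1.0, 0], [-0.5, 0], [0.0, 1], [0.5, 0], [1.0, 1], [-1.0, 1], [0.5, 1], [0.0, 0]]
y = [0, 1, 1, 0, 1, 0, 1, 0]  # 1 = cancer cohort, 0 = non-cancer cohort

w = fit_logistic(X, y)
scores = [propensity(w, row) for row in X]
print([round(s, 3) for s in scores])
```

Each person in the cancer cohort would then be paired with the non-cancer person whose score is closest, usually within a maximum allowed distance (a caliper), so that matching is done on one number instead of on every covariate at once.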

5.
DNA methylation is an important epigenetic mechanism that has been linked to complex diseases and is of great interest to researchers as a potential link between genome, environment, and disease. As the scale of DNA methylation association studies approaches that of genome‐wide association studies, issues such as population stratification will need to be addressed. It is well‐documented that failure to adjust for population stratification can lead to false positives in genetic association studies, but population stratification is often unaccounted for in DNA methylation studies. Here, we propose several approaches to correct for population stratification using principal components (PCs) from different subsets of genome‐wide methylation data. We first illustrate the potential for confounding due to population stratification by demonstrating widespread associations between DNA methylation and race in 388 individuals (365 African American and 23 Caucasian). We subsequently evaluate the performance of our PC‐based approaches and other methods in adjusting for confounding due to population stratification. Our simulations show that (1) all of the methods considered are effective at removing inflation due to population stratification, and (2) maximum power can be obtained with single‐nucleotide polymorphism (SNP)‐based PCs, followed by methylation‐based PCs, which outperform both surrogate variable analysis and genomic control. Among our different approaches to computing methylation‐based PCs, we find that PCs based on CpG sites chosen for their potential to proxy nearby SNPs can provide a powerful and computationally efficient approach to adjust for population stratification in DNA methylation studies when genome‐wide SNP data are unavailable.

6.
The stratification score for a case-control study is the probability of disease modeled as a function of potential confounders. The authors show that the stratification score is a retrospective balancing score and thus plays a similar role in case-control studies as the propensity score plays in prospective studies. The authors further show how standardization using the stratification score can be used to compare the distributions of exposures that would be found among case and control participants if both groups had the same distribution of confounding covariables. The authors illustrate these results using data from a genome-wide association study, the GAIN (Genetic Association Information Network) study of schizophrenia among African Americans (2006-2008).

7.
Population stratification has long been recognized as an issue in genetic association studies because unrecognized population stratification can lead to both false‐positive and false‐negative findings and can obscure true association signals if not appropriately corrected. This issue can be even worse in rare variant association analyses because rare variants often demonstrate stronger and potentially different patterns of stratification than common variants. To correct for population stratification in genetic association studies, we propose a novel method to Test the effect of an Optimally Weighted combination of variants in Admixed populations (TOWA) in which the analytically derived optimal weights can be calculated from existing phenotype and genotype data. TOWA up‐weights rare variants and those variants that have strong associations with the phenotype. Additionally, it can adjust for the direction of the association, and allows for local ancestry differences among study subjects. Extensive simulations show that the type I error rate of TOWA is under control in the presence of population stratification and that it is more powerful than existing methods. We have also applied TOWA to a real sequencing data set. Our simulation studies as well as real data analysis results indicate that TOWA is a useful tool for rare variant association analyses in admixed populations.

8.
Objective: Genetic studies of complex human diseases rely heavily on the epidemiologic association paradigm, particularly the population-based case–control designs. This study aims to compare the matching effectiveness in terms of bias reduction between exposure matching and stratum matching. Study Design and Setting: Formulas for population stratification bias were derived. An index of matching effectiveness was constructed to compare the two types of matching. Results: It was found that exposure matching can sometimes paradoxically increase the magnitude of population stratification bias, whereas stratum matching is guaranteed to reduce it. Conclusion: The authors propose two simple rules for genetic association studies: (a) to match on anything that helps to delineate population strata such as race, ethnicity, nationality, ancestry, and birthplace and (b) to match on an exposure only when it is a strong predictor of the disease and is expected to have great variation in prevalence across population strata.

9.
In causal studies without random assignment of treatment, causal effects can be estimated using matched treated and control samples, where matches are obtained using estimated propensity scores. Propensity score matching can reduce bias in treatment effect estimators in cases where the matched samples have overlapping covariate distributions. Despite its use in many applied problems, there is no universally employed approach to interval estimation when using propensity score matching. In this article, we present and evaluate approaches to interval estimation when using propensity score matching.

10.
The large number of markers considered in a genome‐wide association study (GWAS) has resulted in a simplification of analyses conducted. Most studies are analyzed one marker at a time using simple tests like the trend test. Methods that account for the special features of genetic association studies, yet remain computationally feasible for genome‐wide analysis, are desirable as they may lead to increased power to detect associations. Haplotype sharing attempts to translate between population genetics and genetic epidemiology. Near a recent mutation that increases disease risk, haplotypes of case participants should be more similar to each other than haplotypes of control participants; conversely, the opposite pattern may be found near a recent mutation that lowers disease risk. We give computationally simple association tests based on haplotype sharing that can be easily applied to GWASs while allowing use of fast (but not likelihood‐based) haplotyping algorithms and properly accounting for the uncertainty introduced by using inferred haplotypes. We also give haplotype‐sharing analyses that adjust for population stratification. Applying our methods to a GWAS of Parkinson's disease, we find a genome‐wide significant signal in the CAST gene that is not found by single‐SNP methods. Further, a missing‐data artifact that causes a spurious single‐SNP association on chromosome 9 does not impact our test. Genet. Epidemiol. 33:657–667, 2009. Published 2009 Wiley‐Liss, Inc.

11.
Current genome-wide association studies (GWAS) often involve populations that have experienced recent genetic admixture. Genotype data generated from these studies can be used to test for association directly, as in a non-admixed population. As an alternative, these data can be used to infer chromosomal ancestry, and thus allow for admixture mapping. We quantify the contribution of allele-based and ancestry-based association testing under a family design, and demonstrate that the two tests can provide non-redundant information. We propose a joint testing procedure, which efficiently integrates the two sources of information. The efficiencies of the allele, ancestry and combined tests are compared in the context of a GWAS. We discuss the impact of population history and provide guidelines for future design and analysis of GWAS in admixed populations.

12.
A key aim for current genome-wide association studies (GWAS) is to interrogate the full spectrum of genetic variation underlying human traits, including rare variants, across populations. Deep whole-genome sequencing is the gold standard to fully capture genetic variation, but remains prohibitively expensive for large sample sizes. Array genotyping interrogates a sparser set of variants, which can be used as a scaffold for genotype imputation to capture a wider set of variants. However, imputation quality depends crucially on reference panel size and genetic distance from the target population. Here, we consider sequencing a subset of GWAS participants and imputing the rest using a reference panel that includes both sequenced GWAS participants and an external reference panel. We investigate how imputation quality and GWAS power are affected by the number of participants sequenced for admixed populations (African and Latino Americans) and European population isolates (Sardinians and Finns), and identify powerful, cost-effective GWAS designs given current sequencing and array costs. For populations that are well-represented in existing reference panels, we find that array genotyping alone is cost-effective and well-powered to detect common- and rare-variant associations. For poorly represented populations, sequencing a subset of participants is often most cost-effective, and can substantially increase imputation quality and GWAS power.

13.
Recent studies suggest that rare variants play an important role in the etiology of many traits. Although a number of methods have been developed for genetic association analysis of rare variants, they all assume a relatively homogeneous population under study. Such an assumption may not be valid for samples collected from admixed populations such as African Americans and Hispanic Americans, as there is a great extent of local variation in ancestry in these populations. To ensure valid and more powerful rare variant association tests performed in admixed populations, we have developed a local ancestry‐based weighted dosage test, which is able to take into account local ancestry of rare alleles, uncertainties in rare variant imputation when imputed data are included, and the direction of effect that rare variants exert on phenotypic outcome. We used simulated sequence data to show that our proposed test has controlled type I error rates, whereas naïve application of existing rare variant tests and tests that adjust for global ancestry lead to inflated type I error rates. We showed that our test has higher power than tests without proper adjustment of ancestry. We also applied the proposed method to a candidate gene study on low‐density lipoprotein cholesterol. Our results suggest that it is important to appropriately control for potential population stratification induced by local ancestry difference in the analysis of rare variants in admixed populations.

14.
Family-based association studies have gained in popularity for mapping disease-susceptibility gene(s) of complex diseases. However, recruiting family controls is often more difficult than recruiting unrelated controls. The author proposes a case-control study, where the possible biases due to population stratification are controlled by matching in the design stage and by genomic controlling in the data-analytic stage. The matching is based on a set of "stratum-delineating variables," such as race, ethnicity, nationality, ancestry, and birthplace; and the genomic controlling is based on typing a number of null markers across the genome and applying the principle of multiplicative scaling of chi-square distribution. It pays to match carefully to have a higher proportion of correctly matched sets, as computer simulation showed that this would increase the power of the study. If matching is crude, one loses power but still has the correct type I error rate after genomic controlling. Power studies showed that the numbers of affected subjects required for the pair-matched study are comparable to those required by the case-parents design, if the study was conducted in a homogeneous population. As the (control-to-case) matching ratio increases, the number of affected subjects required decreases. With matching ratio tending toward infinity, the number required shrinks roughly by half. The case-control study with matching and genomic controlling frees us from family bondage, and a genetic problem as complicated as mapping genes can now be studied using simple epidemiologic methods.
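The multiplicative scaling behind genomic controlling can be sketched in a few lines. This follows the commonly used median-based estimator of the inflation factor, which is an assumption here since the abstract does not say which estimator the author used; the function name and the toy statistics are hypothetical.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(null_stats, candidate_stats):
    """Estimate the inflation factor from null-marker chi-square statistics
    and divide the candidate-marker statistics by it."""
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # conventional floor: never inflate the test statistics
    return lam, [s / lam for s in candidate_stats]


# Toy 1-df chi-square statistics at null markers, and one candidate marker.
null_stats = [0.2, 0.5, 0.91, 1.3, 2.0]
lam, adjusted = genomic_control(null_stats, [10.0])
print(round(lam, 3), [round(s, 3) for s in adjusted])
```

The adjusted statistics are then referred to the usual 1-df chi-square distribution. In this toy case the null markers are inflated about twofold, so the candidate statistic is roughly halved.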

15.
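Statistical analysis methods in genome-wide association studies   Total citations: 1 (self-citations: 1, citations by others: 0)
With the completion of the Human Genome Project, genome-wide association studies of disease have become possible. Data from such studies are high-dimensional with small samples. Faced with this vast volume of data, traditional analysis methods are severely challenged. This paper introduces the data analysis strategies and steps used in genome-wide association studies, including quality control, analysis, and presentation of results, and discusses the limitations of genome-wide association studies and the shortcomings of current statistical analysis methods.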
Abstract:
With large samples of cases and/or controls and hundreds of markers spread throughout the human genome now available, genome-wide association studies (GWAS) of complex disorders have increased dramatically in the last 5 years. This paper highlights the statistical challenges in such huge-scale genetic studies, and introduces the analytical strategies and steps for handling GWAS data. It addresses data quality control, population stratification, available methods for data analysis and presentation of results, replication, the limitations of GWAS, and the challenges they pose for statistics.

16.
Populations of non-European ancestry are substantially underrepresented in genome-wide association studies (GWAS). As genetic effects can differ between ancestries due to possibly different causal variants or linkage disequilibrium patterns, a meta-analysis that includes GWAS of all populations yields biased estimates in each of the populations and the bias disproportionately impacts non-European ancestry populations. This is because meta-analysis combines study-specific estimates with inverse variance as the weights, which causes biases towards studies with the largest sample size, typical of the European ancestry population. In this paper, we propose two empirical Bayes (EB) estimators to borrow the strength of information across populations while accounting for between-population heterogeneity. Extensive simulation studies show that the proposed EB estimators are largely unbiased and improve efficiency compared to the population-specific estimator. In contrast, even though the meta-analysis estimator has a much smaller variance, it yields significant bias when the genetic effect is heterogeneous across populations. We apply the proposed EB estimators to a large-scale trans-ancestry GWAS of stroke and demonstrate that the EB estimators reduce the variance of the population-specific estimator substantially, with the effect estimates close to the population-specific estimates.
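The pull toward the largest study can be seen in a short sketch of standard fixed-effect inverse-variance weighting. The effect sizes and standard errors below are made up for illustration, and the function name is hypothetical; the proposed EB estimators themselves are not shown, since the abstract does not give their form.

```python
def inverse_variance_meta(estimates, std_errors):
    """Fixed-effect meta-analysis: weight each study by 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = (1.0 / total) ** 0.5
    shares = [w / total for w in weights]  # fraction of total weight per study
    return pooled, pooled_se, shares


# Hypothetical per-population effect estimates (log odds ratios) and standard errors.
# The first study stands in for a large European-ancestry GWAS, the second for a
# smaller non-European-ancestry GWAS with a different true effect.
pooled, pooled_se, shares = inverse_variance_meta([0.10, 0.30], [0.01, 0.05])
print(round(pooled, 4), round(pooled_se, 4), [round(s, 3) for s in shares])
```

Here the larger study carries about 96% of the weight, so the pooled estimate (about 0.108) sits close to 0.10 and far from the smaller study's 0.30. That is the bias for the underrepresented population described above.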

17.
Genome-wide association studies (GWAS) have thus far achieved substantial success. In the last decade, a large number of common variants underlying complex diseases have been identified through GWAS. In most existing GWAS, the identified common variants are obtained by single marker-based tests, that is, testing one single-nucleotide polymorphism (SNP) at a time. Generally, the basic functional unit of inheritance is a gene, rather than a SNP. Thus, results from gene-level association tests can be more readily integrated with downstream functional and pathogenic investigation. In this paper, we propose a general gene-based p-value adaptive combination approach (GPA) which can integrate association evidence of multiple genetic variants using only GWAS summary statistics (either p-value or other test statistics). The proposed method could be used to test genetic association for both continuous and binary traits through not only one study but also multiple studies, which would be helpful to overcome the limitation of existing methods that can only be applied to a specific type of data. We conducted thorough simulation studies to verify that the proposed method controls type I errors well, and performs favorably compared to single-marker analysis and other existing methods. We demonstrated the utility of our proposed method through analysis of GWAS meta-analysis results for fasting glucose and lipids from the international MAGIC consortium and Global Lipids Consortium, respectively. The proposed method identified some novel trait-associated genes which can improve our understanding of the mechanisms involved in β-cell function, glucose homeostasis, and lipid traits.

18.
Population stratification may cause an inflated type-I error and spurious association when assessing the association between genetic variations and an outcome. Many genetic association studies now use exonic variants, which capture only 1% of the genome; however, population stratification adjustments have not been evaluated in the context of exonic variants. We compare the performance of two established approaches: principal components analysis (PCA) and mixed-effects models and assess the utility of genome-wide (GW) and exonic variants, by simulation and using a data set from the Framingham Heart Study. Our results illustrate that although the PCs and genetic relationship matrices computed by GW and exonic markers are different, the type-I error rate of association tests for common variants with additive effect appears to be properly controlled in the presence of population stratification. In addition, by considering single nucleotide variants (SNVs) that have different levels of confounding by population stratification, we also compare the power across multiple association approaches to account for population stratification such as PC-based corrections and mixed-effects models. We find that while these two methods achieve a similar power for SNVs that have a low or medium level of confounding by population stratification, the mixed-effects model can reach a higher power for SNVs highly confounded by population stratification.

19.
Genome‐wide association studies (GWAS) for complex diseases have focused primarily on single‐trait analyses for disease status and disease‐related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL‐cholesterol, HDL‐cholesterol, and triglycerides (TGs) separately. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Recently several multivariate methods have been proposed that require individual‐level data. Here, we develop metaUSAT (where USAT is unified score‐based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Although the existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction or are powerful when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual‐level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P‐value for association and is computationally efficient for implementation at a genome‐wide level. Simulation experiments show that metaUSAT maintains proper type‐I error at low error levels. It has similar and sometimes greater power to detect association across a wide array of scenarios compared to existing methods, which are usually powerful for some specific association scenarios only. 
When applied to plasma lipids summary data from the METSIM and the T2D‐GENES studies, metaUSAT detected genome‐wide significant loci beyond the ones identified by univariate analyses. Evidence from larger studies suggests that the variants additionally detected by our test are, indeed, associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of a common disease or traits.

20.
We propose a method to analyze family‐based samples together with unrelated cases and controls. The method builds on the idea of matched case–control analysis using conditional logistic regression (CLR). For each trio within the family, a case (the proband) and matched pseudo‐controls are constructed, based upon the transmitted and untransmitted alleles. Unrelated controls, matched by genetic ancestry, supplement the sample of pseudo‐controls; likewise unrelated cases are also paired with genetically matched controls. Within each matched stratum, the case genotype is contrasted with control/pseudo‐control genotypes via CLR, using a method we call matched‐CLR (mCLR). Eigenanalysis of numerous SNP genotypes provides a tool for mapping genetic ancestry. The result of such an analysis can be thought of as a multidimensional map, or eigenmap, in which the relative genetic similarities and differences among individuals are encoded. Once constructed, new individuals can be projected onto the ancestry map based on their genotypes. Successful differentiation of individuals of distinct ancestry depends on having a diverse, yet representative sample from which to construct the ancestry map. Once samples are well‐matched, mCLR yields comparable power to competing methods while ensuring excellent control over Type I error. Copyright © 2010 John Wiley & Sons, Ltd.
