Similar articles
20 similar articles found.
1.
Tag SNP selection for association studies (total citations: 6; self-citations: 0; citations by others: 6)
This report describes current methods for selection of informative single nucleotide polymorphisms (SNPs) using data from a dense network of SNPs that have been genotyped in a relatively small panel of subjects. We discuss the following issues: (1) Optimal selection of SNPs based upon maximizing either the predictability of unmeasured SNPs or the predictability of SNP haplotypes as selection criteria. (2) The dependence of the performance of tag SNP selection methods upon the density of SNP markers genotyped for the purpose of haplotype discovery and tag SNP selection. (3) The likely power of case-control studies to detect the influence upon disease risk of common disease-causing variants in candidate genes in a haplotype-based analysis. We propose a quasi-empirical approach towards evaluating the power of large studies with this calculation based upon the SNP genotype and haplotype frequencies estimated in a haplotype discovery panel. In this calculation, each common SNP in turn is treated as a potential unmeasured causal variant and subjected to a correlation analysis using the remaining SNPs. We use a small portion of the HapMap ENCODE data (488 common SNPs genotyped over approximately a 500 kb region of chromosome 2) as an illustrative example of this approach towards power evaluation.
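As a rough illustration of what tag SNP selection can look like in practice, the sketch below greedily picks tags from a matrix of pairwise r² values until every SNP is correlated with at least one tag. This is a simplified stand-in, not the selection criterion of the report above; the function name `greedy_tag_snps`, the 0.8 threshold, and the toy matrix are all assumptions made for the example.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick tag SNPs so every SNP has r^2 >= threshold with at least one tag.

    r2 is a symmetric list of lists of pairwise r^2 values with 1.0 on the
    diagonal; threshold is assumed to be at most 1.0.
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        # Choose the SNP that would cover the most still-untagged SNPs.
        best = max(
            range(n),
            key=lambda i: sum(1 for j in untagged if r2[i][j] >= threshold),
        )
        covered = {j for j in untagged if r2[best][j] >= threshold}
        tags.append(best)
        untagged -= covered
    return tags


# Hypothetical r^2 matrix: SNPs 0-2 are in strong LD, SNP 3 stands alone.
toy_r2 = [
    [1.00, 0.90, 0.85, 0.10],
    [0.90, 1.00, 0.70, 0.10],
    [0.85, 0.70, 1.00, 0.20],
    [0.10, 0.10, 0.20, 1.00],
]
tags = greedy_tag_snps(toy_r2)
```

In the toy matrix, SNP 0 covers SNPs 0 to 2, so only SNP 3 needs its own tag.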

2.
We provide a general purpose family-based testing strategy for associating disease phenotypes with haplotypes when phase may be ambiguous and parental genotype data may be missing. These tests for linkage and association can be used in candidate gene studies with tightly linked markers. Our proposed weighted conditional approach extends the method described in Rabinowitz and Laird to multiple markers. It is attractive because it provides haplotype tests for family-based studies that are efficient and robust to population admixture, phenotype distribution specification, and ascertainment based on phenotypes. It can handle missing parental genotypes and/or missing phase in both offspring and parents. It yields either haplotype-specific (univariate) tests or multi-haplotype (global) tests. This extension has been implemented in the freely available software haplotype FBAT. We used the haplotype FBAT program to test for associations between asthma phenotypes and single nucleotide polymorphisms (SNPs) in the beta-2 adrenergic receptor gene. Whereas no single SNP showed significant association with asthma diagnosis or bronchodilator responsiveness (quantitative trait), a haplotype-based global test found a highly significant association with asthma diagnosis (P value < 0.00005) and the measure of bronchodilator responsiveness (P value = 0.016).

3.
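Objective: To investigate the association between polymorphisms in the tuberous sclerosis complex (TSC)-related genes TSC1 and TSC2 and childhood autism. Methods: Using SNaPshot genotyping, eight tag SNPs in TSC1 and TSC2 (rs3761840, rs2809244, rs1050700, rs739441, rs2074968, rs2074969, rs2072314, and rs8063461) were genotyped in 97 autism nuclear families; family-based haplotype analysis was performed with the FBAT and Haploview software. Results: (1) Family-based association analysis found that alleles of two of the eight SNPs tended to be over-transmitted (rs1050700 A: Z = 2.708, P = 0.006769; rs2074968 G: Z = 3.244, P = 0.001180), and after FDR correction both SNPs remained significantly associated with autism (corrected P values of 0.027 and 0.014, respectively). (2) Haplotype A-C of rs3761840-rs2809244 showed significant transmission disequilibrium and was under-transmitted from parents to offspring (Z = -2.297, P = 0.021629). Two haplotypes of rs2074968-rs2072314, G-C and C-C, both showed significant transmission disequilibrium: haplotype G-C was over-transmitted from parents to offspring (Z = 2.596, P = 0.009444), whereas haplotype C-C was under-transmitted (Z = -3.657, P = 0.000256). Conclusion: The TSC1 and TSC2 genes may be associated with the development of childhood autism.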
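The Z statistics and P values quoted in this entry can be related with a short calculation. The sketch below assumes the Z statistics are referred to a standard normal distribution and that the P values are two-sided; the helper name `two_sided_p` is hypothetical and the code is only a consistency check, not part of FBAT.

```python
import math


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


# Z statistics quoted in the entry above.
reported = {
    "rs1050700 A": 2.708,
    "rs2074968 G": 3.244,
    "rs2074968-rs2072314 C-C": -3.657,
}
for label, z in reported.items():
    print(label, round(two_sided_p(z), 6))
```

Under these assumptions the computed values agree with the reported P values to within rounding of the Z statistics.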

4.
By systematic examination of common tag single-nucleotide polymorphisms (SNPs) across the genome, the genome-wide association study (GWAS) has proven to be a successful approach to identify genetic variants that are associated with complex diseases and traits. Although the per base pair cost of sequencing has dropped dramatically with the advent of the next-generation technologies, it may still only be feasible to obtain DNA sequence data for a portion of available study subjects due to financial constraints. Two-phase sampling designs have been used frequently in large-scale surveys and epidemiological studies where certain variables are too costly to be measured on all subjects. We consider two-phase stratified sampling designs for genetic association, in which tag SNPs for candidate genes or regions are genotyped on all subjects in phase 1, and a proportion of subjects are selected into phase 2 based on genotypes at one or more tag SNPs. Deep sequencing in the region is then applied to genotype phase 2 subjects at sequence SNPs. We investigate alternative sampling designs for selection of phase 2 subjects within strata defined by tag SNP genotypes and develop methods of inference for sequence SNP variant associations using data from both phases. In comparison to methods that use data from phase 2 alone, the combined analysis improves efficiency.

5.
Current genome-wide association studies still rely heavily on a single-marker strategy, in which each single nucleotide polymorphism (SNP) is tested individually for association with a phenotype. Although methods and software packages that consider multimarker models have become available, they have been slow to be widely adopted and their efficacy in real data analysis is often questioned. Based on extensive simulations, we endeavor to provide more insight into the performance of simple multimarker association tests as compared to single-marker tests. The results reveal the power advantage as well as disadvantage of the two- vs. the single-marker test. Power differentials depend on the correlation structure among tag SNPs, as well as that between tag SNPs and causal variants. A two-marker test has relatively better performance than single-marker tests when the correlation of the two adjacent markers is high. However, using HapMap data, two-marker tests tended to have a greater chance of being less powerful than single-marker tests, due to constraints on the number of actual possible haplotypes in the HapMap data. Yet the average power difference was small whenever the one-marker test was more powerful, while there were many situations where the two-marker test was much more powerful. These findings can be useful to guide analyses of future studies.

6.
The genetic case-control association study of unrelated subjects is a leading method to identify single nucleotide polymorphisms (SNPs) and SNP haplotypes that modulate the risk of complex diseases. Association studies often genotype several SNPs in a number of candidate genes; we propose a two-stage approach to address the inherent statistical multiple comparisons problem. In the first stage, each gene's association with disease is summarized by a single p-value that controls a familywise error rate. In the second stage, summary p-values are adjusted for multiplicity using a false discovery rate (FDR) controlling procedure. For the first stage, we consider marginal and joint tests of SNPs and haplotypes within genes, and we construct an omnibus test that combines SNP and haplotype analysis. Simulation studies show that when disease susceptibility is conferred by a SNP, and all common SNPs in a gene are genotyped, marginal analysis of SNPs using the Simes test has similar or higher power than marginal or joint haplotype analysis. Conversely, haplotype analysis can be more powerful when disease susceptibility is conferred by a haplotype. The omnibus test tracks the more powerful of the two approaches, which is generally unknown. Multiple testing balances the desire for statistical power against the implicit costs of false positive results, which up to now appear to be common in the literature.
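The two-stage idea described above can be sketched in a few lines. The example below summarizes each gene with a Simes P value (one of the stage-1 options the abstract mentions) and then applies the Benjamini-Hochberg adjustment across genes. Using Benjamini-Hochberg as the FDR-controlling procedure is an assumption, since the abstract does not name one, and the gene labels and P values are made up for illustration.

```python
def simes_pvalue(pvals):
    """Simes combined P value for one gene: min over i of n * p_(i) / i."""
    ordered = sorted(pvals)
    n = len(ordered)
    return min(n * p / rank for rank, p in enumerate(ordered, start=1))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-SNP P values for three candidate genes.
genes = {
    "GENE_A": [0.001, 0.20, 0.45],
    "GENE_B": [0.03, 0.04],
    "GENE_C": [0.50, 0.60, 0.70, 0.80],
}
stage1 = {gene: simes_pvalue(p) for gene, p in genes.items()}  # one P value per gene
stage2 = dict(zip(stage1, bh_adjust(list(stage1.values()))))   # adjusted across genes
```

Keeping the two stages as separate functions makes it easy to swap in a different within-gene summary (for example a haplotype-based test) without touching the across-gene adjustment.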

7.
In genetic association studies, multiple markers are usually employed to cover a genomic region of interest for localizing a trait locus. In this report, we propose a novel multi-marker family-based association test (T_LC) that linearly combines the single-marker test statistics using data-driven weights. We examine the type-I error rate in a numerical study and compare its power to identify a common trait locus, using tag single nucleotide polymorphisms (SNPs) within the same haplotype block in which the trait locus resides, with that of three competing tests: a global haplotype test (T_H), a multi-marker test similar to the Hotelling T² test for population-based data (T_MM), and a single-marker test with Bonferroni's correction for multiple testing (T_B). The type-I error rate of T_LC is well maintained in our numerical study. In all the scenarios we examined, T_LC is the most powerful, followed by T_B. T_MM and T_H are the poorest. T_H and T_MM have essentially the same power when parents are available. However, when both parents are missing, T_MM is substantially more powerful than T_H. We also apply this new test to a data set from a previous association study on nicotine dependence.

8.
Hao K, Xu X, Laird N, Wang X, Xu X. Genetic Epidemiology, 2004, 26(1): 22-30
At the current stage, a large number of single nucleotide polymorphisms (SNPs) have been deployed in searching for genes underlying complex diseases. A powerful method is desirable for efficient analysis of SNP data. Recently, a novel method for multiple SNP association testing using a combination of allelic association (AA) and Hardy-Weinberg disequilibrium (HWD) has been proposed. However, the power of this test has not been systematically examined. In this study, we conducted a simulation study to further evaluate the statistical power of the new procedure, as well as the influence of HWD on its performance. The simulation examined the scenarios of multiple disease SNPs among a candidate pool, assuming different parameters including allele frequencies and risk ratios, dominant, additive, and recessive genetic models, and the existence of gene-gene interactions and linkage disequilibrium (LD). We also evaluated the performance of this test in capturing real disease-associated SNPs when a significant global P value is detected. Our results suggest that this new procedure is more powerful than conventional single-point analyses with correction for multiple testing. However, inclusion of HWD reduces the power under most circumstances. We applied the novel association test procedure to a case-control study of preterm delivery (PTD), examining the effects of 96 candidate gene SNPs concurrently, and detected a global P value of 0.0250 by using Cochran-Armitage χ² statistics as "starting" statistics in the procedure. In the following single-point analysis, SNPs on the IL1RN, IL1R2, ESR1, Factor 5, and OPRM1 genes were identified as possible risk factors in PTD.
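For readers unfamiliar with the Cochran-Armitage statistic used as the starting point above, a minimal sketch follows. It computes the standard 1-degree-of-freedom trend χ² for a single SNP under additive weights (0, 1, 2). The function name and the genotype counts are hypothetical, and the sketch covers only the single-SNP statistic, not the combined AA/HWD procedure evaluated in the abstract.

```python
import math


def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend chi-square (1 df) and its P value.

    cases and controls hold counts for genotypes carrying 0, 1 and 2 copies
    of the test allele. A monomorphic SNP gives a zero denominator and is
    not handled here.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases = sum(cases)
    n_controls = sum(controls)
    n = n_cases + n_controls
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))
    numerator = n * (n * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical genotype counts (0, 1, 2 copies of the test allele).
chi2, p_value = cochran_armitage(cases=(10, 20, 30), controls=(30, 20, 10))
```

Changing `weights` to (0, 1, 1) or (0, 0, 1) gives the dominant and recessive versions of the same test.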

9.
This report summarizes the Genetic Analysis Workshop 14 contributions related to fine-mapping strategies, in which examining smaller regions by association with single-nucleotide polymorphisms (SNPs) can yield savings in genotyping and multiple-testing penalties. The aim of the analyses conducted in Group 7 contributions was to localize disease susceptibility loci from either the simulated or the Collaborative Study on the Genetics of Alcoholism (COGA) data within identified regions of linkage. Among the 10 contributions, most groups analyzed the simulated data, one group analyzed the COGA data only, and one group analyzed both data sets. The research questions included evaluation of new methods of analysis, as well as comparisons among alternative methods, analytic strategies, and study designs. Methods of interest included an algorithm for SNP marker ordering, a locally weighted transmission disequilibrium test statistic, a likelihood-ratio test statistic for family-based association in nuclear families, a robust test statistic for case-control association studies, and Bayesian spatial modeling methods for haplotype clustering and association. Evaluations included comparisons among confidence intervals for loci detected via linkage, effects of multiple testing adjustments and trade-offs between type I error and power, comparisons among haplotype-based (multilocus) and genotype-based (multilocus and single-locus) association analyses, and design of fine-mapping and replication studies. While several promising new approaches were identified, further development and evaluation of methods for multiple testing, regression modeling of association with multiple markers and haplotypes, and combined treatment of linkage and association data are necessary if we are to identify many of the genes that contribute to complex traits.

10.
Lin WY, Yi N, Zhi D, Zhang K, Gao G, Tiwari HK, Liu N. Genetic Epidemiology, 2012, 36(6): 572-582
Detecting uncommon causal variants (minor allele frequency [MAF] < 5%) is difficult with commercial single-nucleotide polymorphism (SNP) arrays that are designed to capture common variants (MAF > 5%). Haplotypes can provide insights into underlying linkage disequilibrium (LD) structure and can tag uncommon variants that are not well tagged by common variants. In this work, we propose a wei-SIMc-matching test that inversely weights haplotype similarities with the estimated standard deviation of haplotype counts to boost the power of similarity-based approaches for detecting uncommon causal variants. We then compare the power of the wei-SIMc-matching test with that of several popular haplotype-based tests, including four other similarity-based tests, a global score test for haplotypes (global), a test based on the maximum score statistic over all haplotypes (max), and two newly proposed haplotype-based tests for rare variant detection. With systematic simulations under a wide range of LD patterns, the results show that wei-SIMc-matching and global are the two most powerful tests. Among these two tests, wei-SIMc-matching has reliable asymptotic P-values, whereas global needs permutations to obtain reliable P-values when the frequencies of some haplotype categories are low or when the trait is skewed. Therefore, we recommend wei-SIMc-matching for detecting uncommon causal variants with surrounding common SNPs, in light of its power and computational feasibility.

11.
Genetically complex diseases are caused by interacting environmental factors and genes. As a consequence, statistical methods that consider multiple unlinked genomic regions simultaneously are desirable. Such consideration, however, may lead to a vast number of different high-dimensional tests whose appropriate analysis poses a problem. Here, we present a method to analyze case-control studies with multiple SNP data without phase information that considers gene-gene interaction effects while correcting appropriately for multiple testing. In particular, we allow for interactions of haplotypes that belong to different unlinked regions, as haplotype analysis often proves to be more powerful than single marker analysis. In addition, we consider different marker combinations at each unlinked region. The multiple testing issue is settled via the minP approach; the P value of the "best" marker/region configuration is corrected via Monte-Carlo simulations. Thus, we do not explicitly test for a specific pre-defined interaction model, but test the global hypothesis that none of the considered haplotype interactions shows association with the disease. We carry out a simulation study for case-control data that confirms the validity of our approach. When simulating two-locus disease models, our test proves to be more powerful than association methods that analyze each linked region separately. In addition, when one of the tested regions is not involved in the etiology of the disease, only a small amount of power is lost with interaction analysis as compared to analysis without interaction. We successfully applied our method to a real case-control data set with markers from two genes controlling a common pathway. While classical analysis failed to reach significance, we obtained a significant result even after correction for multiple testing with our proposed haplotype interaction analysis. The method described here has been implemented in FAMHAP.

12.
A topical question in genetic association studies is the optimal use of the information provided by genotyped single-nucleotide polymorphisms (SNPs) in order to detect the role of a candidate gene in a multifactorial disease. We propose a strategy called "combination test" that tests the association between a quantitative trait and all possible phased combinations of various numbers of SNPs. We compare this strategy to two alternative strategies: the association test that considers each SNP separately, and a multilocus genotype-based test that considers the phased combination of all SNPs together. To compare these three tests, a quantitative trait was simulated under different models of correspondence between phenotype and genotype, including the extreme case when two SNPs interact with no marginal effects of each SNP. The genotypes were taken from a sample of 290 independent individuals genotyped for three genes with varying numbers of SNPs (from 5 to 8). The results show that the "combination test" is the only one able to detect the association when the two SNPs involved in disease susceptibility interact with no marginal effects. Interestingly, even in the case of a single etiological SNP, the "combination test" performed well. We apply the three tests to Genetic Analysis Workshop 12 (Almasy et al. [2001] Genet. Epidemiol. 21:332-338) simulated data, and show that although there were no interactions between the etiological SNPs, the "combination test" was preferable to the two other compared methods to detect the role of the candidate gene.

13.
Not accounting for interaction in association analyses may reduce the power to detect the variants involved. We investigate the powers of different designs to detect under two-locus models the effect of disease-causing variants among several hundreds of markers using family-based association tests by simulation. This setting reflects realistic situations of exploration of linkage regions or of biological pathways. We define four strategies: (S1) single-marker analysis of all Single Nucleotide Polymorphisms (SNPs), (S2) two-marker analysis of all possible SNP pairs, (S3) lax preliminary selection of SNPs followed by a two-marker analysis of all selected SNP pairs, (S4) stringent preliminary selection of SNPs, each being later paired with all the SNPs for two-marker analysis. Strategy S2 is never the best design, except when there is an inversion of the gene effect (flip-flop model). Testing individual SNPs (S1) is the most efficient when the two genes act multiplicatively. Designs S3 and S4 are the most powerful for nonmultiplicative models. Their respective powers depend on the level of symmetry of the model. Because the true genetic model is unknown, we cannot conclude that one design outperforms another. The optimal approach would be the two-step strategy (S3 or S4) as it is often the most powerful, or the second best. Genet.

14.
The role of haplotypes in candidate gene studies (total citations: 24; self-citations: 0; citations by others: 24)
Human geneticists working on systems for which it is possible to make a strong case for a set of candidate genes face the problem of whether it is necessary to consider the variation in those genes as phased haplotypes, or whether the one-SNP-at-a-time approach might perform as well. There are three reasons why the phased haplotype route should be an improvement. First, the protein products of the candidate genes occur in polypeptide chains whose folding and other properties may depend on particular combinations of amino acids. Second, population genetic principles show us that variation in populations is inherently structured into haplotypes. Third, the statistical power of association tests with phased data is likely to be improved because of the reduction in dimension. However, in reality it takes a great deal of extra work to obtain valid haplotype phase information, and inferred phase information may simply compound the errors. In addition, if the causal connection between SNPs and a phenotype is truly driven by just a single SNP, then the haplotype-based approach may perform worse than the one-SNP-at-a-time approach. Here we examine some of the factors that affect haplotype patterns in genes, how haplotypes may be inferred, and how haplotypes have been useful in the context of testing association between candidate genes and complex traits.

15.
Genome-wide association studies (GWAS) that draw samples from multiple studies with a mixture of relationship structures are becoming more common. Analytical methods exist for using mixed-sample data, but few methods have been proposed for the analysis of genotype-by-environment (G×E) interactions. Using GWAS data from a study of sarcoidosis susceptibility genes in related and unrelated African Americans, we explored the current analytic options for genotype association testing in studies using both unrelated and family-based designs. We propose a novel method, generalized least squares (GLX), to estimate both SNP and G×E interaction effects for categorical environmental covariates and compared this method to generalized estimating equations (GEE), logistic regression, the Cochran-Armitage trend test, and the WQLS and MQLS methods. We used simulation to demonstrate that the GLX method reduces type I error under a variety of pedigree structures. We also demonstrate its superior power to detect SNP effects while offering computational advantages and comparable power to detect G×E interactions versus GEE. Using this method, we found two novel SNPs that demonstrate a significant genome-wide interaction with insecticide exposure: rs10499003 and rs7745248, located in the intronic and 3' UTR regions of the FUT9 gene on chromosome 6q16.1.

16.
Single nucleotide polymorphisms (SNPs) are becoming widely used as genotypic markers in genetic association studies of common, complex human diseases. For such association screens, a crucial part of study design is determining what SNPs to prioritize for genotyping. We present a novel power-based algorithm to select a subset of tag SNPs for genotyping from a map of available SNPs. Blocks of markers in strong linkage disequilibrium (LD) are identified, and SNPs are selected to represent each block such that power to detect disease association with an underlying disease allele in LD with block members is preserved; all markers outside of blocks are also included in the tagging subset. A key, novel element of this method is that it incorporates information about the phase of LD observed among marker pairs to retain markers likely to be in coupling phase with an underlying disease locus, thus increasing power compared to a phase-blind approach. Power calculations illustrate important issues regarding LD phase and make clear the advantages of our approach to SNP selection. We apply our algorithm to genotype data from the International HapMap Consortium and demonstrate that considerable reduction in SNP genotyping may be attained while retaining much of the available power for a disease association screen. We also demonstrate that these tag SNPs effectively represent underlying variants not included in the LD analysis and SNP selection, by using leave-one-out tests to show that most (approximately 90%) of the "untyped" variants lying in blocks are in coupling-phase LD with a tag SNP. Additional performance tests using the HapMap ENCyclopedia of DNA Elements (ENCODE) regions show that the method compares well with the popular r² bin tagging method. This work is a concrete example of how empirical LD phase may be used to benefit study design.
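To make the notions of r² and LD phase concrete, the sketch below computes the disequilibrium coefficient D and r² for two biallelic SNPs from phased haplotype counts, and labels the phase by the sign of D. It assumes A and B are the alleles of interest at the two loci (for example the minor alleles); the function names and counts are hypothetical, and this is not the power-based algorithm of the abstract.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) for two biallelic SNPs from phased haplotype counts.

    Both SNPs are assumed to be polymorphic, otherwise r2 is undefined.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r2


def ld_phase(d):
    """Coupling if alleles A and B co-occur more often than expected by chance."""
    if d > 0:
        return "coupling"
    if d < 0:
        return "repulsion"
    return "none"


# Hypothetical haplotype counts for haplotypes AB, Ab, aB, ab.
d, r2 = pairwise_ld(40, 10, 10, 40)
phase = ld_phase(d)
```

Because r² depends only on D², two marker pairs can share the same r² while sitting in opposite phases, which is the information a phase-blind tagging approach discards.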

17.
In genome-wide association studies (GWAS), "generalization" is the replication of genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWERg) and FDR (FDRg) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of Single Nucleotide Polymorphism (SNP)-trait associations. Our methods control FWERg or FDRg under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values (89 regions), we generalized SNPs from 27 regions.

18.
Family-based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single marker based, testing one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEE) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P-values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP-SNP interactions. Simulation studies show that the proposed method properly controls for type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single marker based minimum P-value GEE test for an SNP-set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.

19.
A goal of association analysis is to determine whether variation in a particular candidate region or gene is associated with liability to complex disease. To evaluate such candidates, ubiquitous Single Nucleotide Polymorphisms (SNPs) are useful. It is critical, however, to select a set of SNPs that are in substantial linkage disequilibrium (LD) with all other polymorphisms in the region. Whether there is an ideal statistical framework to test such a set of 'tag SNPs' for association is unknown. Compared to tests for association based on frequencies of haplotypes, recent evidence suggests tests for association based on linear combinations of the tag SNPs (Hotelling T² test) are more powerful. Following this logical progression, we wondered if single-locus tests would prove generally more powerful than the regression-based tests. We answer this question by investigating four inferential procedures: the maximum of a series of test statistics corrected for multiple testing by the Bonferroni procedure, T_B, or by permutation of case-control status, T_P; a procedure that tests the maximum of a smoothed curve fitted to the series of test statistics, T_S; and the Hotelling T² procedure, which we call T_R. These procedures are evaluated by simulating data like that from human populations, including realistic levels of LD and realistic effects of alleles conferring liability to disease. We find that power depends on the correlation structure of SNPs within a gene, the density of tag SNPs, and the placement of the liability allele. The clearest pattern emerges between power and the number of SNPs selected. When a large fraction of the SNPs within a gene are tested, and multiple SNPs are highly correlated with the liability allele, T_S has better power. Using a SNP selection scheme that optimizes power but also requires a substantial number of SNPs to be genotyped (roughly 10-20 SNPs per gene), power of T_P is generally superior to that for the other procedures, including T_R. Finally, when a SNP selection procedure that targets a minimal number of SNPs per gene is applied, the average performances of T_P and T_R are indistinguishable. Genet. Epidemiol. © 2005 Wiley-Liss, Inc.
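The permutation-corrected maximum statistic described in this entry can be sketched as follows. Case-control labels are shuffled, the largest single-SNP statistic is recomputed each time, and the P value is the fraction of permutations at least as extreme as the observed maximum. The per-SNP statistic used here (absolute difference in mean allele count) is a deliberate simplification chosen to keep the example short, not the statistic used in the study; all names and data are hypothetical.

```python
import random


def max_stat(genotypes, labels):
    """Largest absolute case-control difference in mean allele count across SNPs."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    best = 0.0
    for j in range(len(genotypes[0])):
        mean_case = sum(g[j] for g in cases) / len(cases)
        mean_ctrl = sum(g[j] for g in controls) / len(controls)
        best = max(best, abs(mean_case - mean_ctrl))
    return best


def permutation_pvalue(genotypes, labels, n_perm=999, seed=0):
    """Permutation P value for the maximum statistic over all SNPs."""
    rng = random.Random(seed)
    observed = max_stat(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if max_stat(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 10 cases and 10 controls, 2 SNPs coded as allele counts.
genotypes = [[2, 1]] * 10 + [[0, 1]] * 10
labels = [1] * 10 + [0] * 10
p_perm = permutation_pvalue(genotypes, labels)
```

Because the maximum is recomputed over all SNPs in every permutation, the correlation among SNPs is accounted for automatically, whereas a Bonferroni correction treats the tests as if they were independent.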

20.
The recent successes of GWAS based on large sample sizes motivate combining independent datasets to obtain larger sample sizes and thereby increase statistical power. Analysis methods that can accommodate different study designs, such as family-based and case-control designs, are of general interest. However, population stratification can cause spurious association for population-based association analyses. For family-based association analysis that infers missing parental genotypes based on the allele frequencies estimated in the entire sample, the parental mating-type probabilities may not be correctly estimated in the presence of population stratification. Therefore, any approach to combining family and case-control data should also properly account for population stratification. Although several methods have been proposed to accommodate family-based and case-control data, all have restrictions. Most of them require sampling a homogeneous population, which may not be a reasonable assumption for data from a large consortium. One of the methods, FamCC, can account for population stratification and uses nuclear families with arbitrary number of siblings but requires parental genotype data, which are often unavailable for late-onset diseases. We extended the family-based test, Association in the Presence of Linkage (APL), to combine family and case-control data (CAPL). CAPL can accommodate case-control data and families with multiple affected siblings and missing parents in the presence of population stratification. We used simulations to demonstrate that CAPL is a valid test either in a homogeneous population or in the presence of population stratification. We also showed that CAPL can have more power than other methods that combine family and case-control data.
