Similar literature
Found 19 similar articles
1.
The availability of high-density haplotype data has motivated several fine-scale linkage disequilibrium mapping methods for locating disease-causing mutations. These methods identify loci around which haplotypes of case chromosomes exhibit greater similarity than do those of control chromosomes. A difficulty arising in such mapping is the possibility that case chromosomes have inherited disease-causing mutations from different ancestral chromosomes (founder heterogeneity). Such heterogeneity dilutes measures of case haplotype similarity. This dilution can be mitigated by separating case chromosomes into subsets according to their putative mutation origin, and searching for an area with excessive haplotype similarity within each subset. We propose a nonparametric method for identifying subsets of case chromosomes likely to share a common ancestral progenitor. By simulation studies and application to published data, we show that the method accurately identifies relatively large subsets of chromosomes that share a common founder. We also show that the method allows more precise estimates of the disease mutation loci than obtained by other fine-scale mapping methods.

2.
Haplotype sharing analysis is a well‐established option for the investigation of the etiology of complex diseases. The statistical power of haplotype association methods depends strongly on how the information of unobserved haplotypes can be captured by multilocus genotypes. In this study we combine an entropy‐based marker selection algorithm (EMS) with haplotype sharing‐based Mantel statistics into a new algorithm. Genetic markers are iteratively selected by their multilocus linkage disequilibrium (LD), which is assessed by a normalized entropy difference. The initial marker set is gradually enlarged to increase the available information on the amount of sharing around a potential susceptibility marker. Markers are rejected from joint phasing if they do not increase the multilocus LD. In simulated candidate gene studies, the Mantel statistics combined with the new EMS perform as well as or better at detecting the disease single nucleotide polymorphism (or, in indirect association analysis, its flanking markers) than the Mantel statistics without selection of markers prior to haplotype estimation and the Mantel statistics using sliding windows of size five. It is therefore appealing to apply our selection approach for haplotype‐based association analysis, since marker selection driven by the observed data avoids both the arbitrary choice of markers when using a fixed window size and the estimation of haplotype block structure. Genet. Epidemiol. 34: 354–363, 2010. © 2010 Wiley‐Liss, Inc.

3.
Association analysis, which aims to detect associations between genetic variations and observable traits, has played an increasing part in understanding the genetic basis of diseases. Among these methods, haplotype‐based association studies are believed to possess prominent advantages, especially for rare diseases in case‐control studies. However, modeling haplotypes runs into statistical problems caused by rare haplotypes. Fortunately, haplotype clustering offers an appealing solution. In this research, we have developed a new haplotype similarity measure suited to the “affinity propagation” clustering algorithm, which accommodates rare haplotypes well and thereby controls the degrees‐of‐freedom problem. The new similarity can incorporate haplotype structure information, which is believed to enhance the power and provide high resolution for identifying associations between genetic variants and disease. Our simulation studies show that the proposed approach offers merits in detecting disease‐marker associations in comparison with the cladistic haplotype clustering method CLADHC. We also illustrate an application of our method to cystic fibrosis, which shows quite accurate estimates during fine mapping. Genet. Epidemiol. 34: 633–641, 2010. © 2010 Wiley‐Liss, Inc.

4.
Haplotype sharing analysis was used to investigate the association of affection status with single nucleotide polymorphism (SNP) haplotypes within candidate gene 1 in one sample each from the isolated and the general population of Genetic Analysis Workshop (GAW) 12 simulated data. Gene 1 has direct influence on affection and harbors more than 70 SNPs. Haplotype sharing analysis depends heavily on previous haplotype estimation. Using GENEHUNTER haplotypes, strong evidence was found for most SNPs in the isolated population sample, thus providing evidence for an involvement of this gene, but the maximum -log10(p) values for the haplotype sharing statistics (HSS) test statistic did not correspond to the location of the true variant in either population. In comparison, transmission disequilibrium test (TDT) analysis showed the strongest results at the disease-causing variant in both populations, and these were outstanding in the general population. In this example, TDT analysis appears to perform better than HSS in identifying the disease-causing variant, using SNPs within a candidate gene in an outbred population. Simulations showed that the performance of HSS is hampered by closely spaced SNPs in strong linkage disequilibrium with the functional variant and by ambiguous haplotypes.

5.
We propose an algorithm for analysing SNP-based population association studies, which is a development of that introduced by Molitor et al. [2003: Am J Hum Genet 73:1368-1384]. It uses clustering of haplotypes to overcome the major limitations of many current haplotype-based approaches. We define a between-haplotype score that is simple, yet appears to capture much of the information about evolutionary relatedness of the haplotypes in the vicinity of an (unobserved) putative causal locus. Haplotype clusters can then be defined via a putative ancestral haplotype and a cut-off distance. The number of an individual's two haplotypes that lie within the cluster predicts the individual's genotype at the causal locus. This predicted genotype can then be investigated for association with the phenotype of interest. We implement our approach within a Markov-chain Monte Carlo algorithm that, in effect, searches over locations and ancestral haplotypes to identify large, case-rich clusters. The algorithm successfully fine-maps a causal mutation in a test analysis using real data, and achieves almost 98% accuracy in predicting the genotype at the causal locus. A simulation study indicates that the new algorithm is substantially superior to alternative approaches, and it also allows us to identify situations in which multi-point approaches can substantially improve over single-SNP analyses. Our algorithm runs quickly and there is scope for extension to a wide range of disease models and genomic scales.
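
The cluster-membership step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the between-haplotype score, so plain Hamming distance stands in for it here, and the names `hamming` and `predicted_genotype` as well as the 0/1 string encoding of haplotypes are assumptions made for illustration only.

```python
def hamming(a: str, b: str) -> int:
    """Number of SNP positions at which two equal-length haplotypes differ.

    Assumption: a stand-in for the paper's between-haplotype score,
    which the abstract does not define.
    """
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def predicted_genotype(hap_pair: tuple[str, str], ancestral: str, cutoff: int) -> int:
    """Count how many of an individual's two haplotypes fall inside the cluster.

    A haplotype is inside the cluster when its distance to the putative
    ancestral haplotype is at most `cutoff`. The result (0, 1, or 2) is the
    predicted genotype at the unobserved causal locus.
    """
    return sum(hamming(h, ancestral) <= cutoff for h in hap_pair)


if __name__ == "__main__":
    ancestral = "0111"  # hypothetical ancestral haplotype
    print(predicted_genotype(("0110", "1001"), ancestral, cutoff=1))
```

The predicted genotype could then be tested for association with the phenotype; the search over locations and ancestral haplotypes that the abstract attributes to the MCMC algorithm is not shown.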

6.
Interpretation of dense single nucleotide polymorphism (SNP) follow-up of genome-wide association or linkage scan signals can be facilitated by establishing expectation for the behaviour of primary mapping signals upon fine-mapping, under both null and alternative hypotheses. We examined the inferences that can be made regarding the posterior probability of a real genetic effect and considered different disease-mapping strategies and prior probabilities of association. We investigated the impact of the extent of linkage disequilibrium between the disease SNP and the primary analysis signal and the extent to which the disease gene can be physically localised under these scenarios. We found that large increases in significance (>2 orders of magnitude) appear in the exclusive domain of genuine genetic effects, especially in the follow-up of genome-wide association scans or consensus regions from multiple linkage scans. Fine-mapping significant association signals that reside directly under linkage peaks yields little improvement in an already high posterior probability of a real effect. Following fine-mapping, those signals that increase in significance also demonstrate improved localisation. We found local linkage disequilibrium patterns around the primary analysis signal(s) and tagging efficacy of typed markers to play an important role in determining a suitable interval for fine-mapping. Our findings help inform the interpretation and design of dense SNP-mapping follow-up studies, thus facilitating discrimination between a genuine genetic effect and chance fluctuation (false positive).

7.
Many family‐based tests of linkage disequilibrium are not valid when related nuclear families from larger pedigrees are used, or when independent nuclear families with multiple cases are used. The Pedigree Disequilibrium Test (PDT) proposed by Martin et al. [Am J Hum Genet 67:146–54, 2000] avoids these problems. This paper sketches an extension of the PDT that can account for measured covariates. Where the PDT is based on allele‐counting methods, this extension is based on conditional logistic regression. Versions of these statistics were used to test for association between disease and two known functional single nucleotide polymorphisms (SNPs) on gene 1 and gene 6 and one inert SNP on gene 7 in the first 25 replicates of the simulated population‐isolate data. The new method was also used to test for linkage disequilibrium after correcting for the effect of the environmental factor E1. The PDT and the conditional logistic extension had similar power to detect the functional SNPs (100% for gene 1, approximately 50% for gene 6) and appropriate type I error rates for the inert SNP. Correcting for E1 slightly increased power to detect the association between gene 6 and disease. © 2001 Wiley‐Liss, Inc.

8.
Genome‐wide association studies have discovered and confirmed a large number of loci that are implicated with disease susceptibility and severity. Polymorphisms that emerged from these studies are mostly indirectly associated to the phenotype, and the natural progression is to identify the causal variants that are functionally responsible for these association signals. Long stretches of high linkage disequilibrium (LD) benefitted the initial discovery phase in a genome‐wide scan, allowing commercial genotyping products with imperfect coverage to detect genomic regions genuinely associated with the phenotype. However, regions of high LD confound the fine‐mapping phase, as markers that are perfectly correlated to the causal variants display similar evidence of phenotypic association, hampering the process of differentiating the functional polymorphisms from neighboring surrogates. Here, we explore the potential of integrating information across different populations for narrowing the candidate region that a causal variant resides in, and compare the efficacy of this process of trans‐population fine‐mapping with the extent of variation in patterns of LD between the populations. In addition, we explore two different strategies for pooling data across multiple populations for the purpose of prioritizing the rankings of the causal variants. Our results clearly establish the benefits of trans‐population analysis in reducing the number of possible candidates for the causal variants, particularly in genomic regions displaying strong evidence of inter‐population LD variation. Directly integrating the statistical evidence by summing the test statistics outperforms the standard meta‐analytic procedure. These findings have direct relevance to the design and analysis of ongoing fine‐mapping studies. Genet. Epidemiol. 34: 653‐664, 2010. © 2010 Wiley‐Liss, Inc.

9.
Tag SNP selection for association studies (total citations: 6; self-citations: 0; citations by others: 6)
This report describes current methods for selection of informative single nucleotide polymorphisms (SNPs) using data from a dense network of SNPs that have been genotyped in a relatively small panel of subjects. We discuss the following issues: (1) Optimal selection of SNPs based upon maximizing either the predictability of unmeasured SNPs or the predictability of SNP haplotypes as selection criteria. (2) The dependence of the performance of tag SNP selection methods upon the density of SNP markers genotyped for the purpose of haplotype discovery and tag SNP selection. (3) The likely power of case-control studies to detect the influence upon disease risk of common disease-causing variants in candidate genes in a haplotype-based analysis. We propose a quasi-empirical approach towards evaluating the power of large studies with this calculation based upon the SNP genotype and haplotype frequencies estimated in a haplotype discovery panel. In this calculation, each common SNP in turn is treated as a potential unmeasured causal variant and subjected to a correlation analysis using the remaining SNPs. We use a small portion of the HapMap ENCODE data (488 common SNPs genotyped over approximately a 500 kb region of chromosome 2) as an illustrative example of this approach towards power evaluation.

10.
The characterization of linkage disequilibrium (LD) is applied in a variety of studies including the identification of molecular determinants of the local recombination rate, the migration and population history of populations, and the role of positive selection in adaptation. LD suffers from the phase uncertainty of the haplotypes used in its calculation, which reflects limitations of the algorithms used for haplotype estimation. We introduce an LD calculation method that deals with phase uncertainty by weighting all possible haplotype pairs according to their estimated probabilities as evaluated by PHASE. In contrast to the expectation-maximization (EM) algorithm as implemented in the HAPLOVIEW and GENETICS packages, our method considers haplotypes based on the entire genetic information available for the candidate region. We tested the method using simulated and real genotyping data. The results show that, for all practical purposes, the new method is advantageous in comparison with algorithms that calculate LD using only the most probable haplotype or bilocus haplotypes based on the EM algorithm. The new method deals especially well with low LD regions, which contribute strongly to phase uncertainty. Altogether, the method is an attractive alternative to standard LD calculation procedures, including those based on the EM algorithm. We implemented the method in the software suite R, together with an interface to the popular haplotype calculation package PHASE.
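
The idea of weighting every possible haplotype pair by its estimated probability can be sketched as follows. This is only an illustration of the weighting principle under stated assumptions, not the authors' R implementation and not PHASE's interface: the function name `weighted_ld`, the input layout (per individual, a list of `(hap1, hap2, probability)` tuples), and the 0/1 string encoding of haplotypes are all hypothetical.

```python
import math


def weighted_ld(individuals, locus_a: int, locus_b: int):
    """Return (D, r^2) between two loci from probability-weighted haplotypes.

    `individuals` is a list with one entry per person; each entry is a list of
    candidate phase configurations (hap1, hap2, prob), where the probabilities
    for one person are expected to sum to 1. Haplotypes are strings of '0'/'1'.
    """
    counts = {"11": 0.0, "10": 0.0, "01": 0.0, "00": 0.0}
    total = 0.0
    for configurations in individuals:
        for hap1, hap2, prob in configurations:
            for hap in (hap1, hap2):
                # Each haplotype contributes its configuration's probability
                # instead of a hard count of 1.
                counts[hap[locus_a] + hap[locus_b]] += prob
                total += prob
    p_ab = counts["11"] / total
    p_a = (counts["11"] + counts["10"]) / total
    p_b = (counts["11"] + counts["01"]) / total
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else math.nan
    return d, r2


if __name__ == "__main__":
    # One double heterozygote whose two phasings are equally likely.
    ambiguous = [[("11", "00", 0.5), ("10", "01", 0.5)]]
    print(weighted_ld(ambiguous, 0, 1))
```

Using only the most probable configuration would correspond to setting one probability to 1 and discarding the rest; the weighted version keeps the contribution of every plausible phasing.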

11.
The associations between haplotypes and disease phenotypes offer valuable clues about the genetic determinants of complex diseases. It is highly challenging to make statistical inferences about these associations because of the unknown gametic phase in genotype data. We describe a general likelihood-based approach to inferring haplotype-disease associations in studies of unrelated individuals. We consider all possible phenotypes (including disease indicator, quantitative trait, and potentially censored age at onset of disease) and all commonly used study designs (including cross-sectional, case-control, cohort, nested case-control, and case-cohort). The effects of haplotypes on phenotype are characterized by appropriate regression models, which allow various genetic mechanisms and gene-environment interactions. We present the likelihood functions for all study designs and disease phenotypes under Hardy-Weinberg disequilibrium. The corresponding maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. We provide simple and efficient numerical algorithms to calculate the maximum likelihood estimators and their variances, and implement these algorithms in a freely available computer program. Extensive simulation studies demonstrate that the proposed methods perform well in realistic situations. An application to the Carolina Breast Cancer Study reveals significant haplotype effects and haplotype-smoking interactions in the development of breast cancer.

12.
Case-control designs are commonly adopted in genetic epidemiological studies because they are cost effective and offer powerful tests for genetic and environmental risk factors, as well as their interactions. Previously, we proposed an association mapping approach to estimate the position of an unobserved disease locus as well as measuring its genetic effect on risk. The method provides a confidence interval for the estimated map position to help narrow the chromosomal region potentially harboring a disease locus. However, concerns often arise about case-control designs, including possible false positives or bias due to confounders, heterogeneity or interactions among genes and between genes and environments. In the present work, we extended the multipoint linkage disequilibrium mapping approach for case-control studies to incorporate information about factors influencing the effect of causal genes to improve precision and efficiency of the estimated location. The efficiency, bias and coverage probability of this extended approach for locating a disease locus using case-control data with and without additional information on a covariate were compared through simulation. An example of a case-control study for type 2 diabetes was used to illustrate this extended method. In this study, a strong association between diabetes and a candidate gene, SLC2A10, was detected among nonobese subjects, whereas no evidence of association was found for either obese subjects or the whole sample when obesity was ignored. Simulation studies and these diabetes data both demonstrate how the efficiency of the estimated location of a disease gene can be improved substantially by incorporating information on covariates.

13.
Association analysis provides a powerful tool for complex disease gene mapping. However, in the presence of genetic heterogeneity, the power for association analysis can be low since only a fraction of the collected families may carry a specific disease susceptibility allele. Ordered-subset analysis (OSA) is a linkage test that can be powerful in the presence of genetic heterogeneity. OSA uses trait-related covariates to identify a subset of families that provide the most evidence for linkage. A similar strategy applied to genetic association analysis would likely result in increased power to detect association. Association in the presence of linkage (APL) is a family-based association test (FBAT) for nuclear families with multiple affected siblings that properly infers missing parental genotypes when linkage is present. We propose here APL-OSA, which applies the OSA method to the APL statistic to identify a subset of families that provide the most evidence for association. A permutation procedure is used to approximate the distribution of the APL-OSA statistic under the null hypothesis that there is no relationship between the family-specific covariate and the family-specific evidence for allelic association. We performed a comprehensive simulation study to verify that APL-OSA has the correct type I error rate under the null hypothesis. This simulation study also showed that APL-OSA can increase power relative to other commonly used association tests (APL, FBAT and FBAT with covariate adjustment) in the presence of genetic heterogeneity. Finally, we applied APL-OSA to a family study of age-related macular degeneration, where cigarette smoking was used as a covariate.

14.
Association mapping based on family studies can identify genes that influence complex human traits while providing protection against population stratification. Because no gene is likely to have a very large effect on a complex trait, most family studies have limited power. Among the commonly used family-based tests of association for quantitative traits, the quantitative transmission-disequilibrium test (QTDT) based on the variance-components model is the most flexible and most powerful. This method assumes that the trait values are normally distributed. Departures from normality can inflate the type I error and reduce the power. Although the family-based association tests (FBAT) and pedigree disequilibrium tests (PDT) do not require normal traits, nonnormality can also result in loss of power. In many cases, approximate normality can be achieved by transforming the trait values. However, the true transformation is unknown, and incorrect transformations may compromise the type I error and power. We propose a novel class of association tests for arbitrarily distributed quantitative traits by allowing the true transformation function to be completely unspecified and empirically estimated from the data. Extensive simulation studies showed that the new methods provide accurate control of the type I error and can be substantially more powerful than the existing methods. We applied the new methods to the Collaborative Study on the Genetics of Alcoholism and discovered a significant association of the single nucleotide polymorphism (SNP) tsc0022400 on chromosome 7 with the quantitative electrophysiological phenotype TTTH1, which was not detected by any existing methods. We have implemented the new methods in a freely available computer program.

15.
With the advent of dense single nucleotide polymorphism genotyping, population-based association studies have become the major tools for identifying human disease genes and for fine gene mapping of complex traits. We develop a genotype-based approach for association analysis of case-control studies of gene-environment interactions in the case when environmental factors are measured with error and genotype data are available on multiple genetic markers. To directly use the observed genotype data, we propose two genotype-based models: genotype effect and additive effect models. Our approach offers several advantages. First, the proposed risk functions can directly incorporate the observed genotype data while modeling the linkage disequilibrium information in the regression coefficients, thus eliminating the need to infer haplotype phase. Compared with the haplotype-based approach, an estimating procedure based on the proposed methods can be much simpler and significantly faster. In addition, there is no potential risk due to haplotype phase estimation. Further, by fitting the proposed models, it is possible to analyze the risk alleles/variants of complex diseases, including their dominant or additive effects. To model measurement error, we adopt the pseudo-likelihood method by Lobach et al. [2008]. Performance of the proposed method is examined using simulation experiments. An application of our method is illustrated using a population-based case-control study of the association between calcium intake and the risk of colorectal adenoma development.

16.
We describe a novel method for assessing the strength of disease association with single nucleotide polymorphisms (SNPs) in a candidate gene or small candidate region, and for estimating the corresponding haplotype relative risks of disease, using unphased genotype data directly. We begin by estimating the relative frequencies of haplotypes consistent with observed SNP genotypes. Under the Bayesian partition model, we specify cluster centres from this set of consistent SNP haplotypes. The remaining haplotypes are then assigned to the cluster with the "nearest" centre, where distance is defined in terms of SNP allele matches. Within a logistic regression modelling framework, each haplotype within a cluster is assigned the same disease risk, reducing the number of parameters required. Uncertainty in phase assignment is addressed by considering all possible haplotype configurations consistent with each unphased genotype, weighted in the logistic regression likelihood by their probabilities, calculated according to the estimated relative haplotype frequencies. We develop a Markov chain Monte Carlo algorithm to sample over the space of haplotype clusters and corresponding disease risks, allowing for covariates that might include environmental risk factors or polygenic effects. Application of the algorithm to SNP genotype data in an 890-kb region flanking the CYP2D6 gene illustrates that we can identify clusters of haplotypes with similar risk of poor drug metaboliser (PDM) phenotype, and can distinguish PDM cases carrying different high-risk variants. Further, the results of a detailed simulation study suggest that we can identify positive evidence of association for moderate relative disease risks with a sample of 1,000 cases and 1,000 controls.
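
The nearest-centre assignment mentioned in this abstract can be sketched in a few lines. This is an illustration under assumptions rather than the authors' method: the abstract says only that distance is defined in terms of SNP allele matches, so the sketch simply picks the centre with the most matching alleles and breaks ties in favour of the centre listed first. The names `allele_matches` and `assign_to_clusters` and the 0/1 string encoding are hypothetical.

```python
def allele_matches(a: str, b: str) -> int:
    """Number of SNP positions at which two equal-length haplotypes agree."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same length")
    return sum(x == y for x, y in zip(a, b))


def assign_to_clusters(haplotypes: list[str], centres: list[str]) -> dict[str, list[str]]:
    """Assign each haplotype to the centre with which it shares the most alleles.

    Assumption: ties go to the centre that appears first in `centres`
    (the behaviour of Python's max); the abstract does not say how ties
    are resolved.
    """
    clusters: dict[str, list[str]] = {centre: [] for centre in centres}
    for hap in haplotypes:
        nearest = max(centres, key=lambda centre: allele_matches(hap, centre))
        clusters[nearest].append(hap)
    return clusters


if __name__ == "__main__":
    centres = ["0000", "1111"]  # hypothetical cluster centres
    print(assign_to_clusters(["0001", "1110", "0011"], centres))
```

In the approach the abstract describes, all haplotypes within one cluster then share a single disease-risk parameter in the logistic regression, which is what reduces the number of parameters; that modelling step and the MCMC sampling over partitions are not shown here.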

17.
Exploring the associations between haplotypes and disease phenotypes is an important step toward the discovery of genes that influence complex human diseases. When unrelated subjects are sampled, haplotypes are often ambiguous because of the unknown gametic phase of the measured sites along a chromosome. We consider cohort studies of unrelated subjects which collect data on potentially censored ages of onset of disease along with unphased genotypes and possibly time-varying environmental factors. We formulate the effects of haplotypes and environmental variables on the time to disease occurrence through a semiparametric Cox proportional hazards model, which can accommodate a variety of genetic mechanisms as well as gene-environment interactions. We develop a simple and fast expectation-maximization algorithm to maximize the likelihood for the relative risks and other parameters based on the observable data of unphased genotypes and potentially censored ages of onset. The resultant estimators are consistent, efficient, and asymptotically normal. Simulation studies show that, for practical situations, the parameter estimators are virtually unbiased, the association tests maintain type I errors near nominal levels, the confidence intervals have proper coverage probabilities, and the efficiency loss due to unknown gametic phase is small.

18.
Problems associated with insufficient power have haunted the analysis of genome‐wide association studies and are likely to be the main challenge for the analysis of next‐generation sequencing data. Ranking genes according to their strength of association with the investigated phenotype is one solution. To obtain rankings for genes, researchers can draw from a wide range of statistics summarizing the relationships between variants mapped to a gene and the phenotype. Hence, it is of interest to explore the performance of these statistics in the context of rankings. To this end, we conducted a simulation study (limited to genes of equal sizes) of three different summary statistics examining the ability to rank genes in a meaningful order. The weighted sum of squared marginal score test (Pan, 2009), RareCover algorithm (Bhatia et al., 2010) and the elastic net regularization (Zou and Hastie, 2005) were chosen, because they can handle common as well as rare variants. The test based on the score statistic outperformed both other methods in almost all investigated scenarios. It was the only measure to consistently detect genes with interacting causal variants. However, the RareCover algorithm proved better at identifying genes including causal variants with small effect sizes and low minor allele frequency than the weighted sum of squared marginal score test. The performance of the elastic net regularization was unimpressive for all but the simplest scenarios. Copyright © 2013 John Wiley & Sons, Ltd.

19.
The genetic dissection of complex human diseases requires large-scale association studies which explore the population associations between genetic variants and disease phenotypes. DNA pooling can substantially reduce the cost of genotyping assays in these studies, and thus enables one to examine a large number of genetic variants on a large number of subjects. The availability of pooled genotype data instead of individual data poses considerable challenges in the statistical inference, especially in the haplotype-based analysis because of increased phase uncertainty. Here we present a general likelihood-based approach to making inferences about haplotype-disease associations based on possibly pooled DNA data. We consider cohort and case-control studies of unrelated subjects, and allow arbitrary and unequal pool sizes. The phenotype can be discrete or continuous, univariate or multivariate. The effects of haplotypes on disease phenotypes are formulated through flexible regression models, which allow a variety of genetic hypotheses and gene-environment interactions. We construct appropriate likelihood functions for various designs and phenotypes, accommodating Hardy-Weinberg disequilibrium. The corresponding maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. We develop simple and efficient numerical algorithms for calculating the maximum likelihood estimators and their variances, and implement these algorithms in a freely available computer program. We assess the performance of the proposed methods through simulation studies, and provide an application to the Finland-United States Investigation of NIDDM Genetics Study. The results show that DNA pooling is highly efficient in studying haplotype-disease associations. As a by-product, this work provides valid and efficient methods for estimating haplotype-disease associations with unpooled DNA samples.
