Similar literature
 20 similar articles found
1.
Hao K, Liu S, Niu T. Genetic Epidemiology, 2005, 29(4):336-352
Single nucleotide polymorphisms (SNPs) play a central role in the identification of susceptibility genes for common diseases. Recent empirical studies of the human genome have revealed block-like structures, and each block contains a set of haplotype tagging SNPs (htSNPs) that capture a large fraction of the haplotype diversity. Herein, we present an innovative sparse marker extension tree (SMET) algorithm to select optimal htSNP set(s). SMET reduces the search space considerably (compared to a full enumeration strategy), and therefore improves computing efficiency. We tested this algorithm on several datasets at three different genomic scales: (1) gene-wide (NOS3, CRP, IL6, PPARA, and TNF), (2) region-wide (a Whitehead Institute inflammatory bowel disease dataset and a UK Graves' disease dataset), and (3) chromosome-wide (chromosome 22) levels. With adjustable haplotype diversity coverage (phi), SMET offers geneticists greater flexibility in SNP tagging than lossless methods. In simulation studies, we found that (1) an initial sample size of 50 individuals (100 chromosomes) or more is needed for htSNP selection; (2) the SNP tagging strategy is considerably more efficient when the underlying block structure is taken into account; and (3) htSNP sets at 80-90% phi are more cost-effective than the lossless sets in terms of relative power, relative risk ratio estimation, and genotyping efforts. Our study suggests that the novel SMET algorithm is a valuable tool for association tests.

2.
Linkage disequilibrium (LD) in the human genome, often measured as pairwise correlation between adjacent markers, shows substantial spatial heterogeneity. Congruent with these results, studies have found that certain regions of the genome have far less haplotype diversity than expected if the alleles at multiple markers were independent, while other sets of adjacent markers behave almost independently. Regions with limited haplotype diversity have been described as "blocked" or "haplotype blocks." In this article, we propose a new method that aims to distinguish between blocked and unblocked regions in the genome. Like some other approaches, the method analyses haplotype diversity. Unlike other methods, it allows for adjacent, distinct blocks and also multiple, independent single nucleotide polymorphisms (SNPs) separating blocks. Based on an approximate likelihood model and a parsimony criterion to penalize for model complexity, the method partitions a genomic region into blocks relatively quickly, and simulations suggest that its partitions are accurate. We also propose a new, efficient method to select SNPs for association analysis, namely tag SNPs. These methods compare favorably to similar blocking and tagging methods using simulations.

3.
Liu Z, Lin S. Genetic Epidemiology, 2005, 29(4):353-364
Linkage disequilibrium (LD) plays a central role in fine mapping of disease genes and, more recently, in characterizing haplotype blocks. Classical LD measures, such as D′ and r², are frequently used to quantify the relationship between two loci. A pairwise "distance" matrix among a set of loci can be constructed using such a measure, and a number of haplotype block detection and tagging single nucleotide polymorphism (SNP) selection algorithms have been devised on that basis. Although successful in many applications, the pairwise nature of these measures does not provide a direct characterization of joint linkage disequilibrium among multiple loci. Consequently, applications based on them may lead to loss of important information. In this report, we propose a multilocus LD measure based on generalized mutual information, which is also known as relative entropy or Kullback-Leibler distance. In essence, this measure seeks to quantify the distance between the observed haplotype distribution and the expected distribution assuming linkage equilibrium. We can show that this measure is approximately equal to r² in the special case with two loci. Based on this multilocus LD measure and an entropy measure that characterizes haplotype diversity, we propose a class of stepwise tagging SNP selection algorithms. This represents a unified approach for SNP selection in that it takes into account both the haplotype diversity and linkage disequilibrium objectives. Applications to both simulated and real data demonstrate the utility of the proposed methods for handling a large number of SNPs. The results indicate that multilocus LD patterns can be captured well, and informative and nonredundant SNPs can be selected effectively from a large set of loci.
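
The core idea in the abstract above, measuring the distance between the observed haplotype distribution and the distribution expected under linkage equilibrium, can be illustrated with a short Python sketch. This is not the authors' measure: it computes only the raw, unnormalized Kullback-Leibler divergence, and the function name `multilocus_ld` and the string encoding of haplotypes are assumptions made for illustration.

```python
from collections import Counter
from math import log


def multilocus_ld(haplotypes):
    """Raw KL divergence (in nats) between the observed haplotype
    distribution and the product of single-locus allele frequencies,
    i.e. the distribution expected under linkage equilibrium.

    `haplotypes` is a list of equal-length sequences, one per chromosome
    (hypothetical encoding, e.g. "010").
    """
    n = len(haplotypes)
    n_loci = len(haplotypes[0])

    # Observed haplotype frequencies.
    hap_freq = {h: c / n for h, c in Counter(haplotypes).items()}

    # Marginal allele frequencies at each locus.
    allele_freq = [
        {a: c / n for a, c in Counter(h[i] for h in haplotypes).items()}
        for i in range(n_loci)
    ]

    kl = 0.0
    for hap, p in hap_freq.items():
        expected = 1.0
        for i, allele in enumerate(hap):
            expected *= allele_freq[i][allele]
        kl += p * log(p / expected)
    return kl
```

Under this sketch, four equally frequent two-locus haplotypes give a divergence of zero (no LD), while a sample containing only `"00"` and `"11"` in equal proportions gives log 2, the maximum for two biallelic loci with allele frequencies of 0.5.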

4.
5.
The pattern and nature of linkage disequilibrium in the human genome is being studied and catalogued as part of the International HapMap Project [International HapMap Consortium, 2003: Nature 426:789-796]. A key goal of the HapMap Project is to enable identification of tag single nucleotide polymorphisms (SNPs) that capture a substantial portion of common human genetic variability while requiring only a small fraction of SNPs to be genotyped [International HapMap Consortium, 2005: Nature 437:1299-1320]. In the current study, we examined the effectiveness of using the CEU HapMap database to select tag SNPs for a Finnish sample. We selected SNPs in a 17.9-Mb region of chromosome 14 based on pairwise linkage disequilibrium (r²) estimates from the HapMap CEU sample, and genotyped 956 of these SNPs in 1,425 Finnish individuals. An excess of SNPs showed significantly different allele frequencies between the HapMap CEU and the Finnish samples, consistent with population-specific differences. However, we observed strong correlations between the two samples for estimates of allele frequencies, r² values, and haplotype frequencies. Our results demonstrate that the HapMap CEU samples provide an adequate basis for tag SNP selection in Finnish individuals, without the need to create a map specifically for the Finnish population, and suggest that the four-population HapMap data will provide useful information for tag SNP selection beyond the specific populations from which they were sampled.
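
For readers unfamiliar with the pairwise r² statistic used above for tag SNP selection, the following Python sketch computes it from phased two-locus haplotypes using the standard definition r² = D² / (pA(1 − pA) pB(1 − pB)), with D = pAB − pA·pB. The function name and the 0/1 tuple encoding are assumptions for illustration; the study itself estimated r² from HapMap genotype data, which this sketch does not reproduce.

```python
def r_squared(haplotypes):
    """Pairwise LD (r^2) between two biallelic loci.

    `haplotypes` is a list of (a, b) tuples, one per chromosome, where
    each entry is 0 or 1 (hypothetical encoding of the two alleles).
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n    # frequency of allele 1 at locus A
    p_b = sum(h[1] for h in haplotypes) / n    # frequency of allele 1 at locus B
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # frequency of haplotype 1-1

    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom
```

Two SNPs whose alleles always travel together give r² = 1, so either one can serve as a tag for the other; independent SNPs give r² = 0.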

6.
Candidate gene association studies often utilize one single nucleotide polymorphism (SNP) for analysis, with an initial report typically not being replicated by subsequent studies. The failure to replicate may result from incomplete or poor identification of disease-related variants or haplotypes, possibly due to naive SNP selection. A method for identification of linkage disequilibrium (LD) groups and selection of SNPs that capture sufficient intra-genic genetic diversity is described. We assume all SNPs with minor allele frequency above a pre-determined frequency have been identified. Principal component analysis (PCA) is applied to evaluate multivariate SNP correlations to infer groups of SNPs in LD (LD-groups) and to establish an optimal set of group-tagging SNPs (gtSNPs) that provide the most comprehensive coverage of intra-genic diversity while minimizing the resources necessary to perform an informative association analysis. This PCA method differs from haplotype block (HB) and haplotype-tagging SNP (htSNP) methods, in that an LD-group of SNPs need not be a contiguous DNA fragment. Results of the PCA method compared well with existing htSNP methods while also providing advantages over those methods, including an indication of the optimal number of SNPs needed. Further, evaluation of the method over multiple replicates of simulated data indicated PCA to be a robust method for SNP selection. Our findings suggest that PCA may be a powerful tool for establishing an optimal SNP set that maximizes the amount of genetic variation captured for a candidate gene using a minimal number of SNPs.

7.
The large number of markers considered in a genome‐wide association study (GWAS) has resulted in a simplification of analyses conducted. Most studies are analyzed one marker at a time using simple tests like the trend test. Methods that account for the special features of genetic association studies, yet remain computationally feasible for genome‐wide analysis, are desirable as they may lead to increased power to detect associations. Haplotype sharing attempts to translate between population genetics and genetic epidemiology. Near a recent mutation that increases disease risk, haplotypes of case participants should be more similar to each other than haplotypes of control participants; conversely, the opposite pattern may be found near a recent mutation that lowers disease risk. We give computationally simple association tests based on haplotype sharing that can be easily applied to GWASs while allowing use of fast (but not likelihood‐based) haplotyping algorithms and properly accounting for the uncertainty introduced by using inferred haplotypes. We also give haplotype‐sharing analyses that adjust for population stratification. Applying our methods to a GWAS of Parkinson's disease, we find a genome‐wide significant signal in the CAST gene that is not found by single‐SNP methods. Further, a missing‐data artifact that causes a spurious single‐SNP association on chromosome 9 does not impact our test. Genet. Epidemiol. 33:657–667, 2009. Published 2009 Wiley‐Liss, Inc.

8.
We introduce a haplotype‐sharing correlation in founder haplotypes for use in genome scanning. The method evaluates the correlation between phenotype similarity and haplotype similarity at each candidate location. When applied to Genetic Analysis Workshop 12 simulated data for disease status, age at onset, and quantitative traits Q1–Q5, we found highly significant signals near four simulated disease loci in genome scans using microsatellite marker data and highly significant gene effects in three causal genes using sequence data. © 2001 Wiley‐Liss, Inc.

9.
We describe a novel method for assessing the strength of disease association with single nucleotide polymorphisms (SNPs) in a candidate gene or small candidate region, and for estimating the corresponding haplotype relative risks of disease, using unphased genotype data directly. We begin by estimating the relative frequencies of haplotypes consistent with observed SNP genotypes. Under the Bayesian partition model, we specify cluster centres from this set of consistent SNP haplotypes. The remaining haplotypes are then assigned to the cluster with the "nearest" centre, where distance is defined in terms of SNP allele matches. Within a logistic regression modelling framework, each haplotype within a cluster is assigned the same disease risk, reducing the number of parameters required. Uncertainty in phase assignment is addressed by considering all possible haplotype configurations consistent with each unphased genotype, weighted in the logistic regression likelihood by their probabilities, calculated according to the estimated relative haplotype frequencies. We develop a Markov chain Monte Carlo algorithm to sample over the space of haplotype clusters and corresponding disease risks, allowing for covariates that might include environmental risk factors or polygenic effects. Application of the algorithm to SNP genotype data in an 890-kb region flanking the CYP2D6 gene illustrates that we can identify clusters of haplotypes with similar risk of poor drug metaboliser (PDM) phenotype, and can distinguish PDM cases carrying different high-risk variants. Further, the results of a detailed simulation study suggest that we can identify positive evidence of association for moderate relative disease risks with a sample of 1,000 cases and 1,000 controls.

10.
Association tests based on multi-marker haplotypes may be more powerful than those based on single markers. The existing association tests based on multi-marker haplotypes include Pearson's χ² test which tests for the difference of haplotype distributions in cases and controls, and haplotype-similarity based methods which compare the average similarity among cases with that of the controls. In this article, we propose new association tests based on haplotype similarities. These new tests compare the average similarities within cases and controls with the average similarity between cases and controls. These methods can be applied to either phase-known or phase-unknown data. We compare the performance of the proposed methods with Pearson's χ² test and the existing similarity-based tests by simulation studies under a variety of scenarios and by analyzing a real data set. The simulation results show that, in most cases, the new proposed methods are more powerful than both Pearson's χ² test and the existing similarity-based tests. In one extreme case where the disease mutant induced at a very rare haplotype (…

11.
Moderately dense maps of single-nucleotide polymorphism (SNP) markers across the human genome for both the simulated data set and data from the Collaborative Study of the Genetics of Alcoholism were available at Genetic Analysis Workshop 14 for the first time. This allowed examination of various novel and existing methods for haplotype analyses. Three contributors applied Mantel statistics in different ways for both linkage and association analysis by using the shared length between two haplotypes at a marker locus as a measure of genetic similarity. The results indicate that haplotype-sharing based on Mantel statistics can be a powerful approach and needs further methodological evaluation. Four contributors investigated haplotype-tagging SNP (htSNP) selection procedures, two contributors examined the use of multilocus haplotypes compared to single loci in association tests, and two contributors compared the accuracy of various methods for reconstructing haplotypes and estimating haplotype frequencies for both pedigree data and data from unrelated individuals. For all three different tasks, software packages and procedures gave similar results in regions of high linkage disequilibrium (LD). However, they were not as consistent in regions of moderate to low LD. One coalescence-based approach for estimating haplotype frequencies, coupled with a Markov chain Monte Carlo technique, outperformed the other haplotype frequency estimation methods in regions of low LD. In conclusion, regardless of the task, results were similar in chromosomal regions of high LD. However, based on the differing results observed here, methodological improvements are required for chromosomal regions of low to moderate LD.

12.
Taking advantage of increasingly available high-density single nucleotide polymorphism (SNP) markers within genes and across genomes, more and more genetic association studies have begun to use multiple closely linked markers in candidate genes. A practical analytical challenge arising in such studies is the possibility that not all case chromosomes have inherited disease-causing mutations from a common ancestral chromosome (founder heterogeneity). To alleviate the problem, we propose a method that applies a clustering algorithm to haplotype similarity analysis. The method identifies a sequence of nested subsets of case chromosomes by a peeling procedure, where each subset is relatively homogeneous. The average similarity score estimated from each subset in the sequence is compared to that estimated in controls, and a raw (unadjusted for multiple comparisons) P value is obtained. The test for the association between the trait and the candidate gene is based on the minimum raw P value observed in the comparison sequence, with its significance level estimated by a permutation procedure. The method can be applied to both haplotype and genotype data. Simulation studies suggest that our method has the correct type I error rate, and is generally more powerful than existing methods of haplotype similarity analysis.

13.
This study reports results of an extensive and comprehensive study of genetic diversity in 12 genes of the innate immune system in a population of eastern India. Genomic variation was assayed in 171 individuals by resequencing approximately 75 kb of DNA comprising these genes in each individual. Almost half of the 548 DNA variants discovered were novel. DNA sequence comparisons with human and chimpanzee reference sequences revealed evolutionary features indicative of natural selection operating among individuals, who are residents of an area with a high load of microbial and other pathogens. Significant differences in allele and haplotype frequencies of the study population were observed with the HapMap populations. Gene and haplotype diversities were observed to be high. The genetic positioning of the study population among the HapMap populations based on data of the innate immunity genes substantially differed from what has been observed for Indian populations based on data of other genes. The reported range of variation in SNP density in the human genome is one SNP per 1.19 kb (chromosome 22) to one SNP per 2.18 kb (chromosome 19). The SNP density in innate immunity genes observed in this study (>3 SNPs kb⁻¹) exceeds the highest density observed for any autosomal chromosome in the human genome. The extensive genomic variation and the distinct haplotype structure of innate immunity genes observed among individuals have possibly resulted from the impact of natural selection.

14.
We report the results of our analysis of the Genetic Analysis Workshop 12 simulated data set. Focusing on the isolated populations, we compare the efficiency of a new method, the maximum identity length contrast statistic (MILC) with the maximum likelihood score (MLS) in a genome screen strategy. MILC is a method based on the contrast of haplotype identity between transmitted and nontransmitted haplotypes in trios. It uses information on linkage and association. We found that MILC allows the detection of a risk factor corresponding to candidate gene 1 where the MLS fails, though the same population replicates were used. Interestingly, the association between this risk factor and the disease could not have been detected with the TDT at a genome‐wide level. © 2001 Wiley‐Liss, Inc.

15.
The concept of haplotype sharing (HS) has received considerable attention recently, and several haplotype association methods have been proposed. Here, we extend the work of Beckmann and colleagues [2005 Hum. Hered. 59:67-78] who derived an HS statistic (BHS) as a special case of Mantel's space-time clustering approach. The Mantel-type HS statistic correlates genetic similarity with phenotypic similarity across pairs of individuals. While phenotypic similarity is measured as the mean-corrected cross product of phenotypes, we propose to incorporate information on the underlying genetic model in the measurement of the genetic similarity. Specifically, for the recessive and dominant modes of inheritance we suggest the use of the minimum and maximum of shared length of haplotypes around a marker locus for pairs of individuals. If the underlying genetic model is unknown, we propose a model-free HS Mantel statistic using the max-test approach. We compare our novel HS statistics to BHS using simulated case-control data and illustrate their use by re-analyzing data from a candidate region of chromosome 18q from the Rheumatoid Arthritis (RA) Consortium. We demonstrate that our approach is point-wise valid and superior to BHS. In the re-analysis of the RA data, we identified three regions with point-wise P-values < 0.005 containing six known genes (PMIP1, MC4R, PIGN, KIAA1468, TNFRSF11A and ZCCHC2) which might be worth follow-up.
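
The general shape of a Mantel-type haplotype-sharing statistic, as described in the abstract above, can be sketched in Python. This is a deliberately simplified illustration and not the authors' statistic: it assumes one haplotype per individual (so the minimum/maximum over haplotype pairs for recessive and dominant models is not modelled), counts shared length in markers and not in physical distance, and omits any significance test. The names `shared_length` and `mantel_statistic` are hypothetical.

```python
from itertools import combinations


def shared_length(h1, h2, locus):
    """Number of contiguous markers around `locus` at which two
    haplotypes are identical; 0 if they differ at `locus` itself."""
    if h1[locus] != h2[locus]:
        return 0
    left = locus
    while left > 0 and h1[left - 1] == h2[left - 1]:
        left -= 1
    right = locus
    while right < len(h1) - 1 and h1[right + 1] == h2[right + 1]:
        right += 1
    return right - left + 1


def mantel_statistic(haplotypes, phenotypes, locus):
    """Sum over pairs of individuals of genetic similarity (shared
    length at `locus`) times phenotypic similarity (mean-corrected
    cross product of phenotypes)."""
    mean = sum(phenotypes) / len(phenotypes)
    total = 0.0
    for i, j in combinations(range(len(haplotypes)), 2):
        genetic = shared_length(haplotypes[i], haplotypes[j], locus)
        phenotypic = (phenotypes[i] - mean) * (phenotypes[j] - mean)
        total += genetic * phenotypic
    return total
```

A large positive value indicates that individuals with similar phenotypes tend to share long haplotype segments around the marker. In practice the statistic would be evaluated at each marker and its significance assessed, for example by permuting phenotypes.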

16.
Genetically complex diseases are caused by interacting environmental factors and genes. As a consequence, statistical methods that consider multiple unlinked genomic regions simultaneously are desirable. Such consideration, however, may lead to a vast number of different high-dimensional tests whose appropriate analysis poses a problem. Here, we present a method to analyze case-control studies with multiple SNP data without phase information that considers gene-gene interaction effects while correcting appropriately for multiple testing. In particular, we allow for interactions of haplotypes that belong to different unlinked regions, as haplotype analysis often proves to be more powerful than single marker analysis. In addition, we consider different marker combinations at each unlinked region. The multiple testing issue is settled via the minP approach; the P value of the "best" marker/region configuration is corrected via Monte-Carlo simulations. Thus, we do not explicitly test for a specific pre-defined interaction model, but test for the global hypothesis that none of the considered haplotype interactions shows association with the disease. We carry out a simulation study for case-control data that confirms the validity of our approach. When simulating two-locus disease models, our test proves to be more powerful than association methods that analyze each linked region separately. In addition, when one of the tested regions is not involved in the etiology of the disease, only a small amount of power is lost with interaction analysis as compared to analysis without interaction. We successfully applied our method to a real case-control data set with markers from two genes controlling a common pathway. While classical analysis failed to reach significance, we obtained a significant result even after correction for multiple testing with our proposed haplotype interaction analysis. The method described here has been implemented in FAMHAP.

17.
It is well recognized that multiple genes likely contribute to susceptibility to most common complex diseases. Studying one gene at a time might reduce our chance of identifying disease susceptibility genes with relatively small effect sizes. Therefore, it is crucial to develop statistical methods that can assess the effect of multiple genes collectively. Motivated by the increasingly available high-density markers across the whole human genome, we propose a class of TDT-type methods that can jointly analyze haplotypes from multiple candidate genes (linked or unlinked). Our approach first uses a linear signed rank statistic to compare at an individual gene level the structural similarity among transmitted haplotypes against that among non-transmitted haplotypes. The results of the ranked comparisons from all considered genes are subsequently combined into global statistics, which can simultaneously test the association of the set of genes with the disease. Using simulation studies, we find that the proposed tests yield correct type I error rates in stratified populations. Compared with the gene-by-gene test, the new global tests appear to be more powerful in situations where all candidate genes are associated with the disease.

18.
Maximum-likelihood estimation of haplotype frequencies in nuclear families   (cited 13 times in total: 0 self-citations, 13 citations by others)
The importance of haplotype analysis in the context of association fine mapping of disease genes has grown steadily over recent years. Since experimental methods to determine haplotypes on a large scale are not available, phase has to be inferred statistically. For individual genotype data, several reconstruction techniques and many implementations of the expectation-maximization (EM) algorithm for haplotype frequency estimation exist. Recent research work has shown that incorporating available genotype information of related individuals greatly increases the precision of haplotype frequency estimates. We, therefore, implemented a highly flexible program written in C, called FAMHAP, which calculates maximum likelihood estimates (MLEs) of haplotype frequencies from general nuclear families with an arbitrary number of children via the EM-algorithm for up to 20 SNPs. For more loci, we have implemented a locus-iterative mode of the EM-algorithm, which gives reliable approximations of the MLEs for up to 63 SNP loci, or fewer when multi-allelic markers are incorporated into the analysis. Missing genotypes can be handled as well. The program is able to distinguish cases (haplotypes transmitted to the first affected child of a family) from pseudo-controls (non-transmitted haplotypes with respect to the child). We tested the performance of FAMHAP and the accuracy of the obtained haplotype frequencies on a variety of simulated data sets. The implementation proved to work well when many markers were considered and no significant differences between the estimates obtained with the usual EM-algorithm and those obtained in its locus-iterative mode were observed. We conclude from the simulations that the accuracy of haplotype frequency estimation and reconstruction in nuclear families is very reliable in general and robust against missing genotypes.
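
To make the EM idea in the abstract above concrete, here is a minimal Python sketch for the simplest possible setting: unrelated individuals typed at two biallelic SNPs. It is not FAMHAP and does not handle families, missing genotypes, or more than two loci; the function name and the genotype encoding (count of allele 1 at each SNP) are assumptions for illustration. With two SNPs, only double heterozygotes have ambiguous phase, so the E-step splits each of them between the two compatible haplotype pairs in proportion to the current frequency estimates.

```python
def em_haplotype_frequencies(genotypes, n_iter=100):
    """EM estimates of two-SNP haplotype frequencies from unphased
    genotypes of unrelated individuals.

    `genotypes` is a list of (g1, g2) pairs, each entry the number of
    copies (0, 1 or 2) of allele 1 at that SNP (hypothetical encoding).
    Returns a dict mapping haplotypes (a1, a2) to frequencies.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}          # uniform starting values
    n_chrom = 2 * len(genotypes)

    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step for double heterozygotes: phase is either
                # 00/11 ("cis") or 01/10 ("trans").
                cis = freq[(0, 0)] * freq[(1, 1)]
                trans = freq[(0, 1)] * freq[(1, 0)]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # Phase is unambiguous when at most one SNP is heterozygous.
                first = (g1 // 2, g2 // 2)
                second = ((g1 + 1) // 2, (g2 + 1) // 2)
                counts[first] += 1
                counts[second] += 1
        # M-step: expected haplotype counts become the new frequencies.
        freq = {h: c / n_chrom for h, c in counts.items()}
    return freq
```

For example, in a sample where every unambiguous individual carries only the 00 or 11 haplotype, the double heterozygotes are progressively resolved as 00/11 and the estimated frequencies of 01 and 10 shrink toward zero.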

19.
The role of haplotypes in candidate gene studies   (cited 24 times in total: 0 self-citations, 24 citations by others)
Human geneticists working on systems for which it is possible to make a strong case for a set of candidate genes face the problem of whether it is necessary to consider the variation in those genes as phased haplotypes, or whether the one-SNP-at-a-time approach might perform as well. There are three reasons why the phased haplotype route should be an improvement. First, the protein products of the candidate genes occur in polypeptide chains whose folding and other properties may depend on particular combinations of amino acids. Second, population genetic principles show us that variation in populations is inherently structured into haplotypes. Third, the statistical power of association tests with phased data is likely to be improved because of the reduction in dimension. However, in reality it takes a great deal of extra work to obtain valid haplotype phase information, and inferred phase information may simply compound the errors. In addition, if the causal connection between SNPs and a phenotype is truly driven by just a single SNP, then the haplotype-based approach may perform worse than the one-SNP-at-a-time approach. Here we examine some of the factors that affect haplotype patterns in genes, how haplotypes may be inferred, and how haplotypes have been useful in the context of testing association between candidate genes and complex traits.

20.
The availability of high-density haplotype data has motivated several fine-scale linkage disequilibrium mapping methods for locating disease-causing mutations. These methods identify loci around which haplotypes of case chromosomes exhibit greater similarity than do those of control chromosomes. A difficulty arising in such mapping is the possibility that case chromosomes have inherited disease-causing mutations from different ancestral chromosomes (founder heterogeneity). Such heterogeneity dilutes measures of case haplotype similarity. This dilution can be mitigated by separating case chromosomes into subsets according to their putative mutation origin, and searching for an area with excessive haplotype similarity within each subset. We propose a nonparametric method for identifying subsets of case chromosomes likely to share a common ancestral progenitor. By simulation studies and application to published data, we show that the method accurately identifies relatively large subsets of chromosomes that share a common founder. We also show that the method allows more precise estimates of the disease mutation loci than obtained by other fine-scale mapping methods.
