Similar literature
1.
Recent studies have revealed that linkage disequilibrium (LD) patterns vary across the human genome with some regions of high LD interspersed by regions of low LD. A small fraction of SNPs (tag SNPs) is sufficient to capture most of the haplotype structure of the human genome. In this paper, we develop a method to partition haplotypes into blocks and to identify tag SNPs based on genotype data by combining a dynamic programming algorithm for haplotype block partitioning and tag SNP selection based on haplotype data with a variation of the expectation maximization (EM) algorithm for haplotype inference. We assess the effects of using either haplotype or genotype data in haplotype block identification and tag SNP selection as a function of several factors, including sample size, density or number of SNPs studied, allele frequencies, fraction of missing data, and genotyping error rate, using extensive simulations. We find that a modest number of haplotype or genotype samples will result in consistent block partitions and tag SNP selection. The power of association studies based on tag SNPs using genotype data is similar to that using haplotype data.
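As a rough illustration of the haplotype-inference step mentioned in this abstract, the following is a minimal sketch of the textbook EM algorithm for two biallelic SNPs. It is not the variation used in the paper; the function name `em_two_snp`, the 0/1/2 genotype coding and the toy data are assumptions made for this example.

```python
def em_two_snp(genotypes, iterations=50):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2) pairs, each g in {0, 1, 2} counting copies
    of the allele labelled "1" (an illustrative coding, not from the paper).
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freqs = {h: 0.25 for h in haps}          # uniform starting point
    n_haplotypes = 2 * len(genotypes)

    def alleles(g):
        # Map a genotype count to its two alleles.
        return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[g]

    for _ in range(iterations):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: only double heterozygotes are phase-ambiguous.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # At least one locus is homozygous, so the phase is known.
                a1, a2 = alleles(g1), alleles(g2)
                counts[(a1[0], a2[0])] += 1
                counts[(a1[1], a2[1])] += 1
        # M-step: re-estimate frequencies from expected counts.
        freqs = {h: c / n_haplotypes for h, c in counts.items()}
    return freqs


# Toy sample: 4 individuals (0,0), 4 individuals (2,2), 2 double heterozygotes.
toy = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
estimated = em_two_snp(toy)
```

In this toy sample the unambiguous individuals carry only the 0-0 and 1-1 haplotypes, so the EM iterations assign the double heterozygotes almost entirely to that phase. Real data would also need handling of missing genotypes and more than two loci, which this sketch omits.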

2.
First generation linkage disequilibrium (LD) and haplotype maps of the human major histocompatibility complex (MHC) have been generated in order to aid the unraveling of the numerous disease predisposing genes in this region by offering a first set of haplotype tagSNPs. Several parameters, such as the population studied, the marker map used, the density of polymorphisms and the algorithm applied, influence the appearance of haplotype blocks and the selection of tags. The MHC comprises a limited number of ancestral, conserved haplotypes. We address the impact of the underlying HLA haplotypes on the LD patterns, haplotype blocks and tag selection throughout the entire extended MHC (xMHC) by studying DR-DQ haplotypes, mainly those carrying DRB1*03 and DRB1*04 alleles. We observed significantly different degrees and extents of LD calculated on different HLA backgrounds, as well as variation in the size and boundaries of the defined haplotype blocks and in the tags selected. Our results demonstrate that the underlying ancestral HLA haplotypic architecture is yet another parameter to take into consideration when constructing LD maps of the xMHC. This may be essential for mapping of disease susceptibility genes since many diseases are associated with and map on particular HLA haplotypes.

3.
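The structure of haplotypes and haplotype blocks in the human genome provides valuable information about human evolution and offers an effective strategy for discovering susceptibility genes for complex human diseases. A haplotype block can be divided into several discrete regions of limited haplotype diversity, and a small number of tag single nucleotide polymorphisms (tSNPs) that represent the structural features of each region can distinguish the great majority of haplotypes from one another. Tag SNPs therefore play an important role in the construction of haplotypes and haplotype blocks and in association studies. Methods for constructing haplotypes and haplotype blocks fall into two classes: those based on genotyping data from large pedigrees and those based on statistical algorithms. This review systematically surveys several methods for constructing haplotypes and haplotype blocks, examines their statistical power in association studies under different disease models or partitioning criteria, and objectively evaluates the strengths, weaknesses and prospects of each method and its application in association studies. With the completion of the International HapMap and the refinement of statistical algorithms for haplotype construction, haplotype construction methods that integrate mathematics, physics, computer science and other disciplines will have a far-reaching influence on human genetics, on the mapping, cloning and identification of susceptibility genes for complex diseases, and on related areas of the life sciences.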

4.
OBJECTIVE: To investigate the frequencies of the -1516, -574 and 4259 single nucleotide polymorphisms (SNPs) of the T-cell immunoglobulin mucin-3 (TIM-3) gene in the Hubei population and to address the question of whether they are in linkage disequilibrium (LD). METHODS: Genotypes and allele frequencies of the TIM-3 gene were examined by allele-specific polymerase chain reaction (AS-PCR) in 147 healthy Hubei Han individuals. Hardy-Weinberg equilibrium, two-point LD and haplotype frequencies were evaluated with Arlequin v3.1 software. RESULTS: The allele frequencies of the 3 SNPs were in agreement with Hardy-Weinberg equilibrium. Minor allele frequencies of TIM-3 -1516G/T, -574T/G and 4259G/T were 8.5%, 1.0% and 2.0%, respectively. The haplotypes observed across the three loci were G-G-G (2.0%), G-G-T (88.4%), T-G-T (8.5%) and G-T-T (1.0%). LD analyses revealed that all pairwise coefficients of linkage disequilibrium (D') were 1. CONCLUSION: The -1516, -574 and 4259 loci of the TIM-3 gene are in complete linkage disequilibrium. Our study has provided population genetic data on the TIM-3 gene in the Chinese Hubei Han population and a basis for searching for TIM-3 haplotypes related to immune-mediated disease.
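The reported D' values can be checked against the haplotype frequencies quoted in this abstract. The sketch below uses the standard two-locus definition of D'; the function name `d_prime`, the renormalisation of the frequencies (they sum to 99.9%) and the choice of reference allele are assumptions of this example, and it does not reproduce the Arlequin procedure.

```python
from itertools import combinations

# Haplotype frequencies quoted in the abstract, loci ordered -1516, -574, 4259.
HAPLOTYPES = {"GGG": 0.020, "GGT": 0.884, "TGT": 0.085, "GTT": 0.010}


def d_prime(haps, i, j):
    """Standard D' between loci i and j from haplotype frequencies."""
    total = sum(haps.values())            # renormalise (frequencies sum to 0.999)
    a = min(h[i] for h in haps)           # reference allele at locus i (arbitrary)
    b = min(h[j] for h in haps)           # reference allele at locus j (arbitrary)
    p_a = sum(f for h, f in haps.items() if h[i] == a) / total
    p_b = sum(f for h, f in haps.items() if h[j] == b) / total
    p_ab = sum(f for h, f in haps.items() if h[i] == a and h[j] == b) / total
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max if d_max > 0 else float("nan")


# D' for each of the three locus pairs; the sign depends on the reference alleles.
PAIRWISE = {pair: d_prime(HAPLOTYPES, *pair) for pair in combinations(range(3), 2)}
```

For every pair of loci, only three of the four possible two-locus haplotypes appear in the table, which is exactly the situation in which |D'| equals 1. This is consistent with the abstract's statement of complete LD.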

5.
The impact of SNP density on fine-scale patterns of linkage disequilibrium (total citations: 19; self-citations: 0; citations by others: 19)
Linkage disequilibrium (LD) is a measure of the degree of association between alleles in a population. The detection of disease-causing variants by association with neighbouring single nucleotide polymorphisms (SNPs) depends on the existence of strong LD between them. Previous studies have indicated that the extent of LD is highly variable in different chromosome regions and different populations, demonstrating the importance of genome-wide accurate measurement of LD at high resolution throughout the human genome. A uniform feature of these studies has been the inability to detect LD in regions of low marker density. To investigate the dependence of LD patterns on marker selection we performed a high-resolution study in African-American, Asian and UK Caucasian populations. We selected over 5000 SNPs with an average spacing of approximately 1 SNP per 2 kb after validating ca 12 000 SNPs derived from a dense SNP collection (1 SNP per 0.3 kb on average). Applications of different statistical methods of LD assessment highlight similar areas of high and low LD. However, at high resolution, features such as overall sequence coverage in LD blocks and block boundaries vary substantially with respect to marker density. Model-based linkage disequilibrium unit (LDU) maps appear robust to marker density and consistently influenced by marker allele frequency. The results suggest that very dense marker sets will be required to yield stable views of fine-scale LD in the human genome.

6.
BACKGROUND AND METHODS: Numerous genetic studies have mapped asthma susceptibility genes to a region on chromosome 5q31-33 in several populations. This region contains a cluster of cytokines and other immune-related genes important in immune response. In the present study, to determine the genetic variations and patterns of linkage disequilibrium (LD), we resequenced all the exons and promoter regions of the 29 asthma candidate genes in the chromosome 5q31-33 region. RESULTS: We identified a total of 314 genetic variants, including 289 single nucleotide polymorphisms (SNPs), 22 insertion/deletion polymorphisms and 3 microsatellites. Standardized variance data for allele frequency revealed substantial differences in SNP allele frequencies among different ethnic groups. Interestingly, significant ethnic differences were observed mainly in intron SNPs. LD block analysis using 174 common SNPs with a frequency of >10% disclosed strong LD within most candidate genes. No significant LD was observed across genes, except for one LD block (CD14-IK block). Gene-based haplotype analyses showed that 1-5 haplotype-tagging SNPs may be used to define the six or fewer common haplotypes with a frequency of >5%, regardless of the number of SNPs. CONCLUSION: Overall, our results provide useful information for the identification of immune-mediated disease genes in the chromosome 5q31-33 region, as well as valuable evidence for gene-based haplotype analysis in disease association studies.

7.
A principal goal in human genetics is to provide the tools necessary to enable genome-wide association studies. Extensive information on the distribution of gene-based single-nucleotide polymorphisms (SNPs) and linkage disequilibrium (LD) patterns across the genome is required in order to choose markers for efficient implementation of this approach. To obtain such information, we have genotyped a large Japanese cohort for SNPs identified by systematic resequencing of more than 14 000 autosomal genes. Analysis of these data led to the conclusion that the Japanese population contains approximately 130 000 common autosomal gene haplotypes (frequency >0.05), of which more than 35% are identified in the present study. We also examined allele frequencies and LD patterns according to the position of variants within genes, and their distribution across the genome. We found lower allele variability at exonic SNP sites (both non-synonymous and synonymous) compared with non-exonic SNP sites, and greater average LD between SNPs within exons of the same gene compared with other SNP combinations, both of which could be signals of selection. LD was correlated with the recombination rate per physical distance as estimated from the meiotic map, but the strength of the relationship varied considerably in different regions of the genome. Unique LD patterns, characterized by frequent instances of high LD between non-adjacent SNPs punctuated by blocks of low LD, were found in a 7 Mb region on chromosome 6p that includes the MHC (major histocompatibility complex) locus and many non-MHC genes. These results demonstrate the complexity that must be taken into account when considering SNP variability and LD patterns, while also providing tools necessary for implementation of efficient genome-wide association studies.

8.
There is currently considerable interest in the use of single‐nucleotide polymorphisms (SNPs) to map disease susceptibility genes. The success of this method will depend on a number of factors including the strength of linkage disequilibrium (LD) between marker and disease loci. We used a data set of SNP genotypings in the region of the APOE disease susceptibility locus to investigate the likely usefulness of SNPs in case‐control studies. Using the estimated haplotype structure surrounding and including the APOE locus, and assuming a codominant disease model, we treated each SNP in turn as if it were a disease susceptibility locus and obtained, for each disease locus and markers, the expected likelihood ratio test (LRT) to assess disease association. We were particularly interested in the power to detect association with the susceptibility polymorphism itself, the power of nearby markers to detect association, and the ability to distinguish between the susceptibility polymorphism and marker loci also showing association. We found that the expected LRT depended critically on disease allele frequencies. For disease loci with a reasonably common allele we were usually able to detect association. However, for only a subset of markers in the close neighbourhood of the disease locus was association detectable. In these cases we were usually, but not always, able to distinguish the disease locus from nearby associated marker loci. For some disease loci, no other loci demonstrated detectable association with the disease phenotype. We conclude that one may need to use very dense SNP maps in order to avoid overlooking polymorphisms affecting susceptibility to a common phenotype.

9.
We examined 13 single nucleotide polymorphisms (SNPs) spanning the coding region of the mu-opioid receptor gene (OPRM1), among 382 European Americans (EAs) affected with substance dependence [alcohol dependence (AD) and/or drug dependence (DD)] and 338 EA healthy controls. These SNPs delineated two haplotype blocks. Genotype distributions for all SNPs were in Hardy-Weinberg equilibrium (HWE) in controls, but in cases, four SNPs in Block I and three SNPs in Block II showed deviation from HWE. Significant differences were found between cases and controls in allele and/or genotype frequencies for six SNPs in Block I and two SNPs in Block II. Association of SNP4 in Block I with DD (allele: P=0.004), SNP5 in Block I with AD and DD (allele: P≤0.005 for both) and two SNPs in Block II with AD (SNP11 genotype: P=0.002; SNP12 genotype: P=0.001) were significant after correction for multiple testing. Frequency distributions of haplotypes (constructed by five tag SNPs) differed significantly for cases and controls (P<0.001 for both AD and DD). Logistic regression analyses confirmed the association between OPRM1 variants and substance dependence, when sex and age of subjects and alleles, genotypes, haplotypes or diplotypes of five tag SNPs were considered. Population structure analyses excluded population stratification artifact. Additional supporting evidence for association between OPRM1 and AD was obtained in a smaller Russian sample (247 cases and 100 controls). These findings suggest that OPRM1 intronic variants play a role in susceptibility to AD and DD in populations of European ancestry.

10.
A Metric Linkage Disequilibrium Map of a Human Chromosome (total citations: 4; self-citations: 0; citations by others: 4)
We used LDMAP (Maniatis et al. 2002) to analyse SNP data spanning chromosome 22 (Dawson et al. 2002), to obtain a whole‐chromosome metric LD map. The LD map, with map distances analogous to the centiMorgan scale of linkage maps, identifies regions of high LD as plateaus (‘blocks’) and characterises steps which define the relationship between these regions. From this map we estimate that block regions comprise between 32% and 55% of the euchromatic portion of chromosome 22 and that increasing marker density within steps may increase block coverage. Steps are regions of low LD which correspond to areas of variable recombination intensity. The intensity of recombination is related to the height of the step and thus intense recombination hot‐spots can be distinguished from more randomly distributed historical events. The LD maps are more closely related to the high‐resolution linkage map (Kong et al. 2002) than average measures of ρ with recombination accounting for between 34% and 52% of the variance in patterns of LD (r = 0.58 – 0.71, p = 0.0001). Step regions are closely correlated with a range of sequence motifs including GT/CA repeats. The LD map identifies holes in which greater marker density is required and defines the optimal SNP spacing for positional cloning, which suggests that some multiple of around 50,000 SNPs will be required to efficiently screen Caucasian genomes. Further analyses which investigate selection of informative SNPs and the effect of SNP allele frequency and marker density will refine this estimate.

11.
Haplotype tagging is a means of retaining most of the information in high density marker maps, while reducing genotyping requirements. Estimates of the numbers of tagging SNPs required to cover the human genome have varied widely, ranging from 100,000 to 1,000,000. Tagging has been applied to a number of gene-based datasets but has not been evaluated in contexts reflecting those of genome-wide association studies, namely large chromosome regions and multiple samples drawn from the same population. We analysed 5000 common markers across a 10 Mb segment of human chromosome 20 in three samples (UK Caucasian, CEPH Caucasian, African American) to evaluate tagging efficiency and consistency. Overall, the results indicate a high degree of efficiency, yielding 3-5-fold savings in Caucasians and 2-3-fold savings in African Americans. These levels varied according to linkage disequilibrium (LD) levels, tagging thresholds and allele frequencies, but in high LD regions they did not vary markedly due to marker density. However, a strong positive relationship between marker density and tagging was observed, relating to the fact that increasing marker density yields greater sequence coverage in high LD, thus requiring more tag SNPs to cover a greater fraction of the genome. Encouragingly, whatever the density employed, a high level of robustness was observed between UK and CEPH samples, as most of the htSNPs selected in one sample were also appropriate as tags in the other.
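To make the idea of tagging concrete, here is a minimal sketch of one common approach, greedy selection under a pairwise r² threshold. The abstract does not say which tagging procedure was used, so this is only an assumed stand-in; the function names, the 0/1 haplotype coding, the 0.8 threshold and the toy data are all hypothetical.

```python
def r_squared(x, y):
    """Pairwise r^2 between two SNPs given as 0/1 allele columns over haplotypes."""
    n = len(x)
    p_x = sum(x) / n
    p_y = sum(y) / n
    p_xy = sum(a * b for a, b in zip(x, y)) / n
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    if denom == 0:                        # a monomorphic SNP carries no LD information
        return 0.0
    d = p_xy - p_x * p_y
    return d * d / denom


def greedy_tags(haplotypes, threshold=0.8):
    """Pick tag SNP indices so every SNP has r^2 >= threshold with some tag."""
    n_snps = len(haplotypes[0])
    cols = [[h[j] for h in haplotypes] for j in range(n_snps)]
    # covers[j] = SNPs that SNP j would tag (every SNP always tags itself).
    covers = [
        {k for k in range(n_snps)
         if k == j or r_squared(cols[j], cols[k]) >= threshold}
        for j in range(n_snps)
    ]
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Choose the SNP tagging the most untagged SNPs; ties go to the lowest index.
        best = max(range(n_snps), key=lambda j: (len(covers[j] & untagged), -j))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy haplotypes: SNPs 0 and 1 are identical, SNPs 2 and 3 are identical.
toy_haplotypes = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 1)]
toy_tags = greedy_tags(toy_haplotypes)
```

In the toy data two tags are enough for four SNPs, a two-fold saving. The abstract's observation that denser maps need more tags corresponds here to adding SNPs that no existing tag covers at the chosen threshold.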

12.
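Linkage disequilibrium (LD) plays a key role in gene mapping and is therefore an important tool in modern genetic research. However, estimates of background LD are affected by many factors, one of which is genotyping error. Previous studies have examined the effect of genotyping error on four commonly used LD measures, D, r, Q and d. The effect of genotyping error on LD measures used for mapping quantitative trait loci (QTL) has not yet been studied. In this paper we investigate analytically the properties of the LD measure lx in the presence of genotyping error. The results show that lx depends on the genotyping error rate and is reduced by it: when the genotyping error rate reaches 0.03, the change (decrease) in lx exceeds 50%, particularly when the marker allele frequency is relatively high. The effect of genotyping error was also confirmed by a simulation study based on the haplotype frequencies of 10 SNPs in the angiotensin-converting enzyme (ACE) gene.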

13.
Association studies of candidate genes with complex traits have generally used one or a few single nucleotide polymorphisms (SNPs), although variation in the extent of linkage disequilibrium (LD) within genes markedly influences the sensitivity and precision of association studies. The extent of LD and the underlying haplotype structure for most candidate genes are still unavailable. We sampled 193 blacks (African-Americans) and 160 whites (European-Americans) and estimated the intragenic LD and the haplotype structure in four genes of the renin-angiotensin system. We genotyped 25 SNPs, with all but one of the pairs spaced between 1 and 20 kb, thus providing resolution at small scale. The pattern of LD within a gene was very heterogeneous. Using a robust method to define haplotype blocks, blocks of limited haplotype diversity were identified at each locus; between these blocks, LD was lost owing to the history of recombination events. As anticipated, there was less LD among blacks, the number of haplotypes was substantially larger, and shorter haplotype segments were found, compared with whites. These findings have implications for candidate-gene association studies and indicate that variation between populations of European and African origin in haplotype diversity is characteristic of most genes.

14.
As part of a recent high-density linkage disequilibrium (LD) study of chromosome 20, we obtained genotypes for approximately 30,000 SNPs at a density of 1 SNP/2 kb on four different population samples (47 CEPH founders; 91 UK unrelateds [unrelated white individuals of western European ancestry]; 97 African Americans; 42 East Asians). We observed that approximately 50% of SNPs had at least one genetically indistinguishable partner; i.e., for every individual considered, their genotype at the first locus was identical to their genotype at the second locus, or in LD terms, the SNPs were in "perfect" LD (r² = 1.0). These "genetically indistinguishable SNPs" (giSNPs) formed into clusters of varying size. The larger the cluster, the greater the tendency to be located within genes and to overlap with giSNP clusters in other population samples. As might be expected for this map density, many giSNPs were located close to one another, thus reflecting local regions of undetected recombination or haplotype blocks. However, approximately 1/3 of giSNP clusters had intermingled, non-indistinguishable SNPs with incomplete LD (D' and r² < 1), sometimes spanning hundreds of kilobases, comprising up to 70 indistinguishable markers and overlapping multiple haplotype blocks. These long-range, nonconsecutive giSNPs have implications for disease gene localization by allelic association as evidence for association at one locus will be indistinguishable from that at another locus, even though both loci may be situated far apart. We describe the distribution of giSNPs on this map of chromosome 20 and illustrate the potential impact they can have on association mapping.
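The definition of giSNPs given in this abstract (identical genotypes in every individual) lends itself to a simple grouping step, sketched below. The function name, the SNP labels and the 0/1/2 genotype coding are hypothetical, and treating a vector and its allele-flipped version as the same pattern is an assumption of this example rather than something stated in the abstract; missing genotypes are not handled.

```python
from collections import defaultdict


def gi_snp_clusters(genotypes):
    """Group SNPs whose genotype vectors are identical across all individuals.

    genotypes: dict mapping a SNP label to a sequence of 0/1/2 genotype codes,
    one per individual, in the same individual order for every SNP.
    """
    groups = defaultdict(list)
    for snp, calls in genotypes.items():
        calls = tuple(calls)
        flipped = tuple(2 - g for g in calls)    # same SNP with allele labels swapped
        groups[min(calls, flipped)].append(snp)  # canonical key (assumption, see text)
    # Keep only patterns shared by at least two SNPs.
    return sorted(sorted(cluster) for cluster in groups.values() if len(cluster) > 1)


# Hypothetical genotypes for four SNPs in four individuals.
toy_genotypes = {
    "snp_a": (0, 1, 2, 1),
    "snp_b": (2, 1, 0, 1),   # snp_a with the counted allele swapped
    "snp_c": (0, 1, 2, 1),
    "snp_d": (0, 0, 2, 1),
}
toy_clusters = gi_snp_clusters(toy_genotypes)
```

Because the grouping ignores map position, SNPs in one cluster may lie far apart, which is the long-range situation the abstract warns about for association mapping.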

15.
Alcohol response is a genetically influenced trait, and there is significant variation in the patterns of alcohol consumption between Māori and Caucasians in New Zealand. Previous studies have found that a variant of the alcohol dehydrogenase (ADH) gene (ADH1B*47His) is associated with protection against alcohol dependence in Māori. Here we extend our investigation of the ADH genes, hypothesising a different haplotype signature in Māori compared to Caucasians. We analysed nine single nucleotide polymorphisms (SNPs) spanning a 500-kb region on chromosome 4q surrounding the ADH1B variant and several other alcohol-metabolising genes (ADH 4, 5, 6, 7). Genotyping was carried out on 47 unrelated Māori individuals, and allele frequencies were compared to the Caucasian population. Large differences in minor allele frequencies were observed between Māori and Caucasian populations for six SNPs (P < 0.01). There was also strong linkage disequilibrium (LD) observed among SNP alleles in Māori, indicating the presence of extended ancestral haplotype blocks (P < 0.01). Our results suggest that the Māori population has a different haplotype signature at the ADH gene region compared to Caucasians. These findings probably reflect the unique gene flow history of this genomic region in Māori and should be beneficial for designing future genetic association studies of alcohol-response traits and associated disorders in Polynesians.

16.
We investigated the population differences in patterns of single nucleotide polymorphisms (SNPs) for a 400 kb olfactory receptor (OR) gene cluster on human chromosome 17p13.3. Samples were drawn from 35 individuals, of four different ethnogeographical origins: Pygmies, Bedouins, Yemenite Jews and Ashkenazi Jews. Of the 74 SNPs identified, two segregated between pseudogenized and intact ORs, while a third involved a change in a highly conserved motif proposed to mediate ligand-induced signal transduction. Linkage disequilibrium (LD) was computed based on phase inference across the cluster using Clark's haplotype subtraction algorithm. We also calculated LD directly from the genotypes using the expectation-maximization (EM) algorithm. Both methods yielded very similar results. Our analyses revealed substantial differences in nucleotide diversity, haplotype distribution and LD patterns among the different human populations. In particular, the two Jewish populations had low haplotype diversity and negligible decay of LD across the entire genomic region. Intriguingly, the three functional SNPs segregated at different frequencies in the different ethnogeographical groups, with the Pygmies having higher frequencies of the intact OR genes. Our data suggest that OR genes may have evolved to create different functional repertoires in distinct human populations.

17.
Association studies using single nucleotide polymorphisms (SNPs) have the potential to help unravel the genetic basis of hypertension. Nevertheless, to date, association studies of hypertension have yielded ambiguous results. It is becoming clear that such association studies must be interpreted within the context of the genetic structure of the populations being studied, and patterns of variation within specific genomic regions. With this in mind we analyzed genetic variation in the G protein‐coupled receptor kinase 4 (GRK4) gene, a gene whose product has recently been shown to inhibit the dopamine receptor D1 (DRD1) from increasing sodium excretion. We genotyped three previously identified GRK4 SNPs, as well as ten additional SNPs, over 71.6 kb of the GRK4 locus in four populations: African Americans, Asians, Hispanics and Caucasians. Haplotype structure varied among populations, with Hispanics and Caucasians having the most linkage disequilibrium (LD) among SNPs. African Americans had three shorter haplotype blocks, while patterns of markers in the Asian populations demonstrated less LD among markers, a pattern inconsistent with block structure. We observed limited haplotype diversity in each of the four populations, with differing haplotype frequencies among the ethnic groups. We also found substantial evidence for population differentiation, with the largest differences between the African‐American and Asian samples with FST values in the upper 90th percentile when compared to a genome‐wide distribution. However, for all population comparisons, FST values decreased sharply in the 3′ region of the gene. This pattern of differentiation among populations is consistent with selection in this part of the gene maintaining similar patterns of variation among otherwise divergent populations. Our results document not only different allele frequencies between populations, but differences in haplotype structure that may be important in evaluating association studies between hypertension and GRK4.
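For readers unfamiliar with FST, the sketch below computes the basic heterozygosity-based form for a single biallelic SNP. The abstract does not state which FST estimator was used, so this formula, the equal weighting of populations, the function name and the example frequencies are all assumptions for illustration.

```python
def fst_from_allele_freqs(allele_freqs):
    """F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    allele_freqs: frequency of the same allele in each population
    (populations weighted equally, an assumption of this sketch).
    """
    n = len(allele_freqs)
    p_bar = sum(allele_freqs) / n
    h_total = 2 * p_bar * (1 - p_bar)                        # pooled expected heterozygosity
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / n  # mean within-population value
    return (h_total - h_within) / h_total if h_total > 0 else 0.0


# Hypothetical frequencies: strongly differentiated vs. identical populations.
fst_divergent = fst_from_allele_freqs([0.1, 0.9])
fst_identical = fst_from_allele_freqs([0.3, 0.3])
```

Values near zero, as in the second call, correspond to the similar allele frequencies the abstract reports for the 3′ region of the gene, while large values indicate strong differentiation between populations.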

18.
Genotyping costs still preclude analysis of a comprehensive SNP map in thousands of individual subjects in the search for disease susceptibility loci. Allele frequency estimation in DNA pools from cases and controls offers a partial solution, but variance in these estimates will result in some loss of statistical power. However, there has been no systematic attempt to quantify the several sources of error in previous studies. We report an analysis of the magnitude of variance components of each experimental stage in DNA pooling studies, and find that a design based on the formation of numerous small pools of approximately 50 individuals is superior to the formation of fewer, larger pools and the replication of any of the experimental stages. We conclude that this approach may retain an effective sample size greater than 68% of the true sample size, whilst offering a 60-fold reduction in DNA usage and a greater than 30-fold saving in cost, compared to individual genotyping. The possibility of combining pooling with informed selection of haplotype tag SNPs is also considered. In this way further savings in efficiency may be possible by using pooled allele frequency estimates to infer haplotype frequencies and hence, allele frequencies at untyped markers.

20.
The HLA region on chromosome 6 is gene-rich and under selective pressure because of the high proportion of immunity-related genes. Linkage disequilibrium (LD) patterns and allele frequencies in this region are highly differentiated across broad geographical populations, making it a region of interest for population genetics and immunity-related disease studies. We examined LD in this important region of the genome in six European populations using 166 putatively neutral SNPs and the classical HLA-A, -B and -C gene alleles. We found that the pattern of association between classic HLA gene alleles and SNPs implied that most of the SNPs predated the origin of classic HLA gene alleles. The SNPs most strongly associated with HLA gene alleles were in some cases highly predictive of the HLA allele carrier status (misclassification rates ranged from <1 to 27%) in independent populations using five or fewer SNPs, a much smaller number than tagSNP panels previously proposed and often with similar accuracy, showing that our approach may be a viable solution to designing new HLA prediction panels. To describe the LD within this region, we developed a new haplotype clustering method/software based on r², which may be more appropriate for use within regions of strong LD. Haplotype blocks created using this proposed method, as well as classic HLA gene alleles and SNPs, were predictive of a northern versus southern European population membership (misclassification error rates ranged from 0 to 23%, depending on which independent population was used for prediction), indicating that this region may be a rich source of ancestry informative markers.
