Similar Literature
1.
Haplotype inference is an indispensable technique in medical science, especially in genome-wide association studies. The conventional expectation-maximization (EM) inference method of Excoffier and Slatkin is a standard approach, but because its computational cost grows exponentially with the maximum number of heterozygous loci, it has not been widely applied. We propose a haplotype inference method that can empirically accommodate up to several tens of single nucleotide polymorphism loci in a single haplotype block while maintaining criteria exactly equivalent to those of the EM algorithm. The idea is to reduce the cost of the EM calculation with a haplotype-grouping preprocess that exploits the symmetric and inclusive relationships among haplotypes under Hardy–Weinberg equilibrium. Tests on real data sets showed that the proposed method has a wider range of application than the EM algorithm.
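A minimal sketch of the standard EM haplotype-frequency estimator that the abstract uses as its baseline, assuming biallelic SNPs coded as 0, 1 or 2 copies of allele 1 and Hardy–Weinberg equilibrium. It enumerates every phase-compatible haplotype pair per genotype, which is exactly the 2^(k-1) growth in the number k of heterozygous loci that the proposed grouping preprocess is designed to avoid; the grouping itself is not shown, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple with 0, 1 or 2 copies of allele 1 at each SNP.
    A genotype with k heterozygous sites yields 2**(k-1) pairs (1 if k == 0).
    """
    base = [1 if g == 2 else 0 for g in genotype]
    het = [i for i, g in enumerate(genotype) if g == 1]
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    # Put allele 1 of the first heterozygous site on h1 so each pair appears once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        h1[het[0]], h2[het[0]] = 1, 0
        for site, b in zip(het[1:], bits):
            h1[site], h2[site] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_freqs(genotypes, max_iter=500, tol=1e-12):
    """EM estimate of haplotype frequencies under Hardy-Weinberg equilibrium."""
    expansions = [compatible_pairs(g) for g in genotypes]
    haps = {h for pairs in expansions for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in expansions:
            # E-step: P(pair | genotype) is proportional to p(h1) * p(h2); the
            # factor 2 for heterozygous pairs is shared by all pairs and cancels.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new = {h: counts[h] / n_chrom for h in haps}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq
```

For a genotype heterozygous at 30 SNPs, `compatible_pairs` would already return more than 500 million pairs, which illustrates why the plain EM does not scale to blocks of several tens of SNPs.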

2.
Recent studies suggest that the genome is organized into blocks of haplotypes, and efforts to create a genome-wide haplotype map of single-nucleotide polymorphisms (SNPs) are already underway. Haplotype blocks are defined algorithmically, and several algorithms have been proposed to date. However, little is known about their relative performance on real data or about the impact of allele frequencies and parameter choices on the detection of haplotype blocks and the markers that tag them. Here we present a formal comparison of two major algorithms, a linkage disequilibrium (LD)-based method and a dynamic programming algorithm (DPA), in three chromosomal regions differing in gene content and recombination rate. The two methods produced strikingly different results. DPA identified fewer and larger haplotype blocks, as well as a smaller set of tag SNPs, than the LD method. For both methods, the results depended strongly on allele frequency: decreasing the minor allele frequency led to up to a 3.7-fold increase in the number of haplotype blocks and tag SNPs. The definition of haplotype blocks and tag SNPs was also sensitive to parameter changes, but the results could not be reconciled simply by adjusting parameters. These results show that two major methods for detecting haplotype blocks and tag SNPs can produce different results on the same data and that these results are sensitive to marker allele frequencies and parameter choices. More information is needed to guide the choice of method, marker allele frequencies, and parameters in the development of a haplotype map.
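LD-based block definitions rest on pairwise measures such as D′ and r². A minimal sketch computing both for two biallelic SNPs from the two-locus haplotype frequency and the two allele frequencies (this input format is an assumption; block rules such as confidence bounds on D′ are not reproduced). D′ is returned signed; many tools report its absolute value.

```python
def ld_measures(p11, p1_, p_1):
    """Return (D, D', r^2) for two biallelic SNPs.

    p11: frequency of the haplotype carrying allele 1 at both SNPs
    p1_: allele-1 frequency at SNP A
    p_1: allele-1 frequency at SNP B
    """
    d = p11 - p1_ * p_1
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p1_ * (1 - p_1), (1 - p1_) * p_1)
    else:
        d_max = min(p1_ * p_1, (1 - p1_) * (1 - p_1))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p1_ * (1 - p1_) * p_1 * (1 - p_1)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```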

3.
One limitation of existing tagging SNP selection algorithms is that they assume the reported genotypes are error-free. In practice, however, genotyping errors are often unavoidable. Many tagging SNP selection methods depend heavily on estimated haplotype frequencies, and recent studies have demonstrated that even slight genotyping errors can seriously affect haplotype reconstruction and frequency estimation. Here we present a tagging SNP selection method that allows for genotyping errors. Our method modifies the pairwise r² tagging SNP selection algorithm proposed by Carlson et al. (2004): we replace the standard EM algorithm in Carlson's method with an EM algorithm that accounts for genotyping errors, in an attempt to obtain better estimates of the haplotype frequencies and the r² measure. Through simulation studies, we compared the performance of our modified algorithm with that of the original. The number of tags selected by both methods increased with increasing genotyping error, although our method led to a smaller increase. The power of haplotype association tests using the selected tags decreased dramatically with increasing genotyping error. The power of single-marker tests also decreased, but less than that of the haplotype tests. When the mean number of tags selected by both methods was restricted to be similar to the baseline number, Carlson's method and our method led to similar power for the subsequent haplotype and single-marker tests. Our results show that, by accounting for random genotyping errors, our method can select tagging SNPs more efficiently than Carlson's method. The computer program implementing our modified tagging SNP selection algorithm is available at our web site (http://www.personal.psu.edu/tuy104/).
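For orientation, a simplified sketch of greedy pairwise-r² tag selection in the spirit of Carlson et al. (2004), the algorithm the abstract modifies. It assumes a precomputed symmetric r² matrix (from whichever EM one prefers, standard or error-aware) and a threshold of 0.8; the minor-allele-frequency filters and forced-in SNPs of the original are omitted, and the function names are hypothetical.

```python
def _captured(i, remaining, r2, threshold):
    """Untagged SNPs (including i itself) that SNP i would tag."""
    return {j for j in remaining if j == i or r2[i][j] >= threshold}


def greedy_r2_bins(r2, threshold=0.8):
    """Greedy LD binning over a symmetric r^2 matrix (list of lists).

    Returns a list of (tag_index, sorted member indices); every member has
    r^2 >= threshold with the tag of its bin.
    """
    remaining = set(range(len(r2)))
    bins = []
    while remaining:
        # Pick the SNP that tags the most untagged SNPs; ties go to the lowest index.
        best = max(sorted(remaining),
                   key=lambda i: len(_captured(i, remaining, r2, threshold)))
        members = _captured(best, remaining, r2, threshold)
        bins.append((best, sorted(members)))
        remaining -= members
    return bins
```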

4.
Taking advantage of the increasing availability of high-density single nucleotide polymorphism (SNP) markers across the genome, various types of transmission/disequilibrium tests (TDTs) using haplotype information have been developed. A practical challenge in such studies is that transmitted haplotypes may have inherited disease-causing mutations from different ancestral chromosomes or may not carry any disease-causing mutation (founder heterogeneity). To reduce the loss of signal strength due to founder heterogeneity, we propose SP-TDT, a test that combines a sequential peeling procedure with the haplotype-similarity-based TDT method. SP-TDT is applicable to nuclear families of any size, with or without ambiguous phase information. Simulation studies suggest that SP-TDT has the correct type I error rate in stratified populations and greater power than some existing haplotype-similarity-based TDT methods. Finally, we apply the proposed method to study the association of the leptin gene with obesity in data from the National Heart, Lung, and Blood Institute Family Heart Study.
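As background only, a sketch of the classic single-allele TDT on which haplotype-based variants such as SP-TDT build: a McNemar statistic on how often heterozygous parents transmit versus do not transmit an allele. The sequential peeling and haplotype-similarity steps of SP-TDT are not shown.

```python
import math


def tdt(transmitted, untransmitted):
    """Classic TDT from heterozygous parents.

    transmitted (b): times the allele was transmitted to an affected child
    untransmitted (c): times it was not transmitted
    Returns (chi-square statistic with 1 df, p-value).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (b - c) ** 2 / (b + c)
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

For example, 60 transmissions against 40 non-transmissions give a statistic of 4.0 and a p-value of about 0.046.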

5.
Haplotypes can hold key information for understanding the role of candidate genes in disease etiology. However, standard haplotype analysis has not yet been able to fully reveal the information that haplotypes carry. In most analyses, haplotype inference focuses on effects relative to an arbitrarily chosen baseline haplotype. It does not depict the effect structure unless an additional inference procedure is used in a secondary post hoc analysis, and such analyses tend to lack power. In this study, we propose a penalized regression approach to systematically evaluate the pattern and structure of haplotype effects. By placing an L1 penalty on the pairwise differences of the haplotype effects, we present a model-based haplotype analysis to detect and characterize haplotypic association signals. The proposed method avoids the need to choose a baseline haplotype; it simultaneously estimates and compares the effects of all haplotypes and outputs a grouping of haplotypes based on their effect sizes. Finally, our penalty weights are theoretically designed to balance the likelihood and the penalty term appropriately. The proposed method can be used as a tool to interpret candidate regions identified by a genome or chromosomal scan. Simulation studies show that the proposed method identifies the haplotype effect structure better than traditional haplotype association methods, demonstrating its informativeness and power.
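The penalized criterion described above can be written as follows, under generic notation assumed here rather than taken from the paper: ℓ is the log-likelihood of the haplotype regression model, β_1 to β_H are the haplotype effects, λ ≥ 0 is a tuning parameter, and w_jk are pairwise penalty weights (the paper's specific theoretical choice of weights is not reproduced).

```latex
\hat{\boldsymbol\beta}
  = \arg\min_{\boldsymbol\beta}
    \Big\{ -\ell(\boldsymbol\beta)
      + \lambda \sum_{1 \le j < k \le H} w_{jk}\, \lvert \beta_j - \beta_k \rvert \Big\}
```

Because the L1 penalty acts on differences, it can set some differences exactly to zero. Haplotypes with fused estimates (\hat\beta_j = \hat\beta_k) then form one group, which is how a group structure can be read off without choosing a baseline haplotype.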

6.
Undetected genotyping errors pose a problem in genetic epidemiological studies, as they may invalidate statistical analyses or reduce their power. Haplotype analysis demands higher data quality, because a haplotype can be inferred correctly only if the genotypes at all of its markers are correct. Here, we present a method that identifies probable genotyping errors in trio samples using the estimated haplotype frequency distribution of the sample. If the likelihood of the most likely haplotype explanation depends strongly on a single genotype, in the sense that setting that genotype to missing yields a much more likely haplotype explanation, the genotype is flagged as a potential genotyping error. We describe a method that systematically searches the whole data set for such potential errors. Based on the haplotype distribution of a real data set, we carry out a simulation study to estimate the sensitivity and specificity of the method. In addition, we apply our approach to the real data set itself, re-determining potentially erroneous genotypes by sequencing. The results of both the simulation study and the application to the real data set show that a considerable proportion of true genotyping errors is detected and that the number of false-positive signals is acceptable. We conclude that it is indeed possible to identify probable genotyping errors by considering haplotypes. The method described here will be part of the next release of our FAMHAP software.
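The screening rule can be sketched for a single unphased individual. This is a simplification: the published method works on trios and searches the whole data set, and the frequency table, function names and ratio threshold of 100 here are all hypothetical. For each SNP, the best haplotype explanation with that genotype set to missing is compared with the best explanation of the observed genotypes, and a large ratio flags the genotype.

```python
def best_explanation(genotype, freqs):
    """Highest HWE probability of a haplotype pair consistent with a genotype.

    genotype: entries are 0/1/2 copies of allele 1, or None for missing.
    freqs: dict mapping haplotype tuples to estimated frequencies.
    """
    haps = list(freqs)
    best = 0.0
    for i, h1 in enumerate(haps):
        for h2 in haps[i:]:  # unordered pairs, including homozygous ones
            if all(g is None or h1[k] + h2[k] == g for k, g in enumerate(genotype)):
                prob = freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2)
                best = max(best, prob)
    return best


def flag_errors(genotype, freqs, ratio=100.0):
    """Indices whose masking makes the best explanation > ratio times more likely."""
    observed = best_explanation(genotype, freqs)
    flagged = []
    for k in range(len(genotype)):
        masked = list(genotype)
        masked[k] = None
        if best_explanation(masked, freqs) > ratio * observed:
            flagged.append(k)
    return flagged
```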

7.
Motivated by the increasing availability of high-density single nucleotide polymorphism (SNP) markers across the genome, various haplotype-based methods have been developed for candidate gene association studies and even for genome-wide association studies. Although haplotype approaches dramatically reduce the multiple comparisons problem compared with single-SNP analysis, the number of existing haplotypes can still be relatively large, which increases the degrees of freedom and decreases the power of the corresponding test statistic. Grouping haplotypes is one way to reduce the degrees of freedom. We propose a procedure that uses a tree-based recursive partitioning algorithm to group haplotypes into a small number of clusters and then conducts the association test on groups of haplotypes instead of individual haplotypes. The method can be used for both population-based and family-based association studies, with known or ambiguous phase information. Simulation studies suggest that the proposed method has the correct type I error rate and is more powerful than some existing haplotype-based tests.
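To see why grouping helps, here is a sketch of a Pearson chi-square test on a case-control haplotype count table, before and after merging haplotypes into clusters. The grouping map is supplied by hand as an assumption; it stands in for the tree-based recursive partitioning, which is not reproduced.

```python
def chi_square(case_counts, control_counts):
    """Pearson chi-square for a 2 x m table of haplotype counts.

    Returns (statistic, degrees of freedom). Assumes no haplotype column
    has zero total count (its expected counts would be zero).
    """
    n_case, n_ctrl = sum(case_counts), sum(control_counts)
    total = n_case + n_ctrl
    stat = 0.0
    for a, b in zip(case_counts, control_counts):
        col = a + b
        for obs, row in ((a, n_case), (b, n_ctrl)):
            exp = row * col / total
            stat += (obs - exp) ** 2 / exp
    return stat, len(case_counts) - 1


def merge(counts, groups):
    """Sum counts within groups; groups[i] is the group id of haplotype i."""
    merged = {}
    for i, c in enumerate(counts):
        merged[groups[i]] = merged.get(groups[i], 0) + c
    return [merged[g] for g in sorted(merged)]
```

With four haplotypes merged into two clusters, the test drops from 3 to 1 degree of freedom.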

8.
Recent studies have revealed that linkage disequilibrium (LD) patterns vary across the human genome, with regions of high LD interspersed by regions of low LD. A small fraction of SNPs (tag SNPs) is sufficient to capture most of the haplotype structure of the human genome. In this paper, we develop a method to partition haplotypes into blocks and to identify tag SNPs from genotype data by combining a dynamic programming algorithm for haplotype-data-based block partitioning and tag SNP selection with a variation of the expectation-maximization (EM) algorithm for haplotype inference. Using extensive simulations, we assess the effects of using either haplotype or genotype data in haplotype block identification and tag SNP selection as a function of several factors, including sample size, density or number of SNPs studied, allele frequencies, fraction of missing data, and genotyping error rate. We find that a modest number of haplotype or genotype samples results in consistent block partitions and tag SNP selection. The power of association studies based on tag SNPs using genotype data is similar to that using haplotype data.
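The dynamic-programming step for block partitioning can be sketched as follows. It assumes a caller-supplied function `tags_needed(i, j)` (hypothetical) that returns the number of tag SNPs required for a block spanning SNPs i to j, or None if that span does not qualify as a block. The actual block criterion and the EM-based handling of genotype data are not reproduced. The recursion minimizes the total number of tag SNPs over all partitions into contiguous blocks.

```python
def partition_blocks(n_snps, tags_needed):
    """Minimum-total-tag partition of SNPs 0..n_snps-1 into contiguous blocks.

    tags_needed(i, j) -> int, or None if SNPs i..j are not an admissible block.
    Returns (total_tags, list of (start, end) blocks, both inclusive).
    """
    inf = float("inf")
    best = [0] + [inf] * n_snps  # best[j]: optimal cost for SNPs 0..j-1
    back = [None] * (n_snps + 1)  # back[j]: start of the last block ending at j-1
    for j in range(1, n_snps + 1):
        for i in range(j):
            cost = tags_needed(i, j - 1)
            if cost is not None and best[i] + cost < best[j]:
                best[j] = best[i] + cost
                back[j] = i
    if best[n_snps] == inf:
        raise ValueError("no admissible partition")
    blocks, j = [], n_snps
    while j > 0:  # trace the optimal block boundaries backwards
        i = back[j]
        blocks.append((i, j - 1))
        j = i
    return best[n_snps], blocks[::-1]
```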

9.
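The structure of haplotypes and haplotype blocks in the human genome provides valuable information about human evolution and has become an effective strategy for discovering susceptibility genes for complex human diseases. A haplotype block can be divided into multiple discrete regions of limited haplotype diversity, and a small number of tag single nucleotide polymorphisms (tSNPs) that represent the structural features of each region can distinguish most haplotypes from one another. Tag SNPs therefore play an important role in the construction of haplotypes and haplotype blocks and in association studies. Methods for constructing haplotypes and haplotype blocks fall into two categories: those based on genotyping data from large pedigrees and those based on statistical algorithms. By systematically reviewing several methods for constructing haplotypes and haplotype blocks, we examine their power in association studies under different disease models or different partitioning criteria, and objectively evaluate the strengths, weaknesses, and prospects of each method and its application in association studies. With the completion of the International HapMap and the refinement of statistical algorithms for haplotype construction, haplotype construction methods that integrate mathematics, physics, computer science, and other disciplines will have a profound impact on human genetics, on the mapping, cloning, and identification of susceptibility genes for complex diseases, and on related fields of the life sciences.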
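The idea that a few tag SNPs can tell most haplotypes in a block apart can be made concrete with a brute-force sketch: find the smallest set of SNP positions whose alleles alone give every distinct haplotype a unique pattern. Exhaustive search is an assumption chosen for clarity and is only feasible for small blocks; the function name is hypothetical.

```python
from itertools import combinations


def minimal_tag_snps(haplotypes):
    """Smallest tuple of SNP indices that distinguishes all distinct haplotypes.

    haplotypes: equal-length sequences of alleles (e.g. strings of '0'/'1').
    """
    distinct = sorted(set(haplotypes))
    n_snps = len(distinct[0])
    for size in range(n_snps + 1):
        for subset in combinations(range(n_snps), size):
            patterns = {tuple(h[i] for i in subset) for h in distinct}
            if len(patterns) == len(distinct):  # every haplotype uniquely identified
                return subset
    return tuple(range(n_snps))  # not reached: all SNPs always suffice
```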

10.
An empirical method of sample size determination for building prediction models was proposed recently. The permutation method used in this procedure is a common way to address overfitting during cross-validation when evaluating the performance of prediction models built from microarray data. However, a major drawback of such methods, which include bootstrapping and full permutation, is the prohibitively high computational cost of calculating the sample size. In this paper, using both simulated and real data sets, we show that a single representative null distribution can be used instead of full permutation. In simulations with a data set of zero effect size, we confirmed that the empirical type I error approaches 0.05; hence the method can be applied with confidence to reduce overfitting during cross-validation. We observed that a pilot data set generated by random sampling from real data could be used successfully for sample size determination. We present results from an experiment repeated 300 times, which produced results comparable to those of the full permutation method. Because full permutation is eliminated, the sample size estimation time does not depend on the pilot data size; in our experiment the process took around 30 min. With the increasing number of clinical studies, efficient sample size determination methods for building prediction models are critical, but empirical methods using bootstrapping and permutation usually involve high computing costs. In this study, we propose a method that drastically reduces the required computing time by using a representative null distribution of permutations. We use data from pilot experiments to apply this method to the efficient design of clinical studies with high-throughput data.
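The saving comes from reusing one representative null distribution instead of re-permuting for every evaluation. A minimal sketch of the comparison step, assuming the null statistics (for example, cross-validated accuracies from label-permuted pilot data) have already been computed once; the function names are hypothetical.

```python
import bisect


def make_null_lookup(null_stats):
    """Sort the null statistics once so each later query costs O(log B)."""
    return sorted(null_stats)


def empirical_p(observed, sorted_null):
    """One-sided permutation p-value with the usual +1 correction:
    (1 + #{null >= observed}) / (1 + B)."""
    n_ge = len(sorted_null) - bisect.bisect_left(sorted_null, observed)
    return (1 + n_ge) / (1 + len(sorted_null))
```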

11.
Recently, mass spectrometry analysis has become an effective and rapid approach to detecting early-stage cancer. To identify proteomic patterns in serum that discriminate cancer patients from normal individuals, machine learning methods such as feature selection and classification have already been applied to mass spectrometry (MS) data with some success. However, the performance of existing machine learning methods for MS data analysis still needs improvement. This paper proposes a wavelet-based pre-processing approach to MS data analysis, which applies wavelet transforms to MS data with the aim of de-noising data that may have been contaminated during acquisition. The effects of the choice of wavelet function and decomposition level on de-noising performance are also investigated. Our comparative experimental results demonstrate that the proposed de-noising pre-processing approach has the potential to remove noise embedded in MS data, which can improve the performance of existing machine learning methods in cancer detection.
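A minimal sketch of wavelet-based de-noising for a 1-D spectrum. It assumes the Haar wavelet, a signal length divisible by 2**levels, and soft thresholding with the universal threshold sigma * sqrt(2 ln n), where sigma is estimated from the finest detail coefficients by the median absolute deviation. These are common defaults, not the wavelet functions or decomposition levels evaluated in the paper.

```python
import math
import statistics


def haar_forward(x, levels):
    """Multi-level Haar DWT; returns (approximation, details), finest level first."""
    approx, details = list(x), []
    for _ in range(levels):
        half = len(approx) // 2
        a = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        d = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details.append(d)
        approx = a
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, starting from the coarsest detail level."""
    x = list(approx)
    for d in reversed(details):
        out = []
        for a, dd in zip(x, d):
            out += [(a + dd) / math.sqrt(2), (a - dd) / math.sqrt(2)]
        x = out
    return x


def soft(c, t):
    """Soft thresholding: shrink a coefficient toward zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def denoise(signal, levels=3):
    approx, details = haar_forward(signal, levels)
    # Noise level from the finest details (MAD / 0.6745), then universal threshold.
    sigma = statistics.median([abs(c) for c in details[0]]) / 0.6745
    t = sigma * math.sqrt(2 * math.log(len(signal)))
    details = [[soft(c, t) for c in d] for d in details]
    return haar_inverse(approx, details)
```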

12.
We recently presented a testing procedure for family data that accounts for the multiple testing problem induced by the enormous number of different marker combinations that can be analyzed in a set of tightly linked markers. Most methods of haplotype-based association analysis already require simulations to obtain an uncorrected P value for a specific marker combination. As shown before, it is nevertheless unnecessary to carry out nested simulations to obtain a global P value that properly corrects for the multiple testing of different marker combinations without neglecting the dependency among the tests. We have now implemented this approach for case-control data in our program FAMHAP, as this data structure currently plays a dominant role in the field. We consider different ways to deal with phase ambiguities and two different statistical tests for the underlying single marker combinations to obtain uncorrected P values: one test statistic is chi-square based, the other is a haplotype trend regression. The performance of these tests in the multiple testing situation is investigated in a large simulation study. For all suggested test statistics, our global P values yield a considerable gain in power over Bonferroni-corrected P values. Good power was obtained both with the haplotype trend regression approach and with the simpler chi-square-based test. Furthermore, we conclude that the better strategy for dealing with phase ambiguities is to assign to each individual its list of weighted haplotype explanations rather than only its most likely haplotype explanation. Finally, we demonstrate the usefulness of our approach with a real data example.
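The point that a global P value needs no nested simulations can be sketched with a generic min-P construction (not FAMHAP's code): the observed data and the B simulated replicates are treated as exchangeable, every data set gets an empirical P value for every marker combination from the same pool of statistics, and the global P value compares the observed minimum P with the replicates' minimum P values. The input layout, with larger statistics meaning stronger association, is an assumption.

```python
def global_p(observed, simulated):
    """Uncorrected and min-P-corrected P values without nested simulation.

    observed: one test statistic per marker combination.
    simulated: list of replicates, each a list with one statistic per combination.
    Returns (list of uncorrected P values, global P value).
    """
    sets = [list(observed)] + [list(rep) for rep in simulated]  # index 0 = observed
    n_sets, n_comb = len(sets), len(observed)

    def p_value(k, m):
        # Share of all n_sets statistics for combination m at least as large as sets[k][m].
        x = sets[k][m]
        return sum(s[m] >= x for s in sets) / n_sets

    uncorrected = [p_value(0, m) for m in range(n_comb)]
    min_p = [min(p_value(k, m) for m in range(n_comb)) for k in range(n_sets)]
    # Global P: share of data sets whose minimum P is at most the observed minimum.
    return uncorrected, sum(mp <= min_p[0] for mp in min_p) / n_sets
```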

13.
Rare haplotypes may tag rare causal variants of common diseases; hence, detecting such rare haplotypes may also contribute to our understanding of complex disease etiology. Because rare haplotypes frequently arise from common single-nucleotide polymorphisms (SNPs), focusing on rare haplotypes is much more economical than using rare single-nucleotide variants (SNVs) from sequencing, as SNPs are available and 'free' from the genome-wide studies already amassed. Further, associated haplotypes may shed light on the underlying causal disease mechanism, a feat unmatched by SNV-based collapsing methods. In recent years, data mining approaches have been adapted to detect rare haplotype associations. However, because they rely on an assumed underlying disease model and require the specification of a null haplotype, their results can be erroneous if these assumptions are violated. In this paper, we present a haplotype association method for case-control samples based on Kullback–Leibler divergence (hapKL). The idea is to compare the haplotype frequencies of cases and controls by computing symmetric divergence measures. An important property of such measures is that both the frequencies and the logarithms of the frequencies contribute in parallel, thus balancing the contributions of rare and common haplotypes and accommodating both deleterious and protective haplotypes. A simulation study under various scenarios shows that hapKL has well-controlled type I error rates and good power compared with existing data mining methods. Applying hapKL to age-related macular degeneration (AMD) shows a strong association of the complement factor H (CFH) gene with AMD and identifies several individual rare haplotypes with strong signals.
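A minimal sketch of a symmetric Kullback–Leibler (J) divergence between case and control haplotype frequency distributions, J = Σ (p_h - q_h) log(p_h / q_h), which shows how frequencies and log frequencies enter together. The pseudocount that keeps zero counts finite is an assumption, and hapKL's exact statistic and its significance assessment are not reproduced.

```python
import math


def haplotype_freqs(counts, haplotypes, pseudo=0.5):
    """Frequencies over a fixed haplotype list, with a pseudocount per haplotype."""
    total = sum(counts.get(h, 0) + pseudo for h in haplotypes)
    return {h: (counts.get(h, 0) + pseudo) / total for h in haplotypes}


def symmetric_kl(case_counts, control_counts, pseudo=0.5):
    """KL(p||q) + KL(q||p) between case (p) and control (q) haplotype frequencies."""
    haps = sorted(set(case_counts) | set(control_counts))
    p = haplotype_freqs(case_counts, haps, pseudo)
    q = haplotype_freqs(control_counts, haps, pseudo)
    # Each term (p - q) * log(p / q) is non-negative, whether p > q or p < q,
    # so risk-raising and protective haplotypes both add to the divergence.
    return sum((p[h] - q[h]) * math.log(p[h] / q[h]) for h in haps)
```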

14.
Mo Chunmei (莫春梅), Zhou Jinzhi (周金治), Li Xue (李雪), Yu Xi (余玺). Chinese Journal of Medical Physics, 2021, (5): 571-577
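To address the low segmentation accuracy of existing liver image segmentation methods, an improved U-Net liver segmentation method is proposed. The method modifies the U-Net architecture by introducing an improved residual module and redesigning the skip connections, and it adopts a hybrid loss function. These changes improve the utilization of feature information, reduce the semantic gap between the encoder and the decoder, alleviate class imbalance, and speed up network convergence. Experimental results on the public LITS (Liver Tumor Segmentation) dataset provided by CodaLab show that the method achieves a Dice similarity coefficient, sensitivity, and intersection over union of 93.69%, 94.87%, and 87.49%, respectively. Compared with segmentation methods such as U-Net and Attention U-Net, the method segments the liver region more accurately and achieves better segmentation performance.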
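The three reported metrics can be computed from a predicted and a reference binary mask as below. Flattened lists of 0/1 pixels are an assumption for brevity, and the hybrid loss itself is not reproduced.

```python
def segmentation_metrics(pred, truth):
    """Dice, sensitivity and IoU for flattened binary masks.

    Assumes the masks are not both empty (the ratios would divide by zero).
    """
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    dice = 2 * tp / (2 * tp + fp + fn)
    sensitivity = tp / (tp + fn)  # share of true liver pixels recovered
    iou = tp / (tp + fp + fn)
    return dice, sensitivity, iou
```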

15.
In the attempt to understand human variation and the genetic basis of complex disease, a tremendous number of single nucleotide polymorphisms (SNPs) have been discovered and deposited in NCBI's dbSNP public database. More than 2.7 million SNPs in the database have genotype information. These data provide an invaluable resource for understanding the structure of human variation and for designing genetic association studies. The genotypes deposited in dbSNP are unphased, so the haplotype information is unknown. We applied the phasing method HAP to obtain haplotypes, block partitions, and tag SNPs for all publicly available genotype data and deposited this information in the dbSNP database. We also deposited the orthologous chimpanzee reference sequence for each predicted haplotype block, computed using the UCSC BLASTZ alignments of human and chimpanzee. Using dbSNP, researchers can now easily perform analyses using multiple genotype data sets from the same genomic regions. Combining dense and sparse genotype data sets from the same region showed that the number of common haplotypes is significantly underestimated in whole-genome data sets, while the predicted haplotypes over the common SNPs are consistent between studies. To validate the accuracy of the predictions, we benchmarked HAP's running time and phasing accuracy against PHASE. Although HAP is slightly less accurate than PHASE, it is over 1000 times faster, making it suitable for application to the entire set of genotypes in dbSNP.

16.
Recently, interest has been increasing in genetic association studies using several closely linked loci. The HAP-TDT method, which uses case-parents trios, is powerful for such a task. In practice, however, it is not uncommon for one parent to be missing, for reasons such as late onset, which reduces case-parents trios to case-parent pairs. Discarding such data could lead to a severe loss of power. In this paper, we propose the HAP-1-TDT method, based on case-parent pairs, to detect haplotype/disease association. A permutation-based randomisation technique is devised to assess the significance of the test statistic. Furthermore, the combined statistic HAP-C-TDT is developed to jointly use case-parents trios and case-parent pairs. These test statistics can be applied to either phase-known or phase-unknown data. A number of simulation studies were conducted to investigate the validity of the proposed tests; they show that the statistics are robust to population structure. Using several disease genes from the literature, we illustrate that incorporating case-parent pairs into an association study leads to a noticeable power gain. Moreover, our simulation results suggest that our method has better size and power than UNPHASED. Finally, in simulated scenarios with only a few SNPs where risk is determined by two haplotypes that are complementary or near-complementary, our method has better power than TRIMM.

17.
The analysis of cattle MHC (BoLA) class I gene expression is an essential component of studies on immune responses and susceptibility to disease. International BoLA workshops have generated data and reagents that allow discrimination of class I molecules at the haplotype level, but progress has been limited by difficulties encountered in defining single alleles. Our aim in this study was to develop a DNA-based system for improved identification of expressed class I alleles, utilizing available cDNA sequences derived from cattle carrying a series of serologically defined class I specificities. This method has allowed more accurate typing of animals for expression of the class I genes present within a small number of haplotypes. The method has also reliably differentiated between allelic variants (identified by prior sequence analysis) and has split existing serological specificities. The data show that MHC class I genes in cattle are more polymorphic than demonstrated by serology and biochemical analysis.

18.
Li D, Collier DA, He L. Human Molecular Genetics, 2006, 15(12): 1995-2002
Chromosome 8p22-p11 has been identified as a locus for schizophrenia in several genome-wide scans and confirmed by meta-analysis of published linkage data. Systematic fine mapping using extended Icelandic pedigrees identified an associated haplotype in the gene neuregulin 1 (NRG1), also known as heregulin, glial growth factor, NDF43 and ARIA. A 290 kb core at-risk haplotype at the 5' end of the gene (HAP(ICE)), defined by five SNPs and two microsatellite polymorphisms, was found to be associated with schizophrenia in the Icelandic and Scottish populations. A number of subsequent independent studies have attempted to replicate the association, and while some have succeeded, the associated haplotype is not always HAP(ICE). Furthermore, no obviously functional or pathogenic variants have been identified, and the relationship between the gene and schizophrenia has remained inconclusive. To reconcile these conflicting findings and give a comprehensive picture of the genetic architecture of this important gene, we performed a meta-analysis of 13 published population-based and family-based association studies up to November 2005. We analysed data from the SNP markers SNP8NRG241930, SNP8NRG243177, SNP8NRG221132 and SNP8NRG221533 and the microsatellite markers 478B14-848 and 420M9-1395. Across these studies, strong positive association was found for all six polymorphisms. Haplotype analysis also showed significant association in the pooled international populations (OR = 1.22, 95% CI 1.15-1.3, P = 8 × 10⁻¹⁰). In Asian populations, the risk haplotype was centred on the two microsatellite markers 478B14-848 and 420M9-1395 (haplotype block B), whereas in Caucasian populations it involved the remaining four SNP markers (haplotype block A). This meta-analysis supports the involvement of NRG1 in the pathogenesis of schizophrenia, but with association at two different but adjacent haplotype blocks in Caucasian and Asian populations.
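Pooled odds ratios like the one reported above are commonly obtained by inverse-variance weighting of per-study log odds ratios. A minimal fixed-effect sketch, assuming each study is summarized by its OR and 95% CI. The abstract does not state whether a fixed- or random-effects model was used, and the test inputs are made-up numbers, not data from the study.

```python
import math


def pooled_or(studies, z=1.959964):
    """Fixed-effect inverse-variance pooling of odds ratios.

    studies: list of (odds_ratio, ci_low, ci_high) with 95% confidence intervals.
    Returns (pooled OR, pooled CI low, pooled CI high).
    """
    num = den = 0.0
    for or_, lo, hi in studies:
        se = (math.log(hi) - math.log(lo)) / (2 * z)  # SE of log OR from CI width
        w = 1.0 / se ** 2
        num += w * math.log(or_)
        den += w
    log_or, se_pooled = num / den, math.sqrt(1.0 / den)
    return (math.exp(log_or),
            math.exp(log_or - z * se_pooled),
            math.exp(log_or + z * se_pooled))
```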

19.
Candidate gene association tests are currently performed using several intragenic SNPs simultaneously, by testing SNP haplotype or genotype effects in multifactorial diseases or traits. The number of haplotypes increases drastically with the number of typed SNPs. As a result, a large number of haplotypes introduces many degrees of freedom into haplotype-based tests and thus limits their power. In this study, we propose using the principal component method to reduce the dimension and then constructing association tests in the lower-dimensional space to test the association between haplotypes and a quantitative trait using population-based samples. The proposed method accommodates ambiguous haplotypes. We use simulation studies to evaluate the type I error rate of the tests and compare the power of the proposed tests with that of tests without dimension reduction and of tests with dimension reduction by merging rare haplotypes. The simulation results show that the proposed tests have correct type I error rates and are more powerful than the other tests in most of the cases considered in our simulation studies.
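A minimal sketch of the dimension-reduction idea using NumPy. It assumes an n × H matrix of haplotype dosages (0, 1 or 2 copies per haplotype, or expected counts when phase is ambiguous) and a quantitative trait. The data are projected onto the top k principal components, which are then tested jointly with an ordinary least squares F statistic. The choice of k and the F test are assumptions standing in for the paper's specific test.

```python
import numpy as np


def pc_scores(X, k):
    """Scores of the n x H dosage matrix X on its top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


def f_statistic(y, Z):
    """OLS F statistic for the joint effect of the columns of Z on trait y.

    Returns (F, (numerator df, denominator df)).
    """
    n, k = Z.shape
    A = np.column_stack([np.ones(n), Z])  # intercept plus k PC scores
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss1 = float(np.sum((y - A @ beta) ** 2))
    rss0 = float(np.sum((y - y.mean()) ** 2))
    return ((rss0 - rss1) / k) / (rss1 / (n - k - 1)), (k, n - k - 1)
```

Testing k components instead of all H haplotypes reduces the numerator degrees of freedom from H to k.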

20.
Gene-gene interaction may play important roles in complex diseases, in which interaction effects act together with single-gene effects. Many interaction models have been proposed since the beginning of the last century. However, existing approaches, including statistical and data mining methods, rarely consider genetic interaction models, so their interaction results often lack biological or genetic meaning. In this study, we developed an entropy-based method that integrates two-locus genetic models to explore such interaction effects. We applied our method to simulated and real data for evaluation. Simulation results show that the method is effective at detecting gene-gene interaction and, furthermore, can identify the best-fitting model among various interaction models. Moreover, when applied to malaria data, our method successfully revealed a negative epistatic effect between sickle cell anemia and α+-thalassemia against malaria.
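Entropy-based interaction screens commonly ask how much two loci tell about disease status jointly beyond what each tells separately. A sketch of that generic quantity (interaction information, or information gain) computed from (genotype A, genotype B, status) observations; it is an assumption standing in for the paper's measure and does not include its integration of two-locus genetic models.

```python
import math
from collections import Counter


def entropy(counter):
    """Shannon entropy in bits of the empirical distribution in a Counter."""
    n = sum(counter.values())
    return -sum(c / n * math.log2(c / n) for c in counter.values() if c)


def mutual_info(pairs):
    """I(X; Y) in bits from a list of (x, y) observations."""
    xs = Counter(x for x, _ in pairs)
    ys = Counter(y for _, y in pairs)
    return entropy(xs) + entropy(ys) - entropy(Counter(pairs))


def interaction_gain(samples):
    """IG = I(A,B; D) - I(A; D) - I(B; D); positive values suggest synergy."""
    joint = mutual_info([((a, b), d) for a, b, d in samples])
    main_a = mutual_info([(a, d) for a, _, d in samples])
    main_b = mutual_info([(b, d) for _, b, d in samples])
    return joint - main_a - main_b
```

For a purely epistatic (XOR-like) model, neither locus carries any information alone, and the gain equals the full 1 bit of joint information.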
