Similar literature
1.
Single nucleotide polymorphisms (SNPs) are the most common form of human genetic variation, with millions present in the human genome. Because only 1% might be expected to confer more than modest individual effects in association studies, the selection of predictive candidate variants for complex disease analyses is formidable. Technologic advances in SNP discovery and the ever-changing annotation of the genome have led to massive informational resources that can be difficult to master across disciplines. A simplified guide is needed. Although methods for evaluating nonsynonymous coding SNPs are known, several other publicly available computational tools can be utilized to assess polymorphic variants in noncoding regions. As an example, the authors applied multiple methods to select SNPs in DNA double-strand break repair genes. They chose to evaluate SNPs that occurred among a preexisting set of 57 validated assays and to justify new assay development for 83 potential SNPs in the DNA-dependent protein kinase catalytic subunit. Of the 140 SNPs, the authors eliminated 119 variants with low or neutral predictions. The existing computational methods they used and the semiquantitative relative ranking strategy they developed can be adapted to a priori SNP selection or post hoc evaluation of variants identified in whole genome scans or within haplotype blocks associated with disease. The authors show a "real world" application of some existing bioinformatics tools for use in large epidemiologic studies and genetic analyses. They also reviewed alternative approaches that provide related information.

2.
Liu Z, Lin S. Genetic Epidemiology, 2005, 29(4): 353-364
Linkage disequilibrium (LD) plays a central role in fine mapping of disease genes and, more recently, in characterizing haplotype blocks. Classical LD measures, such as $D'$ and $r^2$, are frequently used to quantify the relationship between two loci. A pairwise "distance" matrix among a set of loci can be constructed using such a measure, and a number of haplotype block detection and tagging single nucleotide polymorphism (SNP) selection algorithms have been devised on that basis. Although successful in many applications, the pairwise nature of these measures does not provide a direct characterization of joint linkage disequilibrium among multiple loci. Consequently, applications based on them may lead to loss of important information. In this report, we propose a multilocus LD measure based on generalized mutual information, which is also known as relative entropy or Kullback-Leibler distance. In essence, this measure seeks to quantify the distance between the observed haplotype distribution and the expected distribution assuming linkage equilibrium. We can show that this measure is approximately equal to $r^2$ in the special case with two loci. Based on this multilocus LD measure and an entropy measure that characterizes haplotype diversity, we propose a class of stepwise tagging SNP selection algorithms. This represents a unified approach for SNP selection in that it takes into account both the haplotype diversity and linkage disequilibrium objectives. Applications to both simulated and real data demonstrate the utility of the proposed methods for handling a large number of SNPs. The results indicate that multilocus LD patterns can be captured well, and informative and nonredundant SNPs can be selected effectively from a large set of loci.
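The idea of measuring multilocus LD as a distance between the observed haplotype distribution and the one expected under linkage equilibrium can be sketched as follows. This is a minimal illustration only: it computes a plain Kullback-Leibler divergence in nats, and the function name `multilocus_ld` and the string encoding of haplotypes are assumptions made here, not the authors' implementation or their exact normalization.

```python
from collections import Counter
from math import log


def multilocus_ld(haplotypes):
    """Kullback-Leibler divergence (in nats) between the observed haplotype
    distribution and the product of per-locus allele frequencies.

    `haplotypes` is a list of equal-length strings, one character per locus
    (a hypothetical encoding chosen for this sketch).
    """
    n = len(haplotypes)
    n_loci = len(haplotypes[0])

    # Observed haplotype frequencies.
    hap_freq = {h: c / n for h, c in Counter(haplotypes).items()}

    # Marginal allele frequencies at each locus.
    allele_freq = [
        {a: c / n for a, c in Counter(h[i] for h in haplotypes).items()}
        for i in range(n_loci)
    ]

    kl = 0.0
    for hap, p in hap_freq.items():
        # Frequency expected under linkage equilibrium (independent loci).
        expected = 1.0
        for i, allele in enumerate(hap):
            expected *= allele_freq[i][allele]
        kl += p * log(p / expected)
    return kl
```

With two loci in complete LD and equal allele frequencies, `multilocus_ld(["00", "11"])` equals log 2, while the equilibrium sample `["00", "01", "10", "11"]` gives 0.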

3.
Single nucleotide polymorphisms (SNPs) are important markers for investigating the genetic heterogeneity of populations and for performing linkage disequilibrium (LD) mapping. We propose a new method, Psi, to express the frequencies of the $2^{N_s}$ haplotypes for $N_s$ di-allelic SNPs. Using this new expression of haplotype frequency, we propose a novel measure of LD, $D_g$, not only for SNP pairs but also for multiple markers. For SNP pairs, the values of $D_g$ were similar to those of the conventional pairwise LD indices $D'$ and $r^2$, and $D_g$ also quantified components of LD that the conventional pairwise indices do not measure. We also propose a distinct method, $D_g$-based absolute estimation, to infer the absolute maximum estimates of haplotype frequency. For SNP pairs, the results of $D_g$-based absolute estimation were compared with those of the conventional expectation-maximization (EM) algorithm; the new method gave better inference than the EM algorithm, which converged infrequently to a local extreme.

4.
For a dense set of genetic markers such as single nucleotide polymorphisms (SNPs) in high linkage disequilibrium within a small candidate region, a haplotype-based approach for testing association between a disease phenotype and the set of markers is attractive because it reduces data complexity and increases statistical power. However, because the status of the underlying disease variant is unknown, a comprehensive association test may require consideration of various combinations of the SNPs, which often leads to severe multiple testing problems. In this paper, we propose a latent variable approach to test for association of multiple tightly linked SNPs in case-control studies. First, we introduce a latent variable into the penetrance model to characterize a putative disease susceptibility locus (DSL) that may consist of a marker allele, a haplotype from a subset of the markers, or an allele at a putative locus between the markers. Next, using a retrospective likelihood to adjust for case-control sampling ascertainment and to handle the Hardy-Weinberg equilibrium constraint appropriately, we develop an expectation-maximization (EM)-based algorithm to fit the penetrance model and estimate the joint haplotype frequencies of the DSL and markers simultaneously. With the latent variable describing a flexible role for the DSL, the likelihood ratio statistic can then provide a joint association test for the set of markers without requiring an adjustment for testing of multiple haplotypes. Our simulation results also reveal that the latent variable approach may have improved power under certain scenarios compared with classical haplotype association methods.

5.
As the number of single nucleotide polymorphisms (SNPs) available for genetic analysis increases, researchers will be saturating smaller and smaller regions of the genome with these biallelic markers in an effort to fine map complex diseases. An important tool in this fine-mapping effort is haplotyping. Algorithms are presented that find all possible haplotype configurations of the pedigree data under the assumption that there are no recombinants between the markers. These configurations can be used to estimate the haplotype frequencies and to identify the most common haplotypes in the data. These algorithms have been implemented in a software program (ZAPLO) and were tested on a published data set.

6.
Multi-locus association analyses, including haplotype-based analyses, can sometimes provide greater power than single-locus analyses for detecting disease susceptibility loci. This potential gain, however, can be compromised by the large number of degrees of freedom caused by irrelevant markers. Exhaustive search for the optimal set of markers might be possible for a small number of markers, yet it is computationally inefficient. In this paper, we present a sequential haplotype scan method to search for combinations of adjacent markers that are jointly associated with disease status. When evaluating each marker, we add markers close to it in a sequential manner: a marker is added if its contribution to the haplotype association with disease is warranted, conditional on current haplotypes. This conditional evaluation is based on the well-known Mantel-Haenszel statistic. We propose two permutation-based methods to evaluate the growing haplotypes: a haplotype method for the combined markers, and a summary method that sums conditional statistics. We compared our proposed methods, the single-locus method, and a sliding window method using simulated data. We also applied our sequential haplotype scan algorithm to experimental data for CYP2D6. The results indicate that the sequential scan procedure can identify a set of adjacent markers whose haplotypes might have strong genetic effects or be in linkage disequilibrium with disease-predisposing variants. As a result, our methods can achieve greater power than the single-locus method, yet are much more computationally efficient than sliding window methods.
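For readers unfamiliar with the Mantel-Haenszel statistic mentioned above, the sketch below computes the standard Mantel-Haenszel chi-square (without continuity correction) across a set of 2x2 tables. Treating each current haplotype as a stratum and the candidate marker's allele as the column variable is an assumption made here for illustration; the abstract does not give the authors' exact construction, and `mantel_haenszel_chi2` is a hypothetical name.

```python
def mantel_haenszel_chi2(tables):
    """Mantel-Haenszel chi-square (1 degree of freedom, no continuity correction).

    `tables` is a list of (a, b, c, d) counts, one 2x2 table per stratum:
        a = cases with the allele,    b = cases without it,
        c = controls with the allele, d = controls without it.
    In this sketch a stratum is assumed to be one current haplotype.
    """
    diff = 0.0  # sum over strata of (observed a - expected a)
    var = 0.0   # sum over strata of the hypergeometric variance of a
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var == 0:
        return 0.0  # no informative stratum
    return diff * diff / var
```

A single perfectly associated stratum, `mantel_haenszel_chi2([(10, 0, 0, 10)])`, gives 19.0, and a balanced table such as `(5, 5, 5, 5)` gives 0.0.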

7.
Recent studies suggest that haplotypes tend to have block-like structures throughout the human genome. Several methods were proposed for haplotype block partitioning and for tagging single-nucleotide polymorphism (SNP) identification. In population genetics studies, several research groups compared block structures across human populations. However, the measures used to quantify population similarity are either less than satisfactory or nonexistent. In this article, we propose several similarity measures to facilitate the comparisons of haplotype structures, namely block boundaries and tagging SNPs, across populations. With these measures, we can more objectively compare haplotype block structures and tagging SNP sets between different populations. In addition, these measures allow us to compare the results of different methods for block partition and tagging SNP identification. When we applied these measures to a real data set on chromosome 10 in 16 worldwide populations, we found that in this genome region: 1) haplotype block boundaries vary among populations, with European and some African populations showing similar boundaries but other populations showing other patterns; 2) tagging SNP sets are generally similar for populations with similar haplotype block structures but differ if the block structures differ; and 3) all but one of the block finding methods we tested yield consistent results, although variations exist regarding consistency. Our tentative results show that at least in the genome region studied, it is unlikely that a common haplotype pattern exists for all human populations: many populations, even in the same geographical region, may have different haplotype patterns.

8.
Knowledge of haplotypes is useful for understanding block structure in the genome and disease risk associations. Direct measurement of haplotypes in the absence of family data is presently impractical, and hence, several methods have been developed for reconstructing haplotypes from population data. We have developed a new population-based method using a Bayesian Hidden Markov model for the source of the ancestral haplotype segments. In our Bayesian model, a higher order Markov model is used as the prior for ancestral haplotypes, to account for linkage disequilibrium. Our model includes parameters for the genotyping error rate, the mutation rate, and the recombination rate at each position. Computation is done by Markov Chain Monte Carlo using the forward-backward algorithm to efficiently sum over all possible state sequences of the Hidden Markov model. We have used the model to reconstruct the haplotypes of 129 children at a region on chromosome 5 in the data set of Daly et al. [2001] (for which true haplotypes are obtained based on parental genotypes) and of 30 children at selected regions in the CEU and YRI data of the HAPMAP project. The results are quite close to the family-based reconstructions and comparable with the state-of-the-art PHASE program. Our haplotype reconstruction method does not require division of the markers into small blocks of loci. The recombination rates inferred from our model can help to predict haplotype block boundaries, and estimate recombination hotspots.

9.
Population-based case-control studies measuring associations between haplotypes of single nucleotide polymorphisms (SNPs) are increasingly popular, in part because haplotypes of a few "tagging" SNPs may serve as surrogates for variation in relatively large sections of the genome. Due to current technological limitations, haplotypes in cases and controls must be inferred from unphased genotypic data. Using individual-specific inferred haplotypes as covariates in standard epidemiologic analyses (e.g., conditional logistic regression) is an attractive analysis strategy, as it allows adjustment for nongenetic covariates, provides omnibus and haplotype-specific tests of association, and can estimate haplotype and haplotype × environment interaction effects. In principle, some adjustment for the uncertainty in inferred haplotypes should be made. Via simulation, we compare the performance (bias and mean squared error of haplotype and haplotype × environment interaction effect estimates) of several analytic strategies using inferred haplotypes in the context of matched case-control data. These strategies include using only the most likely haplotype assignment, the expectation substitution approach described by Stram et al. ([2003b] Hum. Hered. 55:179-190) and others, and an improper version of multiple imputation. For relatively uncomplicated haplotype structures and moderate haplotype relative risks ([…] ≥5). An application to progesterone-receptor haplotypes and endometrial cancer further illustrates that the performance of all these methods depends on how well the observed haplotypes "tag" the unobserved causal variant.

10.
Tag SNP selection for association studies (total citations: 6; self-citations: 0; citations by others: 6)
This report describes current methods for selection of informative single nucleotide polymorphisms (SNPs) using data from a dense network of SNPs that have been genotyped in a relatively small panel of subjects. We discuss the following issues: (1) Optimal selection of SNPs, using as the selection criterion the maximization of either the predictability of unmeasured SNPs or the predictability of SNP haplotypes. (2) The dependence of the performance of tag SNP selection methods upon the density of SNP markers genotyped for the purpose of haplotype discovery and tag SNP selection. (3) The likely power of case-control studies to detect the influence upon disease risk of common disease-causing variants in candidate genes in a haplotype-based analysis. We propose a quasi-empirical approach to evaluating the power of large studies, with the calculation based upon the SNP genotype and haplotype frequencies estimated in a haplotype discovery panel. In this calculation, each common SNP in turn is treated as a potential unmeasured causal variant and subjected to a correlation analysis using the remaining SNPs. We use a small portion of the HapMap ENCODE data (488 common SNPs genotyped over approximately a 500 kb region of chromosome 2) as an illustrative example of this approach to power evaluation.
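The leave-one-out step described above, in which each SNP is treated in turn as an unmeasured causal variant and correlated with the remaining SNPs, can be illustrated with a deliberately simplified sketch. It uses only the best pairwise $r^2$ from phased haplotypes, whereas the report refers to prediction from SNPs or haplotypes more generally; the function names and the "0"/"1" string encoding are assumptions made here.

```python
def r_squared(haps, i, j):
    """Pairwise r^2 between biallelic SNPs i and j from phased haplotypes.

    `haps` is a list of equal-length strings of "0"/"1" alleles
    (a hypothetical encoding). Both SNPs must be polymorphic in the sample,
    otherwise the denominator is zero.
    """
    n = len(haps)
    p_i = sum(h[i] == "1" for h in haps) / n
    p_j = sum(h[j] == "1" for h in haps) / n
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haps) / n
    d = p_ij - p_i * p_j  # classical disequilibrium coefficient D
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


def best_proxy_r2(haps):
    """For each SNP, the highest r^2 achieved by any single other SNP."""
    n_snps = len(haps[0])
    return [
        max(r_squared(haps, i, j) for j in range(n_snps) if j != i)
        for i in range(n_snps)
    ]
```

For `["000", "110", "001", "111"]`, the first two SNPs are perfect proxies for each other and the third is uncorrelated with both, so `best_proxy_r2` returns `[1.0, 1.0, 0.0]`.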

11.
Disease association studies often test large numbers of markers, and various methods have been proposed to correct for multiple testing. In this paper, we propose an admixture maximum likelihood approach that estimates both the proportion of associated single nucleotide polymorphisms (SNPs) and their typical effect size. We assessed this method and compared it with several previously proposed approaches by simulation. The maximum likelihood approach performed similarly to or better than all other tests across a wide range of alternative hypotheses. The rank truncated product method also had good power, though somewhat inferior to the maximum likelihood approach in most cases. A simple Bonferroni correction performed best only when the number of associated SNPs was small.
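Two of the comparison methods named above are simple enough to sketch. The code below shows a Bonferroni decision rule and the rank truncated product statistic (the product of the K smallest p-values). The function names are assumptions made here, the null distribution of the rank truncated product (typically obtained by permutation when tests are correlated) is not computed, and the admixture maximum likelihood method itself is not reproduced.

```python
from math import prod


def bonferroni_reject(p_values, alpha=0.05):
    """Flag each test whose p-value passes the Bonferroni threshold alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def rank_truncated_product(p_values, k):
    """Product of the k smallest p-values (the statistic only, no p-value)."""
    return prod(sorted(p_values)[:k])
```

For example, `bonferroni_reject([0.001, 0.02, 0.5])` uses a threshold of 0.05 / 3 and returns `[True, False, False]`.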

12.
A new multimarker test for family-based association studies (total citations: 1; self-citations: 0; citations by others: 1)

13.
Candidate gene association studies often utilize one single nucleotide polymorphism (SNP) for analysis, with an initial report typically not being replicated by subsequent studies. The failure to replicate may result from incomplete or poor identification of disease-related variants or haplotypes, possibly due to naive SNP selection. A method for identification of linkage disequilibrium (LD) groups and selection of SNPs that capture sufficient intra-genic genetic diversity is described. We assume all SNPs with minor allele frequency above a pre-determined frequency have been identified. Principal component analysis (PCA) is applied to evaluate multivariate SNP correlations to infer groups of SNPs in LD (LD-groups) and to establish an optimal set of group-tagging SNPs (gtSNPs) that provide the most comprehensive coverage of intra-genic diversity while minimizing the resources necessary to perform an informative association analysis. This PCA method differs from haplotype block (HB) and haplotype-tagging SNP (htSNP) methods, in that an LD-group of SNPs need not be a contiguous DNA fragment. Results of the PCA method compared well with existing htSNP methods while also providing advantages over those methods, including an indication of the optimal number of SNPs needed. Further, evaluation of the method over multiple replicates of simulated data indicated PCA to be a robust method for SNP selection. Our findings suggest that PCA may be a powerful tool for establishing an optimal SNP set that maximizes the amount of genetic variation captured for a candidate gene using a minimal number of SNPs.

14.
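Objective: To compare the similarities and differences in single nucleotide polymorphisms (SNPs) of the receptor tyrosine kinase-like orphan receptor 2 (ROR2) gene between the Han Chinese population in Beijing (CHB) and the Japanese population in Tokyo (JPT). Methods: ROR2 SNP data for CHB and JPT released by the International HapMap Project (HapMap) were collected, and Haploview and SPSS 13.0 were used to distinguish SNPs with homozygous genotypes from those with non-homozygous genotypes, with genotype...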

15.
With the rapid development of modern genotyping technology, it is becoming commonplace to genotype densely spaced genetic markers such as single nucleotide polymorphisms (SNPs) along the genome. This development has inspired a strong interest in using multiple markers located in the target region for the detection of association. We introduce a principal components (PCs) regression method for candidate gene association studies where multiple SNPs from the candidate region tend to be correlated. In this approach, the total variance in the original genotype scores is decomposed into parts that correspond to uncorrelated PCs. The PCs with the largest variances are then used as regressors in a multiple regression. Simulation studies suggest that this approach can have higher power than some popular methods. An application to CHI3L2 gene expression data confirms a significant association between CHI3L2 gene expression level and SNPs from this gene that has been previously reported by others.

16.
In the setting of genome‐wide association studies, we propose a method for assigning a measure of significance to pre‐defined sets of markers in the genome. The sets can be genes, conserved regions, or groups of genes such as pathways. Using the proposed methods and algorithms, evidence for association between a particular functional unit and a disease status can be obtained not just by the presence of a strong signal from a SNP within it, but also by the combination of several simultaneous weaker signals that are not strongly correlated. This approach has several advantages. First, moderately strong signals from different SNPs are combined to obtain a much stronger signal for the set, therefore increasing power. Second, in combination with methods that provide information on untyped markers, it leads to results that can be readily combined across studies and platforms that might use different SNPs. Third, the results are easy to interpret, since they refer to functional sets of markers that are likely to behave as a unit in their phenotypic effect. Finally, the availability of gene‐level P‐values for association is the first step in developing methods that integrate information from pathways and networks with genome‐wide association data, and these can lead to a better understanding of the genetic architecture of complex traits. The power of the approach is investigated in simulated and real datasets. Novel Crohn's disease associations are found using the WTCCC data. Genet. Epidemiol. 34: 222–231, 2010. © 2009 Wiley‐Liss, Inc.

17.
A map of the background levels of disequilibrium between nearby markers can be useful for association mapping studies. In order to assess the background levels of linkage disequilibrium (LD), multilocus LD measures are more advantageous than pairwise LD measures because the combined analysis of pairwise LD measures is not adequate to detect simultaneous allele associations among multiple markers. Various multilocus LD measures based on haplotypes have been proposed. However, most of these measures provide a single index of association among multiple markers and do not reveal the complex patterns and different levels of LD structure. In this paper, we employ non-homogeneous, multiple order Markov Chain models as a statistical framework to measure and partition the LD among multiple markers into components due to different orders of marker associations. Using a sliding window of multiple markers on phased haplotype data, we compute corresponding likelihoods for different Markov Chain (MC) orders in each window. The log-likelihood difference between the lowest MC order model (MC0) and the highest MC order model in each window is used as a measure of the total LD or the overall deviation from the gametic equilibrium for the window. Then, we partition the total LD into lower order disequilibria and estimate the effects from two-, three-, and higher order disequilibria. The relationship between different orders of LD and the log-likelihood difference involving two different orders of MC models is explored. By applying our method to the phased haplotype data in the ENCODE regions of the HapMap project, we are able to identify high/low multilocus LD regions. Our results reveal that most LD in the HapMap data is attributed to the LD between adjacent pairs of markers across the whole region. LD between adjacent pairs of markers appears to be more significant in high multilocus LD regions than in low multilocus LD regions. We also find that as the multilocus total LD increases, the effects of high-order LD tend to get weaker due to the lack of observed multilocus haplotypes. The overall estimates of first, second, third, and fourth order LD across the ENCODE regions are 64, 23, 9, and 3%.
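The log-likelihood comparison described above can be illustrated for the simplest case: order 0 (independent markers) versus order 1 (each marker depends on its predecessor), both fitted by maximum likelihood to phased haplotypes. This is a sketch under stated assumptions: the function names and the string-encoded haplotypes are invented here, and the paper's sliding windows, higher orders, and partitioning scheme are not reproduced.

```python
from collections import Counter
from math import log


def loglik_order0(haps):
    """Maximized log-likelihood when every marker is independent (MC0)."""
    n = len(haps)
    ll = 0.0
    for i in range(len(haps[0])):
        counts = Counter(h[i] for h in haps)
        ll += sum(c * log(c / n) for c in counts.values())
    return ll


def loglik_order1(haps):
    """Maximized log-likelihood of a non-homogeneous first-order Markov chain:
    each marker has its own transition probabilities given the previous marker."""
    n = len(haps)
    first = Counter(h[0] for h in haps)
    ll = sum(c * log(c / n) for c in first.values())
    for i in range(1, len(haps[0])):
        pairs = Counter((h[i - 1], h[i]) for h in haps)
        prev = Counter(h[i - 1] for h in haps)
        ll += sum(c * log(c / prev[a]) for (a, _), c in pairs.items())
    return ll


def adjacent_pair_ld(haps):
    """Log-likelihood gain of order 1 over order 0 (never negative)."""
    return loglik_order1(haps) - loglik_order0(haps)
```

For two markers in complete LD, `adjacent_pair_ld(["00", "11"])` equals 2 log 2, and for the equilibrium sample `["00", "01", "10", "11"]` it is 0.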

18.
Modern molecular techniques make discovery of numerous single nucleotide polymorphisms (SNPs) in candidate gene regions feasible. Conventional analysis relies on either independent tests with each variant or the use of haplotypes in association analysis. The first technique ignores the dependencies between SNPs. The second, though it may increase power, often introduces uncertainty by estimating haplotypes from population data. Additionally, as the number of loci expands for a haplotype, ambiguity in interpretation increases for determining the underlying genetic components driving a detected association. Here, we present a genotype-level analysis to jointly model the SNPs via a SNP interaction model with phase information (SIMPle) to capture the underlying haplotype structure. This analysis estimates both the risk associated with each variant and the importance of phase between pairwise combinations of SNPs. Thus, rather than selecting between genotype- or haplotype-level approaches, the SIMPle method frames the analysis of multilocus data in a model selection paradigm, with the aim of determining which SNPs, phase terms, and linear combinations best describe the relation between genetic variation and a trait of interest. To avoid unstable estimation due to sparse data and to incorporate both the dependencies among terms and the uncertainty in model selection, we propose a Bayes model averaging procedure. This highlights key SNPs and phase terms and yields a set of best representative models. Using simulations, we demonstrate the utility of the SIMPle model to identify crucial SNPs and underlying haplotype structures across a variety of causal models and genetic architectures.

19.
Single nucleotide polymorphisms (SNPs) are becoming widely used as genotypic markers in genetic association studies of common, complex human diseases. For such association screens, a crucial part of study design is determining what SNPs to prioritize for genotyping. We present a novel power-based algorithm to select a subset of tag SNPs for genotyping from a map of available SNPs. Blocks of markers in strong linkage disequilibrium (LD) are identified, and SNPs are selected to represent each block such that power to detect disease association with an underlying disease allele in LD with block members is preserved; all markers outside of blocks are also included in the tagging subset. A key, novel element of this method is that it incorporates information about the phase of LD observed among marker pairs to retain markers likely to be in coupling phase with an underlying disease locus, thus increasing power compared to a phase-blind approach. Power calculations illustrate important issues regarding LD phase and make clear the advantages of our approach to SNP selection. We apply our algorithm to genotype data from the International HapMap Consortium and demonstrate that considerable reduction in SNP genotyping may be attained while retaining much of the available power for a disease association screen. We also demonstrate that these tag SNPs effectively represent underlying variants not included in the LD analysis and SNP selection, by using leave-one-out tests to show that most (approximately 90%) of the "untyped" variants lying in blocks are in coupling-phase LD with a tag SNP. Additional performance tests using the HapMap ENCyclopedia of DNA Elements (ENCODE) regions show that the method compares well with the popular $r^2$ bin tagging method. This work is a concrete example of how empirical LD phase may be used to benefit study design.

20.
A genome‐wide association study (GWAS) typically is focused on detecting marginal genetic effects. However, many complex traits are likely to be the result of the interplay of genes and environmental factors. The SNPs involved may have a weak marginal effect and thus be unlikely to be detected from a scan of marginal effects, but may be detectable in a gene–environment (G × E) interaction analysis. However, a genome‐wide interaction scan (GWIS) using a standard test of G × E interaction is known to have low power, particularly when one corrects for testing multiple SNPs. Two 2‐step methods for GWIS have been previously proposed, aimed at improving efficiency by prioritizing SNPs most likely to be involved in a G × E interaction using a screening step. For a quantitative trait, these include a method that screens on marginal effects [Kooperberg and Leblanc, 2008] and a method that screens on variance heterogeneity by genotype [Paré et al., 2010]. In this paper, we show that the Paré et al. approach has an inflated false‐positive rate in the presence of an environmental marginal effect, and we propose an alternative that remains valid. We also propose a novel 2‐step approach that combines the two screening approaches, and provide simulations demonstrating that the new method can outperform other GWIS approaches. Application of this method to a G × Hispanic‐ethnicity scan for childhood lung function reveals a SNP near the MARCO locus that was not identified by previous marginal‐effect scans.
