Found 20 similar articles (search time: 15 ms)
1.
Association studies, both family-based and population-based, can be powerful means of detecting disease-liability alleles. To increase the information of the test, various researchers have proposed targeting haplotypes. The larger number of haplotypes, however, relative to alleles at individual loci, could decrease power because of the additional degrees of freedom required for the test. An optimal strategy would focus the test on particular haplotypes or groups of haplotypes, much as is done with cladistic-based association analysis. First suggested by Templeton et al. ([1987] Genetics 117:343-351), such analyses use the evolutionary relationships among haplotypes to produce a limited set of hypothesis tests and to increase the interpretability of these tests. To more fully utilize the information contained in the evolutionary relationships among haplotypes and in the sample, we propose generalized linear models (GLM) for the analysis of data from family-based and population-based studies. These models fully account for haplotype phase ambiguity and allow for covariates. The models are encoded into a software package (the Evolutionary-Based Haplotype Analysis Package, EHAP), which also provides for various kinds of exploratory data analysis. The exploratory analyses, such as error checking, estimation of haplotype frequencies, and tools for building cladograms, should facilitate the implementation of cladistic-based association analysis with haplotypes.
2.
Pedigree disequilibrium tests for multilocus haplotypes (total citations: 27; self-citations: 0; citations by others: 27)
Dudbridge F 《Genetic epidemiology》2003,25(2):115-121
Association tests of multilocus haplotypes are of interest both in linkage disequilibrium mapping and in candidate gene studies. For case-parent trios, I discuss the extension of existing multilocus methods to include ambiguous haplotypes in tests of models which distinguish between the cis and trans phase. A likelihood-ratio test is proposed, using the expectation-maximization (E-M) algorithm to account for haplotype ambiguities. Assumptions about the population structure are required, but realistic situations, including population stratification, which violate the assumptions lead to conservative tests. I describe a permutation procedure for the null hypothesis of interest, which controls for violation of the assumptions. For general pedigrees, I describe extensions of the pedigree disequilibrium test to include uncertain haplotypes. The summary statistics are replaced by their expected values over prior distributions of haplotype frequencies. If prior distributions are not available, a valid test is possible by using the E-M algorithm to estimate the null distribution of haplotype frequencies. Similar methods are available for quantitative traits. Exact permutation tests are difficult to construct in small samples, but an approximate procedure is appropriate in large samples, and can be used to account for dependencies between tests of multiple haplotypes and loci.
3.
Morris AP 《Genetic epidemiology》2005,29(2):91-107
We describe a novel method for assessing the strength of disease association with single nucleotide polymorphisms (SNPs) in a candidate gene or small candidate region, and for estimating the corresponding haplotype relative risks of disease, using unphased genotype data directly. We begin by estimating the relative frequencies of haplotypes consistent with observed SNP genotypes. Under the Bayesian partition model, we specify cluster centres from this set of consistent SNP haplotypes. The remaining haplotypes are then assigned to the cluster with the "nearest" centre, where distance is defined in terms of SNP allele matches. Within a logistic regression modelling framework, each haplotype within a cluster is assigned the same disease risk, reducing the number of parameters required. Uncertainty in phase assignment is addressed by considering all possible haplotype configurations consistent with each unphased genotype, weighted in the logistic regression likelihood by their probabilities, calculated according to the estimated relative haplotype frequencies. We develop a Markov chain Monte Carlo algorithm to sample over the space of haplotype clusters and corresponding disease risks, allowing for covariates that might include environmental risk factors or polygenic effects. Application of the algorithm to SNP genotype data in an 890-kb region flanking the CYP2D6 gene illustrates that we can identify clusters of haplotypes with similar risk of poor drug metaboliser (PDM) phenotype, and can distinguish PDM cases carrying different high-risk variants. Further, the results of a detailed simulation study suggest that we can identify positive evidence of association for moderate relative disease risks with a sample of 1,000 cases and 1,000 controls.
4.
The completion of the HapMap Project and the development of high-throughput single nucleotide polymorphism genotyping technologies have greatly enhanced the prospects of identifying and characterizing the genetic variants that influence complex traits. In principle, association analysis of haplotypes rather than single nucleotide polymorphisms may better capture an underlying causal variant, but the multiple haplotypes can lead to reduced statistical power due to the testing of (and need to correct for) a large number of haplotypes. This paper presents a novel method based on clustering similar haplotypes to address this issue. The method, implemented in the CLUMPHAP program, is an extension of the CLUMP program designed for the analysis of multi-allelic markers (Sham and Curtis [1995] Ann. Hum. Genet. 59(Pt1):97-105). CLUMPHAP performs a hierarchical clustering of the haplotypes and then computes the χ² statistic between each haplotype cluster and disease; the statistical significance of the largest of the χ² statistics is obtained by permutation testing. A significant result suggests that the presence of a disease-causing variant in the haplotype cluster is over-represented in cases. Using simulation studies, we have compared CLUMPHAP and more widely used approaches in terms of their statistical power to identify an untyped susceptibility locus. Our results show that CLUMPHAP tends to have greater power than the omnibus haplotype test and is comparable in power to multiple regression locus-coding approaches.
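The "largest chi-squared statistic, assessed by permutation" idea in the abstract above can be illustrated with a minimal sketch. This is not the CLUMPHAP implementation: it assumes the hierarchical clustering has already been done, that every entry is one phase-known haplotype carrying a cluster label, and that case/control labels are shuffled at the haplotype level for brevity (a real analysis would permute individuals). The function names `chi2_2x2`, `max_cluster_chi2`, and `permutation_p_value` are hypothetical.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: an empty row or column carries no signal
    return n * (a * d - b * c) ** 2 / denom


def max_cluster_chi2(clusters, status):
    """Largest cluster-versus-rest chi-squared; status is 1 for case, 0 for control."""
    n_case = sum(status)
    n_ctrl = len(status) - n_case
    best = 0.0
    for k in set(clusters):
        in_case = sum(1 for c, s in zip(clusters, status) if c == k and s == 1)
        in_ctrl = sum(1 for c, s in zip(clusters, status) if c == k and s == 0)
        stat = chi2_2x2(in_case, n_case - in_case, in_ctrl, n_ctrl - in_ctrl)
        best = max(best, stat)
    return best


def permutation_p_value(clusters, status, n_perm=1000, seed=0):
    """Empirical p-value of the maximum statistic under shuffled case/control labels."""
    rng = random.Random(seed)
    observed = max_cluster_chi2(clusters, status)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_cluster_chi2(clusters, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Because the maximum is recomputed for every shuffle, the resulting p-value already accounts for having tested several clusters, which is the reason for using a permutation procedure rather than a per-cluster reference distribution.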
5.
In haplotype-based association studies for late onset diseases, one attractive design is to use available unaffected spouses as controls (Valle et al. [1998] Diab. Care 21:949-958). Given cases and spouses only, the standard expectation-maximization (EM) algorithm (Dempster et al. [1977] J. R. Stat. Soc. B 39:1-38) for case-control data can be used to estimate haplotype frequencies. But often we will have offspring for at least some of the spouse pairs, and offspring genotypes provide additional information about the haplotypes of the parents. Existing methods may either ignore the offspring information, or reconstruct haplotypes for the subjects using offspring information and discard data from those whose haplotypes cannot be reconstructed with high confidence. Neither of these approaches is efficient, and the latter approach may also be biased. For case-control data with some subjects forming spouse pairs and offspring genotypes available for some spouse pairs or individuals, we propose a unified, likelihood-based method of haplotype inference. The method makes use of available offspring genotype information to apportion ambiguous haplotypes for the subjects. For subjects without offspring genotype information, haplotypes are apportioned as in the standard EM algorithm for case-control data. Our method enables efficient haplotype frequency estimation using an EM algorithm and supports probabilistic haplotype reconstruction with the probability calculated based on the whole sample. We describe likelihood ratio and permutation tests to test for disease-haplotype association, and describe three test statistics that are potentially useful for detecting such an association.
6.
Genotype-based association test for general pedigrees: the genotype-PDT (total citations: 11; self-citations: 0; citations by others: 11)
Many family-based tests of linkage disequilibrium (LD) are based on counts of alleles rather than genotypes. However, allele-based tests may not detect interactions among alleles at a single locus that are apparent when examining associations with genotypes. Family-based tests of LD based on genotypes have been developed, but they are typically valid as tests of association only in families with a single affected individual. To take advantage of families with multiple affected individuals, we propose the genotype-pedigree disequilibrium test (geno-PDT) to test for LD between marker locus genotypes and disease. Unlike previous tests for genotypic association, the geno-PDT is valid in general pedigrees. Simulations to compare the power of the allele-based PDT and geno-PDT reveal that under an additive model, the allele-based PDT is more powerful, but that the geno-PDT can have greater power when the genetic model is recessive or dominant. Perhaps the most important property of the geno-PDT is the ability to test for association with particular genotypes, which can reveal underlying patterns of association at the genotypic level. These genotype-specific tests can be used to suggest possible underlying genetic models that are consistent with the pattern of genotypic association. This is illustrated through an application to a candidate gene analysis of the MLLT3 gene in families with Alzheimer disease. The geno-PDT approach for testing genotypes in general family data provides a useful tool for identifying genes in complex disease, and partitioning individual genotype contributions will help to dissect the influence of genotype on risk.
7.
Qian D 《Genetic epidemiology》2004,27(1):43-52
The haplotype-sharing correlation (HSC) method for association analysis using family data is revisited by introducing a permutation procedure for estimating region-wise significance at each marker on a study segment. In simulation studies, the HSC method has a correct type 1 error rate in both unstructured and structured populations. The HSC signals on disease segments occur in the vicinity of a true disease locus on a restricted region without recombination hotspots. However, the peak signal may not pinpoint the true disease location in a small region with dense markers. The HSC method is shown to have higher power than single- and multilocus family-based association test (FBAT) methods when the true disease locus is unobserved among the study markers, and especially under conditions of weak linkage disequilibrium and multiple ancestral disease alleles. These simulation results suggest that the HSC method has the capacity to identify true disease-associated segments under allelic heterogeneity that go undetected by the FBAT method that compares allelic or haplotypic frequencies.
8.
Association tests based on multi-marker haplotypes may be more powerful than those based on single markers. The existing association tests based on multi-marker haplotypes include Pearson's χ² test which tests for the difference of haplotype distributions in cases and controls, and haplotype-similarity based methods which compare the average similarity among cases with that of the controls. In this article, we propose new association tests based on haplotype similarities. These new tests compare the average similarities within cases and controls with the average similarity between cases and controls. These methods can be applied to either phase-known or phase-unknown data. We compare the performance of the proposed methods with Pearson's χ² test and the existing similarity-based tests by simulation studies under a variety of scenarios and by analyzing a real data set. The simulation results show that, in most cases, the new proposed methods are more powerful than both Pearson's χ² test and the existing similarity-based tests. In one extreme case where the disease mutant induced at a very rare haplotype (…
9.
Lin DY 《Genetic epidemiology》2004,26(4):255-264
Exploring the associations between haplotypes and disease phenotypes is an important step toward the discovery of genes that influence complex human diseases. When unrelated subjects are sampled, haplotypes are often ambiguous because of the unknown gametic phase of the measured sites along a chromosome. We consider cohort studies of unrelated subjects which collect data on potentially censored ages of onset of disease along with unphased genotypes and possibly time-varying environmental factors. We formulate the effects of haplotypes and environmental variables on the time to disease occurrence through a semiparametric Cox proportional hazards model, which can accommodate a variety of genetic mechanisms as well as gene-environment interactions. We develop a simple and fast expectation-maximization algorithm to maximize the likelihood for the relative risks and other parameters based on the observable data of unphased genotypes and potentially censored ages of onset. The resultant estimators are consistent, efficient, and asymptotically normal. Simulation studies show that, for practical situations, the parameter estimators are virtually unbiased, the association tests maintain type I errors near nominal levels, the confidence intervals have proper coverage probabilities, and the efficiency loss due to unknown gametic phase is small.
10.
On the use of phylogeny‐based tests to detect association between quantitative traits and haplotypes
Claire Bardel Vincent Danjean Pierre Morange Emmanuelle Génin Pierre Darlu 《Genetic epidemiology》2009,33(8):729-739
With the increasing availability of genetic data, several SNPs in a candidate gene can be combined into haplotypes to test for association with a quantitative trait. When the number of SNPs increases, the number of haplotypes can become very large and there is a need to group them together. The use of the phylogenetic relationships between haplotypes provides a natural and efficient way of grouping. Moreover, it allows us to identify disease or quantitative trait‐related loci. In this article, we describe ALTree‐q, a phylogeny‐based approach to test for association between quantitative traits and haplotypes and to identify putative quantitative trait nucleotides (QTN). This study focuses on the ALTree‐q association test, which is based on one‐way analyses of variance (ANOVA) performed at the different levels of the tree. The statistical properties (type‐one error and power rates) were estimated through simulations under different genetic models and were compared to another phylogeny‐based test, TreeScan (Templeton, 2005), and to a haplotypic omnibus test consisting of a one‐way ANOVA between all haplotypes. For dominant and additive models ALTree‐q is usually the most powerful test, whereas TreeScan performs better under a recessive model. However, power depends strongly on the recurrence rate of the QTN, on the QTN allele frequency, and on the linkage disequilibrium between the QTN and other markers. An application of the method on Thrombin Activatable Fibrinolysis Inhibitor Antigen levels in European and African samples confirms a possible association with polymorphisms of the CPB2 gene and identifies several QTNs. Genet. Epidemiol. 33:729–739, 2009. © 2009 Wiley‐Liss, Inc.
11.
A hierarchical clustering method for dimension reduction in joint analysis of multiple phenotypes
Genome‐wide association studies (GWAS) have become a very effective research tool to identify genetic variants underlying various complex diseases. In spite of the success of GWAS in identifying thousands of reproducible associations between genetic variants and complex disease, the association between genetic variants and a single phenotype is usually weak. It is increasingly recognized that joint analysis of multiple phenotypes can be potentially more powerful than univariate analysis, and can shed new light on underlying biological mechanisms of complex diseases. In this paper, we develop a novel variable reduction method using a hierarchical clustering method (HCM) for joint analysis of multiple phenotypes in association studies. The proposed method involves two steps. The first step applies a dimension reduction technique by using a representative phenotype for each cluster of phenotypes. Then, existing methods are used in the second step to test the association between genetic variants and the representative phenotypes rather than the individual phenotypes. We perform extensive simulation studies to compare the power of multivariate analysis of variance (MANOVA), joint model of multiple phenotypes (MultiPhen), and trait‐based association test that uses extended Simes procedure (TATES) with and without HCM. Our simulation studies show that using HCM is more powerful than not using HCM in most scenarios. We also illustrate the usefulness of HCM by analyzing whole‐genome genotyping data from a lung function study.
12.
We consider genetic association analysis that combines data from case-parent trios/sibships and unrelated controls. A general and simple methodology is proposed, using a weighted least-squares approach to combine separate information from the case-parent/case-sibling analysis and the case-unrelated control analysis. The new proposal improves over the existing methods in that it requires neither assumptions about nor estimation of the mating-type distribution. Simulation results show that the new method competes well with the likelihood-based method when the latter is applicable, and achieves substantial power gains over separate analyses in general. Therefore, the proposed combined association analysis can enjoy wide applications, including multiallele/multilocus, haplotype, and genome-wide association studies.
13.
The role of haplotypes in candidate gene studies (total citations: 24; self-citations: 0; citations by others: 24)
Clark AG 《Genetic epidemiology》2004,27(4):321-333
Human geneticists working on systems for which it is possible to make a strong case for a set of candidate genes face the problem of whether it is necessary to consider the variation in those genes as phased haplotypes, or whether the one-SNP-at-a-time approach might perform as well. There are three reasons why the phased haplotype route should be an improvement. First, the protein products of the candidate genes occur in polypeptide chains whose folding and other properties may depend on particular combinations of amino acids. Second, population genetic principles show us that variation in populations is inherently structured into haplotypes. Third, the statistical power of association tests with phased data is likely to be improved because of the reduction in dimension. However, in reality it takes a great deal of extra work to obtain valid haplotype phase information, and inferred phase information may simply compound the errors. In addition, if the causal connection between SNPs and a phenotype is truly driven by just a single SNP, then the haplotype-based approach may perform worse than the one-SNP-at-a-time approach. Here we examine some of the factors that affect haplotype patterns in genes, how haplotypes may be inferred, and how haplotypes have been useful in the context of testing association between candidate genes and complex traits.
14.
Fallin MD Hetmanski JB Park J Scott AF Ingersoll R Fuernkranz HA McIntosh I Beaty TH 《Genetic epidemiology》2003,25(2):168-175
Oral clefts, one of the most common forms of birth defects, are considered to be of complex etiology, including both genetic and environmental causes. To date, however, no particular genetic cause has been confirmed for isolated, nonsyndromic oral clefts. Previous case-control and family-based association studies reported an association between an intronic CA repeat of the MSX1 gene and risk for oral clefts. In this study, we identify eight single-nucleotide polymorphisms (SNPs) in the MSX1 gene, and present genotype results for these SNPs in a set of 206 oral cleft cases and their parents. We performed single-marker and haplotype-based transmission disequilibrium tests (TDTs), and tested for evidence of interaction between MSX1 haplotypes and exposure to maternal smoking in the first trimester, using a case-only approach. The haplotype TDT analyses further implicate this gene, or region, in controlling the risk for oral clefts, particularly for cleft palate. In addition, case-only haplotype analyses suggest an interaction between variation in the MSX1 gene and exposure to maternal smoking. This study encourages further focus on the MSX1 gene region to ultimately determine specific variants predisposing to oral clefts.
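As background for the single-marker TDT mentioned above, here is a minimal sketch of the classical biallelic TDT statistic, which compares how often heterozygous parents transmit versus do not transmit a given allele to affected offspring. The counts in the usage note are invented for illustration and are not taken from the MSX1 study; the function names `tdt_statistic` and `chi2_1df_p_value` are hypothetical.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """McNemar-type TDT statistic (b - c)^2 / (b + c).

    transmitted:   times heterozygous parents passed the allele to an affected child (b)
    untransmitted: times heterozygous parents did not pass it on (c)
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    return (b - c) ** 2 / (b + c)


def chi2_1df_p_value(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    return math.erfc(math.sqrt(stat / 2.0))
```

For example, with 30 transmissions and 10 non-transmissions (made-up numbers), `tdt_statistic(30, 10)` gives 10.0, which `chi2_1df_p_value` converts to a p-value of roughly 0.0016 under the large-sample approximation.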
15.
Whole genome association studies (WGAS) have surged in popularity in recent years as technological advances have made large‐scale genotyping more feasible and as new exciting results offer tremendous hope and optimism. The logic of WGAS rests upon the common disease/common variant (CD/CV) hypothesis. Detection of association under the common disease/rare variant (CD/RV) scenario is much harder, and the current practices of WGAS may be underpowered without large enough sample sizes. In this article, we propose a generalized linear model with regularization (rGLM) approach for detecting disease‐haplotype association using unphased single nucleotide polymorphism data that is applicable to both CD/CV and CD/RV scenarios. We borrow a dimension‐reduction method from the data mining and statistical learning literature, but use it for the purpose of weeding out haplotypes that are not associated with the disease so that the associated haplotypes, especially those that are rare, can stand out and be accounted for more precisely. By using high‐dimensional data analysis techniques, which are frequently employed in microarray analyses, interacting effects among haplotypes in different blocks can be investigated without much concern about the sample size being overwhelmed by the number of haplotype combinations. Our simulation study demonstrates the gain in power for detecting associations with moderate sample sizes. For detecting association under CD/RV, regression type methods such as that implemented in hapassoc may fail to provide coefficient estimates for rare associated haplotypes, resulting in a loss of power compared to rGLM. Furthermore, our results indicate that rGLM can uncover the associated variants much more frequently than can hapassoc. Genet. Epidemiol. 2009. © 2008 Wiley‐Liss, Inc.
16.
We develop novel statistical tests for transmission disequilibrium testing (tests of linkage in the presence of association) for quantitative traits using parents and offspring. These joint tests utilize information in both the covariance (or more generally, dependency) between genotype and phenotype and the marginal distribution of genotype. Using computer simulation we test the validity (Type I error rate control) and power of the proposed methods for additive, dominant, and recessive modes of inheritance; locus-specific heritability of the trait of 0.05, 0.1, and 0.2; allele frequencies of P=0.2 and 0.4; and sample sizes of 500, 200, and 100 trios. Both random sampling and extreme sampling schemes were investigated. A multinomial logistic joint test provides the highest overall power irrespective of sample size, allele frequency, heritability, and mode of inheritance.
17.
Rare‐variant association tests in longitudinal studies, with an application to the Multi‐Ethnic Study of Atherosclerosis (MESA)
Zihuai He Seunggeun Lee Min Zhang Jennifer A. Smith Xiuqing Guo Walter Palmas Sharon L.R. Kardia Iuliana Ionita‐Laza Bhramar Mukherjee 《Genetic epidemiology》2017,41(8):801-810
Over the past few years, an increasing number of studies have identified rare variants that contribute to trait heritability. Due to the extreme rarity of some individual variants, gene‐based association tests have been proposed to aggregate the genetic variants within a gene, pathway, or specific genomic region as opposed to a one‐at‐a‐time single variant analysis. In addition, in longitudinal studies, statistical power to detect disease susceptibility rare variants can be improved through jointly testing repeatedly measured outcomes, which better describes the temporal development of the trait of interest. However, usual sandwich/model‐based inference for sequencing studies with longitudinal outcomes and rare variants can produce deflated/inflated type I error rate without further corrections. In this paper, we develop a group of tests for rare‐variant association based on outcomes with repeated measures. We propose new perturbation methods such that the type I error rate of the new tests is not only robust to misspecification of within‐subject correlation, but also significantly improved for variants with extreme rarity in a study with small or moderate sample size. Through extensive simulation studies, we illustrate that substantially higher power can be achieved by utilizing longitudinal outcomes and our proposed finite sample adjustment. We illustrate our methods using data from the Multi‐Ethnic Study of Atherosclerosis for exploring association of repeated measures of blood pressure with rare and common variants based on exome sequencing data on 6,361 individuals.
18.
Association analysis, which aims to detect associations between genetic variation and observable traits, has played an increasing part in understanding the genetic basis of diseases. Among these methods, haplotype‐based association studies are believed to possess prominent advantages, especially for rare diseases in case‐control studies. However, models of these haplotypes are subject to statistical problems caused by rare haplotypes. Haplotype clustering offers an appealing solution. In this research, we have developed a new haplotype similarity measure suited to the “affinity propagation” clustering algorithm, which accounts for rare haplotypes and thereby controls the number of degrees of freedom. The new similarity can incorporate haplotype structure information, which is believed to enhance the power and provide high resolution for identifying associations between genetic variants and disease. Our simulation studies show that the proposed approach offers merits in detecting disease‐marker associations in comparison with the cladistic haplotype clustering method CLADHC. We also illustrate an application of our method to cystic fibrosis, which shows quite accurate estimates during fine mapping. Genet. Epidemiol. 34: 633–641, 2010. © 2010 Wiley‐Liss, Inc.
19.
For genome‐wide association studies with family‐based designs, we propose a Bayesian approach. We show that standard transmission disequilibrium test and family‐based association test statistics can naturally be implemented in a Bayesian framework, allowing flexible specification of the likelihood and prior odds. We construct a Bayes factor conditional on the offspring phenotype and parental genotype data and then use the data we conditioned on to inform the prior odds for each marker. In the construction of the prior odds, the evidence for association for each single marker is obtained at the population‐level by estimating its genetic effect size by fitting the conditional mean model. Since such genetic effect size estimates are statistically independent of the effect size estimation within the families, the actual data set can inform the construction of the prior odds without any statistical penalty. In contrast to Bayesian approaches that have recently been proposed for genome‐wide association studies, our approach does not require assumptions about the genetic effect size; this makes the proposed method entirely data‐driven. The power of the approach was assessed through simulation. We then applied the approach to a genome‐wide association scan to search for associations between single nucleotide polymorphisms and body mass index in the Childhood Asthma Management Program data. Genet. Epidemiol. 34:569–574, 2010. © 2010 Wiley‐Liss, Inc.
20.
Ionita-Laza I Perry GH Raby BA Klanderman B Lee C Laird NM Weiss ST Lange C 《Genetic epidemiology》2008,32(3):273-284
Though there is increasing support for an important contribution of copy number variation (CNV) to the genetic architecture of complex disease, few methods have been developed for the analysis of such variation in the context of genetic association studies. In this paper, we propose a generalization of family-based association tests (FBATs) to allow for the analysis of CNVs at a genome-wide level. We translate the popular FBAT approach so that, instead of genotypes, raw intensity values that reflect copy number are used directly in the test statistic, thereby bypassing the need for a CNV genotyping algorithm. Moreover, both inherited and de novo CNVs can be analyzed without any prior knowledge about the type of CNV, making it easily applicable to large-scale association studies. All robustness properties of the genotype FBAT approach are maintained and all previously developed FBAT extensions, including FBATs for time-to-onset, multivariate FBATs, and FBAT-testing strategies, can be directly transferred to the analysis of CNVs. Using simulation studies, we evaluate the power and the robustness of the new approach. Furthermore, for those CNVs that can be genotyped, we compare FBATs based on genotype calls with FBATs that are directly based on the intensity data. An application to one of the first CNV genome-wide association studies of asthma identifies a very plausible candidate gene. A software implementation of the approach is freely available at http://www.hsph.harvard.edu/research/iuliana-ionita/software. The approach has also been completely integrated in the PBAT software package.