Found 20 similar articles (search time: 15 ms)
1.
Dudbridge F 《Genetic epidemiology》2003,25(2):115-121
Association tests of multilocus haplotypes are of interest both in linkage disequilibrium mapping and in candidate gene studies. For case-parent trios, I discuss the extension of existing multilocus methods to include ambiguous haplotypes in tests of models which distinguish between the cis and trans phase. A likelihood-ratio test is proposed, using the expectation-maximization (E-M) algorithm to account for haplotype ambiguities. Assumptions about the population structure are required, but realistic situations, including population stratification, which violate the assumptions lead to conservative tests. I describe a permutation procedure for the null hypothesis of interest, which controls for violation of the assumptions. For general pedigrees, I describe extensions of the pedigree disequilibrium test to include uncertain haplotypes. The summary statistics are replaced by their expected values over prior distributions of haplotype frequencies. If prior distributions are not available, a valid test is possible by using the E-M algorithm to estimate the null distribution of haplotype frequencies. Similar methods are available for quantitative traits. Exact permutation tests are difficult to construct in small samples, but an approximate procedure is appropriate in large samples, and can be used to account for dependencies between tests of multiple haplotypes and loci.
2.
The completion of the HapMap Project and the development of high-throughput single nucleotide polymorphism genotyping technologies have greatly enhanced the prospects of identifying and characterizing the genetic variants that influence complex traits. In principle, association analysis of haplotypes rather than single nucleotide polymorphisms may better capture an underlying causal variant, but the multiple haplotypes can lead to reduced statistical power due to the testing of (and need to correct for) a large number of haplotypes. This paper presents a novel method based on clustering similar haplotypes to address this issue. The method, implemented in the CLUMPHAP program, is an extension of the CLUMP program designed for the analysis of multi-allelic markers (Sham and Curtis [1995] Ann. Hum. Genet. 59(Pt1):97-105). CLUMPHAP performs a hierarchical clustering of the haplotypes and then computes the χ² statistic between each haplotype cluster and disease; the statistical significance of the largest of the χ² statistics is obtained by permutation testing. A significant result suggests that a haplotype cluster carrying a disease-causing variant is over-represented in cases. Using simulation studies, we have compared CLUMPHAP and more widely used approaches in terms of their statistical power to identify an untyped susceptibility locus. Our results show that CLUMPHAP tends to have greater power than the omnibus haplotype test and is comparable in power to multiple regression locus-coding approaches.
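The "largest χ² over clusters, assessed by permutation" step described in this abstract can be sketched as follows. This is only an illustrative sketch, not the CLUMPHAP implementation: the cluster labels are assumed to be given already (the hierarchical clustering step is omitted), each cluster is tested as "in cluster" versus "not in cluster" with a plain Pearson χ² on a 2x2 table, and all function names and the toy data are hypothetical.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_cluster_chi2(clusters, status):
    """Largest chi-square over clusters; status is 1 for a case haplotype, 0 for a control."""
    n_case = sum(status)
    n_ctrl = len(status) - n_case
    best = 0.0
    for label in set(clusters):
        case_in = sum(1 for c, s in zip(clusters, status) if c == label and s == 1)
        ctrl_in = sum(1 for c, s in zip(clusters, status) if c == label and s == 0)
        stat = chi2_2x2(case_in, n_case - case_in, ctrl_in, n_ctrl - ctrl_in)
        best = max(best, stat)
    return best


def permutation_p_value(clusters, status, n_perm=1000, seed=0):
    """Permute case/control labels to get the null distribution of the maximum statistic."""
    rng = random.Random(seed)
    observed = max_cluster_chi2(clusters, status)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_cluster_chi2(clusters, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: cluster "A" is found only on case haplotypes.
    toy_clusters = ["A"] * 10 + ["B"] * 10
    toy_status = [1] * 10 + [0] * 10
    print(permutation_p_value(toy_clusters, toy_status, n_perm=200))
```

Because the maximum is recomputed on every permuted data set, the resulting p-value already accounts for having tested several clusters, which is the reason the abstract gives for using permutation.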
3.
Genome‐wide association studies (GWAS) have become a very effective research tool to identify genetic variants underlying various complex diseases. In spite of the success of GWAS in identifying thousands of reproducible associations between genetic variants and complex disease, the association between genetic variants and a single phenotype is usually weak. It is increasingly recognized that joint analysis of multiple phenotypes can be potentially more powerful than univariate analysis, and can shed new light on underlying biological mechanisms of complex diseases. In this paper, we develop a novel variable reduction method based on a hierarchical clustering method (HCM) for joint analysis of multiple phenotypes in association studies. The proposed method involves two steps. The first step applies a dimension reduction technique by using a representative phenotype for each cluster of phenotypes. Then, existing methods are used in the second step to test the association between genetic variants and the representative phenotypes rather than the individual phenotypes. We perform extensive simulation studies to compare the power of multivariate analysis of variance (MANOVA), joint model of multiple phenotypes (MultiPhen), and trait‐based association test that uses extended Simes procedure (TATES) with and without HCM. Our simulation studies show that using HCM is more powerful than not using HCM in most scenarios. We also illustrate the usefulness of HCM by analyzing whole‐genome genotyping data from a lung function study.
4.
Many family-based tests of linkage disequilibrium (LD) are based on counts of alleles rather than genotypes. However, allele-based tests may not detect interactions among alleles at a single locus that are apparent when examining associations with genotypes. Family-based tests of LD based on genotypes have been developed, but they are typically valid as tests of association only in families with a single affected individual. To take advantage of families with multiple affected individuals, we propose the genotype-pedigree disequilibrium test (geno-PDT) to test for LD between marker locus genotypes and disease. Unlike previous tests for genotypic association, the geno-PDT is valid in general pedigrees. Simulations to compare the power of the allele-based PDT and geno-PDT reveal that under an additive model, the allele-based PDT is more powerful, but that the geno-PDT can have greater power when the genetic model is recessive or dominant. Perhaps the most important property of the geno-PDT is the ability to test for association with particular genotypes, which can reveal underlying patterns of association at the genotypic level. These genotype-specific tests can be used to suggest possible underlying genetic models that are consistent with the pattern of genotypic association. This is illustrated through an application to a candidate gene analysis of the MLLT3 gene in families with Alzheimer disease. The geno-PDT approach for testing genotypes in general family data provides a useful tool for identifying genes in complex disease, and partitioning individual genotype contributions will help to dissect the influence of genotype on risk.
5.
We develop novel statistical tests for transmission disequilibrium testing (tests of linkage in the presence of association) for quantitative traits using parents and offspring. These joint tests utilize information in both the covariance (or more generally, dependency) between genotype and phenotype and the marginal distribution of genotype. Using computer simulation we test the validity (Type I error rate control) and power of the proposed methods for additive, dominant, and recessive modes of inheritance, locus-specific trait heritabilities of 0.05, 0.1, and 0.2, allele frequencies of P=0.2 and 0.4, and sample sizes of 500, 200, and 100 trios. Both random sampling and extreme sampling schemes were investigated. A multinomial logistic joint test provides the highest overall power irrespective of sample size, allele frequency, heritability, and mode of inheritance.
6.
Zihuai He Seunggeun Lee Min Zhang Jennifer A. Smith Xiuqing Guo Walter Palmas Sharon L.R. Kardia Iuliana Ionita‐Laza Bhramar Mukherjee 《Genetic epidemiology》2017,41(8):801-810
Over the past few years, an increasing number of studies have identified rare variants that contribute to trait heritability. Due to the extreme rarity of some individual variants, gene‐based association tests have been proposed to aggregate the genetic variants within a gene, pathway, or specific genomic region as opposed to a one‐at‐a‐time single variant analysis. In addition, in longitudinal studies, statistical power to detect disease susceptibility rare variants can be improved through jointly testing repeatedly measured outcomes, which better describes the temporal development of the trait of interest. However, usual sandwich/model‐based inference for sequencing studies with longitudinal outcomes and rare variants can produce deflated/inflated type I error rate without further corrections. In this paper, we develop a group of tests for rare‐variant association based on outcomes with repeated measures. We propose new perturbation methods such that the type I error rate of the new tests is not only robust to misspecification of within‐subject correlation, but also significantly improved for variants with extreme rarity in a study with small or moderate sample size. Through extensive simulation studies, we illustrate that substantially higher power can be achieved by utilizing longitudinal outcomes and our proposed finite sample adjustment. We illustrate our methods using data from the Multi‐Ethnic Study of Atherosclerosis for exploring association of repeated measures of blood pressure with rare and common variants based on exome sequencing data on 6,361 individuals.
7.
Association tests based on multi-marker haplotypes may be more powerful than those based on single markers. The existing association tests based on multi-marker haplotypes include Pearson's χ² test which tests for the difference of haplotype distributions in cases and controls, and haplotype-similarity based methods which compare the average similarity among cases with that of the controls. In this article, we propose new association tests based on haplotype similarities. These new tests compare the average similarities within cases and controls with the average similarity between cases and controls. These methods can be applied to either phase-known or phase-unknown data. We compare the performance of the proposed methods with Pearson's χ² test and the existing similarity-based tests by simulation studies under a variety of scenarios and by analyzing a real data set. The simulation results show that, in most cases, the new proposed methods are more powerful than both Pearson's χ² test and the existing similarity-based tests. In one extreme case where the disease mutant induced at a very rare haplotype (
8.
Whole genome association studies (WGAS) have surged in popularity in recent years as technological advances have made large‐scale genotyping more feasible and as new exciting results offer tremendous hope and optimism. The logic of WGAS rests upon the common disease/common variant (CD/CV) hypothesis. Detection of association under the common disease/rare variant (CD/RV) scenario is much harder, and the current practices of WGAS may be underpowered without large enough sample sizes. In this article, we propose a generalized linear model with regularization (rGLM) approach for detecting disease‐haplotype association using unphased single nucleotide polymorphism data that is applicable to both CD/CV and CD/RV scenarios. We borrow a dimension‐reduction method from the data mining and statistical learning literature, but use it for the purpose of weeding out haplotypes that are not associated with the disease so that the associated haplotypes, especially those that are rare, can stand out and be accounted for more precisely. By using high‐dimensional data analysis techniques, which are frequently employed in microarray analyses, interacting effects among haplotypes in different blocks can be investigated without much concern about the sample size being overwhelmed by the number of haplotype combinations. Our simulation study demonstrates the gain in power for detecting associations with moderate sample sizes. For detecting association under CD/RV, regression type methods such as that implemented in hapassoc may fail to provide coefficient estimates for rare associated haplotypes, resulting in a loss of power compared to rGLM. Furthermore, our results indicate that rGLM can uncover the associated variants much more frequently than can hapassoc. Genet. Epidemiol. 2009. © 2008 Wiley‐Liss, Inc.
9.
Association analysis, which investigates genetic variation in order to detect genetic associations with observable traits, has played an increasing part in understanding the genetic basis of diseases. Among these methods, haplotype‐based association studies are believed to possess prominent advantages, especially for rare diseases in case‐control studies. However, modeling these haplotypes is subject to statistical problems caused by rare haplotypes. Fortunately, haplotype clustering offers an appealing solution. In this research, we have developed a new haplotype similarity measure suited to the “affinity propagation” clustering algorithm, which accounts well for rare haplotypes and thereby controls the degrees‐of‐freedom problem. The new similarity can incorporate haplotype structure information, which is believed to enhance the power and provide high resolution for identifying associations between genetic variants and disease. Our simulation studies show that the proposed approach offers merits in detecting disease‐marker associations in comparison with the cladistic haplotype clustering method CLADHC. We also illustrate an application of our method to cystic fibrosis, which shows quite accurate estimates during fine mapping. Genet. Epidemiol. 34: 633–641, 2010. © 2010 Wiley‐Liss, Inc.
10.
For genome‐wide association studies with family‐based designs, we propose a Bayesian approach. We show that standard transmission disequilibrium test and family‐based association test statistics can naturally be implemented in a Bayesian framework, allowing flexible specification of the likelihood and prior odds. We construct a Bayes factor conditional on the offspring phenotype and parental genotype data and then use the data we conditioned on to inform the prior odds for each marker. In the construction of the prior odds, the evidence for association for each single marker is obtained at the population‐level by estimating its genetic effect size by fitting the conditional mean model. Since such genetic effect size estimates are statistically independent of the effect size estimation within the families, the actual data set can inform the construction of the prior odds without any statistical penalty. In contrast to Bayesian approaches that have recently been proposed for genome‐wide association studies, our approach does not require assumptions about the genetic effect size; this makes the proposed method entirely data‐driven. The power of the approach was assessed through simulation. We then applied the approach to a genome‐wide association scan to search for associations between single nucleotide polymorphisms and body mass index in the Childhood Asthma Management Program data. Genet. Epidemiol. 34:569–574, 2010. © 2010 Wiley‐Liss, Inc.
11.
Shuai Wang Jing Hua Zhao Ping An Xiuqing Guo Richard A. Jensen Jonathan Marten Jennifer E. Huffman Karina Meidtner Heiner Boeing Archie Campbell Kenneth M. Rice Robert A. Scott Jie Yao Matthias B. Schulze Nicholas J. Wareham Ingrid B. Borecki Michael A. Province Jerome I. Rotter Caroline Hayward Mark O. Goodarzi James B. Meigs Josée Dupuis 《Genetic epidemiology》2016,40(3):244-252
For complex traits, most associated single nucleotide variants (SNV) discovered to date have a small effect, and detection of association is only possible with large sample sizes. Because of patient confidentiality concerns, it is often not possible to pool genetic data from multiple cohorts, and meta‐analysis has emerged as the method of choice to combine results from multiple studies. Many meta‐analysis methods are available for single SNV analyses. As new approaches allow the capture of low frequency and rare genetic variation, it is of interest to jointly consider multiple variants to improve power. However, for the analysis of haplotypes formed by multiple SNVs, meta‐analysis remains a challenge, because different haplotypes may be observed across studies. We propose a two‐stage meta‐analysis approach to combine haplotype analysis results. In the first stage, each cohort estimates haplotype effect sizes in a regression framework, accounting for relatedness among observations if appropriate. For the second stage, we use a multivariate generalized least squares meta‐analysis approach to combine haplotype effect estimates from multiple cohorts. Haplotype‐specific association tests and a global test of independence between haplotypes and traits are obtained within our framework. We demonstrate through simulation studies that we control the type‐I error rate, and our approach is more powerful than inverse variance weighted meta‐analysis of single SNV analysis when haplotype effects are present. We replicate a published haplotype association between fasting glucose‐associated locus (G6PC2) and fasting glucose in seven studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium and we provide more precise haplotype effect estimates.
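For orientation, the comparator named in this abstract, inverse variance weighted meta‐analysis of a single‐variant effect, can be sketched in a few lines. This is the standard fixed‐effect formula rather than the authors' multivariate generalized least squares method, and the function name and the per‐cohort numbers are hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse variance weighted pooling of per-cohort effect estimates.

    betas: per-cohort effect estimates for one variant
    ses:   matching standard errors
    Returns (pooled beta, pooled standard error, z statistic).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]  # weight = 1 / variance
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


if __name__ == "__main__":
    # Hypothetical estimates from three cohorts for a single variant.
    print(inverse_variance_meta([0.10, 0.30, 0.20], [0.10, 0.10, 0.20]))
```

Cohorts with smaller standard errors receive more weight. The multivariate approach in the abstract generalizes this idea to a vector of haplotype effects per cohort, which the scalar sketch does not attempt.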
12.
In haplotype-based association studies for late onset diseases, one attractive design is to use available unaffected spouses as controls (Valle et al. [1998] Diab. Care 21:949-958). Given cases and spouses only, the standard expectation-maximization (EM) algorithm (Dempster et al. [1977] J. R. Stat. Soc. B 39:1-38) for case-control data can be used to estimate haplotype frequencies. But often we will have offspring for at least some of the spouse pairs, and offspring genotypes provide additional information about the haplotypes of the parents. Existing methods may either ignore the offspring information, or reconstruct haplotypes for the subjects using offspring information and discard data from those whose haplotypes cannot be reconstructed with high confidence. Neither of these approaches is efficient, and the latter approach may also be biased. For case-control data with some subjects forming spouse pairs and offspring genotypes available for some spouse pairs or individuals, we propose a unified, likelihood-based method of haplotype inference. The method makes use of available offspring genotype information to apportion ambiguous haplotypes for the subjects. For subjects without offspring genotype information, haplotypes are apportioned as in the standard EM algorithm for case-control data. Our method enables efficient haplotype frequency estimation using an EM algorithm and supports probabilistic haplotype reconstruction with the probability calculated based on the whole sample. We describe likelihood ratio and permutation tests to test for disease-haplotype association, and describe three test statistics that are potentially useful for detecting such an association.
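The standard EM step mentioned in this abstract, apportioning ambiguous haplotypes according to current frequency estimates, can be illustrated for the simplest case of two biallelic SNPs in unrelated individuals. This is a minimal sketch under stated assumptions (random mating, genotypes coded as 0, 1, or 2 copies of allele 1 at each SNP, no offspring information); it is not the unified method proposed by the authors, and the function name and toy data are hypothetical.

```python
def em_haplotype_freqs(genotypes, n_iter=100):
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM.

    genotypes: list of (g1, g2) pairs, each entry the count (0, 1, 2) of allele 1.
    Returns a dict mapping haplotype (a1, a2) to its estimated frequency.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freqs = {h: 0.25 for h in haps}  # start from equal frequencies
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}  # genotype -> two alleles
    n_chrom = 2 * len(genotypes)

    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: a double heterozygote is either 00/11 or 01/10;
                # split it in proportion to the current pair probabilities.
                p_cis = freqs[(0, 0)] * freqs[(1, 1)]
                p_trans = freqs[(0, 1)] * freqs[(1, 0)]
                total = p_cis + p_trans
                w = 0.5 if total == 0 else p_cis / total
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1.0 - w
                counts[(1, 0)] += 1.0 - w
            else:
                # Phase is unambiguous when at most one SNP is heterozygous.
                a, b = alleles[g1], alleles[g2]
                counts[(a[0], b[0])] += 1.0
                counts[(a[1], b[1])] += 1.0
        # M-step: new frequencies are the expected haplotype proportions.
        freqs = {h: c / n_chrom for h, c in counts.items()}
    return freqs


if __name__ == "__main__":
    # Hypothetical sample: 4 individuals 00/00, 4 individuals 11/11, 2 double heterozygotes.
    sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
    print(em_haplotype_freqs(sample))
```

In this toy sample the unambiguous individuals carry only the 00 and 11 haplotypes, so the iterations push the double heterozygotes toward the 00/11 resolution. Offspring genotypes, as the abstract notes, would supply further constraints on phase that this sketch ignores.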
13.
Clement Ma Michael Boehnke Seunggeun Lee the GoT2D Investigators 《Genetic epidemiology》2015,39(7):499-508
Although genome‐wide association studies (GWAS) have identified thousands of trait‐associated genetic variants, there are relatively few findings on the X chromosome. For analysis of low‐frequency variants (minor allele frequency <5%), investigators can use region‐ or gene‐based tests where multiple variants are analyzed jointly to increase power. To date, there are no gene‐based tests designed for association testing of low‐frequency variants on the X chromosome. Here we propose three gene‐based tests for the X chromosome: burden, sequence kernel association test (SKAT), and optimal unified SKAT (SKAT‐O). Using simulated case‐control and quantitative trait (QT) data, we evaluate the calibration and power of these tests as a function of (1) male:female sample size ratio; and (2) coding of haploid male genotypes for variants under X‐inactivation. For case‐control studies, all three tests are reasonably well‐calibrated for all scenarios we evaluated. As expected, power for gene‐based tests depends on the underlying genetic architecture of the genomic region analyzed. Studies with more (haploid) males are generally less powerful due to decreased number of chromosomes. Power generally is slightly greater when the coding scheme for male genotypes matches the true underlying model, but the power loss for misspecifying the (generally unknown) model is small. For QT studies, type I error and power results largely mirror those for binary traits. We demonstrate the use of these three gene‐based tests for X‐chromosome association analysis in simulated data and sequencing data from the Genetics of Type 2 Diabetes (GoT2D) study.
14.
Qian D 《Genetic epidemiology》2004,27(1):43-52
The haplotype-sharing correlation (HSC) method for association analysis using family data is revisited by introducing a permutation procedure for estimating region-wise significance at each marker on a study segment. In simulation studies, the HSC method has a correct type 1 error rate in both unstructured and structured populations. The HSC signals on disease segments occur in the vicinity of a true disease locus on a restricted region without recombination hotspots. However, the peak signal may not pinpoint the true disease location in a small region with dense markers. The HSC method is shown to have higher power than single- and multilocus family-based association test (FBAT) methods when the true disease locus is unobserved among the study markers, and especially under conditions of weak linkage disequilibrium and multiple ancestral disease alleles. These simulation results suggest that the HSC method has the capacity to identify true disease-associated segments under allelic heterogeneity that go undetected by the FBAT method that compares allelic or haplotypic frequencies.
15.
Genotype-based likelihood-ratio tests (LRT) of association that examine maternal and parent-of-origin effects have been previously developed in the framework of log-linear and conditional logistic regression models. In the situation where parental genotypes are missing, the expectation-maximization (EM) algorithm has been incorporated in the log-linear approach to allow incomplete triads to contribute to the LRT. We present an extension to this model which we call the Combined_LRT that incorporates additional information from the genotypes of unaffected siblings to improve assignment of incompletely typed families to mating type categories, thereby improving inference of missing parental data. Using simulations involving a realistic array of family structures, we demonstrate the validity of the Combined_LRT under the null hypothesis of no association and provide power comparisons under varying levels of missing data and using sibling genotype data. We demonstrate the improved power of the Combined_LRT compared with the family-based association test (FBAT), another widely used association test. Lastly, we apply the Combined_LRT to a candidate gene analysis in autism families, some of which have missing parental genotypes. We conclude that the proposed log-linear model will be an important tool for future candidate gene studies, for many complex diseases where unaffected siblings can often be ascertained and where epigenetic factors such as imprinting may play a role in disease etiology.
16.
Identifying gene‐environment (G‐E) interactions can contribute to a better understanding of disease etiology, which may help researchers develop disease prevention strategies and interventions. One big criticism of studying G‐E interaction is the lack of power due to sample size. Studies often restrict the interaction search to the top few hundred hits from a genome‐wide association study or focus on potential candidate genes. In this paper, we test interactions between a candidate gene and an environmental factor to improve power by analyzing multiple variants within a gene. We extend recently developed score statistic based genetic association testing approaches to the G‐E interaction testing problem. We also propose tests for interaction using gene‐based summary measures that pool variants together. Although it has recently been shown that these summary measures can be biased and may lead to inflated type I error, we show that under several realistic scenarios, we can still provide valid tests of interaction. These tests use significantly fewer degrees of freedom and thus can have much higher power to detect interaction. Additionally, we demonstrate that the iSeq‐aSum‐min test, which combines a gene‐based summary measure test, iSeq‐aSum‐G, and an interaction‐based summary measure test, iSeq‐aSum‐I, provides a powerful alternative to test G‐E interaction. We demonstrate the performance of these approaches using simulation studies and illustrate their use in studying the interaction between the SNPs in several candidate genes and family climate environment on alcohol consumption using the Minnesota Center for Twin and Family Research dataset.
17.
We consider three tests for genetic association in data from nuclear families: the Family-Based Association Test (FBAT) proposed by Rabinowitz and Laird ([2000] Hum. Hered. 50:211-223), a second test proposed by Rabinowitz ([2002] J. Am. Stat. Assoc. 97:742-758), and the Family Genotype Analysis Program (FGAP) nonfounder or partial score test proposed by Clayton ([1999] Am. J. Hum. Genet. 65:1170-1177) and Whittemore and Tu ([2000] Am. J. Hum. Genet. 66:1329-1340). We show that each test statistic arises from the efficient score of the family data as the solution to a set of constraints on its null expectation. Moreover, the FBAT and Rabinowitz tests (but not the FGAP test) are locally the most powerful among all tests satisfying their constraints. We used simulations to examine how the three tests perform in situations when their assumptions are violated and the number of families is not huge. We found that the FBAT test tended to have less power than the other two tests, particularly when applied to families in whom all offspring were affected. The Rabinowitz and FGAP tests performed similarly, although the latter tended to extract more information from families containing one typed parent. While none of the tests showed good power to detect rare, recessively acting genes, the Rabinowitz test with a sample variance estimate performed particularly poorly in this case. However, the Rabinowitz test with a model-based variance had power comparable to that of the FGAP test, and more accurate type I error rates. We conclude that for the situations we considered, the Rabinowitz test with model-based variance has good power without forfeiting robustness against misspecification of parental genotype probabilities. However, its utility is limited by the lack of a simple algorithm to apply it to families with varying structures and phenotypes.
18.
Penalized likelihood methods have become increasingly popular in recent years for evaluating haplotype-phenotype association in case-control studies. Although a retrospective likelihood is dictated by the sampling scheme, these penalized methods are typically built on prospective likelihoods due to their modeling simplicity and computational feasibility. It has been well documented that for unpenalized methods, prospective analyses of case-control data can be valid but less efficient than their retrospective counterparts when testing for association, and result in substantial bias when estimating the haplotype effects. For penalized methods, which combine effect estimation and testing in one step, the impact of using a prospective likelihood is not clear. In this work, we examine the consequences of ignoring the sampling scheme for haplotype-based penalized likelihood methods. Our results suggest that the impact of prospective analyses depends on (1) the underlying genetic mode and (2) the genetic model adopted in the analysis. When the correct genetic model is used, the difference between the two analyses is negligible for additive and slight for dominant haplotype effects. For recessive haplotype effects, the more appropriate retrospective likelihood clearly outperforms the prospective likelihood. If an additive model is incorrectly used, as the true underlying genetic mode is unknown a priori, both retrospective and prospective penalized methods suffer from a sizeable power loss and increase in bias. The impact of using the incorrect genetic model is much bigger on retrospective analyses than prospective analyses, and results in comparable performances for both methods. An application of these methods to the Genetic Analysis Workshop 15 rheumatoid arthritis data is provided.
19.
Current common wisdom posits that association analyses using family‐based designs have inflated type 1 error rates (if relationships are ignored) and independent controls are more powerful than familial controls. We explore these suppositions. We show theoretically that family‐based designs can have deflated type 1 error rates. Through simulation, we examine the validity and power of family designs for several scenarios: cases from randomly or selectively ascertained pedigrees; and familial or independent controls. Family structures considered are as follows: sibships, nuclear families, moderate‐sized and extended pedigrees. Three methods were considered with the χ² test for trend: variance correction (VC), weighted (weights assigned to account for genetic similarity), and naïve (ignoring relatedness) as well as the Modified Quasi‐likelihood Score (MQLS) test. Selectively ascertained pedigrees had similar levels of disease enrichment; random ascertainment had no such restriction. Data for 1,000 cases and 1,000 controls were created under the null and alternate models. The VC and MQLS methods were always valid. The naïve method was anti‐conservative if independent controls were used and valid or conservative in designs with familial controls. The weighted association method was generally valid for independent controls, and was conservative for familial controls. With regard to power, independent controls were more powerful for small‐to‐moderate selectively ascertained pedigrees, but familial and independent controls were equivalent in the extended pedigrees and familial controls were consistently more powerful for all randomly ascertained pedigrees. These results suggest a more complex situation than previously assumed, which has important implications for study design and analysis. Genet. Epidemiol. 35:174‐181, 2011. © 2011 Wiley‐Liss, Inc.
20.
We propose an algorithm for analysing SNP-based population association studies, which is a development of that introduced by Molitor et al. [2003: Am J Hum Genet 73:1368-1384]. It uses clustering of haplotypes to overcome the major limitations of many current haplotype-based approaches. We define a between-haplotype score that is simple, yet appears to capture much of the information about evolutionary relatedness of the haplotypes in the vicinity of a (unobserved) putative causal locus. Haplotype clusters can then be defined via a putative ancestral haplotype and a cut-off distance. The number of an individual's two haplotypes that lie within the cluster predicts the individual's genotype at the causal locus. This predicted genotype can then be investigated for association with the phenotype of interest. We implement our approach within a Markov-chain Monte Carlo algorithm that, in effect, searches over locations and ancestral haplotypes to identify large, case-rich clusters. The algorithm successfully fine-maps a causal mutation in a test analysis using real data, and achieves almost 98% accuracy in predicting the genotype at the causal locus. A simulation study indicates that the new algorithm is substantially superior to alternative approaches, and it also allows us to identify situations in which multi-point approaches can substantially improve over single-SNP analyses. Our algorithm runs quickly and there is scope for extension to a wide range of disease models and genomic scales.