Similar literature
 20 similar articles found
1.
Testing Hardy‐Weinberg equilibrium (HWE) in the control group is commonly used to detect genotyping errors in genetic association studies. We propose a likelihood ratio test for testing HWE in the study population using both case and control samples. This test incorporates underlying association models. Another feature is that, when we infer the disease‐genotype association, we explicitly incorporate HWE or a possible departure from Hardy‐Weinberg equilibrium (DHWE) into the model. Our unified framework enables us to infer the disease‐genotype association when a detected DHWE needs to be part of the model after causes for the DHWE are explored. Real data sets are used to illustrate the application of the methodology and its implications for genetic association studies. Our analysis and interpretation touch on issues such as genotyping errors, population selection, population stratification, and the study sampling plan, all of which could cause DHWE. Genet. Epidemiol. 2009. Published 2008 Wiley‐Liss, Inc.
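For orientation, the conventional control-group check that this abstract takes as its starting point can be sketched as a one-degree-of-freedom Pearson chi-square test on genotype counts. This is a minimal illustration, not the likelihood ratio test the authors propose; the function name `hwe_chi_square` and the example counts are hypothetical, and the sketch assumes a biallelic marker with both alleles observed.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of HWE from genotype counts (1 df).

    Assumes a biallelic marker at which both alleles are observed,
    so that no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control-group genotype counts (AA, AB, BB)
chi2, p_value = hwe_chi_square(25, 50, 25)
```

A small p-value from such a test flags a possible departure from HWE, which, as the abstract notes, may reflect genotyping error but also selection, stratification, or the sampling plan.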

2.
Genetic association is often determined in case-control studies by the differential distribution of alleles or genotypes. Recent work has demonstrated that association can also be assessed by deviations from the expected distributions of alleles or genotypes. Specifically, multiple methods motivated by the principles of Hardy-Weinberg equilibrium (HWE) have been developed. However, these methods do not take into account many of the assumptions of HWE. Therefore, we have developed a prevalence-based association test (PRAT) as an alternative method for detecting association in case-control studies. This method, also motivated by the principles of HWE, uses an estimated population allele frequency to generate expected genotype frequencies instead of using the case and control frequencies separately. Our method often has greater power, under a wide variety of genetic models, to detect association than genotypic, allelic or Cochran-Armitage trend association tests. Therefore, we propose PRAT as a powerful alternative method of testing for association.  相似文献   
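As a point of reference for the comparison tests named in this abstract, the Cochran-Armitage trend test can be sketched from a 2×3 table of genotype counts. This is a minimal illustration of that baseline only, not of PRAT; the function name `trend_test`, the additive scores (0, 1, 2), and the example counts are assumptions made for the sketch.

```python
import math


def trend_test(case_counts, control_counts, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2x3 genotype table (1 df).

    case_counts and control_counts hold the counts of genotypes
    carrying 0, 1 and 2 copies of the tested allele.
    """
    r_total = sum(case_counts)
    s_total = sum(control_counts)
    n_total = r_total + s_total
    col_totals = [r + s for r, s in zip(case_counts, control_counts)]
    sum_xr = sum(x * r for x, r in zip(scores, case_counts))
    sum_xn = sum(x * n for x, n in zip(scores, col_totals))
    sum_x2n = sum(x * x * n for x, n in zip(scores, col_totals))
    numerator = n_total * (n_total * sum_xr - r_total * sum_xn) ** 2
    denominator = r_total * s_total * (n_total * sum_x2n - sum_xn ** 2)
    chi2 = numerator / denominator
    # Upper tail of a chi-square with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts: (0 copies, 1 copy, 2 copies)
chi2, p_value = trend_test((10, 20, 30), (30, 20, 10))
```

The statistic equals the total sample size times the squared correlation between genotype score and case status, which is why it is sensitive mainly to additive effects.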

3.
Genome‐wide association studies (GWAS) require considerable investment, so researchers often study multiple traits collected on the same set of subjects to maximize return. However, many GWAS have adopted a case‐control design; improperly accounting for case‐control ascertainment can lead to biased estimates of association between markers and secondary traits. We show that under the null hypothesis of no marker‐secondary trait association, naïve analyses that ignore ascertainment or stratify on case‐control status have proper Type I error rates except when both the marker and secondary trait are independently associated with disease risk. Under the alternative hypothesis, these methods are unbiased when the secondary trait is not associated with disease risk. We also show that inverse‐probability‐of‐sampling‐weighted (IPW) regression provides unbiased estimates of marker‐secondary trait association. We use simulation to quantify the Type I error, power and bias of naïve and IPW methods. IPW regression has appropriate Type I error in all situations we consider, but has lower power than naïve analyses. The bias for naïve analyses is small provided the marker is independent of disease risk. Considering the majority of tested markers in a GWAS are not associated with disease risk, naïve analyses provide valid tests of and nearly unbiased estimates of marker‐secondary trait association. Care must be taken when there is evidence that both the secondary trait and tested marker are associated with the primary disease, a situation we illustrate using an analysis of the relationship between a marker in FGFR2 and mammographic density in a breast cancer case‐control sample. Genet. Epidemiol. 33:717–728, 2009. © 2009 Wiley‐Liss, Inc.  相似文献   
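The idea behind inverse‐probability‐of‐sampling weighting can be sketched with a weighted least-squares slope in which each subject is weighted by the reciprocal of its sampling probability. This is a minimal sketch under the assumption that the sampling fractions for cases and controls are known; the function name `ipw_slope`, its parameters, and the example data are hypothetical, and a real analysis would also need robust (sandwich) standard errors, which are not computed here.

```python
def ipw_slope(genotypes, traits, is_case, p_case, p_control):
    """Weighted least-squares slope of a secondary trait on genotype.

    Each subject gets weight 1 / P(sampled), where p_case and p_control
    are the (assumed known) probabilities that a case or a control from
    the source population was included in the study.
    """
    weights = [1.0 / (p_case if c else p_control) for c in is_case]
    w_sum = sum(weights)
    g_bar = sum(w * g for w, g in zip(weights, genotypes)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, traits)) / w_sum
    s_gy = sum(
        w * (g - g_bar) * (y - y_bar)
        for w, g, y in zip(weights, genotypes, traits)
    )
    s_gg = sum(w * (g - g_bar) ** 2 for w, g in zip(weights, genotypes))
    return s_gy / s_gg


# Hypothetical data: cases fully sampled, controls sampled at 1 in 100
genotypes = [0, 1, 2, 0, 1, 2]
traits = [1.0, 3.0, 5.0, 1.0, 3.0, 5.0]
is_case = [True, True, False, False, False, True]
slope = ipw_slope(genotypes, traits, is_case, p_case=1.0, p_control=0.01)
```

Because controls are heavily up-weighted, a few subjects can dominate the estimate, which is consistent with the lower power the abstract reports for IPW relative to naïve analyses.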

4.
Genome‐wide scans of nucleotide variation in human subjects are providing an increasing number of replicated associations with complex disease traits. Most of the variants detected have small effects and, collectively, they account for a small fraction of the total genetic variance. Very large sample sizes are required to identify and validate findings. In this situation, even small sources of systematic or random error can cause spurious results or obscure real effects. The need for careful attention to data quality has been appreciated for some time in this field, and a number of strategies for quality control and quality assurance (QC/QA) have been developed. Here we extend these methods and describe a system of QC/QA for genotypic data in genome‐wide association studies (GWAS). This system includes some new approaches that (1) combine analysis of allelic probe intensities and called genotypes to distinguish gender misidentification from sex chromosome aberrations, (2) detect autosomal chromosome aberrations that may affect genotype calling accuracy, (3) infer DNA sample quality from relatedness and allelic intensities, (4) use duplicate concordance to infer SNP quality, (5) detect genotyping artifacts from dependence of Hardy‐Weinberg equilibrium test P‐values on allelic frequency, and (6) demonstrate sensitivity of principal components analysis to SNP selection. The methods are illustrated with examples from the “Gene Environment Association Studies” (GENEVA) program. The results suggest several recommendations for QC/QA in the design and execution of GWAS. Genet. Epidemiol. 34: 591–602, 2010. © 2010 Wiley‐Liss, Inc.  相似文献   

5.
A major challenge in genome‐wide association studies (GWASs) is to derive the multiple testing threshold when hypothesis tests are conducted using a large number of single nucleotide polymorphisms. Permutation tests are considered the gold standard in multiple testing adjustment in genetic association studies. However, they are computationally intensive, especially for GWASs, and can be impractical if a large number of random shuffles are used to ensure accuracy. Many researchers have developed approximation algorithms to relieve the computing burden imposed by permutation. One particularly attractive alternative to permutation is to calculate the effective number of independent tests, Meff, which has been shown to be promising in genetic association studies. In this study, we compare recently developed Meff methods and validate them by the permutation test with 10,000 random shuffles using two real GWAS data sets: an Illumina 1M BeadChip and an Affymetrix GeneChip® Human Mapping 500K Array Set. Our results show that the simpleM method produces the best approximation of the permutation threshold, and it does so in the shortest amount of time. We also show that Meff is indeed valid on a genome‐wide scale in these data sets based on statistical theory and significance tests. The significance thresholds derived can provide practical guidelines for other studies using similar population samples and genotyping platforms. Genet. Epidemiol. 34:100–105, 2010. © 2009 Wiley‐Liss, Inc.

6.
Case‐control genome‐wide association studies provide a vast amount of genetic information that may be used to investigate secondary phenotypes. We study the situation in which the primary disease is rare and the secondary phenotype and genetic markers are dichotomous. An analysis of the association between a genetic marker and the secondary phenotype based on controls only (CO) is valid, whereas standard methods that also use cases result in biased estimates and highly inflated type I error if there is an interaction between the secondary phenotype and the genetic marker on the risk of the primary disease. Here we present an adaptively weighted (AW) method that combines the case and control data to study the association, while reducing to the CO analysis if there is strong evidence of an interaction. The possibility of such an interaction and the misleading results for standard methods, but not for the AW or CO approaches, are illustrated by data from a case‐control study of colorectal adenoma. Simulations and asymptotic theory indicate that the AW method can reduce the mean square error for estimation with a prespecified SNP and increase the power to discover a new association in a genome‐wide study, compared to CO analysis. Further experience with genome‐wide studies is needed to determine when methods that assume no interaction gain precision and power and can therefore be recommended, and when methods such as the AW or CO approaches are needed to guard against the possibility of nonzero interactions. Genet. Epidemiol. 34:427–433, 2010. Published 2010 Wiley‐Liss, Inc.

7.
Genome‐wide association studies are characterized by a huge number of statistical tests performed to discover new disease‐related genetic variants [in the form of single‐nucleotide polymorphisms (SNPs)] in human DNA. Many SNPs have been identified for cross‐sectionally measured phenotypes. However, there is a growing interest in genetic determinants of the evolution of traits over time. Dealing with correlated observations from the same individual, we need to apply advanced statistical techniques. The linear mixed model is popular but also much more computationally demanding than fitting a linear regression model to independent observations. We propose a conditional two‐step approach as an approximate method to explore the longitudinal relationship between the trait and the SNP. In a simulation study, we compare several fast methods with respect to their accuracy and speed. The conditional two‐step approach is applied to relate SNPs to longitudinal bone mineral density responses collected in the Rotterdam Study. Copyright © 2012 John Wiley & Sons, Ltd.  相似文献   

8.
Problems associated with insufficient power have haunted the analysis of genome‐wide association studies and are likely to be the main challenge for the analysis of next‐generation sequencing data. Ranking genes according to their strength of association with the investigated phenotype is one solution. To obtain rankings for genes, researchers can draw from a wide range of statistics summarizing the relationships between variants mapped to a gene and the phenotype. Hence, it is of interest to explore the performance of these statistics in the context of rankings. To this end, we conducted a simulation study (limited to genes of equal sizes) of three different summary statistics, examining their ability to rank genes in a meaningful order. The weighted sum of squared marginal score test (Pan, 2009), the RareCover algorithm (Bhatia et al., 2010) and the elastic net regularization (Zou and Hastie, 2005) were chosen because they can handle common as well as rare variants. The test based on the score statistic outperformed both other methods in almost all investigated scenarios. It was the only measure to consistently detect genes with interacting causal variants. However, the RareCover algorithm proved better at identifying genes including causal variants with small effect sizes and low minor allele frequency than the weighted sum of squared marginal score test. The performance of the elastic net regularization was unimpressive for all but the simplest scenarios. Copyright © 2013 John Wiley & Sons, Ltd.

9.
Association analysis, which aims to detect associations between genetic variants and observable traits, plays an increasing part in understanding the genetic basis of disease. Among association methods, haplotype‐based studies are believed to have notable advantages, especially for rare diseases in case‐control studies. However, models of haplotypes are subject to statistical problems caused by rare haplotypes. Haplotype clustering offers an appealing solution. In this research, we have developed a new haplotype similarity measure suited to the “affinity propagation” clustering algorithm, which accounts well for rare haplotypes and thereby controls the number of degrees of freedom. The new similarity can incorporate haplotype structure information, which is believed to enhance the power and provide high resolution for identifying associations between genetic variants and disease. Our simulation studies show that the proposed approach has advantages in detecting disease‐marker associations in comparison with the cladistic haplotype clustering method CLADHC. We also illustrate an application of our method to cystic fibrosis, which shows quite accurate estimates during fine mapping. Genet. Epidemiol. 34: 633–641, 2010. © 2010 Wiley‐Liss, Inc.

10.
Genome‐wide association studies have recently identified many new loci associated with human complex diseases. These newly discovered variants typically have weak effects requiring studies with large numbers of individuals to achieve the statistical power necessary to identify them. Likely, there exist even more associated variants, which remain to be found if even larger association studies can be assembled. Meta‐analysis provides a straightforward means of increasing study sample sizes without collecting new samples by combining existing data sets. One obstacle to combining studies is that they are often performed on platforms with different marker sets. Current studies overcome this issue by imputing genotypes missing from each of the studies and then performing standard meta‐analysis techniques. We show that this approach may result in a loss of power since errors in imputation are not accounted for. We present a new method for performing meta‐analysis over imputed single nucleotide polymorphisms, show that it is optimal with respect to power, and discuss practical implementation issues. Through simulation experiments, we show that our imputation aware meta‐analysis approach outperforms or matches standard meta‐analysis approaches. Genet. Epidemiol. 34: 537–542, 2010. © 2010 Wiley‐Liss, Inc.  相似文献   

11.
The large number of markers considered in a genome‐wide association study (GWAS) has resulted in a simplification of analyses conducted. Most studies are analyzed one marker at a time using simple tests like the trend test. Methods that account for the special features of genetic association studies, yet remain computationally feasible for genome‐wide analysis, are desirable as they may lead to increased power to detect associations. Haplotype sharing attempts to translate between population genetics and genetic epidemiology. Near a recent mutation that increases disease risk, haplotypes of case participants should be more similar to each other than haplotypes of control participants; conversely, the opposite pattern may be found near a recent mutation that lowers disease risk. We give computationally simple association tests based on haplotype sharing that can be easily applied to GWASs while allowing use of fast (but not likelihood‐based) haplotyping algorithms and properly accounting for the uncertainty introduced by using inferred haplotypes. We also give haplotype‐sharing analyses that adjust for population stratification. Applying our methods to a GWAS of Parkinson's disease, we find a genome‐wide significant signal in the CAST gene that is not found by single‐SNP methods. Further, a missing‐data artifact that causes a spurious single‐SNP association on chromosome 9 does not impact our test. Genet. Epidemiol. 33:657–667, 2009. Published 2009 Wiley‐Liss, Inc.  相似文献   

12.
Case‐control association studies often collect from their subjects information on secondary phenotypes. Reusing the data and studying the association between genes and secondary phenotypes provide an attractive and cost‐effective approach that can lead to discovery of new genetic associations. A number of approaches have been proposed, including simple and computationally efficient ad hoc methods that ignore ascertainment or stratify on case‐control status. Justification for these approaches relies on the assumption of no covariates and the correct specification of the primary disease model as a logistic model. Both might not be true in practice, for example, in the presence of population stratification or the primary disease model following a probit model. In this paper, we investigate the validity of ad hoc methods in the presence of covariates and possible disease model misspecification. We show that in taking an ad hoc approach, it may be desirable to include covariates that affect the primary disease in the secondary phenotype model, even though these covariates are not necessarily associated with the secondary phenotype. We also show that when the disease is rare, ad hoc methods can lead to severely biased estimation and inference if the true disease model follows a probit model instead of a logistic model. Our results are justified theoretically and via simulations. Applied to real data analysis of genetic associations with cigarette smoking, ad hoc methods collectively identified as highly significant () single nucleotide polymorphisms from over 10 genes, genes that were identified in previous studies of smoking cessation.  相似文献   

13.
Case‐control association studies often collect extensive information on secondary phenotypes, which are quantitative or qualitative traits other than the case‐control status. Exploring secondary phenotypes can yield valuable insights into biological pathways and identify genetic variants influencing phenotypes of direct interest. All publications on secondary phenotypes have used standard statistical methods, such as least‐squares regression for quantitative traits. Because of unequal selection probabilities between cases and controls, the case‐control sample is not a random sample from the general population. As a result, standard statistical analysis of secondary phenotype data can be extremely misleading. Although one may avoid the sampling bias by analyzing cases and controls separately or by including the case‐control status as a covariate in the model, the associations between a secondary phenotype and a genetic variant in the case and control groups can be quite different from the association in the general population. In this article, we present novel statistical methods that properly reflect the case‐control sampling in the analysis of secondary phenotype data. The new methods provide unbiased estimation of genetic effects and accurate control of false‐positive rates while maximizing statistical power. We demonstrate the pitfalls of the standard methods and the advantages of the new methods both analytically and numerically. The relevant software is available at our website. Genet. Epidemiol. 2009. © 2008 Wiley‐Liss, Inc.  相似文献   

14.
Hu YJ, Lin DY. Genetic Epidemiology 2010, 34(8):803-815
Analysis of untyped single nucleotide polymorphisms (SNPs) can facilitate the localization of disease-causing variants and permit meta-analysis of association studies with different genotyping platforms. We present two approaches for using the linkage disequilibrium structure of an external reference panel to infer the unknown value of an untyped SNP from the observed genotypes of typed SNPs. The maximum-likelihood approach integrates the prediction of untyped genotypes and estimation of association parameters into a single framework and yields consistent and efficient estimators of genetic effects and gene-environment interactions with proper variance estimators. The imputation approach is a two-stage strategy, which first imputes the untyped genotypes by either the most likely genotypes or the expected genotype counts and then uses the imputed values in a downstream association analysis. The latter approach has proper control of type I error in single-SNP tests with possible covariate adjustments even when the reference panel is misspecified; however, type I error may not be properly controlled in testing multiple-SNP effects or gene-environment interactions. In general, imputation yields biased estimators of genetic effects and gene-environment interactions, and the variances are underestimated. We conduct extensive simulation studies to compare the bias, type I error, power, and confidence interval coverage between the maximum likelihood and imputation approaches in the analysis of single-SNP effects, multiple-SNP effects, and gene-environment interactions under cross-sectional and case-control designs. In addition, we provide an illustration with genome-wide data from the Wellcome Trust Case-Control Consortium (WTCCC) [2007].  相似文献   

15.
The associations between haplotypes and disease phenotypes offer valuable clues about the genetic determinants of complex diseases. It is highly challenging to make statistical inferences about these associations because of the unknown gametic phase in genotype data. We describe a general likelihood-based approach to inferring haplotype-disease associations in studies of unrelated individuals. We consider all possible phenotypes (including disease indicator, quantitative trait, and potentially censored age at onset of disease) and all commonly used study designs (including cross-sectional, case-control, cohort, nested case-control, and case-cohort). The effects of haplotypes on phenotype are characterized by appropriate regression models, which allow various genetic mechanisms and gene-environment interactions. We present the likelihood functions for all study designs and disease phenotypes under Hardy-Weinberg disequilibrium. The corresponding maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. We provide simple and efficient numerical algorithms to calculate the maximum likelihood estimators and their variances, and implement these algorithms in a freely available computer program. Extensive simulation studies demonstrate that the proposed methods perform well in realistic situations. An application to the Carolina Breast Cancer Study reveals significant haplotype effects and haplotype-smoking interactions in the development of breast cancer.  相似文献   

16.
Many complex diseases are influenced by genetic variations in multiple genes, each with only a small marginal effect on disease susceptibility. Pathway analysis, which identifies biological pathways associated with disease outcome, has become increasingly popular for genome‐wide association studies (GWAS). In addition to combining weak signals from a number of SNPs in the same pathway, results from pathway analysis also shed light on the biological processes underlying disease. We propose a new pathway‐based analysis method for GWAS, the supervised principal component analysis (SPCA) model. In the proposed SPCA model, a selected subset of SNPs most associated with disease outcome is used to estimate the latent variable for a pathway. The estimated latent variable for each pathway is an optimal linear combination of a selected subset of SNPs; therefore, the proposed SPCA model provides the ability to borrow strength across the SNPs in a pathway. In addition to identifying pathways associated with disease outcome, SPCA also carries out additional within‐category selection to identify the most important SNPs within each gene set. The proposed model operates in a well‐established statistical framework and can handle design information such as covariate adjustment and matching information in GWAS. We compare the proposed method with currently available methods using data with realistic linkage disequilibrium structures, and we illustrate the SPCA method using the Wellcome Trust Case‐Control Consortium Crohn Disease (CD) data set. Genet. Epidemiol. 34: 716‐724, 2010. © 2010 Wiley‐Liss, Inc.  相似文献   

17.
With the emergence of Biobanks alongside large‐scale genome‐wide association studies (GWAS) we will soon be in the enviable situation of obtaining precise estimates of population allele frequencies for SNPs which make up the panels in standard genotyping arrays, such as those produced by Illumina and Affymetrix. For disease association studies it is well known that for rare diseases with known population minor allele frequencies (pMAFs) a case‐only design is most powerful. That is, for a fixed budget the optimal procedure is to genotype only cases (affecteds). In such tests experimenters look for a divergence of the allele distribution in cases from the known pMAF in order to test the null hypothesis of no association between the disease status and the allele frequency. However, what has not been previously characterized is the utility of controls (known unaffecteds) when available. In this study we consider frequentist and Bayesian statistical methods for testing for SNP genotype association when population MAFs are known and when both cases and controls are available. We demonstrate that for rare diseases the most powerful frequentist design is, somewhat counterintuitively, to actively discard the controls even though they contain information on the association. In contrast we develop a Bayesian test which uses all available information (cases and controls) and appears to exhibit uniformly greater power than all frequentist methods we considered. Genet. Epidemiol. 33:371–378, 2009. © 2009 Wiley Liss, Inc.
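The case‐only comparison against a known population frequency can be sketched as a one-sample z-test on the allele proportion in cases. This is a minimal illustration of the general idea, not the specific frequentist or Bayesian tests studied in the paper; the function name `case_only_z_test` and the example numbers are hypothetical, and the sketch assumes that the two alleles of each case are independent draws (HWE in cases) and that the pMAF is known without error.

```python
import math


def case_only_z_test(minor_allele_count, n_cases, pmaf):
    """Two-sided z-test of the case allele frequency against a known pMAF.

    minor_allele_count is the number of minor alleles among the
    2 * n_cases chromosomes carried by the genotyped cases.
    """
    n_alleles = 2 * n_cases
    p_hat = minor_allele_count / n_alleles
    std_err = math.sqrt(pmaf * (1 - pmaf) / n_alleles)
    z = (p_hat - pmaf) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical example: 60 minor alleles in 100 cases, known pMAF of 0.2
z, p_value = case_only_z_test(60, 100, 0.2)
```

Under the null hypothesis of no association the case frequency equals the pMAF, so controls add nothing to this particular statistic; that is the sense in which a case‐only design spends the whole genotyping budget on informative samples.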

18.
We propose a method to analyze family‐based samples together with unrelated cases and controls. The method builds on the idea of matched case–control analysis using conditional logistic regression (CLR). For each trio within the family, a case (the proband) and matched pseudo‐controls are constructed, based upon the transmitted and untransmitted alleles. Unrelated controls, matched by genetic ancestry, supplement the sample of pseudo‐controls; likewise unrelated cases are also paired with genetically matched controls. Within each matched stratum, the case genotype is contrasted with control/pseudo‐control genotypes via CLR, using a method we call matched‐CLR (mCLR). Eigenanalysis of numerous SNP genotypes provides a tool for mapping genetic ancestry. The result of such an analysis can be thought of as a multidimensional map, or eigenmap, in which the relative genetic similarities and differences amongst individuals are encoded. Once the map is constructed, new individuals can be projected onto it based on their genotypes. Successful differentiation of individuals of distinct ancestry depends on having a diverse, yet representative sample from which to construct the ancestry map. Once samples are well‐matched, mCLR yields comparable power to competing methods while ensuring excellent control over Type I error. Copyright © 2010 John Wiley & Sons, Ltd.

19.
Genome‐wide association studies have achieved unprecedented success in the identification of novel genes and pathways implicated in complex traits. Typically, studies for disease use a case‐control (CC) design and studies for quantitative traits (QT) are population based. The question that we address is: what is the equivalence between CC and QT association studies in terms of detection power and sample size? We compare the binary and continuous traits by assuming a threshold model for disease and assuming that the effect size on disease liability has features similar to those of the effect size on a QT. We derive the approximate ratio of the non‐centrality parameter (NCP) between CC and QT association studies, which is determined by sample size, disease prevalence (K) and the proportion of cases (v) in the CC study. For a disease with prevalence <0.1, a CC association study with equal numbers of cases and controls (v=0.5) needs a smaller sample size than a QT association study to achieve equivalent power; e.g., a CC association study of schizophrenia (K=0.01) needs only ~55% of the sample size required for an association study of height. So a planned meta‐analysis for height on ~120,000 individuals has power equivalent to a CC study on 33,100 schizophrenia cases and 33,100 controls, a size not yet achievable for this disease. With equal sample size, when v=K, the power of a CC association study is much less than that of a QT association study because of the information lost by transforming a quantitative continuous trait to a binary trait. Genet. Epidemiol. 34: 254–257, 2010. © 2009 Wiley‐Liss, Inc.
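The ~55% figure can be reproduced numerically under a liability threshold model. The abstract does not state the NCP ratio itself, so the expression used below, NCP_CC / NCP_QT ≈ i² v(1 − v) N_CC / [(1 − K)² N_QT] with i = z/K and z the standard normal density at the liability threshold, is an assumed form that should be checked against the paper; the function name `cc_sample_fraction` is hypothetical.

```python
from statistics import NormalDist


def cc_sample_fraction(prevalence, case_fraction=0.5):
    """Case-control sample size as a fraction of the QT sample size
    giving equal NCP, under the assumed ratio
    i^2 * v * (1 - v) / (1 - K)^2 per unit of sample size.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - prevalence)  # liability threshold
    z = std_normal.pdf(threshold)  # normal density at the threshold
    i = z / prevalence  # mean liability of affected individuals
    ncp_ratio = (
        i ** 2 * case_fraction * (1 - case_fraction) / (1 - prevalence) ** 2
    )
    return 1 / ncp_ratio


# Schizophrenia example from the abstract: K = 0.01, v = 0.5
fraction = cc_sample_fraction(0.01)
equivalent_cc_total = fraction * 120_000  # height meta-analysis size
```

With K = 0.01 and v = 0.5 the fraction comes out close to 0.55, so about 66,000 case‐control subjects (roughly 33,000 cases and 33,000 controls) match a QT study of 120,000. Setting `case_fraction` equal to the prevalence gives a fraction well above 1, in line with the abstract's remark about the information lost when v = K.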

20.
Hao K, Xu X, Laird N, Wang X, Xu X. Genetic Epidemiology 2004, 26(1):22-30
A large number of single nucleotide polymorphisms (SNPs) have now been deployed in searching for genes underlying complex diseases. A powerful method is desirable for efficient analysis of SNP data. Recently, a novel method for multiple SNP association testing using a combination of allelic association (AA) and Hardy-Weinberg disequilibrium (HWD) has been proposed. However, the power of this test has not been systematically examined. In this study, we conducted a simulation study to further evaluate the statistical power of the new procedure, as well as the influence of HWD on its performance. The simulation examined scenarios with multiple disease SNPs among a candidate pool, assuming different parameters including allele frequencies and risk ratios, dominant, additive, and recessive genetic models, and the existence of gene-gene interactions and linkage disequilibrium (LD). We also evaluated the performance of this test in capturing real disease-associated SNPs when a significant global P value is detected. Our results suggest that this new procedure is more powerful than conventional single-point analyses with correction for multiple testing. However, inclusion of HWD reduces the power under most circumstances. We applied the novel association test procedure to a case-control study of preterm delivery (PTD), examining the effects of 96 candidate gene SNPs concurrently, and detected a global P value of 0.0250 by using Cochran-Armitage χ² statistics as "starting" statistics in the procedure. In the following single-point analysis, SNPs on the IL1RN, IL1R2, ESR1, Factor 5, and OPRM1 genes were identified as possible risk factors in PTD.
