Similar Articles
20 similar articles found
1.
OBJECTIVES: Genotyping errors can induce biases in frequency estimates for haplotypes of single nucleotide polymorphisms (SNPs). Here, we considered the impact of SNP allele misclassification on haplotype odds ratio estimates from case-control studies of unrelated individuals. METHODS: We calculated bias analytically, using the haplotype counts expected in cases and controls under genotype misclassification. We evaluated the bias due to allele misclassification across a range of haplotype distributions using empirical haplotype frequencies within blocks of limited haplotype diversity. We also considered simple two- and three-locus haplotype distributions to understand the impact of haplotype frequency and number of SNPs on misclassification bias. RESULTS: We found that for common haplotypes (>5% frequency), realistic genotyping error rates (0.1-1% chance of miscalling an allele), and moderate relative risks (2-4), the bias was always towards the null and increased in magnitude with increasing error rate and increasing odds ratio. For common haplotypes, bias generally increased with increasing haplotype frequency, while for rare haplotypes, bias generally increased with decreasing frequency. When the chance of miscalling an allele is 0.5%, the median bias in haplotype-specific odds ratios for common haplotypes was generally small (<4% on the log odds ratio scale), but the bias for some individual haplotypes was larger (10-20%). Bias towards the null leads to a loss in power; the relative efficiency of a test statistic based upon misclassified haplotype data compared to a test based on the unobserved true haplotypes ranged from roughly 60% to 80%, and worsened with increasing haplotype frequency. CONCLUSIONS: The cumulative effect of small allele-calling errors across multiple loci can induce noticeable bias and reduce power in realistic scenarios. This has implications for the design of candidate gene association studies that utilize multi-marker haplotypes.
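The attenuation toward the null described in this abstract can be illustrated with a minimal allele-level sketch. This is a generic simplification, not the paper's haplotype-level calculation; the frequencies and error rate below are hypothetical:

```python
def misclassified_or(p_control, true_or, err):
    """Observed allele-level odds ratio after symmetric allele miscalling.

    p_control: true risk-allele frequency in controls
    true_or:   true allelic odds ratio
    err:       probability of miscalling each allele (in either direction)
    """
    # True case allele frequency implied by the odds ratio
    odds_case = true_or * p_control / (1.0 - p_control)
    p_case = odds_case / (1.0 + odds_case)
    # Symmetric misclassification shifts each observed frequency toward 0.5
    obs = lambda p: p * (1.0 - err) + (1.0 - p) * err
    q_case, q_control = obs(p_case), obs(p_control)
    return (q_case / (1.0 - q_case)) / (q_control / (1.0 - q_control))

# A 1% per-allele error attenuates a true OR of 3 toward the null
print(misclassified_or(0.3, 3.0, 0.01))  # ~2.93, below the true value of 3
```

Larger error rates pull the observed odds ratio further toward 1, which is the qualitative behavior the abstract reports.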

2.
Trend tests for genetic association using a matched case-control design, which allows for a variable number of controls per case, are studied. However, the tests depend on scores based on the underlying genetic model, so a misspecified model may result in a substantial loss of power. Since the mode of inheritance may be unknown for complex diseases, robust trend tests for matched case-control studies are developed. Simulation is conducted to compare the trend tests and the robust trend tests under various genetic models. The results are applied to detect candidate-gene association using an example from a case-control aetiologic study of sarcoidosis.

3.
In case-control studies, subjects in the case group may be recruited from suspected patients who are diagnosed positively with disease. While many statistical methods have been developed for measurement error or misclassification of exposure variables in epidemiological studies, no studies have been reported on the effect of errors in diagnosing disease on testing genetic association in case-control studies. We study the impact of using the original Cochran-Armitage trend test assuming no diagnostic error when, in fact, cases and controls may be clinically diagnosed by an imperfect gold standard or a reference test. The type I error, sample size and asymptotic power of trend tests are examined under a family of genetic models in the presence of diagnostic error. The empirical powers of the trend tests are also compared by simulation studies under various genetic models.
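The Cochran-Armitage trend test referenced in several of these abstracts has a simple closed form. A minimal sketch for biallelic genotype counts with additive scores (the counts below are hypothetical, not from any of the cited studies):

```python
import math

def ca_trend_z(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic Z for genotype counts (AA, Aa, aa)."""
    n = [c + d for c, d in zip(cases, controls)]       # column totals
    R, S = sum(cases), sum(controls)
    N = R + S
    swr = sum(w * r for w, r in zip(scores, cases))    # sum of w_i * r_i
    swn = sum(w * t for w, t in zip(scores, n))        # sum of w_i * n_i
    sw2n = sum(w * w * t for w, t in zip(scores, n))   # sum of w_i^2 * n_i
    num = N * swr - R * swn
    denom = R * S * (N * sw2n - swn * swn)
    return num * math.sqrt(N) / math.sqrt(denom)

# Z^2 is referred to a chi-square distribution with 1 df
print(ca_trend_z([10, 20, 30], [30, 20, 10]))  # ≈ 4.472 (Z² = 20)
```

The default scores (0, 1, 2) correspond to an additive genetic model; dominant or recessive models simply change the score vector.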

4.
Association analysis is designed to detect associations between genetic variants and observable traits, and has played an increasing part in understanding the genetic basis of diseases. Among such methods, haplotype-based association studies are believed to possess prominent advantages, especially for rare diseases in case-control studies. However, modeling these haplotypes is subject to statistical problems caused by rare haplotypes. Haplotype clustering offers an appealing solution. In this research, we developed a new haplotype similarity measure for the "affinity propagation" clustering algorithm that handles rare haplotypes well, thereby controlling the degrees-of-freedom problem. The new similarity measure can incorporate haplotype structure information, which is believed to enhance power and provide high resolution for identifying associations between genetic variants and disease. Our simulation studies show that the proposed approach offers advantages over the cladistic haplotype clustering method CLADHC in detecting disease-marker associations. We also illustrate an application of our method to cystic fibrosis, where it yields quite accurate estimates during fine mapping. Genet. Epidemiol. 34:633-641, 2010. © 2010 Wiley-Liss, Inc.

5.
Confounding caused by latent population structure has been a major concern in genome-wide association studies, despite their success at identifying genetic variants associated with complex diseases. In particular, given the growing interest in association mapping with count phenotype data, it would be useful to have a testing framework for genetic associations that is immune to population structure when the phenotype consists of count measurements. Here, I propose a solution for testing associations between single nucleotide polymorphisms and a count phenotype in the presence of an arbitrary population structure. I consider a classical range of models for count phenotype data and, under these models, derive a unified test for genetic associations that protects against confounding. I also develop an algorithm to efficiently estimate the parameters required to fit the proposed model. I illustrate the proposed approach using simulation studies and an empirical study. Both simulated and real-data examples suggest that the proposed method successfully corrects for population structure.

6.
Case-control genome-wide association studies provide a vast amount of genetic information that may be used to investigate secondary phenotypes. We study the situation in which the primary disease is rare and the secondary phenotype and genetic markers are dichotomous. An analysis of the association between a genetic marker and the secondary phenotype based on controls only (CO) is valid, whereas standard methods that also use cases result in biased estimates and highly inflated type I error if there is an interaction between the secondary phenotype and the genetic marker on the risk of the primary disease. Here we present an adaptively weighted (AW) method that combines the case and control data to study the association, while reducing to the CO analysis if there is strong evidence of an interaction. The possibility of such an interaction, and the misleading results for standard methods but not for the AW or CO approaches, are illustrated by data from a case-control study of colorectal adenoma. Simulations and asymptotic theory indicate that the AW method can reduce the mean square error for estimation with a prespecified SNP and increase the power to discover a new association in a genome-wide study, compared to CO analysis. Further experience with genome-wide studies is needed to determine when methods that assume no interaction gain precision and power and thereby can be recommended, and when methods such as the AW or CO approaches are needed to guard against the possibility of nonzero interactions. Genet. Epidemiol. 34:427-433, 2010. Published 2010 Wiley-Liss, Inc.

7.
Many genetic analyses are done with incomplete information; for example, unknown phase in haplotype-based association studies. Measures of the amount of available information can be used for efficient planning of studies and/or analyses. In particular, the linkage disequilibrium (LD) between two sets of markers can be interpreted as the amount of information one set of markers contains for testing allele frequency differences in the second set, and measuring LD can be viewed as quantifying information in a missing data problem. We introduce a framework for measuring the association between two sets of variables; for example, genotype data for two distinct groups of markers, or haplotype and genotype data for a given set of polymorphisms. The goal is to quantify how much information is in one data set, e.g. genotype data for a set of SNPs, for estimating parameters that are functions of frequencies in the second data set, e.g. haplotype frequencies, relative to the ideal case of actually observing the complete data, e.g. haplotypes. In the case of genotype data on two mutually exclusive sets of markers, the measure determines the amount of multi-locus LD, and is equal to the classical measure r², if the sets each consist of one bi-allelic marker. In general, the measures are interpreted as the asymptotic ratio of sample sizes necessary to achieve the same power in case-control testing. The focus of this paper is on case-control allele/haplotype tests, but the framework can be extended easily to other settings like regressing quantitative traits on allele/haplotype counts, or tests on genotypes or diplotypes. We highlight applications of the approach, including tools for navigating the HapMap database [The International HapMap Consortium, 2003], and genotyping strategies for positional cloning studies.
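For the two-biallelic-marker special case this abstract mentions, the measure reduces to the classical r². A quick sketch of that classical quantity, computed from haplotype and allele frequencies (the values below are hypothetical):

```python
def r_squared(p_ab, p_a, p_b):
    """Classical LD measure r^2 from the AB haplotype frequency and the
    allele frequencies of the two markers."""
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

print(r_squared(0.5, 0.5, 0.5))   # 1.0: perfect LD
print(r_squared(0.25, 0.5, 0.5))  # 0.0: linkage equilibrium
```

Under the sample-size interpretation given in the abstract, tagging a marker with r² to the causal variant inflates the required sample size by roughly a factor of 1/r².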

8.
The present study introduces new Haplotype Sharing Transmission/Disequilibrium Tests (HS-TDTs) that allow for random genotyping errors. We evaluate the type I error rate and power of the newly proposed tests under a variety of scenarios and perform a power comparison among the proposed tests, the HS-TDT and the single-marker TDT. The results indicate that the HS-TDT shows a significant increase in type I error when applied to data in which either Mendelian-inconsistent trios are removed or Mendelian-inconsistent markers are treated as missing genotypes, and the magnitude of the type I error increases both with increasing sample size and with increasing genotyping error rate. The results also show that a simple strategy, merging each rare haplotype into the most similar common haplotype, can control the type I error inflation for a wide range of genotyping error rates, and that after merging rare haplotypes, the power of the test is very similar to that without merging. Therefore, we conclude that a simple strategy may make the HS-TDT robust to genotyping errors. Our simulation results also show that this strategy may be applicable to other haplotype-based TDTs.

9.
The standard procedure to assess genetic equilibrium is a χ2 test of goodness-of-fit. As is the case with any statistical procedure of that type, the null hypothesis is that the distribution underlying the data is in agreement with the model. Thus, a significant result indicates incompatibility of the observed data with the model, which is clearly at variance with the aim in the majority of applications: to exclude the existence of gross violations of the equilibrium condition. In current practice, this basic logical difficulty is avoided by increasing the significance bound applied to the P-value (e.g. from 5 to 10%) and inferring compatibility of the data with Hardy-Weinberg Equilibrium (HWE) from an insignificant result. Unfortunately, such direct inversion of a statistical testing procedure fails to produce a valid test of the hypothesis of interest, namely, that the data are in sufficiently good agreement with the model under which the P-value is calculated. We present a logically unflawed solution to the problem of establishing (approximate) compatibility of an observed genotype distribution with HWE. The test is available in one- and two-sided versions. For both versions, we provide tools for exact power calculation. We demonstrate the merits of the new approach through comparison with the traditional χ2 goodness-of-fit test in 2×60 genotype distributions from 43 published genetic studies of complex diseases where departure from HWE was noted in either the case or control sample. In addition, we show that the new test is useful for the analysis of genome-wide association studies. Genet. Epidemiol. 33:569-580, 2009. © 2009 Wiley-Liss, Inc.
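The traditional goodness-of-fit test that this abstract critiques is itself easy to state. A minimal sketch, shown for contrast with the equivalence-style test the paper proposes (the genotype counts below are hypothetical):

```python
def hwe_chi2(n_aa, n_ab, n_bb):
    """Pearson chi-square goodness-of-fit statistic for HWE (1 df)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated allele frequency of A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # HWE proportions
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(hwe_chi2(25, 50, 25))  # 0.0: exactly at equilibrium
print(hwe_chi2(30, 40, 30))  # 4.0: heterozygote deficit, P ≈ 0.046 at 1 df
```

As the abstract argues, a non-significant result from this statistic does not by itself establish approximate compatibility with HWE; that is the gap the proposed equivalence test fills.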

10.
Hu YJ, Lin DY. Genetic Epidemiology 2010, 34(8):803-815
Analysis of untyped single nucleotide polymorphisms (SNPs) can facilitate the localization of disease-causing variants and permit meta-analysis of association studies with different genotyping platforms. We present two approaches for using the linkage disequilibrium structure of an external reference panel to infer the unknown value of an untyped SNP from the observed genotypes of typed SNPs. The maximum-likelihood approach integrates the prediction of untyped genotypes and estimation of association parameters into a single framework and yields consistent and efficient estimators of genetic effects and gene-environment interactions with proper variance estimators. The imputation approach is a two-stage strategy, which first imputes the untyped genotypes by either the most likely genotypes or the expected genotype counts and then uses the imputed values in a downstream association analysis. The latter approach has proper control of type I error in single-SNP tests with possible covariate adjustments even when the reference panel is misspecified; however, type I error may not be properly controlled in testing multiple-SNP effects or gene-environment interactions. In general, imputation yields biased estimators of genetic effects and gene-environment interactions, and the variances are underestimated. We conduct extensive simulation studies to compare the bias, type I error, power, and confidence interval coverage between the maximum likelihood and imputation approaches in the analysis of single-SNP effects, multiple-SNP effects, and gene-environment interactions under cross-sectional and case-control designs. In addition, we provide an illustration with genome-wide data from the Wellcome Trust Case-Control Consortium (WTCCC) [2007].

11.
Han F, Pan W. Genetic Epidemiology 2010, 34(7):680-688
To detect genetic association with common and complex diseases, many statistical tests have been proposed for candidate gene or genome-wide association studies with the case-control design. Due to linkage disequilibrium (LD), multi-marker association tests can gain power over single-marker tests with a Bonferroni multiple testing adjustment. Among the many existing multi-marker association tests, most target only one of many possible aspects of distributional differences between the genotypes of cases and controls, such as allele frequency differences, while a few newer ones target two or three aspects; all of these can be implemented in logistic regression. In contrast to logistic regression, a genomic distance-based regression (GDBR) approach aims to detect higher-order genotypic differences between cases and controls. A recent study has confirmed the high power of GDBR tests. At the moment, the popular logistic regression and the emerging GDBR approaches are completely unrelated: one has to choose between the two. In this article, we reformulate GDBR as logistic regression, opening an avenue to constructing other powerful tests while overcoming some limitations of GDBR. For example, asymptotic distributions can replace time-consuming permutations for deriving P-values, and covariates, including gene-gene interactions, can be easily incorporated. Importantly, this reformulation facilitates combining GDBR with other existing methods in a unified framework of logistic regression. In particular, we show that Fisher's P-value combining method can boost statistical power by incorporating information from allele frequencies, Hardy-Weinberg disequilibrium, LD patterns, and other higher-order interactions among multiple markers as captured by GDBR.

12.
The impact of erroneous genotypes that have passed standard quality control (QC) can be severe in genome-wide association studies, genotype imputation, and estimation of heritability and prediction of genetic risk based on single nucleotide polymorphisms (SNPs). To detect such genotyping errors, a simple two-locus QC method, based on the difference in association test statistics between single SNPs and pairs of SNPs, was developed and applied. The proposed approach could detect many problematic SNPs with statistical significance in real data even when standard single-SNP QC analyses failed to detect them. Depending on the data set used, the number of erroneous SNPs that were not filtered out by standard single-SNP QC but were detected by the proposed approach varied from a few hundred to thousands. Using simulated data, it was shown that the proposed method was powerful and performed better than other tested existing methods. The power of the proposed approach to detect erroneous genotypes was ~80% for a 3% error rate per SNP. This novel QC approach is easy to implement and computationally efficient, and can lead to better-quality genotypes for subsequent genotype-phenotype investigations.

13.
In case-control single nucleotide polymorphism (SNP) data, the allele frequency, Hardy Weinberg Disequilibrium, and linkage disequilibrium (LD) contrast tests are three distinct sources of information about genetic association. While all three tests are typically developed in a retrospective context, we show that prospective logistic regression models may be developed that correspond conceptually to the retrospective tests. This approach provides a flexible framework for conducting a systematic series of association analyses using unphased genotype data and any number of covariates. For a single stage study, two single-marker tests and four two-marker tests are discussed. The true association models are derived and they allow us to understand why a model with only a linear term will generally fit well for a SNP in weak LD with a causal SNP, whatever the disease model, but not for a SNP in high LD with a non-additive disease SNP. We investigate the power of the association tests using real LD parameters from chromosome 11 in the HapMap CEU population data. Among the single-marker tests, the allelic test has on average the most power in the case of an additive disease, but for dominant, recessive, and heterozygote disadvantage diseases, the genotypic test has the most power. Among the four two-marker tests, the Allelic-LD contrast test, which incorporates linear terms for two markers and their interaction term, provides the most reliable power overall for the cases studied. Therefore, our result supports incorporating an interaction term as well as linear terms in multi-marker tests. Genet. Epidemiol. 34:67-77, 2010. © 2009 Wiley-Liss, Inc.

14.
Genetic association studies are a powerful tool to detect genetic variants that predispose to human disease. Once an associated variant is identified, investigators are also interested in estimating the effect of the identified variant on disease risk. Estimates of the genetic effect based on new association findings tend to be upwardly biased due to a phenomenon known as the "winner's curse." Overestimation of genetic effect size in initial studies may cause follow-up studies to be underpowered and so to fail. In this paper, we quantify the impact of the winner's curse on the allele frequency difference and odds ratio estimators for one- and two-stage case-control association studies. We then propose an ascertainment-corrected maximum likelihood method to reduce the bias of these estimators. We show that overestimation of the genetic effect by the uncorrected estimator decreases as the power of the association study increases and that the ascertainment-corrected method reduces absolute bias and mean square error unless power to detect association is high. Genet. Epidemiol. 33:453-462, 2009. © 2009 Wiley-Liss, Inc.

15.
Zheng G. Statistics in Medicine 2003, 22(16):2657-2666
In case-control studies, the Cochran-Armitage (CA) trend test is powerful for detecting an association between a marker and disease. To apply this test, a score must be assigned to each genotype based on the genetic model. When the underlying genetic model is unknown, the trend test statistic is a function of the scores. In this paper, simple procedures are given to obtain two scores (max and min), which respectively maximize and minimize the CA trend test statistic. These two scores can be used to examine the effect of the choice of scores on the test of no association. When the CA trend test statistic with the max (or min) score is less (or greater) than a prespecified value, the conclusion is clear: we will accept (or reject) the null hypothesis of no association for any scores used. When this value is less than the CA trend test statistic with the max score but greater than the one with the min score, the decision of whether or not to reject the null hypothesis depends on the choice of scores. In this situation, the CA trend test with a prespecified score cannot be used without careful scientific justification of the choice of scores. The max and min scoring schemes are applied to a real data set.
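The paper derives the extremal scores analytically; a brute-force grid search over genotype scores (0, x, 1) conveys the idea. This is an illustrative sketch with hypothetical counts, not the paper's closed-form procedure:

```python
def trend_z2(cases, controls, x):
    """Squared Cochran-Armitage trend statistic with genotype scores (0, x, 1)."""
    w = (0.0, x, 1.0)
    n = [c + d for c, d in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    swr = sum(wi * r for wi, r in zip(w, cases))
    swn = sum(wi * t for wi, t in zip(w, n))
    sw2n = sum(wi * wi * t for wi, t in zip(w, n))
    num = N * swr - R * swn
    return N * num * num / (R * S * (N * sw2n - swn * swn))

cases, controls = [10, 20, 30], [30, 20, 10]
grid = [i / 100 for i in range(101)]            # candidate heterozygote scores
z2 = [trend_z2(cases, controls, x) for x in grid]
print(max(z2), min(z2))  # 20.0 (at x = 0.5, additive) and 15.0 (at x = 0 or 1)
```

For these counts every choice of x rejects at conventional levels, which is exactly the "clear conclusion" situation the abstract describes: the decision does not depend on the score.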

16.
Whole genome association studies (WGAS) have surged in popularity in recent years as technological advances have made large-scale genotyping more feasible and as exciting new results offer tremendous hope and optimism. The logic of WGAS rests upon the common disease/common variant (CD/CV) hypothesis. Detection of association under the common disease/rare variant (CD/RV) scenario is much harder, and current WGAS practices may be underpowered without large enough sample sizes. In this article, we propose a generalized linear model with regularization (rGLM) approach for detecting disease-haplotype association using unphased single nucleotide polymorphism data that is applicable to both CD/CV and CD/RV scenarios. We borrow a dimension-reduction method from the data mining and statistical learning literature, but use it to weed out haplotypes that are not associated with the disease, so that the associated haplotypes, especially rare ones, can stand out and be accounted for more precisely. By using high-dimensional data analysis techniques, which are frequently employed in microarray analyses, interacting effects among haplotypes in different blocks can be investigated without much concern about the sample size being overwhelmed by the number of haplotype combinations. Our simulation study demonstrates the gain in power for detecting associations with moderate sample sizes. For detecting association under CD/RV, regression-type methods such as that implemented in hapassoc may fail to provide coefficient estimates for rare associated haplotypes, resulting in a loss of power compared to rGLM. Furthermore, our results indicate that rGLM can uncover the associated variants much more frequently than hapassoc. Genet. Epidemiol. 2009. © 2008 Wiley-Liss, Inc.

17.
To evaluate the risk of a disease associated with the joint effects of genetic susceptibility and environmental exposures, epidemiologic researchers often test for non-multiplicative gene-environment effects from case-control studies. In this article, we present a comparative study of four alternative tests for interactions: (i) the standard case-control method; (ii) the case-only method, which requires an assumption of gene-environment independence in the underlying population; (iii) a two-step method that decides between the case-only and case-control estimators depending on a statistical test of the gene-environment independence assumption; and (iv) a novel empirical-Bayes (EB) method that combines the case-control and case-only estimators depending on the sample size and the strength of the gene-environment association in the data. We evaluate the methods in terms of integrated Type I error and power, averaged with respect to varying scenarios for gene-environment association that are likely to appear in practice. These studies suggest that the novel EB procedure is overall a promising approach for detecting gene-environment interactions from case-control studies. In particular, the EB procedure, unlike the case-only or two-step methods, can closely maintain a desired Type I error under realistic scenarios of gene-environment dependence, and yet can be substantially more powerful than the traditional case-control analysis when the gene-environment independence assumption is satisfied, exactly or approximately. Our studies also reveal the potential utility of some non-traditional case-control designs that sample controls at a smaller rate than cases. Apart from the simulation studies, we also illustrate the different methods by analyzing interactions of two commonly studied genes, N-acetyl transferase type 2 and glutathione s-transferase M1, with smoking and dietary exposures, in a large case-control study of colorectal cancer.
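The case-only estimator in (ii) amounts to measuring the gene-environment association among cases only. A minimal sketch with hypothetical counts (valid as an interaction estimate only under gene-environment independence in the source population, as the abstract notes):

```python
def case_only_interaction_or(n11, n10, n01, n00):
    """Case-only estimate of the multiplicative G x E interaction odds ratio.

    Counts are among CASES only:
      n11: genotype G and exposure E;  n10: G only;
      n01: E only;                     n00: neither.
    Under gene-environment independence in the population, the G-E
    odds ratio among cases estimates the interaction odds ratio.
    """
    return (n11 * n00) / (n10 * n01)

print(case_only_interaction_or(40, 10, 20, 30))  # 6.0
```

When the independence assumption fails, this estimator is biased; that is the failure mode the two-step and empirical-Bayes procedures in the abstract are designed to guard against.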

18.
Case-control association studies using unrelated individuals may offer an effective approach for identifying genetic variants that have small to moderate disease risks. In general, two different strategies may be employed to establish associations between genotypes and phenotypes: (1) collecting individual genotypes or (2) quantifying allele frequencies in DNA pools. These two technologies have their respective advantages. Individual genotyping gathers more information, whereas DNA pooling may be more cost effective. Recent technological advances in DNA pooling have generated great interest in using DNA pooling in association studies. In this article, we investigate the impacts of errors in genotyping or measuring allele frequencies on the identification of genetic associations with these two strategies. We find that, with current technologies, compared to individual genotyping, a larger sample is generally required to achieve the same power using DNA pooling. We further consider the use of DNA pooling as a screening tool to identify candidate regions for follow-up studies. We find that the majority of the positive regions identified from DNA pooling results may represent false positives if measurement errors are not appropriately considered in the design of the study.

19.
Genome-wide association studies (GWAS) require considerable investment, so researchers often study multiple traits collected on the same set of subjects to maximize return. However, many GWAS have adopted a case-control design; improperly accounting for case-control ascertainment can lead to biased estimates of association between markers and secondary traits. We show that under the null hypothesis of no marker-secondary trait association, naïve analyses that ignore ascertainment or stratify on case-control status have proper Type I error rates except when both the marker and secondary trait are independently associated with disease risk. Under the alternative hypothesis, these methods are unbiased when the secondary trait is not associated with disease risk. We also show that inverse-probability-of-sampling-weighted (IPW) regression provides unbiased estimates of marker-secondary trait association. We use simulation to quantify the Type I error, power and bias of naïve and IPW methods. IPW regression has appropriate Type I error in all situations we consider, but has lower power than naïve analyses. The bias for naïve analyses is small provided the marker is independent of disease risk. Considering that the majority of tested markers in a GWAS are not associated with disease risk, naïve analyses provide valid tests of and nearly unbiased estimates of marker-secondary trait association. Care must be taken when there is evidence that both the secondary trait and tested marker are associated with the primary disease, a situation we illustrate using an analysis of the relationship between a marker in FGFR2 and mammographic density in a breast cancer case-control sample. Genet. Epidemiol. 33:717-728, 2009. © 2009 Wiley-Liss, Inc.
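The IPW correction above reweights each sampled subject by the inverse of its selection probability, so that the case-control sample again represents the source population. A minimal sketch of the weights themselves (prevalence, sample sizes, and the normalization to a total weight of 1 are illustrative assumptions, not the paper's exact implementation):

```python
def ipw_weights(n_cases, n_controls, prevalence):
    """Per-subject inverse-probability-of-sampling weights for a case-control
    sample drawn from a population with the given disease prevalence.
    Weights are normalized so that they sum to 1 over the whole sample."""
    w_case = prevalence / n_cases            # cases jointly represent K of the population
    w_control = (1 - prevalence) / n_controls  # controls represent the remaining 1 - K
    return w_case, w_control

w1, w0 = ipw_weights(1000, 1000, 0.01)
# For a rare disease, each control carries far more weight than each case
print(w0 / w1)  # ≈ 99 when prevalence is 1% and the groups are equal-sized
```

These weights would then be passed to a weighted regression of the secondary trait on the marker, which is where the power loss relative to naïve analyses noted in the abstract arises.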

20.
In some genetic association studies, samples contain both parental and unrelated controls. Under such scenarios, instead of analyzing only trios using family-based association tests or only unrelated subjects using a case-control study design, Nagelkerke et al. ([2004] Eur. J. Hum. Genet. 12:964-970) and Epstein et al. ([2005] Am. J. Hum. Genet. 76:592-608) proposed methods that implemented a likelihood ratio test to combine the two different types of data. In this article, we put forward a more powerful and simplified strategy to combine trios with unrelated subjects based on the haplotype relative risk (HRR) (Falk and Rubinstein [1987] Ann. Hum. Genet. 51:227-233). The HRR compares parental marker alleles transmitted to an affected offspring to those not transmitted as a test for association, a strategy that is similar to a case-control study that compares allele frequencies in diseased cases to those of unrelated controls. We prove that affected offspring can be pooled with diseased cases and that parental controls can be treated as unrelated controls when the trios and unrelated subjects are randomly sampled from the same population. Therefore, unrelated subjects can be incorporated into the HRR intuitively and effortlessly. For trios without complete parental genotypes, we adopted the strategy proposed by (Guo et al. [2005a] BMC Genet. 6:S90; [2005b] Hum. Hered. 59:125-135), which is more feasible than the one proposed by Weinberg ([1999] Am. J. Hum. Genet. 64:1186-1193). In addition, simulation results suggest that the combined haplotype relative risk is more powerful than Epstein et al.'s method regardless of the disease prevalence in a homogeneous population.
