Similar articles
Found 20 similar articles (search time: 15 ms)
1.
The case‐control study is a common design for assessing the association between genetic exposures and a disease phenotype. Though association with a given (case‐control) phenotype is always of primary interest, there is often considerable interest in assessing relationships between genetic exposures and other (secondary) phenotypes. However, the case‐control sample represents a biased sample from the general population. As a result, if this sampling framework is not correctly taken into account, analyses estimating the effect of exposures on secondary phenotypes can be biased, leading to incorrect inference. In this paper, we address this problem and propose a general approach for estimating and testing the population effect of a genetic variant on a secondary phenotype. Our approach is based on inverse probability weighted estimating equations, where the weights depend on genotype and the secondary phenotype. We show that, though slightly less efficient than a full likelihood‐based analysis when the likelihood is correctly specified, it is substantially more robust to model misspecification, and can outperform likelihood‐based analysis, both in terms of validity and power, when the model is misspecified. We illustrate our approach with an application to a case‐control study extracted from the Framingham Heart Study. Copyright © 2016 John Wiley & Sons, Ltd.

2.
With the emergence of biobanks alongside large‐scale genome‐wide association studies (GWAS), we will soon be in the enviable situation of obtaining precise estimates of population allele frequencies for the SNPs that make up the panels in standard genotyping arrays, such as those produced by Illumina and Affymetrix. For disease association studies it is well known that for rare diseases with known population minor allele frequencies (pMAFs) a case‐only design is most powerful. That is, for a fixed budget the optimal procedure is to genotype only cases (affecteds). In such tests experimenters look for a divergence of the allele distribution in cases from the known pMAF, in order to test the null hypothesis of no association between disease status and allele frequency. However, what has not been previously characterized is the utility of controls (known unaffecteds) when available. In this study we consider frequentist and Bayesian statistical methods for testing for SNP genotype association when pMAFs are known and when both cases and controls are available. We demonstrate that for rare diseases the most powerful frequentist design is, somewhat counterintuitively, to actively discard the controls even though they contain information on the association. In contrast, we develop a Bayesian test which uses all available information (cases and controls) and appears to exhibit uniformly greater power than all frequentist methods we considered. Genet. Epidemiol. 33:371–378, 2009. © 2009 Wiley‐Liss, Inc.
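The case‐only idea described above can be illustrated with a generic one‐sample test. The sketch below is not the frequentist or Bayesian test developed in the paper; it assumes Hardy‐Weinberg equilibrium, treats the 2n case alleles as independent draws, and compares the observed minor allele frequency in cases with a known pMAF using a normal approximation. The function name and arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def case_only_z_test(minor_allele_count, n_cases, pmaf):
    """One-sample z-test of the case minor allele frequency against a known pMAF.

    Assumption (not from the abstract): alleles are independent, so the
    minor allele count among 2 * n_cases alleles is Binomial(2n, pmaf)
    under the null hypothesis of no association.
    """
    n_alleles = 2 * n_cases
    p_hat = minor_allele_count / n_alleles
    # Standard error under the null uses the known population frequency.
    se = sqrt(pmaf * (1 - pmaf) / n_alleles)
    z = (p_hat - pmaf) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical data: 500 cases, 260 minor alleles, known pMAF of 0.20.
    z, p = case_only_z_test(260, 500, 0.20)
    print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```

Because the null variance uses the known pMAF rather than an estimate from controls, no control genotypes enter the statistic at all, which is the budget argument the abstract makes for rare diseases.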

3.
Advances in sequencing technology enable the study of association between complex disorder phenotypes and single‐nucleotide polymorphisms with rare mutations. However, a rare genetic variant has extremely small variance, which impairs the testing power of traditional statistical methods. We introduce a W‐test collapsing method to evaluate rare‐variant association by measuring the distributional differences between cases and controls through the combined log odds ratio within a genomic region. The method is model‐free, follows a chi‐squared distribution with degrees of freedom estimated from bootstrapped samples of the data, and allows fast and accurate P‐value calculation without the need for permutations. The proposed method is compared with the Weighted‐Sum Statistic and the Sequence Kernel Association Test on simulated datasets, and shows good performance and significantly faster computation. In an application to a real next‐generation sequencing dataset of hypertensive disorder, it identified genes with biologically interesting functions associated with metabolism disorder and inflammation, including MACROD1, NLRP7, AGK, PAK6, and APBB1. The proposed method offers an efficient and effective way of testing rare genetic variants in whole exome sequencing datasets.

4.
Confounding due to population substructure is always a concern in genetic association studies. Although methods have been proposed to adjust for population stratification in the context of common variation, it is unclear how well these approaches will work when interrogating rare variation. Family‐based association tests can be constructed that are robust to population stratification. For example, when considering a quantitative trait, a linear model can be used that decomposes genetic effects into between‐ and within‐family components, and a test of the within‐family component is robust to population stratification. However, this within‐family test ignores between‐family information, potentially leading to a loss of power. Here, we propose a family‐based two‐stage rare‐variant test for quantitative traits. We first construct a weight for each variant within a gene, or other genetic unit, based on score tests of between‐family effect parameters. These weights are then used to combine variants using score tests of within‐family effect parameters. Because the between‐family and within‐family tests are orthogonal under the null hypothesis, this two‐stage approach can increase power while still maintaining validity. Using simulation, we show that this two‐stage test can significantly improve power while correctly maintaining type I error. We further show that the two‐stage approach maintains the robustness to population stratification of the within‐family test and we illustrate this using simulations reflecting samples composed of continental and closely related subpopulations.
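The between‐ and within‐family decomposition mentioned above can be sketched as follows. The abstract does not say how the two components are defined; this sketch assumes the simplest convention, in which the between‐family component is the family mean genotype and the within‐family component is each member's deviation from that mean. The function and variable names are hypothetical, and the two‐stage weighting itself is not implemented.

```python
from statistics import mean


def decompose_genotypes(families):
    """Split genotypes into between- and within-family components.

    families: dict mapping a family id to a list of minor-allele counts
    (0, 1 or 2), one per family member.
    Assumption: between = family mean, within = deviation from that mean.
    """
    between, within = {}, {}
    for fam_id, genotypes in families.items():
        fam_mean = mean(genotypes)
        between[fam_id] = fam_mean
        within[fam_id] = [g - fam_mean for g in genotypes]
    return between, within


if __name__ == "__main__":
    b, w = decompose_genotypes({"fam1": [0, 1, 2], "fam2": [2, 2]})
    print(b)  # family means
    print(w)  # deviations; they sum to zero within each family
```

Within each family the deviations sum to zero, so the within‐family component carries no information about differences between families, which is where stratification acts.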

5.
Genome‐wide association studies are helping to dissect the etiology of complex diseases. Although case‐control association tests are generally more powerful than family‐based association tests, population stratification can lead to spurious disease‐marker association or mask a true association. Several methods have been proposed to match cases and controls prior to genotyping, using family information or epidemiological data, or using genotype data for a modest number of genetic markers. Here, we describe a genetic similarity score matching (GSM) method for efficient matched analysis of cases and controls in a genome‐wide or large‐scale candidate gene association study. GSM comprises three steps: (1) calculating similarity scores for pairs of individuals using the genotype data; (2) matching sets of cases and controls based on the similarity scores so that matched cases and controls have similar genetic background; and (3) using conditional logistic regression to perform association tests. Through computer simulation we show that GSM correctly controls false‐positive rates and improves power to detect true disease predisposing variants. We compare GSM to genomic control using computer simulations, and find improved power using GSM. We suggest that initial matching of cases and controls prior to genotyping combined with careful re‐matching after genotyping is a method of choice for genome‐wide association studies. Genet. Epidemiol. 33:508–517, 2009. © 2009 Wiley‐Liss, Inc.
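Steps (1) and (2) of the procedure above can be sketched in a few lines. The abstract does not specify the similarity score or the matching algorithm, so this sketch substitutes two simple stand‐ins: mean identity‐by‐state (IBS) sharing as the score and greedy one‐to‐one matching. Step (3), conditional logistic regression, is not implemented. All names are hypothetical.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical-by-state across SNPs.

    g1, g2: equal-length lists of minor-allele counts (0, 1 or 2).
    Stand-in score; the GSM paper's own score may differ.
    """
    shared = [2 - abs(a - b) for a, b in zip(g1, g2)]
    return sum(shared) / (2 * len(shared))


def greedy_match(cases, controls):
    """Pair each case with the most similar control not yet used.

    cases, controls: dicts mapping an individual id to a genotype list.
    Assumes there are at least as many controls as cases.
    """
    available = dict(controls)
    pairs = []
    for case_id, case_genotypes in cases.items():
        best = max(
            available,
            key=lambda ctrl: ibs_similarity(case_genotypes, available[ctrl]),
        )
        pairs.append((case_id, best))
        del available[best]
    return pairs


if __name__ == "__main__":
    cases = {"case1": [0, 1, 2, 0], "case2": [2, 2, 1, 1]}
    controls = {"ctrlA": [2, 2, 1, 0], "ctrlB": [0, 1, 2, 1], "ctrlC": [1, 1, 1, 1]}
    print(greedy_match(cases, controls))
```

Greedy matching depends on the order in which cases are processed; an optimal matching algorithm would avoid that, at the cost of more code.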

6.
Testing Hardy‐Weinberg equilibrium (HWE) in the control group is commonly used to detect genotyping errors in genetic association studies. We propose a likelihood ratio test for testing HWE in the study population using both case and control samples. This test incorporates underlying association models. Another feature is that, when we infer the disease‐genotype association, we explicitly incorporate HWE or a possible departure from Hardy‐Weinberg equilibrium (DHWE) into the model. Our unified framework enables us to infer the disease‐genotype association when a detected DHWE needs to be part of the model after causes for the DHWE are explored. Real data sets are used to illustrate the application of the methodology and its implications in genetic association studies. Our analysis and interpretation touch on issues such as genotyping errors, population selection, population stratification, and the study sampling plan, all of which could cause DHWE. Genet. Epidemiol. 2009. Published 2008 Wiley‐Liss, Inc.
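For reference, the conventional control‐group check that the abstract takes as its starting point can be sketched as a 1‐d.f. Pearson chi‐squared test of HWE. This is the textbook test, not the likelihood ratio test proposed in the paper; the function name is hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-squared test of HWE from genotype counts (1 d.f.).

    Both alleles must be observed at least once; otherwise an expected
    count is zero and the statistic is undefined.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-squared variable with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical control genotype counts with a heterozygote deficit.
    chi2, p = hwe_chi_square(40, 20, 40)
    print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
```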

7.
Genome‐wide association studies have achieved unprecedented success in the identification of novel genes and pathways implicated in complex traits. Typically, studies for disease use a case‐control (CC) design and studies for quantitative traits (QT) are population based. The question that we address is: what is the equivalence between CC and QT association studies in terms of detection power and sample size? We compare the binary and continuous traits by assuming a threshold model for disease and assuming that the effect size on disease liability behaves similarly to the effect size on a QT. We derive the approximate ratio of the non‐centrality parameter (NCP) between CC and QT association studies, which is determined by sample size, disease prevalence (K) and the proportion of cases (v) in the CC study. For a disease with prevalence <0.1, a CC association study with equal numbers of cases and controls (v=0.5) needs a smaller sample size than a QT association study to achieve equivalent power; e.g., a CC association study of schizophrenia (K=0.01) needs only ~55% of the sample size required for an association study of height. So a planned meta‐analysis for height on ~120,000 individuals has power equivalent to a CC study on 33,100 schizophrenia cases and 33,100 controls, a size not yet achievable for this disease. With equal sample size, when v=K, the power of a CC association study is much less than that of a QT association study because of the information lost by transforming a quantitative continuous trait to a binary trait. Genet. Epidemiol. 34: 254–257, 2010. © 2009 Wiley‐Liss, Inc.
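The figures quoted above can be reproduced with a short calculation. The abstract does not state the NCP ratio itself; the sketch below assumes the usual liability‐threshold approximation NCP_CC / NCP_QT ≈ i² v(1 − v) N_CC / ((1 − K)² N_QT), where i = z/K and z is the standard normal density at the liability threshold. Treat the formula as an assumption to be checked against the paper; the function name is hypothetical.

```python
from statistics import NormalDist


def ncp_ratio_per_sample(K, v):
    """Approximate NCP_CC / NCP_QT for equal total sample sizes.

    K: disease prevalence; v: proportion of cases in the CC sample.
    Assumed formula (not given in the abstract):
        i**2 * v * (1 - v) / (1 - K)**2, with i = z / K.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - K)  # liability threshold
    z = std_normal.pdf(threshold)          # normal density at the threshold
    i = z / K                              # mean liability of cases
    return i ** 2 * v * (1 - v) / (1 - K) ** 2


if __name__ == "__main__":
    ratio = ncp_ratio_per_sample(K=0.01, v=0.5)
    relative_size = 1 / ratio  # CC sample size needed per QT individual
    print(f"NCP ratio = {ratio:.2f}, relative sample size = {relative_size:.2f}")
    print(f"CC equivalent of 120,000 QT individuals: {120_000 * relative_size:,.0f}")
```

Under this assumption the relative sample size for K=0.01 and v=0.5 is about 0.55, and 0.55 × 120,000 ≈ 66,000 individuals, in line with the 33,100 cases plus 33,100 controls quoted in the abstract. Setting v=K instead gives a ratio well below 1, matching the closing statement that an unselected CC sample is much less powerful than a QT study of the same size.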

8.
Genome‐wide association (GWA) studies have proved to be extremely successful in identifying novel common polymorphisms contributing effects to the genetic component underlying complex traits. Nevertheless, one source of as yet undiscovered genetic determinants of complex traits is the effects of rare variants. With the increasing availability of large‐scale re‐sequencing data for rare variant discovery, we have developed a novel statistical method for the detection of complex trait associations with these loci, based on searching for accumulations of minor alleles within the same functional unit. We have undertaken simulations to evaluate strategies for the identification of rare variant associations in population‐based genetic studies when data are available from re‐sequencing discovery efforts or from commercially available GWA chips. Our results demonstrate that methods based on accumulations of rare variants discovered through re‐sequencing offer substantially greater power than conventional analysis of GWA data, and thus provide an exciting opportunity for future discovery of genetic determinants of complex traits. Genet. Epidemiol. 34: 188–193, 2010. © 2009 Wiley‐Liss, Inc.

9.
Proper control of confounding due to population stratification is crucial for valid analysis of case-control association studies. Fine matching of cases and controls based on genetic ancestry is an increasingly popular strategy to correct for such confounding, both in genome-wide association studies (GWASs) and in studies that employ next-generation sequencing, where matching can be used when selecting a subset of participants from a GWAS for rare-variant analysis. Existing matching methods match on measures of genetic ancestry that combine multiple components of ancestry into a scalar quantity. However, we show that including nonconfounding ancestry components in a matching criterion can lead to inaccurate matches, and hence to improper control of confounding. To resolve this issue, we propose a novel method that assigns cases and controls to matched strata based on the stratification score (Epstein et al. [2007] Am J Hum Genet 80:921-930), which is the probability of disease given genomic variables. Matching on the stratification score leads to more accurate matches because case participants are matched to control participants who have a similar risk of disease given ancestry information. We illustrate our matching method using the African-American arm of the GAIN GWAS of schizophrenia. In this study, we observe that confounding due to stratification can be resolved by our matching approach but not by other existing matching procedures. We also use simulated data to show that our novel matching approach can provide a more appropriate correction for population stratification than existing matching approaches.

10.
There is an emerging interest in sequencing‐based association studies of multiple rare variants. Most association tests suggested in the literature involve collapsing rare variants with or without weighting. Recently, a variance‐component score test [sequence kernel association test (SKAT)] was proposed to address the limitations of the collapsing method. Although SKAT was shown to outperform most of the alternative tests, its applications and power might be restricted and influenced by missing genotypes. In this paper, we suggest a new method based on testing whether the fraction of causal variants in a region is zero. The new association test, T_REM, is derived from a random‐effects model and allows for missing genotypes, and the choice of weighting function is not required when common and rare variants are analyzed simultaneously. We performed simulations to study the type I error rates and power of four competing tests under various conditions on the sample size, genotype missing rate, variant frequency, effect directionality, and the number of non‐causal rare variants and/or causal common variants. The simulation results showed that T_REM was a valid test and less sensitive to the inclusion of non‐causal rare variants and/or low‐effect common variants or to the presence of missing genotypes. When the effects were more consistent in the same direction, T_REM also had better power performance. Finally, an application to the Shanghai Breast Cancer Study showed that rare causal variants at the FGFR2 gene were detected by T_REM and SKAT, but T_REM produced more consistent results for different sets of rare and common variants. Copyright © 2013 John Wiley & Sons, Ltd.

11.
Genome‐wide association studies (GWAS) often measure gene–environment interactions (G × E). We consider the problem of accurately estimating a G × E in a case–control GWAS when a subset of the controls have silent, or undiagnosed, disease and the frequency of the silent disease varies by the environmental variable. We show that using case–control status without accounting for misdiagnosis can lead to biased estimates of the G × E. We further propose a pseudolikelihood approach to remove the bias and accurately estimate how the relationship between the genetic variant and the true disease status varies by the environmental variable. We demonstrate our method in extensive simulations and apply our method to a GWAS of prostate cancer.

12.
The potential for bias from population stratification (PS) has raised concerns about case-control studies involving admixed ethnicities. We evaluated the potential bias due to PS in relating a binary outcome with a candidate gene under simulated settings where study populations consist of multiple ethnicities. Disease risks were assigned within the range of prostate cancer rates of African Americans reported in SEER registries assuming k=2, 5, or 10 admixed ethnicities. Genotype frequencies were considered in the range of 5-95%. Under a model assuming no genotype effect on disease (odds ratio (OR)=1), the range of observed OR estimates ignoring ethnicity was 0.64-1.55 for k=2, 0.72-1.33 for k=5, and 0.81-1.22 for k=10. When genotype effect on disease was modeled to be OR=2, the ranges of observed OR estimates were 1.28-3.09, 1.43-2.65, and 1.62-2.42 for k=2, 5, and 10 ethnicities, respectively. Our results indicate that the magnitude of bias is small unless extreme differences exist in genotype frequency. Bias due to PS decreases as the number of admixed ethnicities increases. The biases are bounded by the minimum and maximum of all pairwise baseline disease odds ratios across ethnicities. Therefore, bias due to PS alone may be small when baseline risk differences are small within major categories of admixed ethnicity, such as African Americans.
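The mechanism behind these numbers can be worked through directly. The sketch below computes the crude (ethnicity‐ignoring) odds ratio in a pooled population when the genotype has no effect within any stratum. The stratum shares, genotype frequencies and disease risks used in the example are hypothetical and are not the SEER‐derived values used in the study; the function name is also hypothetical.

```python
def crude_odds_ratio(strata):
    """Crude genotype-disease OR when stratum membership is ignored.

    strata: list of (population_share, genotype_freq, disease_risk).
    Within each stratum the disease risk is the same for carriers and
    non-carriers, i.e. the true stratum-specific OR is 1.
    """
    p_carrier = sum(w * f for w, f, _ in strata)
    p_disease_and_carrier = sum(w * f * r for w, f, r in strata)
    p_disease_and_noncarrier = sum(w * (1 - f) * r for w, f, r in strata)
    risk_carrier = p_disease_and_carrier / p_carrier
    risk_noncarrier = p_disease_and_noncarrier / (1 - p_carrier)
    odds_carrier = risk_carrier / (1 - risk_carrier)
    odds_noncarrier = risk_noncarrier / (1 - risk_noncarrier)
    return odds_carrier / odds_noncarrier


if __name__ == "__main__":
    # Two equally sized strata with extreme genotype frequency differences.
    extreme = [(0.5, 0.05, 0.01), (0.5, 0.95, 0.02)]
    # Same risks, but identical genotype frequencies: no confounding.
    balanced = [(0.5, 0.30, 0.01), (0.5, 0.30, 0.02)]
    print(f"extreme:  OR = {crude_odds_ratio(extreme):.2f}")
    print(f"balanced: OR = {crude_odds_ratio(balanced):.2f}")
```

In the extreme example the crude OR stays below the baseline disease odds ratio between the two strata, (0.02/0.98)/(0.01/0.99) ≈ 2.02, which illustrates the bound stated in the abstract.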

13.
There has been increasing interest in identifying genes within the human genome that influence multiple diverse phenotypes. In the presence of pleiotropy, joint testing of these phenotypes is not only biologically meaningful but also statistically more powerful than univariate analysis of each separate phenotype accounting for multiple testing. Although many cross‐phenotype association tests exist, the majority of such methods assume samples composed of unrelated subjects and therefore are not applicable to family‐based designs, including the valuable case‐parent trio design. In this paper, we describe a robust gene‐based association test of multiple phenotypes collected in a case‐parent trio study. Our method is based on the kernel distance covariance (KDC) method, where we first construct a similarity matrix for multiple phenotypes and a similarity matrix for genetic variants in a gene; we then test the dependency between the two similarity matrices. The method is applicable to either common variants or rare variants in a gene, and resulting tests from the method are by design robust to confounding due to population stratification. We evaluated our method through simulation studies and observed that the method is substantially more powerful than standard univariate testing of each separate phenotype. We also applied our method to phenotypic and genotypic data collected in case‐parent trios as part of the Genetics of Kidneys in Diabetes (GoKinD) study and identified a genome‐wide significant gene demonstrating cross‐phenotype effects that was not identified using standard univariate approaches.

14.
Genome‐wide case‐control association studies are gaining popularity, thanks to the rapid development of modern genotyping technology. In such studies, population stratification is a potential concern, especially when the number of study subjects is large, as it can lead to seriously inflated false‐positive rates. Current methods addressing this issue are still not completely immune to excess false positives. A simple method that corrects for population stratification is proposed. This method modifies a test statistic such as the Armitage trend test by using an additive constant that measures the variation of the effect size confounded by population stratification across genomic control (GC) markers. As a result, the original statistic is deflated by a multiplying factor that is specific to the marker being tested for association. This deflating multiplying factor is guaranteed to be larger than 1. These properties are in contrast to the conventional GC method, where the original statistic is deflated by a common factor regardless of the marker being tested and the deflation factor may turn out to be less than 1. The new method is introduced first for the regular case‐control design and then for other situations such as quantitative traits and the presence of covariates. An extensive simulation study indicates that this new method provides an appealing alternative for genetic association analysis in the presence of population stratification. Genet. Epidemiol. 33:637–645, 2009. © 2009 Wiley‐Liss, Inc.
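The conventional GC correction that the abstract contrasts itself with can be sketched as below. This is the standard median‐based version, in which one common factor divides every 1‐d.f. chi‐squared statistic; it is not the marker‐specific correction proposed in the paper. The function name and the example statistics are hypothetical.

```python
from statistics import median

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(test_stat, null_marker_stats):
    """Deflate a 1-d.f. chi-squared statistic by the common GC factor.

    null_marker_stats: chi-squared statistics at presumed-null GC markers.
    The factor is not clamped, so it can fall below 1, which is the
    behaviour the abstract points out for conventional GC.
    """
    inflation = median(null_marker_stats) / CHI2_1DF_MEDIAN
    return test_stat / inflation, inflation


if __name__ == "__main__":
    null_stats = [0.1, 0.5, 0.9, 1.3, 2.4]  # hypothetical GC marker statistics
    adjusted, lam = genomic_control(10.0, null_stats)
    print(f"lambda = {lam:.2f}, adjusted statistic = {adjusted:.2f}")
```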

15.
Case‐control association studies often collect extensive information on secondary phenotypes, which are quantitative or qualitative traits other than the case‐control status. Exploring secondary phenotypes can yield valuable insights into biological pathways and identify genetic variants influencing phenotypes of direct interest. All publications on secondary phenotypes have used standard statistical methods, such as least‐squares regression for quantitative traits. Because of unequal selection probabilities between cases and controls, the case‐control sample is not a random sample from the general population. As a result, standard statistical analysis of secondary phenotype data can be extremely misleading. Although one may avoid the sampling bias by analyzing cases and controls separately or by including the case‐control status as a covariate in the model, the associations between a secondary phenotype and a genetic variant in the case and control groups can be quite different from the association in the general population. In this article, we present novel statistical methods that properly reflect the case‐control sampling in the analysis of secondary phenotype data. The new methods provide unbiased estimation of genetic effects and accurate control of false‐positive rates while maximizing statistical power. We demonstrate the pitfalls of the standard methods and the advantages of the new methods both analytically and numerically. The relevant software is available at our website. Genet. Epidemiol. 2009. © 2008 Wiley‐Liss, Inc.

16.
The population‐based case‐control design has become one of the most popular approaches for conducting genome‐wide association scans for rare diseases like cancer. In this article, we propose a novel method for improving the power of the widely used single‐SNP (single‐nucleotide polymorphism) two‐degrees‐of‐freedom (2 d.f.) association test for case‐control studies by exploiting the common assumption of Hardy‐Weinberg Equilibrium (HWE) for the underlying population. A key feature of the method is that it can relax the assumed model constraints via a completely data‐adaptive shrinkage estimation approach so that the number of false‐positive results due to departure from HWE is controlled. The method is computationally simple and is easily scalable to association tests involving hundreds of thousands or millions of genetic markers. Simulation studies as well as an application involving data from a real genome‐wide association study illustrate that the proposed method is very robust for large‐scale association studies and can improve the power for detecting susceptibility SNPs with recessive effects, when compared to existing methods. Implications of the general estimation strategy beyond the simple 2 d.f. association test are discussed. Genet. Epidemiol. 33:740–750, 2009. Published 2009 Wiley‐Liss, Inc.
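As a baseline for the method above, the ordinary 2 d.f. genotype test can be sketched as a Pearson chi‐squared test on the 2 × 3 table of case and control genotype counts. This is the standard test that the paper sets out to improve, not the HWE‐exploiting shrinkage method itself; the function name and counts are hypothetical.

```python
from math import exp


def two_df_genotype_test(case_counts, control_counts):
    """Pearson chi-squared test on a 2 x 3 genotype table (2 d.f.).

    case_counts, control_counts: counts of the three genotypes
    (e.g. AA, Aa, aa) in cases and in controls. Every genotype must be
    observed at least once overall, or an expected count is zero.
    """
    n_cases, n_controls = sum(case_counts), sum(control_counts)
    n_total = n_cases + n_controls
    chi2 = 0.0
    for in_cases, in_controls in zip(case_counts, control_counts):
        column_total = in_cases + in_controls
        for observed, row_total in ((in_cases, n_cases), (in_controls, n_controls)):
            expected = row_total * column_total / n_total
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of a chi-squared variable with 2 d.f.
    p_value = exp(-chi2 / 2)
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = two_df_genotype_test((30, 20, 10), (10, 20, 30))
    print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
```

The closed form exp(−x/2) for the p‐value holds only for 2 degrees of freedom, which is why no statistics library is needed here.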

17.
18.
In case‐control studies, exposure assessments are almost always error‐prone. In the absence of a gold standard, two or more assessment approaches are often used to classify people with respect to exposure. Each imperfect assessment tool may lead to misclassification of exposure assignment; the exposure misclassification may be differential with respect to case status or not; and the errors in exposure classification under the different approaches may be independent (conditional upon the true exposure status) or not. Although methods have been proposed to study diagnostic accuracy in the absence of a gold standard, these methods are infrequently used in case‐control studies to correct exposure misclassification that is simultaneously differential and dependent. In this paper, we propose a Bayesian method to estimate the measurement‐error corrected exposure‐disease association, accounting for both differential and dependent misclassification. The performance of the proposed method is investigated using simulations, which show that the approach works well, and using an application to a case‐control study assessing the association between asbestos exposure and mesothelioma. Copyright © 2013 John Wiley & Sons, Ltd.

19.
We examine the impact of nondifferential outcome misclassification on odds ratios estimated from pair‐matched case‐control studies and propose a Bayesian model to adjust these estimates for misclassification bias. The model relies on access to a validation subgroup with confirmed outcome status for all case‐control pairs as well as prior knowledge about the positive and negative predictive value of the classification mechanism. We illustrate the model's performance on simulated data and apply it to a database study examining the presence of ten morbidities in the prodromal phase of multiple sclerosis.

20.
It is well known that using proper weights for genetic variants is crucial in enhancing the power of gene‐ or pathway‐based association tests. To increase the power, we propose a general approach that adaptively selects weights among a class of weight families and apply it to the popular sequence kernel association test. Through comprehensive simulation studies, we demonstrate that the proposed method can substantially increase power under some conditions. Applications to real data are also presented. This general approach can be extended to all current set‐based rare variant association tests whose performance depends on the weights assigned to variants.
