Similar Articles
Found 20 similar articles (search time: 31 ms)
1.
Confounding caused by latent population structure has been a major concern in genome-wide association studies, despite their success at identifying genetic variants associated with complex diseases. In particular, given the growing interest in association mapping with count phenotype data, there is a need for a testing framework for genetic associations that is immune to population structure when the phenotype consists of count measurements. Here, I propose a solution for testing associations between single nucleotide polymorphisms and a count phenotype in the presence of an arbitrary population structure. I consider a classical range of models for count phenotype data and, under these models, derive a unified test for genetic associations that protects against confounding. I also develop an algorithm to efficiently estimate the parameters required to fit the proposed model. I illustrate the approach with simulation studies and an empirical study; both simulated and real-data examples suggest that the proposed method successfully corrects for population structure.
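
The unified test itself is not spelled out in the abstract; as a point of reference, the sketch below shows a standard alternative for the same setting: a per-SNP Poisson regression for the count phenotype with genotype principal components included to absorb structure. The toy data, the 0/1/2 genotype coding, and the choice of five PCs are assumptions of the sketch, not the paper's method.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 500, 200                                      # samples, SNPs (toy sizes)
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # hypothetical genotypes
y = rng.poisson(2.0, size=n)                         # hypothetical count phenotype

# Top PCs of the standardized genotype matrix serve as a proxy for structure.
Gs = (G - G.mean(0)) / (G.std(0) + 1e-12)
U, S, _ = np.linalg.svd(Gs, full_matrices=False)
pcs = U[:, :5] * S[:5]

def test_snp(j):
    """Wald test for SNP j in a Poisson GLM adjusted for ancestry PCs."""
    X = sm.add_constant(np.column_stack([G[:, j], pcs]))
    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
    return fit.params[1], fit.pvalues[1]             # SNP effect and p-value

beta, pval = test_snp(0)   # y is simulated under the null here, so p ~ Uniform
```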

2.
In the new era of large-scale collaborative genome-wide association studies (GWAS), population stratification has become a critical issue that must be addressed. To build on the methods developed to control the confounding effect of a structured population, it is extremely important to visualize and quantify that effect. In this work, we develop methodology for single nucleotide polymorphism (SNP) selection and subsequent visualization of population stratification, based on deviation from Hardy-Weinberg equilibrium in conjunction with non-metric multidimensional scaling (MDS), a distance-based multivariate technique. Through simulation, we show that SNP selection based on Hardy-Weinberg disequilibrium (HWD) is robust against the confounding linkage disequilibrium patterns that have been problematic for past studies and methods, and that it produces a well-differentiated SNP set. Through theoretical and empirical study of HapMap samples, non-metric MDS is shown to be preferable to principal components as a multivariate visualization tool when combined with HWD-based SNP selection. The proposed selection tool offers a simple and effective way to choose substructure-informative markers for exploring the effect that population stratification may have on association studies. Genet. Epidemiol. 33:488–496, 2009. © 2009 Wiley-Liss, Inc.
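
A minimal sketch of that pipeline on simulated data: two subpopulations create Hardy-Weinberg disequilibrium in the pooled sample via the Wahlund effect, SNPs are ranked by the disequilibrium coefficient D = P(AA) - p^2, and the selected markers feed an allele-sharing dissimilarity embedded with scikit-learn's non-metric MDS. Sizes and cutoffs are arbitrary toys.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(1)
# Two subpopulations with different allele frequencies: the Wahlund effect
# pushes the pooled sample away from Hardy-Weinberg equilibrium.
f1, f2 = rng.uniform(0.1, 0.9, 1000), rng.uniform(0.1, 0.9, 1000)
G = np.vstack([rng.binomial(2, f1, size=(150, 1000)),
               rng.binomial(2, f2, size=(150, 1000))]).astype(float)

# HWD coefficient per SNP in the pooled sample: D = P(AA) - p^2.
p = G.mean(0) / 2.0
hwd = np.abs((G == 2).mean(0) - p ** 2)
Gt = G[:, np.argsort(hwd)[-100:]]        # 100 most HWD-informative SNPs

# Allele-sharing dissimilarity (1 - IBS/2) and non-metric MDS embedding.
D = np.abs(Gt[:, None, :] - Gt[None, :, :]).mean(-1) / 2.0
xy = MDS(n_components=2, metric=False,
         dissimilarity="precomputed", random_state=0).fit_transform(D)
# Plotting xy should separate the two subpopulations into distinct clusters.
```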

3.
Family‐based association studies are commonly used in genetic research because they can be robust to population stratification (PS). Recent advances in high‐throughput genotyping technologies have produced a massive amount of genomic data in family‐based studies. However, current family‐based association tests are mainly focused on evaluating individual variants one at a time. In this article, we introduce a family‐based generalized genetic random field (FB‐GGRF) method to test the joint association between a set of autosomal SNPs (i.e., single‐nucleotide polymorphisms) and disease phenotypes. The proposed method is a natural extension of a recently developed GGRF method for population‐based case‐control studies. It models offspring genotypes conditional on parental genotypes, and, thus, is robust to PS. Through simulations, we presented that under various disease scenarios the FB‐GGRF has improved power over a commonly used family‐based sequence kernel association test (FB‐SKAT). Further, similar to GGRF, the proposed FB‐GGRF method is asymptotically well‐behaved, and does not require empirical adjustment of the type I error rates. We illustrate the proposed method using a study of congenital heart defects with family trios from the National Birth Defects Prevention Study (NBDPS).  相似文献   

4.
The potential for bias from population stratification (PS) has raised concerns about case-control studies involving admixed ethnicities. We evaluated the potential bias due to PS in relating a binary outcome to a candidate gene under simulated settings where study populations consist of multiple ethnicities. Disease risks were assigned within the range of prostate cancer rates of African Americans reported in SEER registries, assuming k=2, 5, or 10 admixed ethnicities. Genotype frequencies were considered in the range of 5-95%. Under a model assuming no genotype effect on disease (odds ratio (OR)=1), the range of observed OR estimates ignoring ethnicity was 0.64-1.55 for k=2, 0.72-1.33 for k=5, and 0.81-1.22 for k=10. When the genotype effect on disease was modeled as OR=2, the ranges of observed OR estimates were 1.28-3.09, 1.43-2.65, and 1.62-2.42 for k=2, 5, and 10 ethnicities, respectively. Our results indicate that the magnitude of bias is small unless extreme differences exist in genotype frequency. Bias due to PS decreases as the number of admixed ethnicities increases, and the biases are bounded by the minimum and maximum of all pairwise baseline disease odds ratios across ethnicities. Therefore, bias due to PS alone may be small when baseline risk differences are small within major categories of admixed ethnicity, such as African Americans.
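
The mechanics of such a simulation are easy to reproduce. The sketch below pools k strata that differ in both carrier frequency and baseline disease risk, with no true genotype effect, and computes the crude odds ratio that ignores ethnicity; the frequency and risk grids are illustrative toys, not the SEER-based values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def crude_or(k, n_per=20000):
    """Pooled genotype-disease OR over k strata with no true genotype effect."""
    freqs = np.linspace(0.1, 0.9, k)     # carrier frequency per ethnicity (toy)
    risks = np.linspace(0.05, 0.20, k)   # baseline disease risk per ethnicity (toy)
    g = np.concatenate([rng.binomial(1, f, n_per) for f in freqs])
    y = np.concatenate([rng.binomial(1, r, n_per) for r in risks])
    a = np.sum((g == 1) & (y == 1)); b = np.sum((g == 1) & (y == 0))
    c = np.sum((g == 0) & (y == 1)); d = np.sum((g == 0) & (y == 0))
    return (a * d) / (b * c)

for k in (2, 5, 10):
    print(k, round(crude_or(k), 2))      # crude OR moves toward 1 as k grows
```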

5.
Recent studies showed that population substructure (PS) can have a more complex impact on rare-variant tests and that similarity-based collapsing tests (e.g., SKAT) may be affected more severely by PS than burden-based tests. In this work, we evaluate the performance of SKAT coupled with principal components (PC)- or variance components (VC)-based PS correction methods. We consider confounding effects caused by PS including stratified populations, admixed populations, and spatially distributed nongenetic risk, and we investigate which types of variants (e.g., common, less frequent, rare, or all variants) should be used to effectively control for confounding effects. We found that (i) PC-based methods can account for confounding effects in most scenarios except for admixture, although the number of sufficient PCs depends on the PS complexity and the type of variants used; (ii) PCs based on all variants (i.e., common + less frequent + rare) tend to require as many or fewer sufficient PCs and often achieve higher power than PCs based on other variant types; (iii) VC-based methods can effectively adjust for confounding in all scenarios (even for admixture), though the type of variants that should be used to construct the VC may vary; and (iv) VC based on all variants works consistently in all scenarios, though its power may sometimes be lower than VC based on other variant types. Given that the best-performing method and the choice of variants depend on the underlying unknown confounding mechanisms, a robust strategy is to perform SKAT analyses using VC-based methods based on all variants.
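
For orientation, here is a minimal SKAT-style variance-component score statistic with PC adjustment on toy data. The Beta(1,25) MAF weights follow the usual SKAT convention, but the Satterthwaite p-value approximation is a simplification (SKAT proper uses Davies' exact method for the mixture of chi-squares), and the VC-based correction the paper recommends is not shown.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
n, m = 1000, 30
G = rng.binomial(2, rng.uniform(0.005, 0.03, m), size=(n, m)).astype(float)
pcs = rng.normal(size=(n, 2))            # stand-in ancestry PCs
y = rng.binomial(1, 0.3, n)              # hypothetical case-control phenotype

# Null model: phenotype on covariates only (PCs here; a VC would replace them).
X = sm.add_constant(pcs)
mu = sm.GLM(y, X, family=sm.families.Binomial()).fit().fittedvalues

# SKAT-style score statistic Q = r' K r with K = G W G' (weights squared).
maf = G.mean(0) / 2.0
W = np.diag(stats.beta.pdf(maf, 1, 25) ** 2)
K = G @ W @ G.T
r = y - mu
Q = r @ K @ r

# Satterthwaite (moment-matching) approximation to the null distribution.
V = np.diag(mu * (1 - mu))
P = V - V @ X @ np.linalg.solve(X.T @ V @ X, X.T @ V)
e, v = np.trace(P @ K), 2 * np.trace(P @ K @ P @ K)
pval = stats.chi2.sf(Q * 2 * e / v, df=2 * e ** 2 / v)
```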

6.
Many expression quantitative trait loci (eQTL) studies have been conducted to investigate the biological effects of variants on gene regulation. However, these eQTL studies may suffer from low or moderate statistical power and an overly conservative false-discovery rate. In practice, most algorithms for eQTL identification do not model the joint effects of multiple genetic variants with weak or moderate influence. Here we present a novel machine-learning algorithm, the lasso least-squares kernel machine (LSKM-LASSO), that models the association between multiple genetic variants and phenotypic traits simultaneously in the presence of nongenetic and genetic confounding. With a more general and flexible framework for the estimation of genetic confounding, LSKM-LASSO is able to provide a more accurate evaluation of the joint effects of multiple genetic variants. Our simulations demonstrate that the approach outperforms three state-of-the-art alternatives in terms of eQTL identification and phenotype prediction. We then apply the method to genotype and gene expression data from 11 tissues obtained from the Genotype-Tissue Expression project; it identified more genes with eQTLs than competing algorithms. By incorporating a regularization term and combining it with a least-squares kernel machine, LSKM-LASSO provides a powerful tool for eQTL mapping and phenotype prediction.

7.
Propensity score (PS) methods have been used extensively to adjust for confounding factors in the statistical analysis of observational data in comparative effectiveness research. There are four major PS-based adjustment approaches: PS matching, PS stratification, covariate adjustment by PS, and PS-based inverse probability weighting. Although covariate adjustment by PS is one of the most frequently used PS-based methods in clinical research, the conventional variance estimator for the treatment-effect estimate under covariate adjustment by PS is biased. As Stampf et al. have shown, this bias in variance estimation is likely to lead to invalid statistical inference and could result in erroneous public health conclusions (e.g., in food and drug safety and adverse event surveillance). To address this issue, we propose a two-stage analytic procedure that yields a valid variance estimator for the covariate-adjustment-by-PS analysis strategy. We also provide a simple empirical bootstrap resampling scheme. Both proposed procedures are implemented in an R function for public use. Extensive simulation results demonstrate the bias in the conventional variance estimator and show that both proposed variance estimators offer valid estimates of the true variance and are robust to complex confounding structures. The proposed methods are illustrated with a post-surgery pain study. Copyright © 2016 John Wiley & Sons, Ltd.
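
The essential point is that a valid variance must propagate the uncertainty from estimating the PS itself, which the bootstrap does by refitting both stages in every resample. A minimal Python sketch on simulated data (the paper's estimators are implemented in R; nothing here reproduces them):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=(n, 3))                                  # confounders
t = rng.binomial(1, 1 / (1 + np.exp(-x @ [0.5, -0.4, 0.3])))  # treatment
y = rng.binomial(1, 1 / (1 + np.exp(-(0.7 * t + x @ [0.4, 0.4, -0.2]))))

def effect(ti, xi, yi):
    """Treatment log-OR from an outcome model adjusted for the estimated PS."""
    ps = sm.Logit(ti, sm.add_constant(xi)).fit(disp=0).predict()
    X2 = sm.add_constant(np.column_stack([ti, ps]))
    return sm.Logit(yi, X2).fit(disp=0).params[1]

est = effect(t, x, y)

# Empirical bootstrap: re-estimate the PS inside every resample so the
# variance reflects both stages of the procedure.
boot = []
for _ in range(200):
    i = rng.integers(0, n, n)
    boot.append(effect(t[i], x[i], y[i]))
se = np.std(boot, ddof=1)
```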

8.
It is known that measurement error leads to bias in assessing exposure effects, which can, however, be corrected if independent replicates are available. When replicates are expensive, two-stage (2S) studies that produce data 'missing by design' may be preferred over a single-stage (1S) study, because in the second stage measurement of replicates is restricted to a sample of first-stage subjects. Motivated by an occupational study on the acute effect of carbon black exposure on respiratory morbidity, we compare the performance of several bias-correction methods for both designs in a simulation study: an instrumental variable method (EVROS IV) based on grouping strategies, which had been recommended especially when measurement error is large; the regression calibration method; and the simulation extrapolation method. For the 2S design, the problem of 'missing' data was either ignored or handled by multiple imputation. In both 1S and 2S designs, for small or moderate measurement error, regression calibration was the preferred approach in terms of root mean square error. For 2S designs, regression calibration as implemented by Stata software is not recommended, in contrast to our implementation of the method, although the 'problematic' implementation improved substantially with the use of multiple imputations. The EVROS IV method, under good or fairly good grouping, outperforms the regression calibration approach in both design scenarios when exposure mismeasurement is severe. In both 1S and 2S designs with moderate or large measurement error, simulation extrapolation severely failed to correct for bias. Copyright © 2012 John Wiley & Sons, Ltd.
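
As a reference point, the sketch below shows the single-stage (1S) regression calibration idea with two replicates per subject: the error-prone exposure is replaced by its best linear prediction given the replicate mean, which undoes the usual attenuation. All variances and effect sizes are made up; the 2S 'missing by design' variant would add an imputation step.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(0, 1, n)                      # true exposure (unobserved)
w1 = x + rng.normal(0, 0.8, n)               # two independent replicates
w2 = x + rng.normal(0, 0.8, n)
y = 0.5 * x + rng.normal(0, 1, n)            # outcome; true slope 0.5

wbar = (w1 + w2) / 2
var_u = np.var(w1 - w2, ddof=1) / 2          # error variance of one replicate
var_x = np.var(wbar, ddof=1) - var_u / 2     # true-exposure variance
lam = var_x / (var_x + var_u / 2)            # reliability of the 2-replicate mean

x_hat = wbar.mean() + lam * (wbar - wbar.mean())   # calibrated exposure
naive = sm.OLS(y, sm.add_constant(wbar)).fit().params[1]       # attenuated
corrected = sm.OLS(y, sm.add_constant(x_hat)).fit().params[1]  # ~0.5
```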

9.
Propensity score methods are increasingly being used to estimate the effects of treatments and exposures in observational data. The propensity score was initially developed for use with binary exposures. The generalized propensity score (GPS) is an extension of the propensity score for use with quantitative or continuous exposures (e.g., dose or quantity of medication, income, or years of education). We used Monte Carlo simulations to examine the performance of different methods of using the GPS to estimate the effect of continuous exposures on binary outcomes. We examined covariate adjustment using the GPS and weighting based on the inverse of the GPS, and we considered both ordinary least squares and the covariate balancing propensity score algorithm for estimating the propensity function. The GPS-based methods were compared with G-computation. All methods resulted in essentially unbiased estimation of the population dose-response function. However, GPS-based weighting tended to produce estimates with greater variability and higher mean squared error when the magnitude of confounding was strong; of the GPS-based methods, covariate adjustment using the GPS tended to produce estimates with the lowest variability and mean squared error in that setting. We illustrate the application of these methods by estimating the effect of average neighborhood income on the probability of death within 1 year of hospitalization for an acute myocardial infarction.
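
A compact sketch of the two GPS uses compared above, with the propensity function estimated by ordinary least squares on simulated data (the covariate balancing propensity score and G-computation arms are omitted). The stabilized weight is the standard marginal-over-conditional density ratio; all parameters are toys.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 3000
x = rng.normal(size=(n, 2))                                 # confounders
a = 0.6 * x[:, 0] - 0.4 * x[:, 1] + rng.normal(0, 1, n)     # continuous exposure
pr = 1 / (1 + np.exp(-(-1 + 0.5 * a + 0.5 * x[:, 0])))
y = rng.binomial(1, pr)                                     # binary outcome

# Stage 1: propensity function A | X by ordinary least squares.
ps_fit = sm.OLS(a, sm.add_constant(x)).fit()
gps = norm.pdf(a, loc=ps_fit.fittedvalues, scale=np.sqrt(ps_fit.scale))

# (a) Covariate adjustment by the GPS.
adj = sm.Logit(y, sm.add_constant(np.column_stack([a, gps]))).fit(disp=0)

# (b) Stabilized inverse-GPS weighting: marginal density over conditional.
num = norm.pdf(a, loc=a.mean(), scale=a.std(ddof=1))
w = num / gps
ipw = sm.GLM(y, sm.add_constant(a), family=sm.families.Binomial(),
             freq_weights=w).fit()
```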

10.
Matched case-control designs are commonly used to control for potential confounding factors in genetic epidemiology studies, especially epigenetic studies of DNA methylation. Compared with unmatched case-control studies with high-dimensional genomic or epigenetic data, few variable selection methods exist for matched sets. In an earlier paper, we proposed a penalized logistic regression model for the analysis of unmatched DNA methylation data using a network-based penalty. However, for the matched designs widely applied in epigenetic studies (comparing DNA methylation between tumor and adjacent non-tumor tissues, or between pre-treatment and post-treatment conditions), applying ordinary logistic regression while ignoring matching is known to introduce serious bias in estimation. In this paper, we develop a penalized conditional logistic model using the network-based penalty that encourages a grouping effect of (1) linked Cytosine-phosphate-Guanine (CpG) sites within a gene or (2) linked genes within a genetic pathway for the analysis of matched DNA methylation data. In simulation studies, we demonstrate the superiority of the conditional logistic model over the unconditional logistic model in high-dimensional variable selection problems for matched case-control data, and we further investigate the benefits of utilizing biological group or graph information. We applied the proposed method to a genome-wide DNA methylation study of hepatocellular carcinoma (HCC), in which DNA methylation levels of tumor and adjacent non-tumor tissues from HCC patients were measured with the Illumina Infinium HumanMethylation27 BeadChip. Several new CpG sites and genes known to be related to HCC were identified that were missed by the standard method in the original paper. Copyright © 2012 John Wiley & Sons, Ltd.
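
For 1:1 matched sets, the conditional logistic likelihood reduces to an intercept-free ordinary logistic regression on within-pair covariate differences with all responses set to 1; that is the core model to which the paper adds its network-based penalty (omitted here). A toy sketch of the reduction:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n_pairs, p = 300, 5
beta = np.array([0.8, 0.0, 0.0, -0.5, 0.0])

# Simulate from the conditional logistic model: within each pair, member 1
# is the case with probability sigmoid(beta' (x1 - x2)).
x1 = rng.normal(size=(n_pairs, p))
x2 = rng.normal(size=(n_pairs, p))
pr = 1 / (1 + np.exp(-(x1 - x2) @ beta))
case_is_1 = rng.random(n_pairs) < pr
d = np.where(case_is_1[:, None], x1 - x2, x2 - x1)   # case minus control

# Conditional likelihood = intercept-free logistic on differences, y = 1.
fit = sm.Logit(np.ones(n_pairs), d).fit(disp=0)
print(fit.params)        # estimates of the conditional log-odds ratios (~beta)
```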

11.
In genetic association studies, it is important to distinguish direct and indirect genetic effects in order to build truly functional models. For this purpose, we consider a directed acyclic graph setting with genetic variants, primary and intermediate phenotypes, and confounding factors. To make valid statistical inference on direct genetic effects on the primary phenotype, it is necessary to consider all potential effects in the graph, and we propose to use the estimating equations method with robust Huber-White sandwich standard errors. We evaluate the proposed causal inference based on estimating equations (CIEE) method and compare it with traditional multiple regression methods, the structural equation modeling method, and sequential G-estimation methods through a simulation study for the analysis of (completely observed) quantitative traits and of time-to-event traits subject to censoring as primary phenotypes. The results show that CIEE provides valid estimators and inference by successfully removing the effect of intermediate phenotypes from the primary phenotype, and that it is robust against measured and unmeasured confounding of the indirect effect through observed factors. All other methods except the sequential G-estimation method for quantitative traits fail in some scenarios, with test statistics yielding inflated type I errors. In the analysis of the Genetic Analysis Workshop 19 dataset, we estimate and test genetic effects on blood pressure while accounting for intermediate gene expression phenotypes. The results show that CIEE can identify genetic variants that would be missed by traditional regression analyses. CIEE is computationally fast, widely applicable to different fields, and available as an R package.
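
CIEE itself stacks estimating equations for all edges of the graph; the sketch below shows only the sandwich-variance ingredient, in the benign case without unmeasured confounding of the indirect path, where plain adjustment for the intermediate phenotype is already valid. Data and effect sizes are toys.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 2000
g = rng.binomial(2, 0.3, n).astype(float)        # genetic variant
m = 0.5 * g + rng.normal(size=n)                 # intermediate phenotype
y = 0.3 * g + 0.6 * m + rng.normal(size=n)       # primary phenotype

# Direct effect of g on y, removing the indirect path through m, with
# Huber-White sandwich standard errors.
X = sm.add_constant(np.column_stack([g, m]))
fit = sm.OLS(y, X).fit(cov_type="HC0")
print(fit.params[1], fit.bse[1])                 # ~0.3 and its robust SE
```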

12.
Genome‐wide association studies are helping to dissect the etiology of complex diseases. Although case‐control association tests are generally more powerful than family‐based association tests, population stratification can lead to spurious disease‐marker association or mask a true association. Several methods have been proposed to match cases and controls prior to genotyping, using family information or epidemiological data, or using genotype data for a modest number of genetic markers. Here, we describe a genetic similarity score matching (GSM) method for efficient matched analysis of cases and controls in a genome‐wide or large‐scale candidate gene association study. GSM comprises three steps: (1) calculating similarity scores for pairs of individuals using the genotype data; (2) matching sets of cases and controls based on the similarity scores so that matched cases and controls have similar genetic background; and (3) using conditional logistic regression to perform association tests. Through computer simulation we show that GSM correctly controls false‐positive rates and improves power to detect true disease predisposing variants. We compare GSM to genomic control using computer simulations, and find improved power using GSM. We suggest that initial matching of cases and controls prior to genotyping combined with careful re‐matching after genotyping is a method of choice for genome‐wide association studies. Genet. Epidemiol. 33:508–517, 2009. © 2009 Wiley‐Liss, Inc.
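
The three steps map directly onto code. Below is a toy version of steps (1) and (2): mean identity-by-state (IBS) similarity between every case-control pair, then a greedy 1:1 match as a simple stand-in for the paper's matching algorithm; step (3) would analyze the matched pairs with conditional logistic regression, as in the sketch under entry 10.

```python
import numpy as np

rng = np.random.default_rng(9)
n_cases, n_ctrls, p = 100, 200, 500
Gcase = rng.binomial(2, 0.3, size=(n_cases, p)).astype(float)
Gctrl = rng.binomial(2, 0.3, size=(n_ctrls, p)).astype(float)

# Step 1: similarity = mean IBS (2 - |g_i - g_j| per SNP) for each pair.
ibs = 2.0 - np.abs(Gcase[:, None, :] - Gctrl[None, :, :]).mean(-1)

# Step 2: greedy 1:1 matching so each case gets its most similar free control.
available = np.ones(n_ctrls, dtype=bool)
pairs = []
for i in range(n_cases):
    j = int(np.argmax(np.where(available, ibs[i], -np.inf)))
    available[j] = False
    pairs.append((i, j))
```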

13.
Genetic association studies are a powerful tool to detect genetic variants that predispose to human disease. Once an associated variant is identified, investigators are also interested in estimating the effect of the identified variant on disease risk. Estimates of the genetic effect based on new association findings tend to be upwardly biased due to a phenomenon known as the “winner's curse.” Overestimation of genetic effect size in initial studies may cause follow‐up studies to be underpowered and so to fail. In this paper, we quantify the impact of the winner's curse on the allele frequency difference and odds ratio estimators for one‐ and two‐stage case‐control association studies. We then propose an ascertainment‐corrected maximum likelihood method to reduce the bias of these estimators. We show that overestimation of the genetic effect by the uncorrected estimator decreases as the power of the association study increases and that the ascertainment‐corrected method reduces absolute bias and mean square error unless power to detect association is high. Genet. Epidemiol. 33:453–462, 2009. © 2009 Wiley‐Liss, Inc.
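
The phenomenon is easy to see in a few lines: condition a normally distributed effect estimate on reaching significance and its mean shifts away from the truth, with the shift shrinking as power grows. The true log-odds ratio and standard errors below are arbitrary toys.

```python
import numpy as np

rng = np.random.default_rng(10)
beta, n_rep = 0.10, 100_000            # true log-OR and number of studies

for se in (0.05, 0.02):                # smaller SE = higher power
    est = rng.normal(beta, se, n_rep)  # sampling distribution of the estimator
    sig = np.abs(est / se) > 1.96      # studies reaching two-sided 0.05
    print(se, round(sig.mean(), 2), round(est[sig].mean(), 3))
# Conditional on significance, the mean estimate exceeds 0.10 (winner's
# curse); the overestimation shrinks as power increases.
```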

14.
Two‐stage instrumental variable methods are commonly used to estimate the causal effects of treatments on survival in the presence of measured and unmeasured confounding. Two‐stage residual inclusion (2SRI) has been the method of choice over two‐stage predictor substitution (2SPS) in clinical studies. We directly compare the bias in the causal hazard ratio estimated by these two methods. Under a principal stratification framework, we derive a closed‐form solution for asymptotic bias of the causal hazard ratio among compliers for both the 2SPS and 2SRI methods when survival time follows the Weibull distribution with random censoring. When there is no unmeasured confounding and no always takers, our analytic results show that 2SRI is generally asymptotically unbiased, but 2SPS is not. However, when there is substantial unmeasured confounding, 2SPS performs better than 2SRI with respect to bias under certain scenarios. We use extensive simulation studies to confirm the analytic results from our closed‐form solutions. We apply these two methods to prostate cancer treatment data from Surveillance, Epidemiology and End Results‐Medicare and compare these 2SRI and 2SPS estimates with results from two published randomized trials. Copyright © 2015 John Wiley & Sons, Ltd.
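
Mechanically, the two estimators differ only in the second-stage regressors: 2SPS substitutes the first-stage prediction for the treatment, while 2SRI keeps the observed treatment and adds the first-stage residual. A toy sketch with a binary instrument, a Weibull-type survival time, and the lifelines package; because the simulation is not calibrated to the paper's scenarios (and the hazard ratio is noncollapsible), the printed numbers illustrate the plumbing, not the bias comparison.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from lifelines import CoxPHFitter

rng = np.random.default_rng(11)
n = 5000
z = rng.binomial(1, 0.5, n)                          # instrument
u = rng.normal(size=n)                               # unmeasured confounder
t = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * z + u))))  # treatment
scale = np.exp(-(0.5 * t + 0.5 * u))                 # Weibull-type survival time
time = rng.weibull(1.5, n) * scale
cutoff = np.quantile(time, 0.8)                      # administrative censoring
event = (time < cutoff).astype(int)
time = np.minimum(time, cutoff)

# First stage: treatment on instrument.
first = sm.OLS(t, sm.add_constant(z)).fit()
that, resid = first.fittedvalues, t - first.fittedvalues

df = pd.DataFrame({"time": time, "event": event, "t": t,
                   "that": that, "resid": resid})
sps = CoxPHFitter().fit(df[["time", "event", "that"]], "time", "event")
sri = CoxPHFitter().fit(df[["time", "event", "t", "resid"]], "time", "event")
print(sps.params_["that"], sri.params_["t"])
```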

15.
Cluster randomized trials (CRTs) are often prone to selection bias despite randomization. Using a simulation study, we investigated the use of propensity score (PS) based methods in estimating treatment effects in CRTs with selection bias when the outcome is quantitative. Of four PS‐based methods (adjustment on PS, inverse weighting, stratification, and optimal full matching method), three successfully corrected the bias, as did an approach using classical multivariable regression. However, they showed poorer statistical efficiency than classical methods, with higher standard error for the treatment effect, and type I error much smaller than the 5% nominal level. Copyright © 2013 John Wiley & Sons, Ltd.

16.
Motivated by a previously published study of HIV treatment, we simulated data subject to time‐varying confounding affected by prior treatment to examine some finite‐sample properties of marginal structural Cox proportional hazards models. We compared (a) unadjusted, (b) regression‐adjusted, (c) unstabilized, and (d) stabilized marginal structural (inverse probability‐of‐treatment [IPT] weighted) model estimators of effect in terms of bias, standard error, root mean squared error (MSE), and 95% confidence limit coverage over a range of research scenarios, including relatively small sample sizes and 10 study assessments. In the base‐case scenario resembling the motivating example, where the true hazard ratio was 0.5, both IPT‐weighted analyses were unbiased, whereas crude and adjusted analyses showed substantial bias towards and across the null. Stabilized IPT‐weighted analyses remained unbiased across a range of scenarios, including relatively small sample size; however, the standard error was generally smaller in crude and adjusted models. In many cases, unstabilized weighted analysis showed a substantial increase in standard error compared with other approaches. Root MSE was smallest in the IPT‐weighted analyses for the base‐case scenario. In situations where time‐varying confounding affected by prior treatment was absent, IPT‐weighted analyses were less precise and therefore had greater root MSE compared with adjusted analyses. The 95% confidence limit coverage was close to nominal for all stabilized IPT‐weighted but poor in crude, adjusted, and unstabilized IPT‐weighted analysis. Under realistic scenarios, marginal structural Cox proportional hazards models performed according to expectations based on large‐sample theory and provided accurate estimates of the hazard ratio. Copyright © 2012 John Wiley & Sons, Ltd.
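
The abstract's setting has treatment and confounding measured at 10 assessments; the sketch below collapses this to a single baseline time point to show only the construction of stabilized IPT weights and the weighted robust Cox fit (via lifelines). Parameters are toys, and because the hazard ratio is noncollapsible, the marginal estimate need not equal the conditional coefficient used to generate the data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from lifelines import CoxPHFitter

rng = np.random.default_rng(12)
n = 4000
x = rng.normal(size=n)                               # baseline confounder
t = rng.binomial(1, 1 / (1 + np.exp(-x)))            # confounded treatment
haz = np.exp(-0.7 * t + 0.3 * x)                     # conditional log-HR -0.7
time = rng.exponential(1 / haz)
cens = rng.exponential(2.0, n)
event = (time <= cens).astype(int)
obs = np.minimum(time, cens)

# Stabilized inverse-probability-of-treatment weights: P(T=t) / P(T=t | X).
ps = sm.Logit(t, sm.add_constant(x)).fit(disp=0).predict()
pt = t.mean()
sw = np.where(t == 1, pt / ps, (1 - pt) / (1 - ps))

df = pd.DataFrame({"time": obs, "event": event, "t": t, "w": sw})
msm = CoxPHFitter().fit(df, "time", "event", weights_col="w", robust=True)
print(msm.params_["t"])   # marginal log-HR for treatment, confounding removed
```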

17.
Objective: By constructing generalized propensity score (GPS) models and outcome models with different confounding structures, to explore and compare three GPS estimation methods: generalized propensity score-ordinary least squares (GPS-OLS), generalized propensity score…

18.
Recent advancements in next-generation DNA sequencing technologies have made it feasible to study the association of rare variants with complex diseases. Because of their low frequency, rare variants need to be aggregated in association tests to achieve adequate power with reasonable sample sizes. Hierarchical modeling/kernel machine methods have gained popularity among the many available methods for testing a set of rare variants collectively. Here, we propose a new score statistic based on a hierarchical model that additionally models the distribution of rare variants under the case-control study design. Results from extensive simulation studies show that the proposed method strikes a balance between robustness and power and outperforms several popular rare-variant association tests. We demonstrate the performance of our method using the Dallas Heart Study.

19.
We present a new method, the delta-centralization (DC) method, to correct for population stratification (PS) in case-control association studies. DC works well even when there is substantial confounding due to PS. PS causes overdispersion in the usual chi-square statistics, which then have noncentral chi-square distributions. Other methods approach the noncentrality indirectly; we deal with it directly, by estimating the noncentrality parameter tau itself. Specifically: (1) We define a quantity delta, a function of the relevant subpopulation parameters, and show that, for relatively large samples, delta exactly predicts the elevation of the false-positive rate due to PS when there is no true association between marker genotype and disease. (This quantity delta is quite different from Wright's F(ST) and can be large even when F(ST) is small.) (2) We show how to estimate delta, using a panel of unlinked "neutral" loci. (3) We then show that delta^2 corresponds to tau, the noncentrality parameter of the chi-square distribution; thus, we can centralize the chi-square using our estimate of delta. This is the DC method. (4) We demonstrate, via computer simulations, that DC works well with as few as 25-30 unlinked markers, where the markers are chosen to have allele frequencies reasonably close (within +/-0.1) to those at the test locus. (5) We compare DC with genomic control and show that whereas the latter becomes overconservative when there is considerable confounding due to PS (i.e., when delta is large), DC performs well for all values of delta.
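
A schematic of the centralization step on simulated statistics: since a noncentral chi-square with 1 df and noncentrality tau has mean 1 + tau, a panel of null markers yields a simple moment estimate of tau to subtract from the test statistic. This moment estimator is a stand-in for the paper's delta-based estimate, and all numbers are toys.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)

# Chi-square statistics at 30 unlinked "neutral" markers in a stratified
# sample: under PS they behave like noncentral chi-squares with ncp tau.
tau = 2.0
null_stats = stats.ncx2.rvs(df=1, nc=tau, size=30, random_state=rng)

# E[chi2_1(tau)] = 1 + tau, so a simple moment estimate of tau is:
tau_hat = max(null_stats.mean() - 1.0, 0.0)

# Centralize the test-locus statistic and refer it to a central chi2_1.
test_stat = stats.ncx2.rvs(df=1, nc=tau, size=1, random_state=rng)[0]
p_uncorrected = stats.chi2.sf(test_stat, df=1)           # inflated by PS
p_dc = stats.chi2.sf(max(test_stat - tau_hat, 0.0), df=1)  # centralized
```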

20.
Population stratification (PS) can lead to an inflated rate of false-positive findings in genome-wide association studies (GWAS). The commonly used approach of adjusting for a fixed number of principal components (PCs) can harm power when the selected PCs are equally distributed in cases and controls, or when covariates already included in the association analyses, such as self-identified ethnicity or recruitment center, correctly map onto major axes of genetic heterogeneity. We propose a computationally efficient procedure, PC-Finder, to identify a minimal set of PCs while permitting an effective correction for PS. A general pseudo-F statistic, derived from a non-parametric multivariate regression model, can be used to assess whether PS exists or has been adequately corrected by a set of selected PCs. Empirical data from two GWAS conducted as part of the Cancer Genetic Markers of Susceptibility (CGEMS) project demonstrate the application of the procedure. Furthermore, simulation studies show the power advantage of the proposed procedure over currently used PS correction strategies, particularly when PCs with substantial genetic variation are distributed similarly in cases and controls and therefore do not induce PS. Genet. Epidemiol. 33:432–441, 2009. © 2009 Wiley-Liss, Inc.
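
The abstract does not give the pseudo-F formula; a standard statistic from non-parametric multivariate regression on a distance matrix, shown here as a plausible analogue, is the McArdle-Anderson (PERMANOVA-type) pseudo-F. On the unstructured toy data below, both statistics hover near 1; genuine stratification inflates them, and in practice significance is assessed by permutation.

```python
import numpy as np

rng = np.random.default_rng(14)
n = 150
G = rng.binomial(2, 0.3, size=(n, 400)).astype(float)   # toy genotypes
pcs = rng.normal(size=(n, 3))                           # candidate PCs
labels = rng.integers(0, 2, n).astype(float)            # case/control status

# Gower-centered inner-product matrix from pairwise genotype distances.
D = np.abs(G[:, None, :] - G[None, :, :]).mean(-1)
H = np.eye(n) - np.ones((n, n)) / n
Gm = H @ (-0.5 * D ** 2) @ H

def pseudo_f(X):
    """McArdle-Anderson pseudo-F of predictors X against the distance matrix."""
    X1 = np.column_stack([np.ones(n), X])
    Hx = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
    m = X.shape[1]
    num = np.trace(Hx @ Gm @ Hx) / m
    den = np.trace((np.eye(n) - Hx) @ Gm @ (np.eye(n) - Hx)) / (n - m - 1)
    return num / den

print(pseudo_f(labels[:, None]), pseudo_f(pcs))
```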
