Similar Articles
20 similar articles found.
1.
Confounding due to population stratification (PS) arises when differences in both allele and disease frequencies exist in a population of mixed racial/ethnic subpopulations. Genomic control, structured association, principal components analysis (PCA), and multidimensional scaling (MDS) approaches have been proposed to address this bias using genetic markers. However, confounding due to PS can also be due to non‐genetic factors. Propensity scores are widely used to address confounding in observational studies but have not been adapted to deal with PS in genetic association studies. We propose a genomic propensity score (GPS) approach to correct for bias due to PS that considers both genetic and non‐genetic factors. We compare the GPS method with PCA and MDS using simulation studies. Our results show that GPS can adequately adjust and consistently correct for bias due to PS. Under no/mild, moderate, and severe PS, GPS yielded estimates with bias close to 0 (mean=−0.0044, standard error=0.0087). Under moderate or severe PS, the GPS method consistently outperforms the PCA method in terms of bias, coverage probability (CP), and type I error. Under moderate PS, the GPS method consistently outperforms the MDS method in terms of CP. PCA maintains relatively high power compared to both MDS and GPS methods under the simulated situations. GPS and MDS are comparable in terms of statistical properties such as bias, type I error, and power. The GPS method provides a novel and robust tool for obtaining less‐biased estimates of genetic associations that can consider both genetic and non‐genetic factors. Genet. Epidemiol. 33:679–690, 2009. © 2009 Wiley‐Liss, Inc.
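The GPS idea can be illustrated with a toy sketch: fit a propensity model for subpopulation membership (here known, for illustration) from genetic markers plus a non‐genetic covariate, and use the fitted scores for adjustment. The simulated data and the plain gradient‐ascent logistic fit are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 subjects, 50 marker genotypes (0/1/2) plus
# one non-genetic covariate; `pop` marks the subpopulation.
n, m = 200, 50
pop = rng.integers(0, 2, n)
freqs = np.where(pop[:, None] == 1, 0.6, 0.3)   # allele-frequency shift
G = rng.binomial(2, freqs, (n, m))
env = rng.normal(pop, 1.0)                      # non-genetic factor

def fit_logistic(X, y, iters=500, lr=0.1):
    """Plain gradient-ascent logistic regression (no library dependency)."""
    X = np.column_stack([np.ones(len(X)), X])   # add intercept
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        b += lr * X.T @ (y - p) / len(y)
    return X, b

# Propensity model: subpopulation membership given markers + non-genetic factor.
X, b = fit_logistic(np.column_stack([G, env]), pop)
gps = 1.0 / (1.0 + np.exp(-X @ b))              # genomic propensity scores

print(gps[:5])
```

The scores would then enter the association model as an adjustment covariate (or via stratification), in the usual propensity-score fashion.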

2.
Genome‐wide association studies are helping to dissect the etiology of complex diseases. Although case‐control association tests are generally more powerful than family‐based association tests, population stratification can lead to spurious disease‐marker association or mask a true association. Several methods have been proposed to match cases and controls prior to genotyping, using family information or epidemiological data, or using genotype data for a modest number of genetic markers. Here, we describe a genetic similarity score matching (GSM) method for efficient matched analysis of cases and controls in a genome‐wide or large‐scale candidate gene association study. GSM comprises three steps: (1) calculating similarity scores for pairs of individuals using the genotype data; (2) matching sets of cases and controls based on the similarity scores so that matched cases and controls have similar genetic background; and (3) using conditional logistic regression to perform association tests. Through computer simulation we show that GSM correctly controls false‐positive rates and improves power to detect true disease predisposing variants. We compare GSM to genomic control using computer simulations, and find improved power using GSM. We suggest that initial matching of cases and controls prior to genotyping combined with careful re‐matching after genotyping is a method of choice for genome‐wide association studies. Genet. Epidemiol. 33:508–517, 2009. © 2009 Wiley‐Liss, Inc.
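Steps (1) and (2) of GSM can be sketched as follows. The allele‐sharing similarity and the greedy 1:1 pairing are simplifying assumptions (the paper matches sets of cases and controls); the resulting pairs would form the strata for step (3)'s conditional logistic regression:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical genotypes (0/1/2 minor-allele counts): 10 cases, 20 controls.
cases = rng.binomial(2, 0.3, (10, 100))
controls = rng.binomial(2, 0.3, (20, 100))

def similarity(g1, g2):
    """Allele-sharing score in [0, 1]: 1 = identical genotypes at every SNP."""
    return 1.0 - np.abs(g1 - g2).mean() / 2.0

# Greedy 1:1 matching: pair each case with its most similar unused control.
available = set(range(len(controls)))
pairs = []
for i, case in enumerate(cases):
    j = max(available, key=lambda k: similarity(case, controls[k]))
    available.remove(j)
    pairs.append((i, j))   # one stratum for conditional logistic regression

print(pairs[:3])
```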

3.
4.
We propose a method to analyze family‐based samples together with unrelated cases and controls. The method builds on the idea of matched case–control analysis using conditional logistic regression (CLR). For each trio within the family, a case (the proband) and matched pseudo‐controls are constructed, based upon the transmitted and untransmitted alleles. Unrelated controls, matched by genetic ancestry, supplement the sample of pseudo‐controls; likewise unrelated cases are also paired with genetically matched controls. Within each matched stratum, the case genotype is contrasted with control/pseudo‐control genotypes via CLR, using a method we call matched‐CLR (mCLR). Eigenanalysis of numerous SNP genotypes provides a tool for mapping genetic ancestry. The result of such an analysis can be thought of as a multidimensional map, or eigenmap, in which the relative genetic similarities and differences amongst individuals are encoded in the map. Once constructed, new individuals can be projected onto the ancestry map based on their genotypes. Successful differentiation of individuals of distinct ancestry depends on having a diverse, yet representative sample from which to construct the ancestry map. Once samples are well‐matched, mCLR yields comparable power to competing methods while ensuring excellent control over Type I error. Copyright © 2010 John Wiley & Sons, Ltd.

5.
Outcome‐based sampling is an efficient study design for rare conditions, such as glioblastoma. It is often used in conjunction with matching, for increased efficiency and to potentially avoid bias due to confounding. A study was conducted at the Massachusetts General Hospital that involved retrospective sampling of glioblastoma patients with respect to multiple‐ordered disease states, as defined by three categories of overall survival time. To analyze such studies, we posit an adjacent categories logit model and exploit its allowance for prospective analysis of a retrospectively sampled study and its advantageous removal of set and level specific nuisance parameters through conditioning on sufficient statistics. This framework allows for any sampling design and is not limited to one level of disease within each set, such as in previous publications. We describe how this ordinal conditional model can be fit using standard conditional logistic regression procedures. We consider an alternative pseudo‐likelihood approach that potentially offers robustness under partial model misspecification at the expense of slight loss of efficiency under correct model specification for small sample sizes. We apply our methods to the Massachusetts General Hospital glioblastoma study. Copyright © 2015 John Wiley & Sons, Ltd.

6.
Confounding due to population substructure is always a concern in genetic association studies. Although methods have been proposed to adjust for population stratification in the context of common variation, it is unclear how well these approaches will work when interrogating rare variation. Family‐based association tests can be constructed that are robust to population stratification. For example, when considering a quantitative trait, a linear model can be used that decomposes genetic effects into between‐ and within‐family components and a test of the within‐family component is robust to population stratification. However, this within‐family test ignores between‐family information potentially leading to a loss of power. Here, we propose a family‐based two‐stage rare‐variant test for quantitative traits. We first construct a weight for each variant within a gene, or other genetic unit, based on score tests of between‐family effect parameters. These weights are then used to combine variants using score tests of within‐family effect parameters. Because the between‐family and within‐family tests are orthogonal under the null hypothesis, this two‐stage approach can increase power while still maintaining validity. Using simulation, we show that this two‐stage test can significantly improve power while correctly maintaining type I error. We further show that the two‐stage approach maintains the robustness to population stratification of the within‐family test and we illustrate this using simulations reflecting samples composed of continental and closely related subpopulations.
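The between‐/within‐family decomposition underlying the two‐stage test can be sketched as follows (toy data; only the orthogonal decomposition is shown, not the score tests built on top of it):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: 50 families x 3 siblings, genotypes at 8 rare variants.
fam, sibs, m = 50, 3, 8
g = rng.binomial(2, 0.02, (fam, sibs, m)).astype(float)

# Decompose each genotype into a between-family component (family mean)
# and an orthogonal within-family component (deviation from that mean).
between = g.mean(axis=1, keepdims=True)   # fam x 1 x m
within = g - between                      # fam x sibs x m

# Orthogonality of the two components, variant by variant (should be ~0):
print(np.abs((between * within).sum(axis=(0, 1))).max())
```

In the two‐stage scheme, stage one would derive per‐variant weights from association of the between components, and stage two would test a weighted combination of the within components; orthogonality under the null is what keeps the two stages independent.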

7.
locStra is an R package for the analysis of regional and global population stratification in whole‐genome sequencing (WGS) studies, where regional stratification refers to the substructure defined by the loci in a particular region on the genome. Population substructure can be assessed based on the genetic covariance matrix, the genomic relationship matrix, and the unweighted/weighted genetic Jaccard similarity matrix. Using a sliding window approach, the regional similarity matrices are compared with the global ones, based on user‐defined window sizes and metrics, for example, the correlation between regional and global eigenvectors. An algorithm for the specification of the window size is provided. As the implementation fully exploits sparse matrix algebra and is written in C++, the analysis is highly efficient. Even on single cores, for realistic study sizes (several thousand subjects, several million rare variants per subject), the runtime for the genome‐wide computation of all regional similarity matrices typically does not exceed one hour, enabling an unprecedented investigation of regional stratification across the entire genome. The package is applied to three WGS studies, illustrating the varying patterns of regional substructure across the genome and its beneficial effects on association testing.
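A minimal sketch of the sliding‐window idea, using the correlation between regional and global leading eigenvectors of the subject‐by‐subject genetic covariance matrix as the comparison metric. The data, window size, and step are illustrative assumptions; this is not the locStra implementation (which uses sparse matrix algebra in C++):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 0/1/2 genotype matrix: 100 subjects x 1000 variants.
G = rng.binomial(2, 0.2, (100, 1000)).astype(float)

def top_eigenvector(X):
    """Leading eigenvector of the subjects' genetic covariance matrix."""
    C = np.cov(X)                 # rows are subjects -> subjects x subjects
    vals, vecs = np.linalg.eigh(C)
    return vecs[:, -1]            # eigenvector of the largest eigenvalue

global_ev = top_eigenvector(G)

# Sliding windows: correlate each regional top eigenvector with the global one.
window, step = 200, 100
scores = []
for start in range(0, G.shape[1] - window + 1, step):
    regional_ev = top_eigenvector(G[:, start:start + window])
    r = abs(np.corrcoef(regional_ev, global_ev)[0, 1])   # sign-invariant
    scores.append((start, r))

print(scores)
```

Regions whose correlation drops would be flagged as carrying substructure that differs from the genome‐wide pattern.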

8.
9.
Genome‐wide case‐control association studies are gaining popularity, thanks to the rapid development of modern genotyping technology. In such studies, population stratification is a potential concern especially when the number of study subjects is large as it can lead to seriously inflated false‐positive rates. Current methods addressing this issue are still not completely immune to excess false positives. A simple method that corrects for population stratification is proposed. This method modifies a test statistic such as the Armitage trend test by using an additive constant that measures the variation of the effect size confounded by population stratification across genomic control (GC) markers. As a result, the original statistic is deflated by a multiplying factor that is specific to the marker being tested for association. This deflating multiplying factor is guaranteed to be larger than 1. These properties are in contrast to the conventional GC method where the original statistic is deflated by a common factor regardless of the marker being tested and the deflation factor may turn out to be less than 1. The new method is introduced first for regular case‐control design and then for other situations such as quantitative traits and the presence of covariates. Extensive simulation study indicates that this new method provides an appealing alternative for genetic association analysis in the presence of population stratification. Genet. Epidemiol. 33:637–645, 2009. © 2009 Wiley‐Liss, Inc.
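For reference, the conventional GC baseline that this abstract contrasts with can be sketched as follows: compute the Cochran‐Armitage trend statistic per marker, then deflate every statistic by one common factor λ estimated from null markers. The toy genotype tables are assumptions; the proposed marker‐specific deflation is not shown:

```python
import numpy as np

def armitage_trend(case_counts, ctrl_counts, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic for a 2 x 3 genotype table (~ chi2, 1 df)."""
    r = np.asarray(case_counts, float)        # cases per genotype
    n = r + np.asarray(ctrl_counts, float)    # totals per genotype
    t = np.asarray(scores, float)
    N, R = n.sum(), r.sum()
    num = N * (N * (t * r).sum() - R * (t * n).sum()) ** 2
    den = R * (N - R) * (N * (t * t * n).sum() - (t * n).sum() ** 2)
    return num / den

# Conventional genomic control: one common deflation factor lambda,
# estimated from null markers (0.456 ~ median of a chi2 with 1 df).
null_stats = np.array([armitage_trend(c, d) for c, d in [
    ([50, 40, 10], [45, 42, 13]),
    ([30, 50, 20], [35, 48, 17]),
    ([60, 30, 10], [55, 35, 10]),
]])
lam = max(1.0, np.median(null_stats) / 0.456)
corrected = null_stats / lam
print(lam, corrected)
```

Note the two properties the abstract criticizes: λ is the same for every marker, and without the `max(1.0, …)` guard the raw estimate can fall below 1 (inflating statistics rather than deflating them).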

10.
A survey is conducted at a set S of w of the K selection units or lists, e.g. health care institutions or weeks in a year, to estimate N, the total number of individuals with particular characteristics. Our estimator utilizes two items determined for each survey participant: the number, u, of the w lists in S and the number, j, of all K lists on which the participant appears. In its traditional form, selection units are chosen using probability sampling and the statistical properties of the estimator derive from the sampling mechanism. Here, selection units are purposively chosen to maximize the chance that they are ‘typical’ and a model‐based analysis is used for inference. If the sample is typical, the ML estimators of N and E(J) are unbiased. If a condition on the second moment of U/J is satisfied, the model‐based variance of the estimator of N based on a purposively chosen typical sample is smaller than one based on a randomly chosen sample. Methods to test whether the typical assumption is valid using data from the survey are not yet available. The importance of proper selection of the sample to maximize the chance that it is typical and that model breakdown does not occur must be emphasized. Copyright © 2009 John Wiley & Sons, Ltd.

11.
Many meta‐analyses combine results from only a small number of studies, a situation in which the between‐study variance is imprecisely estimated when standard methods are applied. Bayesian meta‐analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta‐analysis using data augmentation, in which we represent an informative conjugate prior for between‐study variance by pseudo data and use meta‐regression for estimation. To assist in this, we derive predictive inverse‐gamma distributions for the between‐study variance expected in future meta‐analyses. These may serve as priors for heterogeneity in new meta‐analyses. In a simulation study, we compare approximate Bayesian methods using meta‐regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta‐regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta‐analysis is described. The proposed method facilitates Bayesian meta‐analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
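The frequentist DerSimonian and Laird procedure used as the comparator can be sketched directly from its moment equations (the effect estimates and within‐study variances below are made‐up toy values):

```python
import numpy as np

# Toy meta-analysis: effect estimates and within-study variances for 5 studies,
# few enough that tau^2 is imprecisely estimated -- the situation the abstract
# motivates Bayesian priors for.
y = np.array([0.30, 0.10, 0.45, 0.20, 0.05])
v = np.array([0.04, 0.02, 0.09, 0.03, 0.05])

def dersimonian_laird(y, v):
    """Classic DL moment estimator of tau^2 and the random-effects pooled mean."""
    w = 1.0 / v                              # fixed-effect weights
    mu_fe = (w * y).sum() / w.sum()
    Q = (w * (y - mu_fe) ** 2).sum()         # Cochran's Q
    c = w.sum() - (w ** 2).sum() / w.sum()
    tau2 = max(0.0, (Q - (len(y) - 1)) / c)  # truncated at zero
    w_re = 1.0 / (v + tau2)                  # random-effects weights
    mu_re = (w_re * y).sum() / w_re.sum()
    return tau2, mu_re

tau2, mu = dersimonian_laird(y, v)
print(tau2, mu)
```

The proposed data‐augmentation approach would instead encode an inverse‐gamma prior on tau² as pseudo studies appended to `y` and `v`, then fit by meta‐regression.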

12.
13.
In this paper, we propose a class of multivariate random effects models allowing for the inclusion of study‐level covariates to carry out meta‐analyses. As existing algorithms for computing maximum likelihood estimates often converge poorly or may not converge at all when the random effects are multi‐dimensional, we develop an efficient expectation–maximization algorithm for fitting multi‐dimensional random effects regression models. In addition, we also develop a new methodology for carrying out variable selection with study‐level covariates. We examine the performance of the proposed methodology via a simulation study. We apply the proposed methodology to analyze metadata from 26 studies involving statins as a monotherapy and in combination with ezetimibe. In particular, we compare the low‐density lipoprotein cholesterol‐lowering efficacy of monotherapy and combination therapy on two patient populations (naïve and non‐naïve patients to statin monotherapy at baseline), controlling for aggregate covariates. The proposed methodology is quite general and can be applied in any meta‐analysis setting for a wide range of scientific applications and therefore offers new analytic methods of clinical importance. Copyright © 2012 John Wiley & Sons, Ltd.

14.
The study of gene‐environment interactions is an increasingly important aspect of genetic epidemiological investigation. Historically, it has been difficult to study gene‐environment interactions using a family‐based design for quantitative traits or when parent‐offspring trios were incomplete. The QBAT‐I provides researchers a tool to estimate and test for a gene‐environment interaction in families of arbitrary structure that are sampled without regard to the phenotype of interest, but is vulnerable to inflated type I error if families are ascertained on the basis of the phenotype. In this study, we verified the potential for type I error inflation of the QBAT‐I when it is applied to samples ascertained on a trait of interest. The magnitude of the inflation increases as the main genetic effect increases and as the ascertainment becomes more extreme. We propose an ascertainment‐corrected score test that allows the use of the QBAT‐I to test for gene‐environment interactions in ascertained samples. Our results indicate that the score test and an ad hoc method we propose can often restore the nominal type I error rate, and in cases where complete restoration is not possible, dramatically reduce the inflation of the type I error rate in ascertained samples. Copyright © 2013 John Wiley & Sons, Ltd.

15.
16.
Tests for regression coefficients such as global, local, and partial F‐tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F‐tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.
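The scalar‐entity idea can be illustrated by recovering a global F statistic from R² alone, with no vectors or matrix inversions. The naive averaging across imputations below is a placeholder for illustration, not the paper's pooling rule (which also accounts for between‐imputation variability):

```python
import numpy as np

def f_from_r2(r2, n, k):
    """Global F statistic for a linear model recovered from R^2 alone
    (n observations, k predictors; F ~ F(k, n - k - 1) under the null)."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Hypothetical R^2 values from m = 3 completed (imputed) datasets.
r2_imputed = [0.21, 0.25, 0.23]
n, k = 150, 4

f_stats = [f_from_r2(r2, n, k) for r2 in r2_imputed]
# Simple average across imputations -- a placeholder, NOT the paper's
# combining rule.
print(np.mean(f_stats))
```

Local and partial tests follow the same pattern, using the difference in R² between nested models.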

17.
Family‐based association studies are commonly used in genetic research because they can be robust to population stratification (PS). Recent advances in high‐throughput genotyping technologies have produced a massive amount of genomic data in family‐based studies. However, current family‐based association tests are mainly focused on evaluating individual variants one at a time. In this article, we introduce a family‐based generalized genetic random field (FB‐GGRF) method to test the joint association between a set of autosomal SNPs (i.e., single‐nucleotide polymorphisms) and disease phenotypes. The proposed method is a natural extension of a recently developed GGRF method for population‐based case‐control studies. It models offspring genotypes conditional on parental genotypes, and, thus, is robust to PS. Through simulations, we show that under various disease scenarios the FB‐GGRF has improved power over a commonly used family‐based sequence kernel association test (FB‐SKAT). Further, similar to GGRF, the proposed FB‐GGRF method is asymptotically well‐behaved, and does not require empirical adjustment of the type I error rates. We illustrate the proposed method using a study of congenital heart defects with family trios from the National Birth Defects Prevention Study (NBDPS).

18.
Gene–environment interaction (GxE) is emphasized as one potential source of missing genetic variation on disease traits, and the ultimate goal of GxE research is prediction of individual risk and prevention of complex diseases. However, there are various challenges in statistical analysis of GxE. In this paper, we focus on three methodological challenges: (i) the high dimensionality of genes; (ii) the hierarchical structure between interaction effects and their corresponding main effects; and (iii) the correlation among subjects from family‐based population studies. We propose an algorithm that approaches all three challenges simultaneously. This is the first penalized method focusing on an interaction search based on a linear mixed effect model. For verification, we compare the empirical performance of our new method with that of existing methods in a simulation study. The results demonstrate the superiority of our method across the overall simulation setup. In particular, the advantage grows as the correlation among subjects increases. In addition, the new method provides a robust estimate of the correlation among subjects. We also apply the new method to data from the Genetics of Lipid Lowering Drugs and Diet Network study. Copyright © 2017 John Wiley & Sons, Ltd.

19.
20.
Performance‐based financing (PBF) has been piloted in many low‐ and middle‐income countries (LMICs) as a strategy to improve access to and quality of health services. As a key component of PBF, quantity verification is carried out to ensure that reported data matches the actual number of services provided. However, cost concerns have led to a call for risk‐based verification. Existing evidence suggests misreporting is associated with factors such as complexity of indicators, high service volume, and accepted error margin. In contrast, evidence on the association of key facility characteristics with misreporting in PBF is scarce. We contributed to filling this gap in knowledge by combining administrative data from a large‐scale pilot PBF program in Burkina Faso with data from a health facility assessment in the context of an impact evaluation of the intervention. Our results showed the coexistence of both overreporting and underreporting and that misreporting varied by service indicator and health district. We also found that the number of clinical staff at the facility, the population size in the facility catchment area, and the distance between the facility and the district administration were associated with the probability of misreporting. We recommend further research of these factors in the move towards risk‐based verification. In addition, given that our analysis identified relevant associations, but could not explain them, we recommend further qualitative inquiry into verification processes.
