Found 20 similar articles (search time: 0 ms)
1.
Proper control of confounding due to population stratification is crucial for valid analysis of case-control association studies. Fine matching of cases and controls based on genetic ancestry is an increasingly popular strategy to correct for such confounding, both in genome-wide association studies (GWASs) and in studies that employ next-generation sequencing, where matching can be used when selecting a subset of participants from a GWAS for rare-variant analysis. Existing matching methods match on measures of genetic ancestry that combine multiple components of ancestry into a scalar quantity. However, we show that including nonconfounding ancestry components in a matching criterion can lead to inaccurate matches, and hence to improper control of confounding. To resolve this issue, we propose a novel method that assigns cases and controls to matched strata based on the stratification score (Epstein et al. [2007] Am J Hum Genet 80:921-930), which is the probability of disease given genomic variables. Matching on the stratification score leads to more accurate matches because case participants are matched to control participants who have a similar risk of disease given ancestry information. We illustrate our matching method using the African-American arm of the GAIN GWAS of schizophrenia. In this study, we observe that confounding due to stratification can be resolved by our matching approach but not by other existing matching procedures. We also use simulated data to show that our novel matching approach can provide a more appropriate correction for population stratification than existing matching approaches.
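As a rough illustration of grouping participants into strata by stratification score, the sketch below bins them into equal-size strata by score rank and then keeps only strata that contain both cases and controls. The function names, the use of rank-based bins, and the input scores are all assumptions made for illustration; the abstract does not describe how the scores are estimated or how the authors form their strata.

```python
def assign_strata(scores, n_strata=5):
    """Bin participants into equal-size strata by the rank of their score.

    `scores` is assumed to hold an already-estimated probability of disease
    given genomic variables for each participant (hypothetical input).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata


def usable_strata(strata, is_case):
    """Return the strata that contain at least one case and one control."""
    seen = {}
    for s, c in zip(strata, is_case):
        seen.setdefault(s, set()).add(bool(c))
    return sorted(s for s, kinds in seen.items() if len(kinds) == 2)


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.2, 0.8, 0.5, 0.4]   # illustrative values only
    is_case = [0, 1, 1, 1, 0, 1]
    strata = assign_strata(scores, n_strata=3)
    print(strata, usable_strata(strata, is_case))
```

Strata with only cases or only controls carry no information for a stratified comparison, which is why the second helper filters them out.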
2.
Based on the symmetry of transmitted/nontransmitted alleles from heterozygous parents under the null hypothesis of no association, the work proposed here establishes a general statistical framework for constructing association tests with data from nuclear families with multiple affected children. A class of association tests is proposed for both diallelic and multiallelic markers. The proposed test statistics reduce to the transmission disequilibrium test for trios, to T(su) by Martin et al. ([1997] Am. J. Hum. Genet. 61:439-448) for affected sib pairs, and to the pedigree disequilibrium test by Martin et al. ([2000] Am. J. Hum. Genet. 67:146-154; [2001] Am. J. Hum. Genet. 68:1065-1067) when using affected sibships only. The association test used in simulation and for real data (sitosterolemia) is the one that has the best overall power in detecting association. This association test is generally more powerful than the association tests proposed by Martin et al. ([2000] Am. J. Hum. Genet. 67:146-154; [2001] Am. J. Hum. Genet. 68:1065-1067) when using only affected sibships. For the sitosterolemia data set, the association test has its most significant result (P-value=0.0012) for the marker locus on the same bacterial artificial chromosome as the disease locus.
3.
The potential for bias from population stratification (PS) has raised concerns about case-control studies involving admixed ethnicities. We evaluated the potential bias due to PS in relating a binary outcome with a candidate gene under simulated settings where study populations consist of multiple ethnicities. Disease risks were assigned within the range of prostate cancer rates of African Americans reported in SEER registries assuming k=2, 5, or 10 admixed ethnicities. Genotype frequencies were considered in the range of 5-95%. Under a model assuming no genotype effect on disease (odds ratio (OR)=1), the range of observed OR estimates ignoring ethnicity was 0.64-1.55 for k=2, 0.72-1.33 for k=5, and 0.81-1.22 for k=10. When genotype effect on disease was modeled to be OR=2, the ranges of observed OR estimates were 1.28-3.09, 1.43-2.65, and 1.62-2.42 for k=2, 5, and 10 ethnicities, respectively. Our results indicate that the magnitude of bias is small unless extreme differences exist in genotype frequency. Bias due to PS decreases as the number of admixed ethnicities increases. The biases are bounded by the minimum and maximum of all pairwise baseline disease odds ratios across ethnicities. Therefore, bias due to PS alone may be small when baseline risk differences are small within major categories of admixed ethnicity, such as African Americans.
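The kind of bias described above can be reproduced with a small calculation. The sketch below pools several subpopulations, each with its own baseline disease risk and genotype frequency, and computes the odds ratio one would observe when ethnicity is ignored. The function name and all numeric inputs are illustrative assumptions; they are not the SEER-based rates used in the study.

```python
def crude_odds_ratio(weights, disease_risk, genotype_freq, true_or=1.0):
    """Odds ratio observed when subpopulation membership is ignored.

    weights        -- share of each subpopulation in the pooled population
    disease_risk   -- baseline disease risk among non-carriers, per subpopulation
    genotype_freq  -- carrier frequency, per subpopulation
    true_or        -- genotype odds ratio acting within every subpopulation
    """
    a = b = c = d = 0.0
    for w, r, g in zip(weights, disease_risk, genotype_freq):
        odds_carrier = true_or * r / (1 - r)
        r1 = odds_carrier / (1 + odds_carrier)   # risk among carriers
        a += w * g * r1                          # carrier, diseased
        c += w * g * (1 - r1)                    # carrier, not diseased
        b += w * (1 - g) * r                     # non-carrier, diseased
        d += w * (1 - g) * (1 - r)               # non-carrier, not diseased
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Two subpopulations, no true genotype effect, extreme frequency difference.
    print(crude_odds_ratio([0.5, 0.5], [0.1, 0.2], [0.05, 0.95]))
    # Same risks but equal genotype frequencies: no confounding.
    print(crude_odds_ratio([0.5, 0.5], [0.1, 0.2], [0.30, 0.30]))
```

In the first call the pooled odds ratio is inflated even though the genotype has no effect, yet it stays below the baseline disease odds ratio between the two subpopulations, (0.2/0.8)/(0.1/0.9) = 2.25, which is consistent with the bound stated in the abstract.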
4.
We use likelihood-based score statistics to test for association between a disease and a diallelic polymorphism, based on data from arbitrary types of nuclear families. The Nonfounder statistic extends the transmission disequilibrium test (TDT) to accommodate affected and unaffected offspring, missing parental genotypes, phenotypes more general than qualitative traits, such as censored survival data and quantitative traits, and residual correlation of phenotypes within families. The Founder statistic compares observed or inferred parental genotypes to those expected in the general population. Here the genotypes of affected parents and those with many affected offspring are weighted more heavily than unaffected parents and those with few affected offspring. We illustrate the tests by applying them to data on a polymorphism of the SRD5A2 gene in nuclear families with multiple cases of prostate cancer. We also use simulations to compare the power of these family-based statistics to that of the score statistic based on Cox's partial likelihood for censored survival data, and find that the family-based statistics have considerably more power when there are many untyped parents. The software program FGAP for computing test statistics is available at http://www.stanford.edu/dept/HRP/epidemiology/FGAP.
5.
Improved correction for population stratification in genome-wide association studies by identifying hidden population structures (total citations: 1; self-citations: 0; citations by others: 1)
Hidden population substructure can cause population stratification and lead to false-positive findings in population-based genome-wide association (GWA) studies. Given a large panel of markers scanned in a GWA study, it becomes increasingly feasible to uncover the hidden population substructure within the study sample based on measured genotypes across the genome. Recognizing that population substructure can be displayed as clustered and/or continuous patterns of genetic variation, we propose a method that aims at the detection and correction of the confounding effect resulting from both patterns of population substructure. The proposed method is an extension of the EIGENSTRAT method (Price et al. [2006] Nat Genet 38:904-909). This approach is computationally feasible and easily applied to large-scale GWA studies. We show through simulation studies that, compared with the EIGENSTRAT method, the new method requires a smaller number of markers and yields a more appropriate correction for population stratification.
6.
Kai Wang. Genetic Epidemiology 2009, 33(7):637-645
The genome‐wide case‐control association study is gaining popularity, thanks to the rapid development of modern genotyping technology. In such studies, population stratification is a potential concern, especially when the number of study subjects is large, as it can lead to seriously inflated false‐positive rates. Current methods addressing this issue are still not completely immune to excess false positives. A simple method that corrects for population stratification is proposed. This method modifies a test statistic such as the Armitage trend test by using an additive constant that measures the variation of the effect size confounded by population stratification across genomic control (GC) markers. As a result, the original statistic is deflated by a multiplying factor that is specific to the marker being tested for association. This deflating multiplying factor is guaranteed to be larger than 1. These properties are in contrast to the conventional GC method, where the original statistic is deflated by a common factor regardless of the marker being tested and the deflation factor may turn out to be less than 1. The new method is introduced first for the regular case‐control design and then for other situations such as quantitative traits and the presence of covariates. An extensive simulation study indicates that this new method provides an appealing alternative for genetic association analysis in the presence of population stratification. Genet. Epidemiol. 33:637–645, 2009. © 2009 Wiley‐Liss, Inc.
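For contrast with the marker-specific factor proposed in this abstract, the conventional genomic control adjustment it refers to can be sketched as follows: a single inflation factor is estimated from the median of the test statistics at the GC markers and every statistic is divided by it. This is a sketch of the conventional method only, not of the new method; the function names are hypothetical, and the statistics are assumed to be 1-df chi-square values.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def gc_inflation(gc_marker_stats):
    """Common inflation factor: observed median over the expected null median."""
    return statistics.median(gc_marker_stats) / CHI2_1DF_MEDIAN


def gc_adjust(stat, inflation):
    """Deflate one test statistic by the common factor."""
    return stat / inflation


if __name__ == "__main__":
    inflated = [0.1, 0.5, 0.9, 1.5, 3.0]      # illustrative values only
    lam = gc_inflation(inflated)
    print(lam, gc_adjust(4.0, lam))
    # Nothing prevents the estimated factor from falling below 1:
    print(gc_inflation([0.1, 0.2, 0.3]))
```

The last call shows the property criticized in the abstract: with these inputs the common factor is below 1, so dividing by it would inflate rather than deflate the statistic.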
7.
Genotype-based association test for general pedigrees: the genotype-PDT (total citations: 11; self-citations: 0; citations by others: 11)
Many family-based tests of linkage disequilibrium (LD) are based on counts of alleles rather than genotypes. However, allele-based tests may not detect interactions among alleles at a single locus that are apparent when examining associations with genotypes. Family-based tests of LD based on genotypes have been developed, but they are typically valid as tests of association only in families with a single affected individual. To take advantage of families with multiple affected individuals, we propose the genotype-pedigree disequilibrium test (geno-PDT) to test for LD between marker locus genotypes and disease. Unlike previous tests for genotypic association, the geno-PDT is valid in general pedigrees. Simulations to compare the power of the allele-based PDT and geno-PDT reveal that under an additive model, the allele-based PDT is more powerful, but that the geno-PDT can have greater power when the genetic model is recessive or dominant. Perhaps the most important property of the geno-PDT is the ability to test for association with particular genotypes, which can reveal underlying patterns of association at the genotypic level. These genotype-specific tests can be used to suggest possible underlying genetic models that are consistent with the pattern of genotypic association. This is illustrated through an application to a candidate gene analysis of the MLLT3 gene in families with Alzheimer disease. The geno-PDT approach for testing genotypes in general family data provides a useful tool for identifying genes in complex disease, and partitioning individual genotype contributions will help to dissect the influence of genotype on risk.
8.
Jaehoon An, Sungho Won, Sharon M. Lutz, Julian Hecker, Christoph Lange. Genetic Epidemiology 2019, 43(8):1046-1055
False-positive rates in genome-wide association analysis are affected by population stratification, and if stratification is not correctly adjusted for, the statistical analysis can produce many false-negative findings. Therefore, various approaches have been proposed to address such problems in genome-wide association studies. However, in spite of its importance, few studies have addressed the issue in genome-wide single nucleotide polymorphism (SNP)-by-environment interaction studies. In this report, we illustrate which scenarios can lead to inflated false-positive rates in association mapping, and an approach to maintaining the overall type-1 error rate.
9.
Weir BL. Genetic Epidemiology 2001, 21(Z1):S415-S420
A range of study designs, using unrelated or family controls, were used to investigate the pattern of association with disease of single nucleotide polymorphisms (SNPs) within candidate gene 1 (simulated data). Strong evidence of disease association at the functional locus was detected using all study designs, and in the "general" but not the "isolated" population the functional polymorphism displayed considerably higher association than surrounding SNPs. There was much variation in the strength of association of SNPs with disease, up to 70% of which was explained by SNP allele frequency and distance from the functional polymorphism. Some common polymorphisms very close to the functional locus however showed no association with disease. Analysis of short haplotypes of SNPs reduced but did not totally remove this feature.
10.
Weihua Guan, Liming Liang, Michael Boehnke, Gonçalo R. Abecasis. Genetic Epidemiology 2009, 33(6):508-517
Genome‐wide association studies are helping to dissect the etiology of complex diseases. Although case‐control association tests are generally more powerful than family‐based association tests, population stratification can lead to spurious disease‐marker association or mask a true association. Several methods have been proposed to match cases and controls prior to genotyping, using family information or epidemiological data, or using genotype data for a modest number of genetic markers. Here, we describe a genetic similarity score matching (GSM) method for efficient matched analysis of cases and controls in a genome‐wide or large‐scale candidate gene association study. GSM comprises three steps: (1) calculating similarity scores for pairs of individuals using the genotype data; (2) matching sets of cases and controls based on the similarity scores so that matched cases and controls have similar genetic background; and (3) using conditional logistic regression to perform association tests. Through computer simulation we show that GSM correctly controls false‐positive rates and improves power to detect true disease predisposing variants. We compare GSM to genomic control using computer simulations, and find improved power using GSM. We suggest that initial matching of cases and controls prior to genotyping combined with careful re‐matching after genotyping is a method of choice for genome‐wide association studies. Genet. Epidemiol. 33:508–517, 2009. © 2009 Wiley‐Liss, Inc.
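The first two of the three steps listed above can be sketched in a few lines. The abstract does not define the similarity score or the matching algorithm, so the sketch assumes a simple identity-by-state (IBS) sharing proportion on genotypes coded 0/1/2 and a greedy one-to-one matching; both choices, and the function names, are stand-ins rather than the GSM method itself. Step (3), conditional logistic regression on the matched sets, is left out.

```python
def ibs_similarity(g1, g2):
    """Mean IBS sharing (0 to 1) between two genotype vectors coded 0/1/2."""
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))


def greedy_match(cases, controls):
    """Pair each case with its most similar control that is still unmatched."""
    free = set(range(len(controls)))
    pairs = []
    for ci, case in enumerate(cases):
        if not free:
            break
        # sorted() makes tie-breaking deterministic (lowest control index wins).
        best = max(sorted(free), key=lambda k: ibs_similarity(case, controls[k]))
        free.remove(best)
        pairs.append((ci, best))
    return pairs


if __name__ == "__main__":
    cases = [[0, 0, 1, 2], [2, 2, 1, 0]]                     # illustrative genotypes
    controls = [[2, 2, 2, 0], [0, 0, 1, 1], [1, 1, 1, 1]]
    print(greedy_match(cases, controls))
```

Greedy matching depends on the order in which cases are processed; an optimal matching procedure would avoid that, at the cost of a more involved algorithm.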
11.
Cheng KF. Statistics in Medicine 2009, 28(2):311-325
Family-based studies provide powerful inferences regarding associations between genetic variants and risks, but have limitations. Very often, the availability of parental genotypes poses a problem for a family-based design, especially when the disease of interest has a late age of onset. To improve the efficiency of such studies, a popular approach is to reconstruct the missing genotypes from the genotypes of the offspring and correct the biases resulting from the reconstruction. In this paper, the author shows that two or more unrelated family studies, for the same candidate marker but different diseases, can also be combined to construct a more efficient test for association analysis. The usual case-control study with parental genotypes is a special case of the data discussed here. The author used a simulation study to compare the performance of the new method with other well-known methods. The results showed that the new test has larger power when there is no effect of population stratification between the two study samples. However, if there is an effect of population stratification between the two samples, the new test still maintains the expected type I error rate and has comparable power. Since unrelated family studies not designed for the disease of interest are often readily accessible at minimal cost, the proposed method has practical value. The new approach can also be easily modified to allow for missing parental data.
12.
The case-control study has been and continues to be one of the most popular designs in epidemiology. More recently, this design has been adopted to test for candidate genes when searching for disease genetic etiology. In this report, we present a multipoint linkage disequilibrium (LD) mapping approach with the focus on estimating the location of the target trait locus. It builds upon a representation, which shows that the difference between a case and a control in probabilities of carrying the target allele of a marker is proportional to that of the trait locus and that the proportionality factor is simply a measure of LD between the trait locus and the marker. Our method has the desired properties that (1) there is no need to specify phases of genotypic data with multiple markers, (2) it provides an estimate of the location of the disease locus along with sampling uncertainty to help investigators narrow chromosomal regions, and (3) a single test statistic is provided to test for LD in the framed region rather than testing the hypothesis one marker at a time. Our simulation work suggests that the proposed method performs well in terms of bias and coverage probability. Extension of the proposed method to account for confounding and genetic heterogeneity is discussed. We apply the proposed method to a published case-control data set for cystic fibrosis.
13.
Rabinowitz D. Genetic Epidemiology 2003, 24(4):284-290
The focus of this work is the TDT-type and family-based test statistics used for adjusting for potential confounding due to population heterogeneity or misspecified allele frequencies. A variety of heuristics have been used to motivate and derive these statistics, and the statistics have been developed for a variety of analytic goals. There appears to be no general theoretical framework, however, that may be used to evaluate competing approaches. Furthermore, there is no framework to guide the development of efficient TDT-type and family-based methods for analytic goals for which methods have not yet been proposed. The purpose of this paper is to present a theoretical framework that serves both to identify the information which is available to methods that are immune to confounding due to population heterogeneity or misspecified allele frequencies, and to inform the construction of efficient unbiased tests in novel settings. The development relies on the existence of a characterization of the null hypothesis in terms of a completely specified conditional distribution of transmitted genotypes. An important observation is that, with such a characterization, when the conditioning event is unobserved or incomplete, there is statistical information that cannot be exploited by any exact conditional test. The main technical result of this work is an approach to computing test statistics for local alternatives that exploit all of the available statistical information.
14.
Laura E. Mitchell. Genetic Epidemiology 1995, 12(6):647-651
A sequential scheme for identifying genetic markers, in linkage disequilibrium with disease susceptibility loci, was utilized to evaluate potential associations between a rare oligogenic disease and genetic variation at 360 anonymous DNA markers. ©1995 Wiley-Liss, Inc.
15.
It is usually assumed that detection of a disease susceptibility gene via marker polymorphisms in linkage disequilibrium with it is facilitated by consideration of marker haplotypes. However, capture of the marker haplotype information requires resolution of gametic phase, and this must usually be inferred statistically. Recently, we questioned the value of the marker haplotype information, and suggested that certain analyses of multivariate marker data, not based on haplotypes explicitly and not requiring resolution of gametic phase, are often more powerful than analyses based on haplotypes. Here, we review this work and assess more carefully the situations in which our conclusions might apply. We also relate these analyses to alternative approaches to haplotype analysis, namely those based on haplotype similarity and those inspired by cladistics.
16.
We propose a new test of linkage in the presence of allelic association that uses all available information in a sample of nuclear families, including parental phenotypes, genotypes from both affected and unaffected siblings, and families with homozygous parents. The test is based on the conditional framework developed by Rabinowitz and Laird [2000: Hum Hered 50:211-223] and is thus immune to population stratification and can be applied to families with any pattern of missing information. The test statistic is a conditional likelihood ratio based on a standard two-point linkage model with allelic association, where parameters are estimated from the sample. Through a simulation study, we determined that the proposed test has near optimal power for a wide range of scenarios, outperforming FBAT both when data were complete and when parental genotypes were missing, although differences between the two tests diminish as the genetic effect is reduced. To assess robustness, we also evaluated the performance of the tests under scenarios with population stratification and found that although there is a loss of efficiency, our proposed test remains a strong competitor to FBAT.
17.
The transmission disequilibrium test (TDT) based on case-parents trios is a powerful tool in linkage analysis and association studies. When only one parent is available, the 1-TDT is applicable in the absence of imprinting. In the presence of imprinting, a statistic is proposed, based on case-mother pairs and case-father pairs to test for linkage when association is present as well as to test for association when linkage is present. The recombination fractions are allowed to be sex-specific in this test statistic. Meanwhile, a statistic based on case-parent pairs is proposed to test for imprinting. Both test statistics can be extended to include families with more than one affected offspring. A number of simulation studies are conducted to investigate the validity of the proposed tests. The effects of different ratios of the numbers of case-mother pairs and case-father pairs on the powers of the proposed tests are studied through simulation. The results show that the optimal ratio is 1:1. How to combine case-parents, case-mother pairs, and case-father pairs jointly in testing for linkage, association, and imprinting is addressed.
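For readers unfamiliar with the baseline statistic that this abstract builds on, the standard trio-based TDT for a diallelic marker is a McNemar-type statistic on the transmissions from heterozygous parents. The sketch below computes it together with a 1-df chi-square p-value; it illustrates only the classical TDT, not the imprinting-aware statistics proposed in the abstract, and the function name and counts are illustrative.

```python
import math


def tdt(transmitted, not_transmitted):
    """Classical TDT for one allele of a diallelic marker.

    transmitted      -- times heterozygous parents passed the allele to the
                        affected child
    not_transmitted  -- times heterozygous parents passed the other allele
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    b, c = transmitted, not_transmitted
    stat = (b - c) ** 2 / (b + c)
    # Survival function of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    print(tdt(60, 40))
```

Under the null hypothesis each heterozygous parent transmits either allele with probability 1/2, so a marked excess of one allele among transmissions is evidence of linkage in the presence of association.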
18.
Confounding due to population stratification (PS) arises when differences in both allele and disease frequencies exist in a population of mixed racial/ethnic subpopulations. Genomic control, structured association, principal components analysis (PCA), and multidimensional scaling (MDS) approaches have been proposed to address this bias using genetic markers. However, confounding due to PS can also be due to non‐genetic factors. Propensity scores are widely used to address confounding in observational studies but have not been adapted to deal with PS in genetic association studies. We propose a genomic propensity score (GPS) approach to correct for bias due to PS that considers both genetic and non‐genetic factors. We compare the GPS method with PCA and MDS using simulation studies. Our results show that GPS can adequately adjust and consistently correct for bias due to PS. Under no/mild, moderate, and severe PS, GPS yielded estimates with bias close to 0 (mean=-0.0044, standard error=0.0087). Under moderate or severe PS, the GPS method consistently outperforms the PCA method in terms of bias, coverage probability (CP), and type I error. Under moderate PS, the GPS method consistently outperforms the MDS method in terms of CP. PCA maintains relatively high power compared to both MDS and GPS methods under the simulated situations. GPS and MDS are comparable in terms of statistical properties such as bias, type I error, and power. The GPS method provides a novel and robust tool for obtaining less‐biased estimates of genetic associations that can consider both genetic and non‐genetic factors. Genet. Epidemiol. 33:679–690, 2009. © 2009 Wiley‐Liss, Inc.
19.
Zhang Y. Genetic Epidemiology 2012, 36(1):36-47
Most disease association mapping algorithms are based on hypothesis testing procedures that test one variant at a time. Those methods lose power when the disease mutations are jointly tagged by multiple variants, or when gene-gene interactions exist. Nearby variants are also correlated, so procedures ignoring the dependence between variants will inevitably produce redundant results. With a large number of variants genotyped in current genome-wide disease association studies, simultaneous multivariant association mapping algorithms are strongly desired. We present a novel Bayesian method for automatic detection of multivariant joint association in genome-wide case-control studies. Our method has improved power and specificity over existing tools. We fit a joint probabilistic model to the entire data and identify disease variants simultaneously. The method dynamically accounts for the strong linkage disequilibrium (LD) between variants. As a result, only the primary disease variants will be identified, with all secondary associations due to LD effects filtered out. Our method better pinpoints the disease variants with improved resolution. The method is also computationally efficient for genome-wide studies. When applied to a real data set of inflammatory bowel disease (IBD) containing 401,473 variants in 4,720 individuals, our method detected all previously reported IBD loci in the same data, and recovered two missed loci. We further detected two novel interchromosome interactions. The first is between STAT3 and PARD6G, and the second is between DLG5 and an intergenic region at 5p14. We further validated the two interactions in an independent study.
20.
Lately, many different methods of linkage, association or joint analysis for family data have been invented and refined. Common to most of those is that they require a map of markers that are in linkage equilibrium. However, at the present day, high-density single nucleotide polymorphism (SNP) maps are both less expensive to create and have lower genotyping error. When marker data are incomplete, the crucial and computationally most demanding step in the analysis is to calculate the inheritance distribution at a certain position on the chromosome. Recently, different ways of adjusting traditional methods of linkage analysis to denser maps of SNPs in linkage disequilibrium (LD) have been proposed. We describe a hidden Markov model which generalizes the Lander-Green algorithm. It combines a Markov chain for inheritance vectors with a Markov chain modelling founder haplotypes and in this way accounts for LD between SNPs. It can be applied to association, linkage or combined association and linkage analysis, general phenotypes and arbitrary score functions. We also define a joint likelihood for linkage and association that extends an idea of Kong and Cox ([1997] Am. J. Hum. Genet. 61:1179-1188) for pure linkage analysis.