Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Usual tests of association using tag single nucleotide polymorphisms (SNPs) assume that the alleles of the causal locus act additively and that these alleles are then predicted indirectly via a set of tag SNPs. In the presence of strong dominance effects this model is not correct and an extra term needs to be included, which uses the tag SNPs to predict the heterozygosity of the causal locus. Assuming this scenario of a strong dominance effect, we present an appropriate test statistic and investigate how much power, if any, we gain by adding this single degree of freedom for dominance.
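The difference between the additive term and the extra dominance term can be illustrated with a minimal sketch of the genotype coding. It assumes genotypes are given directly as minor-allele counts (0, 1 or 2) at the locus; the function name `encode_genotypes` is hypothetical, and the sketch does not reproduce the paper's indirect, tag-SNP-based prediction or its test statistic.

```python
def encode_genotypes(minor_allele_counts):
    """Split genotypes into an additive and a dominance (heterozygosity) covariate.

    Assumption: each genotype is the count of minor alleles (0, 1 or 2).
    The additive covariate is the allele count itself; the dominance
    covariate is 1 for heterozygotes and 0 otherwise.
    """
    additive = []
    dominance = []
    for g in minor_allele_counts:
        if g not in (0, 1, 2):
            raise ValueError(f"genotype must be 0, 1 or 2, got {g!r}")
        additive.append(g)
        dominance.append(1 if g == 1 else 0)
    return additive, dominance


additive, dominance = encode_genotypes([0, 1, 2, 1])
print(additive)   # [0, 1, 2, 1]
print(dominance)  # [0, 1, 0, 1]
```

A purely additive model would use only the first covariate; the second adds the single extra degree of freedom for dominance that the abstract refers to.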

2.
With reductions in genotyping costs and the fast pace of improvements in genotyping technology, it is not uncommon for the individuals in a single study to undergo genotyping using several different platforms, where each platform may contain different numbers of markers selected via different criteria. For example, a set of cases and controls may be genotyped at markers in a small set of carefully selected candidate genes, and shortly thereafter, the same cases and controls may be used for a genome-wide single nucleotide polymorphism (SNP) association study. After such initial investigations, often, a subset of "interesting" markers is selected for validation or replication. Specifically, by validation, we refer to the investigation of associations between the selected subset of markers and the disease in independent data. However, it is not obvious how to choose the best set of markers for this validation. There may be a prior expectation that some sets of genotyping data are more likely to contain real associations. For example, it may be more likely for markers in plausible candidate genes to show disease associations than markers in a genome-wide scan. Hence, it would be desirable to select proportionally more markers from the candidate gene set. When a fixed number of markers are selected for validation, we propose an approach for identifying an optimal marker-selection configuration by basing the approach on minimizing the stratified false discovery rate. We illustrate this approach using a case-control study of colorectal cancer from Ontario, Canada, and we show that this approach leads to substantial reductions in the estimated false discovery rates in the Ontario dataset for the selected markers, as well as reductions in the expected false discovery rates for the proposed validation dataset.

3.
The much-anticipated fixed-array, genome-wide SNP genotyping technologies make large-scale genome-wide association scans now possible for large numbers of subjects. In this paper we reconsider the problem (Satagopan and Elston [2003] Genet Epidemiol 25:149-157) of optimizing a two-stage genotyping design to deal with important new issues that are relevant when studies are expanded from candidate gene size to a genome-wide scale. We investigate how the basic two-stage genotyping approach, in which all markers are genotyped in an initial group of subjects (stage I) and only the promising markers are genotyped in additional subjects (stage II), can be used to reduce genotyping cost in a genome-wide case-control association study even after allowing for much higher per genotype costs using specially designed assays in stage II, compared to the fixed array of SNPs used in stage I. In addition, we consider the problem of using measured SNPs to make (imperfect) prediction of unmeasured SNPs for association tests of all SNPs (measured or unmeasured) genome wide and the utility of expanding genotyping densities in stage II in the regions where significant associations were detected in stage I. Under a set of reasonable but conservative assumptions, we derive optimal two-stage design configurations (sample sizes and the thresholds of significance in both stages) with these optimal designs depending both on the total number of markers tested and upon the ratios of cost in stage II versus stage I. In addition we show how existing software for power and sample size calculations can be used for the purpose of designing two-stage studies, for a wide range of assumptions about the number of markers genotyped and the costs of genotyping in each stage of the study.

4.
We describe a hierarchical regression modeling approach to selection of a subset of markers from the first stage of a genomewide association scan to carry forward to subsequent stages for testing on an independent set of subjects. Rather than simply selecting a subset of most significant marker-disease associations at some cutoff chosen to maximize the cost efficiency of a multistage design, we propose a prior model for the true noncentrality parameters of these associations composed of a large mass at zero and a continuous distribution of nonzero values. The prior probability of nonzero values and their prior means can be functions of various covariates characterizing each marker, such as their location relative to genes or evolutionary conserved regions, or prior linkage or association data. We propose to take the top ranked posterior expectations of the noncentrality parameters for confirmation in later stages of a genomewide scan. The statistical performance of this approach is compared with the traditional p-value ranking by simulation studies. We show that the ranking by posterior expectations performs better at selecting the true positive association than a simple ranking of p-values if at least some of the prior covariates have predictive value.

5.
Genetic association studies have been less successful than expected in detecting causal genetic variants, with frequent non-replication when such variants are claimed. Numerous possible reasons have been postulated, including inadequate sample size and possible unobserved stratification. Another possibility, and the focus of this paper, is that of epistasis, or gene-gene interaction. Although we are unlikely to glean information about disease mechanism from the data alone, it may be possible to increase our power to detect an effect by allowing for epistasis within our test statistic. This paper derives an appropriate "omnibus" test for detecting causal loci whilst allowing for numerous possible interactions and compares the power of such a test with that of the usual main effects test. This approach differs from that commonly used, for example by Marchini et al. [2005], in that it tests simultaneously for main effects and interactions, rather than interactions alone. The alternative hypothesis being tested by the "omnibus" test is whether a particular locus of interest has an effect on disease status, either marginally or epistatically, and the test is therefore directly comparable to the main effects test at that locus. The paper begins by considering the direct case, in which the putative causal variants are observed, and then extends these ideas to the indirect case in which the causal variants are unobserved and we have a set of tag single nucleotide polymorphisms (tag SNPs) representing the regions of interest. In passing, the derivation of the indirect omnibus test statistic leads to a novel "indirect case-only test for interaction".

6.
Optimal designs for two-stage genome-wide association studies   Total citations: 3 (self-citations: 0, citations by others: 3)
Genome-wide association (GWA) studies require genotyping hundreds of thousands of markers on thousands of subjects, and are expensive at current genotyping costs. To conserve resources, many GWA studies are adopting a staged design in which a proportion of the available samples are genotyped on all markers in stage 1, and a proportion of these markers are genotyped on the remaining samples in stage 2. We describe a strategy for designing cost-effective two-stage GWA studies. Our strategy preserves much of the power of the corresponding one-stage design and minimizes the genotyping cost of the study while allowing for differences in per genotyping cost between stages 1 and 2. We show that the ratio of stage 2 to stage 1 per genotype cost can strongly influence both the optimal design and the genotyping cost of the study. Increasing the stage 2 per genotype cost shifts more of the genotyping and study cost to stage 1, and increases the cost of the study. This higher cost can be partially mitigated by adopting a design with reduced power while preserving the false positive rate or by increasing the false positive rate while preserving power. For example, reducing the power preserved in the two-stage design from 99 to 95% that of the one-stage design decreases the two-stage study cost by approximately 15%. Alternatively, the same cost savings can be had by relaxing the false positive rate by 2.5-fold, for example from 1/300,000 to 2.5/300,000, while retaining the same power.
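The cost structure of the staged design described above can be sketched with simple bookkeeping. This is an illustrative calculation, not the authors' optimization procedure: the function name `two_stage_relative_cost` and its parameters are hypothetical, and the sketch assumes that cost is purely per genotype, that one-stage cost uses the stage 1 per genotype price, and that power is ignored.

```python
def two_stage_relative_cost(sample_fraction, marker_fraction, cost_ratio):
    """Genotyping cost of a two-stage design relative to a one-stage design.

    Assumptions (illustrative only):
      sample_fraction -- proportion of samples genotyped on all markers in stage 1
      marker_fraction -- proportion of markers carried forward to stage 2
      cost_ratio      -- stage 2 per genotype cost divided by stage 1 per genotype cost
    The one-stage design genotypes every sample on every marker at the
    stage 1 per genotype cost, so its relative cost is 1.0.
    """
    for name, value in (("sample_fraction", sample_fraction),
                        ("marker_fraction", marker_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    stage1 = sample_fraction
    stage2 = (1.0 - sample_fraction) * marker_fraction * cost_ratio
    return stage1 + stage2


# Same design evaluated at equal and at tenfold stage 2 per genotype cost.
print(two_stage_relative_cost(0.3, 0.01, 1.0))
print(two_stage_relative_cost(0.3, 0.01, 10.0))
```

Under these assumptions, raising the stage 2 to stage 1 cost ratio increases only the stage 2 term, which is consistent with the abstract's observation that a higher ratio raises the study cost for a given design.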

7.
Current common wisdom posits that association analyses using family-based designs have inflated type 1 error rates (if relationships are ignored) and that independent controls are more powerful than familial controls. We explore these suppositions. We show theoretically that family-based designs can have deflated type 1 error rates. Through simulation, we examine the validity and power of family designs for several scenarios: cases from randomly or selectively ascertained pedigrees, and familial or independent controls. The family structures considered are sibships, nuclear families, moderate-sized pedigrees and extended pedigrees. Three methods based on the χ² test for trend were considered: variance correction (VC), weighted (weights assigned to account for genetic similarity), and naïve (ignoring relatedness), as well as the Modified Quasi-likelihood Score (MQLS) test. Selectively ascertained pedigrees had similar levels of disease enrichment; random ascertainment had no such restriction. Data for 1,000 cases and 1,000 controls were created under the null and alternate models. The VC and MQLS methods were always valid. The naïve method was anti-conservative if independent controls were used and valid or conservative in designs with familial controls. The weighted association method was generally valid for independent controls, and was conservative for familial controls. With regard to power, independent controls were more powerful for small-to-moderate selectively ascertained pedigrees, but familial and independent controls were equivalent in the extended pedigrees and familial controls were consistently more powerful for all randomly ascertained pedigrees. These results suggest a more complex situation than previously assumed, which has important implications for study design and analysis. Genet. Epidemiol. 35:174-181, 2011. © 2011 Wiley-Liss, Inc.
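For orientation, the naïve form of the χ² test for trend mentioned above (the Cochran-Armitage trend test, which treats all subjects as unrelated) can be sketched as follows. The function name `trend_chi_square` and the default genotype scores 0, 1, 2 are assumptions for illustration; the variance-corrected, weighted and MQLS variants studied in the paper are not reproduced here.

```python
def trend_chi_square(case_counts, control_counts, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic (1 degree of freedom), naive version.

    case_counts / control_counts -- numbers of cases and controls carrying
    0, 1 and 2 copies of the tested allele. Relatedness between subjects is
    ignored, which is exactly the assumption the abstract calls "naive".
    """
    if not (len(case_counts) == len(control_counts) == len(scores)):
        raise ValueError("counts and scores must have the same length")
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    n = n_cases + n_controls
    sum_t_r = sum(t * r for t, r in zip(scores, case_counts))
    sum_t_n = sum(t * m for t, m in zip(scores, totals))
    sum_t2_n = sum(t * t * m for t, m in zip(scores, totals))
    numerator = n * (n * sum_t_r - n_cases * sum_t_n) ** 2
    denominator = n_cases * n_controls * (n * sum_t2_n - sum_t_n ** 2)
    if denominator == 0:
        raise ValueError("statistic undefined: no variation in genotype or status")
    return numerator / denominator


# Hypothetical counts: cases enriched for the tested allele.
print(trend_chi_square([10, 20, 30], [30, 20, 10]))
```

The statistic is compared against a χ² distribution with one degree of freedom; with related subjects its variance is misspecified, which is the motivation for the corrected methods compared in the abstract.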

8.
Emerging data suggest that the genetic regulation of the biological response to inflammatory stress may be fundamentally different from the genetic underpinning of the homeostatic control (resting state) of the same biological measures. In this paper, we interrogate this hypothesis using a single-SNP score test and a novel class-level testing strategy to characterize protein-coding gene and regulatory element-level associations with longitudinal biomarker trajectories in response to stimulus. Using the proposed class-level association score statistic for longitudinal data, which accounts for correlations induced by linkage disequilibrium, the genetic underpinnings of evoked dynamic changes in repeatedly measured biomarkers are investigated. The proposed method is applied to data on two biomarkers arising from the Genetics of Evoked Responses to Niacin and Endotoxemia study, a National Institutes of Health-sponsored investigation of the genomics of inflammatory and metabolic responses during low-grade endotoxemia. Our results suggest that the genetic basis of evoked inflammatory response is different from the genetic contributors to resting state, and several potentially novel loci are identified. A simulation study demonstrates appropriate control of type 1 error rates, relative computational efficiency, and power. Copyright © 2017 John Wiley & Sons, Ltd.

9.
Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) associated with complex traits. However, the genetic heritability of most of these traits remains unexplained. To help guide future studies, we address the crucial question of whether future GWAS can detect new SNP associations and explain additional heritability given the new availability of larger GWAS SNP arrays, imputation, and reduced genotyping costs. We first describe the pairwise and imputation coverage of all SNPs in the human genome by commercially available GWAS SNP arrays, using the 1000 Genomes Project as a reference. Next, we describe the findings from 6 years of GWAS of 172 chronic diseases, calculating the power to detect each of them while taking array coverage and sample size into account. We then calculate the power to detect these SNP associations under different conditions using improved coverage and/or sample sizes. Finally, we estimate the percentages of SNP associations and heritability previously detected and detectable by future GWAS under each condition. Overall, we estimated that previous GWAS have detected less than one-fifth of all GWAS-detectable SNPs underlying chronic disease. Furthermore, increasing sample size has a much larger impact than increasing coverage on the potential of future GWAS to detect additional SNP-disease associations and heritability.

10.
Association studies depend on linkage disequilibrium (LD) between a causative mutation and linked marker loci. Selecting markers that give the best chance of showing useful levels of LD with the causative mutation will increase the chances of successfully detecting an association. This report examines the variation in the extent of LD between a disease locus and one or two diallelic marker loci (termed single nucleotide polymorphisms or SNPs). We use a simulation method based on the neutral coalescent in a population of variable size to find the distribution of LD as a function of allele frequencies, the recombination rate, and the population history. Given that LD exists, the allele frequencies determine if a site will be useful for detecting an association with the disease mutation. We show that there is extensive variation in LD even for closely linked loci, implying that several markers may be needed to detect a disease locus. The distribution of LD between common variants is strongly influenced by ancestral population size. We show that in general, best results will be obtained if the frequencies of marker alleles are at least as large as the frequency of the causative mutation. Haplotypes of two or more SNPs generally have a higher probability than individual SNPs of showing useful LD with a disease mutation, although exceptions are described.
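The dependence of "useful" LD on allele frequencies can be made concrete with the standard pairwise LD measures D, D' and r². The sketch below assumes two diallelic loci with known haplotype and allele frequencies; the function name `ld_measures` is hypothetical, and the abstract does not state which LD measure the report itself uses.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two diallelic loci.

    Assumptions (illustrative): p_ab is the frequency of the haplotype
    carrying allele A at the first locus and allele B at the second;
    p_a and p_b are the corresponding allele frequencies, strictly
    between 0 and 1.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b
    if d > 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = 0.0 if d == 0 else abs(d) / d_max
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r_squared


# A rare variant (5%) that always sits on the background of a common
# marker allele (50%): D' is complete, yet r^2 stays small.
print(ld_measures(0.05, 0.05, 0.5))
```

In the usage example, D' equals 1 while r² is only about 0.05, a simple instance of how mismatched allele frequencies limit the usefulness of a marker even when LD is complete.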

11.
Tag SNP selection for association studies   Total citations: 6 (self-citations: 0, citations by others: 6)
This report describes current methods for selection of informative single nucleotide polymorphisms (SNPs) using data from a dense network of SNPs that have been genotyped in a relatively small panel of subjects. We discuss the following issues: (1) Optimal selection of SNPs based upon maximizing either the predictability of unmeasured SNPs or the predictability of SNP haplotypes as selection criteria. (2) The dependence of the performance of tag SNP selection methods upon the density of SNP markers genotyped for the purpose of haplotype discovery and tag SNP selection. (3) The likely power of case-control studies to detect the influence upon disease risk of common disease-causing variants in candidate genes in a haplotype-based analysis. We propose a quasi-empirical approach towards evaluating the power of large studies with this calculation based upon the SNP genotype and haplotype frequencies estimated in a haplotype discovery panel. In this calculation, each common SNP in turn is treated as a potential unmeasured causal variant and subjected to a correlation analysis using the remaining SNPs. We use a small portion of the HapMap ENCODE data (488 common SNPs genotyped over approximately a 500 kb region of chromosome 2) as an illustrative example of this approach towards power evaluation.

12.
Genetic association studies are becoming commonplace due to the availability of cost-effective yet sophisticated DNA sequencing and genotyping resources and technologies. In addition, technologies designed to identify molecular and subclinical phenotypes that reflect disease pathogenesis are continually being developed and refined (consider, e.g., imaging technologies, microarray-based gene expression and proteomic platforms, histological analyses of excised tissues, etc.). Unfortunately, the large-scale use of many of these molecular and subclinical phenotyping technologies in genetic association studies is difficult logistically and is currently cost-prohibitive. In this paper, we consider efficient designs for testing the association between particular genetic variations and expensive, yet appropriate, subclinical phenotypes of relevance to a disease that take advantage of twins or sibling pairs discordant for genotypes at the locus (or loci) being tested. We demonstrate that including genotypically discordant twins or siblings in an association study can result in a substantial increase in power over designs that use monozygotic twins or only unrelated individuals. We ultimately argue that, from a practical standpoint, sampling from existing family or twin-based cohorts in which: (1) follow-up studies of a genetic association are warranted in order to assess the in vivo significance of an association with respect to more refined pathological phenotypes; and/or (2) large-scale, genome-wide linkage and association studies have been pursued that have focused on clinical endpoints for which the study subjects have consented to more elaborate follow-up studies, is a powerful way to test associations.

13.
A new multimarker test for family-based association studies   Total citations: 1 (self-citations: 0, citations by others: 1)

14.
Whole-exome sequencing (WES) and whole-genome sequencing (WGS) studies are underway to investigate the impact of genetic variants on complex diseases and traits. It is customary to perform single-variant association tests for common variants and region-based association tests for rare variants. The latter may target variants with similar or opposite effects, interrogate variants with different frequencies or different functional annotations, and examine a variety of regions. The large number of tests that are performed necessitates adjustment for multiple testing. The conventional Bonferroni correction is overly conservative as the test statistics are correlated. To address this challenge, we propose a simple and accurate method based on parametric bootstrap to assess genomewide significance. We show that the correlations of the test statistics are determined primarily by the genotypes, such that the same significance threshold can be used in different studies that share a common sequencing platform. We demonstrate the usefulness of the proposed method with WES data from the National Heart, Lung, and Blood Institute Exome Sequencing Project and WGS data from the 1000 Genomes Project. We recommend the p value of as the genomewide significance threshold for testing all common and low-frequency variants (MAFs 0.1%) in the human genome.

15.
By systematic examination of common tag single-nucleotide polymorphisms (SNPs) across the genome, the genome-wide association study (GWAS) has proven to be a successful approach to identify genetic variants that are associated with complex diseases and traits. Although the per base pair cost of sequencing has dropped dramatically with the advent of the next-generation technologies, it may still only be feasible to obtain DNA sequence data for a portion of available study subjects due to financial constraints. Two-phase sampling designs have been used frequently in large-scale surveys and epidemiological studies where certain variables are too costly to be measured on all subjects. We consider two-phase stratified sampling designs for genetic association, in which tag SNPs for candidate genes or regions are genotyped on all subjects in phase 1, and a proportion of subjects are selected into phase 2 based on genotypes at one or more tag SNPs. Deep sequencing in the region is then applied to genotype phase 2 subjects at sequence SNPs. We investigate alternative sampling designs for selection of phase 2 subjects within strata defined by tag SNP genotypes and develop methods of inference for sequence SNP variant associations using data from both phases. In comparison to methods that use data from phase 2 alone, the combined analysis improves efficiency.

16.
One of the main roles of omics-based association studies with high-throughput technologies is to screen out relevant molecular features, such as genetic variants, genes, and proteins, from a large pool of such candidate features based on their associations with the phenotype of interest. Typically, screened features are subject to validation studies using more established or conventional assays, where the number of evaluable features is relatively limited, so that there may exist a fixed number of features measurable by these assays. Such a limitation necessitates narrowing a feature set down to a fixed size, following an initial screening analysis via multiple testing where adjustment for multiplicity is made. We propose a two-stage screening approach to control the false discovery rate (FDR) for a feature set with fixed size that is subject to validation studies, rather than for a feature set from the initial screening analysis. Out of the feature set selected in the first stage with a relaxed FDR level, a fraction of the most statistically significant features is selected first. From the remaining feature set, features are selected based on biological consideration only, without regard to any statistical information, which allows evaluating the FDR level for the finally selected feature set with fixed size. Improvement of the power is discussed in the proposed two-stage screening approach. Simulation experiments based on parametric models and real microarray datasets demonstrated a substantial increase in the number of screened features for biological consideration compared with the standard screening approach, allowing for more extensive and in-depth biological investigations in omics association studies.
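As background for the FDR-controlled screening step, the widely used Benjamini-Hochberg step-up procedure can be sketched as follows. This is an assumption about what a standard first-stage screen might look like, not the authors' two-stage method, which the abstract does not specify in detail; the function name `benjamini_hochberg` is hypothetical and the p-values in the usage example are made up.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of features selected by the Benjamini-Hochberg procedure.

    Features are ranked by p-value; the largest rank k with
    p_(k) <= q * k / m determines the cutoff, and all features with
    rank <= k are selected (m is the number of tests, q the FDR level).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for ten features.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))  # [0, 1]
```

Relaxing `q` in such a screen enlarges the first-stage feature set, which is the pool from which the abstract's approach then draws a fixed-size set for validation.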

17.
Increasing evidence indicates that polymorphisms in genes relevant to spermatogenesis might modulate the efficiency of reproduction in men. Ring finger protein 8 (RNF8) and bromodomain testis-specific (BRDT) are two candidate genes associated with spermatogenesis. Here, we considered potential associations of 14 single nucleotide polymorphisms (SNPs) in RNF8 and BRDT genes in Chinese patients with non-obstructive azoospermia (NOA). We analyzed 361 men with NOA and 368 fertile controls by using Sequenom iPLEX technology. Our data did not reveal any variants associated with NOA susceptibility. However, we observed that rs104669 and rs195432 of RNF8 were in strong linkage disequilibrium. Haplotype analysis of the two SNPs indicated that the haplotype AC reduced the risk of NOA and the haplotype TC significantly elevated the risk of NOA. Moreover, the RNF8 variants rs195432 (C/A, p = 0.030), rs195434 (T/C, p = 0.025), and rs2284922 (T/C, p = 0.034) were correlated with smaller testis volume.

18.
Although the impacts of macronutrients and the circadian clock on obesity have been reported, the interactions between macronutrient distribution and circadian genes are unclear. The aim of this study was to explore macronutrient intake patterns in the Korean population and associations between the patterns and circadian gene variants and obesity. After applying the criteria, 5343 subjects (51.6% male, mean age 49.4 ± 7.3 years) from the Korean Genome and Epidemiology Study data and nine variants in seven circadian genes were analyzed. We defined macronutrient intake patterns by tertiles of the fat to carbohydrate ratio (FC). The very low FC (VLFC) was associated with a higher risk of obesity than the optimal FC (OFC). After stratification by the genotypes of nine variants, the obesity risk according to the patterns differed by the variants. In the female VLFC, the major homozygous allele of CLOCK rs11932595 and CRY1 rs3741892 had a higher abdominal obesity risk than those in the OFC. The GG genotype of PER2 rs2304672 in the VLFC showed greater risks for obesity and abdominal obesity. In conclusion, these findings suggest that macronutrient intake patterns were associated with obesity susceptibility, and the associations were different depending on the circadian clock genotypes of the CLOCK, PER2, and CRY1 loci.

19.
We propose optimized two-stage designs for genome-wide case-control association studies, using a hypothesis testing paradigm. To save genotyping costs, the complete marker set is genotyped in a sub-sample only (stage I). In stage II, the most promising markers are then genotyped in the remaining sub-sample. In recent publications, two-stage designs were proposed which minimize the overall genotyping costs. To achieve full design optimization, we additionally include sampling costs in both the cost function and the design optimization. The resulting optimal designs differ markedly from those optimized for genotyping costs only (partially optimized designs), and achieve considerable further cost reductions. Compared with partially optimized designs, fully optimized two-stage designs have a higher first-stage sample proportion. Furthermore, the increment of the sample size over the one-stage design, which is necessary in two-stage designs in order to compensate for the loss of power due to partial genotyping, is less pronounced for fully optimized two-stage designs. In addition, we address the scenario where the investigator wishes to gain as much information as possible but is restricted by a budget. For this scenario, we develop two-stage designs that maximize the power under a given cost constraint.

20.
In this article, we develop a powerful test for identifying single nucleotide polymorphism (SNP)-sets that are predictive of survival with data from genome-wide association studies. We first group typed SNPs into SNP-sets based on genomic features and then apply a score test to assess the overall effect of each SNP-set on the survival outcome through a kernel machine Cox regression framework. This approach uses genetic information from all SNPs in the SNP-set simultaneously and accounts for linkage disequilibrium (LD), leading to a powerful test with reduced degrees of freedom when the typed SNPs are in LD with each other. This type of test also has the advantage of capturing the potentially nonlinear effects of the SNPs, SNP-SNP interactions (epistasis), and the joint effects of multiple causal variants. By simulating SNP data based on the LD structure of real genes from the HapMap project, we demonstrate that our proposed test is more powerful than the standard single SNP minimum P-value-based test for association studies with censored survival outcomes. We illustrate the proposed test with a real data application.
