Similar Articles
 20 similar articles found (search time: 46 ms)
1.
The detection of rare deleterious variants is the preeminent current technical challenge in statistical genetics. Sorting the deleterious from neutral variants at a disease locus is challenging because of the sparseness of the evidence for each individual variant. Hierarchical modeling and Bayesian model uncertainty are two techniques that have been shown to be promising in pinpointing individual rare variants that may be driving the association. Interpreting the results from these techniques from the perspective of multiple testing is a challenge, and the goal of this article is to better understand their false discovery properties. Using simulations, we conclude that accurate false discovery control cannot be achieved in this framework unless the magnitude of the variants' risk is large and the hierarchical characteristics have high accuracy in distinguishing deleterious from neutral variants.

2.
The genetic basis of multiple phenotypes such as gene expression, metabolite levels, or imaging features is often investigated by testing a large collection of hypotheses, probing the existence of association between each of the traits and hundreds of thousands of genotyped variants. Appropriate multiplicity adjustment is crucial to guarantee replicability of findings, and the false discovery rate (FDR) is frequently adopted as a measure of global error. In the interest of interpretability, results are often summarized so that reporting focuses on variants discovered to be associated to some phenotypes. We show that applying FDR‐controlling procedures on the entire collection of hypotheses fails to control the rate of false discovery of associated variants as well as the expected value of the average proportion of false discovery of phenotypes influenced by such variants. We propose a simple hierarchical testing procedure that allows control of both these error rates and provides a more reliable basis for the identification of variants with functional effects. We demonstrate the utility of this approach through simulation studies comparing various error rates and measures of power for genetic association studies of multiple traits. Finally, we apply the proposed method to identify genetic variants that impact flowering phenotypes in Arabidopsis thaliana, expanding the set of discoveries.
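To make the two-stage logic concrete, here is a minimal sketch of a hierarchical procedure in the spirit of the one described: phenotype-level p-values are combined per variant (Simes), variants are selected by BH, and phenotype-level testing within each selected variant runs BH at a level shrunk by the selection proportion, as in Benjamini–Bogomolov-type procedures. The function names and the q·R/m adjustment are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch of a two-stage hierarchical FDR procedure (illustrative only).
import numpy as np

def simes(pvals):
    """Simes combination: min over k of m * p_(k) / k."""
    p = np.sort(np.asarray(pvals))
    m = len(p)
    return np.min(m * p / np.arange(1, m + 1))

def bh_reject(pvals, q):
    """Benjamini-Hochberg: boolean mask of rejected hypotheses."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

def hierarchical_fdr(pmat, q=0.05):
    """pmat: variants x phenotypes p-value matrix.
    Stage 1: BH across variant-level Simes p-values.
    Stage 2: within each selected variant, BH at level q * R / m,
    the Benjamini-Bogomolov-style adjustment for selection."""
    m = pmat.shape[0]
    variant_p = np.array([simes(row) for row in pmat])
    selected = bh_reject(variant_p, q)
    R = selected.sum()
    return {i: np.nonzero(bh_reject(pmat[i], q * R / m))[0]
            for i in np.nonzero(selected)[0]}

rng = np.random.default_rng(0)
pmat = rng.uniform(size=(1000, 20))           # mostly null variants
pmat[:5, :8] = rng.uniform(0, 1e-4, (5, 8))   # a few true signals
print(hierarchical_fdr(pmat))
```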

3.
Current analysis of event‐related potentials (ERP) data is usually based on the a priori selection of channels and time windows of interest for studying the differences between experimental conditions in the spatio‐temporal domain. In this work we put forward a new strategy designed for situations where there is no a priori information about 'when' and 'where' these differences appear in the spatio‐temporal domain; this requires simultaneously testing numerous hypotheses, which increases the risk of false positives. This issue is known as the problem of multiple comparisons and has been managed with methods such as the permutation test and procedures that control the false discovery rate (FDR). Although the former has been applied previously, to our knowledge, FDR methods have not been introduced into ERP data analysis. Here we compare the performance (on simulated and real data) of the permutation test and two FDR methods (Benjamini and Hochberg (BH) and local‐fdr, by Efron). All these methods are valid for dealing with the problem of multiple comparisons in ERP analysis, avoiding the ad hoc selection of channels and/or time windows. FDR methods are a good alternative to the common and computationally more expensive permutation test. The BH method for independent tests gave the best overall performance regarding the balance between type I and type II errors. The local‐fdr method is preferable for high‐dimensional (multichannel) problems where most of the tests conform to the empirical null hypothesis. Differences among the methods according to assumptions, null distributions and dimensionality of the problem are also discussed.
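For readers new to FDR control in this setting, the sketch below applies BH across an entire channel × time grid of p-values, with no a priori channel or window selection. The simulated cluster, array sizes, and use of a one-sample t-test are placeholder assumptions.

```python
# BH-FDR over a channel x time grid of ERP p-values (simulated data).
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
n_subjects, n_channels, n_times = 20, 32, 200

# Condition difference: zero everywhere except one spatio-temporal
# cluster (channels 5-9, samples 80-120).
diff = rng.normal(0, 1, (n_subjects, n_channels, n_times))
diff[:, 5:10, 80:120] += 1.0

# One-sample t-test against zero at every (channel, time) point.
t, p = stats.ttest_1samp(diff, 0.0, axis=0)

# BH across all channel x time tests at FDR 5%.
reject, _, _, _ = multipletests(p.ravel(), alpha=0.05, method="fdr_bh")
reject = reject.reshape(n_channels, n_times)
print("significant points:", reject.sum(), "of", reject.size)
```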

4.
Tong T, Zhao H. Statistics in Medicine 2008;27(11):1960-1972
One major goal in microarray studies is to identify genes having different expression levels across different classes/conditions. In order to achieve this goal, a study needs to have an adequate sample size to ensure the desired power. Owing to the importance of this topic, a number of approaches to sample size calculation have been developed. However, due to the cost and/or experimental difficulties in obtaining sufficient biological materials, it might be difficult to attain the required sample size. In this article, we address more practical questions for assessing power and false discovery rate (FDR) for a fixed sample size. The relationships between power, sample size and FDR are explored. We also conduct simulations and a real data study to evaluate the proposed findings.
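One standard way to see the power–sample size–FDR relationship (under independence, with null proportion π0, per-test level α, and average power across non-null genes) is the two-group identity FDR ≈ π0·α / (π0·α + (1 − π0)·power). The illustration below uses a two-sample z-approximation for power; it is a back-of-the-envelope computation, not the paper's exact derivation.

```python
# Expected FDR as a function of per-group sample size (illustrative).
import numpy as np
from scipy import stats

def power_two_sample(n_per_group, effect_size, alpha):
    """Approximate power of a two-sided two-sample z-test."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    ncp = effect_size * np.sqrt(n_per_group / 2)
    return stats.norm.sf(z_alpha - ncp) + stats.norm.cdf(-z_alpha - ncp)

pi0, effect, alpha = 0.95, 1.0, 0.001
for n in (5, 10, 20, 40):
    pw = power_two_sample(n, effect, alpha)
    fdr = pi0 * alpha / (pi0 * alpha + (1 - pi0) * pw)
    print(f"n={n:3d}  power={pw:.3f}  expected FDR={fdr:.3f}")
```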

5.
Power and sample size for DNA microarray studies
A microarray study aims at having a high probability of declaring genes to be differentially expressed if they are truly differentially expressed, while keeping the probability of making false declarations of expression acceptably low. Thus, in formal terms, well-designed microarray studies will have high power while controlling type I error risk. Achieving this objective is the purpose of this paper. Here, we discuss conceptual issues and present computational methods for statistical power and sample size in microarray studies, taking account of the multiple testing that is generic to these studies. The discussion encompasses choices of experimental design and replication for a study. Practical examples are used to demonstrate the methods. The examples show forcefully that replication of a microarray experiment can yield large increases in statistical power. The paper refers to cDNA arrays in the discussion and illustrations but the proposed methodology is equally applicable to expression data from oligonucleotide arrays.
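A minimal sketch of this kind of calculation: pick a stringent per-gene significance level to account for the multiplicity (a Bonferroni-style adjustment is used here as a stand-in for the paper's methods), then solve for the replicates per group needed to reach the target power. All numbers are illustrative.

```python
# Per-group sample size at a multiplicity-adjusted alpha (illustrative).
from statsmodels.stats.power import TTestIndPower

n_genes = 10_000
family_alpha = 0.05
per_test_alpha = family_alpha / n_genes   # Bonferroni-style adjustment

solver = TTestIndPower()
for effect_size in (1.0, 1.5, 2.0):       # standardized log fold change
    n = solver.solve_power(effect_size=effect_size,
                           alpha=per_test_alpha, power=0.9)
    print(f"effect={effect_size}: ~{n:.1f} arrays per group")
```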

6.
The original definitions of false discovery rate (FDR) and false non-discovery rate (FNR) can be understood as the frequentist risks of false rejections and false non-rejections, respectively, conditional on the unknown parameter, while the Bayesian posterior FDR and posterior FNR are conditioned on the data. From a Bayesian point of view, it seems natural to take into account the uncertainties in both the parameter and the data. In this spirit, we propose averaging out the frequentist risks of false rejections and false non-rejections with respect to some prior distribution of the parameters to obtain the average FDR (AFDR) and average FNR (AFNR), respectively. A linear combination of the AFDR and AFNR, called the average Bayes error rate (ABER), is considered as an overall risk. Some useful formulas for the AFDR, AFNR and ABER are developed for normal samples with hierarchical mixture priors. The idea of finding threshold values by minimizing the ABER or controlling the AFDR is illustrated using a gene expression data set. Simulation studies show that the proposed approaches are more powerful and robust than the widely used FDR method.
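The averaging idea can be approximated by Monte Carlo: draw parameters from a two-group normal mixture prior, draw data, apply a fixed rejection threshold, and average the frequentist false rejection / non-rejection proportions over prior draws. The prior, the threshold, and the equal (1, 1) weighting in the ABER below are all illustrative assumptions, not the paper's formulas.

```python
# Monte Carlo estimate of AFDR / AFNR / ABER under a mixture prior.
import numpy as np

rng = np.random.default_rng(2)
m, reps = 2000, 200
pi0, tau = 0.9, 3.0          # null proportion, alternative sd
threshold = 2.5              # reject when |X| > threshold

fdp, fnp = [], []
for _ in range(reps):
    is_null = rng.uniform(size=m) < pi0
    theta = np.where(is_null, 0.0, rng.normal(0, tau, m))
    x = rng.normal(theta, 1.0)
    reject = np.abs(x) > threshold
    fdp.append(np.mean(is_null[reject]) if reject.any() else 0.0)
    keep = ~reject
    fnp.append(np.mean(~is_null[keep]) if keep.any() else 0.0)

afdr, afnr = np.mean(fdp), np.mean(fnp)
print(f"AFDR={afdr:.3f}  AFNR={afnr:.3f}  ABER={afdr + afnr:.3f}")
```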

7.
The completion of genome sequencing projects has provided an extensive knowledge of the contents of the genomes of human, mouse, and many other organisms. Despite this, the function of most of the estimated 25,000 human genes remains largely unknown. Attention has now turned to elucidating gene function and identifying biological pathways that contribute to human diseases, including male infertility.

Our understanding of the genetic regulation of male fertility has been accelerated through the use of genetically modified mouse models including knockout, knock-in, gene-trapped, and transgenic mice. Such reverse genetic approaches, however, require some foreknowledge of a gene's function and, as such, bias against the discovery of completely novel genes and biological pathways. To facilitate high throughput gene discovery, genome-wide mouse mutagenesis via the use of a potent chemical mutagen, N-ethyl-N-nitrosourea (ENU), has been developed over the past decade. This forward genetic, or phenotype-driven, approach relies upon observing a phenotype first, then subsequently defining the underlying genetic defect. Mutations are randomly introduced into the mouse genome via ENU exposure. Through a controlled breeding scheme, mutations causing a phenotype of interest (e.g., male infertility) are then identified by linkage analysis and candidate gene sequencing.

This approach allows for the possibility of revealing comprehensive phenotype-genotype relationships for a range of genes and pathways; that is, in addition to null alleles, mice carrying partial loss-of-function or gain-of-function mutations can be recovered. Such point mutations are likely to be more reflective of those that occur within the human population. Many research groups have successfully used this approach to generate infertile mouse lines, and some novel male fertility genes have been revealed. In this review, we focus on the utility of ENU mutagenesis for the discovery of novel male fertility regulators.

8.
Validation studies have been used to increase the reliability of the statistical conclusions for scientific discoveries; such studies improve the reproducibility of the findings and reduce the possibility of false positives. Here, one of the important roles of statistics is to quantify reproducibility rigorously. Two concepts were recently defined for this purpose: (i) rediscovery rate (RDR), which is the expected proportion of statistically significant findings in a study that can be replicated in the validation study and (ii) false discovery rate in the validation study (vFDR). In this paper, we aim to develop a nonparametric approach to estimate the RDR and vFDR and show an explicit link between the RDR and the FDR. Among other things, the link explains why reproducing statistically significant results even with low FDR level may be difficult. Two metabolomics datasets are considered to illustrate the application of the RDR and vFDR concepts in high‐throughput data analysis.
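A simulation makes the RDR concrete: run a primary and a validation study on the same truth, and compute the proportion of primary findings that replicate. The effect sizes, sample sizes, and cutoff below are illustrative assumptions, not the paper's nonparametric estimator.

```python
# Simulated rediscovery rate for a primary + validation study pair.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
m, n1, n2, alpha = 5000, 20, 20, 0.001
effects = np.where(rng.uniform(size=m) < 0.05, 0.8, 0.0)  # 5% non-null

def study_pvals(n):
    """One study: n observations per feature, one-sample t-test."""
    x = rng.normal(effects, 1.0, size=(n, m))
    t, p = stats.ttest_1samp(x, 0.0, axis=0)
    return p

p_primary = study_pvals(n1)
p_valid = study_pvals(n2)
sig = p_primary < alpha
rdr = np.mean(p_valid[sig] < alpha) if sig.any() else float("nan")
print(f"{sig.sum()} primary findings, estimated RDR = {rdr:.2f}")
```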

9.
Identifying genes that are differentially expressed between classes of samples is an important objective of many microarray experiments. Because of the thousands of genes typically considered, there is a tension between identifying as many of the truly differentially expressed genes as possible, but not too many genes that are not really differentially expressed (false discoveries). Controlling the proportion of identified genes that are false discoveries, the false discovery proportion (FDP), is a goal of interest. In this paper, two multivariate permutation methods are investigated for controlling the FDP. One is based on a multivariate permutation testing (MPT) method that probabilistically controls the number of false discoveries, and the other is based on the Significance Analysis of Microarrays (SAM) procedure that provides an estimate of the FDP. Both methods account for the correlations among the genes. We find the ability of the methods to control the proportion of false discoveries varies substantially depending on the implementation characteristics. For example, for both methods one can proceed from the most significant gene to the least significant gene until the estimated FDP is just above the targeted level ('top-down' approach), or from the least significant gene to the most significant gene until the estimated FDP is just below the targeted level ('bottom-up' approach). We find that the top-down MPT-based method probabilistically controls the FDP, whereas our implementation of the top-down SAM-based method does not. Bottom-up MPT-based or SAM-based methods can result in poor control of the FDP.
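The top-down/bottom-up distinction matters because estimated FDP need not be monotone in the number of rejections. The sketch below contrasts the two scans using the simple plug-in estimate m·p(k)/k at each cutoff, a stand-in for the MPT- and SAM-based estimates in the paper.

```python
# Top-down vs bottom-up FDP scans over sorted p-values (illustrative).
import numpy as np

def fdp_scan(pvals, target=0.10):
    p = np.sort(pvals)
    m = len(p)
    est_fdp = m * p / np.arange(1, m + 1)   # plug-in estimate at k rejections

    # Top-down: most to least significant; stop just before the
    # estimate first rises above the target.
    above = np.nonzero(est_fdp > target)[0]
    k_top = above[0] if above.size else m

    # Bottom-up: least to most significant; stop at the largest k
    # whose estimate is still below the target.
    below = np.nonzero(est_fdp <= target)[0]
    k_bottom = below[-1] + 1 if below.size else 0
    return k_top, k_bottom

rng = np.random.default_rng(4)
p = np.concatenate([rng.uniform(0, 1e-3, 50), rng.uniform(size=4950)])
print("rejections (top-down, bottom-up):", fdp_scan(p))
```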

10.
We address the problem of testing whether a possibly high-dimensional vector may act as a mediator between some exposure variable and the outcome of interest. We propose a global test for mediation, which combines a global test with the intersection-union principle. We discuss theoretical properties of our approach and conduct simulation studies that demonstrate that it performs as well as or better than its competitor. We also propose a multiple testing procedure, ScreenMin, that provides asymptotic control of either familywise error rate or false discovery rate when multiple groups of potential mediators are tested simultaneously. We apply our approach to data from a large Norwegian cohort study, where we look at the hypothesis that smoking increases the risk of lung cancer by modifying the level of DNA methylation.
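The intersection-union idea for a single mediator group: mediation requires both the exposure→mediator and the mediator→outcome effect, so the combined p-value is the maximum of the two component p-values. The sketch below loosely mirrors a ScreenMin-style workflow — screen groups on min(p1, p2), then adjust the surviving max-p values — with the screening threshold and BH adjustment as illustrative assumptions rather than the paper's exact procedure.

```python
# Screen on min-p, test on max-p (intersection-union), adjust with BH.
import numpy as np
from statsmodels.stats.multitest import multipletests

def screen_min_maxp(p_exposure, p_outcome, screen_threshold, q=0.05):
    p1, p2 = np.asarray(p_exposure), np.asarray(p_outcome)
    keep = np.minimum(p1, p2) < screen_threshold   # screening step
    p_max = np.maximum(p1, p2)[keep]               # intersection-union
    reject, _, _, _ = multipletests(p_max, alpha=q, method="fdr_bh")
    return np.nonzero(keep)[0][reject]

rng = np.random.default_rng(5)
m = 10_000
p1, p2 = rng.uniform(size=m), rng.uniform(size=m)
p1[:20] = p2[:20] = rng.uniform(0, 1e-5, 20)       # true mediators
print("discovered mediator groups:", screen_min_maxp(p1, p2, 0.05 / m))
```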

11.
Many gene expression datasets come from paired experiments in which the expression levels of the targeted genes are correlated across the two experiments. We consider problems in which the objective is to find genes that are simultaneously upregulated/downregulated under both experiments. A Bayesian methodology is proposed based on directional multiple hypotheses testing. We propose a false discovery rate specific to the problem under consideration, and construct a Bayes rule satisfying a false discovery rate criterion. The proposed method is compared with a traditional rule through simulation studies. We apply our methodology to two real examples involving microRNAs: in one example the targeted genes are simultaneously downregulated under both experiments, and in the other the targeted genes are downregulated in one experiment and upregulated in the other. We also discuss how the proposed methodology can be extended to more than two experiments.

12.
It is increasingly recognized that multiple genetic variants, within the same or different genes, combine to affect liability for many common diseases. Indeed, the variants may interact among themselves and with environmental factors. Thus realistic genetic/statistical models can include an extremely large number of parameters, and it is by no means obvious how to find the variants contributing to liability. For models of multiple candidate genes and their interactions, we prove that statistical inference can be based on controlling the false discovery rate (FDR), defined as the expected value of the proportion of false rejections among all rejections. Controlling the FDR automatically controls the overall error rate in the special case that all the null hypotheses are true. So do more standard methods such as Bonferroni correction. However, when some null hypotheses are false, the goals of Bonferroni and FDR differ, and FDR will have better power. Model selection procedures, such as forward stepwise regression, are often used to choose important predictors for complex models. By analysis of simulations of such models, we compare a computationally efficient form of forward stepwise regression against the FDR methods. We show that model selection includes numerous genetic variants having no impact on the trait, whereas FDR maintains a false-positive rate very close to the nominal rate. With good control over false positives and better power than Bonferroni, the FDR-based methods we introduce present a viable means of evaluating complex, multivariate genetic models. Naturally, as for any method seeking to explore complex genetic models, the power of the methods is limited by sample size and model complexity.
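A small simulation of the comparison described above: marginal association tests with BH-FDR versus a naive forward stepwise rule (enter the candidate with the smallest p-value while it stays below 0.05). The trait model, sizes, and entry rule are illustrative, not the paper's efficient stepwise implementation.

```python
# BH on marginal tests vs forward stepwise selection (illustrative).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.multitest import multipletests
from scipy import stats

rng = np.random.default_rng(8)
n, m, n_true = 300, 50, 3
X = rng.binomial(2, 0.3, size=(n, m)).astype(float)   # additive genotypes
beta = np.zeros(m); beta[:n_true] = 0.5
y = X @ beta + rng.normal(0, 1, n)

# Marginal tests + BH at FDR 5%.
marg_p = np.array([stats.pearsonr(X[:, j], y)[1] for j in range(m)])
bh_sel = np.nonzero(multipletests(marg_p, 0.05, "fdr_bh")[0])[0]

# Forward stepwise by entry p-value.
active = []
while True:
    best_j, best_p = None, 0.05
    for j in set(range(m)) - set(active):
        model = sm.OLS(y, sm.add_constant(X[:, active + [j]])).fit()
        p = model.pvalues[-1]            # p-value of the candidate term
        if p < best_p:
            best_j, best_p = j, p
    if best_j is None:
        break
    active.append(best_j)

print("BH selected:      ", sorted(bh_sel))
print("stepwise selected:", sorted(active))
```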

13.
Shao Y, Tseng CH. Statistics in Medicine 2007;26(23):4219-4237
DNA microarrays have been widely used for the purpose of simultaneously monitoring a large number of gene expression levels to identify differentially expressed genes. Statistical methods for the adjustment of multiple testing have been discussed extensively in the literature. An important further challenge is the existence of dependence among test statistics due to reasons such as gene co-regulation. To plan large-scale genomic studies, sample size determination with appropriate adjustment for both multiple testing and potential dependency among test statistics is crucial to avoid an abundance of false-positive results and/or serious lack of power. We introduce a general approach for calculating sample sizes for two-way multiple comparisons in the presence of dependence among test statistics to ensure adequate overall power when the false discovery rates are controlled. The usefulness of the proposed method is demonstrated via numerical studies using both simulated data and real data from a well-known study of leukaemia.

14.
To meet the demands of modern high-tech warfare, and to highlight the view that wartime treatment capability is built up through the accumulation of peacetime medical support, this paper analyzes the risk factors for wound infection in trauma patients, proposes countermeasures for their control, and describes the use of multimedia tools to strengthen scientific and technical training, providing practical experience for controlling trauma-related infections in hospitals.

15.
With recent advances in genomewide microarray technologies, whole-genome association (WGA) studies have aimed at identifying susceptibility genes for complex human diseases using hundreds of thousands of single nucleotide polymorphisms (SNPs) genotyped at the same time. In this context and to take into account multiple testing, false discovery rate (FDR)-based strategies are now used frequently. However, a critical aspect of these strategies is that they are applied to a collection or a family of hypotheses and, thus, critically depend on these precise hypotheses. We investigated how modifying the family of hypotheses to be tested affected the performance of FDR-based procedures in WGA studies. We showed that FDR-based procedures performed more poorly when excluding SNPs with high prior probability of being associated. Results of simulation studies mimicking WGA studies according to three scenarios are reported, and show the extent to which SNP elimination (family contraction) prior to the analysis impairs the performance of FDR-based procedures. To illustrate this situation, we used the data from a recent WGA study on type-1 diabetes (Clayton et al. [2005] Nat. Genet. 37:1243-1246) and report the results obtained when excluding or not excluding SNPs located inside the human leukocyte antigen region. Based on our findings, excluding markers with high prior probability of being associated cannot be recommended for the analysis of WGA data with FDR-based strategies.

16.
Objective: To investigate the serotypes and virulence genes of pathogenic Escherichia coli isolated from deer in northeastern China. Methods: Based on different E. coli virulence genes, nine primer pairs were designed, and 85 E. coli isolates were screened for virulence genes by single and multiplex PCR. Results: From diseased deer in northeastern China, 85 deer-derived E. coli isolates were obtained; by serological classification, 55 isolates were typed into 37 distinct serotypes. Among the isolates, 61 (71.76%) were positive for EAST1, 31 (36.47%) for VT1, 22 (25.88%) for VT2, 1 (1.18%) for eaeA, 6 (7.06%) for fliCh7, 1 (1.18%) for STb, and 2 (2.36%) for LT; no isolates positive for STa or SLT-2e were detected. Conclusion: This study provides a preliminary account of the major serotypes and principal virulence genes of E. coli disease in deer in northeastern China, which deserves attention in clinical treatment.

17.
This paper focuses on statistical analyses in scenarios where some samples from the matched pairs design are missing, resulting in partially matched samples. Motivated by the idea of meta‐analysis, we recast the partially matched samples as coming from two experimental designs and propose a simple yet robust approach based on the weighted Z‐test to integrate the p‐values computed from these two designs. We show that the proposed approach achieves better operating characteristics in simulations and a case study, compared with existing methods for partially matched samples.
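The weighted Z-test recipe is short: convert each sub-design's one-sided p-value to a Z-score, combine with weights reflecting the information in each design, and convert back. The sketch below uses square-root-of-sample-size weights, one common choice; the paper's exact weights may differ.

```python
# Weighted Z-test for partially matched samples (illustrative weights).
import numpy as np
from scipy import stats

def weighted_z(p_paired, p_unpaired, n_paired, n_unpaired):
    """Combine two one-sided p-values via the weighted Z-test."""
    w1, w2 = np.sqrt(n_paired), np.sqrt(n_unpaired)
    z1, z2 = stats.norm.isf(p_paired), stats.norm.isf(p_unpaired)
    z = (w1 * z1 + w2 * z2) / np.sqrt(w1**2 + w2**2)
    return stats.norm.sf(z)

rng = np.random.default_rng(6)
x_pair = rng.normal(0.5, 1, 15); y_pair = x_pair - rng.normal(0.4, 1, 15)
x_only = rng.normal(0.5, 1, 10); y_only = rng.normal(0.1, 1, 12)

# One-sided p-values from the paired and the independent sub-designs.
p1 = stats.ttest_rel(x_pair, y_pair, alternative="greater").pvalue
p2 = stats.ttest_ind(x_only, y_only, alternative="greater").pvalue
print("combined p =", weighted_z(p1, p2, 15, 10 + 12))
```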

18.
Objective: To explore, under different simulation conditions, when the χ² test combined with false discovery rate (FDR) control is suitable for screening single nucleotide polymorphism (SNP) loci. Methods: Based on the first 5,000 SNPs on chromosome 22 of the population of Utah residents with Northern and Western European ancestry from HapMap Phase III (released February 2009), case-control data were simulated with HAPGEN2, tag SNPs (TagSNPs) were selected with Haploview 4.2, and the accuracy of identifying causal SNPs was compared across simulation conditions. Results: The way the relative risk (RR) was specified made no significant difference. Under all three genetic models, accuracy increased with RR; at the same RR, the additive model had the highest accuracy, followed by the dominant model, with the recessive model lowest. Accuracy exceeded 60% when RR > 2.2 under the additive model, RR > 4 under the dominant model, and RR > 5 under the recessive model. Conclusion: The χ² test combined with FDR performs best under the additive model; in practical research, whether the χ² test combined with FDR is appropriate should be judged from the specific characteristics of the target disease.
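The screening pipeline evaluated above reduces to a per-SNP χ² test followed by BH-FDR selection. The sketch below uses a simple additive-model allele-count simulation as a stand-in for the HAPGEN2/Haploview setup in the study.

```python
# Chi-square allele-count tests per SNP + BH-FDR screening (illustrative).
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(7)
n_snps, n_cases, n_controls = 5000, 500, 500
maf = rng.uniform(0.05, 0.5, n_snps)
maf_cases = maf.copy()
maf_cases[:10] *= 1.4        # inflate allele frequency at causal SNPs

def allele_counts(freq, n):
    alt = rng.binomial(2 * n, freq)        # alternative allele counts
    return np.stack([alt, 2 * n - alt])    # rows: alt, ref

cases = allele_counts(maf_cases, n_cases)
controls = allele_counts(maf, n_controls)

pvals = np.array([
    stats.chi2_contingency(np.array([cases[:, i], controls[:, i]]))[1]
    for i in range(n_snps)
])
reject, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print("SNPs selected:", np.nonzero(reject)[0])
```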

19.
Comparison of polymorphic sites such as single nucleotide polymorphisms (SNPs) within a gene between cases and controls may be useful for establishing a role of this gene in disease susceptibility. The approach includes two steps: the first step is the discovery of the different SNPs within the candidate gene and the second step is the association testing per se, which can be done on the entire set of sites discovered or on a subset of these sites only. Selecting a subset of sites may increase the power to detect the association with the candidate gene since a smaller number of tests would then be performed. We proposed a strategy to select sites within a candidate gene and applied it to the Genetic Analysis Workshop 12 candidate gene data. Using these selected sites, we detected an association with candidate genes 1 and 6.

20.
Correct selection of prognostic biomarkers among multiple candidates is becoming increasingly challenging as the dimensionality of biological data becomes higher. Therefore, minimizing the false discovery rate (FDR) is of primary importance, while a low false negative rate (FNR) is a complementary measure. The lasso is a popular selection method in Cox regression, but its results depend heavily on the penalty parameter λ. Usually, λ is chosen using maximum cross‐validated log‐likelihood (max‐cvl). However, this method often has a very high FDR. We review methods for a more conservative choice of λ. We propose an empirical extension of the cvl by adding a penalization term, which trades off goodness‐of‐fit against model parsimony, leading to the selection of fewer biomarkers and, as we show, to the reduction of the FDR without a large increase in FNR. We conducted a simulation study considering null and moderately sparse alternative scenarios and compared our approach with the standard lasso and 10 other competitors: Akaike information criterion (AIC), corrected AIC, Bayesian information criterion (BIC), extended BIC, Hannan and Quinn information criterion (HQIC), risk information criterion (RIC), one‐standard‐error rule, adaptive lasso, stability selection, and percentile lasso. Our extension achieved the best compromise across all the scenarios between a reduction of the FDR and a limited rise in the FNR, followed by the AIC, the RIC, and the adaptive lasso, which performed well in some settings. We illustrate the methods using gene expression data of 523 breast cancer patients. In conclusion, we propose to apply our extension to the lasso whenever a stringent FDR with a limited FNR is targeted.
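The selection rule itself is simple to state: instead of taking the λ that maximizes the cross-validated log-likelihood, penalize cvl by the number of selected biomarkers. The cvl curve, active-set sizes, and weight w below are placeholders; in practice they come from a cross-validated lasso Cox fit over a λ path, and the penalty weight is a tuning choice, not the paper's exact formula.

```python
# Penalized-cvl choice of lambda vs plain max-cvl (placeholder path).
import numpy as np

def penalized_cvl_choice(lambdas, cvl, n_selected, w=1.0):
    """Pick lambda maximizing cvl(lambda) - w * #selected(lambda)."""
    score = np.asarray(cvl) - w * np.asarray(n_selected)
    return lambdas[int(np.argmax(score))]

# Placeholder path: cvl peaks at a small lambda that keeps many
# biomarkers; the penalized criterion prefers a sparser model.
lambdas = np.array([0.01, 0.05, 0.10, 0.20, 0.40])
cvl = np.array([-500.0, -498.5, -499.0, -501.0, -505.0])
n_selected = np.array([60, 25, 10, 4, 1])

print("max-cvl choice:  ", lambdas[int(np.argmax(cvl))])
print("penalized choice:", penalized_cvl_choice(lambdas, cvl, n_selected, w=0.2))
```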
