期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Empirical extensions of the lasso penalty to reduce the false discovery rate in high‐dimensional Cox regression models

Nils Ternès Federico Rotolo Stefan Michiels 《Statistics in medicine》2016,35(15):2561-2573

Correct selection of prognostic biomarkers among multiple candidates is becoming increasingly challenging as the dimensionality of biological data becomes higher. Therefore, minimizing the false discovery rate (FDR) is of primary importance, while a low false negative rate (FNR) is a complementary measure. The lasso is a popular selection method in Cox regression, but its results depend heavily on the penalty parameter λ. Usually, λ is chosen using maximum cross‐validated log‐likelihood (max‐cvl). However, this method has often a very high FDR. We review methods for a more conservative choice of λ. We propose an empirical extension of the cvl by adding a penalization term, which trades off between the goodness‐of‐fit and the parsimony of the model, leading to the selection of fewer biomarkers and, as we show, to the reduction of the FDR without large increase in FNR. We conducted a simulation study considering null and moderately sparse alternative scenarios and compared our approach with the standard lasso and 10 other competitors: Akaike information criterion (AIC), corrected AIC, Bayesian information criterion (BIC), extended BIC, Hannan and Quinn information criterion (HQIC), risk information criterion (RIC), one‐standard‐error rule, adaptive lasso, stability selection, and percentile lasso. Our extension achieved the best compromise across all the scenarios between a reduction of the FDR and a limited raise of the FNR, followed by the AIC, the RIC, and the adaptive lasso, which performed well in some settings. We illustrate the methods using gene expression data of 523 breast cancer patients. In conclusion, we propose to apply our extension to the lasso whenever a stringent FDR with a limited FNR is targeted. Copyright © 2016 John Wiley & Sons, Ltd. 相似文献

2.

Bayesian pharmacovigilance signal detection methods revisited in a multiple comparison setting

Ismaïl Ahmed Franoise Haramburu Annie Fourrier‐Rglat Frantz Thiessard Carmen Kreft‐Jais Ghada Miremont‐Salam Bernard Bgaud Pascale Tubert‐Bitter 《Statistics in medicine》2009,28(13):1774-1792

Pharmacovigilance spontaneous reporting systems are primarily devoted to early detection of the adverse reactions of marketed drugs. They maintain large spontaneous reporting databases (SRD) for which several automatic signalling methods have been developed. A common limitation of these methods lies in the fact that they do not provide an auto‐evaluation of the generated signals so that thresholds of alerts are arbitrarily chosen. In this paper, we propose to revisit the Gamma Poisson Shrinkage (GPS) model and the Bayesian Confidence Propagation Neural Network (BCPNN) model in the Bayesian general decision framework. This results in a new signal ranking procedure based on the posterior probability of null hypothesis of interest and makes it possible to derive with a non‐mixture modelling approach Bayesian estimators of the false discovery rate (FDR), false negative rate, sensitivity and specificity. An original data generation process that can be suited to the features of the SRD under scrutiny is proposed and applied to the French SRD to perform a large simulation study. Results indicate better performances according to the FDR for the proposed ranking procedure in comparison with the current ones for the GPS model. They also reveal identical performances according to the four operating characteristics for the proposed ranking procedure with the BCPNN and GPS models but better estimates when using the GPS model. Finally, the proposed procedure is applied to the French data. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

3.

Nonparametric estimation of the rediscovery rate

下载免费PDF全文

Donghwan Lee Andrea Ganna Yudi Pawitan Woojoo Lee 《Statistics in medicine》2016,35(18):3203-3212

Validation studies have been used to increase the reliability of the statistical conclusions for scientific discoveries; such studies improve the reproducibility of the findings and reduce the possibility of false positives. Here, one of the important roles of statistics is to quantify reproducibility rigorously. Two concepts were recently defined for this purpose: (i) rediscovery rate (RDR), which is the expected proportion of statistically significant findings in a study that can be replicated in the validation study and (ii) false discovery rate in the validation study (vFDR). In this paper, we aim to develop a nonparametric approach to estimate the RDR and vFDR and show an explicit link between the RDR and the FDR. Among other things, the link explains why reproducing statistically significant results even with low FDR level may be difficult. Two metabolomics datasets are considered to illustrate the application of the RDR and vFDR concepts in high‐throughput data analysis. Copyright © 2016 John Wiley & Sons, Ltd. 相似文献

4.

False discovery rate and permutation test: An evaluation in ERP data analysis

Agustín Lage‐Castellanos Eduardo Martínez‐Montes Juan A. Hernández‐Cabrera Lídice Galán 《Statistics in medicine》2010,29(1):63-74

Current analysis of event‐related potentials (ERP) data is usually based on the a priori selection of channels and time windows of interest for studying the differences between experimental conditions in the spatio‐temporal domain. In this work we put forward a new strategy designed for situations when there is not a priori information about ‘when’ and ‘where’ these differences appear in the spatio‐temporal domain, simultaneously testing numerous hypotheses, which increase the risk of false positives. This issue is known as the problem of multiple comparisons and has been managed with methods that control the false discovery rate (FDR), such as permutation test and FDR methods. Although the former has been previously applied, to our knowledge, the FDR methods have not been introduced in the ERP data analysis. Here we compare the performance (on simulated and real data) of permutation test and two FDR methods (Benjamini and Hochberg (BH) and local‐fdr, by Efron). All these methods have been shown to be valid for dealing with the problem of multiple comparisons in the ERP analysis, avoiding the ad hoc selection of channels and/or time windows. FDR methods are a good alternative to the common and computationally more expensive permutation test. The BH method for independent tests gave the best overall performance regarding the balance between type I and type II errors. The local‐fdr method is preferable for high dimensional (multichannel) problems where most of the tests conform to the empirical null hypothesis. Differences among the methods according to assumptions, null distributions and dimensionality of the problem are also discussed. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

5.

Flagging clinical adverse experiences: reducing false discoveries without materially compromising power for detecting true signals

Mehrotra DV Adewale AJ 《Statistics in medicine》2012,31(18):1918-1930

Comparative analyses of safety/tolerability data from a typical phase III randomized clinical trial generate multiple p-values associated with adverse experiences (AEs) across several body systems. A common approach is to 'flag' any AE with a p-value less than or equal to 0.05, ignoring the multiplicity problem. Despite the fact that this approach can result in excessive false discoveries (false positives), many researchers avoid a multiplicity adjustment to curtail the risk of missing true safety signals. We propose a new flagging mechanism that significantly lowers the false discovery rate (FDR) without materially compromising the power for detecting true signals, relative to the common no-adjustment approach. Our simple two-step procedure is an enhancement of the Mehrotra-Heyse-Tukey approach that leverages the natural grouping of AEs by body systems. We use simulations to show that, on the basis of FDR and power, our procedure is an attractive alternative to the following: (i) the no-adjustment approach; (ii) a one-step FDR approach that ignores the grouping of AEs by body systems; and (iii) a recently proposed two-step FDR approach for much larger-scale settings such as genome-wide association studies. We use three clinical trial examples for illustration. 相似文献

6.

Signal detection in FDA AERS database using Dirichlet process

下载免费PDF全文

Na Hu Lan Huang Ram C. Tiwari 《Statistics in medicine》2015,34(19):2725-2742

In the recent two decades, data mining methods for signal detection have been developed for drug safety surveillance, using large post‐market safety data. Several of these methods assume that the number of reports for each drug–adverse event combination is a Poisson random variable with mean proportional to the unknown reporting rate of the drug–adverse event pair. Here, a Bayesian method based on the Poisson–Dirichlet process (DP) model is proposed for signal detection from large databases, such as the Food and Drug Administration's Adverse Event Reporting System (AERS) database. Instead of using a parametric distribution as a common prior for the reporting rates, as is the case with existing Bayesian or empirical Bayesian methods, a nonparametric prior, namely, the DP, is used. The precision parameter and the baseline distribution of the DP, which characterize the process, are modeled hierarchically. The performance of the Poisson–DP model is compared with some other models, through an intensive simulation study using a Bayesian model selection and frequentist performance characteristics such as type‐I error, false discovery rate, sensitivity, and power. For illustration, the proposed model and its extension to address a large amount of zero counts are used to analyze statin drugs for signals using the 2006–2011 AERS data. Copyright © 2015 John Wiley & Sons, Ltd. 相似文献

7.

Prospective Calculation of Identification Power for Individual Genes in Analyses Controlling the False Discovery Rate

Michael R. Crager 《Genetic epidemiology》2012,36(8):839-847

Recent work on prospective power and sample size calculations for analyses of high‐dimension gene expression data that control the false discovery rate (FDR) focuses on the average power over all the truly nonnull hypotheses, or equivalently, the expected proportion of nonnull hypotheses rejected. Using another characterization of power, we adapt Efron's ([2007] Ann Stat 35:1351–1377) empirical Bayes approach to post hoc power calculation to develop a method for prospective calculation of the “identification power” for individual genes. This is the probability that a gene with a given true degree of association with clinical outcome or state will be included in a set within which the FDR is controlled at a specified level. An example calculation using proportional hazards regression highlights the effects of large numbers of genes with little or no association on the identification power for individual genes with substantial association. 相似文献

8.

Analysis of multilocus models of association

Devlin B Roeder K Wasserman L 《Genetic epidemiology》2003,25(1):36-47

It is increasingly recognized that multiple genetic variants, within the same or different genes, combine to affect liability for many common diseases. Indeed, the variants may interact among themselves and with environmental factors. Thus realistic genetic/statistical models can include an extremely large number of parameters, and it is by no means obvious how to find the variants contributing to liability. For models of multiple candidate genes and their interactions, we prove that statistical inference can be based on controlling the false discovery rate (FDR), which is defined as the expected number of false rejections divided by the number of rejections. Controlling the FDR automatically controls the overall error rate in the special case that all the null hypotheses are true. So do more standard methods such as Bonferroni correction. However, when some null hypotheses are false, the goals of Bonferroni and FDR differ, and FDR will have better power. Model selection procedures, such as forward stepwise regression, are often used to choose important predictors for complex models. By analysis of simulations of such models, we compare a computationally efficient form of forward stepwise regression against the FDR methods. We show that model selection includes numerous genetic variants having no impact on the trait, whereas FDR maintains a false-positive rate very close to the nominal rate. With good control over false positives and better power than Bonferroni, the FDR-based methods we introduce present a viable means of evaluating complex, multivariate genetic models. Naturally, as for any method seeking to explore complex genetic models, the power of the methods is limited by sample size and model complexity. 相似文献

9.

Stratified false discovery control for large-scale hypothesis testing with application to genome-wide association studies

Sun L Craiu RV Paterson AD Bull SB 《Genetic epidemiology》2006,30(6):519-530

The multiplicity problem has become increasingly important in genetic studies as the capacity for high-throughput genotyping has increased. The control of False Discovery Rate (FDR) (Benjamini and Hochberg. [1995] J. R. Stat. Soc. Ser. B 57:289-300) has been adopted to address the problems of false positive control and low power inherent in high-volume genome-wide linkage and association studies. In many genetic studies, there is often a natural stratification of the m hypotheses to be tested. Given the FDR framework and the presence of such stratification, we investigate the performance of a stratified false discovery control approach (i.e. control or estimate FDR separately for each stratum) and compare it to the aggregated method (i.e. consider all hypotheses in a single stratum). Under the fixed rejection region framework (i.e. reject all hypotheses with unadjusted p-values less than a pre-specified level and then estimate FDR), we demonstrate that the aggregated FDR is a weighted average of the stratum-specific FDRs. Under the fixed FDR framework (i.e. reject as many hypotheses as possible and meanwhile control FDR at a pre-specified level), we specify a condition necessary for the expected total number of true positives under the stratified FDR method to be equal to or greater than that obtained from the aggregated FDR method. Application to a recent Genome-Wide Association (GWA) study by Maraganore et al. ([2005] Am. J. Hum. Genet. 77:685-693) illustrates the potential advantages of control or estimation of FDR by stratum. Our analyses also show that controlling FDR at a low rate, e.g. 5% or 10%, may not be feasible for some GWA studies. 相似文献

10.

Many Phenotypes Without Many False Discoveries: Error Controlling Strategies for Multitrait Association Studies

下载免费PDF全文

Christine B. Peterson Marina Bogomolov Yoav Benjamini Chiara Sabatti 《Genetic epidemiology》2016,40(1):45-56

The genetic basis of multiple phenotypes such as gene expression, metabolite levels, or imaging features is often investigated by testing a large collection of hypotheses, probing the existence of association between each of the traits and hundreds of thousands of genotyped variants. Appropriate multiplicity adjustment is crucial to guarantee replicability of findings, and the false discovery rate (FDR) is frequently adopted as a measure of global error. In the interest of interpretability, results are often summarized so that reporting focuses on variants discovered to be associated to some phenotypes. We show that applying FDR‐controlling procedures on the entire collection of hypotheses fails to control the rate of false discovery of associated variants as well as the expected value of the average proportion of false discovery of phenotypes influenced by such variants. We propose a simple hierarchical testing procedure that allows control of both these error rates and provides a more reliable basis for the identification of variants with functional effects. We demonstrate the utility of this approach through simulation studies comparing various error rates and measures of power for genetic association studies of multiple traits. Finally, we apply the proposed method to identify genetic variants that impact flowering phenotypes in Arabidopsis thaliana, expanding the set of discoveries. 相似文献

11.

Practical guidelines for assessing power and false discovery rate for a fixed sample size in microarray experiments

Tong T Zhao H 《Statistics in medicine》2008,27(11):1960-1972

One major goal in microarray studies is to identify genes having different expression levels across different classes/conditions. In order to achieve this goal, a study needs to have an adequate sample size to ensure the desired power. Owing to the importance of this topic, a number of approaches to sample size calculation have been developed. However, due to the cost and/or experimental difficulties in obtaining sufficient biological materials, it might be difficult to attain the required sample size. In this article, we address more practical questions for assessing power and false discovery rate (FDR) for a fixed sample size. The relationships between power, sample size and FDR are explored. We also conduct simulations and a real data study to evaluate the proposed findings. 相似文献

12.

微阵列数据的多重比较 总被引：3，自引：2，他引：1

荀鹏程赵杨柏建岭易洪刚于浩陈峰《中国卫生统计》2006,23(1):5-8

目的介绍阳性结果错误率（FDR）及相关控制方法在微阵列数据多重比较中的应用。方法用BH、BL、BY和ALSU四种FDR控制程序比较了3226个基因在两组乳腺癌患者中的表达差异。结果四个程序在各自实用的范围内均将FDR控制在0．05以下，检验效能由大到小的顺序为：ALSU〉BH〉BY〉BL。ALSU程序因引入m0的估计,更为合理。不仅提高了检验效能，同时又较好地控制了假阳性错误。结论在微阵列数据的比较中必须考虑FDR的控制，同时又要考虑提高检验效能。多重比较中，控制FDR比控制总Ⅰ型错误率（FWER）检验效能高，且更为实用。相似文献

13.

False Discovery Rates for Rare Variants From Sequenced Data

Marinela Capanu Venkatraman E. Seshan 《Genetic epidemiology》2015,39(2):65-76

The detection of rare deleterious variants is the preeminent current technical challenge in statistical genetics. Sorting the deleterious from neutral variants at a disease locus is challenging because of the sparseness of the evidence for each individual variant. Hierarchical modeling and Bayesian model uncertainty are two techniques that have been shown to be promising in pinpointing individual rare variants that may be driving the association. Interpreting the results from these techniques from the perspective of multiple testing is a challenge and the goal of this article is to better understand their false discovery properties. Using simulations, we conclude that accurate false discovery control cannot be achieved in this framework unless the magnitude of the variants' risk is large and the hierarchical characteristics have high accuracy in distinguishing deleterious from neutral variants. 相似文献

14.

Controlling false discovery proportion in identification of drug-related adverse events from multiple system organ classes

Xianming Tan Guanghan F. Liu Donglin Zeng William Wang Guoqing Diao Joseph F. Heyse Joseph G. Ibrahim 《Statistics in medicine》2019,38(22):4378-4389

Analyzing safety data from clinical trials to detect safety signals worth further examination involves testing multiple hypotheses, one for each observed adverse event (AE) type. There exists certain hierarchical structure for these hypotheses due to the classification of the AEs into system organ classes, and these AEs are also likely correlated. Many approaches have been proposed to identify safety signals under the multiple testing framework and tried to achieve control of false discovery rate (FDR). The FDR control concerns the expectation of the false discovery proportion (FDP). In practice, the control of the actual random variable FDP could be more relevant and has recently drawn much attention. In this paper, we proposed a two-stage procedure for safety signal detection with direct control of FDP, through a permutation-based approach for screening groups of AEs and a permutation-based approach of constructing simultaneous upper bounds for false discovery proportion. Our simulation studies showed that this new approach has controlled FDP. We demonstrate our approach using data sets derived from a drug clinical trial. 相似文献

15.

Impaired performance of FDR-based strategies in whole-genome association studies when SNPs are excluded prior to the analysis

Marenne G Dalmasso C Perdry H Génin E Broët P 《Genetic epidemiology》2009,33(1):45-53

With recent advances in genomewide microarray technologies, whole-genome association (WGA) studies have aimed at identifying susceptibility genes for complex human diseases using hundreds of thousands of single nucleotide polymorphisms (SNPs) genotyped at the same time. In this context and to take into account multiple testing, false discovery rate (FDR)-based strategies are now used frequently. However, a critical aspect of these strAtegies is that they are applied to a collection or a family of hypotheses and, thus, critically depend on these precise hypotheses. We investigated how modifying the family of hypotheses to be tested affected the performance of FDR-based procedures in WGA studies. We showed that FDR-based procedures performed more poorly when excluding SNPs with high prior probability of being associated. Results of simulation studies mimicking WGA studies according to three scenarios are reported, and show the extent to which SNPs elimination (family contraction) prior to the analysis impairs the performance of FDR-based procedures. To illustrate this situation, we used the data from a recent WGA study on type-1 diabetes (Clayton et al. [2005] Nat. Genet. 37:1243-1246) and report the results obtained when excluding or not SNPs located inside the human leukocyte antigen region. Based on our findings, excluding markers with high prior probability of being associated cannot be recommended for the analysis of WGA data with FDR-based strategies. 相似文献

16.

Were genome‐wide linkage studies a waste of time? Exploiting candidate regions within genome‐wide association studies

Yun J. Yoo Shelley B. Bull Andrew D. Paterson Daryl Waggott Lei Sun 《Genetic epidemiology》2010,34(2):107-118

A central issue in genome‐wide association (GWA) studies is assessing statistical significance while adjusting for multiple hypothesis testing. An equally important question is the statistical efficiency of the GWA design as compared to the traditional sequential approach in which genome‐wide linkage analysis is followed by region‐wise association mapping. Nevertheless, GWA is becoming more popular due in part to cost efficiency: commercially available 1M chips are nearly as inexpensive as a custom‐designed 10 K chip. It is becoming apparent, however, that most of the on‐going GWA studies with 2,000–5,000 samples are in fact underpowered. As a means to improve power, we emphasize the importance of utilizing prior information such as results of previous linkage studies via a stratified false discovery rate (FDR) control. The essence of the stratified FDR control is to prioritize the genome and maintain power to interrogate candidate regions within the GWA study. These candidate regions can be defined as, but are by no means limited to, linkage‐peak regions. Furthermore, we theoretically unify the stratified FDR approach and the weighted P‐value method, and we show that stratified FDR can be formulated as a robust version of weighted FDR. Finally, we demonstrate the utility of the methods in two GWA datasets: Type 2 diabetes (FUSION) and an on‐going study of long‐term diabetic complications (DCCT/EDIC). The methods are implemented as a user‐friendly software package, SFDR. The same stratification framework can be readily applied to other type of studies, for example, using GWA results to improve the power of sequencing data analyses. Genet. Epidemiol. 34: 107–118, 2010. © 2009 Wiley‐Liss, Inc. 相似文献

17.

Multiple hypothesis testing in genomics

Jelle J. Goeman Aldo Solari 《Statistics in medicine》2014,33(11):1946-1978

This paper presents an overview of the current state of the art in multiple testing in genomics data from a user's perspective. We describe methods for familywise error control, false discovery rate control and false discovery proportion estimation and confidence, both conceptually and practically, and explain when to use which type of error rate. We elaborate on the assumptions underlying the methods and discuss pitfalls in the interpretation of results. In our discussion, we take into account the exploratory nature of genomics experiments, looking at selection of genes before or after testing, and at the role of validation experiments. Copyright © 2014 John Wiley & Sons, Ltd. 相似文献

18.

A Bayesian methodology for detecting targeted genes under two related experiments

下载免费PDF全文

Naveen K. Bansal Hongmei Jiang Prachi Pradeep 《Statistics in medicine》2015,34(25):3362-3375

Many gene expression data are based on two experiments where the gene expressions of the targeted genes under both experiments are correlated. We consider problems in which objectives are to find genes that are simultaneously upregulated/downregulated under both experiments. A Bayesian methodology is proposed based on directional multiple hypotheses testing. We propose a false discovery rate specific to the problem under consideration, and construct a Bayes rule satisfying a false discovery rate criterion. The proposed method is compared with a traditional rule through simulation studies. We apply our methodology to two real examples involving microRNAs; where in one example the targeted genes are simultaneously downregulated under both experiments, and in the other the targeted genes are downregulated in one experiment and upregulated in the other experiment. We also discuss how the proposed methodology can be extended to more than two experiments. Copyright © 2015 John Wiley & Sons, Ltd. 相似文献

19.

Estimating the relative excess risk due to interaction: a bayesian approach

Chu H Nie L Cole SR 《Epidemiology (Cambridge, Mass.)》2011,22(2):242-248

The relative excess odds or risk due to interaction (ie, RERIOR and RERI) play an important role in epidemiologic data analysis and interpretation. Previous authors have advocated frequentist approaches based on nonparametric bootstrap, the method of variance estimates recovery, and profile likelihood for estimating confidence intervals. As an alternative, we propose a Bayesian approach that accounts for parameter constraints and estimates the RERIOR in a case-control study from a linear additive odds-ratio model, or the RERI in a cohort study from a linear additive risk-ratio model. We show that Bayesian credible intervals can often be obtained more easily than frequentist confidence intervals. Furthermore, the Bayesian approach can be easily extended to adjust for confounders. Because posterior computation with inequality constraints can be accomplished easily using free software, the proposed Bayesian approaches may be useful in practice. 相似文献

20.

Two-stage analysis for selecting fixed numbers of features in omics association studies

Takanori Kawabata Ryo Emoto Jo Nishino Kunihiko Takahashi Shigeyuki Matsui 《Statistics in medicine》2019,38(16):2956-2971

One of main roles of omics-based association studies with high-throughput technologies is to screen out relevant molecular features, such as genetic variants, genes, and proteins, from a large pool of such candidate features based on their associations with the phenotype of interest. Typically, screened features are subject to validation studies using more established or conventional assays, where the number of evaluable features is relatively limited, so that there may exist a fixed number of features measurable by these assays. Such a limitation necessitates narrowing a feature set down to a fixed size, following an initial screening analysis via multiple testing where adjustment for multiplicity is made. We propose a two-stage screening approach to control the false discovery rate (FDR) for a feature set with fixed size that is subject to validation studies, rather than for a feature set from the initial screening analysis. Out of the feature set selected in the first stage with a relaxed FDR level, a fraction of features with most statistical significance is firstly selected. For the remaining feature set, features are selected based on biological consideration only, without regard to any statistical information, which allows evaluating the FDR level for the finally selected feature set with fixed size. Improvement of the power is discussed in the proposed two-stage screening approach. Simulation experiments based on parametric models and real microarray datasets demonstrated substantial increment in the number of screened features for biological consideration compared with the standard screening approach, allowing for more extensive and in-depth biological investigations in omics association studies. 相似文献