Similar Articles
20 similar articles found (search time: 125 ms)
1.
When a likelihood ratio is used to measure the strength of evidence for one hypothesis over another, its reliability (i.e. how often it produces misleading evidence) depends on the specification of the working model. When the working model happens to be the 'true' or 'correct' model, the probability of observing strong misleading evidence is low and controllable. But this is not necessarily the case when the working model is misspecified. Royall and Tsou (J. R. Stat. Soc., Ser. B 2003; 65:391-404) show how to adjust working models to make them robust to misspecification. Likelihood ratios derived from their 'robust adjusted likelihood' are just as reliable (asymptotically) as if the working model were correctly specified in the first place. In this paper, we apply and extend these ideas to the generalized linear model (GLM) regression setting. We provide several illustrations (both from simulated data and real data concerning rates of parasitic infection in Philippine adolescents), show how the required adjustment factor can be obtained from standard statistical software, and draw some connections between this approach and the 'sandwich estimator' for robust standard errors of regression parameters. This substantially broadens the availability and the viability of likelihood methods for measuring statistical evidence in regression settings.
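The mechanics of this adjustment are easiest to see in a scalar toy case. Below is a minimal Python sketch (an illustration under an assumed N(θ, 1) working model, not the paper's GLM implementation): the working log-likelihood ratio is multiplied by c = A/B, where A is the model-based information and B is the empirical variance of the score, both per observation at the MLE; these are the same two ingredients as the sandwich variance A⁻¹BA⁻¹/n.

```python
import numpy as np

def working_loglik(theta, x):
    """Log-likelihood of the (possibly misspecified) N(theta, 1) working model."""
    return -0.5 * np.sum((x - theta) ** 2)

def adjustment_factor(x):
    """Adjustment c = A/B for the N(theta, 1) working model:
    A = mean negative second derivative of the per-observation log-likelihood
    at the MLE (identically 1 here); B = mean squared score at the MLE (the
    empirical variance). c < 1 when the working model overstates precision."""
    return 1.0 / np.mean((x - x.mean()) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=2.0, size=200)   # true sd is 2, working model says 1

theta1, theta2 = 0.5, 0.0
raw = working_loglik(theta1, x) - working_loglik(theta2, x)
adjusted = adjustment_factor(x) * raw          # robust adjusted log-likelihood ratio
print(f"naive log-LR = {raw:.2f}, adjusted log-LR = {adjusted:.2f}")
```

With the true standard deviation equal to 2, the factor comes out near 1/4, undoing the working model's roughly fourfold overstatement of the evidence on the log scale.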

2.
Recently, there has been much interest in using the cost-effectiveness acceptability curve (CEAC) to measure the statistical evidence of cost-effectiveness. The CEAC has two well established but fundamentally different interpretations: one frequentist and one Bayesian. As an alternative, we suggest characterizing the statistical evidence about cost-effectiveness using the likelihood function (the key element of both approaches). Its interpretation is neither dependent on the sample space nor on the prior distribution. Moreover, the probability of observing misleading evidence is low and controllable, so this approach is justifiable in the traditional sense of frequentist long-run behaviour. We propose a new graphic for displaying the evidence about cost-effectiveness and explore the strengths of likelihood methods using data from an economic evaluation of a Program in Assertive Community Treatment (PACT).
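For a flavor of what a likelihood summary looks like here (a rough sketch with made-up numbers, not the authors' proposed graphic), the evidence at a given willingness-to-pay can be expressed through the likelihood for the incremental net benefit b = λΔE − ΔC, assuming approximately normal, independent estimates of the cost and effect differences:

```python
import numpy as np

# Toy sketch: standardized likelihood of 'no net benefit' at a few
# willingness-to-pay (WTP) values; all numbers are assumptions.
dE, se_E = 0.10, 0.04        # incremental effect and its standard error
dC, se_C = 1500.0, 600.0     # incremental cost and its standard error

for wtp in (10_000, 20_000, 40_000):
    b_hat = wtp * dE - dC                          # estimated INB at this WTP
    se_b = np.sqrt(wtp**2 * se_E**2 + se_C**2)     # its standard error
    # L(INB = 0) relative to the maximum of the likelihood, L(INB = b_hat):
    std_lik_at_zero = np.exp(-0.5 * (b_hat / se_b) ** 2)
    print(f"WTP {wtp}: standardized likelihood of INB = 0 is {std_lik_at_zero:.2f}")
```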

3.
A survey of the likelihood approach to bioequivalence trials
Choi L, Caffo B, Rohde C. Statistics in Medicine 2008; 27(24):4874-4894.
Bioequivalence (BE) trials are abbreviated clinical trials in which a generic drug or new formulation is evaluated to determine whether it is 'equivalent' to a corresponding previously approved brand-name drug or formulation. In this paper, we survey the process of testing BE and advocate the likelihood paradigm for representing the resulting data as evidence. We emphasize the unique conflicts between hypothesis testing and confidence intervals in this area (conflicts we believe are indicative of systemic defects in the frequentist approach) that the likelihood paradigm avoids. We suggest the direct use of profile likelihoods for evaluating BE. We discuss how the likelihood approach is useful for presenting the evidence for both average and population BE within a unified framework. We also examine the main properties of profile likelihoods and estimated likelihoods under simulation. This simulation study shows that profile likelihoods offer a viable alternative to the (unknown) true likelihood for a range of parameters commensurate with BE research.
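For intuition, a profile likelihood has a closed form in the simplest normal setting. The sketch below is a toy under assumed data (not the paper's BE models): for within-subject log-ratio differences d_i ~ N(δ, σ²), profiling out σ gives L_p(δ) ∝ (Σ(d_i − δ)²)^(−n/2), and the usual average-BE limits correspond to δ ∈ (log 0.8, log 1.25).

```python
import numpy as np

def profile_likelihood(delta, d):
    """Standardized profile likelihood Lp(delta) / Lp(delta_hat) for the mean
    of normal data, with the nuisance variance profiled out."""
    rss = np.sum((d - delta) ** 2)
    rss_hat = np.sum((d - d.mean()) ** 2)
    return (rss_hat / rss) ** (len(d) / 2)

rng = np.random.default_rng(3)
d = rng.normal(loc=0.05, scale=0.2, size=24)   # simulated log(test/reference) ratios

grid = np.linspace(-0.4, 0.4, 401)
lp = np.array([profile_likelihood(g, d) for g in grid])
inside = (grid > np.log(0.8)) & (grid < np.log(1.25))
print(f"max standardized profile likelihood outside BE limits: {lp[~inside].max():.3f}")
```

A small value outside the limits indicates that parameter values inconsistent with bioequivalence are poorly supported by the data.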

4.
5.

Background

The preferred method to evaluate public health interventions delivered at the level of whole communities is the cluster randomised trial (CRT). The practical limitations of CRTs and the need for alternative methods continue to be debated. There is no consensus on how to classify study designs to evaluate interventions, and how different design features are related to the strength of evidence.

Analysis

This article proposes that most study designs for the evaluation of cluster-level interventions fall into four broad categories: the CRT, the non-randomised cluster trial (NCT), the controlled before-and-after study (CBA), and the before-and-after study without control (BA). A CRT needs to fulfil two basic criteria: (1) the intervention is allocated at random; (2) there are sufficient clusters to allow a statistical between-arm comparison. In an NCT, statistical comparison is made across trial arms as in a CRT, but treatment allocation is not random. The defining feature of a CBA is that intervention and control arms are not compared directly, usually because there are insufficient clusters in each arm to allow a statistical comparison. Rather, baseline and follow-up measures of the outcome of interest are compared in the intervention arm, and separately in the control arm. A BA is a CBA without a control group.

Conclusion

Each design may provide useful or misleading evidence. A precise baseline measurement of the outcome of interest is critical for causal inference in all studies except CRTs. Apart from statistical considerations, the exploration of pre/post trends in the outcome allows a more transparent discussion of study weaknesses than is possible in non-randomised studies without a baseline measure.

6.
The evaluation of diagnostic tests attempts to obtain one or more statistical parameters which can indicate the intrinsic diagnostic utility of a test. Sensitivity, specificity and predictive value are not appropriate for this use. The likelihood ratio has been proposed as a useful measure when using a test to diagnose one of two disease states (e.g. disease present or absent). In this paper, we generalize the likelihood ratio concept to a situation in which the goal is to diagnose one of several non-overlapping disease states. A formula is derived to determine the post-test probability of a specific disease state. The post-test odds are shown to be related to the pre-test odds of a disease and to the usual likelihood ratios derived from considering the diagnosis between the target diagnosis and each alternate in turn. Hence, likelihood ratios derived from comparing pairs of diseases can be used to determine test utility in a multiple disease diagnostic situation.
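In symbols, with mutually exclusive disease states D_1, …, D_m and test result T, Bayes' theorem gives P(D_k | T) = P(T | D_k)P(D_k) / Σ_j P(T | D_j)P(D_j), so the post-test odds of the target state against any alternative are the pre-test odds multiplied by the corresponding pairwise likelihood ratio. A minimal sketch (the probabilities are illustrative assumptions):

```python
import numpy as np

def post_test_probability(pre_test, likelihoods, target):
    """Post-test probability of `target` among mutually exclusive disease states.

    pre_test    : prior probabilities of each state (must sum to 1)
    likelihoods : P(test result | state) for each state
    """
    pre_test = np.asarray(pre_test, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    joint = pre_test * likelihoods          # numerator of Bayes' theorem per state
    return joint[target] / joint.sum()

# Three mutually exclusive states; the same answer can be assembled from the
# pairwise likelihood ratios LR_j = likelihoods[j] / likelihoods[target].
p = post_test_probability([0.2, 0.5, 0.3], [0.9, 0.1, 0.3], target=0)
print(f"post-test probability of state 0: {p:.3f}")
```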

7.
Sequential analysis is a statistical approach to analysing data as they accumulate. Its goal is to reach a decision as soon as sufficient evidence has accumulated for one hypothesis or the other. In this article, three different statistical approaches (the frequentist, the Bayesian, and the likelihood approach) are discussed in relation to sequential analysis. In particular, the lesser-known likelihood approach is elucidated.

8.
The use of correlational probability values (p‐values) as a means of evaluating evidence in nursing and health care has largely been accepted uncritically. There are reasons to be concerned about an uncritical adherence to the use of significance testing, which has been located in the natural science paradigm. p‐values have served in hypothesis and statistical testing, such as in randomized controlled trials and meta‐analyses, to support what has been portrayed as the highest levels of evidence in the framework of evidence‐based practice. Nursing has been minimally involved in the rich debate about the controversies of treating significance testing as evidentiary in the health and social sciences. In this paper, we join the dialogue by examining how and why this statistical mechanism has become entrenched as the gold standard for determining what constitutes legitimate scientific knowledge in the postpositivistic paradigm. We argue that nursing needs to critically reflect on the limitations associated with this tool of the evidence‐based movement, given the complexities and contextual factors that are inherent to nursing epistemology. Such reflection will inform our thinking about what constitutes substantive knowledge for the nursing discipline.

9.
Rosner B, Willett WC, Spiegelman D. Statistics in Medicine 1989; 8(9):1051-69; discussion 1071-3.
Errors in the measurement of exposure that are independent of disease status tend to bias relative risk estimates and other measures of effect in epidemiologic studies toward the null value. Two methods are provided to correct relative risk estimates obtained from logistic regression models for measurement error in continuous exposures within cohort studies, where the error may be due to either random (unbiased) within-person variation or systematic error for individual subjects. These methods require a separate validation study to estimate the regression coefficient lambda relating the surrogate measure to true exposure. In the linear approximation method, the true logistic regression coefficient beta* is estimated by beta/lambda, where beta is the observed logistic regression coefficient based on the surrogate measure. In the likelihood approximation method, a second-order Taylor series expansion is used to approximate the logistic function, enabling closed-form likelihood estimation of beta*. Confidence intervals for the corrected relative risks are provided that include a component representing error in the estimation of lambda. Based on simulation studies, both methods perform well for true odds ratios up to 3.0; for higher odds ratios, the likelihood approximation method was superior with respect to both bias and coverage probability. An example is provided based on data from a prospective study of dietary fat intake and risk of breast cancer, together with a validation study of the questionnaire used to assess dietary fat intake.
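A simulation sketch of the linear approximation method is below. The sampling model, study sizes, and the delta-method interval (a standard form that includes a term for the error in estimating lambda, in the spirit of, though not necessarily identical to, the paper's interval) are all assumptions for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# --- Main study: disease regressed on the error-prone surrogate exposure ---
n = 5000
x = rng.normal(size=n)                      # true exposure (unobserved in practice)
z = x + rng.normal(scale=0.7, size=n)       # surrogate with random within-person error
p = 1 / (1 + np.exp(-(-2.0 + 0.5 * x)))     # true disease model uses x
y = rng.binomial(1, p)

main = sm.Logit(y, sm.add_constant(z)).fit(disp=0)
beta, var_beta = main.params[1], main.cov_params()[1, 1]

# --- Validation study: estimate lambda by regressing true exposure on surrogate ---
m = 500
xv = rng.normal(size=m)
zv = xv + rng.normal(scale=0.7, size=m)
val = sm.OLS(xv, sm.add_constant(zv)).fit()
lam, var_lam = val.params[1], val.cov_params()[1, 1]

# --- Linear approximation correction: beta* = beta / lambda ---
beta_star = beta / lam
# Delta-method variance with a component for the error in estimating lambda:
var_star = var_beta / lam**2 + beta**2 * var_lam / lam**4
ci = beta_star + np.array([-1.96, 1.96]) * np.sqrt(var_star)
print(f"corrected OR per unit: {np.exp(beta_star):.2f}, 95% CI {np.exp(ci).round(2)}")
```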

10.
We should not pool diagnostic likelihood ratios in systematic reviews
Some authors plead for the explicit use of diagnostic likelihood ratios to describe the accuracy of diagnostic tests. Likelihood ratios are also preferentially used by some journals and, naturally, are also used in meta-analysis. Although each likelihood ratio can vary between zero and infinity, meta-analysis is complicated by the fact that not every pair of positive and negative likelihood ratios in (ℝ⁺)² is attainable. The usual bivariate meta-analysis with a bivariate normal distribution can therefore sometimes place positive probability mass at values that are not possible. We considered, for that reason, three different statistical models that do not suffer from this drawback. All three approaches are so complicated that we advise meta-analysing sensitivity and specificity values instead of likelihood ratios.

11.
Reproducibility probability in clinical trials
Shao J, Chow SC. Statistics in Medicine 2002; 21(12):1727-1742.
For marketing approval of a new drug product, the United States Food and Drug Administration (FDA) requires that substantial evidence of the effectiveness of the drug product be provided through the conduct of at least two adequate and well-controlled clinical trials. The purpose of conducting the second clinical trial is to study whether the clinical result from the first trial is reproducible in the second trial with the same study protocol. Under certain circumstances, the FDA Modernization Act of 1997 includes a provision to allow data from one adequate and well-controlled clinical trial investigation, plus confirmatory evidence, to establish effectiveness for risk/benefit assessment of drug and biological candidates for approval. In this paper, we introduce the concept of reproducibility probability for a given clinical trial, which provides important information for regulatory agencies in deciding whether a single clinical trial is sufficient and for pharmaceutical companies in adjusting the sample size in a future clinical trial. Three approaches, the estimated power approach, the method of confidence bounds and the Bayesian approach, are studied for evaluating reproducibility probabilities under several study designs commonly used in clinical trials.
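Of the three, the estimated power approach is the simplest to illustrate: the observed test statistic is plugged into the power function as if it were the true standardized effect. A toy two-sided z-test version (a sketch, not the paper's design-specific formulas):

```python
from scipy.stats import norm

def reproducibility_probability(t_obs, alpha=0.05):
    """Estimated-power approach: treat the observed z-statistic as the true
    standardized effect and evaluate the power of the two-sided z-test."""
    z_crit = norm.ppf(1 - alpha / 2)
    t = abs(t_obs)
    return norm.cdf(t - z_crit) + norm.cdf(-t - z_crit)

# A first trial that was only just significant reproduces about half the time.
for t in (1.96, 2.58, 3.29):
    print(f"observed z = {t:.2f} -> estimated RP = {reproducibility_probability(t):.2f}")
```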

12.
Evidence and scientific research.
This commentary reviews the arguments for and against the use of p-values put forward in the Journal and other forums, and shows that they are all missing both a measure and concept of "evidence." The mathematics and logic of evidential theory are presented, with the log-likelihood ratio used as the measure of evidence. The profoundly different philosophy behind evidential methods (as compared to traditional ones) is presented, as well as a comparative example showing the difference between the two approaches. The reasons why we mistakenly ascribe evidential meaning to p-values and related measures are discussed. Unfamiliarity with the technology and philosophy of evidence is seen as the main reason why certain arguments about p-values persist, and why they are frequently contradictory and confusing.
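As a concrete instance of the evidential measure discussed here (the hypotheses and data are assumptions; likelihood ratios of 8 and 32 are benchmarks commonly used in this literature for 'fairly strong' and 'strong' evidence):

```python
import numpy as np
from scipy.stats import binom

# Evidence for H1: p = 0.7 over H0: p = 0.5 after observing 14 successes in 20 trials
lr = binom.pmf(14, 20, 0.7) / binom.pmf(14, 20, 0.5)
print(f"LR = {lr:.1f}, log-LR = {np.log(lr):.2f}")   # compare LR against 8 and 32
```

Unlike a p-value, this ratio compares how well two specific hypotheses predict the data actually observed, and it does not depend on the sample space or stopping rule.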

13.
In this paper, we extend the PPL (posterior probability of linkage) framework to the analysis of case-control (CC) data and introduce three new linkage disequilibrium (LD) statistics. These statistics measure the evidence for or against LD, rather than testing the null hypothesis of no LD, and they therefore avoid the need for multiple testing corrections. They are suitable not only for CC designs but can also be applied to family data, ranging from trios to complex pedigrees, all under the same statistical framework, allowing for the seamless analysis of disparate data structures. They also provide other core advantages of the PPL framework, including the use of sequential updating to accumulate LD evidence across potentially heterogeneous sets or subsets of data; parameterization in terms of a very general trait likelihood, which simultaneously considers dominant, recessive, and additive models; and a straightforward mechanism for modeling two-locus epistasis. Finally, by implementing the new statistics within the PPL framework, we have a ready mechanism for incorporating linkage information, obtained from distinct data, into LD analyses in the form of a prior distribution. Here we examine the performance of the proposed LD statistics using simulated data, as well as assessing the effects of key modeling violations on this performance.

14.
An adaptive treatment strategy (ATS) is defined as a sequence of treatments and intermediate responses. ATSs arise when chronic diseases such as cancer and depression are treated over time with various treatment alternatives, depending on intermediate responses to earlier treatments. Clinical trials are often designed to compare ATSs using appropriate designs such as sequential randomization designs. Although the recent literature provides statistical methods for analyzing data from such trials, very few articles have focused on statistical power and sample size issues. This paper presents a sample size formula for comparing the survival probabilities under two treatment strategies that share the same initial treatment but differ in maintenance treatment. The formula is based on the large-sample properties of the inverse-probability-weighted estimator. A simulation study shows strong evidence that the proposed sample size formula guarantees the desired power, regardless of the true distributions of survival times.

15.
16.
Testing millions of single nucleotide polymorphisms (SNPs) in genetic association studies has become a standard routine for disease gene discovery. In light of recent re-evaluation of statistical practice, it has been suggested that p-values are unfit as summaries of statistical evidence. Despite this criticism, p-values contain information that can be utilized to address the concerns about their flaws. We present a new method for utilizing evidence summarized by p-values for estimating odds ratio (OR) based on its approximate posterior distribution. In our method, only p-values, sample size, and standard deviation for ln(OR) are needed as summaries of data, accompanied by a suitable prior distribution for ln(OR) that can assume any shape. The parameter of interest, ln(OR), is the only parameter with a specified prior distribution, hence our model is a mix of classical and Bayesian approaches. We show that our method retains the main advantages of the Bayesian approach: it yields direct probability statements about hypotheses for OR and is resistant to biases caused by selection of top-scoring SNPs. Our method enjoys greater flexibility than similarly inspired methods in the assumed distribution for the summary statistic and in the form of the prior for the parameter of interest. We illustrate our method by presenting interval estimates of effect size for reported genetic associations with lung cancer. Although we focus on OR, the method is not limited to this particular measure of effect size and can be used broadly for assessing reliability of findings in studies testing multiple predictors.
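A grid-based sketch of the general idea is below; it is not the authors' implementation, and the p-value, standard error, and prior are assumptions. The two-sided p-value and the standard error of ln(OR) are inverted to the implied point estimate, which is then treated as a Normal(ln(OR), SE) observation and combined with a prior of arbitrary shape:

```python
import numpy as np
from scipy.stats import norm

def lnor_posterior(p_value, se_lnor, prior_pdf, grid):
    """Approximate posterior of ln(OR) given a two-sided p-value and the
    standard error of ln(OR), with an arbitrary prior pdf on a grid."""
    z = norm.ppf(1 - p_value / 2)            # |z| implied by the two-sided p-value
    lnor_hat = z * se_lnor                   # sign taken as positive for illustration
    post = prior_pdf(grid) * norm.pdf(lnor_hat, loc=grid, scale=se_lnor)
    return post / np.trapz(post, grid)       # normalize numerically on the grid

grid = np.linspace(-1.0, 1.5, 2001)
prior = lambda b: norm.pdf(b, loc=0.0, scale=0.3)   # assumed prior for ln(OR)
post = lnor_posterior(p_value=1e-5, se_lnor=0.08, prior_pdf=prior, grid=grid)
cdf = np.cumsum(post) * (grid[1] - grid[0])
ci = grid[np.searchsorted(cdf, [0.025, 0.975])]
print(f"95% interval for OR: {np.exp(ci).round(2)}")
```

Because the prior shrinks large effects toward zero, the interval is resistant to the winner's-curse inflation that affects top-scoring SNPs.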

17.
Group sequential designs for randomized clinical trials allow analyses of accruing data. Most group sequential designs in the literature concern the comparison of two treatments and maintain an overall prespecified type I error. As the number of treatments increases, however, so does the probability of falsely rejecting the null hypothesis. Bayesian statisticians concern themselves with the observed data and abide by the likelihood principle. As long as previous analyses do not change the likelihood, these analyses do not change Bayesian inference. In this paper, we discuss a group sequential design for a proposed randomized clinical trial comparing four treatment regimens. Bayesian ideas underlie the design and posterior probability calculations determine the criteria for stopping accrual to one or more of the treatments. We use computer simulation to estimate the frequentist properties of the design, information of interest to many of our collaborators. We show that relatively simple posterior probability calculations, along with simulations to calculate power under alternative hypotheses, can produce appealing designs for randomized clinical trials.
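The flavor of such a design can be sketched with Beta-Binomial posteriors, a posterior-probability dropping rule at each look, and an outer simulation loop to read off frequentist error rates. Everything below (priors, cutoffs, response rates, look schedule) is an assumption for illustration, not the proposed trial's actual design:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_trial(true_rates, n_per_look=25, n_looks=4, drop_cutoff=0.05):
    """One sequential trial comparing several arms under Beta(1,1) priors.
    At each look, accrual to an arm stops if its posterior probability of
    being the best arm (a Monte Carlo estimate) falls below `drop_cutoff`."""
    k = len(true_rates)
    events, n = np.zeros(k), np.zeros(k)
    active = np.ones(k, dtype=bool)
    p_best = np.full(k, 1.0 / k)
    for _ in range(n_looks):
        for a in np.flatnonzero(active):
            events[a] += rng.binomial(n_per_look, true_rates[a])
            n[a] += n_per_look
        draws = rng.beta(1 + events, 1 + n - events, size=(4000, k))
        p_best = np.bincount(draws.argmax(axis=1), minlength=k) / 4000
        active &= p_best >= drop_cutoff
    return p_best

# Frequentist property under the null (all four regimens equal), by simulation:
# how often does some arm end with posterior probability of being best > 0.95?
hits = sum(simulate_trial([0.3, 0.3, 0.3, 0.3]).max() > 0.95 for _ in range(400))
print(f"simulated probability of a false 'winner': {hits / 400:.3f}")
```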

18.
Clinical trial designs often incorporate a sequential stopping rule to serve as a guide in the early termination of a study. When choosing a particular stopping rule, it is most common to examine frequentist operating characteristics such as type I error, statistical power, and precision of confidence intervals (Statist. Med. 2005, in revision). Increasingly, however, clinical trials are designed and analysed in the Bayesian paradigm. In this paper, we describe how the Bayesian operating characteristics of a particular stopping rule might be evaluated and communicated to the scientific community. In particular, we consider a choice of probability models and a family of prior distributions that allows concise presentation of Bayesian properties for a specified sampling plan.

19.
20.
The Fragility Index has been introduced as a complement to the P-value to summarize the statistical strength of evidence for a trial's result. The Fragility Index (FI) is defined in trials with two equally sized treatment groups, with a dichotomous or time-to-event outcome, and is calculated as the minimum number of conversions from nonevent to event in the treatment group needed to shift the P-value from Fisher's exact test over the .05 threshold. As the index lacks a well-defined probability motivation, its interpretation is challenging for consumers. We clarify what the FI may be capturing by separately considering two scenarios: (a) what the FI is capturing mathematically when the probability model is correct and (b) how well the FI captures violations of probability model assumptions. By calculating the posterior probability of a treatment effect, we show that when the probability model is correct, the FI inappropriately penalizes small trials for using fewer events than larger trials to achieve the same significance level. The analysis shows that for experiments conducted without bias, the FI promotes an incorrect intuition of probability, which has not been noted elsewhere and must be dispelled. We illustrate shortcomings of the FI's ability to quantify departures from model assumptions and contextualize the FI concept within current debate around the null hypothesis significance testing paradigm. Altogether, the FI creates more confusion than it resolves and does not promote statistical thinking. We recommend against its use. Instead, sensitivity analyses are recommended to quantify and communicate robustness of trial results.
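Operationally, the definition amounts to a small brute-force search. A minimal sketch (not from the paper) using scipy's Fisher's exact test:

```python
from scipy.stats import fisher_exact

def fragility_index(events_trt, n_trt, events_ctl, n_ctl, alpha=0.05):
    """Minimum number of nonevent-to-event conversions in the treatment group
    needed to push the two-sided Fisher's exact p-value to `alpha` or above.
    Returns None if the result is not significant to begin with."""
    table = [[events_trt, n_trt - events_trt], [events_ctl, n_ctl - events_ctl]]
    _, p = fisher_exact(table)
    if p >= alpha:
        return None
    for k in range(1, n_trt - events_trt + 1):
        e = events_trt + k                     # convert k nonevents to events
        _, p = fisher_exact([[e, n_trt - e], [events_ctl, n_ctl - events_ctl]])
        if p >= alpha:
            return k
    return None

# A significant 1/100 vs 10/100 comparison: a handful of conversions undoes it.
print(fragility_index(events_trt=1, n_trt=100, events_ctl=10, n_ctl=100))
```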

