Similar articles
1.
OBJECTIVE: Meta-analytic techniques are used to combine the results of different studies that have evaluated the accuracy of a given diagnostic test. The techniques commonly generate values that either describe the performance of a particular test or compare the discriminative ability of two tests. The latter has received very little attention in the literature and is the focus of this article. STUDY DESIGN AND SETTING: We summarize existing methods based on an odds ratio (OR) and propose a novel technique for conducting such analysis, the conditional relative odds ratio (CROR). We demonstrate how to extract the required data and calculate several different comparative indexes using a hypothetical example. RESULTS: A paired analysis is preferred to decrease selection bias and increase statistical power. There is no standard method of obtaining the standard error (SE) of each relative OR; thus, the SE of the summary index might be underestimated under the assumption of no within-study variability. CONCLUSION: The CROR method estimates less biased indexes with SEs, and conditioned on discordant results, it is much less problematic ethically and economically. However, small cell counts may lead to larger SEs, and it might be impossible to construct McNemar's 2 × 2 tables for some studies.

2.
The area under the curve (AUC) is commonly used as a summary measure of the receiver operating characteristic (ROC) curve. It indicates the overall performance of a diagnostic test in terms of its accuracy at various diagnostic thresholds used to discriminate cases and non-cases of disease. The AUC measure is also used in meta-analyses, where each component study provides an estimate of the test sensitivity and specificity. These estimates are then combined to calculate a summary ROC (SROC) curve which describes the relationship between test sensitivity and specificity across studies. The partial AUC has been proposed as an alternative measure to the full AUC. When using the partial AUC, one considers only those regions of the ROC space where data have been observed, or which correspond to clinically relevant values of test sensitivity or specificity. In this paper, we extend the idea of using the partial AUC to SROC curves in meta-analysis. Theoretical and numerical results describe the variation in the partial AUC and its standard error as a function of the degree of inter-study heterogeneity and of the extent of truncation applied to the ROC space. A scaled partial area measure is also proposed to restore the property that the summary measure should range from 0 to 1. The results suggest several disadvantages of the partial AUC measures. In contrast to earlier findings with the full AUC, the partial AUC is rather sensitive to heterogeneity. Comparisons between tests are more difficult, especially if an empirical truncation process is used. Finally, the partial area lacks a useful symmetry property enjoyed by the full AUC. Although the partial AUC may sometimes have clinical appeal, on balance the use of the full AUC is preferred.
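
The idea of a partial AUC can be illustrated with a short sketch. The code below is not the method of the abstract (which works with fitted SROC curves); it assumes an empirical ROC curve given as points, uses the trapezoidal rule, and restricts the area to a false-positive-rate window. The function names `partial_auc` and `scaled_partial_auc` are hypothetical, and dividing by the window width is just one simple way to bring the measure back to the 0 to 1 range, not necessarily the scaling the authors propose.

```python
def partial_auc(fpr, tpr, lo=0.0, hi=1.0):
    """Trapezoidal area under an empirical ROC curve for lo <= FPR <= hi.

    fpr, tpr: coordinates of the ROC points (any order); the curve is
    assumed to cover the whole window [lo, hi].
    """
    pts = sorted(zip(fpr, tpr))

    def interp(x):
        # Linear interpolation of TPR at false-positive rate x.
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                if x1 == x0:          # vertical segment: take the upper point
                    return max(y0, y1)
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        raise ValueError("window extends beyond the observed ROC points")

    inner = [(x, y) for x, y in pts if lo < x < hi]
    xs = [lo] + [x for x, _ in inner] + [hi]
    ys = [interp(lo)] + [y for _, y in inner] + [interp(hi)]
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def scaled_partial_auc(fpr, tpr, lo, hi):
    """Partial area divided by the window width (an assumed, simple scaling)."""
    return partial_auc(fpr, tpr, lo, hi) / (hi - lo)
```

With `lo=0.0` and `hi=1.0` the function returns the ordinary full AUC, which makes it easy to compare the full and truncated summaries on the same curve.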

3.
The receiver operating characteristic (ROC) curve can be utilized to evaluate the performance of diagnostic tests. The area under the ROC curve (AUC) is a widely used summary index for comparing multiple ROC curves. Both parametric and nonparametric methods have been developed to estimate and compare the AUCs. However, these methods are usually only applicable to data collected from simple random samples and not surveys and epidemiologic studies that use complex sample designs such as stratified and/or multistage cluster sampling with sample weighting. Such complex samples can inflate variances from intra‐cluster correlation and alter the expectations of test statistics because of the use of sample weights that account for differential sampling rates. In this paper, we modify the nonparametric method to incorporate sampling weights to estimate the AUC and employ leaving‐one‐out jackknife methods along with the balanced repeated replication method to account for the effects of the complex sampling in the variance estimation of our proposed estimators of the AUC. The finite sample properties of our methods are evaluated using simulations, and our methods are illustrated by comparing the estimated AUC for predicting overweight/obesity using different measures of body weight and adiposity among sampled children and adults in the US Hispanic Health and Nutrition Examination Survey. Copyright © 2014 John Wiley & Sons, Ltd.
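
A minimal sketch of how sampling weights can enter the nonparametric (Mann-Whitney) AUC estimate is given below. It assumes the common weighted form in which every case-control pair contributes the product of the two subjects' weights; the function name `weighted_auc` is hypothetical, and the jackknife and balanced repeated replication variance steps described in the abstract are deliberately left out.

```python
def weighted_auc(case_scores, case_weights, control_scores, control_weights):
    """Weighted Mann-Whitney estimate of the AUC.

    Each (case, control) pair contributes w_case * w_control times
    1 if the case scores higher, 0.5 for a tie, and 0 otherwise.
    """
    numerator = 0.0
    for x, wx in zip(case_scores, case_weights):
        for y, wy in zip(control_scores, control_weights):
            if x > y:
                numerator += wx * wy
            elif x == y:
                numerator += 0.5 * wx * wy
    # Normalise by the total weight of all case-control pairs.
    return numerator / (sum(case_weights) * sum(control_weights))
```

When all weights are equal, the result reduces to the usual unweighted Mann-Whitney AUC.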

4.
OBJECTIVE: For diagnostic tests, the most common graphical representation of the information is the receiver-operating characteristic (ROC) curve. The "agreement chart" displays the information of two observers independently classifying the same n items into the same k categories, and can be used if one considers one of the "observers" as the diagnostic test and the other as the known outcome. This study compares the two charts and their ability to visually portray the various relevant summary statistics that assess how good a diagnostic test may be, such as sensitivity, specificity, predictive values, and likelihood ratios. STUDY DESIGN AND SETTING: The geometric relationships displayed in the charts are first described. The relationship between the two graphical representations and various summary statistics is illustrated using data from three common epidemiologically relevant health issues: coronary heart disease, screening for breast cancer, and screening for tuberculosis. RESULTS: Whereas the ROC curve incorporates information on sensitivity and specificity, the agreement chart includes information on the positive and negative predictive values of the diagnostic test. CONCLUSION: The agreement chart should be considered as an alternative visual representation to the ROC for diagnostic tests.

5.
Receiver operating characteristic (ROC) curves can be used to assess the accuracy of tests measured on ordinal or continuous scales. The most commonly used measure for the overall diagnostic accuracy of diagnostic tests is the area under the ROC curve (AUC). A gold standard (GS) test on the true disease status is required to estimate the AUC. However, a GS test may sometimes be too expensive or infeasible. Therefore, in many medical research studies, the true disease status of the subjects may remain unknown. Under the normality assumption on test results from each disease group of subjects, using the expectation‐maximization (EM) algorithm in conjunction with a bootstrap method, we propose a maximum likelihood‐based procedure for the construction of confidence intervals for the difference in paired AUCs in the absence of a GS test. Simulation results show that the proposed interval estimation procedure yields satisfactory coverage probabilities and interval lengths. The proposed method is illustrated with two examples. Copyright © 2009 John Wiley & Sons, Ltd.

6.
The summary receiver operating characteristic (SROC) curve has been recommended to represent the performance of a diagnostic test, based on data from a meta-analysis. However, little is known about the basic properties of the SROC curve or its estimate. In this paper, the position of the SROC curve is characterized in terms of the overall diagnostic odds ratio and the magnitude of inter-study heterogeneity in the odds ratio. The area under the curve (AUC) and an index Q* are discussed as potentially useful summaries of the curve. It is shown that AUC is maximized when the study odds ratios are homogeneous, and that it is quite robust to heterogeneity. An upper bound is derived for AUC based on an exact analytic expression for the homogeneous situation, and a lower bound based on the limit case Q*, defined by the point where sensitivity equals specificity: Q* is invariant to heterogeneity. The standard error of AUC is derived for homogeneous studies, and shown to be a reasonable approximation with heterogeneous studies. The expressions for AUC and its standard error are easily computed in the homogeneous case, and avoid the need for numerical integration in the more general case. SE(AUC) and SE(Q*) are found to be numerically close, with SE(Q*) being larger if the odds ratio is very large. The methods are illustrated using data for the Pap smear screening test for cervical cancer, and for three tests for the diagnosis of metastases in cervical cancer patients.
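
For the homogeneous case, a closed form for the AUC follows from integrating a symmetric SROC curve with a constant diagnostic odds ratio, that is, sensitivity = OR·x / (1 + (OR − 1)·x) where x is the false-positive rate. The sketch below assumes that curve; the function names are hypothetical, and the standard-error expressions mentioned in the abstract are not reproduced because they are not given here.

```python
import math


def sroc_auc_homogeneous(odds_ratio):
    """AUC of the symmetric SROC curve with a constant diagnostic odds ratio.

    Integrating OR*x / (1 + (OR - 1)*x) over x in [0, 1] gives
    AUC = OR / (OR - 1)^2 * ((OR - 1) - ln(OR)).
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    k = odds_ratio - 1.0
    if abs(k) < 1e-8:
        return 0.5                      # limit as OR -> 1 (uninformative test)
    return odds_ratio / k ** 2 * (k - math.log1p(k))


def q_star(odds_ratio):
    """Point on the curve where sensitivity equals specificity."""
    root = math.sqrt(odds_ratio)
    return root / (1.0 + root)
```

For an odds ratio above 1, `q_star` comes out below `sroc_auc_homogeneous`, in line with the abstract's description of Q* as a lower bound for the AUC.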

7.
OBJECTIVE: To perform a meta-analysis to assess diagnostic characteristics of the CAGE in screening for alcohol abuse or dependence in a general clinical population and to test a new method for pooling of ROC curves. METHODS: Medline search performed over the period 1/1/1974 to 31/12/2001. MEASUREMENT: Calculation of diagnostic values. RESULTS: We identified 35 articles using the DSM criteria as the gold standard to test the diagnostic value of the CAGE. Only 10 studies could be included for the meta-analysis. With a cutoff point ≥2, the pooled sensitivity is far better in inpatients (0.87) than in primary care patients (0.71) or ambulatory patients (0.60). The pooled specificity also differs for each group. The likelihood ratios seem to be relatively constant over the populations (overall LR+: 3.44; LR-: 0.18). We calculated a pooled AUC of 0.87 (95% CI 0.85-0.89). At low specificity values, the sensitivity was homogeneous over the studies, and at a low sensitivity, the specificity was heterogeneous. CONCLUSION: The CAGE is of limited diagnostic value for screening purposes at its recommended cutpoint of ≥2.

8.
OBJECTIVE: Current methods for meta-analysis of diagnostic tests do not allow utilizing all the information from papers in which several tests have been studied on the same patient sample. We demonstrate how to combine several studies of diagnostic tests, where each study reports on more than one test and some tests (but not necessarily all of them) are shared with other papers selected for the meta-analysis. We adopt statistical methodology for repeated measurements for the purpose of meta-analysis of diagnostic tests. STUDY DESIGN AND SETTING: The method allows for missing values of some tests for some papers, takes into account different sample sizes of papers, adjusts for background and confounding factors including test-specific covariates and paper-specific covariates, and accounts for correlations of the repeated measurements within each paper. It does not need individual-level data, although it can be modified to use them, and uses the two-by-two table of test results vs. gold standard. RESULTS: The results are translated from diagnostic odds ratios (DOR) to more clinically useful measures such as predictive values, post-test probabilities, and likelihood ratios. Models to capture between-study variation are introduced. The fit and influence of specific studies on the regression can be evaluated. Furthermore, model-based tests for homogeneity of DORs across papers are presented. CONCLUSION: The use of this new method is illustrated using a recent meta-analysis of the D-dimer test for the diagnosis of deep venous thrombosis.

9.
BACKGROUND AND OBJECTIVE: A range of fixed-effect and random-effects meta-analytic methods are available to obtain summary estimates of measures of diagnostic test accuracy. The hierarchical summary receiver operating characteristic (HSROC) model proposed by Rutter and Gatsonis in 2001 represents a general framework for the meta-analysis of diagnostic test studies that allows different parameters to be defined as a fixed effect or random effects within the same model. The Bayesian method used for fitting the model is complex, however, and the model is not widely used. The objective of this report is to show how the model may be fitted using the SAS procedure NLMIXED and to compare the results to the fully Bayesian analysis using an example. METHODS: The HSROC model, its assumptions, and its interpretation are described. The advantages of this model over the usual summary ROC (SROC) regression model are outlined. A complex example is used to compare the estimated SROC curves, expected operating points, and confidence intervals using the alternative approaches to fitting the model. RESULTS: The empirical Bayes estimates obtained using NLMIXED agree closely with those obtained using the fully Bayesian analysis. CONCLUSION: This alternative and more straightforward method for fitting the HSROC model makes the model more accessible to meta-analysts.

10.
OBJECTIVE: Our systematic review summarizes the evidence about the accuracy of physical diagnostic tests for meniscus injury. SEARCH STRATEGY: We performed a literature search of MEDLINE (1966-1999) and EMBASE (1988-1999) with additional reference tracking. SELECTION CRITERIA: Articles written in English, French, German, or Dutch that addressed the accuracy of at least one physical diagnostic test for meniscus injury with arthrotomy, arthroscopy, or magnetic resonance imaging as the gold standard were included. We excluded studies if no reference group or only test-positives had been included, if the study pertained to cadavers only, or if only physical examination under anesthesia was considered. DATA COLLECTION/ANALYSIS: Two reviewers independently selected studies, assessed the methodologic quality, and abstracted data using a standardized protocol. We calculated sensitivity, specificity, and likelihood ratios for each test, and summary estimates when appropriate and possible. MAIN RESULTS: Of 402 identified studies, 13 met the inclusion criteria. The results of the index and reference tests were assessed independently (blindly) of each other in only 2 studies, and in all studies verification bias seemed to be present. The study results were highly heterogeneous. The summary receiver operating characteristic curves of the assessment of joint effusion, the McMurray test, and joint line tenderness indicated little discriminative power for these tests. Only the predictive value of a positive McMurray test was favorable. CONCLUSIONS: The methodologic quality of studies addressing the diagnostic accuracy of meniscal tests was poor, and the results were highly heterogeneous. The poor characteristics indicate that these tests are of little value for clinical practice.

11.

Background

Research is needed to determine the prevalence and variables associated with the diagnosis of flatfoot, and to evaluate the validity of three footprint analysis methods for diagnosing flatfoot, using clinical diagnosis as a benchmark.

Methods

We conducted a cross-sectional study of a population-based random sample ≥40 years old (n = 1002) in A Coruña, Spain. Anthropometric variables, Charlson’s comorbidity score, and podiatric examination (including measurement of Clarke’s angle, the Chippaux-Smirak index, and the Staheli index) were used for comparison with a clinical diagnosis method using a podoscope. Multivariate regression was performed. Informed patient consent and ethical review approval were obtained.

Results

Prevalence of flatfoot in the left and right footprint, measured using the podoscope, was 19.0% and 18.9%, respectively. Variables independently associated with flatfoot diagnosis were age (OR 1.07), female gender (OR 3.55) and BMI (OR 1.39). The area under the receiver operating characteristic curve (AUC) showed that Clarke’s angle is highly accurate in predicting flatfoot (AUC 0.94), followed by the Chippaux-Smirak (AUC 0.83) and Staheli (AUC 0.80) indices. Sensitivity values were 89.8% for Clarke’s angle, 94.2% for the Chippaux-Smirak index, and 81.8% for the Staheli index, with respective positive likelihood ratios of 9.7, 2.1, and 2.0.

Conclusions

Age, gender, and BMI were associated with a flatfoot diagnosis. The indices studied are suitable for diagnosing flatfoot in adults, especially Clarke’s angle, which is highly accurate for flatfoot diagnosis in this population.

Key words: flatfoot, podiatry, validation studies, diagnostic techniques and procedures, adults

12.
Comparative studies of the accuracy of diagnostic tests often involve designs according to which each study participant is examined by two or more of the tests and the diagnostic examinations are interpreted by several readers. Tests are then compared on the basis of a summary index, such as the (full or partial) area under the receiver operating characteristic (ROC) curve, averaged over the population of readers. The design and analysis of such studies naturally need to take into account the correlated nature of the diagnostic test results and interpretations. In this paper, we describe the use of hierarchical modelling for ROC summary measures derived from multi-reader, multi-modality studies. The models allow the variance of the estimates to depend on the actual value of the index and account for the correlation in the data both explicitly via parameters and implicitly via the hierarchical structure. After showing how the hierarchical models can be employed in the analysis of data from multi-reader, multi-modality studies, we discuss the design of such studies using the simulation-based, Bayesian design approach of Wang and Gelfand (Stat. Sci. 2002; 17(2):193-208). The methodology is illustrated via the analysis of data from a study conducted to evaluate a computer-aided diagnosis tool for screen film mammography and via the development of design considerations for a multi-reader study comparing display modes for digital mammography. The hierarchical model methodology described in this paper is also applicable to the meta-analysis of ROC studies.

13.
In medical research, a two-phase study is often used for the estimation of the area under the receiver operating characteristic curve (AUC) of a diagnostic test. However, such a design introduces verification bias. One of the methods to correct verification bias is inverse probability weighting (IPW). Since the probability that a subject is selected into phase 2 of the study for disease verification is known, both true and estimated verification probabilities can be used to form an IPW estimator for AUC. In this article, we derive explicit variance formulas for both IPW AUC estimators and show that the IPW AUC estimator using the true values of the verification probabilities, even though they are known, is less efficient than its counterpart using the estimated values. Our simulation results show that the efficiency loss can be substantial, especially when the variance of the test result in the diseased population is small relative to its counterpart in the nondiseased population.

14.
Exploring sources of heterogeneity in systematic reviews of diagnostic tests
It is indispensable for any meta-analysis that potential sources of heterogeneity are examined, before one considers pooling the results of primary studies into summary estimates with enhanced precision. In reviews of studies on the diagnostic accuracy of tests, variability beyond chance can be attributed to between-study differences in the selected cutpoint for positivity, in patient selection and clinical setting, in the type of test used, in the type of reference standard, or any combination of these factors. In addition, heterogeneity in study results can also be caused by flaws in study design. This paper critically examines some of the potential reasons for heterogeneity and the methods to explore them. Empirical support for the existence of different sources of variation is reviewed. Incorporation of sources of variability explicitly into systematic reviews on diagnostic accuracy is demonstrated with data from a recent review. Application of regression techniques in meta-analysis of diagnostic tests can provide relevant additional information. Results of such analyses will help to understand problems with the transferability of diagnostic tests and to point out flaws in primary studies. As such, they can guide the design of future studies.

15.
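Objective: To explore the application of a cross-validation-based combined diagnostic method in tumor diagnosis, to improve the scientific rigor of diagnosis, and to analyze how DCE-MRI combined with IVIM-DWI improves diagnostic performance in distinguishing benign from malignant breast lesions. Methods: Using breast lesion data as an example, cross-validation was introduced into multivariable combined diagnosis and a logistic regression prediction model was built. The predicted probability that maximized the sum of sensitivity and specificity was taken as the optimal diagnostic cut-off, and the predictive performance of the model was evaluated by the area under the ROC curve. The diagnostic performance of DCE-MRI alone and of DCE-MRI combined with IVIM-DWI for benign versus malignant breast lesions was compared. Results: When DCE-MRI alone was used to classify breast lesions as benign or malignant, the AUC computed from the pathological diagnosis and the predicted probabilities was 0.82 without cross-validation, with an optimal cut-off of 0.50; with cross-validation, the AUC was 0.83, with an optimal cut-off of 0.67, so diagnostic performance was better with cross-validation than without. When IVIM-DWI was used as an adjunct to DCE-MRI, the AUC was 0.93 without cross-validation, with an optimal cut-off of 0.73; with cross-validation, the AUC was 0.95, with an optimal cut-off of 0.66, so diagnostic performance was again better with cross-validation. Regardless of whether cross-validation was used, the diagnostic performance of DCE-MRI combined with IVIM-DWI was markedly higher than that of DCE-MRI alone. Conclusion: With small samples, cross-validation can be used to improve the generalizability of the model, so that the fitted model adapts better to new observations and the diagnosis is more scientifically sound; DCE-MRI combined with IVIM-DWI is superior to DCE-MRI alone for diagnosing benign versus malignant breast lesions.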
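
The cut-off rule used in this abstract (take the predicted probability that maximizes sensitivity plus specificity) can be sketched as follows. The function name `best_cutoff` is hypothetical, predictions at or above the cut-off are assumed to be called positive, and ties are broken in favor of the lowest cut-off. The sketch covers only the cut-off selection; in a cross-validated analysis the same rule would be applied to out-of-fold predicted probabilities, and the logistic model fitting is not shown.

```python
def best_cutoff(probs, labels):
    """Return (cutoff, sensitivity, specificity) maximizing sens + spec.

    probs:  predicted probabilities of malignancy (or of disease in general)
    labels: 1 for diseased, 0 for non-diseased, in the same order as probs
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    best = None
    for cutoff in sorted(set(probs)):
        # A prediction counts as positive when p >= cutoff.
        tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        # Strict comparison keeps the lowest cutoff when several tie.
        if best is None or sens + spec > best[1] + best[2]:
            best = (cutoff, sens, spec)
    return best
```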

16.
The receiver operating characteristic (ROC) curve is a statistical tool for evaluating the accuracy of diagnostic tests. Investigators often compare the validity of two tests based on the estimated areas under the respective ROC curves. However, the traditional way of comparing entire areas under two ROC curves is not sensitive when two ROC curves cross each other. Also, there are some cutpoints on the ROC curves that are not considered in practice because their corresponding sensitivities or specificities are unacceptable. For the purpose of comparing the partial area under the curve (AUC) within a specific range of specificity for two correlated ROC curves, a non-parametric method based on Mann-Whitney U-statistics has been developed. The estimation of AUC along with its estimated variance and covariance is simplified by a method of grouping the observations according to their cutpoint values. The method is used to evaluate alternative logistic regression models that predict whether a subject has incident breast cancer based on information in Medicare claims data.

17.
The area under the receiver operating characteristic (ROC) curve (AUC) is a widely accepted summary index of the overall performance of diagnostic procedures and the difference between AUCs is often used when comparing two diagnostic systems. We developed an exact non-parametric statistical procedure for comparing two ROC curves in paired design settings. The test, which is based on all permutations of the subject-specific rank ratings, is formally a test for equality of ROC curves that is sensitive to the alternatives of AUC difference. The operating characteristics of the proposed test were evaluated using extensive simulations over a wide range of parameters. The proposed procedure can be easily implemented in experimental ROC data sets. For small samples and for underlying parameters that are common in experimental studies in diagnostic imaging the test possesses good operating characteristics and is more powerful than the conventional non-parametric procedure for AUC comparisons. We also derived an asymptotic version of the test which uses an exact estimate of the variance in the permutation space and provides a good approximation even when the sample sizes are small. This asymptotic procedure is a simple and precise approximation to the exact test and is useful for large sample sizes where the exact test may be computationally burdensome.
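
The general shape of an exact permutation test for paired ROC data can be sketched as below. This is a simplified illustration, not the authors' procedure: it assumes both tests are rated on a comparable scale, exchanges each subject's two raw ratings instead of rank ratings, and enumerates all 2^n exchange patterns, which is only practical for small n. The function names are hypothetical.

```python
from itertools import product


def auc(scores, labels):
    """Unweighted Mann-Whitney AUC (ties count one half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    total = sum(1.0 if a > b else 0.5 if a == b else 0.0
                for a in cases for b in controls)
    return total / (len(cases) * len(controls))


def exact_paired_permutation_test(test_a, test_b, labels):
    """Two-sided exact p-value for the AUC difference of two paired tests.

    Under the null hypothesis each subject's pair of ratings is assumed
    exchangeable, so every swap pattern is equally likely.
    """
    observed = auc(test_a, labels) - auc(test_b, labels)
    extreme = 0
    count = 0
    for swaps in product((False, True), repeat=len(labels)):
        a = [y if s else x for x, y, s in zip(test_a, test_b, swaps)]
        b = [x if s else y for x, y, s in zip(test_a, test_b, swaps)]
        diff = auc(a, labels) - auc(b, labels)
        if abs(diff) >= abs(observed) - 1e-12:   # tolerance for float noise
            extreme += 1
        count += 1
    return observed, extreme / count
```

Because the loop visits 2^n patterns, a Monte Carlo sample of swap patterns or an asymptotic approximation, as the abstract suggests, would be the practical choice for larger samples.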

18.
Objective: To provide a solution for calculating the true-positive, false-positive, false-negative, and true-negative results from studies where only the odds ratios (ORs), number of patients with the finding, and number of patients with the target condition are given. Results: The quadratic formula shown here allows investigators conducting systematic reviews to back-calculate the sensitivity, specificity, and likelihood ratios (LRs) from the OR. A spreadsheet that requires only the OR and the row and column totals from the 2 × 2 table enables the back-calculation of the individual true positives, false positives, false negatives, and true negatives. Solutions are also available for the special situations when the OR = 1 or the OR is nonestimable because of zero false positives or false negatives. Conclusions: A simple spreadsheet enables those conducting systematic reviews of diagnostic tests to include studies that report only the OR. This approach should enrich the number of studies retained in meta-analyses of diagnostic tests where the desire is to create summary sensitivity, specificity, or LRs.
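
The back-calculation can be sketched from first principles. Writing the true positives as a, the 2 × 2 margins fix the other cells (FP = r − a, FN = c − a, TN = N − r − c + a), and OR = a·TN / (FP·FN) rearranges to a quadratic in a. The code below is a sketch under that derivation, not the authors' spreadsheet; the function name is hypothetical, the total sample size N is assumed to be known, and the zero-cell (nonestimable OR) cases mentioned in the abstract are not handled.

```python
import math


def back_calculate_2x2(odds_ratio, n_finding, n_condition, n_total):
    """Recover (TP, FP, FN, TN) from the OR and the table margins.

    n_finding:   patients with the finding (test positive), r = TP + FP
    n_condition: patients with the target condition,        c = TP + FN
    n_total:     all patients,                              N
    Solves (OR - 1) a^2 - [OR (r + c) + (N - r - c)] a + OR r c = 0 for a = TP.
    """
    r, c, n = n_finding, n_condition, n_total
    if odds_ratio == 1:
        tp = r * c / n                  # independence: the quadratic is linear
    else:
        qa = odds_ratio - 1.0
        qb = -(odds_ratio * (r + c) + (n - r - c))
        qc = odds_ratio * r * c
        disc = math.sqrt(qb * qb - 4.0 * qa * qc)
        roots = [(-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)]
        # Keep the root that leaves every cell non-negative.
        lo, hi = max(0, r + c - n), min(r, c)
        tp = next(x for x in roots if lo - 1e-9 <= x <= hi + 1e-9)
    fp, fn = r - tp, c - tp
    tn = n - r - c + tp
    return tp, fp, fn, tn
```

Sensitivity (TP / c), specificity (TN / (N − c)), and the likelihood ratios then follow directly from the recovered cells; reported ORs are usually rounded, so the recovered counts may need rounding to whole patients.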

19.
The diagnostic abilities of two or more diagnostic tests are traditionally compared by their respective sensitivities and specificities, either separately or using a summary of them such as Youden's index. Several authors have argued that the likelihood ratios provide a more appropriate, if in practice a less intuitive, comparison. We present a simple graphic which incorporates all these measures and admits easily interpreted comparison of two or more diagnostic tests. We show, using likelihood ratios and this graphic, that a test can be superior to a competitor in terms of predictive values while having either sensitivity or specificity smaller. A decision theoretic basis for the interpretation of the graph is given by relating it to the tent graph of Hilden and Glasziou (Statistics in Medicine, 1996). Finally, a brief example comparing two serodiagnostic tests for Lyme disease is presented. Published in 2000 by John Wiley & Sons, Ltd.

20.
Confidence intervals are important summary measures that provide useful information from clinical investigations, especially when comparing data from different populations or sites. Studies of a diagnostic test should include both point estimates and confidence intervals for the test's sensitivity and specificity. Equally important measures of a test's efficiency are likelihood ratios at each test outcome level. We present a method for calculating likelihood ratio confidence intervals for tests that have positive or negative results, tests with non-positive/non-negative results, and tests reported on an ordinal outcome scale. In addition, we demonstrate a sample size estimation procedure for diagnostic test studies based on the desired likelihood ratio confidence interval. The renewed interest in confidence intervals in the medical literature is important, and should be extended to studies analyzing diagnostic tests.
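
For a test with only positive and negative results, one widely used way to obtain a likelihood ratio confidence interval is the log method, which treats ln(LR) as approximately normal. The sketch below assumes that approach and may differ in detail from the method of the abstract; the function name is hypothetical, all four cells must be non-zero, and the ordinal-scale and sample size procedures are not covered.

```python
import math


def likelihood_ratio_cis(tp, fp, fn, tn, z=1.96):
    """Return ((LR+, lower, upper), (LR-, lower, upper)) by the log method.

    SE(ln LR+) = sqrt(1/TP - 1/(TP+FN) + 1/FP - 1/(FP+TN))
    SE(ln LR-) = sqrt(1/FN - 1/(TP+FN) + 1/TN - 1/(FP+TN))
    """
    diseased, healthy = tp + fn, fp + tn
    sens, spec = tp / diseased, tn / healthy

    lr_pos = sens / (1.0 - spec)
    se_pos = math.sqrt(1.0 / tp - 1.0 / diseased + 1.0 / fp - 1.0 / healthy)

    lr_neg = (1.0 - sens) / spec
    se_neg = math.sqrt(1.0 / fn - 1.0 / diseased + 1.0 / tn - 1.0 / healthy)

    def interval(lr, se):
        # Back-transform the symmetric interval on the log scale.
        return lr, lr * math.exp(-z * se), lr * math.exp(z * se)

    return interval(lr_pos, se_pos), interval(lr_neg, se_neg)
```

Because the interval is built on the log scale, it is asymmetric around the point estimate and never crosses zero.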
