We have developed a model for FROC curve fitting that relates the observer's FROC performance not to the ROC performance that would be obtained if the observer's responses were scored on a per image basis, but rather to a hypothesized ROC performance that the observer would obtain in the task of classifying a set of "candidate detections" as positive or negative. We adopt the assumptions of the Bunch FROC model, namely that the observer's detections are all mutually independent, as well as assumptions qualitatively similar to, but different in nature from, those made by Chakraborty in his AFROC scoring methodology. Under the assumptions of our model, we show that the observer's FROC performance is a linearly scaled version of the candidate analysis ROC curve, where the scaling factors are just given by the FROC operating point coordinates for detecting initial candidates. Further, we show that the likelihood function of the model parameters given observational data takes on a simple form, and we develop a maximum likelihood method for fitting a FROC curve to this data. FROC and AFROC curves are produced for computer vision observer datasets and compared with the results of the AFROC scoring method. Although developed primarily with computer vision schemes in mind, we hope that the methodology presented here will prove worthy of further study in other applications as well.
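
The linear scaling described in this abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that the candidate-analysis ROC curve is available as a list of (false-positive fraction, true-positive fraction) pairs and that the initial candidate detection step has a known FROC operating point; the function name, argument names, and numbers are hypothetical and not taken from the paper.

```python
def froc_from_candidate_roc(roc_points, fp_per_image_0, sensitivity_0):
    """Scale candidate-level ROC points (FPF, TPF) into FROC points.

    fp_per_image_0 and sensitivity_0 are the assumed FROC operating point
    of the initial candidate detection step: mean false positives per image
    and fraction of true lesions found among the candidates.
    """
    return [(fp_per_image_0 * fpf, sensitivity_0 * tpf)
            for fpf, tpf in roc_points]


# Hypothetical candidate-analysis ROC curve and initial operating point.
candidate_roc = [(0.0, 0.0), (0.1, 0.6), (0.5, 0.9), (1.0, 1.0)]
froc = froc_from_candidate_roc(candidate_roc,
                               fp_per_image_0=4.0,
                               sensitivity_0=0.95)
```

Under these assumptions the FROC curve ends at the initial operating point itself (here 4.0 false positives per image at 0.95 sensitivity), because accepting every candidate corresponds to the ROC point (1, 1).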

Risk assessment is now regarded as a necessary competence in psychiatry. The area under the curve (AUC) statistic of the receiver operating characteristic curve is increasingly offered as the main evidence for accuracy of risk assessment instruments. But, even a highly statistically significant AUC is of limited value in clinical practice.

The relationship between gamma camera variables (total counts in image, collimator type, etc) and diagnostic imaging performance was quantitatively investigated using receiver operating characteristic (ROC) curve analysis. A College of American Pathologists (CAP) liver phantom was used with a 99Tcm flood source to generate anterior and lateral liver images containing 'cold lesions'. These images were interpreted by four nuclear medicine physicians, and the areas under the corresponding ROC curves computed. These areas were taken as a quantitative estimate of the imaging performance of the system. The average area under the ROC curve for the four physicians reading the same 'standard' image six times was computed to be 66.8 +/- 5.8. Experiments were performed to show the effect on diagnostic performance of (i) increasing the total image counts from 200k to 2000k, (ii) varying the phantom-to-collimator separation from 0 to 8 cm and (iii) changing the collimator type. In all cases, data were generated which demonstrated the quantitative improvement (or deterioration) resulting from these changes. These data may be used in the design of clinical imaging protocols, for which choices have to be made for each gamma camera variable.

The feasibility of using sequential testing (i.e., using a screening test) to reduce the length and expense of a performance-based examination with standardized-patient cases was demonstrated previously. In the present study, quantitative criteria rather than practical considerations were used to determine optimal values for the length of the screening test (i.e., number of cases) and the location of the screening pass-fail cutoff (i.e., its relation to the mean of the pass levels for the different cases). Data were derived from five classes of senior students at the Southern Illinois University School of Medicine, 1987-1991. Specifically, receiver operating characteristic (ROC) curves were plotted for screening tests of varying lengths, with the points on each ROC curve corresponding to different pass-fail cutoffs on the screening test. The results showed that good accuracy can be attained with a screening test that is only one-third the length of the full examination and that the cutoff for this screen should be set slightly above the mean of the case pass levels to maximize sensitivity and specificity. The authors conclude that their study demonstrates the value of an ROC analysis in evaluating the psychometric properties of a screening test in sequential testing.

This paper raises five methodological questions concerning receiver operating characteristic (ROC) analysis: (1) can the ROC “confidence criterion” be applied in a valid, reliable way?; (2) can ROC deal with ambiguous findings?; (3) can ROC deal effectively with false-negative findings?; (4) are ROC curves susceptible to valid statistical testing?; and (5) are ROC results useful in choosing among alternative imaging modalities? A review of the evidence leads to six conclusions. First, using ROC, all radiological findings must be unambiguously scored as true-positive, true-negative, false-positive, or false-negative, often forcing arbitrary, procrustean choices on readers and evaluators. Second, ROC requires radiologists to report findings by confidence level on a consistent, reliable basis throughout a ROC experiment, something that seems unrealistic given what is known about human performance in almost all perceptual tasks of comparable complexity. Third, as gathered during the typical experiment, ROC data are probably nominal, but are treated as if they were ordinal (or even interval) data, leading to distorted results. Fourth, ROC does not deal effectively with false-negatives, despite their importance. Fifth, there is no satisfactory method for statistically testing the significance of observed differences between two ROC curves if they are based on nominal data. Finally, the artificial tasks required of radiologists in a ROC evaluation limit the usefulness of ROC results in choosing among the imaging modalities.

Swensson RG, King JL, Gur D. Medical Physics 2001;28(8):1597-1609.
We propose a principled formulation of the ROC curve that is constrained in a realistic way by the mechanism of probability summation. The constrained and conventional ROC formulations were fitted to 150 separate sets of rating data taken from previous observer studies of 250 or 529 chest radiographs. A total of 20 different readers had used either discrete or continuous rating scales to evaluate those chest cases for likelihood of separate specified abnormalities: interstitial disease, pulmonary nodule, pneumothorax, alveolar infiltrate, or rib fracture. Both ROC formulations were fitted separately to every set of rating data using maximum-likelihood statistical procedures that specified each ROC curve by normally distributed latent variables with two scaling parameters, and estimated the area below the ROC curve (Az) with its standard error. The conventional and constrained binormal formulations usually fitted ROC curves that were nearly indistinguishable in form and in Az. But when fitted to asymmetric rating data that contained few false-positive cases, the conventional ROC curves often rose steeply, then flattened and extrapolated into an unrealistic upward "hook" at the higher false-positive rates. For those sets of rating data, the constrained ROC curves (without hooks) estimated larger values for Az with smaller standard errors. The constrained ROC formulation describes observers' ratings of cases at least as well as the conventional ROC, and always guarantees a realistic fitted curve for observer performance. Its estimated parameters are easy to interpret, and may also be used to predict observer accuracy in localizing the image abnormalities.
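
The "hook" of the conventional formulation can be reproduced with a short sketch of the standard binormal ROC curve, TPF = Φ(a + b·Φ⁻¹(FPF)), with area Az = Φ(a / √(1 + b²)). The parameter names a and b follow the usual textbook convention and are an assumption here, since the abstract does not name its two scaling parameters. The sketch covers only the conventional curve, not the authors' constrained probability-summation formulation.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def binormal_tpf(fpf, a, b):
    """TPF on the conventional binormal ROC curve at a given FPF (0 < fpf < 1)."""
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpf))


def binormal_az(a, b):
    """Area under the conventional binormal ROC curve."""
    return _STD_NORMAL.cdf(a / sqrt(1.0 + b * b))


# Hypothetical parameters with b well below 1: the curve rises steeply,
# flattens, and drops below the chance diagonal at high FPF before
# hooking up to (1, 1).
tpf_at_high_fpf = binormal_tpf(0.99, a=1.0, b=0.3)
hook_present = tpf_at_high_fpf < 0.99
```

With b = 1 the curve is symmetric and never crosses the diagonal; the crossing appears whenever b differs from 1, and for b < 1 it sits at the high false-positive end, as in the behaviour the abstract describes.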

Receiver operating characteristic (ROC) curves are frequently used in biomedical informatics research to evaluate classification and prediction models for decision support, diagnosis, and prognosis. ROC analysis investigates the accuracy of a model's ability to separate positive from negative cases (such as predicting the presence or absence of disease), and the results are independent of the prevalence of positive cases in the study population. It is especially useful in evaluating predictive models or other tests that produce output values over a continuous range, since it captures the trade-off between sensitivity and specificity over that range. There are many ways to conduct an ROC analysis. The best approach depends on the experiment; an inappropriate approach can easily lead to incorrect conclusions. In this article, we review the basic concepts of ROC analysis, illustrate their use with sample calculations, make recommendations drawn from the literature, and list readily available software.

OBJECTIVE: Medical classification accuracy studies often yield continuous data based on predictive models for treatment outcomes. A popular method for evaluating the performance of diagnostic tests is the receiver operating characteristic (ROC) curve analysis. The main objective was to develop a global statistical hypothesis test for assessing the goodness-of-fit (GOF) for parametric ROC curves via the bootstrap. DESIGN: Simple log (or logit) and more flexible Box-Cox normality transformations were applied to untransformed or transformed data from two clinical studies to predict complications following percutaneous coronary interventions (PCIs) and for image-guided neurosurgical resection results predicted by tumor volume, respectively. We compared a non-parametric with a parametric binormal estimate of the underlying ROC curve. To construct such a GOF test, we used the non-parametric and parametric areas under the curve (AUCs) as the metrics, with a resulting p value reported. RESULTS: In the interventional cardiology example, logit and Box-Cox transformations of the predictive probabilities led to satisfactory AUCs (AUC=0.888; p=0.78, and AUC=0.888; p=0.73, respectively), while in the brain tumor resection example, log and Box-Cox transformations of the tumor size also led to satisfactory AUCs (AUC=0.898; p=0.61, and AUC=0.899; p=0.42, respectively). In contrast, significant departures from GOF were observed without applying any transformation prior to assuming a binormal model (AUC=0.766; p=0.004, and AUC=0.831; p=0.03, respectively). CONCLUSIONS: In both studies the p values suggested that transformations were important to consider before applying any binormal model to estimate the AUC. Our analyses also demonstrated and confirmed the predictive values of different classifiers for determining the interventional complications following PCIs and resection outcomes in image-guided neurosurgery.

The role of background synaptic activity in cortical processing has recently received much attention. How do individual neurons extract information when embedded in a noisy background? When examining the impact of a synaptic input on postsynaptic firing, it is important to distinguish a change in overall firing probability from a true change in neuronal sensitivity to a particular input (synaptic efficacy) that corresponds to a change in detection performance. Here we study the impact of background synaptic input on neuronal sensitivity to individual synaptic inputs using receiver operating characteristic (ROC) analysis. We use the area under the ROC curve as a measure of synaptic efficacy, here defined as the ability of a postsynaptic action potential to identify a particular synaptic input event. An advantage of using ROC analysis to measure synaptic efficacy is that it provides a measure that is independent of postsynaptic firing rate. Furthermore, changes in mean excitation or inhibition, although affecting overall firing probability, do not modulate synaptic efficacy when measured in this way. Changes in overall conductance also affect firing probability but not this form of synaptic efficacy. Input noise, here defined as the variance of the input current, does modulate synaptic efficacy, however. This effect persists when the change in input variance is coupled with a change in conductance (as would result from changing background activity).

The diagnostic role of intraoperative cytology (IC) has been demonstrated by many comparative studies. These studies have used sensitivity and specificity as statistical tools, based on binary principles. Statistical methods based on binary principles appear to be inappropriate for comparing anatomic pathology studies which involve significant human judgment with a range of subjective nonbinary result patterns. In this study, we applied the receiver operating characteristic (ROC) curve, which is based on probabilistic principles for the comparison of diagnostic accuracy with IC and frozen sections (FS). Seven observers studied a variable number of IC alone, FS alone, and IC/FS together from a pool of 446 specimens. The results were analyzed by ROC curve, using the MEDCALC software program (MedCalc Software, Mariakerke, Belgium). The accuracy with IC alone and FS alone was comparable. IC alone was diagnostic for many lesions, offering the choice of not freezing the tissue, and thus avoiding the introduction of artifacts. This strongly favors the routine practice of preparing IC during intraoperative consultation.

Parvovirus B19 infection can cause severe effects in high-risk groups including pregnant women and immunocompromised individuals. Although serological detection of B19 infection is commonplace, minimal information is available on the absolute performance characteristics of various tests for the detection of B19 IgM. The performance of the first parvovirus B19 IgM enzyme immunoassay to be cleared by the US Food and Drug Administration (FDA) is described. The immunoassay cut-off has been established using receiver operating characteristic (ROC) analysis giving a sensitivity and specificity of detection of 89.1 and 99.4%, respectively. No cross-reactivity is observed with IgM to rubella or other viral diseases which cause symptoms similar to those of parvovirus B19. Multi-site reproducibility studies have shown high immunoassay reproducibility with detection rates (observed/expected result) of 100% for both nonreactive (N=324) and strongly reactive (N=403) specimens. Immunoassay reproducibility ranged from 11.76 to 17.46% coefficient of variation for all reactive specimens tested (N=12) whereby each specimen was assayed a total of 81 times. Parvovirus B19 IgM seroprevalence of 1% was observed in a US blood donor population (N=399). In the absence of international performance criteria, this study will be of major benefit to the clinical virologist in assessing immunoassay reliability for the detection of recent infection with parvovirus B19.

A simple iterative method is developed for computing the maximum likelihood estimates of the components of variance and thereby the intraclass and interclass correlations, under multivariate normal assumptions involving two classes. The method works efficiently for both balanced and unbalanced data and can be readily extended to situations involving three or more classes. It is particularly suitable for application to studies of quantitative variables in genetics and is illustrated by using some dermatoglyphic data.

Microaneurysms (MAs) are the first manifestations of diabetic retinopathy (DR) as well as an indicator of its progression. Their automatic detection plays a key role in both mass screening and monitoring and is therefore at the core of any system for computer-assisted diagnosis of DR. The algorithm comprises the following stages: candidate detection, which extracts the patterns possibly corresponding to MAs using the mathematical morphological black top-hat transform; feature extraction, which characterizes these candidates; and classification with a support vector machine (SVM), which validates MAs. The selection of the feature vector and of the SVM kernel function is very important to the algorithm. We use the receiver operating characteristic (ROC) curve to evaluate the distinguishing performance of different feature vectors and different SVM kernel functions. The ROC analysis indicates that the quadratic polynomial SVM with a combination of features as the input shows the best discriminating performance.

An algorithm for estimating haplotypes associated with several quantitative phenotypes is proposed. The concept of a receiver operating characteristic (ROC) curve was introduced, and a linear combination of the quantitative phenotypic values was considered. This set of values was divided into two parts: values for subjects with and without a particular haplotype. The goodness of this partition was evaluated by the area under the ROC curve (AUC). The AUC value varied from 0 to 1; this value was close to 1 when the partition had high accuracy. Therefore, the strength of association between phenotypes and haplotypes was considered to be proportional to the AUC value. In our algorithm, the parameters representing a degree of association between the haplotypes and phenotypes were estimated so as to maximize the AUC value; further, the haplotype with the maximum AUC value was considered to be the best haplotype associated with the phenotypes. This algorithm was implemented in the R language. The effectiveness of our algorithm was evaluated by applying it to real genotype data of the Calpain-10 gene obtained from diabetics. The results showed that our algorithm was more reasonable and advantageous for use with several quantitative phenotypes than the generalized linear model or the neural network model.

OBJECTIVE: To apply meta-analysis to compare the concordance between the results of 2 types of limulus amebocyte lysate (LAL) assay, gelation (GLAL) and chromogenic (CLAL), with the detection of gram-negative bacteremia in patients with suspected bacteremia. DESIGN: Meta-analysis using receiver operating characteristic-based analytical method. DATA SOURCES: MEDLINE literature search and manual reviews of article bibliographies together with direct approaches to authors of potentially eligible studies. STUDY SELECTION: The studies that were selected had all included at least 10 patients, of whom at least 2 patients were diagnosed with gram-negative bacteremia, and all had data available for extraction into a contingency table format. RESULTS: Fifty-six studies (28 GLAL and 28 CLAL studies) met the inclusion criteria. Studies were stratified by type of test (GLAL vs CLAL). Each analysis was repeated with smaller studies excluded. There was no difference between the 2 types of LAL assays. Among the CLAL studies, there was no difference between studies that did versus those that did not use the sepsis syndrome criteria as a basis for patient inclusion. Among 45 studies for which data on the proportion of non-Enterobacteriaceae were available, there was a trend toward higher concordance as this proportion increased. CONCLUSIONS: The concordance between the LAL test and the detection of gram-negative bacteremia in patients with suspected bacteremia is no higher with the CLAL assay than with the original GLAL version. However, the concordance is higher among studies with a higher proportion of non-Enterobacteriaceae among the gram-negative bacteremia isolates.

The diagnostic performance of laboratory tests is generally estimated by means of their sensitivity, specificity, and positive and negative predictive values. Unfortunately, these indices reflect only imperfectly the capacity of a test to correctly classify subjects into clinically relevant subgroups. The ROC (receiver operating characteristic) curve is a tool of choice for this evaluation. Used in the medical domain since the 1960s, the ROC curve is a graphic representation of the relation between the sensitivity and the specificity of a test, calculated for all possible cut-off values. It allows the diagnostic performance of several tests to be determined and compared. It is also used to choose the optimal cut-off of a test, taking into account epidemiological and medico-economic data on the disease. Used in numerous medical domains, this statistical tool is easily accessible thanks to the development of computer software. This article presents the principles of construction and use of a ROC curve.
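
The construction described here, sensitivity and specificity computed at every possible cut-off, can be sketched as follows. The sketch assumes that higher test values indicate disease and that a subject is called positive when the value is at or above the cut-off; the function names and the sample data are hypothetical.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per distinct cut-off.

    labels are 1 for diseased and 0 for non-diseased subjects; a subject is
    called positive when its score is >= the cut-off.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cut-off above every score: nobody called positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical test values for three diseased and three non-diseased subjects.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
auc = auc_trapezoid(curve)
```

For this small data set the area equals 8/9, the fraction of diseased/non-diseased pairs in which the diseased subject has the higher test value. Choosing an optimal cut-off from the curve would additionally require the prevalence and cost information mentioned in the abstract, which this sketch does not model.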

Wilkins-Chalgren agar and Meat-Yeast agar were evaluated as media for antibiotic susceptibility testing using 112 anaerobic bacterial strains. The results obtained with the two media using the diffusion method were compared with those obtained by the dilution method as reference method. The results were analyzed by the receiver operating characteristic (ROC) procedure allowing a graphic representation of sensitivity and specificity of the technique for each cut-off value. The area under the ROC curves was calculated to compare the accuracy of the two methods. Six antibiotics were tested including amoxicillin, cefoxitin, piperacillin, doxycycline and clindamycin. For amoxicillin and clindamycin, the two methods showed a high and identical discriminative power for distinguishing susceptible bacteria from the others. Diffusion in Wilkins-Chalgren agar appeared better than diffusion in Meat-Yeast agar for separating resistant bacteria from bacteria of intermediate susceptibility (amoxicillin p<0.005; clindamycin p<0.04). For other drugs, diffusion in Wilkins-Chalgren agar always had a discriminative power higher than that obtained with diffusion in Meat-Yeast agar for separating susceptible bacteria from the others (cefoxitin p<0.0005; piperacillin p<0.02; doxycycline p<0.05). The Wilkins-Chalgren agar medium thus appeared superior to the Meat-Yeast agar medium using the ROC evaluation method, which would deserve wider utilization in the field of microbiology.

