Similar literature
 Found 20 similar articles (search time: 21 ms)
1.
Group sequential testing procedures have been proposed as an approach to conserving resources in biomarker validation studies. Previously, we derived the asymptotic properties of the sequential empirical positive predictive value (PPV) and negative predictive value (NPV) curves, which summarize the predictive accuracy of a continuous marker, under case‐control sampling. A limitation of this approach is that the prevalence cannot be estimated from a case‐control study and must be assumed known. In this paper, we consider group sequential testing of the predictive accuracy of a continuous biomarker with unknown prevalence. First, we develop asymptotic theory for the sequential empirical PPV and NPV curves when the prevalence must be estimated, rather than assumed known in a case‐control study. We then discuss how our results can be combined with standard group sequential methods to develop group sequential testing procedures and bias‐adjusted estimators for the PPV and NPV curve. The small sample properties of the proposed group sequential testing procedures and estimators are evaluated by simulation, and we illustrate our approach in the context of a study to validate a novel biomarker for prostate cancer. Copyright © 2015 John Wiley & Sons, Ltd.

2.
The accuracy of a binary-scale diagnostic test can be represented by sensitivity (Se), specificity (Sp) and positive and negative predictive values (PPV and NPV). Although Se and Sp measure the intrinsic accuracy of a diagnostic test that does not depend on the prevalence rate, they do not provide information on the diagnostic accuracy for a particular patient. To obtain this information we need to use PPV and NPV. Since PPV and NPV are functions of both the accuracy of the test and the prevalence of the disease, constructing their confidence intervals for a particular patient is not straightforward. In this paper, a novel method for the estimation of PPV and NPV, as well as their confidence intervals, is developed. For both predictive values, standard, adjusted and logit-transformation-based confidence intervals are compared using coverage probabilities and interval lengths in a simulation study. These methods are then applied to two case-control studies: a diagnostic test assessing the ability of the e4 allele of the apolipoprotein E gene (ApoE.e4) to distinguish patients with late-onset Alzheimer's disease (AD) and a prognostic test assessing the predictive ability of a 70-gene signature for breast cancer metastasis.

3.
In the field of diagnostic medicine, comparative clinical trials are necessary for assessing the utility of one diagnostic test over another. The area under the receiver operating characteristic (ROC) curve, commonly referred to as AUC, is a general measure of a test's inherent ability to distinguish between patients with and without a condition. Standardized AUC difference is the most frequently used statistic for comparing two diagnostic tests. In therapeutic comparative clinical trials with sequential patient entry, fixed sample design (FSD) is unjustified on ethical and economic grounds and group sequential design (GSD) is frequently used. In this paper, we argue that the same reasoning applies to comparative clinical trials in diagnostic medicine and hence GSD should be utilized in this field for designing trials. Since computation of the stopping boundaries of GSD and data analysis after a group sequential test rely heavily on Brownian motion approximation, we derive the asymptotic distribution of the standardized AUC difference statistic and point out its resemblance to Brownian motion. Boundary determination and sample size calculation are then illustrated through an example from a cancer clinical trial.

4.
Screening and diagnostic tests are important in disease prevention or control. The predictive values of positive and negative (PPV and NPV) test results are two of four operational characteristics of a screening test. We review an existing method based on the generalized estimating equation (GEE) methodology for comparing predictive values from the same sample of subjects and propose two Wald test statistics derived from the weighted least squares (WLS) method for the analysis of categorical data. Using these results, we propose sample size calculation formulae for this problem. Simulation studies are conducted to compare the performances of the two Wald test statistics (one based on the difference of two PPVs or NPVs, another based on the logarithm of the ratio of two PPVs or NPVs) and the score/Wald test statistic derived from GEE. We recommend using the difference-based WLS approach.

5.
Receiver operating characteristic (ROC) curves and their associated indices are valuable tools for the assessment of the accuracy of diagnostic tests. The area under the ROC curve is a popular summary measure of the accuracy of a test. The full area under the ROC curve, however, has been criticized because it gives equal weight to all false positive error rates. Alternative indices include the area under the ROC curve in a particular range of false positive rates (‘partial’ area) and the sensitivity of the test for a single fixed false positive rate (FPR). We present a unified approach for computing sample size for binormal ROC curves and their indices. Our method uses Taylor series expansions to derive approximate large-sample estimates of the variance and covariance of binormal ROC curve parameters. Several examples from diagnostic radiology illustrate the proposed method. © 1997 John Wiley & Sons, Ltd.
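For readers unfamiliar with the binormal model mentioned in this abstract, the sketch below shows the standard textbook parameterization, not the authors' sample size method. It assumes the usual convention a = (mean of diseased − mean of non-diseased) / SD of diseased and b = SD of non-diseased / SD of diseased; the function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def binormal_roc(fpr: float, a: float, b: float) -> float:
    """Sensitivity at a false positive rate under the binormal model.

    ROC(t) = Phi(a + b * Phi^{-1}(t)); fpr must lie strictly in (0, 1).
    """
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpr))


def binormal_auc(a: float, b: float) -> float:
    """Full area under the binormal ROC curve: Phi(a / sqrt(1 + b^2))."""
    return _STD_NORMAL.cdf(a / sqrt(1.0 + b * b))
```

With a = 0 and b = 1 the two score distributions coincide, so the curve is the chance diagonal and the area is 0.5. The sensitivity at a fixed FPR named in the abstract corresponds to a single call to `binormal_roc`.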

6.
ROC curves and summary measures of accuracy derived from them, such as the area under the ROC curve, have become the standard for describing and comparing the accuracy of diagnostic tests. Methods for estimating ROC curves rely on the existence of a gold standard which dichotomizes patients into disease present or absent. There are, however, many examples of diagnostic tests whose gold standards are not binary-scale, but rather continuous-scale. Unnatural dichotomization of these gold standards leads to bias and inconsistency in estimates of diagnostic accuracy. In this paper, we propose a non-parametric estimator of diagnostic test accuracy which does not require dichotomization of the gold standard. This estimator has an interpretation analogous to the area under the ROC curve. We propose a confidence interval for test accuracy and a statistical test for comparing accuracies of tests from paired designs. We compare the performance (i.e. CI coverage, type I error rate, power) of the proposed methods with several alternatives. An example is presented where the accuracies of two quick blood tests for measuring serum iron concentrations are estimated and compared.

7.
The Youden index is widely used in studies evaluating the accuracy of diagnostic tests and the performance of predictive, prognostic, or risk models. However, both one-sample and two-independent-sample tests on the Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, resulting in potentially misleading findings. In addition, no paired-sample test on the Youden index is currently available. This article develops efficient statistical inference procedures for one-sample, independent-sample, and paired-sample tests on the Youden index by accounting for contingency correlation, namely the associations between sensitivity and specificity and between paired samples typically represented in contingency tables. For the one-sample and two-independent-sample tests, the variances are estimated by the delta method, and the statistical inference is based on central limit theory; both are then verified by bootstrap estimates. For the paired-sample test, we show that the estimated covariance of the two sensitivities and specificities can be represented as a function of the kappa statistic, so the test can be readily carried out. We then show the remarkable accuracy of the estimated variance using a constrained optimization approach. Simulation is performed to evaluate the statistical properties of the derived tests. The proposed approaches yield more stable type I errors at the nominal level and substantially higher power (efficiency) than does the original Youden's approach. Therefore, the simple explicit large sample solution performs very well. Because we can readily implement the asymptotic and exact bootstrap computation with common software like R, the method is broadly applicable to the evaluation of diagnostic tests and model performance. Copyright © 2015 John Wiley & Sons, Ltd.
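As a reminder of the quantity this abstract is about, the following minimal sketch computes the point estimate of the Youden index, J = sensitivity + specificity − 1, from a 2×2 table. It covers only the definition; the variance estimation and tests described in the abstract are not reproduced here, and the function name and argument order are illustrative assumptions.

```python
def youden_index(tp: int, fn: int, tn: int, fp: int) -> float:
    """Point estimate of the Youden index from 2x2 table counts.

    tp/fn are true positives and false negatives among diseased subjects;
    tn/fp are true negatives and false positives among non-diseased subjects.
    """
    diseased = tp + fn
    nondiseased = tn + fp
    if diseased == 0 or nondiseased == 0:
        raise ValueError("both disease groups must contain at least one subject")
    sensitivity = tp / diseased
    specificity = tn / nondiseased
    return sensitivity + specificity - 1.0
```

A test that classifies at chance level has J = 0, and a perfect test has J = 1.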

8.
In this paper, we develop methods to combine multiple biomarker trajectories into a composite diagnostic marker using functional data analysis (FDA) to achieve better diagnostic accuracy in monitoring disease recurrence in the setting of a prospective cohort study. In such studies, the disease status is usually verified only for patients with a positive test result in any biomarker and is missing in patients with negative test results in all biomarkers. Thus, the test result will affect disease verification, which leads to verification bias if the analysis is restricted only to the verified cases. We treat verification bias as a missing data problem. Under both missing at random (MAR) and missing not at random (MNAR) assumptions, we derive the optimal classification rules using the Neyman-Pearson lemma based on the composite diagnostic marker. We estimate thresholds adjusted for verification bias to dichotomize patients as test positive or test negative, and we evaluate the diagnostic accuracy using the verification bias corrected area under the ROC curves (AUCs). We evaluate the performance and robustness of the FDA combination approach and assess the consistency of the approach through simulation studies. In addition, we perform a sensitivity analysis of the dependency between the verification process and disease status for the approach under the MNAR assumption. We apply the proposed method to data from the Religious Orders Study and from a non-small cell lung cancer trial.

9.
An important measure for comparison of accuracy between two diagnostic procedures is the difference in paired areas under the receiver operating characteristic (ROC) curves. Non-parametric and maximum likelihood methods have been proposed for interval estimation for the difference in paired areas under ROC curves. However, these two methods are asymptotic procedures and their performance in finite sample sizes has not been thoroughly investigated. We propose to use the concept of generalized pivotal quantities (GPQs) to construct an exact confidence interval for the difference in paired areas under ROC curves. A simulation study is conducted to empirically investigate the probability coverage and expected length of the three methods for various combinations of sample sizes, values of the area under the ROC curve and correlations. Simulation results demonstrate that the exact confidence interval based on the concept of GPQs provides not only sufficient probability coverage but also reasonable expected length. Numerical examples using published data sets illustrate the proposed method.

10.
Receiver operating characteristic (ROC) curves can be used to assess the accuracy of tests measured on ordinal or continuous scales. The most commonly used measure for the overall diagnostic accuracy of diagnostic tests is the area under the ROC curve (AUC). A gold standard (GS) test on the true disease status is required to estimate the AUC. However, a GS test may sometimes be too expensive or infeasible. Therefore, in many medical research studies, the true disease status of the subjects may remain unknown. Under the normality assumption on test results from each disease group of subjects, using the expectation‐maximization (EM) algorithm in conjunction with a bootstrap method, we propose a maximum likelihood‐based procedure for the construction of confidence intervals for the difference in paired AUCs in the absence of a GS test. Simulation results show that the proposed interval estimation procedure yields satisfactory coverage probabilities and interval lengths. The proposed method is illustrated with two examples. Copyright © 2009 John Wiley & Sons, Ltd.

11.
OBJECTIVE: For diagnostic tests, the most common graphical representation of the information is the receiver-operating characteristic (ROC) curve. The "agreement chart" displays the information of two observers independently classifying the same n items into the same k categories, and can be used if one considers one of the "observers" as the diagnostic test and the other as the known outcome. This study compares the two charts and their ability to visually portray the various relevant summary statistics that assess how good a diagnostic test may be, such as sensitivity, specificity, predictive values, and likelihood ratios. STUDY DESIGN AND SETTING: The geometric relationships displayed in the charts are first described. The relationship between the two graphical representations and various summary statistics is illustrated using data from three common epidemiologically relevant health issues: coronary heart disease, screening for breast cancer, and screening for tuberculosis. RESULTS: Whereas the ROC curve incorporates information on sensitivity and specificity, the agreement chart includes information on the positive and negative predictive values of the diagnostic test. CONCLUSION: The agreement chart should be considered as an alternative visual representation to the ROC for diagnostic tests.

12.
The area under the receiver operating characteristic (ROC) curve (AUC) is a widely accepted summary index of the overall performance of diagnostic procedures and the difference between AUCs is often used when comparing two diagnostic systems. We developed an exact non-parametric statistical procedure for comparing two ROC curves in paired design settings. The test, which is based on all permutations of the subject-specific rank ratings, is formally a test for equality of ROC curves that is sensitive to the alternatives of AUC difference. The operating characteristics of the proposed test were evaluated using extensive simulations over a wide range of parameters. The proposed procedure can be easily implemented in experimental ROC data sets. For small samples and for underlying parameters that are common in experimental studies in diagnostic imaging, the test possesses good operating characteristics and is more powerful than the conventional non-parametric procedure for AUC comparisons. We also derived an asymptotic version of the test which uses an exact estimate of the variance in the permutation space and provides a good approximation even when the sample sizes are small. This asymptotic procedure is a simple and precise approximation to the exact test and is useful for large sample sizes where the exact test may be computationally burdensome.

13.
From the patients’ management perspective, a good diagnostic test should contribute to both reflecting the true disease status and improving clinical outcomes. The diagnostic randomized clinical trial is designed to combine both diagnostic tests and therapeutic interventions. Evaluation of diagnostic tests is carried out with therapeutic outcomes as the primary endpoint rather than test accuracy. We lay out the probability framework for evaluating such trials. We compare two commonly referred designs (the two‐arm design and the paired design) in a formal statistical hypothesis testing setup and identify the causal connection between the two tests. The paired design is shown to be more efficient than the two‐arm design. The efficiency gains vary depending on the discordant rates of test results. We derive sample size formulas for both binary and continuous endpoints. We derive estimation of important quantities under the paired design and also conduct simulation studies to verify the theoretical results. We illustrate the method with an example of designing a randomized study on preoperative staging of bladder cancer. Copyright © 2012 John Wiley & Sons, Ltd.

14.
Receiver operating characteristic (ROC) curves are commonly used to summarize the classification accuracy of diagnostic tests. It is not uncommon in medical practice that multiple diagnostic tests are routinely performed or multiple disease markers are available for the same individuals. When the true disease status is verified by a gold standard (GS) test, a variety of methods have been proposed to combine such potentially correlated tests to increase the accuracy of disease diagnosis. In this article, we propose a method of combining multiple diagnostic tests in the absence of a GS. We assume that the test values and their classification accuracies are dependent on covariates. Simulation studies are performed to examine the performance of the combination method. The proposed method is applied to data from a population-based aging study to compare the accuracy of three screening tests for kidney function and to estimate the prevalence of moderate kidney impairment.

15.
Clinical studies to evaluate the relative accuracies of two diagnostic modalities via their receiver operating characteristic (ROC) curves are currently conducted using fixed sample designs: cases are accrued until a predetermined sample size is achieved and, at that point, the areas under the ROC curves are computed and compared (Radiology 1982; 143:29-36; Radiology 1983; 148:839-843). In prospective ROC studies (Radiology 1990; 175:571-575), participants are recruited from a clinically defined cohort and diagnostic test information is obtained and interpreted in advance of ascertaining the definitive proof of diagnosis ('gold standard'). In retrospective studies, cases are selected from a set of patient records and their diagnostic tests are interpreted without knowledge of the 'gold standard'. The conduct of ROC studies requires considerable effort and resources, particularly for the collection of 'gold standard' information. Thus, it is highly desirable to search for designs which are more efficient than using a fixed sample. In this paper, we discuss the formulation and application of group sequential designs (GSDs) to comparative ROC studies based on non-parametric Wilcoxon estimators of the area under the ROC curves. The approach is applicable to comparisons of ROC curve areas of two tests measured on either continuous or ordinal scales on the same cases ('paired' designs) with one reader. The adoption of GSDs may lead to substantial savings in the number of required cases, thus resulting in both time and resource use efficiency.
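The non-parametric Wilcoxon estimator of the area under the ROC curve that this abstract builds on can be sketched as follows. This is the standard Mann-Whitney form (the proportion of diseased/non-diseased pairs ranked correctly, with ties counted as one half), assuming that higher scores indicate disease; the group sequential boundaries themselves are not shown, and the function name is illustrative.

```python
from typing import Sequence


def wilcoxon_auc(diseased: Sequence[float], nondiseased: Sequence[float]) -> float:
    """Wilcoxon (Mann-Whitney) estimate of the area under the ROC curve.

    Counts, over all (diseased, non-diseased) pairs, how often the diseased
    score is higher; tied pairs contribute one half.
    """
    if len(diseased) == 0 or len(nondiseased) == 0:
        raise ValueError("both groups must contain at least one score")
    total = 0.0
    for x in diseased:
        for y in nondiseased:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(diseased) * len(nondiseased))
```

The double loop costs O(mn) comparisons, which is adequate for a sketch; a rank-based formulation would be preferable for large samples. Because ties are handled explicitly, the same function works for ordinal rating scales as well as continuous scores.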

16.
Regional confidence bands for ROC curves   (Total citations: 2; self-citations: 0; citations by others: 2)
The performance of a diagnostic test is characterised by its specificity and sensitivity. For a quantitative diagnostic test these criteria depend on the selected cut-off point. The receiver operating characteristic (ROC) curve of a quantitative diagnostic test is generated by plotting sensitivity against specificity as the cut-off point runs through the whole range of possible test values. In practice, the ROC curve is estimated from clinical data. One important goal is to select an optimal cut-off point. For this purpose the sample variability has to be taken into account. Recently, Campbell has introduced nonparametric asymptotic simultaneous confidence bands that are valid for the whole ROC curve. In this paper a nonparametric asymptotic approach for the construction of regional confidence bands for ROC curves is proposed. It can be applied for any specificity interval of interest. Our approach is based on the asymptotic theory of empirical and quantile processes. To investigate the small sample properties of the different approaches, a Monte Carlo study was conducted using normal and log-normal data. A method for sample size calculation is presented. Finally, the approaches are applied to a tumour marker in the diagnosis of bone marrow metastases.

17.
Technologic advances give rise to new tests for detecting disease in many fields, including cancer and sexually transmitted disease. Before a new disease screening test is approved for public use, its accuracy should be shown to be better than or at least not inferior to an existing test. Standards do not yet exist for designing and analysing studies to address this issue. Established principles for the design of therapeutic studies can be adapted for studies of screening tests. In particular, drawing upon methods for superiority and non-inferiority studies of therapeutic agents, we propose that confidence intervals for the relative accuracy of dichotomous tests drive the design of comparative studies of disease screening tests. We derive sample size formulae for a variety of designs, including studies where patients undergo several tests and studies where patients receive only one of the tests under evaluation. Both cohort and case-control study designs are considered. Modifications to the confidence intervals and sample size formulae are discussed to accommodate studies where, because of the invasive nature of definitive testing, true disease status can only be obtained for subjects who are positive on one or more of the screening tests. The methods proposed are applied to a study comparing a modified pap test to the conventional pap for cervical cancer screening. The impact of error in the gold standard reference test on the design and evaluation of comparative screening test studies is also discussed.

18.
In a meta‐analysis of diagnostic accuracy studies, the sensitivities and specificities of a diagnostic test may depend on the disease prevalence since the severity and definition of disease may differ from study to study due to the design and the population considered. In this paper, we extend the bivariate nonlinear random effects model on sensitivities and specificities to jointly model the disease prevalence, sensitivities and specificities using trivariate nonlinear random‐effects models. Furthermore, as an alternative parameterization, we also propose jointly modeling the test prevalence and the predictive values, which reflect the clinical utility of a diagnostic test. These models allow investigators to study the complex relationship among the disease prevalence, sensitivities and specificities; or among test prevalence and the predictive values, which can reveal hidden information about test performance. We illustrate the proposed two approaches by reanalyzing the data from a meta‐analysis of radiological evaluation of lymph node metastases in patients with cervical cancer and a simulation study. The latter illustrates the importance of carefully choosing an appropriate normality assumption for the disease prevalence, sensitivities and specificities, or the test prevalence and the predictive values. In practice, it is recommended to use model selection techniques to identify a best‐fitting model for making statistical inference. In summary, the proposed trivariate random effects models are novel and can be very useful in practice for meta‐analysis of diagnostic accuracy studies. Copyright © 2009 John Wiley & Sons, Ltd.

19.

Background

In epidemiologic studies, cancer stage is an important predictor of outcomes. However, cancer stage is typically unavailable in medical insurance claims datasets, thus limiting the usefulness of such data for epidemiologic studies. Therefore, we sought to develop an algorithm to predict cancer stage based on covariates available from claims-based data.

Methods

We identified a cohort of 77,306 women age ≥ 66 years with stage I-IV breast cancer, using the Surveillance, Epidemiology, and End Results (SEER)-Medicare database. We formulated an algorithm to predict cancer stage using covariates (demographic, tumor, and treatment characteristics) obtained from claims. Logistic regression models derived prediction equations in a training set, and equations' test characteristics (sensitivity, specificity, positive predictive value [PPV], and negative predictive value [NPV]) were calculated in a validation set.

Results

Of the entire sample of women diagnosed with invasive breast cancer, 51% had stage I; 26% stage II; 11% stage III; and 4% stage IV disease. The equation predicting stage IV disease achieved sensitivity of 81%, specificity 89%, positive predictive value (PPV) 24%, and negative predictive value (NPV) 99%, while the equation distinguishing stage I/II from stage III disease achieved sensitivity 83%, specificity 78%, PPV 98%, and NPV 31%. Combined, the equations most accurately identified early stage disease and ascertained a sample in which 98% of patients were stage I or II.

Conclusions

A claims-based algorithm was utilized to predict breast cancer stage, and was particularly successful when used to identify early stage disease. These prediction equations may be applied in future studies of breast cancer patients, substantially improving the utility of claims-based studies in this group. This method may similarly be employed to develop algorithms permitting claims-based epidemiologic studies of patients with other cancers.
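The low PPV reported for the stage IV equation in this abstract, despite good sensitivity and specificity, follows from Bayes' theorem when the target condition is rare. The sketch below is a rough consistency check rather than a reproduction of the study's calculation: it assumes the 4% share of stage IV disease in the entire sample can stand in for the prevalence in the validation set, which the abstract does not state. The function name is illustrative.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be proportions in [0, 1]")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    if true_pos + false_pos == 0.0 or true_neg + false_neg == 0.0:
        raise ValueError("predictive values are undefined for these inputs")
    return (true_pos / (true_pos + false_pos),
            true_neg / (true_neg + false_neg))


# Stage IV equation from the Results: sensitivity 81%, specificity 89%;
# the 4% prevalence is an assumption taken from the whole-sample stage mix.
ppv, npv = predictive_values(0.81, 0.89, 0.04)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under that assumption the computed values are roughly 0.23 and 0.99, close to the reported PPV of 24% and NPV of 99%; the small gap is consistent with rounding of the published inputs.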

20.
Although statistical methodology is well‐developed for comparing diagnostic tests in terms of their sensitivity and specificity, comparative inference about predictive values is not. In this paper, we consider the analysis of studies comparing operating characteristics of two diagnostic tests that are measured on all subjects and have test outcomes from multiple sites, with a varying number of sites among subjects. We have developed a new approach for comparing sensitivity, specificity, positive predictive value, and negative predictive value with simple variance calculation and, in particular, focus on comparing tests using the difference of positive and negative predictive values. Simulation studies are conducted to show the performance of our approach. We analyze real data on patients with lung cancer, based on their diagnostic tests, to illustrate the methodology. Copyright © 2015 John Wiley & Sons, Ltd.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号