Similar Literature
20 similar records retrieved
1.
Medical Education 2011; 45: 1199–1208. Context Cut‐scores, reliability and validity vary among standard‐setting methods. The modified Angoff method (MA) is a well‐known standard‐setting procedure, but the three‐level Angoff approach (TLA), a recent modification, has not been extensively evaluated. Objectives This study aimed to compare standards and pass rates in an objective structured clinical examination (OSCE) obtained using two methods of standard setting with discussion and reality checking, and to assess the reliability and validity of each method. Methods A sample of 105 medical students participated in a 14‐station OSCE. Fourteen and 10 faculty members took part in the MA and TLA procedures, respectively. In the MA, judges estimated the probability that a borderline student would pass each station. In the TLA, judges estimated whether a borderline examinee would perform the task correctly or not. Having given individual ratings, judges discussed their decisions. One week after the examination, the procedure was repeated using normative data. Results The mean score for the total test was 54.11% (standard deviation: 8.80%). The MA cut‐scores for the total test were 49.66% and 51.52% after discussion and reality checking, respectively (the consequent percentages of passing students were 65.7% and 58.1%, respectively). The TLA yielded mean pass scores of 53.92% and 63.09% after discussion and reality checking, respectively (rates of passing candidates were 44.8% and 12.4%, respectively). Compared with the TLA, the MA showed higher agreement between judges (0.94 versus 0.81) and a narrower 95% confidence interval in standards (3.22 versus 11.29). Conclusions The MA seems a more credible and reliable procedure with which to set standards for an OSCE than does the TLA, especially when a reality check is applied.
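To make the arithmetic concrete, here is a minimal Python sketch of how the two procedures turn judge estimates into cut-scores and pass rates. All data are simulated, and scoring the TLA's three levels as 1/0.5/0 is an assumption for illustration, not the paper's specification; the cut-score is taken as the mean rating over judges and stations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Modified Angoff (MA): 14 judges give, per station, the probability (0-1)
# that a borderline examinee would pass.
ma_ratings = rng.uniform(0.3, 0.8, size=(14, 14))         # judges x stations

# Three-level Angoff (TLA): 10 judges mark each station as performed
# correctly / unsure / not correctly, scored here as 1 / 0.5 / 0 (assumed).
tla_ratings = rng.choice([0.0, 0.5, 1.0], size=(10, 14))  # judges x stations

def angoff_cut_score(ratings: np.ndarray) -> float:
    """Cut-score = mean rating over judges and stations, as a percentage."""
    return ratings.mean() * 100

def pass_rate(total_scores: np.ndarray, cut: float) -> float:
    """Fraction of examinees whose total score meets the cut-score."""
    return float((total_scores >= cut).mean())

# Simulated total-test scores matching the study's reported mean and SD.
scores = rng.normal(54.11, 8.80, size=105)
for name, r in [("MA", ma_ratings), ("TLA", tla_ratings)]:
    cut = angoff_cut_score(r)
    print(f"{name}: cut-score {cut:.2f}%, pass rate {pass_rate(scores, cut):.1%}")
```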

2.
CONTEXT: Medical schools in the UK set their own graduating examinations and pass marks. In a previous study we examined the equivalence of passing standards using the Angoff standard-setting method. To address the limitation this imposed on that work, we undertook further research using a standard-setting method specifically designed for objective structured clinical examinations (OSCEs). METHODS: Six OSCE stations were incorporated into the graduating examinations of 3 of the medical schools that took part in the previous study. The borderline group method (BGM) or borderline regression method (BRM) was used to derive the pass marks for all stations in the OSCE. We compared passing standards at the 3 schools. We also compared the results within the schools with their previously generated Angoff pass marks. RESULTS: The pass marks derived using the BGM or BRM were consistent across 2 of the 3 schools, whereas the third school generated pass marks which were (with a single exception) much lower. Within-school comparisons of pass marks revealed that in 2 schools the pass marks generally did not significantly differ using either method, but for 1 school the Angoff mark was consistently and significantly lower than the BRM. DISCUSSION: The pass marks set using the BGM or BRM were more consistent across 2 of the 3 medical schools than pass marks set using the Angoff method. However, 1 medical school set significantly different pass marks from the other 2 schools. Although this study is small, we conclude that passing standards at different medical schools cannot be guaranteed to be equivalent.

3.
INTRODUCTION: An earlier study showed that an Angoff procedure with ≥ 10 recently graduated students as judges can be used to estimate the passing score of a progress test. As the acceptability and feasibility of this approach are questionable, we conducted an Angoff procedure with test item writers as judges. This paper reports on the reliability and credibility of this procedure and compares the standards set by the two different panels. METHODS: Fourteen item writers judged 146 test items. Recently graduated students had assessed these items in a previous study. Generalizability was investigated as a function of the number of items and judges. Credibility was judged by comparing the pass/fail rates associated with the Angoff standard, a relative standard and a fixed standard. The Angoff standards obtained by item writers and graduates were compared. RESULTS: The variance associated with consistent variability of item writers across items was 1.5% and for graduate students it was 0.4%. An acceptable error score required 39 judges. Item-Angoff estimates of the two panels and item P-values correlated highly. Failure rates of 57%, 55% and 7% were associated with the item writers' standard, the fixed standard and the graduates' standard, respectively. CONCLUSION: The graduates' and the item writers' standards differed substantially, as did the associated failure rates. A panel of 39 item writers is not feasible. The item writers' passing score appears to be less credible. The credibility of the graduates' standard needs further evaluation. The acceptability and feasibility of a panel consisting of both students and item writers may be worth investigating.

4.
INTRODUCTION: Progress testing is an assessment method that samples the complete domain of knowledge that is considered pertinent to undergraduate medical education. Because of the comprehensive nature of this test, it is very difficult to set a passing score. We obtained a progress test standard using an Angoff procedure with recent graduates as judges. This paper reports on the reliability and credibility of this approach. METHODS: The Angoff procedure was applied to a sample of 146 progress test items. The items were judged by a panel of eight recently graduated students. Generalizability theory was used to investigate the reliability as a function of the number of items and judges. Credibility was judged by comparing the pass/fail rates resulting from the standard arrived at by the Angoff procedure with those obtained using a relative and a fixed standard. RESULTS: The results indicate that an acceptable error score can be achieved, yielding a precision within one percentage point on the scoring scale, by using 10 judges on a full-length progress test (i.e. 250 items). The pass/fail rates associated with the Angoff standard came closest to those of the relative standard, which takes variations in test difficulty into account. A high correlation was found between item-Angoff estimates and the item P-values. CONCLUSION: The results of this study suggest that the Angoff procedure, using recently graduated students as judges, is an appropriate standard setting method for a progress test.
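The reliability projections in this kind of study follow decision-study logic: estimate variance components from a judges × items matrix of Angoff ratings, then project the standard error of the cut-score for other panel and test sizes. The sketch below uses the standard two-way crossed ANOVA estimators with the items treated as the fixed test form; it is a generic illustration under those assumptions, not the paper's own analysis.

```python
import numpy as np

def variance_components(x: np.ndarray):
    """Variance components for a fully crossed judges x items design,
    one rating per (judge, item) cell."""
    n_j, n_i = x.shape
    grand = x.mean()
    ms_j = n_i * ((x.mean(axis=1) - grand) ** 2).sum() / (n_j - 1)
    ms_i = n_j * ((x.mean(axis=0) - grand) ** 2).sum() / (n_i - 1)
    ss_res = ((x - grand) ** 2).sum() - ms_j * (n_j - 1) - ms_i * (n_i - 1)
    ms_res = ss_res / ((n_j - 1) * (n_i - 1))
    var_j = max((ms_j - ms_res) / n_i, 0.0)   # judge (stringency) variance
    var_i = max((ms_i - ms_res) / n_j, 0.0)   # item difficulty variance
    return var_j, var_i, ms_res               # ms_res = residual variance

def cut_score_se(var_j: float, var_res: float, n_judges: int, n_items: int):
    """SE of the Angoff standard, treating the items as the fixed test."""
    return np.sqrt(var_j / n_judges + var_res / (n_judges * n_items))

# Simulated panel: 8 judges x 146 items, with per-judge stringency.
rng = np.random.default_rng(1)
lean = rng.normal(0.0, 0.05, size=(8, 1))
ratings = np.clip(0.65 + lean + rng.normal(0, 0.12, size=(8, 146)), 0, 1)
vj, vi, vr = variance_components(ratings)
for n in (8, 10, 39):                 # panel sizes discussed in the studies
    print(n, "judges ->", round(float(cut_score_se(vj, vr, n, 250)), 4))
```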

5.
BACKGROUND: Objective structured clinical examination (OSCE) standard-setting procedures are not well developed and are often time-consuming and complex. We report an evaluation of a simple 'contrasting groups' method, applied to an OSCE conducted simultaneously in three separate schools. SUBJECTS: Medical students undertaking an end-of-fifth year multidisciplinary OSCE. METHODS: Using structured marking sheets, pairs of examiners independently scored student performance at each OSCE station. Examiners also provided a global rating of overall performance. The actual scores of any borderline candidates at each station were averaged to provide a passing score for each station. The passing scores for all stations were combined to become the passing score for the whole exam. Validity was determined by making comparisons with performance on other fifth-year assessments. Reliability measures comprised interschool agreement, interexaminer agreement and interstation variability. RESULTS: The approach was simple and had face validity. There was a stronger association between the performance of borderline candidates on the OSCE and their in-course assessments than with their performance on the written exam, giving a weak measure of construct validity in the absence of a better 'gold standard'. There was good agreement between examiners in identifying borderline candidates. There were significant differences between schools in the borderline score for some stations, which disappeared when more than three stations were aggregated. CONCLUSION: This practical method provided a valid and reliable competence-based pass mark. Combining marks from all stations before determining the pass mark was more reliable than making decisions based on individual stations.
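A minimal sketch of the averaging step described above, using hypothetical per-station records; the column names and the final aggregation (the mean of the station passing scores) are illustrative assumptions rather than the authors' exact computation.

```python
import pandas as pd

# Hypothetical records: one row per examinee per station, with the
# checklist score and the examiners' global rating of overall performance.
results = pd.DataFrame({
    "station": [1, 1, 1, 2, 2, 2],
    "score":   [62.0, 48.0, 55.0, 70.0, 52.0, 58.0],
    "global":  ["pass", "borderline", "borderline", "pass", "borderline", "fail"],
})

# Passing score per station = mean actual score of borderline candidates.
station_cuts = (results[results["global"] == "borderline"]
                .groupby("station")["score"].mean())

# Whole-exam pass mark = station passing scores combined; the study found
# aggregating over stations more reliable than station-level decisions.
exam_cut = station_cuts.mean()
print(station_cuts.to_dict(), round(exam_cut, 2))
```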

6.
H Morrison, H McNally, C Wylie, P McFaul, W Thompson. Medical Education 1996; 30(5): 345–348.
The objective structured clinical examination (OSCE) now has an established place in the assessment of the medical undergraduate. While much has been written about the reliability of the OSCE, empirical work on the determination of the passing score which represents competence on the OSCE is rarely encountered. If the OSCE is to play its role in the 'high stakes' testing of clinical competence, it is important that this passing score be set reliably and defensibly. This article illustrates how a two-session modified Angoff standard-setting procedure is used to set the passing score on a 14-station Obstetrics and Gynaecology OSCE used to assess final year students at The Queen's University of Belfast. The Angoff methodology harnesses the professional judgement of expert judges to establish defensible standards. Four university teachers, five non-academic consultants and six junior clinical staff took part in a two-session Angoff standard-setting procedure. In the first session, the judges (individually and in silence) used their professional judgement to estimate the score which a minimally competent final year obstetrics and gynaecology student should achieve on each tested element of the OSCE. In the second session they revised their session 1 judgements in the light of the OSCE scores of real students and the opportunity for structured discussion. The passing score for the OSCE is reported together with the statistical measures which assure its reliability.

7.
This study represents an attempt at incorporating empirical item difficulty data into the Angoff standard-setting procedure without affecting the subjective judgment of the raters. The Rasch-model ability level corresponding to minimal competence was estimated for each of 536 items on the American Association of State Social Work Boards (AASSWB) social work licensure examinations from their empirical calibrations and Angoff ratings. The mean of these estimates for all items on a given examination was taken as the level of minimal competence of the entire examination. This procedure yielded raw passing scores that were 4 to 6 items lower (out of 150) and pass rates that were 7% to 15% higher than those obtained using the "standard" Angoff procedure.

8.
While Objective Structured Clinical Examinations (OSCEs) have become widely used to assess clinical competence at the end of undergraduate medical courses, the method of setting the passing score varies greatly, and there is no agreed best methodology. While there is an assumption that the passing standard at graduation is the same at all medical schools, there is very little quantitative evidence in the field. In the United Kingdom, there is no national licensing examination; each medical school sets its own graduating assessment and successful completion by candidates leads to the licensed right to practice by the General Medical Council. Academics at five UK medical schools were asked to set passing scores for six OSCE stations using the Angoff method, following a briefing session on this technique. The results were collated and analysed. The passing scores set for each of the stations varied widely across the five medical schools. The implication for individual students at the different medical schools is that a student with the same level of competency may pass at one medical school but would fail at another even when the test is identical. Postulated reasons for this difference include different conceptions of the minimal level of competence acceptable for graduating students and the possible unsuitability of the Angoff method for performance based clinical tests.

9.
OSCE examinations were held in May and June 2002 for all third and fourth year and some fifth year medical students at the University of Leeds. There has been an arbitrary pass mark of 65% for these examinations. However, we recognise that it is important to adopt a systematic approach towards standard setting in all examinations, so we held a trial of the borderline approach to standard setting for the third and fifth year examinations. This paper reports our findings. The results for the year 3 OSCE demonstrated that the borderline approach to standard setting is feasible and offers a method to ensure that the pass standard is both justifiable and credible. It is efficient, requiring much less time than other methods, and has the advantage of using the judgements of expert clinicians about actual practice. In addition it offers a way of empowering clinicians because it uses their expertise.

10.
In 1994 and 1995, the Medical Council of Canada used an innovative approach to set the pass mark on its large scale, multi-center national OSCE which is designed to assess basic clinical and communication skills in physicians in Canada after 15 months of post-graduate medical training. The goal of this article is to describe the new approach and to present the experience with the method during its first two years of operation. The approach utilizes the global judgments of the physician examiners at each station to identify the candidates with borderline performances. The scores of the candidates whose performances are judged to be borderline are summed for each station, yielding an initial passing score for all stations and then the examination as a whole. The latter score is then adjusted upward one standard error of measurement for the final passing score and is used as one of the criteria to pass the examination. Based on the results to date, the new approach has worked well. The advantages, disadvantages and areas of possible refinement for the approach are reviewed.
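The procedure maps onto a short sketch, under several assumptions not fixed by the abstract: the station cut is taken as the mean score of candidates judged borderline at that station, station cuts are summed into the whole-exam score, and the SEM is derived from the total-score SD and a supplied reliability coefficient.

```python
import numpy as np

def mcc_pass_mark(station_scores, global_ratings, reliability):
    """station_scores: examinees x stations array of station scores.
    global_ratings: matching array of examiner global judgements
    ('pass' / 'borderline' / 'fail').  Assumes every station has at
    least one borderline candidate.  Returns the adjusted pass mark."""
    scores = np.asarray(station_scores, dtype=float)
    borderline = np.asarray(global_ratings) == "borderline"
    # Initial station cut = mean score of candidates judged borderline there.
    station_cuts = np.array([
        scores[borderline[:, s], s].mean() for s in range(scores.shape[1])
    ])
    initial = station_cuts.sum()              # initial whole-exam pass score
    totals = scores.sum(axis=1)
    sem = totals.std(ddof=1) * np.sqrt(1 - reliability)
    return initial + sem                      # adjusted upward by one SEM

# Tiny hypothetical example: 3 examinees, 2 stations.
scores = np.array([[60, 70], [50, 55], [40, 45]], dtype=float)
ratings = [["pass", "pass"], ["borderline", "borderline"], ["fail", "fail"]]
print(round(mcc_pass_mark(scores, ratings, reliability=0.80), 2))
```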

11.
CONTEXT: Continuing professional development (CPD) of general practitioners. OBJECTIVE: Criterion-referenced standards for assessing performance in the real practice of general practitioners (GPs) should be available to identify learning needs or poor performers for CPD. The applicability of common standard setting procedures in authentic assessment has not been investigated. METHODS: To set a standard for assessment of GP-patient communication with video observation of daily practice, we investigated 2 well known examples of 2 different standard setting approaches. An Angoff procedure was applied to 8 written cases. A borderline regression method was applied to videotaped consultations of 88 GPs. The procedures and outcomes were evaluated by the applicability of the procedure, the reliability of the standards and the credibility as perceived by the stakeholders, namely, the GPs. RESULTS: Both methods are applicable and reliable; the obtained standards are credible according to the GPs. CONCLUSIONS: Both modified methods can be used to set a standard for assessment in daily practice. The context in which the standard will be used - i.e. the specific purpose of the standard, the moment the standard must be available, or whether specific feedback must be given - is important because the methods differ in practical aspects.

12.
Background To establish credible, defensible and acceptable passing scores for written tests is a challenge for health profession educators. Angoff procedures are often used to establish pass/fail decisions for written and performance tests. In an Angoff procedure judges’ expertise and professional skills are assumed to influence their ratings of the items during standard-setting. The purpose of this study was to investigate the impact of judges’ item-related knowledge on their judgement of the difficulty of items, and second, to determine the stability of differences between judges. Method Thirteen judges were presented with two sets of 60 items on different occasions. They were asked to not only judge the difficulty of the items but also to answer them, without the benefit of the answer key. For each of the 120 items an Angoff estimate and an item score were obtained. The relationship between the Angoff estimate and the item score was examined by applying a regression analysis to the 60 items (Angoff estimate, score) for each judge at each occasion. Results and conclusions This study shows that in standard-setting a judge's rating of an individual item reflects not only the difficulty of the item but also the inherent stringency of the judge and his/her subject-related knowledge. Considerable variation between judges in their stringency was found, and Angoff estimates were significantly affected by a judge knowing or not knowing the answer to the item. These findings stress the importance of a careful selection process of the Angoff judges when making pass/fail decisions in health professions education. They imply that judges should be selected who are not only capable of conceptualising the ‘minimally competent student’, but who would also be capable of answering all the items.
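The per-judge analysis in the Method can be sketched as a simple regression of each judge's Angoff estimates on whether the judge answered the item correctly; the intercept then reflects the judge's baseline stringency and the slope the shift when the answer is known. The simulated numbers below are illustrative assumptions.

```python
import numpy as np

def judge_profile(angoff_estimates, item_scores):
    """Regress one judge's Angoff estimates (0-1) on his/her own item
    scores (1 = answered correctly, 0 = not).  Returns (intercept, slope):
    baseline stringency and the effect of knowing the answer."""
    x = np.asarray(item_scores, dtype=float)
    y = np.asarray(angoff_estimates, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

# Hypothetical data for one judge over a 60-item set.
rng = np.random.default_rng(2)
knows = rng.integers(0, 2, size=60)
estimates = np.clip(0.45 + 0.15 * knows + rng.normal(0, 0.08, 60), 0, 1)
print(judge_profile(estimates, knows))   # roughly (0.45, 0.15)
```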

13.
Searle J. Medical Education 2000; 34(5): 363–366.
CONTEXT: The responsibility to determine just who is competent to practice medicine, and at what standard, is great. Whilst there is still a period available for potential remediation, examinations at the completion of year three of the four-year Graduate Entry Medical Programme (GEMP) at Flinders University of South Australia (FUSA) are high stakes and contain the majority of the final summative assessment for certifying students as doctors. Therefore, the medical school has recently examined its methods for certification, the clinical practice standards sought in its programme and how to determine these standards. DESIGN: For all assessments a standard was documented and methods employed to set these standards using specific measures of performance. A modification of the Angoff method was applied to the written examination and the Rothman method, using two criteria, was used to determine competency in the objective structured clinical examination (OSCE). These methods were used for the first time in 1998. Both methods used trained 'experts' as standard setters and both methods used the notion of the 'borderline candidate' to determine the passing standard. This paper describes these two criterion-referenced standard-setting procedures as used in this school and related examination performance. CONCLUSIONS: Whilst the use of standard-setting procedures goes part way to defining and measuring competence, it is time consuming and requires significant examiner training and acceptance. Using 50% to determine who is and who is not competent is simpler, but neither transparent, fair, nor defensible.

14.
In the field of clinical laboratory science, certification may be obtained by passing the National Certification Agency for Medical Laboratory Personnel (NCA) examination. This multiple-choice test is competence-based and criterion-referenced, and uses a modified Angoff procedure to establish the passing score. For this study, the NCA examination scores of 1,868 certification applicants (mostly new graduates) and 111 selected laboratory practitioners were compared. Although the NCA examination is designed to define the level of minimum competence, the failure rate of practitioners identified by their supervisors as minimally competent was almost four times greater than that of the certification applicants. Even the most competent group of practitioners scored well below the applicants for certification. These findings suggest that the examination cut-off point may not really define minimal competence and that the method used to determine the passing score might not be appropriate for certification examinations.

15.
When setting standards, administrators of small-scale OSCEs often face several challenges, including a lack of resources, a lack of available expertise in statistics, and difficulty in recruiting judges. The Modified Borderline-Group Method is a standard setting procedure that compensates for these challenges by using physician examiners and is easy to use, making it a good choice for small-scale OSCEs. Unfortunately, the use of this approach may introduce a new challenge. Because a small-scale OSCE has a small number of examinees, there may be few examinees in the borderline range, which could introduce an unintentional bias. A standard setting method called the Borderline Regression Method will be described. This standard setting method is similar to the Modified Borderline-Group Method but incorporates a linear regression approach, allowing the cut score to be set using the scores from all examinees and not from a subset. The current study uses confidence intervals to analyze the precision of cut scores derived from both approaches when applied to a small-scale OSCE.
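A minimal sketch of the Borderline Regression Method with a bootstrap confidence interval for the cut score: checklist scores are regressed on the global rating over all examinees, and the cut is the predicted score at the borderline rating. The 1–5 rating scale with 2 = borderline, and the bootstrap as the precision estimate, are assumptions for illustration.

```python
import numpy as np

def brm_cut_score(global_ratings, checklist_scores, borderline_value=2,
                  n_boot=2000, seed=0):
    """Regress checklist score on global rating (all examinees), read off
    the predicted score at the borderline rating, and bootstrap a 95% CI."""
    x = np.asarray(global_ratings, dtype=float)   # e.g. 1=fail .. 5=excellent
    y = np.asarray(checklist_scores, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    cut = intercept + slope * borderline_value

    rng = np.random.default_rng(seed)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), len(x))     # resample examinees
        s, i = np.polyfit(x[idx], y[idx], 1)
        boots.append(i + s * borderline_value)
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return cut, (lo, hi)

# Hypothetical small-scale OSCE station: 30 examinees.
rng = np.random.default_rng(4)
g = rng.integers(1, 6, size=30)                   # global ratings 1..5
s = 10 * g + rng.normal(0, 5, size=30)            # scores tracking ratings
print(brm_cut_score(g, s))                        # cut near 20, with its CI
```

Because the regression uses every examinee, the interval stays informative even when only a handful of candidates fall in the borderline range — the motivation given above for preferring it in small-scale OSCEs.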

16.
An apparent difference in the results of the clinical examination of the final M.B., B.S. was observed following replacement of the traditional long case (TLC) with the Objective Structured Clinical Examination (OSCE) in 1979. This led to a study of the results of two consecutive years of each method, 1976 and 1978 (TLC), and 1979 and 1980 (OSCE). The OSCE pass rate of 61% was found to be significantly lower than the TLC pass rate of 93% (P < 0.05). Using the analysis of variance and the critical difference (CD) of the mean scores of the different types of examination, no significant difference was found between the two TLC examinations or between the two OSCE examinations. However, a significant difference exists between the TLC of 1978 and the OSCE of 1979 (P < 0.05). A comparison of the CD of the MCQ to those of the TLC and OSCE suggests that smaller differences exist between MCQ and OSCE scores than between MCQ and TLC scores, and by 1980 no significant difference existed between MCQ and OSCE. The OSCE, like the MCQ, therefore appears to be an acceptable method of examination, and perhaps a more effective method of clinical examination than the TLC.

17.
The ‘Simulated Surgery’ is an alternative consulting skills component of the Membership examination of the Royal College of General Practitioners, which is the professional certifying examination for GP registrars (family medicine residents) in the UK. It consists of a 20-station OSCE and is taken by a small cohort of candidates (10–30) who are unable to provide a videotape of their patient interviews for assessment. The pass mark for this examination has been set by a modified contrasting groups method, in which all the examiners make pass/fail judgements on all the candidates' performance by reviewing their whole-test grades. A consistent pass mark was obtained for two different cohorts and this method should allow a constant passing standard to be maintained under changing circumstances in the future.

18.
The decision to pass or fail a medical student is a ‘high stakes’ one. The aim of this study is to introduce and demonstrate the feasibility and practicality of a new objective standard-setting method for determining the pass/fail cut-off score from borderline grades. Three methods for setting pass/fail cut-off scores were compared: the Regression Method, the Borderline Group Method, and the new Objective Borderline Method (OBM). Using Year 5 students’ OSCE results from one medical school, we established the pass/fail cut-off scores by the three methods above. The comparison indicated that the pass/fail cut-off scores generated by the OBM were similar to those generated by the more established methods (0.840 ≤ r ≤ 0.998; p < .0001). Based on theoretical and empirical analysis, we suggest that the OBM has advantages over existing methods in that it combines objectivity, realism and a robust empirical basis and, no less importantly, is simple to use.

19.
Cluster analysis can be a useful statistical technique for setting minimum passing scores on high-stakes examinations by grouping examinees into homogenous clusters based on their responses to test items. It has been most useful for supplementing data or validating minimum passing scores determined from expert judgment approaches, such as the Ebel and Nedelsky methods. However, there is no evidence supporting how well cluster analysis converges with the modified Angoff method, which is frequently used in medical credentialing. Therefore, the purpose of this study is to investigate the efficacy of cluster analysis for validating Angoff-derived minimum passing scores. Data are from 652 examinees who took a national credentialing examination based on a content-by-process test blueprint. Results indicate a high degree of consistency in minimum passing score estimates derived from the modified Angoff and cluster analysis methods. However, the stability of the estimates from cluster analysis across different samples was modest.
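One plausible implementation of the cluster-analysis check, assuming scored 0/1 item responses and scikit-learn's KMeans; placing the cut midway between the two cluster means is an illustrative rule, not necessarily the study's.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_cut_score(item_responses, seed=0):
    """Cluster examinees into two groups on their 0/1 item-response
    vectors and place the cut midway between the groups' mean totals."""
    x = np.asarray(item_responses, dtype=float)
    labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(x)
    totals = x.sum(axis=1)
    lo_mean, hi_mean = sorted(totals[labels == k].mean() for k in (0, 1))
    return (lo_mean + hi_mean) / 2

# Hypothetical cohort: 652 examinees x 150 items, two proficiency groups.
rng = np.random.default_rng(3)
weak = rng.random((200, 150)) < 0.45
strong = rng.random((452, 150)) < 0.70
print(round(cluster_cut_score(np.vstack([weak, strong])), 1))
```

A cut derived this way can then be compared with the Angoff-derived minimum passing score, which is the convergence the study tests.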

20.
BACKGROUND: Little is known about the ability of pre-registration house officers (PRHOs) to perform basic clinical skills just prior to entering the medical register. OBJECTIVES: To find out whether PRHOs have deficiencies in basic clinical skills and to determine if the PRHOs themselves or their consultants are aware of them. METHOD: All 40 PRHOs at the Chelsea and Westminster and Whittington Hospitals were invited to undertake a 17-station OSCE of basic clinical skills. Each station was marked by one examiner, who completed an itemised checklist followed by an overall global score. Adequate performance at a station was defined as a grade of pass or borderline pass. Prior to the OSCE, a questionnaire was given to each PRHO asking them to rate their own abilities (on a 5-point scale) in the skills tested. A similar questionnaire was sent to the educational supervisors of each PRHO asking them to rate their house officer's ability in each of the same skills. RESULTS: Twenty-two PRHOs participated. Each PRHO failed to perform adequately at a mean of 2.4 OSCE stations (SD 1.8, range 1–8). There were no significant correlations between OSCE performance and either self- or educational supervisor ratings. The supervisors felt unable to give an opinion on PRHO abilities in 18% of the skills assessed. DISCUSSION: This study suggests that PRHOs may have deficiencies in basic clinical skills at the time they enter the medical register. Neither the PRHOs themselves nor their consultants identified these deficiencies. A large regional study with sufficient power is required to explore the generalizability of these concerns in more detail.
