Similar Articles
20 similar articles found (search time: 15 ms)
1.
PURPOSE: Earlier studies of absolute standard setting procedures for objective structured clinical examinations (OSCEs) show inconsistent results. This study compared a rational and an empirical standard setting procedure. Reliability and credibility were examined first. The impact of a reality check was then established. METHODS: The OSCE included 16 stations and was taken by trainees in their final year of postgraduate training in general practice and experienced general practitioners. A modified Angoff (independent judgements, no group discussion) with and without a reality check was used as a rational procedure. A method related to the borderline group procedure, the borderline regression (BR) method, was used as an empirical procedure. Reliability was assessed using generalisability theory. Credibility was assessed by comparing pass rates and by relating the passing scores to test difficulty. RESULTS: The passing scores were 73.4% for the Angoff procedure without reality check (Angoff I), 66.0% for the Angoff procedure with reality check (Angoff II) and 57.6% for the BR method. The reliabilities (expressed as root mean square errors) were 2.1% for Angoffs I and II, and 0.6% for the BR method. The pass rates of the trainees and GPs were 19% and 9% for Angoff I, 66% and 46% for Angoff II, and 95% and 80% for the BR method, respectively. The correlation between test difficulty and passing score was 0.69 for Angoff I, 0.88 for Angoff II and 0.86 for the BR method. CONCLUSION: The BR method provides a more credible and reliable standard for an OSCE than a modified Angoff procedure. A reality check improves the credibility of the Angoff procedure but does not improve its reliability.
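The borderline regression idea is straightforward to sketch: for each station, candidates' checklist scores are regressed on the examiners' global grades, and the station passing score is the regression prediction at the borderline grade; averaging the station cut scores gives the test-level standard. A minimal illustration (all data, names and the grading scale below are hypothetical, not taken from the study):

```python
import numpy as np

def borderline_regression_cutoff(checklist_scores, global_ratings, borderline_grade=2):
    """Station passing score via borderline regression.

    checklist_scores : per-candidate checklist percentages for one station
    global_ratings   : examiner global grades on an ordinal scale,
                       e.g. 1 = clear fail, 2 = borderline, 3 = clear pass
    """
    # Fit checklist score as a linear function of the global grade and
    # read off the fitted value at the borderline grade.
    slope, intercept = np.polyfit(global_ratings, checklist_scores, deg=1)
    return intercept + slope * borderline_grade

# Hypothetical data for a single station of a 16-station OSCE.
rng = np.random.default_rng(0)
grades = rng.integers(1, 4, size=120)               # global grades 1-3
scores = 30 + 15 * grades + rng.normal(0, 5, 120)   # checklist scores (%)

print(f"station passing score: {borderline_regression_cutoff(scores, grades):.1f}%")
# The test-level passing score is the mean of the station cut scores.
```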

2.
A structured assessment of newly qualified medical graduates

3.
BACKGROUND: There is still a great deal to be learnt about teaching and assessing undergraduate communication skills, particularly as formal teaching in this area expands. One approach is to use the summative assessments of these skills in formative ways. Discourse analysis of data collected from final year examinations sheds light on the grounds for assessing students as 'good' or 'poor' communicators. This approach can feed into the teaching/learning of communication skills in the undergraduate curriculum. SETTING: A final year UK medical school objective structured clinical examination (OSCE). METHODS: Four scenarios, designed to assess communication skills in challenging contexts, were included in the OSCE. Video recordings of all interactions at these stations were screened. A sample covering a range of good, average and poor performances was transcribed and analysed. Discourse analysis methods were used to identify 'key components of communicative style'. FINDINGS: Analysis revealed important differences in communicative styles between candidates who scored highly and those who did poorly. These related to: empathetic versus 'retractive' styles of communicating; the importance of thematically staging a consultation, and the impact of values and assumptions on the outcome of a consultation. CONCLUSION: Detailed discourse analysis sheds light on patterns of communicative style and provides an analytic language for students to raise awareness of their own communication. This challenges standard approaches to teaching communication and shows the value of using summative assessments in formative ways.

4.
CONTEXT: Factors that interfere with the ability to interpret assessment scores or ratings in the proposed manner threaten validity. To be interpreted in a meaningful manner, all assessments in medical education require sound, scientific evidence of validity. PURPOSE: The purpose of this essay is to discuss 2 major threats to validity: construct under-representation (CU) and construct-irrelevant variance (CIV). Examples of each type of threat for written, performance and clinical performance examinations are provided. DISCUSSION: The CU threat to validity refers to undersampling the content domain. Using too few items, cases or clinical performance observations to adequately generalise to the domain represents CU. Variables that systematically (rather than randomly) interfere with the ability to meaningfully interpret scores or ratings represent CIV. Issues such as flawed test items written at inappropriate reading levels or statistically biased questions represent CIV in written tests. For performance examinations, such as standardised patient examinations, flawed cases or cases that are too difficult for student ability contribute CIV to the assessment. For clinical performance data, systematic rater error, such as halo or central tendency error, represents CIV. The term face validity is rejected as representative of any type of legitimate validity evidence, although the fact that the appearance of the assessment may be an important characteristic other than validity is acknowledged. CONCLUSIONS: There are multiple threats to validity in all types of assessment in medical education. Methods to eliminate or control validity threats are suggested.

5.
Objectives To investigate the experiences and opinions of programme directors, clinical supervisors and trainees on an in-training assessment (ITA) programme on a broad spectrum of competence for first year training in anaesthesiology. How does the programme work in practice and what are the benefits and barriers? What are the users' experiences and thoughts about its effect on training, teaching and learning? What are their attitudes towards this concept of assessment? Methods Semistructured interviews were conducted with programme directors, supervisors and trainees from 3 departments. Interviews were audiotaped and transcribed. The content of the interviews was analysed in a consensus process among the authors. Results The programme was of benefit in making goals and objectives clear, in structuring training, teaching and learning, and in monitoring progress and managing problem trainees. There was a generally positive attitude towards assessment. Trainees especially appreciated the coupling of theory with practice and, in general, the programme inspired an academic dialogue. Issues of uncertainty regarding standards of performance and conflict with service declined over time and experience with the programme, and departments tended to resolve practical problems through structured planning. Discussion Three interrelated factors appeared to influence the perceived value of assessment in postgraduate education: (1) the link between patient safety and individual practice when assessment is used as a licence to practise without supervision rather than as an end-of-training examination; (2) its benefits to educators and learners as an educational process rather than as merely a method of documenting competence, and (3) the attitude and rigour of assessment practice.

6.
INTRODUCTION: The literature on how in-training assessment (ITA) works in practice and what educational outcomes can actually be achieved is limited. One of the aims of introducing ITA is to increase trainees' clinical confidence; this relies on the assumption that assessment drives learning through its content, format and programming. The aim of this study was to investigate the effect of introducing a structured ITA programme on junior doctors' clinical confidence. The programme was aimed at first year trainees in anaesthesiology. METHODS: The study involved a nationwide survey of junior doctors' self-confidence in clinical performance before (in 2001) and 2 years after (in 2003) the introduction of an ITA programme. Respondents indicated confidence on a 155-item questionnaire related to performance of clinical skills and tasks reflecting broad aspects of competence. A total of 23 of these items related to the ITA programme. RESULTS: The response rate was 377/531 (71%) in 2001 and 344/521 (66%) in 2003. There were no statistically significant differences in mean levels of confidence before and 2 years after the introduction of the ITA programme - neither in aspects that were related to the programme nor in those that were unrelated to the programme. DISCUSSION: This study demonstrates that the introduction of a structured ITA programme did not have any significant effect on trainees' mean level of confidence on a broad range of aspects of clinical competence. The importance of timeliness and rigorousness in the application of ITA is discussed.

7.
CONTEXT: Reliability is defined as the extent to which a result reflects all possible measurements of the same construct. It is an essential measurement characteristic. Unfortunately, there are few objective tests for the most important aspects of the professional role because they are complex and intangible. In addition, professional performance varies markedly from setting to setting and case to case. Both these factors threaten reliability. AIM: This paper describes the classical approach to evaluating reliability and points out the limitations of this approach. It goes on to describe how generalisability theory solves many of these limitations. CONDITIONS: A G-study uses variance component analysis to measure the contributions that all relevant factors make to the result (observer, situation, case, assessee and their interactions). This information can be combined to reflect the reliability of a single observation as a reflection of all possible measurements - a true reflection of reliability. It can also be used to estimate the reliability of a combined sample of several different observations, or to predict how many observations are required with different test formats to achieve a given level of reliability. Worked examples are used to illustrate the concepts.
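The variance-component logic can be made concrete with a short sketch. For a fully crossed persons x stations design with one observation per cell, the components follow from the two-way ANOVA expected mean squares, and a D-study then predicts reliability for any number of observations. All data below are hypothetical:

```python
import numpy as np

def g_study(scores):
    """Variance components for a fully crossed persons x stations design
    (one observation per cell), via expected mean squares.
    scores : 2-D array, rows = persons, columns = stations (or raters)."""
    n_p, n_i = scores.shape
    grand = scores.mean()
    p_means = scores.mean(axis=1)
    i_means = scores.mean(axis=0)

    ss_p = n_i * ((p_means - grand) ** 2).sum()
    ss_i = n_p * ((i_means - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    var_res = ms_res                            # residual (incl. p x i)
    var_p = max((ms_p - ms_res) / n_i, 0.0)     # true person variance
    var_i = max((ms_i - ms_res) / n_p, 0.0)     # station difficulty variance
    return var_p, var_i, var_res

def g_coefficient(var_p, var_res, n_obs):
    """Relative G-coefficient predicted for a D-study with n_obs observations."""
    return var_p / (var_p + var_res / n_obs)

# Hypothetical data: 50 candidates x 8 stations.
rng = np.random.default_rng(1)
data = (60 + rng.normal(0, 8, (50, 1))          # person ability
           + rng.normal(0, 5, (1, 8))           # station difficulty
           + rng.normal(0, 10, (50, 8)))        # residual noise

vp, vi, vr = g_study(data)
for n in (1, 8, 16):                            # single obs vs. longer tests
    print(n, round(g_coefficient(vp, vr, n), 2))
```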

8.
Context  Following a 15-week attachment in paediatrics and child health, general practice and dermatology, medical students in their second clinical year at this medical school undertake a high-stakes assessment including an objective structured clinical examination (OSCE). There were 2 hypotheses. Firstly, groups of similar stations map to competency domains identifiable by factor analysis. Secondly, poor performance in individual domains is compensated for by achieving the required standard of performance across the whole assessment.
Methods  A total of 647 medical students were assessed by an OSCE during 5 individual examination sittings (diets) over 2 years. Ten scoring stations in the OSCE were analysed and confirmatory factor analysis performed comparing a 1-factor model (where all the stations are discrete entities related to one underlying domain) with a 3-factor model (where the stations load onto 3 domains from a previously reported exploratory factor analysis).
Results  The 3-factor model yielded a significantly better fit to the data (χ² = 15.3, P < 0.01). Assessing the compensation data of 1 diet, 29 of 127 students failed in 1 or more domains described, whereas only 5 failed if compensation was allowed across all domains.
Discussion  Confirmatory factor analysis showed a significant fit of the data to previously described competency domains for a high-stakes undergraduate OSCE. Compensation within but not between competency domains would provide a more robust standard, improve validity, and substantially reduce the pass rate.
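The reported statistic is a chi-square difference test between the nested 1-factor and 3-factor models. A minimal sketch of the computation (the individual model fit values below are hypothetical, chosen only so that their difference reproduces the reported 15.3; the degrees of freedom are likewise assumed):

```python
from scipy.stats import chi2

def chi_square_difference(chisq_restricted, df_restricted, chisq_full, df_full):
    """Likelihood-ratio comparison of nested CFA models: the chi-square
    difference is itself chi-square distributed, with df equal to the
    difference in model degrees of freedom."""
    delta_chisq = chisq_restricted - chisq_full
    delta_df = df_restricted - df_full
    return delta_chisq, delta_df, chi2.sf(delta_chisq, delta_df)

# Hypothetical fit statistics for the 1-factor (restricted) and
# 3-factor (full) models; only their difference of 15.3 is from the paper.
d_chi, d_df, p = chi_square_difference(98.4, 35, 83.1, 32)
print(f"delta chi-square = {d_chi:.1f} on {d_df} df, p = {p:.4f}")  # p < 0.01
```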

9.
BACKGROUND: The intern year is a key time for the acquisition of clinical skills, both procedural and cognitive. We have previously described self-reported confidence and experience for a number of clinical skills, finding high levels of confidence among Australian junior doctors. This has never been correlated with an objective measure of competence. AIMS AND HYPOTHESIS: We aimed to determine the relationship between self-reported confidence and observed competence for a number of routine, procedural clinical skills. METHODS: A group of 30 junior medical officers in their first postgraduate year (PGY1) was studied. All subjects completed a questionnaire concerning their confidence and experience in the performance of clinical skills. A competency-based assessment instrument concerning 7 common, practical, clinical skills was developed, piloted and refined. All 30 PGY1s then completed an assessment using this instrument. Comparisons were then made between the PGY1s' self-reported levels of confidence and tutors' assessments of their competence. RESULTS: A broad range of competence levels was revealed by the clinical skills assessments. There was no correlation between the PGY1s' self-ratings of confidence and their measured competencies. CONCLUSIONS: Junior medical officers in PGY1 demonstrate a broad range of competence levels for several common, practical, clinical skills, with some performing at an inadequate level. There is no relationship between their self-reported level of confidence and their formally assessed performance. This observation raises important caveats about the use of self-assessment in this group.

10.
PURPOSE: Although expert clinicians approach interviewing in a different manner than novices, OSCE measures have not traditionally been designed to take into account levels of expertise. Creating better OSCE measures requires an understanding of how the interviewing style of experts differs objectively from novices. METHODS: Fourteen clinical clerks, 14 family practice residents and 14 family physicians were videotaped during two 15-minute standardized patient interviews. Videotapes were reviewed and every utterance coded by type including questions, empathic comments, giving information, summary statements and articulated transitions. Utterances were plotted over time and examined for characteristic patterns related to level of expertise. RESULTS: The mean number of utterances exceeded one every 10 s for all groups. The largest proportion was questions, ranging from 76% of utterances for clerks to 67% for experts. One third of total utterances consisted of a group of 'low frequency' types, including empathic comments, information giving and summary statements. The topic was changed often by all groups. While utterance type over time appeared to show characteristic patterns reflective of expertise, the differences were not robust. Only the pattern of use of summary statements was statistically different between groups (P < 0.05). CONCLUSIONS: Measures that are sensitive to the nature of expertise, including the sequence and organisation of questions, should be used to supplement OSCE checklists that simply count questions. Specifically, information giving, empathic comments and summary statements that occupy a third of expert interviews should be credited. However, while there appear to be patterns of utterances that characterise levels of expertise, in this study these patterns were subtle and not amenable to counting and classification.

11.
BACKGROUND: The frequency and nature of standardised patient (SP) recording errors during clinical performance examinations (CPX) have an effect on case scores and ultimately on pass/fail decisions. PURPOSE: To determine the effect of SP recording errors on case scores. METHODS: Standardised patients completed checklists immediately after each encounter. To determine checklist accuracy, multiple reviewers developed a checklist key for each student encounter studied. The total errors, the net errors, the errors of commission and omission and error rates by competency skill were analysed. RESULTS: The frequency of errors in history taking was greater than in physical examination, and the majority of errors were made in the students' favour. Summing the errors of commission and omission decreased the effect of total errors on student scores. CONCLUSIONS: High levels of SP recording accuracy are achievable. When errors occur, the net effect is usually in the students' favour.
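The error taxonomy here reduces to comparing each SP-completed checklist against the reviewers' key: an error of commission credits an item the key says was not performed (favouring the student), while an error of omission fails to credit an item that was. A minimal sketch with hypothetical items:

```python
# Hypothetical single encounter: True = item credited as performed.
sp_checklist = [True, False, True, True, False, True, False, True]
answer_key   = [True, True,  True, False, False, True, False, False]

# Commission: SP credited an item the key did not (favours the student).
commission = sum(s and not k for s, k in zip(sp_checklist, answer_key))
# Omission: SP missed an item the key credited (penalises the student).
omission = sum(k and not s for s, k in zip(sp_checklist, answer_key))

total_errors = commission + omission
net_effect = commission - omission   # > 0 means errors favour the student
print(commission, omission, total_errors, net_effect)
```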

12.
The use of clinical simulations in assessment

13.
BACKGROUND: Assessment plays a key role in the learning process. The validity of any given assessment tool should ideally be established. If an assessment is to act as a guide to future teaching and learning then its predictive validity must be established. AIM: To assess the ability of an objective structured clinical examination (OSCE) taken at the end of the first clinical year of an undergraduate medical degree to predict later performance in clinical examinations. METHODS: Performance of two consecutive cohorts of year 3 medical undergraduates (n=138 and n=128) in a 23-station OSCE was compared with their performance in 5 subsequent clinical examinations in years 4 and 5 of the course. RESULTS: Poor performance in the OSCE was strongly associated with later poor performance in other clinical examinations. Students in the lowest three deciles of OSCE performance were 6 times more likely to fail another clinical examination. Receiver operating characteristic curves were constructed as a method to criterion reference the cut point for future examinations. CONCLUSION: Performance in an OSCE taken early in the clinical course strongly predicts later clinical performance. Assessing subsequent student performance is a powerful tool for assessing examination validity. The use of ROC curves represents a novel method for determining future criterion referenced examination cut points.
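One common way to criterion-reference a cut score from an ROC curve is to take the threshold that maximises Youden's J (sensitivity + specificity - 1); the paper does not state its exact criterion, so the sketch below is one plausible reading, on hypothetical data:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical cohort of 266 students: year 3 OSCE scores and whether
# each student later failed any year 4/5 clinical examination.
rng = np.random.default_rng(2)
osce = np.clip(rng.normal(65, 10, 266), 0, 100)
p_fail = 1 / (1 + np.exp((osce - 55) / 5))       # lower score, higher risk
later_fail = (rng.random(266) < p_fail).astype(int)

# Lower OSCE scores should predict failure, so negate the score so that
# larger values mean higher risk, as roc_curve expects.
fpr, tpr, thresholds = roc_curve(later_fail, -osce)
best = np.argmax(tpr - fpr)                      # maximise Youden's J
cut_score = -thresholds[best]                    # undo the negation
print(f"criterion-referenced cut score: {cut_score:.1f}")
```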

14.
OBJECTIVE: To determine whether postgraduate students are able to assess the quality of undergraduate medical examinations and to establish whether faculty can use their results to troubleshoot the curriculum in terms of its content and evaluation. SUBJECTS: First and second year family medicine postgraduate students. MATERIALS: A randomly generated sample of undergraduate medical examination questions. METHODS: Postgraduate students were given two undergraduate examinations which included questions with an item difficulty (ID) > 0.60. The students answered and then rated each question on a scale of 1-7. RESULTS: The percentage of postgraduate students answering each question correctly correlated significantly with the average perceived relevance (Examination 1: r=0.372; P < 0.05; Examination 2: r=0.458; P < 0.05). Questions plotted for average postgraduate/undergraduate performance ratio versus the average perceived relevance were significantly correlated (Examination 1: r=0.462; P < 0.01; Examination 2: r=0.458; P < 0.05). CONCLUSIONS: This study offers a method of validating question appropriateness prior to examination administration. The design has the potential to be used as a model for determining the relevancy of a medical curriculum.

15.
INTRODUCTION: As we move from standard 'long case' final examinations to new objective structured formats, we need to ensure the new is at least as good as the old. Furthermore, knowledge of which examination format best predicts medical student progression and clinical skills development would be of value. METHODS: A group of medical students sat both the standard long case examination and the new objective structured clinical examination (OSCE) as part of the introduction of the latter to our medical school's final MB examination. At the end of their pre-registration year, the group and their supervising consultants submitted performance evaluation questionnaires. RESULTS: Thirty medical students sat both examinations and 20 returned evaluation questionnaires. Of the 72 consultants approached, 60 (83%) returned completed questionnaires. No correlation existed between self- and consultant-reported performance. The traditional finals examination was inversely associated with consultant assessment. Better performing students were not rated as better doctors. The OSCE (and its components) was more consistent and showed positive associations with consultant ratings across the board. DISCUSSION: Major discrepancies exist between the 2 examination formats, in data interpretation and practical skills, which are explicitly tested in OSCEs but less so in traditional finals. Standardised marking schemes may reduce examiner variability and discretion and weaken correlations across the 2 examinations. This pilot provides empirical evidence that OSCEs assess different clinical domains than do traditional finals. Additionally, OSCEs improve prediction of clinical performance as assessed by independent consultants. CONCLUSION: Traditional finals and OSCEs correlate poorly with one another. Objective structured clinical examinations appear to correlate well with consultant assessment at the end of the pre-registration house officer year.

16.
OBJECTIVES: This study investigates: (1) which personality traits are typical of medical students as compared to other students, and (2) which personality traits predict medical student performance in pre-clinical years. DESIGN: This paper reports a cross-sectional inventory study of students in nine academic majors and a prospective longitudinal study of one cohort of medical students assessed by inventory during their first preclinical year and by university examination at the end of each pre-clinical year. SUBJECTS AND METHODS: In 1997, a combined total of 785 students entered medical studies courses in five Flemish universities. Of these, 631 (80.4%) completed the NEO-PI-R (i.e. a measure of the Five-Factor Model of Personality). This was also completed by 914 Year 1 students of seven other academic majors at Ghent University. Year-end scores for medical students were obtained for 607 students in Year 1, for 413 in Year 2, and for 341 in Year 3. RESULTS: Medical studies falls into the group of majors where students score highest on extraversion and agreeableness. Conscientiousness (i.e. self-achievement and self-discipline) significantly predicts final scores in each pre-clinical year. Medical students who score low on conscientiousness and high on gregariousness and excitement-seeking are significantly less likely to sit examinations successfully. CONCLUSIONS: The higher scores for extraversion and agreeableness, two dimensions defining the interpersonal dynamic, may be beneficial for doctors' collaboration and communication skills in future professional practice. Because conscientiousness affects examination results and can be reliably assessed at the start of a medical study career, personality assessment may be a useful tool in student counselling and guidance.

17.
PURPOSE: To examine the validity of a written knowledge test of skills for performance on an OSCE in postgraduate training for general practice. METHODS: A randomly-selected sample of 47 trainees in general practice took a knowledge test of skills, a general knowledge test and an OSCE. The OSCE included technical stations and stations including complete patient encounters. Each station was scored with both a checklist rating and a global rating. RESULTS: The knowledge test of skills correlated more strongly with the OSCE than did the general knowledge test. Technical stations correlated more strongly with the knowledge test of skills than did stations including complete patient encounters. For the technical stations the rating system had no influence on the correlation. For the stations including complete patient encounters the checklist rating correlated more strongly with the knowledge test of skills than the global rating. CONCLUSION: The results of this study support the predictive validity of the knowledge test of skills. In postgraduate training for general practice a written knowledge test of skills can be used as an instrument to estimate the level of clinical skills, especially for group evaluation, such as in studies examining the efficacy of a training programme, or as a screening instrument for deciding about courses to be offered. This estimation is more accurate when the content of the test matches the skills under study. However, written testing of skills cannot replace direct observation of performance of skills.

18.
INTRODUCTION: An earlier study showed that an Angoff procedure with ≥ 10 recently graduated students as judges can be used to estimate the passing score of a progress test. As the acceptability and feasibility of this approach are questionable, we conducted an Angoff procedure with test item writers as judges. This paper reports on the reliability and credibility of this procedure and compares the standards set by the two different panels. METHODS: Fourteen item writers judged 146 test items. Recently graduated students had assessed these items in a previous study. Generalisability was investigated as a function of the number of items and judges. Credibility was judged by comparing the pass/fail rates associated with the Angoff standard, a relative standard and a fixed standard. The Angoff standards obtained by item writers and graduates were compared. RESULTS: The variance associated with consistent variability of item writers across items was 1.5% and for graduate students it was 0.4%. An acceptable error score required 39 judges. Item-Angoff estimates of the two panels and item P-values correlated highly. Failure rates of 57%, 55% and 7% were associated with the item writers' standard, the fixed standard and the graduates' standard, respectively. CONCLUSION: The graduates' and the item writers' standards differed substantially, as did the associated failure rates. A panel of 39 item writers is not feasible. The item writers' passing score appears to be less credible. The credibility of the graduates' standard needs further evaluation. The acceptability and feasibility of a panel consisting of both students and item writers may be worth investigating.
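The Angoff computation itself is simple: each judge estimates, for every item, the probability that a borderline candidate answers it correctly, and the passing score is the mean over judges and items. A sketch with a hypothetical panel of the same dimensions as the item-writer panel here (14 judges, 146 items; all judgement values invented):

```python
import numpy as np

def angoff_passing_score(judgements):
    """judgements: 2-D array, rows = judges, columns = items, each value
    the estimated probability that a borderline candidate gets the item
    right. The passing score is the grand mean, as a percentage."""
    return 100 * judgements.mean()

# Hypothetical judgements for 14 judges x 146 items.
rng = np.random.default_rng(3)
panel = np.clip(rng.normal(0.55, 0.15, (14, 146)), 0.0, 1.0)
print(f"Angoff passing score: {angoff_passing_score(panel):.1f}%")
```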

19.
INTRODUCTION: In 1997 the Royal College of Paediatrics and Child Health introduced portfolios to guide and monitor the learning of specialist registrars. We studied their value for assessment. METHODS: Using Biggs' SOLO criteria we devised a marking scheme based on 6 domains of competence: clinical, communication, teaching and learning, ethics and attitudes, management and evaluation, and creation of evidence. We rated portfolios according to quality of evidence presented and expectations by year of training. We similarly assessed trainee performance in the annual record of in-training assessment (RITA) interview. Specific advice based on the results of the first portfolio assessments was circulated to all trainees, instructing them to increase the structure and decrease the bulk of portfolios. A second sample of portfolios was reviewed a year later, using similar evaluations, to determine the effects. RESULTS: A total of 76 portfolios were assessed in year 1 by a single rater; 30 portfolios were assessed in year 2 by 2 independent raters. The quality of documentation improved from year 1 to year 2 but there was no significant increase in portfolio scores. The inter-rater correlation coefficient of the portfolio assessment method was 0.52 (Cohen's kappa 0.35). The inter-rater correlation coefficient of the RITA interview was 0.71 (Cohen's kappa 0.38). There was moderate inter-assessment correlation between portfolios and RITA interviews (kappa 0.26 in year 1 and 0.29 in year 2). Generalisability analysis suggested that 5 successive ratings by a single observer or independent ratings by 4 observers on the same occasion would be needed to yield a generalisability coefficient > 0.8 for overall portfolio rating. CONCLUSIONS: This method of portfolio assessment is insufficiently reliable as a sole method for high stakes, single-instance assessment, but has a place as part of a triangulation process. Repeated portfolio assessment by paired observers would increase reliability. Longer term studies are required to establish whether portfolio assessment positively influences learner behaviour.
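Both statistics reported here are easy to reproduce in outline: Cohen's kappa for a pair of raters, and a Spearman-Brown style prophecy for how many averaged ratings lift a single-rating coefficient above 0.8. The rating data below are hypothetical; 0.52 is the paper's single-rating inter-rater coefficient:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical paired portfolio grades from two independent raters.
rater_a = [3, 2, 4, 3, 3, 2, 4, 1, 3, 2]
rater_b = [3, 3, 4, 2, 3, 2, 3, 1, 4, 2]
print(round(cohen_kappa_score(rater_a, rater_b), 2))

def ratings_needed(g_single, target=0.8):
    """Smallest number m of averaged ratings with
    m * g / (1 + (m - 1) * g) >= target (Spearman-Brown prophecy)."""
    m = 1
    while m * g_single / (1 + (m - 1) * g_single) < target:
        m += 1
    return m

# With a single-rating coefficient of 0.52, four independent ratings
# reach 0.8, consistent with the paper's generalisability analysis.
print(ratings_needed(0.52))   # -> 4
```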

20.
OBJECTIVES: To evaluate the development, validity and reliability of a multimodality objective structured clinical examination (OSCE) in undergraduate psychiatry, integrating interactive face-to-face and telephone history taking and communication skills stations, videotape mental state examinations and problem-oriented written stations. METHODS: The development of the OSCE on a restricted budget is described. This study evaluates the validity and reliability of four 15-18-station OSCEs for 128 students over 1 year. Face and content validity were assessed by a panel of clinicians and from feedback from OSCE participants. Correlations with consultant clinical 'firm grades' were performed. Interrater reliability and internal consistency (interstation reliability) were assessed using generalisability theory. RESULTS: The OSCE was feasible to conduct and had a high level of perceived face and content validity. Consultant firm grades correlated moderately with scores on interactive stations and poorly with written and video stations. Overall reliability was moderate to good, with G-coefficients in the range 0.55-0.68 for the 4 OSCEs. CONCLUSIONS: Integrating a range of modalities into an OSCE in psychiatry appears to represent a feasible, generally valid and reliable method of examination on a restricted budget. Different types of stations appear to have different advantages and disadvantages, supporting the integration of both interactive and written components into the OSCE format.
