Similar Literature
20 similar records found (search time: 31 ms)
1.
Context  The dissemination of objective structured clinical examinations (OSCEs) is hampered by requirements for high levels of staffing and a significantly higher workload compared with multiple-choice examinations. Senior medical students may be able to support faculty staff to assess their peers. The aim of this study is to assess the reliability of student tutors as OSCE examiners and their acceptance by their peers.
Methods  Using a checklist and a global rating, teaching doctors (TDs) and student tutors (STs) simultaneously assessed students in basic clinical skills at 4 OSCE stations. The inter-rater agreement between TDs and STs was calculated by kappa values and paired t-tests. Students then completed a questionnaire to assess their acceptance of student peer examiners.
Results  All 214 Year 3 students at the University of Göttingen Medical School were evaluated in spring 2005. Student tutors gave slightly better average grades than TDs (differences of 0.02–0.20 on a 5-point Likert scale). Inter-rater agreement at the stations ranged from 0.41 to 0.64 for checklist assessment and global ratings; overall inter-rater agreement on the final grade was 0.66. Most students felt that assessment by STs would result in the same grades as assessment by TDs (64%) and that it would be similarly objective (69%). Nearly all students (95%) felt confident that they could evaluate their peers themselves in an OSCE.
Conclusions  On the basis of our results, STs can act as examiners in summative OSCEs to assess basic medical skills. The slightly better grades observed are of no practical concern. Students accepted assessment performed by STs.
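Entry 1 reports inter-rater agreement between teaching doctors and student tutors as kappa values. As a minimal sketch of how Cohen's kappa is computed for two raters over categorical grades (the data below are hypothetical, not from the study):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the two raters assigned grades independently.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical 5-point grades from a teaching doctor and a student tutor.
td = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
st = [1, 2, 3, 3, 3, 3, 4, 4, 5, 4]
print(round(cohens_kappa(td, st), 2))  # -> 0.74
```

Values in the 0.41–0.60 band are conventionally read as moderate agreement, and 0.61–0.80 as substantial, which is the range the study reports.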

2.
Medical students' levels of anxiety under different conditions of stress were investigated, as well as the stability of anxiety ratings from one examination to another. After completing an end-of-term psychiatry examination, fourth-year medical students at Monash University were asked to score the Visual Analogue Scale for Anxiety (VASA) for three situational cues; usual day-to-day anxiety, highest anxiety associated with major exams the previous year and anxiety experienced in the end-of-term examination just completed. Twenty-eight weeks later students rated their anxiety in a subsequent end-of-term psychiatry examination. Most students rated themselves toward the lower end of the VASA for day-to-day anxiety and as having significantly, though not markedly higher anxiety in the end-of-term psychiatry examinations. The previous year's examinations, marking the end of pre-clinical training, provoked extremely high anxiety for most students, who achieved academically despite this. Comparison of anxiety ratings for the two end-of-term examinations indicated that VASA ratings shifted substantially for half the class. This variation suggests that students' levels of anxiety are not stable and predictable from one examination to another. Examination anxiety should not be seen necessarily as a consistent response to a specific and recurring situation. It is postulated that a range of situational factors and personal pressures, operating at the time, may determine how much anxiety is experienced as a reaction to the examination.

3.
4.
OBJECTIVE: To quantify the clinical consistency of expert panelists' ratings of appropriateness of pre-operative and post-operative chemotherapy plus radiation for rectal cancer. METHODS: A panel of nine physicians (two surgeons, four medical oncologists, three radiation oncologists) rated the appropriateness of providing pre-operative and post-operative treatments for rectal cancer, utilizing a modified-Delphi (RAND/UCLA) approach. Clinical scenarios were paired so that each component of a pair differed by only one clinical feature (e.g. tumor stage). A pair of appropriateness ratings was defined as inconsistent when the clinical scenario that should have had the higher (or at least equal) appropriateness rating was given a lower rating. The rate of inconsistency was analyzed for panelists' ratings of pre- and post-operative chemotherapy plus radiation. RESULTS: The final panel rating was inconsistent for 1.19% of pre-operative scenario pairs, and 0.77% of post-operative scenario pairs. Using the conventional RAND/UCLA definition of appropriateness, the magnitude of the inconsistency would produce inconsistent appropriateness ratings in 0.43% of pre-operative and 0.11% of post-operative scenario pairs. There was significant variation in the rate of inconsistency among individual panelists' final ratings of both pre-operative (range: 0.43-5.17%, P < 0.001) and post-operative (range: 0.51-2.34%, P < 0.001) scenarios. Panelists' overall average rate of inconsistency improved significantly after the panel meeting and discussion (from 5.62 to 2.25% for pre-operative scenarios, and from 1.47 to 1.24% for post-operative scenarios, both P < 0.05). There was no clear difference between specialty groups. Inconsistency was related to the structure of the rating manual: in the second round there were no inconsistent ratings when scenario pairs occurred on the same page of the manual. 
CONCLUSIONS: The RAND/UCLA appropriateness method can produce ratings for cancer treatment that are highly clinically consistent. Modifications to the structure of rating manuals to facilitate direct assessment of consistency at the time of rating may reduce inconsistency further.
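Entries 4 and 6 both define a scenario pair as clinically inconsistent when the scenario that should receive the higher (or at least equal) appropriateness rating is given the lower one. A sketch of that count, with hypothetical ratings and names of my own choosing:

```python
def inconsistency_rate(pairs):
    """Fraction of scenario pairs rated clinically inconsistently.

    Each pair is (rating_dominant, rating_dominated): the first scenario
    differs from the second by a single clinical feature that should make
    it at least as appropriate, so giving it a LOWER rating is inconsistent.
    """
    inconsistent = sum(1 for dominant, dominated in pairs if dominant < dominated)
    return inconsistent / len(pairs)

# Hypothetical appropriateness ratings on the 1-9 RAND/UCLA scale.
pairs = [(7, 5), (6, 6), (4, 5), (8, 3), (5, 5)]
print(inconsistency_rate(pairs))  # one pair of five is inconsistent -> 0.2
```

Applied per panelist, this yields the individual rates the studies compare; applied to the median panel rating, it yields the panel-level rate.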

5.
6.
OBJECTIVE: To assess the clinical consistency of expert panelists' ratings of appropriateness for coronary artery bypass surgery. DESIGN: Quantitative analysis of panelists' ratings. PARTICIPANTS: Nine physicians (three cardiothoracic surgeons, four cardiologists, and two internists) convened by RAND to establish criteria for the appropriateness of coronary artery bypass surgery. MAIN OUTCOME MEASURES: Percentage of indication-pairs given clinically inconsistent ratings (i.e. higher rating assigned to one member of an indication-pair when rating should have been equal or lower). RESULTS: In the final round of appropriateness ratings, among 1785 pairs of indications differing only on a single clinical factor (e.g., three-vessel vs. two-vessel stenosis), 6.6% were assigned clinically inconsistent ratings by individual panelists, but only 2.7% received inconsistent ratings from the panel as a whole (using the median panel rating as the criterion). Internists on the panel provided fewer inconsistent ratings (4.6%) than either cardiologists (7.8%) or cardiothoracic surgeons (6.3%) (p < 0.001). More inconsistencies were noted when the factor distinguishing otherwise identical indications was symptom severity (inconsistency rate, 13.2%) or intensity of medical therapy (13.2%) than when it was number of stenosed vessels (3.8%) or proximal left anterior descending (PLAD) involvement (1.9%). Contrary to expectations, panelists' inconsistency rates increased between the initial and final rounds of appropriateness ratings (from 3.9 to 6.6%, p < 0.001). Panelists' mean ratings across indications were only weakly correlated with individual inconsistency rates (r = 0.18, p = ns). CONCLUSIONS: The RAND/UCLA method for assessing the appropriateness of coronary revascularization generally produces criteria that are clinically consistent. However, research is needed to understand the sources of panelists' inconsistencies and to reduce inconsistency rates further.

7.
This study sought to develop and pilot-test a patient-completed rating scale of medical student effectiveness while training in psychiatry. Specifically, the study focused on: (1) examining the statistical reliability and validity of a new scale; (2) using the measure to assess patient satisfaction with medical student care during the psychiatry clerkship; and (3) providing some experience-based recommendations about utilizing patient feedback when training medical students in psychiatry. Data were collected over an entire academic year and involved 35 third-year medical students and ratings of their performance provided by 102 psychiatric inpatients. Participants were recruited from three inpatient units and the rate of participation was 62.5%. Principal components analyses of the Medical Student Interviewing Performance Scale (MSIPQ) showed that the overall scale consisted of two, theoretically relevant subscales called 'Rapport' and 'Treatment Feedback.' Each subscale had adequate reliability. In addition, the two subscales were shown to each account for unique variance in two separate questions assessing the patient's overall ratings of rapport and treatment helpfulness. Finally, patients' mean ratings of medical student effectiveness were examined and showed very high levels of satisfaction with the student-patient relationship and the quality of care received. This study is among the first to examine patient satisfaction with medical student providers among a general population of psychiatric inpatients. Recommendations are made about the ways in which the MSIPQ can be used to strengthen the training of medical students in psychiatry.

8.
CONTEXT: Since 1986, the Ontario Ministry of Health has provided a medical licensure preparation programme for international medical graduates. Because of the diversity in candidates' oral English proficiency, this competency has been viewed as a particularly important selection criterion. OBJECTIVES: To assess and compare the quality of ratings of oral English proficiency of international medical graduates provided by physician examiners and by standardized patients (SPs). PARTICIPANTS AND MATERIALS: The study samples consisted of 73 candidates for the Ontario International Medical Graduate (IMG) Program, and physician examiners and SPs in five 10-minute objective structured clinical examination (OSCE) encounter stations. The instrument used was a seven-item speaking performance rating scale prepared for the Ontario IMG Program. METHODS: Rating sheets were scanned and the results analysed using SPSS 9.0 for Windows. RESULTS: Correlations between the physician and SP ratings on the seven items ranged from 0.52 to 0.70. The SPs provided more lenient ratings. Mean alpha reliability for the physicians' ratings on the seven items was 0.59, and for the SPs' 0.64. There was poor agreement between the two sets of raters in identifying problematic candidates. CONCLUSIONS: Notwithstanding the sizable correlations between the ratings provided by the two rater groups, the results demonstrated that there was little agreement between the two groups in identifying the potentially problematic candidates. The physicians were less prone than the SPs to rate candidates as problematic. SPs may be better placed than the physician examiners to directly assess IMG candidates' oral English proficiency.

9.
The purpose of this study was to develop objective assessment instruments for use in psychomotor skill training and to test the instruments for interobserver reliability. Two checklist style instruments, one for suturing and one for endotracheal intubation, were developed through a process of review of standard texts, consultation with local experts and field testing. Following development they were used by paired examiners in an Objective Structured Clinical Examination (OSCE) setting to test the instruments for interobserver reliability. A total of 88 final year medical students were recruited from the five Ontario medical schools to participate as examinees. The checklists worked well within the practical constraints of a 10-minute OSCE station and demonstrated a high level of interobserver reliability with kappa scores of 0.65 for the suturing checklist and 0.71 for the intubation checklist. Furthermore, the kappa scores for individual checklist items served to identify items which demonstrated poor interobserver reliability and thus highlighted them for review.

10.
Hodges B, McIlroy JH. Medical Education 2003;37(11):1012-1016
PURPOSE: There are several reasons for using global ratings in addition to checklists for scoring objective structured clinical examination (OSCE) stations. However, there has been little evidence collected regarding the validity of these scales. This study assessed the construct validity of an analytic global rating with 4 component subscales: empathy, coherence, verbal and non-verbal expression. METHODS: A total of 19 Year 3 and 38 Year 4 clinical clerks were scored on content checklists and these global ratings during a 10-station OSCE. T-tests were used to assess differences between groups for overall checklist and global scores, and for each of the 4 subscales. RESULTS: The mean global rating was significantly higher for senior clerks (75.5% versus 71.3%, t(55) = 2.12, P < 0.05) and there were significant differences by level of training for the coherence (t(55) = 3.33, P < 0.01) and verbal communication (t(55) = 2.33, P < 0.05) subscales. Interstation reliability was 0.70 for the global rating and ranged from 0.58 to 0.65 for the subscales. Checklist reliability was 0.54. CONCLUSION: In this study, a summated analytic global rating demonstrated construct validity, as did 2 of the 4 scales measuring specific traits. In addition, the analytic global rating showed substantially higher internal consistency than did the checklists, a finding consistent with that seen in previous studies cited in the literature. Global ratings are an important element of OSCE measurement and can have good psychometric properties. However, OSCE researchers should clearly describe the type of global ratings they use. Further research is needed to define the most effective global rating scales.

11.
Wass V, Jolly B. Medical Education 2001;35(8):729-734
BACKGROUND: A London medical school final MBBS examination for 155 candidates. OBJECTIVE: To investigate whether observing the student-patient interaction in a history taking (HT) long case adds incremental information to the traditional presentation component. DESIGN: A prospective study of a HT long case which included both examiner observation of the student-patient interview (Part 1) and traditional presentation to different examiners (Part 2). Checklist and global ratings of both parts were compared. Examiners were paired to estimate inter-rater reliability. The students also took a 20 station Objective Structured Clinical Examination (OSCE). OUTCOME MEASURES: Correlation of (I) examiner ratings for observation and presentation of the HT long case, (II) examiner pair ratings, and (III) stepwise regression analysis of scores for the HT long case with OSCE scores. RESULTS: Seventy-five (48.4%) candidates had two examiner pairs marking their case history. Observation and presentation scores correlated poorly (checklist 0.38 and global 0.33). Checklist and global scores for each part correlated at higher levels (observation 0.64 and presentation 0.61). Inter-rater reliability correlations were higher for observation (checklist 0.72 and global 0.71) than for presentation (checklist 0.38 and global 0.60). When HT long case scores were correlated with OSCE scores, using stepwise regression, global presentation scores showed the highest correlation with the OSCE score (0.36) and the global observation score contributed a further 12% to the correlation (0.50). CONCLUSION: Observation of history taking in a long case appears to measure a useful and distinct component of clinical competence over and above the contribution made by the presentation.

12.
INTRODUCTION: As we move from standard 'long case' final examinations to new objective structured formats, we need to ensure the new is at least as good as the old. Furthermore, knowledge of which examination format best predicts medical student progression and clinical skills development would be of value. METHODS: A group of medical students sat both the standard long case examination and the new objective structured clinical examination (OSCE) to introduce this latter examination to our Medical School for final MB. At the end of their pre-registration year, the group and their supervising consultants submitted performance evaluation questionnaires. RESULTS: Thirty medical students sat both examinations and 20 returned evaluation questionnaires. Of the 72 consultants approached, 60 (83%) returned completed questionnaires. No correlation existed between self- and consultant reported performance. The traditional finals examination was inversely associated with consultant assessment. Better performing students were not rated as better doctors. The OSCE (and its components) was more consistent and showed positive associations with consultant ratings across the board. DISCUSSION: Major discrepancies exist between the 2 examination formats, in data interpretation and practical skills, which are explicitly tested in OSCEs but less so in traditional finals. Standardised marking schemes may reduce examiner variability and discretion and weaken correlations across the 2 examinations. This pilot provides empirical evidence that OSCEs assess different clinical domains than do traditional finals. Additionally, OSCEs improve prediction of clinical performance as assessed by independent consultants. CONCLUSION: Traditional finals and OSCEs correlate poorly with one another. Objective structured clinical examinations appear to correlate well with consultant assessment at the end of the pre-registration house officer year.

13.
Rater errors in a clinical skills assessment of medical students
The authors used a many-faceted Rasch measurement model to analyze rating data from a clinical skills assessment of 173 fourth-year medical students to investigate four types of rater errors: leniency, inconsistency, the halo effect, and restriction of range. Students performed six clinical tasks with 6 standardized patients (SPs) selected from a pool of 17 SPs. SPs rated the performance of each student in six skills: history taking, physical examination, interpersonal skills, communication technique, counseling skills, and physical examination etiquette. SPs showed statistically significant differences in their rating severity, indicating rater leniency error. Four SPs exhibited rating inconsistency. Four SPs restricted their ratings in high categories. Only 1 SP exhibited a halo effect. Administrators of objective structured clinical examinations should be vigilant for various types of rater errors and attempt to reduce or eliminate those errors to improve the validity of inferences based on objective structured clinical examination scores.
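Entry 13 applies a many-faceted Rasch model to separate student ability from rater severity. As a hedged illustration of the core idea only (a dichotomous toy version, not the study's actual polytomous model specification), a facets-style model subtracts a rater-severity term from the usual Rasch logit:

```python
import math

def rasch_facets_prob(ability, task_difficulty, rater_severity):
    """P(success) under a simple dichotomous many-facet Rasch sketch:
    logit = ability - task difficulty - rater severity (all in logits)."""
    logit = ability - task_difficulty - rater_severity
    return 1 / (1 + math.exp(-logit))

# The same student (ability 1.0) on the same task (difficulty 0.0), scored
# by a lenient rater (severity -0.5) versus a severe rater (severity +0.5).
lenient = rasch_facets_prob(1.0, 0.0, -0.5)
severe = rasch_facets_prob(1.0, 0.0, 0.5)
print(round(lenient, 2), round(severe, 2))  # -> 0.82 0.62
```

Because severity enters as its own parameter, statistically significant differences between raters' severity estimates are exactly the leniency errors the abstract describes.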

14.
CONTEXT: Although examiners are a large source of variability in the objective structured clinical examination (OSCE), the exact causes of examiner variance remain understudied. OBJECTIVE: This study aimed to determine whether examiner familiarity with candidates influences candidate scores. METHODS: A total of 24 candidates from 4 neonatal-perinatal training programmes participated in a 10-station OSCE. Sixteen trainees and 7 examiners came from a single centre (site A) and 8 candidates and 5 examiners came from the other 3 centres. Examiners completed station-specific binary checklists and an overall global rating; standardised patients (SPs) and standardised health professionals (SHPs) completed 4 process ratings and the overall rating. A fixed-effect, 2-way analysis of variance was performed to ascertain whether there was interaction between examiner site and candidate site. RESULTS: Interstation Cronbach's alpha was 0.80 for the examiner checklist, 0.88 for the examiner global rating and 0.88 for the SP or SHP global rating. Although the checklist scores awarded by site A examiners were significantly higher than those awarded by non-site A examiners, there was no significant interaction between examiner and candidate site (P = 0.124). Similarly, the interaction between examiner and candidate site for the global rating was not significant (P = 0.207). Global ratings awarded by SPs and SHPs were also higher in stations where site A faculty examined site A candidates, suggesting the observed differences may have been related to performance. CONCLUSIONS: Results from this small dataset suggest that examiner familiarity with candidates does not influence how examiners score candidates, confirming the objective nature of the OSCE. Confirmation with a larger study is required.
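Several entries (8, 14) report internal consistency as Cronbach's alpha across items or stations. The standard formula from item and total-score variances, sketched with made-up station scores:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items score matrix (list of rows)."""
    n_items = len(scores[0])
    columns = list(zip(*scores))           # one tuple of scores per item
    totals = [sum(row) for row in scores]  # each person's total score

    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_var = sum(variance(col) for col in columns)
    total_var = variance(totals)
    return n_items / (n_items - 1) * (1 - item_var / total_var)

# Hypothetical scores for five candidates on four OSCE stations.
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
]
print(round(cronbach_alpha(scores), 2))  # -> 0.93
```

Alpha rises when station scores covary strongly relative to their individual spread, which is why the 0.80–0.88 interstation values reported here indicate candidates were ranked consistently across stations.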

15.
Variation in the accuracy of examiner judgements is a source of measurement error in performance-based tests. In previous studies using doctor subjects, examiner training yielded marginal or no improvement in the accuracy of examiner judgements. This study reports an experiment on accuracy of scoring in which provision of training and background of examiners are systematically varied. Experienced teaching staff, medical students and lay subjects were randomly assigned to either training or no-training groups. Using detailed behavioural checklists, they subsequently scored videotaped performance on two clinical cases, and accuracy of their judgements was appraised. Results indicated that the need for and effectiveness of training varied across groups: it was least needed and least effective for the teaching staff group, more needed and effective for medical students, and most needed and effective for the lay group. The accuracy of the lay group after training approached the accuracy of untrained teaching staff. Trained medical students were as accurate as trained teaching staff. For teaching staff and medical students training also influenced the nature of errors made by reducing the number of errors of commission. It was concluded that training varies in effectiveness as a function of medical experience and that trained lay persons can be utilized as examiners in performance-based tests.

16.
Evidence of clinical competence for medical students entering the clinical clerkships at the University of Kansas College of Health Sciences is established by passing two different examinations: a 100-item multiple-choice examination and a videotaped history and physical examination by each student of a simulated patient, being rated by that patient and two examiners. In 1976 the class of 196 medical students took an average of 1.85 written examinations per student. With 70% or better constituting a passing score, 30.6% passed on the first attempt, 55.6% the second, 11.2% the third and 2.5% the fourth. Each student passed the televised practical examination and had the opportunity to review his or her videotape with a critiqued data base and the examiners' and simulated patient's evaluations in hand. Correlation coefficients for all 196 students between scores of written examinations, medicine tutors, examiners and professional patients revealed weak but significant correlations between the assessments of examiners and medical tutors and assessments of examiners and written examination scores, but not between other evaluations. This scheme of proof of competence appears to be objective and direct, and serves the convenience of both students and teaching staff.

17.
BACKGROUND: The membership examination of the Royal College of General Practitioners (RCGP) uses structured oral examinations to assess candidates' decision making skills and professional values. AIM: To estimate three indices of reliability for these oral examinations. METHODS: In summer 1998, a revised system was introduced for the oral examinations. Candidates took two 20-minute (five-topic) oral examinations with two examiner pairs. Areas for oral topics had been identified. Examiners set their own topics in three competency areas (communication, professional values and personal development) and four contexts (patient, teamwork, personal, society). They worked in two pairs (a quartet) to preplan questions on 10 topics. The results were analysed in detail. Generalisability theory was used to estimate three indices of reliability: (A) intercase reliability, (B) pass/fail decision reliability and (C) the standard error of measurement (SEM). For each index, a benchmark requirement was preset at (A) 0.8, (B) 0.9 and (C) 0.5. RESULTS: There were 896 candidates in total. Of these, 87 candidates (9.7%) failed. Total score variance was attributed to: 41% candidates, 32% oral content, 27% examiners and general error. Reliability coefficients were: (A) intercase 0.65; (B) pass/fail 0.85. The SEM was 0.52 (i.e. precise enough to distinguish within one unit on the rating scale). Extending testing time to four 20-minute oral examinations, each with two examiners, or five orals, each with one examiner, would improve intercase and pass/fail reliabilities to 0.78 and 0.94, respectively. CONCLUSION: Structured oral examinations can achieve reliabilities appropriate to high stakes examinations if sufficient resources are available.
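Entry 17's projected gain from lengthening the examination (intercase reliability 0.65 with two orals rising to roughly 0.78 with four) is close to what the classical Spearman–Brown prophecy formula predicts when testing length doubles. A sketch; the formula is standard, but treating it as equivalent to the paper's full generalizability-theory decision study is my simplifying assumption:

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened by length_factor
    (assumes the added parts are parallel to the existing ones)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Doubling from two to four 20-minute orals at current reliability 0.65.
print(round(spearman_brown(0.65, 2), 2))
```

The result, about 0.79, is close to the 0.78 the paper obtains from its generalizability analysis, which additionally accounts for the separate examiner and content variance components.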

18.
Undergraduate medical students of the Ben Gurion University were evaluated upon completion of their fourth- and sixth-year medical clerkships by a 17-item rating scale, a multiple choice question (MCQ) test and a patient-oriented oral examination by two academic staff members. Pearson's correlation coefficient between the fourth- and sixth-year global ratings was r = 0.44 (P ≤ 0.001), while that between the fourth- and sixth-year MCQ scores was r = 0.54 (P ≤ 0.001). Pearson's coefficient between the global ratings and the MCQ scores in the sixth year was r = 0.25 (P ≤ 0.05). Stepwise regression analysis revealed that the ratings on the parameters 'reliability', 'knowledge', 'organization', 'diligence' and 'case presentation' were the most predictive of the overall global rating. It is concluded that the reproducibility of 'subjective' expert assessment of performance through global rating scales is comparable to that of 'objective' evaluation through written MCQ, even though these measures assess different domains of competence at different levels of simulations. It is recommended that the clinical performance of undergraduate medical students should be assessed by a combination of subjective and objective measures.
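Entry 18 (like several others here) rests on Pearson's correlation coefficient between two sets of scores. A self-contained sketch of the computation, with illustrative ratings of my own invention:

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical fourth-year vs sixth-year global ratings for six students.
year4 = [3.0, 3.5, 2.5, 4.0, 3.0, 4.5]
year6 = [3.2, 3.4, 3.0, 4.1, 2.8, 4.0]
print(round(pearson_r(year4, year6), 2))  # -> 0.9
```

Note that r measures linear association only; a moderate r = 0.44 between fourth- and sixth-year ratings, as reported, still leaves most of the variance (1 - r² ≈ 81%) unexplained.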

19.
The previous paper (Rix et al., 1985) described the production of two videotaped clinical examinations for use in assessing undergraduate medical students during their psychiatry clerkship. In this paper assessments by videotape are compared with conventional assessments available to the examiners. The highest correlations were between the videotape examination results and written multiple choice questionnaire results, suggesting that they test a common area of clinical competence: knowledge and interpretation of psychopathology. Videotape examination results correlated poorly or not at all with the teachers' global ratings and clinical examination results, which may be indicative of relative success in devising procedures for the assessment of fairly independent abilities.

20.
The face validity of a final professional clinical examination
OBJECTIVE: To develop new methods of evaluating face validity in the context of a revised final professional examination for medical undergraduates, organized on three sites, over 2 days. METHODS: The opinion of the students and examiners was surveyed by Likert-style questionnaires, with additional open comments. Expert opinion was gathered from external examiner reports and a recent Quality Assurance Agency (QAA) Subject Review Report. RESULTS: The questionnaires had an overall response rate of 84%. Internal reliability, assessed by comparing responses to appropriate questions, was good with an equivalence of 45% (weighted kappa 0.54) for the students and 33% (weighted kappa 0.41) for the assessors. There was little evidence of inconsistency between days or sites. The majority of the opinions from the students, examiners and external experts were positive. Negative comments related to time pressure and case mix. CONCLUSION: The measurement of face validity proved feasible and valuable and will assist in the further development of the course and the examination.
