Similar Articles
20 similar articles found.
1.
The objective structured clinical examination in undergraduate psychiatry
Inadequate attention has been given to verifying the psychometric attributes of the objective structured clinical examination (OSCE), yet its popularity has been increasing in recent years. Our 6 years' experience in Nigeria showed that the OSCE is practicable in undergraduate psychiatry assessment, and there is evidence over consecutive years that it has satisfactory reliability and criterion-based validity. The importance of students' feedback in assessing the quality of the examination is reinforced, and subtle, less tangible elements which determine students' performance, such as social interactional mystique and some personality traits, are worthy of evaluative research.

2.
The assessment of clinical competence has traditionally been carried out through standard evaluations such as multiple-choice question and bedside oral examinations. The attributes which constitute clinical competence are multidimensional, and we have modified the objective structured clinical examination (OSCE) to measure these various competencies. We have evaluated the validity and reliability of the OSCE in a paediatric clinical clerkship. We divided the examination into the four components of competence (clinical skills, problem-solving, knowledge, and patient management) and evaluated the performance of 77 fourth-year medical students. The skill and content domains of the OSCE were carefully defined, agreed upon, sampled and reproduced. This qualitative evaluation of the examination was both adequate and appropriate. We achieved both acceptable interstation and intertask reliability. When correlated with concurrent methods of evaluation, we found the OSCE to be an accurate measure of paediatric knowledge and patient management skills. The OSCE did not correlate, however, with traditional measures of clinical skills, including history-taking and physical examination. Our OSCE, as outlined, offers an objective means of identifying weaknesses and strengths in specific areas of clinical competence and is therefore an important addition to the traditional tools of evaluation.

3.
CONTEXT: The College of Medicine and Medical Sciences at the Arabian Gulf University, Bahrain, replaced the traditional long case/short case clinical examination on the final MD examination with a direct observation clinical encounter examination (DOCEE). Each student encountered four real patients. Two pairs of examiners from different disciplines observed the students taking history and conducting physical examinations and jointly assessed their clinical competence. OBJECTIVES: To determine the reliability and validity of the DOCEE by investigating whether examiners agree when scoring, ranking and classifying students; to determine the number of cases and examiners necessary to produce a reliable examination, and to establish whether the examination has content and concurrent validity. SUBJECTS: Fifty-six final year medical students and 22 examiners (in pairs) participated in the DOCEE in 2001. METHODS: Generalisability theory, intraclass correlation, Pearson correlation and kappa were used to study reliability and agreement between the examiners. Case content and Pearson correlation between DOCEE and other examination components were used to study validity. RESULTS: Cronbach's alpha for DOCEE was 0.85. The intraclass and Pearson correlation of scores given by specialists and non-specialists ranged from 0.82 to 0.93. Kappa scores ranged from 0.56 to 1.00. The overall intraclass correlation of students' scores was 0.86. The generalisability coefficient with four cases and two raters was 0.84. Decision studies showed that increasing the cases from one to four improved reliability to above 0.8. However, increasing the number of raters had little impact on reliability. The use of a pre-examination blueprint for selecting the cases improved the content validity. The disattenuated Pearson correlations between DOCEE and other performance measures as a measure of concurrent validity ranged from 0.67 to 0.79. 
CONCLUSIONS: The DOCEE was shown to have good reliability and interrater agreement between two independent specialist and non-specialist examiners on the scoring, ranking and pass/fail classification of student performance. It has adequate content and concurrent validity and provides unique information about students' clinical competence.
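The agreement statistics this abstract leans on (Cronbach's alpha for internal consistency, kappa for rater agreement) are simple to compute. A minimal sketch in standard-library Python; the toy data and function names are illustrative, not taken from the study:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Internal consistency: items is a list of score lists, one per
    item/case, each ordered by the same students."""
    k = len(items)
    item_vars = sum(pvariance(it) for it in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-student totals
    return k / (k - 1) * (1 - item_vars / pvariance(totals))

def cohen_kappa(r1, r2):
    """Chance-corrected agreement between two raters' categorical
    ratings (e.g. pass/fail classifications)."""
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n        # observed agreement
    pe = sum((r1.count(c) / n) * (r2.count(c) / n)       # chance agreement
             for c in set(r1) | set(r2))
    return (po - pe) / (1 - pe)
```

Both statistics range up to 1; values such as the alpha of 0.85 and kappas of 0.56-1.00 reported here would indicate good internal consistency and moderate-to-perfect rater agreement.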

4.
Newble D. Medical Education 2004;38(2):199-203.
The traditional clinical examination has been shown to have serious limitations in terms of its validity and reliability. The OSCE provides some answers to these limitations and has become very popular. Many variants on the original OSCE format now exist and much research has been done on various aspects of their use. Issues to be addressed relate to organisational matters and to the quality of the assessment. This paper focuses particularly on the latter, with respect to ways of ensuring content validity and achieving acceptable levels of reliability. A particular concern has been the demonstrable need for long examinations if high levels of reliability are to be achieved. Strategies for reducing the practical difficulties this raises are discussed. Standard setting methods for use with OSCEs are described.

5.
Reliability and learning from the objective structured clinical examination
The difficulties in measurement of the clinical performance of students in the health professions are well known by educators. One innovative measure, incorporated in several of the educational programmes (including the BSc in Nursing programme) in the Faculty of Health Sciences at McMaster University, Hamilton, Ontario, Canada, is the objective structured clinical examination (OSCE). The purpose of this study was to determine the reliability of this evaluation method, both within and between stations. One problem that has been noted by users of the OSCE method is that performance on individual OSCE stations is poorly correlated across stations, apparently regardless of the particular content of the station. A number of hypotheses have been advanced to attempt to explain this phenomenon: performance of any skill is sufficiently variable that the correlation is poor; different skills have little common basis, so that there is no generalizability from one to another; or reliability of assessment in any one station is low. To test these hypotheses, a study was designed for test-retest and interrater reliability. Students undergoing a 10-station OSCE also repeated their starting OSCE station at the end of the examination circuit. In addition, several stations were rated by more than one observer (interrater). This study of 71 first-year BScN students showed that the interrater reliability was high (ICC = 0.80 to 0.99), and test-retest reliability on the same station was good (ICC = 0.66 to 0.86); however, correlation across stations was low (alpha = 0.198). Thus it is apparent that there is high consistency of repeated performance of a skill but little consistency of performance on different skills.
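The test-retest and interrater figures quoted here are intraclass correlations (ICCs). A one-way ICC, where each subject is rated or tested k times, can be sketched as below; this is an illustrative standard-library implementation, not the study's own analysis code:

```python
def icc_oneway(scores):
    """One-way random-effects ICC: scores is a list of per-subject
    rating lists, each of equal length k (e.g. [test, retest] pairs)."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(s) for s in scores) / (n * k)
    means = [sum(s) / k for s in scores]
    # between-subjects and within-subjects mean squares
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for s, m in zip(scores, means) for x in s) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

An ICC near 1 means subjects are ranked consistently across ratings; the low cross-station alpha alongside high within-station ICCs is exactly the content-specificity pattern the abstract describes.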

6.
PURPOSE: Earlier studies of absolute standard setting procedures for objective structured clinical examinations (OSCEs) show inconsistent results. This study compared a rational and an empirical standard setting procedure. Reliability and credibility were examined first. The impact of a reality check was then established. METHODS: The OSCE included 16 stations and was taken by trainees in their final year of postgraduate training in general practice and experienced general practitioners. A modified Angoff (independent judgements, no group discussion) with and without a reality check was used as a rational procedure. A method related to the borderline group procedure, the borderline regression (BR) method, was used as an empirical procedure. Reliability was assessed using generalisability theory. Credibility was assessed by comparing pass rates and by relating the passing scores to test difficulty. RESULTS: The passing scores were 73.4% for the Angoff procedure without reality check (Angoff I), 66.0% for the Angoff procedure with reality check (Angoff II) and 57.6% for the BR method. The reliabilities (expressed as root mean square errors) were 2.1% for Angoffs I and II, and 0.6% for the BR method. The pass rates of the trainees and GPs were 19% and 9% for Angoff I, 66% and 46% for Angoff II, and 95% and 80% for the BR method, respectively. The correlation between test difficulty and passing score was 0.69 for Angoff I, 0.88 for Angoff II and 0.86 for the BR method. CONCLUSION: The BR method provides a more credible and reliable standard for an OSCE than a modified Angoff procedure. A reality check improves the credibility of the Angoff procedure but does not improve its reliability.
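The borderline regression method compared here lends itself to a compact implementation: regress examinees' checklist scores on the examiners' global ratings and read the passing score off the fitted line at the borderline grade. A hedged sketch; the 1-5 rating scale, the borderline point and the toy data are assumptions for illustration:

```python
def borderline_regression(global_ratings, checklist_scores, borderline=3):
    """Least-squares line of checklist score on global rating; the
    passing score is the fitted value at the borderline grade."""
    n = len(global_ratings)
    mx = sum(global_ratings) / n
    my = sum(checklist_scores) / n
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(global_ratings, checklist_scores))
    sxx = sum((x - mx) ** 2 for x in global_ratings)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept + slope * borderline  # predicted borderline score
```

Because it uses every examinee's data rather than a panel's imagined borderline candidate, the method is empirical in the sense the abstract intends; station-level passing scores would typically be averaged into the examination-level standard.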

7.
BACKGROUND: Assessment plays a key role in the learning process. The validity of any given assessment tool should ideally be established. If an assessment is to act as a guide to future teaching and learning then its predictive validity must be established. AIM: To assess the ability of an objective structured clinical examination (OSCE) taken at the end of the first clinical year of an undergraduate medical degree to predict later performance in clinical examinations. METHODS: Performance of two consecutive cohorts of year 3 medical undergraduates (n=138 and n=128) in a 23-station OSCE was compared with their performance in 5 subsequent clinical examinations in years 4 and 5 of the course. RESULTS: Poor performance in the OSCE was strongly associated with later poor performance in other clinical examinations. Students in the lowest three deciles of OSCE performance were 6 times more likely to fail another clinical examination. Receiver operating characteristic curves were constructed as a method to criterion-reference the cut point for future examinations. CONCLUSION: Performance in an OSCE taken early in the clinical course strongly predicts later clinical performance. Assessing subsequent student performance is a powerful tool for assessing examination validity. The use of ROC curves represents a novel method for determining future criterion-referenced examination cut points.
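The ROC approach to choosing a criterion-referenced cut point can be sketched directly: sweep candidate thresholds, compute sensitivity and false positive rate for predicting later failure, and pick the threshold maximising Youden's J (sensitivity + specificity - 1). Function names and toy data are illustrative, not from the study:

```python
def roc_points(scores, failed, thresholds):
    """For each threshold t, flag OSCE scores below t as 'predicted fail';
    failed[i] is True if student i later failed a clinical examination.
    Returns (threshold, sensitivity, false_positive_rate) triples."""
    pos = sum(failed)
    neg = len(failed) - pos
    pts = []
    for t in thresholds:
        tp = sum(s < t and f for s, f in zip(scores, failed))
        fp = sum(s < t and not f for s, f in zip(scores, failed))
        pts.append((t, tp / pos, fp / neg))
    return pts

def youden_cutpoint(pts):
    """Threshold maximising Youden's J = sensitivity - FPR."""
    return max(pts, key=lambda p: p[1] - p[2])[0]
```

The chosen cut point then flags, in future cohorts, the students most likely to need remediation before later clinical examinations.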

8.
INTRODUCTION: Structured assessment, embedded in a training programme, with systematic observation, feedback and appropriate documentation, may improve the reliability of clinical assessment. This type of assessment format is referred to as in-training assessment (ITA). The feasibility and reliability of an ITA programme in an internal medicine clerkship were evaluated. The programme comprised 4 ward-based test formats and 1 outpatient clinic-based test format. Of the 4 ward-based test formats, 3 were single-sample tests, consisting of 1 student-patient encounter, 1 critical appraisal session and 1 case presentation. The other ward-based test and the outpatient-based test were multiple-sample tests, consisting of 12 ward-based case write-ups and 4 long cases in the outpatient clinic. In all, the ITA programme consisted of 19 assessments. METHODS: Over 41 months, data were collected from 119 clerks. Feasibility was defined as over two thirds of the students obtaining 19 assessments. Reliability was estimated by performing generalisability analyses, once with the 19 assessments as items and once with the 5 test formats as items. RESULTS: A total of 73 students (69%) completed 19 assessments. Reliability expressed by the generalisability coefficients was 0.81 for 19 assessments and 0.55 for 5 test formats. CONCLUSIONS: The ITA programme proved to be feasible. Feasibility may be improved by scheduling protected time for assessment for both students and staff. Reliability may be improved by more frequent use of some of the test formats.
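The closing suggestion, that reliability may be improved by using some test formats more often, can be quantified with the classical Spearman-Brown prophecy formula, which predicts reliability when a test is lengthened with parallel parts. A minimal sketch under that parallel-forms assumption (the numeric example is illustrative, not the study's own projection):

```python
def spearman_brown(reliability, lengthening_factor):
    """Predicted reliability when the number of (assumed parallel)
    assessments is multiplied by lengthening_factor."""
    k, r = lengthening_factor, reliability
    return k * r / (1 + (k - 1) * r)
```

For instance, starting from the format-level coefficient of 0.55, doubling the sampling of each format would be predicted to raise reliability to roughly 0.71, if the added assessments behave like the existing ones.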

9.
INTRODUCTION: This study describes the development of an instrument to measure the ability of medical students to reflect on their performance in medical practice. METHODS: A total of 195 Year 4 medical students attending a 9-hour clinical ethics course filled in a semi-structured questionnaire consisting of reflection-evoking case vignettes. Two independent raters scored their answers. Respondents were scored on a 10-point scale for overall reflection score and on a scale of 0-2 for the extent to which they mentioned a series of perspectives in their reflections. We analysed the distribution of scores, the internal validity and the effect of being pre-tested with an alternate form of the test on the scores. The relationships between overall reflection score and perspective score, and between overall reflection score and gender, career preference and work experience were also calculated. RESULTS: The interrater reliability was sufficient. The range of scores on overall reflection was large (1-10), with a mean reflection score of 4.5-4.7 for each case vignette. This means that only 1 or 2 perspectives were mentioned, and hardly any weighing of perspectives took place. The values over the 2 measurements were comparable and were strongly related. Women had slightly higher scores than men, as did students with work experience in health care and students considering general practice as a career. CONCLUSIONS: Reflection in medical practice can be measured using this semi-structured questionnaire built on case vignettes. The mean score allows for the measurement of improvement by future educational efforts. The wide range of individual differences allows for comparisons between groups. The differences found between groups of students were as expected and support the validity of the instrument.

10.
OBJECTIVES: The aim of curriculum reform in medical education is to improve students' clinical and communication skills. However, there are contradictory results regarding the effectiveness of such reforms. METHODS: A study of internal medicine students was carried out using a static group design. The experimental group consisted of 77 students participating in 7 sessions of communication training, 7 sessions of skills-laboratory training and 7 sessions of bedside teaching, each lasting 1.5 hours. The control group of 66 students from the traditional curriculum participated in equally as many sessions but was offered only bedside teaching. Students' cognitive and practical skills performance was assessed using multiple choice question (MCQ) testing and an objective structured clinical examination (OSCE), delivered by examiners blind to group membership. RESULTS: The experimental group performed significantly better on the OSCE than did the control group (P < 0.01), whereas the groups did not differ on the MCQ test (P < 0.15). This indicates that specific training in communication and basic clinical skills enabled students to perform better in an OSCE, whereas its effects on knowledge did not differ from those of the traditional curriculum. CONCLUSION: Curriculum reforms promoting communication and basic clinical skills are effective and lead to improved performance in history-taking and physical examination skills.

11.
The long case     
BACKGROUND: The long case has been gradually replaced by the objective structured clinical examination (OSCE) as a summative assessment of clinical skills. Its demise occurred against a paucity of psychometric research. This article reviews the current status of the long case, appraising its strengths and weaknesses as an assessment tool. ISSUES: There is a conflict between validity and reliability. The long case assesses an integrated clinical interaction between doctor and real patient and has high face validity. Intercase reliability is the prime problem. As most examinations traditionally used a single case only, problems of content specificity and standardisation were not addressed. DISCUSSION: Recent research suggests that testing across more cases does improve reliability. Better structuring of tests and direct observation increases validity. Substituting standardised cases for real patients may be of little benefit compared to increasing the sample of cases. CONCLUSIONS: Observed long cases can be useful for assessment depending on the sample size of cases and examiners. More research is needed into the exact nature of intercase and interexaminer variance and consequential validity. Feasibility remains a key problem. More exploration of combined assessments using real patients with OSCEs is suggested.

12.
CONTEXT: Factors that interfere with the ability to interpret assessment scores or ratings in the proposed manner threaten validity. To be interpreted in a meaningful manner, all assessments in medical education require sound, scientific evidence of validity. PURPOSE: The purpose of this essay is to discuss 2 major threats to validity: construct under-representation (CU) and construct-irrelevant variance (CIV). Examples of each type of threat for written, performance and clinical performance examinations are provided. DISCUSSION: The CU threat to validity refers to undersampling the content domain. Using too few items, cases or clinical performance observations to adequately generalise to the domain represents CU. Variables that systematically (rather than randomly) interfere with the ability to meaningfully interpret scores or ratings represent CIV. Issues such as flawed test items written at inappropriate reading levels or statistically biased questions represent CIV in written tests. For performance examinations, such as standardised patient examinations, flawed cases or cases that are too difficult for student ability contribute CIV to the assessment. For clinical performance data, systematic rater error, such as halo or central tendency error, represents CIV. The term face validity is rejected as representative of any type of legitimate validity evidence, although the fact that the appearance of the assessment may be an important characteristic other than validity is acknowledged. CONCLUSIONS: There are multiple threats to validity in all types of assessment in medical education. Methods to eliminate or control validity threats are suggested.

13.
OBJECTIVE: To describe the development, organization, implementation and evaluation of a yearly multicentre, identical and simultaneous objective structured clinical examination (OSCE). SUBJECTS: All fifth-year medical students in a 6-year undergraduate medical programme. SETTING: The Christchurch, Dunedin and Wellington Schools of Medicine of the University of Otago, New Zealand. METHOD: One practice and two full 18-station OSCEs have been completed over 2 years, for up to 72 students per centre, in three centres. The process of development and logistics is described. Data are presented on validity, reliability and fairness. RESULTS: Face and content validity were established. Internal consistency was 0.83-0.86 and interexaminer reliability, as assessed by the coefficient of correlation, averaged 0.78. Students rated the OSCE highly on relevance. Of the total variance in total OSCE marks, the schools contributed 6.9%, and the students 93.1%, in the first year. In the second year the schools contributed 6.2% and the students 93.8%. CONCLUSION: Implementation of a psychometrically sound, multicentre, simultaneous and identical OSCE is possible with a low level of interschool variation.

14.
Summary: Because of dissatisfaction with the traditional long case procedure as a method of examining the clinical competence of medical students undertaking a psychiatry term, an alternative 'direct' method, whereby two examiners observe the interaction between student and patient, has been developed and is described. This method of examining allows the examiners to set and evaluate case-specific tasks. It is demonstrated that the two examiners achieve satisfactory inter-rater reliability both with respect to the mark awarded and the difficulty the patient presents and that, as one would wish, these two measures do not correlate. Students' opinions regarding the examination were assessed pre- and post-examination using visual analogue scales. The students found the examination stressful but rated the method as an appropriate form of clinical assessment both before and after their examination. The method is seen as having several advantages, which must be set against the disadvantage of its being relatively expensive in examiners' time.

15.
AIM: Because it deals with qualitative information, portfolio assessment inevitably involves some degree of subjectivity. The use of stricter assessment criteria or more structured and prescribed content would improve interrater reliability, but would obliterate the essence of portfolio assessment in terms of flexibility, personal orientation and authenticity. We resolved this dilemma by using qualitative research criteria as opposed to reliability in the evaluation of portfolio assessment. METHODOLOGY/RESEARCH DESIGN: Five qualitative research strategies were used to achieve credibility and dependability of assessment: triangulation, prolonged engagement, member checking, audit trail and dependability audit. Mentors read portfolios at least twice during the year, providing feedback and guidance (prolonged engagement). Their recommendation for the end-of-year grade was discussed with the student (member checking) and submitted to a member of the portfolio committee. Information from different sources was combined (triangulation). Portfolios causing persistent disagreement were submitted to the full portfolio assessment committee. Quality assurance procedures with external auditors were used (dependability audit) and the assessment process was thoroughly documented (audit trail). RESULTS: A total of 233 portfolios were assessed. Students and mentors disagreed on 7 (3%) portfolios and 9 portfolios were submitted to the full committee. The final decision on 29 (12%) portfolios differed from the mentor's recommendation. CONCLUSION: We think we have devised an assessment procedure that safeguards the characteristics of portfolio assessment, with credibility and dependability of assessment built into the judgement procedure. Further support for credibility and dependability might be sought by means of a study involving different assessment committees.

16.
Background  Medical students' final clinical grades in internal medicine are based on the results of multiple assessments that reflect not only the students' knowledge, but also their skills and attitudes.
Objective  To examine the sources of validity evidence for internal medicine final assessment results comprising scores from 3 evaluations and 2 examinations.
Methods  The final assessment scores of 8 cohorts of Year 4 medical students in a 6-year undergraduate programme were analysed. The final assessment scores consisted of scores in ward evaluations (WEs), preceptor evaluations (PREs), outpatient clinic evaluations (OPCs), general knowledge and problem-solving multiple-choice questions (MCQs), and objective structured clinical examinations (OSCEs). Sources of validity evidence examined were content, response process, internal structure, relationship to other variables, and consequences.
Results  The median generalisability coefficient of the OSCEs was 0.62. The internal consistency reliability of the MCQs was 0.84. Scores for OSCEs correlated well with WE, PRE and MCQ scores, with observed (disattenuated) correlations of 0.36 (0.77), 0.33 (0.71) and 0.48 (0.69), respectively. Scores for WEs and PREs correlated better with OSCE than MCQ scores. Sources of validity evidence including content, response process, internal structure and relationship to other variables were shown for most components.
Conclusion  There is sufficient validity evidence to support the utilisation of various types of assessment scores for final clinical grades at the end of an internal medicine rotation. Validity evidence should be examined for any final student evaluation system in order to establish the meaningfulness of the student assessment scores.
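The disattenuated correlations quoted in the results are the classical correction for attenuation: an observed correlation divided by the geometric mean of the two measures' reliabilities. A minimal sketch (the example figures in the test are illustrative; plugging in this abstract's rounded reliabilities recovers values in the neighbourhood of its reported pairs, though the paper presumably used component-specific estimates):

```python
from math import sqrt

def disattenuate(r_xy, r_xx, r_yy):
    """Correlation corrected for measurement error in both scores:
    r_true = r_observed / sqrt(r_xx * r_yy), where r_xx and r_yy are
    the reliabilities of the two measures."""
    return r_xy / sqrt(r_xx * r_yy)
```

The correction matters here because the modest observed OSCE-evaluation correlations (0.33-0.48) conceal substantially stronger true-score relationships once the OSCE's generalisability coefficient of 0.62 is taken into account.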

17.
OBJECTIVE: This study examined the influence of gender on undergraduate performance in psychiatry among final year medical students at the University College Hospital, Ibadan, Nigeria. METHODS: Results in all parts of the examination in psychiatry for the 2001 graduating class were obtained. In addition, performance scores were obtained for entrance examinations to medical school, preclinical subjects (anatomy, physiology and biochemistry) and clinical subjects (paediatrics, obstetrics and gynaecology, internal medicine and surgery). The mean marks according to gender, with 95% confidence intervals, were calculated and tested for significance. RESULTS: A total of 234 students (160 men and 74 women) took the examinations in psychiatry. Women performed better than men in both the multiple choice questions (MCQ) examination (P = 0.0044) and the clinical assessment (P = 0.0000063). The women were significantly younger than the men (P = 0.0000007) and performance in both parts of the examination decreased with increasing age. There were no differences between the genders in entrance examination scores or preclinical scores, but there were significant differences between the genders in performance in clinical subjects such as paediatrics, obstetrics and gynaecology and internal medicine. CONCLUSION: Women performed better than men in all parts of the psychiatry examination, with the difference being more marked in the clinical aspect. A superior performance on the part of women was noted in all clinical subjects. However, where an examination did not involve verbal interaction, there was no difference in performance between the genders. A direct correlation between increasing age and decreasing performance in examinations was also seen.

18.
CONTEXT: Objective structured clinical examinations (OSCEs) can be used for formative and summative evaluation. We sought to determine the generalisability of students' summary scores aggregated from formative OSCE cases distributed across 5 clerkships during Year 3 of medical school. METHODS: Five major clerkships held OSCEs with 2-4 cases each during their rotations. All cases used 15-minute student-standardised patient encounters and performance was assessed using clinical and communication skills checklists. As not all students completed every clerkship or OSCE case, the generalisability (G) study was an unbalanced student x (case : clerkship) design. After completion of the G study, a decision (D) study was undertaken and phi (φ) values for different cut-points were calculated. RESULTS: The data for this report were collected over 2 academic years involving 262 Year 3 students. The G study found that 9.7% of the score variance originated from the student, 3.1% from the student-clerkship interaction, and 87.2% from the student-case nested within clerkship effect. Using the variance components from the G study, the D study suggested that if students completed 3 OSCE cases in each of the 5 different clerkships, the reliability of the aggregated scores would be 0.63. The phi value, calculated at a cut-point 1 standard deviation below the mean, would be approximately 0.85. CONCLUSIONS: Aggregating case scores from low stakes OSCEs within clerkships results in a score set that allows for very reliable decisions about which students are performing poorly. Medical schools can use OSCE case scores collected over a clinical year for summative evaluation.
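The D-study arithmetic follows directly from the G-study variance components: the student variance is divided by itself plus the error terms, each shrunk by the number of conditions averaged over. A sketch for the student x (case : clerkship) design; note that plugging in the rounded percentages above (9.7, 3.1, 87.2) with 5 clerkships and 3 cases each gives roughly 0.60, in the neighbourhood of the 0.63 the paper reports from its full unbalanced analysis:

```python
def g_coefficient(var_student, var_student_clerkship, var_case_in_clerkship,
                  n_clerkships, n_cases_per_clerkship):
    """Relative G coefficient (Ep^2) for a student x (case : clerkship)
    design, averaging over clerkships and cases within clerkships."""
    error = (var_student_clerkship / n_clerkships
             + var_case_in_clerkship / (n_clerkships * n_cases_per_clerkship))
    return var_student / (var_student + error)
```

The formula also makes the abstract's design point visible: with 87.2% of variance nested in cases, reliability is driven almost entirely by the total number of cases sampled, not by where they sit.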

19.
PURPOSE: To examine the validity of a written knowledge test of skills for performance on an OSCE in postgraduate training for general practice. METHODS: A randomly selected sample of 47 trainees in general practice took a knowledge test of skills, a general knowledge test and an OSCE. The OSCE included technical stations and stations comprising complete patient encounters. Each station was both checklist-rated and globally rated. RESULTS: The knowledge test of skills correlated more strongly with the OSCE than the general knowledge test did. Technical stations correlated more strongly with the knowledge test of skills than did stations comprising complete patient encounters. For the technical stations, the rating system had no influence on the correlation. For the stations comprising complete patient encounters, the checklist rating correlated more strongly with the knowledge test of skills than the global rating did. CONCLUSION: The results of this study support the predictive validity of the knowledge test of skills. In postgraduate training for general practice a written knowledge test of skills can be used as an instrument to estimate the level of clinical skills, especially for group evaluation, such as in studies examining the efficacy of a training programme, or as a screening instrument for deciding about courses to be offered. This estimation is more accurate when the content of the test matches the skills under study. However, written testing of skills cannot replace direct observation of performance of skills.

20.
PURPOSE: At the Faculty of Medicine at the Katholieke Universiteit Leuven, Belgium, we have developed a final examination that consists of extended matching multiple-choice questions. Extended matching questions (EMQs) originate from a case and have 1 correct answer within a list of at least 7 alternatives. If EMQs assess clinical reasoning, we can assume there will be a difference between the ways students and experienced doctors solve the problems within the questions. This study compared students' and residents' processes of solving EMQs. METHODS: Twenty final year students and 20 fourth or fifth year residents specialising in internal medicine solved 20 EMQs aloud. All questions concerned diagnosis or pathogenesis. Ten EMQs related to internal medicine and 10 questions to other medical disciplines. The session was audio-taped and transcribed. RESULTS: The residents correctly answered significantly more questions concerning internal medicine than did the students. Their reasoning was more "forward" and less "backward". No difference between residents and students was found for the other questions. The residents scored better on internal medicine than on the other questions. They used more backward and less forward reasoning when solving the other questions than they did with the internal medicine questions. The better half of the respondents used significantly more forward and less backward reasoning than did the poorer half. CONCLUSION: In accordance with the literature, medical expertise was characterised by forward reasoning, whereas outside their area of expertise, the subjects switched over to backward reasoning. It is possible to assess processes of clinical reasoning using EMQs.
