Similar Articles
10 similar articles found (search time: 171 ms)
1.
This study compares the reliability, validity, and efficiency of three multiple-choice question (MCQ) ability scales with patient management problems (PMPs). Data are from the 1980, 1981, and 1982 American Board of Internal Medicine Certifying Examinations. The MCQ ability scales were constructed by classifying the one-best-answer and multiple-true/false questions in each examination as measuring predominantly clinical judgment, synthesis, or knowledge. Clinical judgment items require prioritizing or weighing management decisions; synthesis items require the integration of findings into a diagnostic decision; and knowledge items stress recall of factual information. Analyses indicate that the MCQ ability scales are more reliable and valid per unit of testing time than are PMPs, and that the clinical judgment and synthesis scales are slightly more correlated with PMPs than is the knowledge scale. Additionally, all MCQ ability scales appear to measure the same aspects of competence as PMPs.

2.
Patient management problems (PMPs) are being used in medical examinations with increasing frequency, despite evidence that throws doubt on their validity as measures of clinical competence. This study investigated the construct validity of a PMP constructed in both written and interview formats. Each test was administered to groups of students of different seniorities and to two groups of doctors: interns and post-interns. The pattern of scores for the different groups was not that expected of a valid test of competence. The most competent groups (the post-interns) generally scored less well on the calculated indices than the senior students and interns. These findings were similar for both formats of the test, so cueing was not thought to be the major factor. It appears that the scoring system is at fault.
A comparison of performance on the written and interview (uncued) formats showed that many more options were chosen by all groups tested on the written PMP.
It was concluded that written PMPs cannot yet be regarded as a valid simulation of clinical performance. Although content validity is high, this does not appear to be so for construct or concurrent validity.

3.
Summary. Although the cueing effects inherent in conventional multiple choice questions (MCQs) present serious limitations, this format continues to dominate testing programmes. The present study was undertaken to estimate the effects of cueing when MCQs are used to test medical students, and to evaluate the reliability, validity and feasibility of an alternative testing format. Equivalent items in both MCQ and open-ended, or uncued (Un-Q), formats were administered to 34 third- and fourth-year medical students. The students' mean percent-correct score on the MCQs was 11 percentage points higher than their mean level of performance on equivalent Un-Qs. When a second set of more difficult items was administered to 16 of these students, their mean performance on the MCQ items was 22 percentage points higher than their performance on equivalent Un-Qs. The results support the feasibility of large-group administration of tests constructed in an open-ended format that can be scored by computer. Not only is this format equally reliable and economical when compared with the MCQ, but it also provides important advantages that strengthen its face validity. The Un-Q format can be used to test either simple recall or certain higher-level problem-solving skills that cannot be tested by MCQs. Even more important, the results also suggest that the Un-Q format may be a more effective discriminator of academically marginal examinees.

4.
Farmer EA, Page G. Medical Education 2005;39(12):1188-1194.
AIM: This paper in the series on professional assessment provides a practical guide to writing key features problems (KFPs). Key features problems test clinical decision-making skills in written or computer-based formats. They are based on the concept of critical steps or 'key features' in decision making and represent an advance on the older, less reliable patient management problem (PMP) formats. METHOD: The practical steps in writing these problems are discussed and illustrated by examples. Steps include assembling problem-writing groups, selecting a suitable clinical scenario or problem and defining its key features, writing the questions, selecting question response formats, preparing scoring keys, reviewing item quality and item banking. CONCLUSION: The KFP format provides educators with a flexible approach to testing clinical decision-making skills with demonstrated validity and reliability when constructed according to the guidelines provided.

5.
PURPOSE: The purpose of this study was to gather additional evidence for the validity and reliability of spoken English proficiency ratings provided by trained standardized patients (SPs) in a high-stakes clinical skills examination. METHOD: More than 2500 candidates who took the Educational Commission for Foreign Medical Graduates' (ECFMG) Clinical Skills Assessment (CSA) were studied. The CSA consists of 10 or 11 timed clinical encounters. Standardized patients evaluate spoken English proficiency and interpersonal skills in every encounter. Generalizability theory was used to estimate the consistency of spoken English ratings. Validity coefficients were calculated by correlating summary English ratings with CSA scores and other external criterion measures. Mean spoken English ratings were also compared across various candidate background variables. RESULTS: The reliability of the spoken English ratings, based on 10 independent evaluations, was high. The magnitudes of the associated variance components indicated that the evaluation of a candidate's spoken English proficiency is unlikely to be affected by the choice of cases or SPs used in a given assessment. Proficiency in spoken English was related to native language (English versus other) and scores from the Test of English as a Foreign Language (TOEFL). DISCUSSION: The pattern of the relationships, both within assessment components and with external criterion measures, suggests that valid measures of spoken English proficiency are obtained. This result, combined with the high reproducibility of the ratings over encounters and SPs, supports the use of trained SPs to measure spoken English skills in a simulated medical environment.
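The generalizability-theory reasoning in this abstract, that averaging ratings over 10 encounters yields high reliability, can be sketched numerically. The variance components below are hypothetical illustrations (the abstract does not report the estimated values); the formula itself is the standard one for the reliability of a mean over n observations.

```python
# Sketch of a generalizability coefficient for ratings averaged over n
# encounters. Variance components here are HYPOTHETICAL, not the values
# estimated in the ECFMG study.

def g_coefficient(var_candidate: float, var_error: float, n_encounters: int) -> float:
    """Reliability of a mean rating over n encounters:
    var_candidate / (var_candidate + var_error / n)."""
    return var_candidate / (var_candidate + var_error / n_encounters)

# Illustrative components: candidate variance 0.50, residual error
# variance 0.75 for a single encounter.
single_encounter = g_coefficient(0.50, 0.75, 1)    # modest for one rating
ten_encounters = g_coefficient(0.50, 0.75, 10)     # much higher when averaged
print(round(single_encounter, 2), round(ten_encounters, 2))
```

The point mirrors the abstract: even if a single SP rating is noisy, dividing the error variance by 10 encounters pushes the reliability of the summary rating well above conventional thresholds.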

6.
CONTEXT: The College of Medicine and Medical Sciences at the Arabian Gulf University, Bahrain, replaced the traditional long case/short case clinical examination on the final MD examination with a direct observation clinical encounter examination (DOCEE). Each student encountered four real patients. Two pairs of examiners from different disciplines observed the students taking history and conducting physical examinations and jointly assessed their clinical competence. OBJECTIVES: To determine the reliability and validity of the DOCEE by investigating whether examiners agree when scoring, ranking and classifying students; to determine the number of cases and examiners necessary to produce a reliable examination, and to establish whether the examination has content and concurrent validity. SUBJECTS: Fifty-six final year medical students and 22 examiners (in pairs) participated in the DOCEE in 2001. METHODS: Generalisability theory, intraclass correlation, Pearson correlation and kappa were used to study reliability and agreement between the examiners. Case content and Pearson correlation between DOCEE and other examination components were used to study validity. RESULTS: Cronbach's alpha for DOCEE was 0.85. The intraclass and Pearson correlation of scores given by specialists and non-specialists ranged from 0.82 to 0.93. Kappa scores ranged from 0.56 to 1.00. The overall intraclass correlation of students' scores was 0.86. The generalisability coefficient with four cases and two raters was 0.84. Decision studies showed that increasing the cases from one to four improved reliability to above 0.8. However, increasing the number of raters had little impact on reliability. The use of a pre-examination blueprint for selecting the cases improved the content validity. The disattenuated Pearson correlations between DOCEE and other performance measures as a measure of concurrent validity ranged from 0.67 to 0.79. 
CONCLUSIONS: The DOCEE was shown to have good reliability and interrater agreement between two independent specialist and non-specialist examiners on the scoring, ranking and pass/fail classification of student performance. It has adequate content and concurrent validity and provides unique information about students' clinical competence.
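The "disattenuated Pearson correlations" reported above correct an observed correlation for measurement error in both instruments. A minimal sketch of that correction follows; the input numbers are hypothetical (the abstract reports only the corrected range of 0.67-0.79, not the raw correlations or the reliability of the comparison measures).

```python
# Sketch of the classical disattenuation correction:
# r_true = r_observed / sqrt(reliability_x * reliability_y).
# Inputs below are HYPOTHETICAL illustrations.

import math

def disattenuated_r(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Correlation corrected for unreliability of both measures."""
    return r_observed / math.sqrt(rel_x * rel_y)

# e.g. an observed r of 0.60 between the DOCEE (alpha = 0.85, as reported)
# and another examination component with an assumed reliability of 0.75:
print(round(disattenuated_r(0.60, 0.85, 0.75), 2))
```

The correction always raises the observed value, which is why disattenuated coefficients are the appropriate ones to inspect when asking whether two fallible tests measure the same construct.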

7.
An account is given of the introduction of MCQs in an Indian medical institution in 1974-75.
Preliminary results suggest that the reliability coefficient of MCQs rises with their difficulty, and that Pearson-type coefficients indicate a higher degree of correlation with essay marks than do Spearman coefficients.
A 'student assessment response', taken together with standard item analysis techniques, may be of value in assessing questions for future use.

8.
9.
CONTEXT: Monitoring the teaching effectiveness of attending physicians is important to enhancing the quality of graduate medical education. METHODS: We used a critical incident technique with 35 residents representing a cross-section of programmes in a teaching hospital to develop a 23-item rating form. We obtained ratings of 11 attending physicians in internal medicine and general surgery from 54 residents. We performed linear and logistic regression analysis to relate the items on the form to the residents' overall ratings of the attending physicians and the programme directors' ratings of the attending physicians. RESULTS: The residents rated the attending physicians highly in most areas, but lower in provision of feedback, clarity of written communication and cost-effectiveness in making clinical decisions. When we used the residents' overall ratings as the criterion, the most important aspects of attending physicians' teaching were clarity of written communication, cost-effectiveness, commitment of time and energy and whether the resident would refer a family member or friend to the physician. When we used the programme directors' ratings as the criterion, the additional important aspects of performance were concern for the residents' professional well-being, knowledge of the literature and the delivery of clear verbal and written communication. CONCLUSIONS: The critical incident technique can be used to develop an instrument that demonstrates content and construct validity. We found that residents consider commitment of time to teaching and clinical effectiveness to be the most important dimensions of faculty teaching. Other important dimensions include written and verbal communication, cost-effectiveness and concern for residents' professional development.

10.
Context  We wished to determine which factors are important in ensuring interviewers are able to make reliable and valid decisions about the non-cognitive characteristics of candidates when selecting candidates for entry into a graduate-entry medical programme using the multiple mini-interview (MMI).
Methods  Data came from a high-stakes admissions procedure. Content validity was assured by using a framework based on international criteria for sampling the behaviours expected of entry-level students. A variance components analysis was used to estimate the reliability and sources of measurement error. Further modelling was used to estimate the optimal configurations for future MMI iterations.
Results  This study refers to 485 candidates, 155 interviewers and 21 questions taken from a pre-prepared bank. For a single MMI question and 1 assessor, 22% of the variance between scores reflected candidate-to-candidate variation. The reliability for an 8-question MMI was 0.7; to achieve 0.8 would require 14 questions. Typical inter-question correlations ranged from 0.08 to 0.38. A disattenuated correlation with the Graduate Australian Medical School Admissions Test (GAMSAT) subsection 'Reasoning in Humanities and Social Sciences' was 0.26.
Conclusions  The MMI is a moderately reliable method of assessment. The largest source of error relates to aspects of interviewer subjectivity, suggesting interviewer training would be beneficial. Candidate performance on 1 question does not correlate strongly with performance on another question, demonstrating the importance of context specificity. The MMI needs to be sufficiently long to allow precise comparisons for ranking purposes. We supported the validity of the MMI by showing a small positive correlation with GAMSAT section scores.
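The reported relationship between MMI length and reliability (0.7 at 8 questions, 0.8 at 14) is consistent with the Spearman-Brown prophecy formula. The sketch below backs out the implied single-question reliability from the reported 8-question figure and then projects the length needed for 0.8; only the formula is assumed, the study's own decision-study model may differ in detail.

```python
# Spearman-Brown projection of test reliability as a function of length,
# applied to the figures reported in the abstract.

import math

def spearman_brown(rel_1: float, k: float) -> float:
    """Reliability of a test lengthened to k parallel questions:
    k * rel_1 / (1 + (k - 1) * rel_1)."""
    return k * rel_1 / (1 + (k - 1) * rel_1)

# Single-question reliability implied by reliability 0.7 at k = 8
# (inverting the formula): rel_1 = 0.7 / (8 - 7 * 0.7) ~= 0.23.
rel_1 = 0.7 / (8 - 7 * 0.7)

# Questions needed to reach a target reliability of 0.8:
k_needed = math.ceil(0.8 * (1 - rel_1) / ((1 - 0.8) * rel_1))

print(round(rel_1, 2), k_needed, round(spearman_brown(rel_1, 14), 2))
```

Running this recovers the abstract's figures: roughly 0.23 per question, 14 questions required, and a projected reliability of 0.80 at 14 questions.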
