Similar Documents
10 similar documents found (search time: 140 ms)
1.
Purpose. Post-encounter written exercises (e.g., patient notes) have been included in clinical skills assessments that use standardized patients. The purpose of this study was to estimate the generalizability of the scores from these written exercises when they are rated by various trained health professionals, including physicians. Method. The patient notes from a 10-station clinical skills examination involving 10 first-year emergency medicine residents were analytically scored by four rater groups: three physicians, three nurses, three fourth-year medical students, and three billing clerks. Generalizability analyses were used to partition the various sources of error variance and derive reliability-like coefficients for each group of raters. Results. The generalizability analyses indicated that case-to-case variability was a major source of error variance in the patient note scores. The variance attributable to the rater or to the rater-by-examinee interaction was negligible. This finding was consistent across the four rater groups. Generalizability coefficients in excess of 0.80 were achieved for each of the four sets of raters. Physicians did, however, produce the most dependable scores. Conclusion. There is little advantage, from a reliability perspective, in using more than one trained physician, or other adequately trained health professional, to score the patient note. Measurement error is introduced primarily by case sampling variability. This suggests that, if required, increases in the generalizability of the patient note scores can be made through the addition of cases, not the addition of raters. This revised version was published online in July 2006 with corrections to the Cover Date.
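In the generalizability-theory framework this abstract uses, the reliability-like (G) coefficient for a persons-by-cases design is the ratio of examinee variance to examinee variance plus case-sampling error, which shrinks as cases are added. A minimal sketch with hypothetical variance components (the abstract does not report the actual values):

```python
def g_coefficient(var_person, var_residual, n_cases):
    """Generalizability coefficient for a persons x cases design:
    universe-score variance over itself plus error variance, where the
    case-sampling error term shrinks as more cases are sampled."""
    return var_person / (var_person + var_residual / n_cases)

# Hypothetical variance components (not from the study):
var_p, var_pc = 0.50, 1.20
for n in (5, 10, 20):
    print(n, round(g_coefficient(var_p, var_pc, n), 3))
```

With these illustrative components, ten cases already push G past 0.80, while adding raters (whose variance was negligible here) would not move it; that is the quantitative logic behind the abstract's conclusion.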

2.
Medical Education 2011: 45: 570–577 Objectives Progress tests give a continuous measure of a student’s growth in knowledge. However, the result at each test instance is subject to measurement error from a variety of sources. Previous tests contain useful information that might be used to reduce this error. A Bayesian statistical approach to using this prior information was investigated. Methods We first developed a Bayesian model that used the result from only one preceding test to update both the current estimated test score and its standard error of measurement (SEM). This was then extended to include results from all previous tests. Results The Bayesian model leads to an exponentially weighted combination of test scores. The results show smoothing of test scores when all previous tests are included in the model. The effective sample size is doubled, leading to a 30% reduction in measurement error. Conclusions A Bayesian approach can give improved score estimates and smaller SEMs. The method is simple to use with large cohorts of students and frequent tests. The smoothing of raw scores should give greater consistency in rank ordering of students and hence should better identify both high-performing students and those in need of remediation.
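The exponentially weighted combination the abstract derives can be sketched as a precision-weighted (Kalman-style) update, where inflating the prior variance by a between-test growth term is what discounts older scores exponentially. All numbers below are illustrative, not from the paper:

```python
def bayes_update(prior_mean, prior_sem, score, sem, drift=3.0):
    """Combine the running score estimate with a new test result by
    precision weighting; drift models growth between tests, inflating
    the prior variance so older scores are down-weighted exponentially."""
    prior_var = prior_sem ** 2 + drift ** 2
    w_prior, w_score = 1 / prior_var, 1 / sem ** 2
    post_var = 1 / (w_prior + w_score)
    post_mean = post_var * (w_prior * prior_mean + w_score * score)
    return post_mean, post_var ** 0.5

# Smooth a sequence of progress-test scores (illustrative data):
scores, sem = [55.0, 60.0, 58.0, 63.0], 5.0
mean, s = scores[0], sem
for x in scores[1:]:
    mean, s = bayes_update(mean, s, x, sem)
```

The posterior SEM settles below the single-test SEM; with a suitable drift term the steady-state reduction is of the order of the roughly 30% the abstract reports.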

3.
CONTEXT: Standardised assessments of practising doctors are receiving growing support, but theoretical and logistical issues pose serious obstacles. OBJECTIVES: To obtain reference performance levels from experienced doctors on computer-based case simulation (CCS) and standardised patient-based (SP) methods, and to evaluate the utility of these methods in diagnostic assessment. SETTING AND PARTICIPANTS: The study was carried out at a military tertiary care facility and involved 54 residents and credentialed staff from the emergency medicine, general surgery and internal medicine departments. MAIN OUTCOME MEASURES: Doctors completed 8 CCS and 8 SP cases targeted at doctors entering the profession. Standardised patient performances were compared to archived Year 4 medical student data. RESULTS: While staff doctors and residents performed well on both CCS and SP cases, a wide range of scores was exhibited on all cases. There were no significant differences between the scores of participants from differing specialties or of varying experience. Among participants who completed both CCS and SP testing (n = 44), a moderate positive correlation between CCS and SP checklist scores was observed. There was a negative correlation between doctor experience and SP checklist scores. Whereas the time students spent with SPs varied little with clinical task, doctors appeared to spend more time on communication/counselling cases than on cases involving acute/chronic medical problems. CONCLUSION: Computer-based case simulations and standardised patient-based assessments may be useful as part of a multimodal programme to evaluate practising doctors. Additional study is needed on SP standard setting and scoring methods. Establishing empirical likelihoods for a range of performances on assessments of this character should receive priority.

4.
CONTEXT: Item response theory (IRT) measurement models are discussed in the context of their potential usefulness in various medical education settings such as assessment of achievement and evaluation of clinical performance. PURPOSE: The purpose of this article is to compare and contrast IRT measurement with the more familiar classical measurement theory (CMT) and to explore the benefits of IRT applications in typical medical education settings. SUMMARY: CMT, the more common measurement model used in medical education, is straightforward and intuitive. Its limitation is that it is sample-dependent, in that all statistics are confounded with the particular sample of examinees who completed the assessment. Examinee scores from IRT are independent of the particular sample of test questions or assessment stimuli. Also, item characteristics, such as item difficulty, are independent of the particular sample of examinees. The IRT characteristic of invariance permits easy equating of examination scores, which places scores on a constant measurement scale and permits the legitimate comparison of student ability change over time. Three common IRT models and their statistical assumptions are discussed. IRT applications in computer-adaptive testing, and as a method for adjusting rater error in clinical performance assessments, are reviewed. CONCLUSIONS: IRT measurement is a powerful tool used to solve a major problem of CMT, that is, the confounding of examinee ability with item characteristics. IRT measurement addresses important issues in medical education, such as eliminating rater error from performance assessments.
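The logistic item response function underlying the models discussed can be sketched in a few lines; this is a generic illustration of the one- and two-parameter cases, not code from the article:

```python
import math

def irt_prob(theta, difficulty, discrimination=1.0):
    """Two-parameter logistic IRT model: probability that an examinee
    with ability theta answers an item of the given difficulty correctly.
    discrimination=1.0 reduces this to the one-parameter (Rasch) model."""
    return 1 / (1 + math.exp(-discrimination * (theta - difficulty)))
```

Because ability and difficulty sit on the same logit scale, an examinee whose ability equals an item's difficulty has a 50% chance of success regardless of which other items or examinees were sampled; this invariance is what makes equating and the comparison of scores over time straightforward.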

5.
Context  The objective structured clinical examination (OSCE) requires the use of standardised patients (SPs). Recruitment of SPs can be challenging and factors assumed to be neutral may vary between SPs. On stations that are considered gender-neutral, either male or female SPs may be used. This may lead to an increase in measurement error. Prior studies on SP gender have often confounded gender with case.
Objective  The objective of this study was to assess whether a variation in SP gender on the same case resulted in a systematic difference in student scores.
Methods  At the University of Ottawa, 140 Year 3 medical students participated in a 10-station OSCE. Two physical examination stations were selected for study because they were perceived to be 'gender-neutral'. One station involved the physical examination of the back and the other of the lymphatic system. On each of the study stations, male and female SPs were randomly allocated.
Results  There was no difference in mean scores on the back examination station for students with female (6.96/10.00) versus male (7.04/10.00) SPs (P = 0.713). However, scores on the lymphatic system examination station showed a significant difference, favouring students with female (8.30/10.00) versus male (7.41/10.00) SPs (P < 0.001). Results were not dependent on student gender.
Conclusions  The gender of the SP may significantly affect student performance in an undergraduate OSCE in a manner that appears to be unrelated to student gender. It would be prudent to use the same SP gender for the same case, even on seemingly gender-neutral stations.
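A comparison like the one reported (mean station scores of students seeing female versus male SPs) can be run as a two-sample t test. A minimal sketch with made-up scores, not the Ottawa data:

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for the difference in mean OSCE station
    scores between two groups of students (unequal variances allowed)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

female_sp = [8.3, 8.6, 8.1, 8.4, 8.0, 8.5]  # hypothetical scores /10
male_sp = [7.4, 7.2, 7.6, 7.3, 7.5, 7.1]
t = welch_t(female_sp, male_sp)
```

A large positive t here would echo the lymphatic-station result; the degrees of freedom needed to convert t into a P value come from the Welch–Satterthwaite formula, omitted for brevity.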

6.
Physician–patient communication is a clinical skill that can be learned and has a positive impact on patient satisfaction and health outcomes. A concerted effort at all medical schools is now directed at teaching and evaluating this core skill. Student communication skills are often assessed by an Objective Structured Clinical Examination (OSCE). However, it is unknown what sources of error variance are introduced into examinee communication scores by various OSCE components. This study primarily examined the effect different examiners had on the evaluation of students’ communication skills assessed at the end of a family medicine clerkship rotation. The communication performance of clinical clerks from the Classes of 2005 and 2006 was assessed using six OSCE stations. Performance was rated at each station using the 28-item Calgary-Cambridge guide. Item Response Theory analysis using a Multifaceted Rasch model was used to partition the various sources of error variance and generate a “true” communication score from which the effects of examiner, case, and items are removed. Variance and reliability of scores were as follows: communication scores (.20 and .87), examiner stringency/leniency (.86 and .91), case (.03 and .96), and item (.86 and .99), respectively. All facet scores were reliable (.87–.99). Examiner variance (.86) was more than four times the examinee variance (.20). About 11% of the clerks’ outcome status shifted using “true” rather than observed/raw scores. There was large variability in examinee scores due to variation in examiner stringency/leniency that may affect pass–fail decisions. Examiner training, and the use of “true” scores generated by Item Response Theory analyses, are recommended before pass/fail decisions are made.
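The many-facet Rasch model used to strip examiner stringency from the scores is additive on the logit scale; a schematic version (parameter names and values are illustrative, not the study's estimates):

```python
import math

def facets_prob(ability, examiner_severity, case_difficulty, item_difficulty):
    """Many-facet Rasch model: the log-odds that an examinee earns a
    positive rating is ability minus the examiner, case, and item
    parameters, so examiner stringency can be estimated and removed."""
    logit = ability - examiner_severity - case_difficulty - item_difficulty
    return 1 / (1 + math.exp(-logit))
```

A harsh examiner (severity > 0) depresses the observed probability of a positive rating; reporting a “true” score amounts to reporting the ability estimate with that examiner term subtracted back out.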

7.
We report on two evaluability assessments (EAs) of social prescribing (SP) services in South East England conducted in 2016/7. We aimed to demonstrate how EAs can be used to assess whether a programme is ready to be evaluated for outcomes, what changes would be needed to do so, and whether the evaluation would contribute to improved programme performance. We also aimed to draw out the lessons learned through the EA process and consider how these can inform the design and evaluation of SP schemes. EAs followed the steps described by Wholey (New Dir Eval 33:77, 1987) and Leviton et al. (Annu Rev Public Health 31:213, 2010), including collaboration with stakeholders; elaboration, testing and refinement of an agreed programme theory; understanding the programme reality; identification and review of existing data sources; and assessment against key criteria. As a result, evaluation of the services was not recommended. Necessary changes to allow for future evaluation include gaining access to electronic patient records, establishing procedures for collection of baseline and outcome data, and linking to data on use of other healthcare services. Lessons learned include ensuring that: (a) SP schemes are developed with involvement (and buy-in) of relevant stakeholders; (b) information governance and data sharing agreements are in place from the start; (c) staffing levels are sufficient to cover the range of activities involved in service delivery, data monitoring, reporting, evaluation and communication with stakeholders; (d) SP schemes are co-located with primary care services; and (e) referral pathways and linkages to health service data systems are established as part of the programme design. We conclude that EA provides a valuable tool for informing the design and evaluation of SP schemes. EA can help commissioners to make best use of limited evaluation resources and prioritise which programmes need to be evaluated, as well as how, why and when.

8.
Genome-wide scans of nucleotide variation in human subjects are providing an increasing number of replicated associations with complex disease traits. Most of the variants detected have small effects and, collectively, they account for a small fraction of the total genetic variance. Very large sample sizes are required to identify and validate findings. In this situation, even small sources of systematic or random error can cause spurious results or obscure real effects. The need for careful attention to data quality has been appreciated for some time in this field, and a number of strategies for quality control and quality assurance (QC/QA) have been developed. Here we extend these methods and describe a system of QC/QA for genotypic data in genome-wide association studies (GWAS). This system includes some new approaches that (1) combine analysis of allelic probe intensities and called genotypes to distinguish gender misidentification from sex chromosome aberrations, (2) detect autosomal chromosome aberrations that may affect genotype calling accuracy, (3) infer DNA sample quality from relatedness and allelic intensities, (4) use duplicate concordance to infer SNP quality, (5) detect genotyping artifacts from dependence of Hardy-Weinberg equilibrium test P-values on allelic frequency, and (6) demonstrate sensitivity of principal components analysis to SNP selection. The methods are illustrated with examples from the “Gene Environment Association Studies” (GENEVA) program. The results suggest several recommendations for QC/QA in the design and execution of GWAS. Genet. Epidemiol. 34: 591–602, 2010. © 2010 Wiley-Liss, Inc.
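Several of the listed checks reduce to simple statistics; for example, the Hardy-Weinberg equilibrium test behind approach (5) compares observed genotype counts with those expected from the observed allele frequency. A generic sketch (not the GENEVA pipeline):

```python
def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) for Hardy-Weinberg equilibrium from
    genotype counts; expected counts come from the observed allele
    frequency under random mating."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

Plotting these statistics (or their P values) against allele frequency across SNPs is what exposes the frequency-dependent genotype-calling artifacts the abstract describes.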

9.
Training of head control in the sitting and semi-prone positions
The purpose of this study was to compare the semi-prone (SP) and sitting (SIT) training positions with respect to head control in children with cerebral palsy, before and after 5 weeks of biofeedback training using a head position trainer (HPT). Four children were randomly assigned to each of two training groups: (a) SP on a prone board inclined 45 degrees above the horizontal and (b) SIT in their personal wheelchair and orthotic device. The HPT, secured to the child's head, controlled a video-cassette player, turning it off when the head deviated beyond 25 degrees from the vertical (termed an error). The time in error and the number of errors during test periods of 3 minutes, without feedback and completed in both the SP and the SIT positions, were determined immediately before and after training, and at 16 weeks after training. The SIT-trained group performed significantly better immediately post-training in three of four comparisons (P < 0.01), but the groups performed similarly in the other eight comparisons: four immediately pre-training and four at 16 weeks post-training (P > 0.05). Post-training scores for the total group (n = 8) were significantly improved over pre-training scores, regardless of the test position or the criterion measurement (P < 0.05). Biofeedback training with an HPT can be effective in either the SIT or the SP position, with improvement lasting at least 16 weeks after training is discontinued.

10.
Many research questions focused on characterizing usual, or long-term average, dietary intake of populations and subpopulations rely on short-term intake data. The objective of this paper is to review key assumptions, statistical techniques, and considerations underpinning the use of short-term dietary intake data to make inference about usual dietary intake. The focus is on measurement error and strategies to mitigate its effects on estimated characteristics of population-level usual intake, with attention to relevant analytic issues such as accounting for survey design. Key assumptions are that short-term assessments are subject to random error only (i.e., unbiased for individual usual intake) and that some aspects of the error structure apply to all respondents, allowing estimation of this error structure in data sets with only a few repeat measures per person. Under these assumptions, a single 24-hour dietary recall per person can be used to estimate group mean intake; and with as little as one repeat on a subsample and with more complex statistical techniques, other characteristics of distributions of usual intake, such as percentiles, can be estimated. Related considerations include the number of days of data available, skewness of intake distributions, whether the dietary components of interest are consumed nearly daily by nearly everyone or episodically, the number of correlated dietary components of interest, time-varying nuisance effects related to day of week and season, and variance estimation and inference. Appropriate application of assumptions and recommended statistical techniques allows researchers to address a range of research questions, though it is imperative to acknowledge systematic error (bias) in short-term data and its implications for conclusions.
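The core assumption, that a 24-hour recall equals usual intake plus mean-zero random error, lets repeat recalls split observed variance into between- and within-person parts. A minimal sketch of that decomposition with toy data (real applications add mixed models, survey weights, and skewness corrections):

```python
import statistics

def usual_intake_components(recalls):
    """Split repeat 24-hour recalls (one list per person, k days each)
    into within-person (day-to-day) and between-person variance; the
    usual-intake distribution has only the between-person part."""
    k = len(recalls[0])
    means = [statistics.mean(r) for r in recalls]
    var_within = statistics.mean([statistics.variance(r) for r in recalls])
    # Person means over-disperse by var_within / k; subtract it out.
    var_between = max(statistics.variance(means) - var_within / k, 0.0)
    return var_between, var_within

recalls = [[100, 120], [200, 180], [150, 170]]  # two recall days per person
vb, vw = usual_intake_components(recalls)
```

The usual-intake distribution keeps only the between-person spread; using the raw distribution of single-day recalls instead would inflate the tails and overstate the share of the population with very low or very high usual intake.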


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司) | 京ICP备09084417号