Similar literature
20 similar articles found (search time: 31 ms)
1.
PURPOSE: Process-oriented global ratings, which assess "overall performance" on one or a number of domains, have been purported to capture nuances of expert performance better than checklists. Pilot data indicate that students change behaviors depending on their perceptions of how they are being scored, while experts do not. This study examines the impact of the students' orientation to the rating system on OSCE scores and the interstation reliability of the checklist and global scores. METHOD: A total of 57 third- and fourth-year medical students at one school were randomly assigned to two groups and performed a ten-station OSCE. Group 1 was told that scores were based on checklists. Group 2 was informed that performance would be rated using global ratings geared toward assessing overall competence. All candidates were scored by physician-examiners who were unaware of the students' orientations to the rating system and who used both checklists and global rating forms. RESULTS: A mixed two-factor ANOVA identified a significant interaction of rating form by group (F(1,55) = 5.5, p < .05), with Group 1 (checklist-oriented) having higher checklist scores but lower global scores than did Group 2 (oriented to global ratings). In addition, Group 1 had higher interstation alpha coefficients than did Group 2 for both global scores (0.74 versus 0.63) and checklist scores (0.63 versus 0.40). CONCLUSIONS: The interaction effect on total exam scores suggests that students adapt their behaviors to the system of evaluation. However, the lower reliability coefficients for both forms found in the process-oriented global-rating group suggest that an individual's capacity to adapt to the system of global rating forms is relatively station-specific, possibly depending on his or her expertise in the domain represented in each station.
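The interstation alpha coefficients reported above (for example, 0.74 versus 0.63 for global scores) are presumably Cronbach's alpha computed with the ten stations treated as items; the abstract does not say how they were calculated. A minimal Python sketch under that assumption follows. The function name `cronbach_alpha` and the toy scores are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a candidates-by-stations score table.

    scores: list of rows, one row per candidate, one value per station.
    Assumption: stations play the role of "items" in the usual formula
    alpha = k/(k-1) * (1 - sum(station variances) / variance(total score)).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two stations and two candidates")
    # Sample variance of each station's scores across candidates.
    station_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each candidate's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(station_vars) / total_var)


# Hypothetical data: three candidates, two stations.
toy_scores = [[2, 1], [1, 2], [3, 3]]
alpha = cronbach_alpha(toy_scores)
```

A lower alpha, as in Group 2, means a candidate's score on one station says less about that candidate's scores on the other stations, which is the station-specificity the conclusion refers to.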

2.
3.
Although clinical judgment is often used in assessment and treatment planning, rarely has research examined its reliability, validity, or impact in practice settings. This study tailored the frequency of home visits in a prevention program for aggressive-disruptive children (n = 410; 56% minority) on the basis of 2 kinds of clinical judgment: ratings of parental functioning using a standardized multi-item scale and global assessments of family need for services. Stronger reliability and better concurrent and predictive validity emerged for the 1st kind of clinical judgment than for the 2nd. Exploratory analyses suggested that using ratings of parental functioning to tailor treatment recommendations improved the impact of the intervention by the end of 3rd grade but using more global assessments of family need did not.

4.
PURPOSE: To develop a valid and reliable examination to assess the technical proficiency of family medicine residents' performance of minor surgical office procedures. METHOD: A multi-station OSCE-style examination using bench-model simulations of minor surgical procedures was developed. Participants were a randomly selected group of 33 family medicine residents (PGY-1 = 16, PGY-2 = 17) and 14 senior surgical residents who functioned as a validation group. Examiners were qualified surgeons and family physicians who used both checklists and global rating scales to score the participants' performances. RESULTS: When family medicine residents were evaluated by family physicians, interstation reliabilities were .29 for checklists and .42 for global ratings. When family medicine residents were evaluated by surgeons, the reliabilities were .53 for checklists and .75 for global ratings. Interrater reliability, measured as a correlation for total examination scores, was .97. Mean scores on the examination were 60%, 64%, and 87% for PGY-1 family medicine, PGY-2 family medicine, and surgery residents, respectively. The difference in scores between family medicine and surgery residents was significant (p < .001), providing evidence of construct validity. CONCLUSION: A new examination developed for assessing family medicine residents' skills with minor surgical office procedures is reliable and has evidence for construct validity. The examination has low reliability when family physicians serve as examiners, but moderate reliability when surgeons are the evaluators.

5.
This research compared independent ratings of criminal psychopathy (Hare's Psychopathy Checklist, Hare, 1991) from National Parole Board case files alone with ratings based upon file information plus a semi-structured interview. Notwithstanding high interrater reliability using National Parole Board (NPB) case files alone (n = 35), Psychopathy Checklist (PCL-R) scores had to be prorated because 30% of the items could not be scored. Comparisons between file only and independent file plus interview ratings of criminal psychopathy for a larger sample (N = 120) resulted in relatively low interrater reliability. Interrater agreement for ratings of psychopathy with and without an interview was statistically significant (p < .006), yet 40% of the cases received different diagnoses when the different procedures were used. Contrary to earlier findings (Wong, 1988), file only PCL-R ratings were not routinely an underestimate of file plus interview PCL-R ratings.

6.
PURPOSE: To examine the relationship between graduates' performances on a prototype of the National Board of Medical Examiners' Step 2 CS and other undergraduate measures with their residency directors' ratings of their performances as interns. METHOD: Data were collected for the 2001 and 2002 graduates from the study institution. Checklist and interpersonal scores from the prototype Step 2 CS, along with United States Medical Licensing Examination (USMLE) Step 1 and 2 scores and undergraduate grade-point average (GPA), were correlated with residency directors' ratings (average score for six competencies, quartile ranking, and isolated interpersonal communication competency score). Stepwise linear regression was used to identify the best outcome predictors. RESULTS: Quartile ranking was more highly correlated with GPA than Step 2 CS prototype interpersonal score, USMLE Step 2 score, USMLE Step 1 score, and Step 2 CS prototype checklist score. The average score on the residency director's survey was more highly correlated with GPA than USMLE Step 2 score, USMLE Step 1 score, Step 2 CS prototype interpersonal score, and Step 2 CS prototype checklist score. The best predictors for both quartile ranking and average competency score were GPA and Step 2 CS prototype interpersonal score (R² = 0.26 and 0.28). CONCLUSION: Both scores from the Step 2 CS prototype significantly correlated with the interns' quartile ranking and average competency score. Only GPA and Step 2 CS prototype interpersonal score accounted for most of the variance of performance in the regression model.

7.
BACKGROUND: Assuming that psychomotor disturbances represent the core of melancholia and are specific to it, Parker et al. developed the CORE, an 18-item scale that assesses retardation, agitation and non-interactivity by behavioural observation and is able to distinguish melancholia from other depressive disorders. We report an inter-rater reliability study of the French version of the CORE. METHODS: 35 French-speaking in-patients meeting ICD-10 criteria for major depression underwent a video-recorded interview from which the CORE was rated. Each patient's recording was rated by 5 psychiatrists. We tested the inter-rater reliability of the total CORE score and of each of its three subscale scores using the intra-class correlation coefficient (ICC). A cut-off score for melancholia was established against the opinion of the psychiatrist in charge of the patient using a ROC curve. We used Cohen's kappa to assess agreement between raters on whether patients scored above the cut-off, that is, on the allocation of melancholia. RESULTS: The global ICC for the total score was 0.88 and ranged from 0.75 to 0.97 across rater dyads. A ROC curve yielded an optimal cut-off of 5 for melancholia. The global kappa for the agreement on melancholia allocation was good (0.65). LIMITATIONS: The five raters did not rate under exactly the same conditions. The agreement for the "agitation" subscale was poor. CONCLUSION: The French version of the CORE has good to excellent inter-rater reliability for the total score and for the allocation of melancholia according to a cut-off. Further validation studies are required to allow research application.
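To illustrate the agreement statistic named in the Methods above, the following is a minimal Python sketch of Cohen's kappa for a single pair of raters making a yes/no melancholia allocation. It is an illustration only: the study reports a global kappa across five raters, and the abstract does not describe how that global value was pooled. The function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same patients.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the agreement expected by chance from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical allocations (1 = melancholia, 0 = not) for four patients.
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

In this toy example the raters agree on three of four patients (0.75), chance agreement is 0.50, and kappa is therefore 0.5.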

8.
9.
The performance of foreign medical graduates on multistation standardized patient-based tests was used to determine the validity and generalizability of global ratings of their clinical competence made by expert examiners. Data were derived from the entrance examinations of the 1989 and 1990 applicants to the Ontario Pre-Internship Program and the exit examination of 24 participants from the 1989 cohort. For each candidate, the examiners completed a detailed checklist and two five-point global ratings dealing with the candidate's approach to the patients' problem and attitude toward the patient. Generalizability coefficients for both ratings were satisfactory and stable across cohorts. Construct validity of the global ratings was demonstrated by comparing entry and exit ratings and by evidence of significant and positive correlations between the global ratings and total test scores. Tentative evidence of criterion validity of the global ratings was demonstrated. These findings suggest that global ratings by expert examiners can be used as an effective form of assessment in multistation standardized patient examinations.

10.

Background

Social media can promote healthy behaviors by facilitating engagement and collaboration among health professionals and the public. Thus, social media is quickly becoming a vital tool for health promotion. While guidelines and trainings exist for public health professionals, there are currently no standardized measures to assess individual social media competency among Certified Health Education Specialists (CHES) and Master Certified Health Education Specialists (MCHES).

Objective

The aim of this study was to design, develop, and test the Social Media Competency Inventory (SMCI) for CHES and MCHES.

Methods

The SMCI was designed in three sequential phases: (1) Conceptualization and Domain Specifications, (2) Item Development, and (3) Inventory Testing and Finalization. Phase 1 consisted of a literature review, concept operationalization, and expert reviews. Phase 2 involved an expert panel (n=4) review, think-aloud sessions with a small representative sample of CHES/MCHES (n=10), a pilot test (n=36), and classical test theory analyses to develop the initial version of the SMCI. Phase 3 included a field test of the SMCI with a random sample of CHES and MCHES (n=353), factor and Rasch analyses, and development of SMCI administration and interpretation guidelines.

Results

Six constructs adapted from the unified theory of acceptance and use of technology and the integrated behavioral model were identified for assessing social media competency: (1) Social Media Self-Efficacy, (2) Social Media Experience, (3) Effort Expectancy, (4) Performance Expectancy, (5) Facilitating Conditions, and (6) Social Influence. The initial item pool included 148 items. After the pilot test, 16 items were removed or revised because of low item discrimination (r<.30), high interitem correlations (ρ>.90), or feedback received from pilot participants. During the psychometric analysis of the field test data, 52 items were removed due to low discrimination, evidence of content redundancy, low R-squared value, or poor item infit or outfit. Psychometric analyses of the data revealed acceptable reliability evidence for the following scales: Social Media Self-Efficacy (alpha=.98, item reliability=.98, item separation=6.76), Social Media Experience (alpha=.98, item reliability=.98, item separation=6.24), Effort Expectancy (alpha=.74, item reliability=.95, item separation=4.15), Performance Expectancy (alpha=.81, item reliability=.99, item separation=10.09), Facilitating Conditions (alpha=.66, item reliability=.99, item separation=16.04), and Social Influence (alpha=.66, item reliability=.93, item separation=3.77). There was some evidence of local dependence among the scales, with several observed residual correlations above |.20|.

Conclusions

Through the multistage instrument-development process, sufficient reliability and validity evidence was collected in support of the purpose and intended use of the SMCI. The SMCI can be used to assess the readiness of health education specialists to effectively use social media for health promotion research and practice. Future research should explore associations across constructs within the SMCI and evaluate the ability of SMCI scores to predict social media use and performance among CHES and MCHES.
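The item reliability and item separation values in the Results of this abstract appear consistent with the usual Rasch relation between a separation index G and a reliability R. The abstract does not state this relation, so it is given here as an assumption, with two of the reported scales worked through:

```latex
R = \frac{G^{2}}{1 + G^{2}}
\qquad
G = 6.76:\; R = \frac{45.70}{46.70} \approx 0.98
\qquad
G = 3.77:\; R = \frac{14.21}{15.21} \approx 0.93
```

Under that assumption, the reported pairs for Social Media Self-Efficacy (.98, 6.76) and Social Influence (.93, 3.77) agree to two decimals; most of the other scales also agree to within rounding.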

11.
OBJECTIVE: To evaluate the reliability and validity of a new observational measure of children's procedure-related distress behaviors, the Brief Behavioral Distress Scale (BBDS), to provide clinicians with an efficient, economical alternative measure that does not depend on continuous interval coding. METHODS: Forty-eight randomly selected videotaped invasive medical procedures performed on children (ages 2 to 10 years) with chronic illness were coded with the BBDS and the Observation Scale of Behavioral Distress (OSBD). Reliability and validity analyses along with item analysis were conducted. RESULTS: Total distress scores of the BBDS were highly correlated with six of seven concurrent validity measures from multiple sources (i.e., OSBD, parent ratings, two nurse ratings, child self-report, and a physiological arousal measure, heart rate) (range r = .57-.76, p < .001-.0001). A robust association was found between the BBDS distress scores and OSBD total distress scores (r = .72, p < .0001). For two concurrent validity measures, the BBDS demonstrated stronger associations than did the OSBD. Interrater reliability was high for each BBDS distress behavior category. CONCLUSIONS: Based on the findings reported, the BBDS is a reliable and valid measure of children's procedure-related distress with functional utility in both research and clinical settings.

12.
PURPOSE: To determine whether a longitudinal residents-as-teachers curriculum improves generalist residents' teaching skills. METHOD: From May 2001 to February 2002, 23 second-year generalist residents in four residencies affiliated with the University of California, Irvine, College of Medicine, completed a randomized, controlled trial of a longitudinal residents-as-teachers program. Thirteen intervention residents underwent a 13-hour curriculum during one-hour noon conferences twice monthly for six months, practicing teaching skills and receiving checklist-guided feedback. In a 3.5-hour, eight-station objective structured teaching examination (OSTE) enacted and rated by 15 senior medical students before and after the curriculum, two trained, blinded raters independently assessed each station with detailed, case-specific rating scales (rating scale reliability = 0.96, inter-rater reliability = 0.78). RESULTS: The intervention and control groups were similar in academic performance, specialty distribution, and gender (χ² = 0.434, p = .81). On a five-point Likert scale (5 = best teaching skills), intervention and control residents showed similar mean pretest OSTE scores (2.83 vs. 2.88, p = .736). The intervention group improved their mean overall OSTE scores by 22.3% (more than two standard deviations), from 2.83 (pretest) to 3.46 (post-test; p < .0005; 95% CI 0.53 to 0.72). Intervention residents also improved significantly on six of eight OSTE stations. Within the control group, no pretest-to-post-test change achieved statistical significance. Mann-Whitney and Wilcoxon signed-rank tests confirmed these results. CONCLUSIONS: Generalist residents randomly assigned to receive a 13-hour longitudinal residents-as-teachers curriculum consistently showed improved OSTE scores. Future research should clarify which aspects of residents-as-teachers curricula most effectively improve educational outcomes.
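The 22.3% improvement reported above can be checked directly from the pretest and post-test means. The worked arithmetic below uses only figures from the abstract; reading the confidence interval as an interval on the raw score difference is an interpretation, not something the abstract states.

```latex
\frac{3.46 - 2.83}{2.83} = \frac{0.63}{2.83} \approx 0.223 = 22.3\%
```

The raw difference of 0.63 points falls inside the reported 95% CI of 0.53 to 0.72, which is consistent with that reading.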

13.
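Validity and reliability of the Chinese version of the Alzheimer's Disease Assessment Scale   Total citations: 8, self-citations: 0, citations by others: 8
Objective: To evaluate the validity and reliability of the Chinese version of the Alzheimer's Disease Assessment Scale (ADAS). Methods: Twenty patients with probable AD who met the NINCDS-ADRDA diagnostic criteria were selected as subjects and rated blind by two raters; the reliability of the scale was evaluated through inter-rater agreement. The ADAS, MMSE, GDS, ADL, and Blessed-Roth scales were scored, and the validity of the scale was examined through correlation analysis. The scores on each ADAS item were subjected to difference analysis in order to ...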

14.

Objectives

A reliable tool to evaluate flow is paramount in systemic sclerosis (SSc). We report a systematic literature review on the reliability of laser speckle contrast analysis (LASCA) for measuring peripheral blood perfusion (PBP) in SSc, together with an additional pilot study investigating the intra- and inter-rater reliability of LASCA.

Methods

A systematic search was performed in 3 electronic databases, according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. In the pilot study, 30 SSc patients and 30 healthy subjects (HS) underwent LASCA assessment. Intra-rater reliability was assessed by having a first anchor rater performing the measurements at 2 time-points and inter-rater reliability by having the anchor rater and a team of second raters performing the measurements in 15 SSc and 30 HS. The measurements were repeated with a second anchor rater in the other 15 SSc patients, as external validation.

Results

Only 1 of the 14 records of interest identified through the systematic search was included in the final analysis. In the additional pilot study: intra-class correlation coefficient (ICC) for intra-rater reliability of the first anchor rater was 0.95 in SSc and 0.93 in HS, the ICC for inter-rater reliability was 0.97 in SSc and 0.93 in HS. Intra- and inter-rater reliability of the second anchor rater was 0.78 and 0.87.

Conclusions

The identified literature regarding the reliability of LASCA measurements reports good to excellent inter-rater agreement. The pilot study confirmed the reliability of LASCA measurements, with good to excellent inter-rater agreement, and additionally found good to excellent intra-rater reliability. Furthermore, similar results were found in the external validation.

15.
We examined the relationship between ratings of patient cooperation and neuropsychological test performance in a sample (N = 333) of dementing patients and normal controls. We also examined the stability of the relationship in a subset of this same sample (N = 299) who were retested a year later. All the correlation coefficients in both years were significant, with a median Pearson r of .64 in year one and .725 in year two. The test-retest reliability for ratings of cooperativeness over the one-year time period (rated by different examiners) was also significant, r = .64 (p < .0001). This analysis indicates that cooperation plays a significant role in neuropsychological test performance and that ratings of cooperativeness are relatively stable over periods of up to a year in length.

16.
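Development of a childhood autism screening scale and analysis of its reliability and validity   Total citations: 7, self-citations: 1, citations by others: 7
Objective: To develop a screening scale for childhood autism and to analyze its reliability and validity. Methods: The scale was developed on the basis of the current psychiatric understanding of the clinical presentation and diagnosis of childhood autism and of research findings on the Autism Diagnostic Interview. A childhood autism group, a mental retardation group, and a normal children group were selected as subjects for reliability and validity testing. Results: Inter-rater and test-retest reliability (kappa) for the individual items were 0.429-0.639 (with one exception) and 0.404-0.732, respectively. Inter-rater and test-retest reliability (correlation coefficients) for the factors were 0.944-0.988 and 0.840-0.984, respectively. Inter-rater and test-retest reliability (correlation coefficients) for the total score were 0.933 and 0.986, respectively. The split-half reliability of the scale was 0.969. For homogeneity reliability, the inter-item correlation coefficients were 0.444-0.855, with one exception. All P values were < 0.01. Differential diagnostic validity of the items: χ² values across the three groups were 20.658-108.152, and Z values for the pairwise group comparisons were 2.186-9.264 (with two exceptions). Discriminant validity of the factors: F values across the three groups were 138.227-468.368, and pairwise between-group mean differences were 2.916-14.542. Discriminant validity of the total score: the F value across the three groups was 562.563, and pairwise between-group mean differences were 10.91-34.64, with p values of 0.029-0.000. All items of the scale clustered into three factors. The screening cutoff was 21 points and the diagnostic cutoff was 24 points. Conclusion: The scale has good reliability and validity and can be used for the screening and diagnosis of childhood autism.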

17.
The present study examined the relationship between the Rorschach Ego Impairment Index (EII) and psychiatric severity. Search procedures yielded 13 independent samples (total N = 1402, average n = 108, standard deviation = 90) for inclusion in the meta‐analysis. Inter‐rater reliability analyses demonstrated that coding of effect sizes and moderator variables was completed with good to excellent reliability. Results indicated that higher EII scores were associated with greater psychiatric severity, with an overall weighted effect size of r = 0.29, p = 0.000002 (95% confidence interval = 0.17–0.40), supporting the EII's validity as a measure of psychological impairment. Publication bias analyses did not indicate any significant cause for concern regarding the results. The data were demonstrably heterogeneous (Q = 56.82, p = 0.0000001), and results of post‐hoc tests indicated that effect sizes with dependent variables obtained via researcher ratings were significantly larger than any of the following: effect sizes with dependent variables obtained via clinician ratings, informant ratings, information about level of treatment or placement status or self‐report ratings (p's = 0.0005, 0.003, <0.001, <0.001, respectively). In addition, there was a trend for effect sizes based on performance‐based measures to be larger than those based on information about level of treatment or placement status (p = 0.098) as well as those based on self‐report measures (p = 0.076). Other moderator analyses were non‐significant (p's > 0.10). Copyright © 2010 John Wiley & Sons, Ltd. Key Practitioner Message:
• The Rorschach Ego Impairment Index (EII) demonstrated validity in measuring psychiatric severity across a range of normative, outpatient, residential, and inpatient samples.
• The degree of the EII's validity in assessing psychiatric severity compared favorably to the overall validity of the Rorschach and the MMPI.
• The EII appears to be most valid in capturing psychiatric severity as measured by researcher ratings of social competency or estimated ego impairment.
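The abstract above reports an overall weighted effect size of r = 0.29 but does not say which pooling model was used. As a hedged illustration of how per-sample correlations can be combined, the Python sketch below uses a fixed-effect Fisher z approach with weights of n - 3. This is an assumed method, not necessarily the authors' (the reported heterogeneity suggests a random-effects model may have been more appropriate), and the function name `pooled_correlation` and the input values are hypothetical.

```python
import math


def pooled_correlation(studies, z_crit=1.96):
    """Fixed-effect pooled correlation via Fisher's z transform.

    studies: list of (r, n) pairs, one per independent sample (n > 3).
    Returns (pooled_r, (ci_low, ci_high)) back-transformed to the r scale.
    """
    weights = [n - 3 for _, n in studies]  # inverse variance of Fisher z
    total_weight = sum(weights)
    z_bar = sum(w * math.atanh(r) for (r, _), w in zip(studies, weights)) / total_weight
    se = 1 / math.sqrt(total_weight)
    low, high = z_bar - z_crit * se, z_bar + z_crit * se
    return math.tanh(z_bar), (math.tanh(low), math.tanh(high))


# Hypothetical samples, not the 13 samples from the meta-analysis.
pooled_r, (ci_low, ci_high) = pooled_correlation([(0.30, 50), (0.30, 100)])
```

When every sample has the same correlation, the pooled value equals that correlation and additional samples only narrow the interval.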

18.
This study examined the interrater reliability and temporal stability of a scoring system developed by Troyer, Moscovitch, and Winocur [Neuropsychologia 11 (1997) 138] to measure clustering and switching on verbal fluency (VF) tasks such as the Controlled Oral Word Association Test (COWAT) [Benton, A.L., Hamsher, K., & Sivan, A.B. (1983). Multilingual aphasia examination (3rd ed.). Iowa City, IA: AJA Associates]. Seven independent raters scored COWAT protocols of 125 healthy participants in accordance with the rules proposed by Troyer et al. Intraclass coefficients were near perfect, ranging from .96 for total number of clusters to .99 for total number of switches. Test-retest reliability coefficients (n = 55) were poor to modest (r = .47 for clusters, r = .58 for switches, and r = .70 for total words). Significant improvement in performance was observed across most COWAT indices, suggesting a practice effect. Modifications to test administration are suggested to improve the stability of cluster and switch scores, as well as other variables for further study.

19.
The assessment of children's psychopathology is often based on parental report. Earlier studies have suggested that rater bias can affect the estimates of genetic, shared environmental and unique environmental influences on differences between children. The availability of a large dataset of maternal as well as paternal ratings of psychopathology in 7‐year‐old children enabled (i) the analysis of informant effects on these assessments and (ii) more reliable estimation of the genetic and non‐genetic effects. DSM‐oriented measures of affective, anxiety, somatic, attention‐deficit/hyperactivity, oppositional‐defiant, conduct, and obsessive‐compulsive problems were rated for 12,310 twin pairs from the Netherlands Twin Register by mothers (N = 12,085) and fathers (N = 8,516). Genetic and non‐genetic effects were estimated for the common and rater‐specific variance. For all scales, mean scores on maternal ratings exceeded paternal ratings. Parents largely agreed on the ranking of their child's problems (r = 0.60–0.75). Heritability was estimated at over 55% for maternal and paternal ratings for all scales, except for conduct problems (44–46%). Unbiased shared environmental influences, i.e., on the common variance, were significant for affective (13%), oppositional (13%), and conduct problems (37%). In clinical settings, different cutoffs for (sub)clinical scores could be applied to paternal and maternal ratings of their child's psychopathology. Only for conduct problems did shared environmental and genetic influences explain an equal share of the differences between children. For the other scales, genetic factors explain the majority of the variance, especially for the common part that is free of rater bias. © 2016 The Authors. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics Published by Wiley Periodicals, Inc.

20.
PURPOSE: To test whether global ratings of checklists are a viable alternative to global ratings of actual clinical performance for use as a criterion for standardized-patient (SP) assessment. METHOD: Five faculty physicians independently observed and rated videotaped performances of 44 medical students on the seven SP cases that comprise the fourth-year assessment administered at The Morchand Center of Mount Sinai School of Medicine to students in the eight member schools in the New York City Consortium. A year later, the same panel of raters reviewed and rated checklists for the same 44 students on five of the same SP cases. RESULTS: The mean global ratings of clinical competence were higher with videotapes than checklists, whereas the mean global ratings of interpersonal and communication skills were lower with videotapes. The correlations for global ratings of clinical competence showed only moderate agreement between the videotape and checklist ratings; and for interpersonal and communication skills, the correlations were somewhat weaker. CONCLUSION: The results raise serious questions about the viability of global ratings of checklists as an alternative to ratings of observed clinical performance as a criterion for SP assessment.
