Similar Documents (20 results)
1.
Context: A test score is a number that purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is needed to help us understand the relationship between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. IRT models are used for many purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which CTT may be more appropriate, especially when common IRT assumptions cannot readily be met, or in more local settings such as many medical school examinations. Objectives: This paper provides an overview of both CTT and IRT for practitioners involved in developing and scoring medical education assessments. Methods: The tenets of CTT and IRT are first described. The main uses of both models in test development and psychometric activities are then illustrated with several practical examples. Finally, general recommendations for the use of each model in practice are outlined. Discussion: CTT and IRT are widely used to address measurement-related issues arising from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. This paper introduces these models and shows how they can be applied to answer common assessment questions.
Medical Education 2010: 44: 109–117
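
The classical item analysis mentioned above can be sketched as follows, assuming dichotomously scored (0/1) items. Item difficulty is taken as the proportion answering correctly, and discrimination as the corrected item-total correlation (the item against the total score with that item removed). The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy / math.sqrt(sxx * syy)

def ctt_item_analysis(responses):
    """Classical item statistics for a 0/1-scored response matrix.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Returns one (difficulty, discrimination) pair per item: difficulty is the
    proportion correct (the CTT p-value), discrimination is the corrected
    item-total correlation (item versus total score with that item removed).
    """
    totals = [sum(row) for row in responses]
    results = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [t - x for t, x in zip(totals, item)]
        p = sum(item) / len(item)
        results.append((p, pearson(item, rest)))
    return results

# Toy data: 5 examinees x 4 items (illustrative only).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
stats = ctt_item_analysis(responses)
```

Low p-values flag hard items, and near-zero or negative corrected correlations flag items that do not agree with the rest of the test.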

2.
INTRODUCTION: Several studies have shown the need to include the physical work environment among the dimensions assessed in job satisfaction evaluation. However, this dimension is not included in the Font-Roja questionnaire. The present study introduces two items exploring this dimension and tests the hypothesis that the physical work environment has a significant impact on job satisfaction evaluation. METHOD: A total of 227 geriatric care workers participated in this study. The participants completed the Font-Roja job satisfaction questionnaire plus 2 additional items exploring the physical work environment. Factor analysis and principal components analysis with varimax rotation were used to identify the components of job satisfaction. Cronbach's alpha was used to assess the coherence of the scales and the consistency of the added items. These methods were applied to both versions: the classical 24-item questionnaire and the extended 26-item questionnaire. RESULTS: The classical Font-Roja questionnaire comprised 8 factors, explaining 60.02% of the variance. The extended questionnaire comprised 9 factors, explaining 61.81% of the variance; the new factor consisted of the two added items. Internal consistency was alpha = 0.773 for the classical scale and alpha = 0.791 for the extended scale. DISCUSSION: The extended scale is superior to the classical scale. The results seem to support the hypothesis that instruments used to analyse job satisfaction should contain items on the physical work environment.
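
Cronbach's alpha, used above to compare the 24-item and 26-item versions, is α = k/(k − 1) · (1 − Σσᵢ²/σ_X²). Here k is the number of items, σᵢ² are the item variances and σ_X² is the variance of the total score. The sketch below uses only the standard library and assumes a complete respondents-by-items matrix with no missing responses. The function name and toy data are illustrative.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix (no missing data).

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    Sample variances (n - 1 denominator) are used throughout; the denominator
    cancels in the ratio as long as it is applied consistently.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy 4-respondent x 2-item matrix (illustrative, not Font-Roja data).
alpha = cronbach_alpha([[1, 2], [2, 2], [3, 4], [4, 3]])
```

Perfectly parallel (identical) items give α = 1. Adding items that share little variance with the rest lowers α.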

3.
Background

Health outcomes researchers are increasingly applying Item Response Theory (IRT) methods to questionnaire development, evaluation, and refinement efforts.

Objective

To provide a brief overview of IRT, to review some of the critical issues associated with IRT applications, and to demonstrate the basic features of IRT with an example.

Methods

Example data come from 6,504 adolescent respondents in the National Longitudinal Study of Adolescent Health public-use data set who completed the 19-item Feelings Scale for depression. The sample was split into development and validation samples. Scale items were calibrated in the development sample with the Graded Response Model, and the results were used to construct a 10-item short form. The short form was evaluated in the validation sample by examining the correspondence between IRT scores from the short form and the original, and by comparing the proportion of respondents identified as depressed according to the original and short-form observed cut scores.

Results

The 19 items varied in their discrimination (slope parameter range: .86–2.66), and item location parameters spanned a considerable range of depression (−.72 to 3.39). However, the item set is most discriminating at higher levels of depression. In the validation sample, IRT scores generated from the short and long forms correlated at .96, and the average difference between these scores was −.01. In addition, nearly 90% of the sample was classified identically as at risk or not at risk for depression using observed-score cut points from the short and long forms.

Conclusions

When used appropriately, IRT can be a powerful tool for questionnaire development, evaluation, and refinement, resulting in precise, valid, and relatively brief instruments that minimize response burden.

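In the Graded Response Model used to calibrate the Feelings Scale, each item has one slope a and ordered location parameters b_1 < ... < b_m. The probability of responding in category k or higher is a logistic curve in θ. The probability of exactly category k is the difference between adjacent curves. The sketch below shows that computation. The function name is hypothetical, and the parameter values are illustrative only, chosen within the ranges reported above.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under Samejima's graded response model.

    theta:      latent trait value of the respondent
    a:          item slope (discrimination)
    thresholds: strictly increasing locations b_1 < ... < b_m for an item
                with m + 1 ordered categories scored 0..m
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Boundary (cumulative) curves P*(X >= k); P*(X >= 0) = 1, P*(X >= m+1) = 0.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Probability of exactly category k is the difference of adjacent boundaries.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Illustrative item: slope and thresholds are made-up values, not study estimates.
probs = grm_category_probs(theta=1.0, a=1.5, thresholds=[-0.5, 1.0, 2.5])
```

Larger slopes make the boundary curves steeper, so the categories separate respondents more sharply near the thresholds.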

4.

Background  

The International Classification of Functioning, Disability and Health (ICF) proposes three main health outcomes, Impairment (I), Activity Limitation (A) and Participation Restriction (P), but good measures of these constructs are needed. The aim of this study was to use both Classical Test Theory (CTT) and Item Response Theory (IRT) methods to carry out an item analysis to improve measurement of these three components in patients having joint replacement surgery, mainly for osteoarthritis (OA).

5.
Background: As part of a larger study whose objective is to develop an abbreviated version of the EORTC QLQ-C30 suitable for research in palliative care, analyses were conducted to determine the feasibility of generating a shorter version of the 4-item emotional functioning (EF) scale that could be scored in the original metric. Methods: We used data from 24 European cancer studies conducted in 10 different languages (n=8242). Item selection was based on analyses by item response theory (IRT). Based on the IRT results, a simple scoring algorithm was developed to predict the original 4-item EF sum scale score from a reduced number of items. Results: Both a 3-item and a 2-item version (item 21 ‘Did you feel tense?’ and item 24 ‘Did you feel depressed?’) predicted the total score with excellent agreement and very little bias. In group comparisons, the 2-item scale led to the same conclusions as those based on the original 4-item scale with little or no loss of measurement efficiency. Conclusion: Although these results are promising, confirmatory studies are needed based on independent samples. If such additional studies yield comparable results, incorporation of the 2-item EF scale in an abbreviated version of the QLQ-C30 for use in palliative care research settings would be justified. The analyses reported here demonstrate the usefulness of the IRT-based methodology for shortening questionnaire scales.

6.
Abstract

Aims: This study examines the reliability and validity of the Mastery Scale-Chinese version (MS-C) when applied to three groups diagnosed with major depression, schizophrenia, or HIV/AIDS. Methods: Participants were recruited from outpatient units of a medical center and a municipal hospital in northern Taiwan. The study sample (n = 2009) included 237 patients with depressive disorders, 160 with schizophrenia, and 1612 with HIV/AIDS. The reliability and construct validity of the MS-C were evaluated by confirmatory factor analysis (CFA) and Rasch analysis. Results: The CFA showed that the MS-C has adequate construct validity, with all indices meeting the criteria except the chi-square values. The Rasch analysis supported the four-point rating scale structure and a unidimensional construct for the MS-C. The differential item functioning (DIF) analysis showed that all items had stable measurement properties across the diagnostic groups (major depression, schizophrenia, HIV/AIDS). Conclusion: This study found that the MS-C has acceptable psychometric qualities in terms of reliability, construct validity, rating scale performance, and item characteristics when applied to patients with depression, schizophrenia, and HIV/AIDS in Taiwan.
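
A Rasch analysis checks a "four-point rating scale structure", and that structure is commonly modelled with Andrich's rating scale model. In that model, the probability of category k is proportional to exp(Σ_{j≤k}(θ − δ − τ_j)). Here θ is the person measure, δ is the item difficulty, and the thresholds τ_j are shared by all items. The abstract does not name the exact Rasch model, so treat this choice as an assumption. The sketch computes category probabilities for one item. A rating scale is usually judged to function well when its thresholds are ordered, so that each category is the most probable one somewhere along θ.

```python
import math

def rsm_category_probs(theta, delta, taus):
    """Andrich rating scale model: probabilities of categories 0..len(taus).

    theta: person measure (logits); delta: item difficulty (logits);
    taus:  threshold parameters shared by every item on the scale.
    """
    # Unnormalised log-probabilities: category 0 is 0, category k adds theta - delta - tau_k.
    log_num = [0.0]
    running = 0.0
    for tau in taus:
        running += theta - delta - tau
        log_num.append(running)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_num)
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

# Four-point scale -> three thresholds (illustrative values, not MS-C estimates).
probs = rsm_category_probs(theta=0.0, delta=0.0, taus=[-1.0, 0.0, 1.0])
```

With a single threshold of 0, the model reduces to the dichotomous Rasch model, P(X = 1) = 1/(1 + exp(−(θ − δ))).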

7.
Objectives: To propose a multidimensional item response theory (MIRT) scoring system for the Short Form 12 (SF-12) with good psychometric properties in terms of fit and reliability. Study Design and Setting: Two models, indicating physical (PCS) and mental component summary (MCS) dimensions, were fitted to SF-12 data from the European Study of the Epidemiology of Mental Disorders, a representative sample of the European adult general population (n = 21,425; response rate = 61.2%). Goodness of fit, information, reliability, and agreement of individual scores were compared with those of the classical SF-12 and RAND-12 algorithms. Results: The bidimensional response process (BRP) model, in which all items are indicators of both dimensions, yielded the best fit (root mean square error of approximation = 0.057, comparative fit index = 0.95, and Tucker–Lewis index = 0.94) and agreed highly with PCS and MCS scores from the SF-12 (intraclass correlation coefficients of 0.92 and 0.88, respectively) and RAND-12 (0.88 and 0.95). Regarding reliability, the BRP yielded 0.75 and 0.77 (PCS and MCS, respectively), greater than the SF-12 (0.65 and 0.66) and RAND-12 (0.65 and 0.67). As indicated by scale linking, MIRT scores can be interpreted similarly to the classical scores. Conclusion: The MIRT models showed a clear construct structure for the PCS and MCS dimensions, defined by functional and role limitation content. The results support the use of SF-12 MIRT-based scores as a valid and reliable option for assessing health status.

8.
The purpose of this paper was to evaluate the psychometric properties of a stage-specific self-efficacy scale for physical activity with classical test theory (CTT), confirmatory factor analysis (CFA) and item response modeling (IRM). Women who enrolled in the Women On The Move study completed a 20-item stage-specific self-efficacy scale developed for this study [n = 226, 51.1% African-American and 48.9% Hispanic women, mean age = 49.2 (±7.0) years, mean body mass index = 29.7 (±6.4)]. Three analyses were conducted: (i) a CTT item analysis, (ii) a CFA to validate the factor structure and (iii) an IRM analysis. The CTT item analysis and the CFA showed that the scale had high internal consistency (ranging from 0.76 to 0.93) and a strong factor structure. The results also showed that the scale could be improved by modifying or eliminating some of the existing items without significantly altering its content. The IRM results further showed that the scale had few items targeting high self-efficacy, and the stage-specific assumption underlying the scale was rejected. In addition, the IRM analyses found that the five-point response format functioned more like a four-point format. Overall, employing multiple methods to assess the psychometric properties of the stage-specific self-efficacy scale demonstrated the complementary nature of these methods and highlighted the strengths and weaknesses of this scale.

9.
This paper compares the approach and resultant outcomes of item response models (IRMs) and classical test theory (CTT). First, it reviews basic ideas of CTT, and compares them to the ideas about using IRMs introduced in an earlier paper. It then applies a comparison scheme based on the AERA/APA/NCME 'Standards for Educational and Psychological Tests' to compare the two approaches under three general headings: (i) choosing a model; (ii) evidence for reliability--incorporating reliability coefficients and measurement error--and (iii) evidence for validity--including evidence based on instrument content, response processes, internal structure, other variables and consequences. An example analysis of a self-efficacy (SE) scale for exercise is used to illustrate these comparisons. The investigation found that there were (i) aspects of the techniques and outcomes that were similar between the two approaches, (ii) aspects where the item response modeling approach contributes to instrument construction and evaluation beyond the classical approach and (iii) aspects of the analysis where the measurement models had little to do with the analysis or outcomes. There were no aspects where the classical approach contributed to instrument construction or evaluation beyond what could be done with the IRM approach. Finally, properties of the SE scale are summarized and recommendations made.

10.
Objectives To test the validity and reliability of selected scales, namely, decision latitude, psychological job demand, social support, job insecurity, and macro-level decision latitude from the Korean version of the job content questionnaire (K-JCQ), as part of a psychosocial epidemiological study among university hospital workers. Methods The K-JCQ was developed by translation and back translation in compliance with the JCQ usage policy, and its psychometric properties were explored among 338 workers (290 females and 48 males) in a university hospital in Korea. Internal consistency was examined using Cronbach's alpha coefficients. Factorial validity was tested using exploratory factor analysis. Pearson's correlation coefficients were used for test–retest reliability among a subset of 157 workers who responded to a repeat survey. Criterion-related validity was assessed by investigating the effects of the scales on job satisfaction and self-identity through work in multiple regression models. Results Cronbach's alpha for all selected scales was higher than 0.6, except for job insecurity (0.53) and macro-level decision authority (0.52), indicating appropriate internal consistency. Correlation coefficients between test and retest scales of decision latitude, psychological job demand, and social support were 0.60, 0.41, and 0.35, respectively. Exploratory factor analysis found three- and four-factor models, i.e., with and without macro-level decision latitude, respectively, closely corresponding to the theoretical constructs. High levels of decision latitude and social support, and low levels of psychological job demand and job insecurity, were significantly associated with high levels of job satisfaction. Higher self-identity through work was positively related to decision latitude and social support. Conclusions These findings suggest that the K-JCQ is valid and reliable for assessing psychosocial job stress among Korean workers.
Macro-level decision latitude showed a separate factorial structure and was strongly associated with task-level decision latitude.

11.
Uniform diagnostic criteria for the night eating syndrome (NES), a disorder characterized by a delay in the circadian pattern of eating, have not been established. Proposed criteria for NES were evaluated using item response theory (IRT) analysis. Six studies yielded 1,481 Night Eating Questionnaires which were coded to reflect the presence/absence of five night eating symptoms. Symptoms were evaluated based on the clinical usefulness of their diagnostic information and on the assumptions of IRT analysis (unidimensionality, monotonicity, local item independence, correct model specification), using a two parameter logistic (2PL) IRT model. Reports of (1) nocturnal eating and/or evening hyperphagia, (2) initial insomnia, and (3) night awakenings showed high precision in discriminating those with night eating problems, while morning anorexia and delayed morning meal provided little additional information. IRT is a useful tool for evaluating the diagnostic criteria of psychiatric disorders and can be used to evaluate potential diagnostic criteria of NES empirically. Behavioral factors were identified as useful discriminators of NES. Future work should also examine psychological factors in conjunction with those identified here.
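
Under the two-parameter logistic (2PL) model named above, each present/absent symptom has a slope a (discrimination) and a location b. Its Fisher information, a²P(1 − P), peaks at θ = b. Symptoms with larger slopes therefore carry more diagnostic information, while flat-sloped symptoms add little. The sketch assumes the logistic metric without the 1.7 scaling constant. The symptom labels and (a, b) values are hypothetical and are not the study's estimates.

```python
import math

def p_2pl(theta, a, b):
    """Probability that a symptom is present at trait level theta (2PL, no 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of one 2PL item: a^2 * P * (1 - P), maximal at theta = b."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def total_information(theta, items):
    """Test information is the sum of item informations (assumes local independence)."""
    return sum(info_2pl(theta, a, b) for a, b in items.values())

# Hypothetical (a, b) values chosen only to illustrate the contrast.
symptoms = {
    "nocturnal_eating": (2.5, 0.8),  # steep slope: sharp discrimination near b
    "morning_anorexia": (0.6, 0.5),  # flat slope: little information anywhere
}
peak_info = {name: info_2pl(b, a, b) for name, (a, b) in symptoms.items()}
```

Each item's peak information equals a²/4. Comparing these peaks across symptoms is one way to see which criteria sharpen measurement.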

12.
On July 1, 1997, in the Canton of Vaud, Switzerland, a 2-year pilot experiment of Hospital-at-Home Care (H-Hcare) was set up at four sites to measure patients' satisfaction with this type of health care. Of 174 patients referred to the H-Hcare program for a wide range of treatments, 107 were medical patients admitted for heart failure, community-acquired pneumonia, or an infectious disease requiring intravenous antibiotic therapy; 95 of these agreed to describe their satisfaction and dissatisfaction with H-Hcare in a semistructured interview conducted 6 weeks after admission. H-Hcare was considered a viable alternative to hospitalization when the illness is not too serious and for patients who are still independent and need little care. When patients are more severely ill, they prefer to go to hospital to avoid overburdening their caregivers and to feel more secure.

13.
OBJECTIVE: To develop and apply a questionnaire that could eventually be used as a management tool and a means of promoting the quality of care provided in 'P. & A. Kyriakou' Children's Hospital. DESIGN: Survey of parents during their children's treatment. SETTING: 'P. & A. Kyriakou' Children's Hospital, Athens, Greece. PARTICIPANTS: Sample of 240 parents. MAIN OUTCOME MEASURE: Parent satisfaction. RESULTS: The most important finding of the study, although normative statements cannot be made, appears to be the signal of low satisfaction with care. The overall mean observed (45 on a scale of 100) is far below the mean (76) derived from a systematic review of 221 satisfaction studies. Moreover, satisfaction appears to be very low (14/100) for hospital procedures, low for the outpatient dimension (42/100) and fairly satisfactory for the inpatient dimension (61/100). CONCLUSION: Data-based feedback as a management tool has been associated with improved organizational functioning. However, systematic use of this intervention within Greek hospitals has been limited. Therefore, in the next phase of the project the results will be fed back to the Governing Board and the personnel of the hospital. Finally, a study will be planned to investigate the effects of implementing changes based on parents' ratings of staff performance.

14.

Background  

In recent years there has been growing interest in Greece in measuring the satisfaction of patients visiting the outpatient clinics of National Health System (NHS) general acute hospitals. The aim of this study was therefore to develop a patient satisfaction questionnaire and provide its preliminary validation.

15.
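Objective: To analyse the items of the quality of life instrument for patients with chronic gastritis, QLICD-CG (V2.0), using classical test theory and item response theory. Methods: The QLICD-CG (V2.0) was used to assess the quality of life of 163 patients with chronic gastritis. Item response theory analysis was performed with Multilog 7.03 to obtain the difficulty, discrimination and information of each item, and four statistical methods from classical test theory were combined with it to evaluate item quality. Results: The CTT results showed that, except for 3 items (GPH3, GPS3, CG11), all remaining items met the criterion of satisfying at least 3 of the 4 statistical methods. The IRT results showed that the difficulty parameters of all items ranged from −6.42 to 4.36 and increased monotonically with difficulty level (B1→B4); the discrimination parameters of all items ranged from 1.37 to 1.69, and the average information of all items ranged from 0.356 to 0.780. Of the 39 items, 37 performed well and 2 (GPH3, GPS3) need optimisation. Conclusion: Most items of the QLICD-CG (V2.0) perform well, but a few items still need further improvement.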
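
Multilog reports item information for graded items, and these values follow from Samejima's graded response model. The boundary curves are P*_k(θ) = 1/(1 + exp(−a(θ − b_k))), and the category probabilities are P_k = P*_k − P*_{k+1}. The item information is then I(θ) = Σ_k (P_k′)²/P_k, where P*_k′ = a·P*_k(1 − P*_k). The abstract does not say over which θ range the "average information" was taken, so the averaging grid below (−4 to 4) is an assumption. The function names and the example item are also hypothetical.

```python
import math

def grm_item_information(theta, a, thresholds):
    """Fisher information of one graded-response item at trait level theta.

    Boundary curves Pstar_k = 1 / (1 + exp(-a * (theta - b_k))) have derivative
    a * Pstar_k * (1 - Pstar_k). Category probability P_k = Pstar_k - Pstar_{k+1};
    information = sum over k of (dP_k/dtheta)^2 / P_k.
    """
    stars = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum = [1.0] + stars + [0.0]                                # P(X >= k)
    dcum = [0.0] + [a * s * (1.0 - s) for s in stars] + [0.0]  # their derivatives
    info = 0.0
    for k in range(len(thresholds) + 1):
        p = cum[k] - cum[k + 1]
        dp = dcum[k] - dcum[k + 1]
        if p > 0.0:  # skip categories with vanishing probability
            info += dp * dp / p
    return info

def average_information(a, thresholds, lo=-4.0, hi=4.0, n_points=81):
    """Mean item information over an evenly spaced theta grid (grid is an assumption)."""
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    return sum(grm_item_information(t, a, thresholds) for t in grid) / n_points

# Illustrative five-category item (four thresholds B1..B4), not QLICD-CG estimates.
avg = average_information(a=1.5, thresholds=[-2.0, -0.5, 0.8, 2.2])
```

With a single threshold, the formula reduces to the 2PL information a²P(1 − P). This gives a quick sanity check on the implementation.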

16.
BACKGROUND: Two main models are currently used to evaluate psychosocial factors at work: the Job Strain model developed by Karasek and the Effort-Reward Imbalance model. A French version of the first model has been validated for the dimensions of psychological demands and decision latitude. For the second model, which evaluates three dimensions (extrinsic effort, reward, and intrinsic effort), versions exist in several languages, but until recently there was no validated French version. The objective of this study was to explore the psychometric properties of the French version of the Effort-Reward Imbalance model in terms of internal consistency, factorial validity, and discriminant validity. METHODS: The present study was based on the GAZEL cohort and included the 10,174 subjects who were working at the French national electricity and gas company (EDF-GDF) and answered the questionnaire in 1998. A French version of the Effort-Reward Imbalance questionnaire, obtained by a standard forward/backward translation procedure, was included in this questionnaire. RESULTS: Internal consistency was satisfactory for the three scales of extrinsic effort, reward, and intrinsic effort, with Cronbach's alpha coefficients higher than 0.7. A one-factor solution was retained for the factor analysis of the extrinsic effort scale, and a three-factor solution was retained for the factor analysis of reward. The factor analysis of intrinsic effort did not support the expected four-dimension structure. The analysis of discriminant validity showed significant associations between measures of Effort-Reward Imbalance and the variables of sex, age, education level, and occupational grade. CONCLUSION: This study is the first to support satisfactory psychometric properties of the French version of the Effort-Reward Imbalance model. However, the factorial validity of intrinsic effort could be questioned.
Furthermore, as most previous studies were based on male samples working in specific occupations, the present one is also one of the first to show strong associations between measures of this model and social class variables in a population of men and women employed in various occupations.

17.
18.
OBJECTIVES: To design and validate an instrument, in both self-administered and telephone versions, to assess satisfaction with home care services. METHODS: We performed a cross-sectional observational study of the population using home care services in the health districts of Malaga, Costa del Sol, Almeria and Granada (Spain). A questionnaire was designed by an expert panel using a Delphi technique. Reliability between the self-administered and telephone versions was analyzed. Finally, internal consistency and construct validity were assessed. RESULTS: Reliability between the self-administered and telephone versions was high (intraclass correlation coefficient = 0.876; 95% CI, 0.726-0.941; p = 0.0001). Internal consistency was adequate (Cronbach's alpha: 0.853 and 0.799 for the versions with and without a caregiver, respectively). The factor analysis explained 66.80% and 67.81% of the observed variance for the two versions (with and without a caregiver, respectively). Two factors were isolated, related to interpersonal relationships, the role of the carer, and decision making. CONCLUSION: Assessment of satisfaction with home care can be performed with the dimensions routinely used in satisfaction studies, but these should be evaluated with instruments designed ad hoc. Accessibility, communication and interpersonal relationships have a high explanatory value for satisfaction in this population.
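
The abstract reports an intraclass correlation between the self-administered and telephone versions without naming the ICC form. As an assumption, the sketch below uses the Shrout–Fleiss ICC(2,1) (two-way random effects, absolute agreement, single measure), computed from two-way ANOVA mean squares. The function name and data are toy illustrations.

```python
def icc_2_1(ratings):
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: list of n subjects, each a list of k scores (e.g. k = 2 administration modes).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    # Sums of squares for subjects (rows), modes (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

identical = icc_2_1([[1, 1], [2, 2], [3, 3]])  # both modes agree exactly
offset = icc_2_1([[1, 2], [2, 3], [3, 4]])     # telephone scores one point higher
```

In the toy data, a constant one-point offset between modes keeps the rank order intact but lowers ICC(2,1) to 2/3. This happens because absolute agreement penalises systematic differences between administration modes.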

19.
Mungas D, Reed BR. Statistics in Medicine, 2000, 19(11-12): 1631-1644.
An ideal measure of global functioning for patients with dementia would discriminate at very high and very low levels of functioning and would have linear measurement properties such that a given change in score corresponds to the same amount of change in underlying ability at any part of the ability continuum. Using item response theory methods, linearity of test measurement can be directly assessed and items can be selected to construct a test with desired measurement characteristics. The purpose of this study was to apply item response theory methods to evaluating and developing global functioning scales. Subjects were 1207 patients who had received comprehensive dementia evaluations. Items were selected from two measures of cognitive functioning (Mini Mental State Examination, MMS; Blessed Information Memory Concentration Test, BIMCT) and one measure of independent functioning (Blessed-Roth Dementia Rating Scale, BRDRS). The MMS and BIMCT showed significant non-linearity of measurement, especially at low and high ability levels. A brief composite measure was created by selecting from the three instruments 25 items that fit a uniform distribution of item difficulty across the entire range of ability measured by the three instruments. This composite measure and the BRDRS showed better linearity of measurement than the other two instruments. Results have implications for development of a psychometrically sophisticated, brief measure of global functioning for clinical and research use in dementia.
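
The abstract does not spell out how items were chosen to "fit a uniform distribution of item difficulty". One simple way to approximate that goal is a hypothetical greedy rule. First, spread target difficulties evenly across the calibrated range. Then, for each target, take the unused item whose difficulty is nearest. This is not claimed to be Mungas and Reed's procedure, and the item ids and difficulty values are made up.

```python
def select_uniform_items(difficulties, n_select):
    """Greedy pick of items whose difficulties lie closest to evenly spaced targets.

    difficulties: dict mapping a (hypothetical) item id to its IRT difficulty b.
    n_select:     number of items to keep.
    """
    if not 1 <= n_select <= len(difficulties):
        raise ValueError("n_select must be between 1 and the number of items")
    lo, hi = min(difficulties.values()), max(difficulties.values())
    if n_select == 1:
        targets = [(lo + hi) / 2.0]
    else:
        step = (hi - lo) / (n_select - 1)
        targets = [lo + i * step for i in range(n_select)]
    remaining = dict(difficulties)
    chosen = []
    for target in targets:
        # Nearest still-unused item to this target difficulty.
        best = min(remaining, key=lambda item: abs(remaining[item] - target))
        chosen.append(best)
        del remaining[best]
    return chosen

# Made-up calibrated difficulties for five candidate items.
pool = {"a": -2.0, "b": -1.9, "c": 0.0, "d": 0.1, "e": 2.0}
picked = select_uniform_items(pool, 3)
```

A greedy pass is easy to audit but not optimal. An alternative design would be to minimise the total distance to the targets jointly, for example with an assignment algorithm.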

20.