Similar Documents
18 similar documents found.
1.
Item response theory (IRT) is a modern measurement theory developed to address the practical limitations of classical test theory (CTT); its main advantage is the invariance of parameter and ability estimates [1]. Beyond this, IRT offers three advantages over CTT in scale development: whereas CTT focuses on the overall properties of a scale, IRT examines the properties of each constituent item; items can be selected according to the latent trait level to be measured; and the characteristics of items and scales can be …

2.
Objective: To analyze and evaluate the items of the quality of life scale for chronic pulmonary heart disease, QLICD-CPHD (V2.0), using classical test theory (CTT) and item response theory (IRT). Methods: 184 patients with chronic pulmonary heart disease were surveyed with the QLICD-CPHD (V2.0). Items were evaluated with four CTT methods (correlation coefficient, coefficient of variation, factor analysis, and Cronbach's α), and Samejima's graded response model from IRT was used to compute each item's difficulty, information, and discrimination parameters. Results: CTT analysis identified 7 items that failed to meet at least 3 of the statistical criteria, 6 in the generic module and 1 in the disease-specific module. IRT analysis showed item discrimination in a suitable range of 1.18–1.44. Difficulty parameters increased monotonically with response category (B1→B4), although some items had difficulty parameters b outside the standard range. Mean item information ranged from 0.185 to 0.576. Conclusion: By CTT and IRT analysis, most items of the QLICD-CPHD (V2.0) are of high quality and discriminate well, but a few items require further analysis and revision.
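The CTT item-screening statistics named in this abstract include Cronbach's α and item-level correlations. A minimal sketch of how these two are computed, on made-up toy data (all values here are illustrative, not from the study):

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an (n_persons, n_items) score matrix."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)        # per-item variances
    total_var = X.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def item_total_correlations(X):
    """Corrected item-total correlations (item vs. rest-score)."""
    X = np.asarray(X, dtype=float)
    total = X.sum(axis=1)
    return np.array([np.corrcoef(X[:, j], total - X[:, j])[0, 1]
                     for j in range(X.shape[1])])

# Toy data: 3 perfectly consistent items -> alpha = 1.0
X = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3]])
print(cronbach_alpha(X))           # 1.0
print(item_total_correlations(X))  # [1. 1. 1.]
```

In practice a low corrected item-total correlation is one of the signals (alongside factor loadings and coefficient of variation) used to flag items such as the 7 mentioned above.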

3.
Item response theory (IRT; also rendered in Chinese as 条目反应理论) is widely used in educational, psychological, and medical scale testing. Parscale, developed by Eiji Muraki, Darrell Bock, and colleagues and now owned by Scientific Software International (SSI) (http://www.ssicentral.com/irt/), is a commonly used program implementing IRT. Parscale can be used for …

4.
Objective: To develop a multidimensional psychological factor assessment scale for drug addiction and test its psychometric properties. Methods: Based on semi-structured interviews and a literature review, the 11 most common dimensions were established from risk and protective factors across two systems, individual psychology and social psychology, to construct an initial scale; the final scale was formed after two rounds of revision. The formal scale was then administered to 316 drug users. Item quality was analyzed with classical test theory (CTT) and the generalized partial credit model (GPCM) of item response theory (IRT); data were analyzed with R, Mplus 7.0, and SPSS 21.0. Results: The formal scale comprises 64 items, 4 subscales, and 11 factors. Cronbach's α was 0.95 and test-retest reliability 0.65. Confirmatory factor analysis indicated good construct validity. Under the CTT framework, item discrimination ranged from 0.40 to 0.84 and difficulty from 0.28 to 0.68; under the IRT framework, item discrimination ranged from 0.40 to 5.18 and difficulty parameters from -1.06 to 2.70. Conclusion: The scale meets psychometric requirements and can serve as an assessment tool for addiction in drug users.

5.
Objective: To further analyze the items of the quality of life scale for drug addiction, QLICD-DA (V2.0), using classical test theory (CTT) and item response theory (IRT). Methods: 192 drug-addicted patients were surveyed with the QLICD-DA (V2.0). Samejima's graded response model from IRT was used to compute each item's mean information, discrimination, and difficulty parameters, combined with four CTT statistical methods (Cronbach's α, coefficient of variation, correlation coefficient, and factor analysis). Results: In the IRT analysis, all items except GPH1, GPH2, GPH3, GPH4, GPH5, and GPH9 had mean information above 0.11; discrimination ranged from 0.79 to 2.30, and difficulty parameters fell within -5.07 to 3.38, increasing monotonically with response category (B1→B4). In the CTT analysis, 28 items satisfied 3 or more of the statistical criteria; combining CTT and IRT, 39 items were selected. Conclusion: Most items of the QLICD-DA (V2.0) perform well, but some require further evaluation and revision.

6.
Objective: To analyze and evaluate the items of the quality of life scale for cervical cancer patients, QLICP-CE (V2.0), using both CTT and IRT. Methods: 186 cervical cancer patients were assessed with the QLICP-CE (V2.0). Four CTT statistical methods (coefficient of variation, correlation coefficient, factor analysis, and Cronbach's α) were used to evaluate item quality, and Samejima's graded response model from IRT was used to compute each item's difficulty, discrimination, and information. Results: CTT analysis showed that 9 items in the generic module and 3 in the disease-specific module correlated weakly with their domains. IRT results showed good discrimination for all items, ranging from 0.64 to 1.33; 35 of the 44 items had difficulty parameters within -3.49 to 3.76, increasing monotonically with response category (B1→B4); mean item information was good for all but 3 items. Conclusion: All items of the QLICP-CE (V2.0) discriminate well and most perform well, but a few items need further revision and validation.

7.
Item response theory (IRT) is a modern measurement theory capable of precisely measuring examinee ability. It originated in the late 1930s, and by the 1970s it had gradually supplanted classical test theory (CTT) as the focus of test theory. IRT models characterize the relationship among item properties, an examinee's latent trait level, and the probability of a correct response [1-2]. Compared with CTT, the difficulty, discrimination, and guessing parameters of IRT are clearer and easier to interpret; by placing item characteristics and examinee ability on the same scale, IRT avoids the limitation that evaluations of items and examinees depend excessively on the particular sample.
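The relationship this abstract describes, linking item properties and latent trait level to the probability of a correct response, is usually written as a logistic item characteristic curve. A minimal sketch of the three-parameter logistic (3PL) model, with purely illustrative parameter values (a = discrimination, b = difficulty, c = guessing):

```python
import math

def p_correct(theta, a, b, c=0.0):
    """3PL item characteristic curve:
    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative item: discrimination 1.2, difficulty 0.0, guessing 0.2
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(p_correct(theta, a=1.2, b=0.0, c=0.2), 3))
# At theta == b the curve passes through (1 + c) / 2 = 0.6;
# as theta -> -inf it approaches the guessing floor c = 0.2.
```

Setting c = 0 gives the 2PL model, and additionally fixing a gives the Rasch/1PL model; these are the three common dichotomous models referred to in this literature.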

8.
Application of confirmatory factor analysis to scale reliability
Scales, questionnaires, and similar measurement instruments are widely used in research in behavioral science, psychology, sociology, medicine, and education. Before analyzing questionnaire data, one must first consider whether the measured values are reliable and accurate; scale (questionnaire) analyses are trustworthy only when reliability and validity are adequate. Reliability analysis takes two forms: internal consistency (split-half reliability, Cronbach's α, the θ coefficient, the Ω coefficient, etc.) and stability or repeated-measurement analysis (test-retest reliability, parallel-forms reliability, inter-rater reliability). Among the various reliability coefficients, …

9.
Item response theory (IRT; rendered in Chinese as both 条目反应理论 and 项目反应理论) has received growing attention and is widely applied to measuring latent variables in intelligence testing, psychological scales, and examination systems. In recent years, IRT has gradually been applied to item evaluation and selection in scale development [1-3]. For example, Liu et al. developed a myasthenia gravis scale suited to evaluating the efficacy of traditional Chinese medicine [4-5], Chen Xinlin et al. developed a quality of life scale for nasopharyngeal carcinoma patients [6], Dong Limin et al. used it to evaluate a PRO scale for asthma patients [7], …

10.
Objective: To develop a scale measuring undergraduate major satisfaction specific to preventive medicine. Methods: An item pool for measuring major satisfaction among preventive medicine undergraduates was constructed from a literature review and expert interviews, and an initial scale was formed through expert consultation. A pilot test with 109 preventive medicine undergraduates and exploratory factor analysis yielded a test version of the scale. A further 250 preventive medicine undergraduates from 4 different schools then completed the test version; confirmatory factor analysis produced the formal scale, whose reliability and validity were then examined. Results: The scale comprises 4 dimensions and 39 items, with good internal consistency (α > 0.8) and construct reliability (ρc > 0.8), as well as good content validity and construct validity. Conclusion: The scale is reliable and valid and can be used to evaluate major satisfaction among preventive medicine undergraduates.

11.
Context A test score is a number which purportedly reflects a candidate’s proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship that exists between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations. Objectives The objective of this paper is to provide an overview of both CTT and IRT to the practitioner involved in the development and scoring of medical education assessments. Methods The tenets of CTT and IRT are initially described. Then, main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined. Discussion Classical test theory and IRT are widely used to address measurement‐related issues that arise from commonly used assessments in medical education, including multiple‐choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions.
Medical Education 2010; 44: 109–117

12.
13.
CONTEXT: Item response theory (IRT) measurement models are discussed in the context of their potential usefulness in various medical education settings such as assessment of achievement and evaluation of clinical performance. PURPOSE: The purpose of this article is to compare and contrast IRT measurement with the more familiar classical measurement theory (CMT) and to explore the benefits of IRT applications in typical medical education settings. SUMMARY: CMT, the more common measurement model used in medical education, is straightforward and intuitive. Its limitation is that it is sample-dependent, in that all statistics are confounded with the particular sample of examinees who completed the assessment. Examinee scores from IRT are independent of the particular sample of test questions or assessment stimuli. Also, item characteristics, such as item difficulty, are independent of the particular sample of examinees. The IRT characteristic of invariance permits easy equating of examination scores, which places scores on a constant measurement scale and permits the legitimate comparison of student ability change over time. Three common IRT models and their statistical assumptions are discussed. IRT applications in computer-adaptive testing and as a method useful for adjusting rater error in clinical performance assessments are overviewed. CONCLUSIONS: IRT measurement is a powerful tool used to solve a major problem of CMT, that is, the confounding of examinee ability with item characteristics. IRT measurement addresses important issues in medical education, such as eliminating rater error from performance assessments.
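The varying measurement precision that IRT makes explicit (and that a single CMT reliability coefficient hides) is captured by the item information function; for the two-parameter logistic model it is I(θ) = a²·P(θ)·(1 − P(θ)), which peaks at θ = b. A minimal sketch with illustrative parameter values:

```python
import math

def p_2pl(theta, a, b):
    """2PL response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: I(theta) = a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Information peaks at theta == b, with peak value a^2 / 4.
a, b = 1.5, 0.5
print(item_information(b, a, b))  # 0.5625 (= 1.5**2 / 4)

# Test information is the sum of item informations; its inverse square
# root is the (theta-dependent) standard error of measurement.
items = [(1.5, -1.0), (1.0, 0.0), (2.0, 1.0)]  # illustrative (a, b) pairs
test_info = sum(item_information(0.0, ai, bi) for ai, bi in items)
se = test_info ** -0.5
```

This θ-dependence is what the simulation studies above exploit when comparing IRT scores with standard sum scores.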

14.
BACKGROUND AND OBJECTIVES: Most health-related quality-of-life questionnaires include multi-item scales. Scale scores are usually estimated as simple sums of the item scores. However, scoring procedures utilizing more information from the items might improve measurement abilities, and thereby reduce the needed sample sizes. We investigated whether item response theory (IRT)-based scoring improved the measurement abilities of the EORTC QLQ-C30 physical functioning, emotional functioning, and fatigue scales. METHODS: Using a database of 13,010 subjects we estimated the relative validities of IRT scoring compared to sum scoring of the scales. RESULTS: The mean relative validities were 1.04 (physical), 1.03 (emotional), and 0.97 (fatigue). None of these were significantly larger than 1. Thus, no gain in measurement abilities using IRT scoring was found for these scales. Possible explanations include that the items in the scales are not constructed for IRT scoring and that the scales are relatively short. CONCLUSION: IRT scoring of the three longest EORTC QLQ-C30 scales did not improve measurement abilities compared to the traditional sum scoring of the scales.

15.
Objective: To analyze the items of the quality of life scale for chronic gastritis patients, QLICD-CG (V2.0), using classical test theory and item response theory. Methods: 163 chronic gastritis patients were assessed for quality of life with the QLICD-CG (V2.0). Multilog 7.03 was used for the IRT analysis to obtain each item's difficulty, discrimination, and information, and four CTT statistical methods were used to evaluate item quality. Results: CTT analysis showed that all items except 3 (GPH3, GPS3, CG11) met at least 3 of the 4 statistical criteria. IRT analysis showed all difficulty parameters within -6.42 to 4.36, increasing monotonically with response category (B1→B4); all discrimination parameters within 1.37–1.69; and all mean item information within 0.356–0.780. Of the 39 items, 37 performed well and 2 (GPH3, GPS3) need optimization. Conclusion: Most items of the QLICD-CG (V2.0) perform well, but a few still need further improvement.

16.
OBJECTIVE: To cocalibrate the Mini-Mental State Examination, the Modified Mini-Mental State, the Cognitive Abilities Screening Instrument, and the Community Screening Instrument for Dementia using item response theory (IRT) to compare screening cut points used to identify cases of dementia from different studies, to compare measurement properties of the tests, and to explore the implications of these measurement properties on longitudinal studies of cognitive functioning over time. STUDY DESIGN AND SETTING: We used cross-sectional data from three large (n>1000) community-based studies of cognitive functioning in the elderly. We used IRT to cocalibrate the scales and performed simulations of longitudinal studies. RESULTS: Screening cut points varied quite widely across studies. The four tests have curvilinear scaling and varied levels of measurement precision, with more measurement error at higher levels of cognitive functioning. In longitudinal simulations, IRT scores always performed better than standard scoring, whereas a strategy to account for varying measurement precision had mixed results. CONCLUSION: Cocalibration allows direct comparison of cognitive functioning in studies using any of these four tests. Standard scoring appears to be a poor choice for analysis of longitudinal cognitive testing data. More research is needed into the implications of varying levels of measurement precision.

17.

Objective:

To compare the measurement properties of the Modified Health Assessment Questionnaire [MHAQ], the SF-36® Health Survey 10 item Physical Functioning scale [PF10], and scores from an item response theory (IRT) based scale combining the two measures.

Study Design:

Rheumatoid arthritis (RA) patients (n = 339) enrolled in a multi-center, randomized, double-blind, placebo-controlled trial completed the MHAQ and the SF-36 pre- and post-treatment. Psychometric analyses used confirmatory factor analysis and IRT models. Analyses of variance were used to assess sensitivity to changes in disease severity (defined by the American College of Rheumatology (ACR)) using change scores in MHAQ, PF10, and IRT scales. Analyses of covariance were used to assess treatment responsiveness.

Results:

For the entire score range, the 95% confidence interval around individual patient scores was smaller for the combined (total) IRT based scale than for other measures. The MHAQ and PF10 were about 70% and 50% as efficient as the total IRT score of physical functioning in discriminating among ACR groups, respectively. The MHAQ and PF10 were also less efficient than the total IRT score in discriminating among treatment groups.

Conclusions:

Combining scales from the two short forms yields a more powerful tool with greater sensitivity to treatment response.

18.
Kosinski M, Bjorner JB, Ware JE Jr, Batenhorst A, Cady RK. Quality of Life Research 2003;12(8):903–912
Background: While item response theory (IRT) offers many theoretical advantages over classical test theory in the construction and scoring of patient-based measures of health, few studies compare scales constructed from both methodologies head to head. Objective: To compare the responsiveness to treatment of migraine-specific scales scored using summated rating scale methods vs. IRT methods. Methods: The data came from three clinical studies of migraine treatment that used the Migraine Specific Quality of Life Questionnaire (MSQ). Five methods of quantifying responsiveness were used to evaluate and compare changes from pre- to post-treatment in MSQ scales scored using Likert and IRT scaling methods. Results: Changes in all MSQ scale scores from pre- to post-treatment were highly significant in all three studies. A single index scored from the MSQ using IRT methods was more responsive than any one of the MSQ subscales across the five methods used to quantify responsiveness. Across 13 of the 15 tests (5 responsiveness methods × 3 studies), the single index scored from the MSQ using IRT methods was the most responsive measure. Conclusions: IRT methods increased the responsiveness of the MSQ to the treatment of migraine. The results agree with psychometric evidence suggesting that it is feasible to score a single index from the MSQ using IRT methods. This approach warrants further testing with other measures of migraine impact.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号