Similar Literature
1.

Purpose

Computerized adaptive test (CAT) methods, based on item response theory (IRT), enable a patient-reported outcome instrument to be adapted to the individual patient while maintaining direct comparability of scores. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30. We present the development and psychometric validation of the item pool for the first of the scales, physical functioning (PF).

Methods

Initial developments (including literature search and patient and expert evaluations) resulted in 56 candidate items. Responses to these items were collected from 1,176 patients with cancer from Denmark, France, Germany, Italy, Taiwan, and the United Kingdom. The items were evaluated with regard to psychometric properties.

Results

Evaluations showed that 31 of the items could be included in a unidimensional IRT model with acceptable fit and good content coverage, although the pool may lack items at the upper extreme (good PF). There were several findings of significant differential item functioning (DIF). However, the DIF findings appeared to have little impact on the PF estimation.

Conclusions

We have established an item pool for CAT measurement of PF and believe that this CAT instrument will clearly improve the EORTC measurement of PF.
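The core CAT loop behind instruments like this one — administer the item that is most informative at the current ability estimate, update the estimate, repeat — can be sketched in a few lines. The sketch below uses a two-parameter logistic (2PL) model with hypothetical item parameters; it is illustrative only, not the EORTC algorithm.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Probability of endorsing an item under a 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability level theta."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1 - p)

def next_item(theta_hat, a, b, administered):
    """Pick the not-yet-administered item with maximum information."""
    info = item_information(theta_hat, a, b)
    info[list(administered)] = -np.inf  # exclude items already given
    return int(np.argmax(info))

# Hypothetical 5-item pool (a = discrimination, b = difficulty)
a = np.array([1.2, 0.8, 1.5, 1.0, 2.0])
b = np.array([-1.0, 0.0, 0.5, 1.5, 0.0])
first = next_item(0.0, a, b, administered=set())
```

At θ = 0 the most informative item is the highly discriminating one located at θ = 0, so the sketch selects item 4 first; a real CAT would then update θ̂ (e.g., by maximum likelihood or EAP) before the next selection.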

2.
Objective: To analyze the items of the quality-of-life scale for patients with chronic gastritis, QLICD-CG (V2.0), using classical test theory (CTT) and item response theory (IRT). Methods: The QLICD-CG (V2.0) was administered to 163 patients with chronic gastritis to assess quality of life. IRT analysis in Multilog 7.03 yielded each item's difficulty, discrimination, and information, and item quality was also evaluated against four statistical methods from CTT. Results: CTT results showed that, apart from three items (GPH3, GPS3, CG11), the remaining items met the criteria of at least three of the four statistical methods. IRT results showed that all item difficulty parameters fell within -6.42 to 4.36 and increased monotonically across difficulty levels (B1→B4); discrimination parameters ranged from 1.37 to 1.69, and mean item information ranged from 0.356 to 0.780. Of the 39 items, 37 performed well and 2 (GPH3, GPS3) need optimization. Conclusion: Most items of the QLICD-CG (V2.0) perform well, but a few still require further improvement.

3.
Value in Health 2022; 25(4): 525–533
ObjectivesThe development of measures such as the EQ-HWB (EQ Health and Wellbeing) requires selection of items. This study explored the psychometric performance of candidate items, testing their validity in patients, social care users, and carers.MethodsPaper and online surveys that included candidate items (N = 64) were conducted in Argentina, Australia, China, Germany, United Kingdom, and the United States. Psychometric assessment on missing data, response distributions, and known group differences was undertaken. Dimensionality was explored using exploratory and confirmatory factor analysis. Poorly fitting items were identified using information functions, and the function of each response category was assessed using category characteristic curves from item response theory (IRT) models. Differential item functioning was tested across key subgroups.ResultsThere were 4879 respondents (Argentina = 508, Australia = 514, China = 497, Germany = 502, United Kingdom = 1955, United States = 903). Where missing data were allowed, the rate was low (UK paper survey 2.3%; US survey 0.6%). Most items had responses distributed across all levels. Most items could discriminate between groups with known health conditions with moderate to large effect sizes. Items were less able to discriminate across carers. Factor analysis found positive and negative measurement factors alongside the constructs of interest. For most of the countries apart from China, the confirmatory factor analysis model had good fit with some minor modifications. IRT indicated that most items had well-functioning response categories, but there was some evidence of differential item functioning in many items.ConclusionsItems performed well in classical psychometric testing and IRT. This large 6-country collaboration provided evidence to inform item selection for the EQ-HWB measure.

4.

Purpose  

Content validity of patient-reported outcomes (PROs) is evaluated primarily during item development, but subsequent psychometric analyses, particularly for item response theory (IRT)-derived scales, often result in considerable item pruning and potential loss of content. After selecting items for the PROMIS banks based on psychometric and content considerations, we invited external content expert reviews of the degree to which the initial domain names and definitions represented the calibrated item bank content.

5.
Value in Health 2022; 25(9): 1566–1574
ObjectivesIn economic evaluations, quality of life is measured using patient-reported outcome measures (PROMs), such as the EQ-5D-5L. A key assumption for the validity of PROMs data is measurement invariance, which requires that PROM items and response options are interpreted the same across respondents. If measurement invariance is violated, PROMs exhibit differential item functioning (DIF), whereby individuals from different groups with the same underlying health respond differently, potentially biasing scores. One important group of healthcare consumers who have been shown to have different views or priorities over health is older adults. This study investigates age-related DIF in the EQ-5D-5L using item response theory (IRT) and ordinal logistic regression approaches.MethodsMultiple-group IRT models were used to investigate DIF, by assessing whether older adults aged 65+ years and younger adults aged 18 to 64 years with the same underlying health had different IRT parameter estimates and expected item and EQ-5D-5L level sum scores. Ordinal logistic regression was also used to examine whether DIF resulted in meaningful differences in expected EQ level sum scores. Effect sizes examined whether DIF indicated meaningful score differences.ResultsThe anxiety/depression item exhibited meaningful DIF in both approaches, with older adults less likely to report problems. Pain/discomfort and mobility exhibited DIF to a lesser extent.ConclusionsWhen using the EQ-5D-5L to evaluate interventions and make resource allocation decisions, scoring bias due to DIF should be controlled for to prevent inefficient service provision, where the most cost-effective services are not provided, which could be detrimental to patients and the efficiency of health budgets.
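Uniform DIF of the kind reported here (older adults less likely to report problems at the same underlying health) can be summarized as the area between the two groups' item characteristic curves. The following sketch uses hypothetical 2PL parameters, with a shift in the location parameter standing in for age-related DIF; it is not the study's actual model.

```python
import numpy as np

def expected_score(theta, a, b):
    """Expected item score under a 2PL model (probability of a response)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical parameters for one item calibrated separately per age group;
# the higher location for older adults means they need worse underlying
# health before reporting a problem.
theta = np.linspace(-4, 4, 201)
younger = expected_score(theta, a=1.3, b=0.0)
older = expected_score(theta, a=1.3, b=0.6)

# Unsigned area between the two curves: a common DIF magnitude summary
dif_area = np.trapz(np.abs(younger - older), theta)
```

For two 2PL curves with equal discrimination, this area is approximately the difference between the location parameters (0.6 here), which can then be compared against an effect-size benchmark to judge whether the DIF is practically meaningful.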

6.
Li Y, Baser R. Statistics in Medicine 2012; 31(18): 2010–2026
The US Food and Drug Administration recently announced the final guidelines on the development and validation of patient-reported outcomes (PROs) assessments in drug labeling and clinical trials. This guidance paper may boost the demand for new PRO survey questionnaires. Henceforth, biostatisticians may encounter psychometric methods more frequently, particularly item response theory (IRT) models to guide the shortening of a PRO assessment instrument. This article aims to provide an introduction to the theory and practical analytic skills involved in fitting a generalized partial credit model (GPCM) in IRT. GPCM theory is explained first, with special attention to a clearer exposition of the formal mathematics than what is typically available in the psychometric literature. Then, a worked example is presented, using self-reported responses taken from the international personality item pool. The worked example contains step-by-step guides on using the statistical languages R and WinBUGS to fit the GPCM. Finally, the Fisher information function of the GPCM is derived and used to evaluate, as an illustrative example, the usefulness of assessment items by their information contents. This article aims to encourage biostatisticians to apply IRT models in the re-analysis of existing data and in future research.
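The GPCM the article fits can be written compactly: category probabilities follow from cumulative step logits, and the item's Fisher information reduces to the squared discrimination times the conditional variance of the category score. A minimal NumPy version (a sketch from the standard GPCM equations, not the article's R/WinBUGS code):

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """Category probabilities for one GPCM item.

    b holds the step parameters b_1..b_m; category scores run 0..m.
    """
    steps = np.concatenate(([0.0], a * (theta - np.asarray(b, dtype=float))))
    logits = np.cumsum(steps)
    expz = np.exp(logits - logits.max())  # subtract max for numerical stability
    return expz / expz.sum()

def gpcm_information(theta, a, b):
    """Fisher information; for the GPCM this equals a^2 * Var(X | theta)."""
    p = gpcm_probs(theta, a, b)
    x = np.arange(len(p))
    ex = (x * p).sum()
    return a ** 2 * ((x ** 2 * p).sum() - ex ** 2)
```

As a sanity check: with a = 1 and two step parameters both at 0, the three categories are equiprobable at θ = 0, so the information is a² · Var(X) = 2/3.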

7.
8.
Context A test score is a number which purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship that exists between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations. Objectives The objective of this paper is to provide an overview of both CTT and IRT for the practitioner involved in the development and scoring of medical education assessments. Methods The tenets of CTT and IRT are initially described. Then, main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined. Discussion Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions.
Medical Education 2010; 44: 109–117

9.
ObjectiveTo create a tool to measure college students’ functional, interactive, and critical nutrition literacy.Design(1) Focus group: item generation, (2) expert review, (3) exploratory factor structure analysis, (4) item refinement and modification, (5) factor structure validation, and (6) criterion validation.SettingTwo land-grant college campuses.ParticipantsCollege students aged between 18 and 24 years.Main Outcome MeasuresSurvey data were used to assess nutrition literacy.AnalysisExploratory factor analysis, confirmatory factor analysis (CFA), item response theory (IRT) analyses, and correlations.ResultsOne hundred twenty-three items were generated and tested in an online survey format. Items were eliminated on the basis of face validity, expert feedback, exploratory factor analysis, and CFA/IRT. The 3 measures (functional, interactive, and critical) were analyzed separately. All 3 measures showed reasonable model fit in the CFA/IRT models. Criterion validity showed small to medium effect sizes between measures and fruit/vegetable intake. Reliability estimates met reasonable standards for each measure.Conclusions and ImplicationsThe Young Adult Nutrition Literacy Tool is a novel instrument that measures all 3 domains of nutrition literacy. Strengths include a rigorous 6-step development process, reasonable psychometric properties, and a large breadth of items.

10.
Objective: To analyze and evaluate the items of the quality-of-life scale for cervical cancer patients (QLICP-CE V2.0) using both CTT and IRT. Methods: The QLICP-CE (V2.0) was administered to 186 cervical cancer patients. Four statistical methods from classical test theory (coefficient of variation, correlation coefficient, factor analysis, and Cronbach's alpha) were used to evaluate item quality, and Samejima's graded response model from IRT was used to compute each item's difficulty, discrimination, and information. Results: CTT analysis indicated that 9 items in the general module of the QLICP-CE (V2.0), and 3 in the specific module, correlated poorly with their domains. IRT results showed good discrimination for all items, with values ranging from 0.64 to 1.33; for 35 of the 44 items, difficulty parameters ranged from -3.49 to 3.76 and increased monotonically across difficulty levels (B1→B4); and mean item information was good for all but 3 items. Conclusion: All items of the QLICP-CE (V2.0) discriminate well and most perform well, but a small number of items still need further revision and validation of the results.

11.
ObjectiveTo document the development and psychometric evaluation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) item bank and static instruments.Study Design and SettingThe items were evaluated using qualitative and quantitative methods. A total of 16,065 adults answered item subsets (n > 2,200/item) on the Internet, with oversampling of the chronically ill. Classical test and item response theory methods were used to evaluate 149 PROMIS PF items plus 10 Short Form-36 and 20 Health Assessment Questionnaire-Disability Index items. A graded response model was used to estimate item parameters, which were normed to a mean of 50 (standard deviation [SD] = 10) in a US general population sample.ResultsThe final bank consists of 124 PROMIS items covering upper, central, and lower extremity functions and instrumental activities of daily living. In simulations, a 10-item computerized adaptive test (CAT) eliminated floor and decreased ceiling effects, achieving higher measurement precision than any comparable length static tool across four SDs of the measurement range. Improved psychometric properties were transferred to the CAT's superior ability to identify differences between age and disease groups.ConclusionThe item bank provides a common metric and can improve the measurement of PF by facilitating the standardization of patient-reported outcome measures and implementation of CATs for more efficient PF assessments over a larger range.
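The T-score metric used here (and throughout PROMIS) is a linear rescaling of the IRT ability estimate so that the norming population has mean 50 and SD 10. A minimal sketch, with a hypothetical norming sample of theta estimates:

```python
import numpy as np

def to_t_scores(theta, ref_mean, ref_sd):
    """Rescale IRT ability estimates to a T-score metric: the reference
    (norming) population gets mean 50 and standard deviation 10."""
    z = (np.asarray(theta, dtype=float) - ref_mean) / ref_sd
    return 50.0 + 10.0 * z

# Hypothetical norming sample defines the reference distribution
norm_sample = np.array([-1.2, -0.4, 0.0, 0.3, 1.3])
t = to_t_scores(norm_sample, norm_sample.mean(), norm_sample.std())
```

Once the reference mean and SD are fixed from the norming sample, the same transformation is applied to any new respondent's theta estimate, which is what makes scores directly comparable across studies using the metric.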

12.

Purpose

The present study investigates the properties of the French version of the OUT-PATSAT35 questionnaire, which evaluates outpatients’ satisfaction with care in oncology, using classical test theory (CTT) and item response theory (IRT).

Methods

This cross-sectional multicenter study includes 692 patients who completed the questionnaire at the end of their ambulatory treatment. CTT analyses tested the main psychometric properties (convergent and divergent validity, and internal consistency). IRT analyses were conducted separately for each OUT-PATSAT35 domain (the doctors, the nurses or the radiation therapists and the services/organization) by models from the Rasch family. We examined the fit of the data to the model expectations and tested whether the model assumptions of unidimensionality, monotonicity and local independence were respected.

Results

A total of 605 (87.4 %) respondents were analyzed, with a mean age of 64 years (range 29–88). Internal consistency for all scales separately and for the three main domains was good (Cronbach’s α 0.74–0.98). IRT analyses were performed with the partial credit model. No disordered thresholds of polytomous items were found. Each domain showed high reliability but fitted poorly to the Rasch models. Three items in particular, the item about “promptness” in the doctors’ domain and the items about “accessibility” and “environment” in the services/organization domain, showed the poorest fit. A correct fit to the Rasch model can be obtained by dropping these items. Most of the local dependence concerned items about “information provided” in each domain. A major deviation from unidimensionality was found in the nurses’ domain.

Conclusions

CTT showed good psychometric properties of the OUT-PATSAT35. However, the Rasch analysis revealed some misfitting and redundant items. Taking these problems into consideration, it would be worthwhile to refine the questionnaire in a future study.

13.
ObjectivesDevelopment of an item pool to construct a future computerized adaptive test (CAT) for fatigue in rheumatoid arthritis (RA). The item pool was based on the patients' perspective and examined for face and content validity previously. This study assessed the fit of the items with seven predefined dimensions and examined the item pool's dimensionality structure in statistical terms.Study Design and SettingA total of 551 patients with RA participated in this study. Several steps were conducted to move from an exploratory item pool to a psychometrically sound item bank. The item response theory (IRT) analysis using the generalized partial credit model was conducted for each of the seven predefined dimensions. Poorly fitting items were removed. Finally, the best possible multidimensional IRT (MIRT) model for the data was identified.ResultsIn IRT analysis, 49 items showed insufficient item characteristics. Items with a discriminative ability below 0.60 and/or model misfit effect sizes greater than 0.10 were removed. Factor analysis on the 196 remaining items revealed three dimensions, namely severity, impact, and variability of fatigue. The dimensions were further confirmed in MIRT model analysis.ConclusionThis study provided an initially calibrated item bank and showed which dimensions and items can be used for the development of a multidimensional CAT for fatigue in RA.

14.

Purpose

In patient-reported outcome research that utilizes item response theory (IRT), using statistical significance tests to detect misfit is usually the focus of IRT model-data fit evaluations. However, such evaluations rarely address the impact/consequence of using misfitting items on the intended clinical applications. This study was designed to evaluate the impact of IRT item misfit on score estimates and severity classifications and to demonstrate a recommended process of model-fit evaluation.

Methods

Using secondary data sources collected from the Patient-Reported Outcome Measurement Information System (PROMIS) wave 1 testing phase, analyses were conducted based on PROMIS depression (28 items; 782 cases) and pain interference (41 items; 845 cases) item banks. The identification of misfitting items was assessed using Orlando and Thissen’s summed-score item-fit statistics and graphical displays. The impact of misfit was evaluated according to the agreement of both IRT-derived T-scores and severity classifications between inclusion and exclusion of misfitting items.

Results

The examination of the presence and impact of misfit suggested that item misfit had a negligible impact on the T-score estimates and severity classifications with the general population sample in the PROMIS depression and pain interference item banks.

Conclusions

Findings support the T-score estimates in the two item banks as robust against item misfit at both the group and individual levels and add confidence to the use of T-scores for severity diagnosis in the studied sample. Recommendations on approaches for identifying item misfit (statistical significance) and assessing the misfit impact (practical significance) are given.
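The agreement check this study describes — classifying severity from T-scores computed with and without the misfitting items, then comparing classifications — can be sketched as follows. The T-scores and category cutoffs below are hypothetical, not the study's data.

```python
import numpy as np

def classify(t_scores, cutoffs=(55.0, 60.0, 70.0)):
    """Map T-scores to ordinal severity categories (hypothetical cutoffs):
    0 = none, 1 = mild, 2 = moderate, 3 = severe."""
    return np.digitize(t_scores, cutoffs)

def agreement(a, b):
    """Proportion of respondents classified identically by two scorings."""
    a, b = np.asarray(a), np.asarray(b)
    return float((a == b).mean())

# Hypothetical paired T-scores: full bank vs. misfitting items excluded
full_bank = np.array([48.0, 57.5, 63.0, 72.1, 50.2])
no_misfit = np.array([48.6, 56.8, 63.9, 71.5, 49.0])
agree = agreement(classify(full_bank), classify(no_misfit))
```

High agreement (here the small score shifts leave every respondent in the same category, so the proportion is 1.0) is the "negligible impact" result; in practice one would also report a chance-corrected index such as weighted kappa.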

15.

Objective

Study aim was to translate the PROMIS® pain interference (PI) item bank (41 items) into German, test its psychometric properties in patients with chronic low back pain and develop static subforms.

Methods

We surveyed N = 262 patients undergoing rehabilitation who were asked to fill out questionnaires at the beginning and 2 weeks after the end of rehabilitation, applying the Oswestry Disability Index (ODI) and Pain Disability Index (PDI) in addition to the PROMIS® PI items. For psychometric testing, a 1-parameter item response theory (IRT) model was used. Exploratory and confirmatory factor analyses as well as reliability and construct validity analyses were conducted.

Results

The assumptions regarding IRT scaling of the translated PROMIS® PI item bank as a whole were not confirmed. However, we succeeded in devising three static subforms (PI-G scales: PI mental 13 items, PI functional 11 items, PI physical 4 items), revealing good psychometric properties.

Conclusion

The PI-G scales in their static form can be recommended for use in German-speaking countries. Their strengths versus the ODI and PDI are that pain interference is assessed in a differentiated manner and that several psychometric values are somewhat better than those associated with the ODI and PDI (distribution properties, IRT model fit, reliability). To develop an IRT-scaled item bank of the German translations of the PROMIS® PI items, it would be useful to have additional studies (e.g., with larger sample sizes and using a 2-parameter IRT model).

16.
ObjectivesTo provide a standardized metric for the assessment of depression severity to enable comparability among results of established depression measures.Study Design and SettingA common metric for 11 depression questionnaires was developed applying item response theory (IRT) methods. Data of 33,844 adults were used for secondary analysis including routine assessments of 23,817 in- and outpatients with mental and/or medical conditions (46% with depressive disorders) and a general population sample of 10,027 randomly selected participants from three representative German household surveys.ResultsA standardized metric for depression severity was defined by 143 items, and scores were normed to a general population mean of 50 (standard deviation = 10) for easy interpretability. It covers the entire range of depression severity assessed by established instruments. The metric allows comparisons among included measures. Large differences were found in their measurement precision and range, providing a rationale for instrument selection. Published scale-specific threshold scores of depression severity showed remarkable consistencies across different questionnaires.ConclusionAn IRT-based instrument-independent metric for depression severity enables direct comparisons among established measures. The "common ruler" simplifies the interpretation of depression assessment by identifying key thresholds for clinical and epidemiologic decision making and facilitates integrative psychometric research across studies, including meta-analysis.

17.
ObjectiveThe objective of this study was to develop a questionnaire that could integrate patient and provider items on mobility and self-care into unidimensional scales. The instrument should be suitable for various measurement models (patient and provider data [PAT–PRO], only patient data [PAT], only provider data [PRO]).Study Design and SettingThe existing instruments, MOSES-Patient and MOSES-Provider, were integrated into the MOSES-Combi and completed by a total of 1,019 neurology, cardiac, or musculoskeletal patients and/or their physicians (MOSES = acronym for “mobility and self-care”).ResultsAfter selection of 18 items, all 12 scales of the MOSES-Combi (87 items) were largely unidimensional, met the standards for a 1-parameter item-response theory (IRT) model, were sufficiently reliable, and showed no differential item functioning (DIF) for age or gender. The person parameters set in the PAT–PRO measurement model show at least moderate, but usually substantial, agreement with those set in the PRO and PAT measurement models.ConclusionThe advantages of the MOSES-Combi are that it can be used for various measurement models and is suitable for studying agreement between patient and provider assessments because of its psychometric properties (same scaling for patient and provider items). Integration of various data sources in an IRT scale can be extended to other assessments.

18.
The importance of health literacy has grown considerably among researchers, clinicians, patients, and policymakers. Better instruments and measurement strategies are needed. Our objective was to develop a new health literacy instrument using novel health information technology and modern psychometrics. We designed Health LiTT as a self-administered multimedia touchscreen test based on item response theory (IRT) principles. We enrolled a diverse group of 619 English-speaking, primary care patients in clinics for underserved patients. We tested three item types (prose, document, quantitative) that worked well together to reliably measure a single dimension of health literacy. The Health LiTT score meets psychometric standards (reliability of 0.90 or higher) for measurement of individual respondents in the low to middle range. Mean Health LiTT scores were associated with age, race/ethnicity, education, income, and prior computer use.

19.

Purpose

Sarcoidosis is a multisystem disease that can negatively impact health-related quality of life (HRQL) across generic (e.g., physical, social and emotional wellbeing) and disease-specific (e.g., pulmonary, ocular, dermatologic) domains. Measurement of HRQL in sarcoidosis has largely relied on generic patient-reported outcome tools, with little disease-specific measures available. The purpose of this paper is to present the development and testing of disease-specific item banks and short forms of lung, skin and eye problems, which are a part of a new patient-reported outcome (PRO) instrument called the sarcoidosis assessment tool.

Methods

After prioritizing and selecting the most important disease-specific domains, we wrote new items to reflect disease-specific problems by drawing from patient focus group and clinician expert survey data that were used to create our conceptual model of HRQL in sarcoidosis. Item pools underwent cognitive interviews by sarcoidosis patients (n = 13), and minor modifications were made. These items were administered in a multi-site study (n = 300) to obtain item calibrations and create calibrated short forms using item response theory (IRT) approaches.

Results

From the available item pools, we created four new item banks and short forms: (1) skin problems, (2) skin stigma, (3) lung problems, and (4) eye problems. We also created and tested supplemental forms of the most common constitutional symptoms and negative effects of corticosteroids.

Conclusions

Several new sarcoidosis-specific PROs were developed and tested using IRT approaches. These new measures can advance more precise and targeted HRQL assessment in sarcoidosis clinical trials and clinical practice.

20.
Value in Health 2021; 24(12): 1807–1819
ObjectivesThis study aimed to develop and assess the content validity of a patient-reported outcomes (PROs) instrument to measure symptoms and impacts experienced by patients with active multiple myeloma (MM).MethodsThe PRO instrument was developed using an iterative, mixed-methods approach. The list of concepts was generated based on a review of existing evidence (qualitative studies and literature) and post hoc psychometric evaluations of 2 PRO instruments in 3 clinical trials. A total of 30 adult patients with MM from the United States participated in hybrid concept elicitation/cognitive debriefing interviews to assess the content validity of the newly developed PRO instrument. Translatability assessment was completed in 8 languages.ResultsThe item generation process resulted in 17 symptom and 9 impact concepts for evaluation. The concept elicitation interviews and analysis were based on the first 25 participants; evidence of saturation was observed. The cognitive debriefing interviews and analysis were based on the last 23 participants across 4 waves of interviews. On the basis of patient feedback, 10 items were removed, and 1 item was added to the PRO instrument. The translatability assessment resulted in 1 minor revision. The Multiple Myeloma Symptom and Impact Questionnaire (MySIm-Q) includes 11 symptom and 6 impact concepts, organized within 8 hypothesized subdomains, with each concept measured using a 5-point verbal rating scale and a 7-day recall period.ConclusionsThe MySIm-Q instrument was developed using rigorous and mixed methodology and with direct input from patients who received a diagnosis of MM. The MySIm-Q has good content validity and is culturally relevant for use in global clinical trials.
