Similar articles
Found 20 similar articles (search time: 31 ms)
1.

Purpose  

Using a newly developed questionnaire (MOSES-Combi), which places patient and provider ratings of mobility and self-care on a common IRT scale and thereby ensures the same measurement metric for both groups, we examined the level of agreement between neurology patients and their treating physicians, how that agreement changes after rehabilitation, and which factors affect the extent of agreement.

2.
3.

Objective:

To compare the measurement properties of the Modified Health Assessment Questionnaire (MHAQ), the SF-36® Health Survey 10-item Physical Functioning scale (PF10), and scores from an item response theory (IRT)-based scale combining the two measures.

Study Design:

Rheumatoid arthritis (RA) patients (n = 339) enrolled in a multi-center, randomized, double-blind, placebo-controlled trial completed the MHAQ and the SF-36 pre- and post-treatment. Psychometric analyses used confirmatory factor analysis and IRT models. Analyses of variance were used to assess sensitivity to changes in disease severity (defined by American College of Rheumatology (ACR) criteria) using change scores on the MHAQ, PF10, and IRT scales. Analyses of covariance were used to assess treatment responsiveness.

Results:

Across the entire score range, the 95% confidence interval around individual patient scores was smaller for the combined (total) IRT-based scale than for the other measures. The MHAQ and PF10 were about 70% and 50% as efficient, respectively, as the total IRT score of physical functioning in discriminating among ACR groups. The MHAQ and PF10 were also less efficient than the total IRT score in discriminating among treatment groups.

Conclusions:

Combining scales from the two short forms yields a more powerful tool with greater sensitivity to treatment response.

4.

Objectives

We review the papers presented at the NCI/DIA conference to identify areas of controversy and uncertainty, and to highlight those aspects of item response theory (IRT) and computer adaptive testing (CAT) that require theoretical or empirical research to justify their application to patient-reported outcomes (PROs).

Background

IRT and CAT offer exciting potential for the development of a new generation of PRO instruments. However, most of the research into these techniques has been in non-healthcare settings, notably in education. Educational tests are very different from PRO instruments, and consequently problematic issues arise when adapting IRT and CAT to healthcare research.

Results

Clinical scales differ appreciably from educational tests, and symptoms have characteristics distinctly different from examination questions. This affects the transfer of IRT technology. Particular areas of concern when applying IRT to PROs include inadequate software, difficulties in selecting models and communicating results, insufficient testing of local independence and other assumptions, and a need for guidelines on estimating sample size requirements. Similar concerns apply to differential item functioning (DIF), which is an important application of IRT. Multidimensional IRT is likely to be advantageous only for closely related PRO dimensions.

Conclusions

Although IRT and CAT provide appreciable potential benefits, there is a need for circumspection. Not all PRO scales are necessarily appropriate targets for this methodology. Traditional psychometric methods, and especially qualitative methods, continue to have an important role alongside IRT. Research should be funded to address the specific concerns that have been identified.

5.
Objective To determine (i) the dimensional invariance of instrumental and basic activities of daily living (IADL/ADL) by gender subgroups, and (ii) the extent to which ADL dimensionality varies with the inclusion or exclusion of nondisabled people. Methods Data were taken from the 1999 Spanish Survey on Disability, Impairment and State of Health. The analysis focussed on 6,522 people aged over 65 years who received help to perform or were unable to perform IADL/ADL items. Unidimensional and multidimensional item response theory (IRT) models were applied to this sample. Results In the female sample, IADL/ADL items formed a scale with sufficient unidimensionality to fit a two-parameter logistic IRT model. In the male sample, the structure was bidimensional: self-care and mobility, and household activities. When the sample was composed of IADL/ADL disabled people, ADL items formed a unidimensional scale; when it was composed only of ADL disabled people, they formed a bidimensional structure: self-care and mobility. Conclusions IADL/ADL items can be combined in a single scale to measure severity of functional disability in females, but not in males. Separate aggregated scores must be considered for each subdomain, basic mobility and self-care, in order to measure the severity of ADL disability.
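The two-parameter logistic (2PL) model fitted in this study has a simple closed form. A minimal Python sketch, not the authors' code; the item parameters below are invented for illustration:

```python
import numpy as np

def p_2pl(theta, a, b):
    """Probability of endorsing an item under the two-parameter logistic model.
    theta: latent trait (e.g., disability severity); a: discrimination; b: location."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical items: an "easier" mobility item and a "harder" self-care item
theta = np.linspace(-3.0, 3.0, 7)
mobility = p_2pl(theta, a=1.2, b=-1.0)
self_care = p_2pl(theta, a=1.8, b=1.0)
```

At theta equal to the location b, the endorsement probability is exactly 0.5, which is what makes b interpretable as item "difficulty".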

6.

Objective

The objective of the present study is to describe the item response theory (IRT) analysis of the National Institutes of Health (NIH) Patient Reported Outcomes Measurement Information System (PROMIS®) pediatric parent proxy-report item banks and the measurement properties of the new PROMIS® Parent Proxy Report Scales for ages 8–17 years.

Methods

Parent proxy-report items were written to parallel the pediatric self-report items. Test forms containing the items were completed by 1,548 parent–child pairs. CFA and IRT analyses of scale dimensionality and item local dependence, and IRT analyses of differential item functioning, were conducted.

Results

Parent proxy-report item banks were developed and IRT parameters are provided. The recommended unidimensional short forms for the PROMIS® Parent Proxy Report Scales are item sets that are subsets of the pediatric self-report short forms, setting aside items for which parent responses exhibit local dependence. Parent proxy-report demonstrated moderate to low agreement with pediatric self-report.

Conclusions

The study provides initial calibrations of the PROMIS® parent proxy-report item banks and describes the creation of the PROMIS® Parent Proxy-Report Scales. It is anticipated that these new scales will have application in pediatric populations in which pediatric self-report is not feasible.

7.
Background: Item response theory (IRT) is a powerful framework for analyzing multi-item scales and is central to the implementation of computerized adaptive testing. Objectives: To explain the use of IRT to examine measurement properties and to apply IRT to a questionnaire for measuring migraine impact – the Migraine Specific Questionnaire (MSQ). Methods: Data from three clinical studies that employed the MSQ-version 1 were analyzed by confirmatory factor analysis for categorical data and by IRT modeling. Results: Confirmatory factor analyses showed very high correlations between the factors hypothesized by the original test construction. Further, high item loadings on one common factor suggest that migraine impact may be adequately assessed by a single score. IRT analyses of the MSQ were feasible and provided several suggestions for improving the items and, in particular, the response choices. Of 15 items, 13 showed adequate fit to the IRT model. In general, IRT scores were strongly associated with the scores proposed by the original test developers and with the total item sum score. Analysis of response consistency showed that more than 90% of the patients answered consistently according to a unidimensional IRT model. For the remaining patients, scores on the emotional function dimension were less strongly related to the overall IRT scores, which mainly reflected role limitations. Such response patterns can be detected easily using response consistency indices. Analysis of test precision across score levels revealed that the MSQ was most precise at one standard deviation worse than the mean impact level for migraine patients not in treatment. Thus, gains in test precision can be achieved by developing items aimed at less severe levels of migraine impact. Conclusions: IRT proved useful for analyzing the MSQ. The approach warrants further testing in a more comprehensive item pool for headache impact that would enable computerized adaptive testing.
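The "test precision across score levels" analysis above rests on the Fisher information function: the standard error of measurement (SEM) is the inverse square root of total test information. A hedged sketch under a 2PL model, with hypothetical item parameters clustered around +1 SD so that precision peaks there and degrades at milder impact levels, mirroring the pattern reported:

```python
import numpy as np

def item_info_2pl(theta, a, b):
    """Fisher information of a single 2PL item: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def sem(theta, items):
    """Standard error of measurement = 1 / sqrt(sum of item informations)."""
    total = sum(item_info_2pl(theta, a, b) for a, b in items)
    return 1.0 / np.sqrt(total)

# Hypothetical bank: items located near +1 SD of migraine impact
items = [(1.5, 0.8), (1.2, 1.0), (1.8, 1.2)]
```

Evaluating `sem` over a grid of theta values reproduces the kind of precision profile the authors describe: small SEM near the item locations, large SEM at mild impact levels where no items sit.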

8.

Purpose

Computerized adaptive test (CAT) methods, based on item response theory (IRT), enable a patient-reported outcome instrument to be adapted to the individual patient while maintaining direct comparability of scores. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30. We present the development and psychometric validation of the item pool for the first of the scales, physical functioning (PF).

Methods

Initial developments (including literature search and patient and expert evaluations) resulted in 56 candidate items. Responses to these items were collected from 1,176 patients with cancer from Denmark, France, Germany, Italy, Taiwan, and the United Kingdom. The items were evaluated with regard to psychometric properties.

Results

Evaluations showed that 31 of the items could be included in a unidimensional IRT model with acceptable fit and good content coverage, although the pool may lack items at the upper extreme (good PF). There were several findings of significant differential item functioning (DIF). However, the DIF findings appeared to have little impact on the PF estimation.

Conclusions

We have established an item pool for CAT measurement of PF and believe that this CAT instrument will clearly improve the EORTC measurement of PF.

9.
Background This study investigates the usefulness of the nonparametric monotone homogeneity model for evaluating and constructing health-related quality-of-life scales consisting of polytomous items, and compares it with the often-used parametric graded response model. Methods The nonparametric monotone homogeneity model is a general model of which all known parametric models for polytomous items are special cases. Merits, drawbacks, and possibilities of nonparametric and parametric models and available software are discussed. Particular attention is given to the monotone homogeneity model (also known as the Mokken model) and the often-used parametric graded response model. Results Data from the WHOQOL-Bref were analyzed using both the monotone homogeneity model and the graded response model. The monotone homogeneity model analysis yielded unidimensional scales for each content domain. Scalability coefficients further showed that some items have limited scalability with respect to the other items in the same scale. The parametric IRT analyses led to the rejection of some of the items. Conclusions The nonparametric monotone homogeneity model is highly suited for data analysis in a health-related quality-of-life context, and the parametric graded response model may add interesting features to measurement, provided the model fits the data well.
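The graded response model compared here defines category probabilities as differences of cumulative 2PL curves. An illustrative Python sketch with invented parameters (not fitted to the WHOQOL-Bref data):

```python
import numpy as np

def grm_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.
    thresholds: ordered boundary locations b_1 < ... < b_{K-1} for K categories."""
    # Cumulative P(X >= k): 1 for k=0, logistic curves for k=1..K-1, 0 for k=K
    cum = [1.0] + [1.0 / (1.0 + np.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return np.array([cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)])

# A hypothetical 4-category quality-of-life item
p = grm_probs(theta=0.0, a=1.2, thresholds=[-1.0, 0.0, 1.5])
```

Because the boundary curves share one discrimination per item and the thresholds are ordered, the category probabilities are guaranteed to be non-negative and sum to one.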

10.

Purpose

The Patient-Reported Outcomes (PRO) Measurement Information System (PROMIS®) has developed assessment tools for numerous PROs, most using a 7-day recall format. We examined whether modifying the recall period for use in daily diary research would affect the psychometric characteristics of several PROMIS measures.

Methods

Daily versions of short-forms for three PROMIS domains (pain interference, fatigue, depression) were administered to a general population sample (n = 100) for 28 days. Analyses used multilevel item response theory (IRT) models. We examined differential item functioning (DIF) across recall periods by comparing the IRT parameters from the daily data with the PROMIS 7-day recall IRT parameters. Additionally, we examined whether the IRT parameters for day-to-day within-person changes are invariant to those for between-person (cross-sectional) differences in PROs.

Results

Dimensionality analyses of the daily data suggested a single dimension for each PRO domain, consistent with the PROMIS instruments. One-third of the daily items showed uniform DIF when compared with PROMIS 7-day recall, but the impact of DIF at the scale level was minor. IRT parameters for within-person changes differed from between-person parameters for 3 depression items, indicating that these items were more sensitive to within-person change than to between-person differences; no such differences were found for the pain interference and fatigue items. Notably, mean scores from daily diaries were significantly lower than the PROMIS 7-day recall norms.

Conclusions

The results provide initial evidence supporting the adaptation of PROMIS measures for daily diary research. However, scores from daily diaries cannot be directly interpreted on PROMIS norms established for 7-day recall.

11.

Background

The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales, with the aim of enhancing measurement precision. Here we present the results of the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF).

Methods

In previous phases (I–III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties.

Results

A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model with acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than that of the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes by about 35–40% compared with the original QLQ-C30 CF scale, without loss of power.

Conclusion

A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of HRQOL of cancer patients, without loss of comparability of results.
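The sample-size gains from CAT come from always administering the most informative remaining item at the current ability estimate. A minimal sketch of that selection rule, assuming a 2PL-calibrated bank; the item parameters are hypothetical, not the EORTC calibrations:

```python
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at latent score theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def next_item(theta_hat, bank, administered):
    """Index of the unadministered item with maximum information at theta_hat."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: info_2pl(theta_hat, *bank[i]))

# Hypothetical calibrated bank: (discrimination, location) pairs
bank = [(1.0, -2.0), (1.5, 0.0), (2.0, 0.5), (1.2, 2.0)]
first = next_item(0.5, bank, administered=set())
```

In a full CAT loop, theta_hat would be re-estimated after each response and the loop stopped once the standard error falls below a target, which is where the precision-per-item advantage over fixed short forms arises.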

12.
Li Y, Baser R. Statistics in Medicine 2012;31(18):2010–2026
The US Food and Drug Administration recently announced final guidelines on the development and validation of patient-reported outcome (PRO) assessments in drug labeling and clinical trials. This guidance may boost the demand for new PRO survey questionnaires. Henceforth, biostatisticians may encounter psychometric methods more frequently, particularly item response theory (IRT) models used to guide the shortening of a PRO assessment instrument. This article provides an introduction to the theory and the practical analytic skills needed to fit a generalized partial credit model (GPCM) in IRT. GPCM theory is explained first, with special attention to a clearer exposition of the formal mathematics than is typically available in the psychometric literature. Then, a worked example is presented, using self-reported responses taken from the International Personality Item Pool. The worked example contains step-by-step guidance on using the statistical languages R and WinBUGS to fit the GPCM. Finally, the Fisher information function of the GPCM is derived and used to evaluate, as an illustrative example, the usefulness of assessment items by their information content. This article aims to encourage biostatisticians to apply IRT models in the re-analysis of existing data and in future research.
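The GPCM and its Fisher information function described in this article can be sketched compactly: under the model, item information equals the squared discrimination times the variance of the category score. A sketch with invented parameters, not the article's worked example:

```python
import numpy as np

def gpcm_probs(theta, a, deltas):
    """Category probabilities 0..m under the generalized partial credit model.
    deltas: step parameters d_1..d_m."""
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(deltas)))))
    num = np.exp(steps - steps.max())  # subtract max for numerical stability
    return num / num.sum()

def gpcm_info(theta, a, deltas):
    """Fisher information of a GPCM item: a^2 * Var(category score)."""
    p = gpcm_probs(theta, a, deltas)
    k = np.arange(len(p))
    return a ** 2 * ((k ** 2 * p).sum() - (k * p).sum() ** 2)
```

Evaluating `gpcm_info` across theta is exactly the kind of item-by-item usefulness comparison the article illustrates with its information contents.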

13.
Purpose To improve the mental health component of the Work Disability Functional Assessment Battery (WD-FAB), developed for the US Social Security Administration's (SSA) disability determination process. Specifically, our goal was to expand the WD-FAB scales of mood & emotions, resilience, social interactions, and behavioral control, improving the depth and breadth of the current scales and extending content coverage to aspects of cognition & communication function. Methods Data were collected from a random, stratified sample of 1695 claimants applying for SSA work disability benefits and a general population sample of 2025 working-age adults. A total of 169 new items were developed to replenish the WD-FAB scales and analyzed using factor analysis and item response theory (IRT) to construct unidimensional scales. We conducted computer adaptive test (CAT) simulations to examine the psychometric properties of the WD-FAB. Results Analyses supported the inclusion of four mental health subdomains: Cognition & Communication (68 items), Self-Regulation (34 items), Resilience & Sociability (29 items), and Mood & Emotions (34 items). All scales yielded acceptable psychometric properties. Conclusions IRT methods were effective in expanding the WD-FAB to assess mental health function. The WD-FAB has the potential to enhance work disability assessment both within the SSA disability programs and in other clinical and vocational rehabilitation settings.

14.
Background Culture of safety (COS) is recognized as a critical component of patient safety but can be burdensome to measure due to survey length. This project aimed to develop a shortened COS survey with measurement properties comparable to a validated 19-item instrument. Methods Item response theory (IRT) was used to reduce the number of items in a 19-item COS survey at a 10-hospital health system. Using a 50% random sample, IRT was applied to evaluate survey question discrimination and information. Concepts from the key questions in each subdomain were reworded into a new abbreviated scale. Cognitive interviews with clinicians were conducted to validate the reworded questions for adequacy, clarity, and consistency of interpretation. Results The 19-item survey was reduced with IRT to 4 items. Cronbach's alpha for the 4-item IRT-derived scale was 0.80 (average inter-item covariance = 0.36), comparable to the original scale despite a ~75% reduction in items. The Pearson correlation between the 4-item scale and the original scale was > 0.90. The 4-item scale demonstrated convergent validity. Results were replicated in the 50% random validation sample. Cognitive interviews revealed inadequacy of the shortened scale in assessing error-reporting culture. A fifth item was developed and qualitatively validated for this construct. Conclusion Using a mixed-methods approach, a lengthy COS survey was condensed and revised to a brief 5-question survey with comparable measurement properties and respondent interpretation. A shorter instrument necessarily loses detailed insight into multiple aspects of safety culture, and organizations should consider the trade-offs in choosing to develop a briefer survey.
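Cronbach's alpha, used above to check the shortened scale, is straightforward to compute from a respondents-by-items score matrix. A minimal sketch; the data here are toy values, not the study's survey responses:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)       # variance of total scores
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)

# Toy 4-item data: consistent respondents yield a high alpha
toy = [[1, 2, 1, 2], [3, 3, 3, 3], [4, 5, 4, 5], [2, 2, 2, 3]]
alpha = cronbach_alpha(toy)
```

Note that alpha depends on the number of items, which is why a 4-item scale retaining alpha = 0.80 after a ~75% item reduction is a meaningful result.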

15.
Value in Health 2022;25(7):1090–1098
Objectives Although best practices from electronic patient-reported outcome (PRO) measures are transferable, the migration of clinician-reported outcome (ClinRO) assessments to electronic modes requires recommendations that address their unique properties, such as the user (eg, clinician) and the complexity associated with programming of clinical content. Faithful migration remains essential to ensuring that the content and psychometric properties of the original scale (ie, the validated reference) are preserved, such that clinicians completing the ClinRO assessments interpret and respond to the items the same way regardless of data collection mode. The authors present a framework for how to “faithfully” migrate electronic ClinRO assessments for successful deployment in clinical trials. Methods Critical Path Institute’s Electronic PRO Consortium and PRO Consortium convened a consensus panel of representatives from member firms to develop recommendations for the electronic migration and implementation of ClinRO assessments in clinical trials, based on industry standards, regulatory guidelines where available, and relevant literature. The recommendations were reviewed and approved by all member firms from both consortia. Consensus Recommendations Standard, minimal electronic modifications for ClinRO assessments are described. This article also outlines implementation steps, including planning, startup, electronic clinical outcome assessment system development, training, and deployment. The consensus panel proposes that functional clinical testing by a clinician or clinical outcome assessment expert, together with copyright holder review of screenshots (where possible), is sufficient to support minimal modifications during migration. Additional evidence generation is proposed for modifications that deviate significantly from the validated reference.

16.
Value in Health 2022;25(4):525–533
Objectives The development of measures such as the EQ-HWB (EQ Health and Wellbeing) requires selection of items. This study explored the psychometric performance of candidate items, testing their validity in patients, social care users, and carers. Methods Paper and online surveys that included the candidate items (N = 64) were conducted in Argentina, Australia, China, Germany, the United Kingdom, and the United States. Psychometric assessment of missing data, response distributions, and known-group differences was undertaken. Dimensionality was explored using exploratory and confirmatory factor analysis. Poorly fitting items were identified using information functions, and the function of each response category was assessed using category characteristic curves from item response theory (IRT) models. Differential item functioning was tested across key subgroups. Results There were 4879 respondents (Argentina = 508, Australia = 514, China = 497, Germany = 502, United Kingdom = 1955, United States = 903). Where missing data were allowed, rates were low (UK paper survey 2.3%; US survey 0.6%). Most items had responses distributed across all levels. Most items could discriminate between groups with known health conditions with moderate to large effect sizes; items were less able to discriminate across carers. Factor analysis found positive and negative measurement factors alongside the constructs of interest. For most countries apart from China, the confirmatory factor analysis model had good fit after some minor modifications. IRT indicated that most items had well-functioning response categories, but there was some evidence of differential item functioning in many items. Conclusions Items performed well in classical psychometric testing and IRT. This large 6-country collaboration provided evidence to inform item selection for the EQ-HWB measure.

17.

Purpose

Huntington disease (HD) is an incurable terminal disease; thus, end-of-life (EOL) concerns are common in these individuals. A quantitative measure of EOL concerns in HD would enable a better understanding of how these concerns impact health-related quality of life. Therefore, we developed new measures of EOL concerns for use in HD.

Methods

An EOL item pool of 45 items was field tested in 507 individuals with prodromal or manifest HD. Exploratory and confirmatory factor analyses (EFA and CFA, respectively) were conducted to establish unidimensional item pools. Item response theory (IRT) and differential item functioning analyses were applied to the identified unidimensional item pools to select the final items.

Results

EFA and CFA supported two separate unidimensional sets of items: Concern with Death and Dying (16 items) and Meaning and Purpose (14 items). IRT and DIF analyses supported the retention of 12 Concern with Death and Dying items and 4 Meaning and Purpose items. The IRT data supported the development of both a computer adaptive test (CAT) and a 6-item static short form for Concern with Death and Dying.

Conclusion

The HDQLIFE Concern with Death and Dying CAT and corresponding 6-item short form, and the 4-item calibrated HDQLIFE Meaning and Purpose scale demonstrate excellent psychometric properties. These new measures have the potential to provide clinically meaningful information about end-of-life preferences and concerns to clinicians and researchers working with individuals with HD. In addition, these measures may also be relevant and useful for other terminal conditions.

18.

Objectives

We propose the application of a bifactor model for exploring the dimensional structure of an item response matrix, and for handling multidimensionality.

Background

We argue that a bifactor analysis can complement traditional dimensionality investigations by: (a) providing an evaluation of the distortion that may occur when unidimensional models are fit to multidimensional data, (b) allowing researchers to examine the utility of forming subscales, and (c) providing an alternative to non-hierarchical multidimensional models for scaling individual differences.

Method

To demonstrate our arguments, we use responses (N = 1,000 Medicaid recipients) to 16 items from the Consumer Assessment of Healthcare Providers and Systems (CAHPS® 2.0) survey.

Analyses

Exploratory and confirmatory factor analytic and item response theory models (unidimensional, multidimensional, and bifactor) were estimated.

Results

The CAHPS® items are consistent with both unidimensional and multidimensional solutions. However, the bifactor model revealed that the overwhelming majority of common variance was due to a general factor. After controlling for the general factor, subscales provided little additional measurement precision.

Conclusion

The bifactor model provides a valuable tool for exploring dimensionality-related questions. In the Discussion, we describe contexts in which a bifactor analysis is most productively used, and we contrast the bifactor model with multidimensional IRT (MIRT) models. We also describe the implications of bifactor models for IRT applications and note some limitations.
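The dominance of a general factor in a bifactor solution is often summarized by the explained common variance (ECV) index: the share of common variance attributable to the general factor. A hedged sketch; the loadings below are invented, not the CAHPS estimates:

```python
def ecv(general_loadings, group_loadings):
    """Explained common variance of the general factor in a bifactor solution.
    Each list holds one standardized loading per item."""
    g = sum(l ** 2 for l in general_loadings)  # common variance on the general factor
    s = sum(l ** 2 for l in group_loadings)    # common variance on group factors
    return g / (g + s)

# Hypothetical pattern: strong general factor, weak group factors
general = [0.70, 0.80, 0.75, 0.70]
group = [0.20, 0.15, 0.25, 0.20]
share = ecv(general, group)
```

An ECV close to 1 corresponds to the finding above: once the general factor is controlled, the group factors (and hence subscales) carry little unique measurement precision.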

19.

Purpose

A fundamental assumption of patient-reported outcomes (PRO) measurement is that all individuals interpret questions about their health status in a consistent manner, such that a measurement model can be constructed that is equivalently applicable to all people in the target population. The related assumption of sample homogeneity has been assessed in various ways, including the many approaches to differential item functioning analysis.

Methods

This expository paper describes the use of latent variable mixture modeling (LVMM), in conjunction with item response theory (IRT), to examine: (a) whether a sample is homogeneous with respect to a unidimensional measurement model, (b) implications of sample heterogeneity with respect to model-predicted scores (theta), and (c) sources of sample heterogeneity. An example is provided using the 10 items of the Short-Form Health Status (SF-36®) physical functioning subscale with data from the Canadian Community Health Survey (2003) (N = 7,030 adults in Manitoba).

Results

The sample was not homogeneous with respect to a unidimensional measurement structure. Specification of three latent classes, to account for sample heterogeneity, resulted in significantly improved model fit. The latent classes were partially explained by demographic and health-related variables.

Conclusion

The illustrative analyses demonstrate the value of LVMM in revealing the potential implications of sample heterogeneity in the measurement of PROs.

20.
This article provides an overview of item response theory (IRT) models and how they can be appropriately applied to patient-reported outcomes (PRO) measurement. Specifically, the following topics are discussed: (a) basics of IRT, (b) types of IRT models, (c) how IRT models have been applied to date, and (d) new directions in applying IRT to PRO measurement.
