Similar articles
20 similar articles found
1.
Item banks and Computerized Adaptive Testing (CAT) have the potential to greatly improve the assessment of health outcomes. This review describes the unique features of item banks and CAT and discusses how to develop item banks. In CAT, a computer selects the items from an item bank that are most relevant for and informative about the particular respondent, thus optimizing test relevance and precision. Item response theory (IRT) provides the foundation for selecting the items that are most informative for the particular respondent and for scoring responses on a common metric. The development of an item bank is a multi-stage process that requires a clear definition of the construct to be measured, good items, a careful psychometric analysis of the items, and a clear specification of the final CAT. The psychometric analysis needs to evaluate the assumptions of the IRT model, such as unidimensionality and local independence; to check that the items function the same way in different subgroups of the population; and to confirm that there is an adequate fit between the data and the chosen item response models. Also, interpretation guidelines need to be established to help the clinical application of the assessment. Although medical research can draw upon expertise from educational testing in the development of item banks and CAT, the medical field also encounters unique opportunities and challenges.
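The item-selection idea in this abstract can be illustrated with a minimal sketch. The abstract does not name a specific IRT model, so the two-parameter logistic (2PL) model, the function names, and the three-item bank below are all assumptions chosen for illustration, not the review's method.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at trait level theta (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, bank, administered):
    """Return the id of the not-yet-administered item with maximum information at theta."""
    candidates = [item_id for item_id in bank if item_id not in administered]
    return max(candidates, key=lambda item_id: item_information(theta, *bank[item_id]))

# Hypothetical item bank: id -> (discrimination a, location b)
bank = {"easy": (1.2, -1.5), "medium": (1.5, 0.0), "hard": (1.0, 1.8)}

# For a respondent currently estimated at theta = 0, the item located nearest
# to that level (and with the highest discrimination) is the most informative.
next_item = select_next_item(0.0, bank, administered=set())
```

In an operational CAT the trait estimate would be updated after each response and the selection repeated until a stopping rule (for example, a target standard error) is met; that loop is omitted here.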

2.

Background

Modern psychometric methods based on item response theory (IRT) can be used to develop adaptive measures of health-related quality of life (HRQL). Adaptive assessment requires an item bank for each domain of HRQL. The purpose of this study was to develop item banks for five domains of HRQL relevant to arthritis.

Methods

About 1,400 items were drawn from published questionnaires or developed from focus groups and individual interviews and classified into 19 domains of HRQL. We selected the following 5 domains relevant to arthritis and related conditions: Daily Activities, Walking, Handling Objects, Pain or Discomfort, and Feelings. Based on conceptual criteria and pilot testing, 219 items were selected for further testing. A questionnaire was mailed to patients from two hospital-based clinics and a stratified random community sample. Dimensionality of the domains was assessed through factor analysis. Items were analyzed with the Generalized Partial Credit Model as implemented in Parscale. We used graphical methods and a chi-square test to assess item fit. Differential item functioning was investigated using logistic regression.

Results

Data were obtained from 888 individuals with arthritis. The five domains were sufficiently unidimensional for an IRT-based analysis. Thirty-one items were deleted due to lack of fit or differential item functioning. Daily Activities had the narrowest range for the item location parameter (-2.24 to 0.55) and Handling Objects had the widest range (-1.70 to 2.27). The mean (median) slope parameter for the items ranged from 1.15 (1.07) in Feelings to 1.73 (1.75) in Walking. The final item banks comprise 31–45 items each.

Conclusion

We have developed IRT-based item banks to measure HRQL in 5 domains relevant to arthritis. The items in the final item banks provide adequate psychometric information for a wide range of functional levels in each domain.
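For readers unfamiliar with the Generalized Partial Credit Model mentioned in the Methods, the sketch below computes category probabilities for a single polytomous item. It uses a common textbook parameterization (one slope and a list of step parameters, no scaling constant); the exact parameterization used in Parscale for this study may differ, and the function name and parameter values are hypothetical.

```python
import math

def gpcm_probabilities(theta, slope, steps):
    """Category probabilities for one GPCM item.

    steps holds the step parameters b_1..b_m; categories are scored 0..m.
    """
    # Cumulative sums of slope * (theta - b_v); the empty sum for category 0 is 0.
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + slope * (theta - b))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-category item with steps at -1 and 1, evaluated at theta = 0.
probs = gpcm_probabilities(0.0, 1.0, [-1.0, 1.0])
```

With symmetric steps around the respondent's level, the middle category is the most likely and the two outer categories are equally likely, which is a quick sanity check on the formula.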

3.
BACKGROUND AND OBJECTIVE: Measuring physical functioning (PF) within and across postacute settings is critical for monitoring outcomes of rehabilitation; however, most current instruments lack sufficient breadth and feasibility for widespread use. Computer adaptive testing (CAT), in which item selection is tailored to the individual patient, holds promise for reducing response burden, yet maintaining measurement precision. We calibrated a PF item bank via item response theory (IRT), administered items with a post hoc CAT design, and determined whether CAT would improve accuracy and precision of score estimates over random item selection. METHODS: 1,041 adults were interviewed during postacute care rehabilitation episodes in either hospital or community settings. Responses for 124 PF items were calibrated using IRT methods to create a PF item bank. We examined the accuracy and precision of CAT-based scores compared to a random selection of items. RESULTS: CAT-based scores had higher correlations with the IRT-criterion scores, especially with short tests, and resulted in narrower confidence intervals than scores based on a random selection of items; gains, as expected, were especially large for low and high performing adults. CONCLUSION: The CAT design may have important precision and efficiency advantages for point-of-care functional assessment in rehabilitation practice settings.

4.
OBJECTIVE: To equate physical functioning (PF) items with Back Pain Functional Scale (BPFS) items, develop a computerized adaptive test (CAT) designed to assess lumbar spine functional status (LFS) in people with lumbar spine impairments, and compare discriminant validity of LFS measures (theta(IRT)) generated using all items analyzed with a rating scale Item Response Theory model (RSM) and measures generated using the simulated CAT (theta(CAT)). METHODS: We performed a secondary analysis of retrospective intake rehabilitation data. RESULTS: Unidimensionality and local independence of 25 BPFS and PF items were supported. Differential item functioning was negligible for levels of symptom acuity, gender, age, and surgical history. The RSM fit the data well. A lumbar spine specific CAT was developed that was 72% more efficient than using all 25 items to estimate LFS measures. theta(IRT) and theta(CAT) measures did not discriminate patients by symptom acuity, age, or gender, but discriminated patients by surgical history in similar clinically logical ways. theta(CAT) measures were as precise as theta(IRT) measures. CONCLUSION: A body part specific simulated CAT developed from an LFS item bank was efficient and produced precise measures of LFS without eroding discriminant validity.

5.
6.
Fatigue is a common symptom among cancer patients and the general population. Due to its subjective nature, fatigue has been difficult to assess effectively and efficiently. Modern computerized adaptive testing (CAT) can enable precise assessment of fatigue using a small number of items from a fatigue item bank. CAT enables brief assessment by selecting questions from an item bank that provide the maximum amount of information given a person's previous responses. This article illustrates steps to prepare such an item bank, using 13 items from the Functional Assessment of Chronic Illness Therapy Fatigue Subscale (FACIT-F) as the basis. Samples included 1022 cancer patients and 1010 people from the general population. An Item Response Theory (IRT)-based rating scale model, a polytomous extension of the Rasch dichotomous model, was used. Nine items demonstrating acceptable psychometric properties were selected and positioned on the fatigue continuum. The fatigue levels measured by these nine items along with their response categories covered 66.8% of the general population and 82.6% of the cancer patients. Although the operational CAT algorithms to handle polytomously scored items are still in progress, we illustrated how CAT may work by using nine core items to measure level of fatigue. Using this illustration, a fatigue measure comparable to its full-length 13-item scale administration was obtained using four items. The resulting nine-item bank can serve as a core to which items will be added to build a psychometrically sound, operational item bank covering the entire fatigue continuum.

7.

Purpose

Sarcoidosis is a multisystem disease that can negatively impact health-related quality of life (HRQL) across generic (e.g., physical, social and emotional wellbeing) and disease-specific (e.g., pulmonary, ocular, dermatologic) domains. Measurement of HRQL in sarcoidosis has largely relied on generic patient-reported outcome tools, with few disease-specific measures available. The purpose of this paper is to present the development and testing of disease-specific item banks and short forms of lung, skin and eye problems, which are a part of a new patient-reported outcome (PRO) instrument called the sarcoidosis assessment tool.

Methods

After prioritizing and selecting the most important disease-specific domains, we wrote new items to reflect disease-specific problems by drawing from patient focus group and clinician expert survey data that were used to create our conceptual model of HRQL in sarcoidosis. Item pools underwent cognitive interviews by sarcoidosis patients (n = 13), and minor modifications were made. These items were administered in a multi-site study (n = 300) to obtain item calibrations and create calibrated short forms using item response theory (IRT) approaches.

Results

From the available item pools, we created four new item banks and short forms: (1) skin problems, (2) skin stigma, (3) lung problems, and (4) eye problems. We also created and tested supplemental forms of the most common constitutional symptoms and negative effects of corticosteroids.

Conclusions

Several new sarcoidosis-specific PROs were developed and tested using IRT approaches. These new measures can advance more precise and targeted HRQL assessment in sarcoidosis clinical trials and clinical practice.

8.
BACKGROUND AND OBJECTIVES: Most health-related quality-of-life questionnaires include multi-item scales. Scale scores are usually estimated as simple sums of the item scores. However, scoring procedures utilizing more information from the items might improve measurement abilities, and thereby reduce the needed sample sizes. We investigated whether item response theory (IRT)-based scoring improved the measurement abilities of the EORTC QLQ-C30 physical functioning, emotional functioning, and fatigue scales. METHODS: Using a database of 13,010 subjects we estimated the relative validities of IRT scoring compared to sum scoring of the scales. RESULTS: The mean relative validities were 1.04 (physical), 1.03 (emotional), and 0.97 (fatigue). None of these were significantly larger than 1. Thus, no gain in measurement abilities using IRT scoring was found for these scales. Possible explanations include that the items in the scales are not constructed for IRT scoring and that the scales are relatively short. CONCLUSION: IRT scoring of the three longest EORTC QLQ-C30 scales did not improve measurement abilities compared to the traditional sum scoring of the scales.

9.

Purpose  

Content validity of patient-reported outcomes (PROs) is evaluated primarily during item development, but subsequent psychometric analyses, particularly for item response theory (IRT)-derived scales, often result in considerable item pruning and potential loss of content. After selecting items for the PROMIS banks based on psychometric and content considerations, we invited external content expert reviews of the degree to which the initial domain names and definitions represented the calibrated item bank content.

10.

Objective

The objective of the present study is to describe the extension of the National Institutes of Health Patient-Reported Outcomes Measurement Information System (PROMIS®) pediatric parent proxy-report item banks for parents of children ages 5–7 years, and to investigate differential item functioning (DIF) between the data obtained from parents of 5–7-year-old children and the data obtained from parents of 8–17-year-old children in the original construction of the scales.

Methods

Item response theory (IRT) analyses of DIF were conducted comparing data from the 5–7 age group with data from the established scales for ages 8–17 across 5 generic health domains (physical functioning, pain, fatigue, emotional health, and social health) and asthma.

Results

IRT DIF analyses revealed that the majority of the items functioned similarly with responses from parents of younger and older children. A small number of items were removed from the item bank for younger children, and a few items that exhibited statistical DIF were retained in the pools with the caveat that they should not be used in studies that involve comparisons of younger children with older children.

Conclusions

The study confirms that most of the items in the PROMIS parent proxy-report item banks can be used with parents of children ages 5–7. It is anticipated that these new scales will have application for younger pediatric populations when pediatric self-report is not feasible.

11.
BACKGROUND AND OBJECTIVE: To develop computerized adaptive tests (CATs) designed to assess lower extremity functional status (FS) in people with lower extremity impairments using items from the Lower Extremity Functional Scale and compare discriminant validity of FS measures generated using all items analyzed with a rating scale Item Response Theory model (theta(IRT)) and measures generated using the simulated CATs (theta(CAT)). METHODS: Secondary analysis of retrospective intake rehabilitation data. RESULTS: Unidimensionality of items was strong, and local independence of items was adequate. Differential item functioning (DIF) affected item calibration related to body part, that is, hip, knee, or foot/ankle, but DIF did not affect item calibration for symptom acuity, gender, age, or surgical history. Therefore, patients were separated into three body part specific groups. The rating scale model fit all three data sets well. Three body part specific CATs were developed: each was 70% more efficient than using all LEFS items to estimate FS measures. theta(IRT) and theta(CAT) measures discriminated patients by symptom acuity, age, and surgical history in similar ways. theta(CAT) measures were as precise as theta(IRT) measures. CONCLUSION: Body part-specific simulated CATs were efficient and produced precise measures of FS with good discriminant validity.

12.

Purpose

To develop a social health measurement framework, to test items in diverse populations and to develop item response theory (IRT) item banks.

Methods

A literature review guided framework development of Social Function and Social Relationships sub-domains. Items were revised based on patient feedback, and Social Function items were field-tested. Analyses included exploratory factor analysis (EFA), confirmatory factor analysis (CFA), two-parameter IRT modeling and evaluation of differential item functioning (DIF).

Results

The analytic sample included 956 general population respondents who answered 56 Ability to Participate and 56 Satisfaction with Participation items. EFA and CFA identified three Ability to Participate sub-domains. However, because of positive and negative wording, and content redundancy, many items did not fit the IRT model, so item banks do not yet exist. EFA, CFA and IRT identified two preliminary Satisfaction item banks. One item exhibited trivial age DIF.

Conclusion

After extensive item preparation and review, EFA-, CFA- and IRT-guided item banks help provide increased measurement precision and flexibility. Two Satisfaction short forms are available for use in research and clinical practice. This initial validation study resulted in revised item pools that are currently undergoing testing in new clinical samples and populations.

13.

Background

The European Organisation of Research and Treatment of Cancer (EORTC) Quality of Life Group is developing computerized adaptive testing (CAT) versions of all EORTC Quality of Life Questionnaire (QLQ-C30) scales with the aim to enhance measurement precision. Here we present the results on the field-testing and psychometric evaluation of the item bank for cognitive functioning (CF).

Methods

In previous phases (I–III), 44 candidate items were developed measuring CF in cancer patients. In phase IV, these items were psychometrically evaluated in a large sample of international cancer patients. This evaluation included an assessment of dimensionality, fit to the item response theory (IRT) model, differential item functioning (DIF), and measurement properties.

Results

A total of 1030 cancer patients completed the 44 candidate items on CF. Of these, 34 items could be included in a unidimensional IRT model, showing an acceptable fit. Although several items showed DIF, these had a negligible impact on CF estimation. Measurement precision of the item bank was much higher than that of the two original QLQ-C30 CF items alone, across the whole continuum. Moreover, CAT measurement may on average reduce study sample sizes by about 35–40% compared to the original QLQ-C30 CF scale, without loss of power.

Conclusion

A CF item bank for CAT measurement consisting of 34 items was established, applicable to various cancer patients across countries. This CAT measurement system will facilitate precise and efficient assessment of HRQOL of cancer patients, without loss of comparability of results.

14.

Purpose

To develop a vision-targeted health-related quality of life (HRQOL) measure for the NIH Toolbox for the Assessment of Neurological and Behavioral Function.

Methods

We conducted a review of existing vision-targeted HRQOL surveys and identified color vision, low luminance vision, distance vision, general vision, near vision, ocular symptoms, psychosocial well-being, and role performance domains. Items in existing survey instruments were sorted into these domains. We selected non-redundant items and revised them to improve clarity and to limit the number of different response options. We conducted 10 cognitive interviews to evaluate the items. Finally, we revised the items and administered them to 819 individuals to calibrate the items and estimate the measure’s reliability and validity.

Results

The field test provided support for the 53-item vision-targeted HRQOL measure encompassing 6 domains: color vision, distance vision, near vision, ocular symptoms, psychosocial well-being, and role performance. The domain scores had high levels of reliability (coefficient alphas ranged from 0.848 to 0.940). Validity was supported by high correlations between National Eye Institute Visual Function Questionnaire scales and the new vision-targeted scales (highest values were 0.771 between psychosocial well-being and mental health, and 0.729 between role performance and role difficulties), and by lower mean scores in those groups self-reporting eye disease (F statistic with p < 0.01 for all comparisons except cataract with ocular symptoms, psychosocial well-being, and role performance scales).

Conclusions

This vision-targeted HRQOL measure provides a basis for comprehensive assessment of the impact of eye diseases and treatments on daily functioning and well-being in adults.

15.
Item response theory (IRT), item banking and computer adaptive testing (CAT) methods have the potential to provide novel platforms for the collection, analysis and dissemination of patient data on health status and well-being. There are considerable challenges associated with building and maintaining a national item bank, and it is uncertain whether there is sufficient interest among key stakeholders for IRT-based and CAT measures. The most convincing activity is demonstrating that the approach is feasible, psychometrically sound and useful in different specific applications. Demonstrated success opens up the possibility of more widespread acceptability and application. As part of the development effort, there need to be continued meetings and discussions with psychometricians, instrument developers, clinical researchers, the FDA, pharmaceutical industry researchers and managed care organizations about the advantages and disadvantages of a national item bank.

16.

Purpose

Computerized adaptive test (CAT) methods, based on item response theory (IRT), enable a patient-reported outcome instrument to be adapted to the individual patient while maintaining direct comparability of scores. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30. We present the development and psychometric validation of the item pool for the first of the scales, physical functioning (PF).

Methods

Initial developments (including literature search and patient and expert evaluations) resulted in 56 candidate items. Responses to these items were collected from 1,176 patients with cancer from Denmark, France, Germany, Italy, Taiwan, and the United Kingdom. The items were evaluated with regard to psychometric properties.

Results

Evaluations showed that 31 of the items could be included in a unidimensional IRT model with acceptable fit and good content coverage, although the pool may lack items at the upper extreme (good PF). There were several findings of significant differential item functioning (DIF). However, the DIF findings appeared to have little impact on the PF estimation.

Conclusions

We have established an item pool for CAT measurement of PF and believe that this CAT instrument will clearly improve the EORTC measurement of PF.

17.
To make meaningful cross-cultural comparisons of health-related quality of life (HRQOL) or to pool international research data, it is essential to create culturally unbiased measures that detect clinically important differences between patients. We evaluated the measurement properties of the Functional Assessment of Cancer Therapy-Breast (FACT-B) in 111 Austrian and 144 U.S. patients with breast cancer using item response theory (IRT) methods. A small number of items were identified as displaying statistically significant differential item functioning (DIF), suggesting possible measurement bias. The majority of the items functioned similarly between the two cultural groups. U.S. patients reported lower (worse) physical function and well-being compared with Austrian patients, higher (better) social/family well-being and similar emotional well-being, before and after adjustment for DIF. IRT and related measurement models provide useful methods for assessing cross-cultural equivalence and determining which items can be pooled across languages before analyzing HRQOL data. Determination of clinically significant cross-cultural differences will require additional investigation.

18.
OBJECTIVE: The Patient-Reported Outcomes Measurement Information System (PROMIS) was initiated to improve precision, reduce respondent burden, and enhance the comparability of health outcomes measures. We used item response theory (IRT) to construct and evaluate a preliminary item bank for physical function assuming four subdomains. STUDY DESIGN AND SETTING: Data from seven samples (N=17,726) using 136 items from nine questionnaires were evaluated. A generalized partial credit model was used to estimate item parameters, which were normed to a mean of 50 (SD=10) in the US population. Item bank properties were evaluated through Computerized Adaptive Test (CAT) simulations. RESULTS: IRT requirements were fulfilled by 70 items covering activities of daily living, lower extremity, and central body functions. The original item context partly affected parameter stability. Items on upper body function, and need for aid or devices did not fit the IRT model. In simulations, a 10-item CAT eliminated floor and decreased ceiling effects, achieving a small standard error (< 2.2) across scores from 20 to 50 (reliability >0.95 for a representative US sample). This precision was not achieved over a similar range by any comparable fixed length item sets. CONCLUSION: The methods of the PROMIS project are likely to substantially improve measures of physical function and to increase the efficiency of their administration using CAT.
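The link between the reported standard error (below 2.2) and the reported reliability (above 0.95) can be checked with a short sketch. It assumes the usual relation reliability = 1 - SE²/SD² on a metric with SD = 10 in the reference population, and a linear rescaling of a standardized trait estimate to the mean-50 metric; the function names are hypothetical and the abstract itself does not spell out either formula.

```python
def to_t_metric(theta, mean=50.0, sd=10.0):
    """Rescale a standardized trait estimate to a mean-50, SD-10 metric (assumed linear)."""
    return mean + sd * theta

def reliability_from_se(se, sd=10.0):
    """Reliability implied by a standard error on a scale with the given population SD."""
    return 1.0 - (se / sd) ** 2

# A standard error of 2.2 on the SD-10 metric implies a reliability of
# about 0.9516, consistent with the "> 0.95" figure quoted in the abstract.
implied = reliability_from_se(2.2)
```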

19.

Background  

Health Related Quality of Life (HRQoL) is a relevant variable in the evaluation of health outcomes. Questionnaires based on Classical Test Theory typically require a large number of items to evaluate HRQoL. Computer Adaptive Testing (CAT) can be used to reduce test length while maintaining and, in some cases, improving accuracy. This study aimed at validating a CAT based on Item Response Theory (IRT) for evaluation of generic HRQoL: the CAT-Health instrument.

20.
This paper reviews important methodological considerations for developing item banks and computerized adaptive scales (commonly called computerized adaptive tests in the educational measurement literature, yielding the acronym CAT), including issues of the reference population, dimensionality, dichotomous versus polytomous response scales, differential item functioning (DIF) and conditional scoring, mode effects, the impact of local dependence, and innovative approaches to assessment using CATs in health outcomes research.
