Similar articles
20 similar articles found.
1.
This study investigated psychometric properties of an instrument for assessing perceived occupational value, the 26-item OVal-pd. Data from 225 Swedish subjects with and without known mental illness were analysed regarding fit to the Rasch measurement model (partial credit model), differential item functioning (DIF), and functioning of the OVal-pd four-category response scale. The reliability (index of person separation, analogous to Cronbach's alpha) was good (0.92) but there were signs of overall and item level (six items) misfit. There was DIF between people with and without mental illness for three items. Iterative deletion of misfitting items resulted in a new 18-item DIF-free scale with good overall and individual item fit and maintained reliability (0.91). There were no disordered response category thresholds. These observations also held true in separate analyses among people with and without mental illness. Thus, the first steps of ensuring that occupational value can be measured in a valid and reliable way have been taken. Still, occupational value is a dynamic construct and the aspects that fit the construct may vary between contexts. This has implications for, e.g., cross-cultural research and calls for identification of a core set of culture-free items to allow for valid cross-cultural comparisons.
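As a rough illustration of the partial credit model and the threshold-ordering check mentioned in this abstract, the sketch below computes category probabilities for one polytomous item. The function names and the example threshold values are hypothetical and are not taken from the OVal-pd study; it is a minimal sketch, assuming the standard partial credit parameterisation with person location theta and item thresholds in logits.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta      -- person location in logits
    thresholds -- m threshold locations (logits) for an item with m + 1 categories
    """
    # Category x has numerator exp(sum_{k<=x} (theta - tau_k)); category 0 uses an empty sum.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    # Subtract the largest exponent before exponentiating, for numerical stability.
    peak = max(exponents)
    weights = [math.exp(value - peak) for value in exponents]
    total = sum(weights)
    return [weight / total for weight in weights]


def thresholds_are_ordered(thresholds):
    """True when every threshold lies strictly above the previous one."""
    return all(low < high for low, high in zip(thresholds, thresholds[1:]))


# Hypothetical thresholds for a four-category item (three thresholds).
example_thresholds = [-1.5, 0.0, 1.5]
example_probs = pcm_probabilities(0.0, example_thresholds)
```

Disordered thresholds (for example a second threshold located below the first) would make `thresholds_are_ordered` return False, which is the situation the abstract reports as absent.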

2.
Rasch analysis is now used widely to assess the measurement properties of health status questionnaires. This study tested the stability of the AQ20 – a dichotomous-response measure of health status in asthma, using parameters estimated by a Rasch model. One hundred forty-four asthma patients completed the AQ20 on five occasions over 3 months. At visit 1, two items showed significant misfit and were deleted. At each visit, the overall mean item–person and item–trait interaction statistics from the remaining 18 items (AQ18) were very similar. The repeatability of the item calibrations was excellent (intraclass correlation coefficient 0.95), despite the patients’ health having changed (repeated-measures ANOVA: FEV1 and AQ18 score p<0.0001). Tests of differential item functioning (DIF) over time showed that one item increased in severity. This item refers to ability to garden, and changes in response patterns could be related to seasonal changes over the study period. We conclude that this study has highlighted the usefulness of multiple repeat assessments which allow items to be tested for DIF over time. This is important as inclusion of ‘time-dependent’ items in a questionnaire may reduce the reliability of the instrument.

3.
OBJECTIVE: The Multidimensional Health Locus of Control (MHLC) scales are widely used to measure beliefs about determinants of persons' health. We evaluated the scales over the largest-ever disease-specific sample of subjects using a combined-method psychometric approach. STUDY DESIGN AND SETTING: We performed a secondary analysis of data from 1,206 subjects from three osteoarthritis studies, using Rasch analysis and confirmatory factor analysis simultaneously. Differential item functioning (DIF) by gender and data source, scale dimensionality, and item fit were examined. The Rasch model fit the data if Rasch residual principal components analysis (PCA) corroborated three distinct dimensions and item fit statistics fell between 0.80 and 1.20. The confirmatory factor (CFA) model fit the data if factor loadings exceeded 0.50 for all items. RESULTS: DIF by gender or data source was not materially evident for any items. PCA supported existence of three dimensions in the data. Both Rasch and CFA models fit the data for 16 items; two items were detected as misperforming. When these items were removed, fit of both models improved. CONCLUSION: Results of this large-sample evaluation of the MHLC scales corroborated earlier findings that removal of certain items improves the scales. The combined Rasch-CFA approach provided better insight to scale performance problems than either method alone provided.

4.
This paper compares a qualitative and a quantitative (Rasch) method of item assessment for developing the content of a food insecurity scale for Bangladesh. Data are derived from the Bangladesh Food Insecurity Measurement and Validation Study, in which researchers collected 2 rounds of ethnographic information and 3 rounds of conventional household survey data between 2001 and 2003. The qualitative method of scale development relied on content experts and respondents themselves to evaluate household food insecurity items generated through ethnographic research. The quantitative method applied the Rasch model to assess the fit of the same items using representative survey data. The Rasch model was then used to test for differential item functioning (DIF) across diverse demographic and geographic subgroups. The qualitative assessment flagged and discarded 10 items, leaving 13. The Rasch assessment of infit and outfit flagged 3 items, and the Rasch DIF test discarded another 10 items, leaving a total of 10 items in the Rasch-derived scale. The 2 scales contained 8 of the same items. The qualitatively and quantitatively derived scales were highly correlated (r = 0.96, P < 0.01), and the 2 methods located 90% of households in the same food insecurity tercile. This convergence lends added confidence to the use of either scale for identifying food-insecure households in different regions of Bangladesh. Multiple methods should continue to be applied in a systematic and transparent way to lend additional credence to the results when they converge and to pinpoint directions for further clarification where they do not.

5.
Objectives: Fatigue is a common and distressing symptom in cancer patients due to both the disease and its treatments. The concept of fatigue is multidimensional and includes both physical and mental components. The 22-item Revised Piper Fatigue Scale (RPFS) is a multidimensional instrument developed to assess cancer-related fatigue. This study reports on the construct validity of the Swedish version of the RPFS from the perspective of Rasch measurement. Methods: The Swedish version of the RPFS was answered by 196 cancer patients fatigued after 4 to 5 weeks of curative radiation therapy. Data from the scale were fitted to the Rasch measurement model. This involved testing a series of assumptions, including the stochastic ordering of items, local response dependency, and unidimensionality. A series of fit statistics were computed, differential item functioning (DIF) was tested, and local response dependency was accommodated through testlets. Results: The Behavioral, Affective and Sensory domains all satisfied the Rasch model expectations. No DIF was observed, and all domains were found to be unidimensional. The Mood/Cognitive scale failed to fit the model, and substantial multidimensionality was found. Splitting the scale between Mood and Cognitive items resolved fit to the Rasch model, and new domains were unidimensional without DIF. Conclusions: The current Rasch analyses add to the evidence of measurement properties of the scale and show that the RPFS has good psychometric properties and works well to measure fatigue. The original four-factor structure, however, was not supported.

6.
BACKGROUND AND OBJECTIVE: To develop computerized adaptive tests (CATs) designed to assess lower extremity functional status (FS) in people with lower extremity impairments using items from the Lower Extremity Functional Scale and compare discriminant validity of FS measures generated using all items analyzed with a rating scale Item Response Theory model (theta(IRT)) and measures generated using the simulated CATs (theta(CAT)). METHODS: Secondary analysis of retrospective intake rehabilitation data. RESULTS: Unidimensionality of items was strong, and local independence of items was adequate. Differential item functioning (DIF) affected item calibration related to body part, that is, hip, knee, or foot/ankle, but DIF did not affect item calibration for symptom acuity, gender, age, or surgical history. Therefore, patients were separated into three body part specific groups. The rating scale model fit all three data sets well. Three body part specific CATs were developed: each was 70% more efficient than using all LEFS items to estimate FS measures. theta(IRT) and theta(CAT) measures discriminated patients by symptom acuity, age, and surgical history in similar ways. theta(CAT) measures were as precise as theta(IRT) measures. CONCLUSION: Body part-specific simulated CATs were efficient and produced precise measures of FS with good discriminant validity.

7.

Purpose

The shoulder pain and disability index (SPADI) has been extensively evaluated for its psychometric properties using classical test theory (CTT). The purpose of this study was to evaluate its structural validity using Rasch model analysis.

Methods

Responses to the SPADI from 1030 patients referred for physiotherapy with shoulder pain and enrolled in a prospective cohort study were available for Rasch model analysis. Overall fit, individual person and item fit, response format, dependence, unidimensionality, targeting, reliability and differential item functioning (DIF) were examined.

Results

The SPADI pain subscale initially demonstrated misfit due to DIF by age and gender. After iterative analysis it showed good fit to the Rasch model with acceptable targeting and unidimensionality (overall fit Chi-square statistic 57.2, p = 0.1; mean item fit residual 0.19 (1.5); mean person fit residual 0.44 (1.1); person separation index (PSI) 0.83). The disability subscale, however, showed significant misfit due to uniform DIF even after iterative analyses were used to explore different solutions to the sources of misfit (overall fit Chi-square statistic 57.2, p = 0.1; mean item fit residual 0.54 (1.26); mean person fit residual 0.38 (1.0); PSI 0.84).

Conclusions

Rasch Model analysis of the SPADI has identified some strengths and limitations not previously observed using CTT methods. The SPADI should be treated as two separate subscales. The SPADI is a widely used outcome measure in clinical practice and research; however, the scores derived from it must be interpreted with caution. The pain subscale fits the Rasch model expectations well. The disability subscale does not fit the Rasch model and its current format does not meet the criteria for true interval-level measurement required for use as a primary endpoint in clinical trials. Clinicians should therefore exercise caution when interpreting score changes on the disability subscale and attempt to compare their scores to age- and sex-stratified data.

8.
Purpose

Psychosomatic symptoms and mental health problems are highly prevalent in multimorbid elderly people challenging general practitioners to differentiate between normal stress and psychopathological conditions. The 4DSQ is a Dutch questionnaire developed to detect anxiety, depression, somatization, and distress in primary care. This study aims to analyze measurement equivalence between a German version and the original Dutch instrument.

Methods

A Dutch and a German sample of multimorbid elderly people, matched by gender and age, were analyzed. Equivalence of scale structures was assessed by confirmatory factor analysis (CFA). To evaluate measurement equivalence across languages, differential item functioning (DIF) was analyzed using Mantel–Haenszel method and hybrid ordinal logistic regression analysis. Differential test functioning (DTF) was assessed using Rasch analysis.

Results

A total of 185 German and 185 Dutch participants completed the questionnaire. The CFA confirmed one-factor models for all scales of both 4DSQ versions. Nine items in three scales were flagged with DIF. The anxiety scale showed to be free of DIF. DTF analysis revealed negligible scale impact of DIF.

Conclusions

The German 4DSQ demonstrated measurement equivalence to the original Dutch instrument. Hence, it can be considered a valid questionnaire for the screening for mental health problems in primary care.


9.
Purpose

To assess the equivalence of self-reports of physical functioning between pediatric respondents to the English- and Spanish-language patient-reported outcomes measurement information system (PROMIS®) physical functioning item banks.

Methods

The PROMIS pediatric physical functioning item banks include 29 upper extremity items and 23 mobility items. A sample of 5091 children and adolescents (mean age = 12 years, range 8–17; 49% male) completed the English-language version of the items. A sample of 605 children and adolescents (mean age = 12 years, range 8–17; 55% male; 96% Hispanic) completed the Spanish-language version of the items.

Results

We found language (English versus Spanish) differential item functioning (DIF) for 4 upper extremity items and 7 mobility items. Product-moment correlations between estimated upper extremity and mobility scores using the English versus the equated Spanish item parameters for Spanish-language respondents were 0.98 and 0.99, respectively. After excluding cases with significant person misfit, we found DIF for the same 4 upper extremity items that had DIF in the full sample and for 12 mobility items (including the same 7 mobility items that had DIF in the full sample). The identification of DIF items between English- and Spanish-language respondents was affected slightly by excluding respondents displaying person misfit.

Conclusions

The results of this study provide support for measurement equivalence of self-reports of physical functioning by children and adolescents who completed the English- and Spanish-language surveys. Future analyses are needed to replicate the results of this study in other samples.


10.
Objective: To examine the psychometric characteristics of the brief version of the World Health Organization Quality of Life (WHOQOL-BREF) questionnaire in rural-community-dwelling older people in Taiwan using Rasch analysis. Methods: This is a cross-sectional study. A total of 1200 subjects aged ≥65 years were recruited to complete the 26-item WHOQOL-BREF. Scale dimensionality, item difficulty, scale reliability and separation, item targeting, item-person map, and differential item functioning (DIF) were examined. Results: The four WHOQOL-BREF scales (physical capacity, psychological well-being, social relationships, and environment) were found to be unidimensional and reliable. The item–person map for each domain indicated that the spread of the item thresholds sufficiently covered the latent trait continuum being measured. However, gaps in content coverage were identified in the social domain. Analyses of the DIF revealed that one psychological item (body image) exhibited DIF across the two age groups (old–old vs. young–old) and that two social items (sexual activity and friends’ support) displayed DIF across genders and the two age groups. Conclusions: Rasch analysis is a comprehensive method of psychometric evaluation of the WHOQOL-BREF and identifies areas for improvements. Three items displaying age-related DIF (body image, sexual activity, and friends’ support) may potentially cause biased health-related QOL assessments, and their impacts on scores should be carefully examined.

11.
Background:  One method of evaluating the construct validity of instruments is the Rasch Measurement Model (RMM), an increasingly popular method used for test construction and validation.
Aim:  The aim was to examine the construct validity of the Developmental Test of Visual-Motor Integration 5th Edition (VMI) by applying the RMM to evaluate its scalability, dimensionality, differential item functioning and hierarchical ordering.
Method:  The participants were 400 children aged 5 to 12 years, recruited from six schools in Melbourne, Victoria, who completed the VMI under the supervision of an occupational therapist. VMI items 1, 2 and 3 were excluded from the Rasch analysis since all of the children achieved a perfect score on these items.
Results:  None of the items exhibited RMM misfit: no goodness-of-fit mean square (MnSq) infit statistics or standardised z (ZStd) scores fell outside the specified acceptable range. VMI item 9 (copied circle) exhibited differential item functioning based on gender. In relation to hierarchical ordering of items, several were found to have similar logit difficulty values. For example, VMI items 26, 27 and 29; items 18, 22 and 24; and items 4, 5 and 11 were found to have the same level of challenge. In addition, the VMI scale item logit measure order did not match that presented in the VMI test manual.
Conclusion:  Theoretically, the VMI items are developmentally ordered; however, this ordering was not mirrored by the item logit difficulty scores obtained. This has scoring implications, because scoring of a respondent's VMI test booklet is terminated after three consecutive items are not passed. Clinicians should also be aware that item 9 may exhibit bias related to gender.

12.
Objective: To assess psychometric quality of the vision-related quality of life core measure (VCM1) and feasibility in a community-based sample. Study Design and Setting: Cross-sectional data were used from an observational study among visually impaired patients (n = 296) and a community-based sample with low vision (n = 98) from the Longitudinal Aging Study Amsterdam. Calibration was performed within the graded response model on the patient sample, including item fit, differential item functioning (DIF), DIF impact, and psychometric information. DIF between both samples was investigated for assessing feasibility of the VCM1 in community-based studies. Results: All items fitted the model. There was no significant DIF within the patient sample, except between self-report and proxy report subgroups. The maximum difference in expected scores was −0.42. Item information was highest for item 4 “depression” and lowest for item 1 “embarrassment.” Test information showed full coverage of the disability continuum. DIF was present between patient and community-based samples. However, DIF items had low impact on the expected test scores. Conclusions: DIF that was found on single items between administration type subgroups and sample subgroups was negligible at the level of the expected test scores. This means that DIF had no substantial impact on the VCM1. Therefore, psychometric quality and feasibility of the VCM1 can be considered satisfactory.

13.
The Core Food Security Measure (CFSM) is used nationally to assess the extent and severity of household food insecurity in the previous 12 months due to inadequate money for food. Both a scale measure and a categorical measure were developed from a national cross-sectional sample. The objective of this research was to determine whether the CFSM scale measure is a reliable and valid food security measure for use in Hawaii, where at least 50% of the population is of Asian or Pacific Islander descent. We completed an independent assessment of the robustness of the internal scale construct validity of the CFSM scale measure and hierarchical order of items using the same Rasch methods used previously to develop the CFSM. From a sample of 1664 respondents, data from 362 were used in the Rasch analysis. Item goodness-of-fit statistics indicated that responses to the "adults cut the size or skip meals" item and its follow-up item were redundant [outfit mean-square residual (MnSq) = 0.6, z = -2]. Responses to the "(un)able to eat balanced meals" item were erratic (outfit MnSq = 2.1, z = 2). Findings pertaining to goodness-of-fit of the respondents indicated an acceptable rate of misfit (4.7%). Rate of misfit did not vary with family status or with any ethnic group except the Samoans. Overall, the CFSM scale measure fit as well with the Hawaii data as it did with national data, although identified limitations may affect food security monitoring and research.
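To make the mean-square fit statistics quoted in this abstract more concrete, the sketch below computes infit and outfit mean squares for a single dichotomous item under the Rasch model. It is a minimal sketch, assuming person locations and the item difficulty are already known (in practice they are estimated). The function names and example data are hypothetical and unrelated to the CFSM data. Values well below 1 indicate overly predictable (redundant) responses, and values well above 1 indicate erratic responses.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of an affirmative response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_fit_mean_squares(responses, thetas, difficulty):
    """Return (infit, outfit) mean squares for one dichotomous item.

    responses -- 0/1 responses, one per person
    thetas    -- person locations in logits (assumed known here)
    """
    squared_residuals = []
    variances = []
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, difficulty)
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: unweighted mean of squared standardised residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(variances)
    # Infit: information-weighted version, less sensitive to off-target persons.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


# Hypothetical persons spread around an item of difficulty 0 logits.
example_thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
predictable = item_fit_mean_squares([0, 0, 1, 1, 1], example_thetas, 0.0)
erratic = item_fit_mean_squares([1, 1, 0, 0, 0], example_thetas, 0.0)
```

In this toy data the perfectly ordered response pattern yields mean squares below 1, while the reversed pattern yields mean squares well above 1.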

14.

Objective

To evaluate the extent of differential item functioning (DIF) within the thyroid-specific quality of life patient-reported outcome measure, ThyPRO, according to sex, age, education and thyroid diagnosis.

Study design and setting

A total of 838 patients with benign thyroid diseases completed the ThyPRO questionnaire (84 five-point items, 13 scales). Uniform and nonuniform DIF were investigated using ordinal logistic regression, testing for both statistical significance and magnitude (ΔR² > 0.02). Scale level was estimated by the sum score, after purification.

Results

Twenty instances of DIF in 17 of the 84 items were found. Eight according to diagnosis, where the goiter scale was the one most affected, possibly due to differing perceptions in patients with auto-immune thyroid diseases compared to patients with simple goiter. Eight DIFs according to age were found, of which 5 were in positively worded items, which younger patients were more likely to endorse; one according to gender: women were more likely to report crying, and three according to educational level. The vast majority of DIF had only minor influence on the scale scores (0.1–2.3 points on the 0–100 scales), but two DIF corresponded to a difference of 4.6 and 9.8, respectively.

Conclusion

Ordinal logistic regression identified DIF in 17 of 84 items. The potential impact of this on the present scales was low, but items displaying DIF could be avoided when developing abbreviated scales, where the potential impact of DIF (due to fewer items) will be larger.
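For orientation, the nested-model comparison commonly used when testing DIF with ordinal logistic regression can be written as below. This is the general textbook formulation, not necessarily the exact specification used in the ThyPRO study. The symbols are assumptions introduced here: S is the matching scale score, G the grouping variable (for example sex or diagnosis), and R² a pseudo-R² whose particular variant is left unspecified.

```latex
\begin{align}
\text{Model 1:}\quad & \operatorname{logit} P(Y \ge k) = \alpha_k + \beta_1 S \\
\text{Model 2:}\quad & \operatorname{logit} P(Y \ge k) = \alpha_k + \beta_1 S + \beta_2 G \\
\text{Model 3:}\quad & \operatorname{logit} P(Y \ge k) = \alpha_k + \beta_1 S + \beta_2 G + \beta_3 (S \times G)
\end{align}
% Uniform DIF:    \beta_2 \neq 0 (Model 2 versus Model 1)
% Nonuniform DIF: \beta_3 \neq 0 (Model 3 versus Model 2)
% Magnitude:      \Delta R^2 = R^2_{\text{larger model}} - R^2_{\text{smaller model}}
```

Under this reading, an item would be flagged only when the added term is statistically significant and the change in R² exceeds the 0.02 threshold quoted in the abstract.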

15.
Background: The original Dutch Four-Dimensional Symptom Questionnaire (4DSQ), which measures distress, depression, anxiety and somatization, has been translated into Polish with the aim of providing primary health care with a good screening instrument for the detection of the most prevalent mental health problems (anxiety, somatization, depression and distress). Aim: To check if the Polish version is cross-culturally valid so that the scores of Polish subjects can be compared with the scores of Dutch subjects and the Dutch cut-off points can be used in Polish subjects. Method: 4DSQ data were collected from a mixed sample of students and primary care attendees. The Polish data were compared with the 4DSQ data of a matched sample of Dutch students and primary care attendees. Two methods of differential item functioning (DIF) analysis, ordinal logistic regression and generalized Mantel-Haenszel, were used to detect items with DIF, and linear regression analysis was used to estimate the scale-level impact of DIF. Results: Four items showing DIF were detected in the distress scale, one in the somatization scale and one in the anxiety scale. The DIF in distress caused Polish subjects with moderate scores to score circa 1 point less than their Dutch counterparts. Conclusions: The results of the DIF analyses suggest that the Polish 4DSQ measures the same constructs as the Dutch 4DSQ and that the Dutch norms can be used for the Polish subjects, except for distress: the first cut-off point should be one point lower.

16.
We present results of item-response bias analyses of the exogenous variables age, gender, and race for all items from the Center for Epidemiologic Studies Depression (CES-D) scale using data (N = 2340) from the New Haven component of the Established Populations for Epidemiologic Studies of the Elderly (EPESE). The proportional odds of blacks responding higher on the CES-D items "people are unfriendly" and "people dislike me" were 2.29 (95% confidence interval: 1.74, 3.02) and 2.96 (95% confidence interval: 2.15, 4.07) times that of whites matched on overall depressive symptoms, respectively. In addition, the proportional odds of women responding higher on the CES-D item "crying spells" were 2.14 (95% confidence interval: 1.60, 2.82) times that of men matched on overall depressive symptoms. Our data indicate the CES-D would have greater validity among this diverse group of older men and women after removal of the crying item and two interpersonal items.

17.
Objective: Computer adaptive tests (CATs) offer a flexible, test fair, and economic opportunity for accurate measurement of anxiety in patients with cardiovascular diseases (CVDs). The objective of this study was to develop and calibrate an item bank [anxiety item bank for cardiovascular patients (AIB-cardio)] as a prerequisite for an anxiety-CAT in CVD patients. Study Design and Setting: After pretesting for relevance and comprehension, a pool of 155 anxiety items was answered on a five-point Likert scale. The sample consisted of 715 CVD patients, who were recruited in 14 German cardiac rehabilitation centers. A confirmatory factor analysis (CFA), Mokken analysis, and Rasch analysis were conducted. Results: The results of CFA and Mokken analysis confirmed one factor structure and double monotonicity. In Rasch analysis, merging response categories and removing items with misfit, differential item functioning or local response dependency reduced the AIB-cardio to 37 items. The AIB-cardio fitted to the Rasch model with a nonsignificant item–trait interaction (chi-square, 133.89; degrees of freedom, 111; P = 0.07). Person separation reliability was 0.85, and unidimensionality could be verified. Conclusion: The calibrated, unidimensional AIB-cardio provides the basis for a CAT to assess anxiety in rehabilitation patients with CVD with good psychometric properties. Further testing in other cardiovascular patients is needed to increase generalizability.

18.

Purpose

The aim of this study was to explore the psychometric properties of the 22-item Social Participation Questionnaire (SPQ).

Methods

The SPQ was administered to 789 adult primary care patients with depressive symptoms. As the items were intended to be summed together to provide total score, Rasch analysis (partial credit model) was applied to assess the overall fit of the model, individual item fit, differential item functioning (DIF), targeting of persons, response dependency, unidimensionality and person separation.

Results

To improve the scale’s fit, it was necessary to re-score the response format. Two items demonstrated some DIF for gender and eight items showed DIF for age. To support the assumption of unidimensionality post hoc principal component analysis was performed. The analysis showed two subtests of the residuals with positive and negative loadings, but the person estimates derived from these two subtests were not statistically different to that derived from all items taken together. The response dependence between two items was identified; however, the magnitude of difficulty was very small. Although the questionnaire appeared to have insufficient items to assess the full spectrum of informal social contact, the SPQ was reasonably well targeted.

Conclusion

The SPQ is a promising questionnaire for the measurement of social participation although it could benefit from the inclusion of further items to measure informal social contact. This study found support for the internal validity, internal consistency reliability, and unidimensionality. A future study will investigate whether targeting can be improved when additional items are included.

19.

Purpose

It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF.

Method

Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and Mantel χ2 procedure) were applied to Hospital Anxiety Depression Scale (HADS) data pertaining to a sample of 1,068 patients consulting primary care practitioners.

Results

Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When applying strict criteria for significant DIF, ordinal logistic regression was slightly less sensitive.

Conclusions

Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.
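As a simplified illustration of the contingency-table approach to DIF, the sketch below computes the Mantel-Haenszel common odds ratio for a dichotomous item, stratifying respondents by a matching score. This is an assumption-laden simplification: the HADS items are ordinal and the study used a Mantel χ² procedure, whereas this sketch handles only 0/1 responses. All function names and data are hypothetical. The final helper applies the widely used ETS delta transformation, −2.35 ln(α), as one way of expressing the magnitude of the effect.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio across score strata for one dichotomous item.

    records -- iterable of (matching_score, group, response), where group is
               "ref" or "focal" and response is 0 or 1.
    """
    # Per stratum, store [count of 0 responses, count of 1 responses] for each group.
    tables = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for score, group, response in records:
        tables[score][group][response] += 1

    numerator = 0.0
    denominator = 0.0
    for cells in tables.values():
        ref_no, ref_yes = cells["ref"]
        focal_no, focal_yes = cells["focal"]
        n = ref_no + ref_yes + focal_no + focal_yes
        numerator += ref_yes * focal_no / n
        denominator += ref_no * focal_yes / n
    if denominator == 0.0:
        raise ValueError("odds ratio undefined: no discordant focal/reference cells")
    return numerator / denominator


def mh_delta(odds_ratio):
    """ETS delta-scale effect size; 0 means no DIF."""
    return -2.35 * math.log(odds_ratio)


def expand(score, group, endorsed, not_endorsed):
    """Helper that turns cell counts into individual records (illustration only)."""
    return [(score, group, 1)] * endorsed + [(score, group, 0)] * not_endorsed


# Hypothetical data: two score strata, reference group endorses the item more often.
example_records = (
    expand(1, "ref", 15, 5) + expand(1, "focal", 10, 10)
    + expand(2, "ref", 18, 2) + expand(2, "focal", 14, 6)
)
example_or = mantel_haenszel_odds_ratio(example_records)
```

An odds ratio near 1 (delta near 0) suggests the groups respond alike once matched. As the abstract notes, a flag should rest on significance, magnitude, and judgement together rather than on the point estimate alone.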

20.
Valderas J.M., Alonso J., Prieto L., Espallargues M., Castells X. Quality of Life Research, 2004, 13(1): 35-44
BACKGROUND: In spite of a well-established development of instruments, difficulty in interpreting health related quality of life scores may limit their use in clinical practice. OBJECTIVE: To develop generalizable interpretation aids for a measure of perceived functional visual status, the VF-14 index. DESIGN: Item Response Theory (Rasch analysis) was used to analyze the performance of VF-14 items. The 'ruler' aid was derived from the most difficult activity (item) a patient is able to do without difficulty; the 'clinical scenarios' aid first identified all significantly different clusters of items within the index and then estimated the mean expected difficulty (responses) to perform a benchmark item in each cluster. SETTING: The study was conducted in four hospitals and six ambulatory cataract surgery centers in Barcelona, Spain. PATIENTS: One hundred and ninety-eight patients scheduled for first eye cataract surgery. MEASUREMENTS: The self-reported VF-14 index and clinical measures were used. RESULTS: All VF-14 items were found unidimensional with three items showing only partial misfit. For a patient with a VF-14 Rasch score of 71, the 'ruler' aid indicated that 'doing fine handwork' would be the most demanding activity he/she would perform without difficulty. The 'clinical scenarios' aid estimated that such a patient would be unable to 'drive at night', would have some difficulty 'reading small print' and no difficulty 'doing fine handwork', 'watching TV' or 'recognizing people'. Concordance between modeled and observed responses was fair to substantial. CONCLUSIONS: Simple content-based interpretation aids for the VF-14 scores were developed that should facilitate its use in clinical practice. These aids should be easily generalizable to other quality of life instruments.
