Similar Articles
1.
Establishing measurement equivalence is important because inaccurate assessment may lead to incorrect estimates of effects in research and to suboptimal decisions at the individual, clinical level. Examination of differential item functioning (DIF) is one method for studying measurement equivalence. An item (i.e., one question in a longer scale) exhibits DIF if the item response differs across groups (e.g., gender, race) after controlling for an estimate of the construct being measured. One distinction between applications in health and those in other settings, such as educational and aptitude testing, is that health research involves many constructs with multiple measures of each, few of which have received much critical evaluation. This article discusses several methods for DIF detection, including non-parametric methods and parametric methods such as logistic regression and those based on item response theory. Basic definitions and criteria for DIF detection are provided, as are steps for performing the analyses. Recommendations are presented and future directions discussed.
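Logistic regression and IRT approaches need a model-fitting routine, but the core idea of DIF (compare groups while holding the measured construct constant) can be sketched with the Mantel-Haenszel procedure, a widely used non-parametric DIF statistic. The abstract does not name it, so treat it as an illustration rather than one of the article's methods. The sketch assumes a dichotomous item, respondents stratified by a matching score such as the total scale score, and groups labeled `"ref"` and `"focal"`; the function name is hypothetical. The -2.35 ln(alpha) rescaling is the ETS delta metric (MH D-DIF).

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(responses):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    responses: iterable of (stratum, group, correct) tuples, where stratum is
    the matching score (e.g. total scale score), group is "ref" or "focal",
    and correct is 1 if the item was endorsed/answered correctly, else 0.
    Returns (alpha_mh, mh_d_dif).
    """
    # Per-stratum 2x2 table [A, B, C, D] =
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for stratum, group, correct in responses:
        cell = tables[stratum]
        if group == "ref":
            cell[0 if correct else 1] += 1
        elif group == "focal":
            cell[2 if correct else 3] += 1
        else:
            raise ValueError(f"unknown group label: {group!r}")

    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if numerator == 0 or denominator == 0:
        raise ValueError("common odds ratio undefined for these data")

    alpha_mh = numerator / denominator      # 1.0 means no uniform DIF
    mh_d_dif = -2.35 * math.log(alpha_mh)   # ETS delta scale
    return alpha_mh, mh_d_dif
```

A common odds ratio near 1 (MH D-DIF near 0) suggests no uniform DIF. In practice the statistic is paired with a significance test and effect-size bands such as the ETS A/B/C categories. Mantel-Haenszel detects only uniform DIF; non-uniform DIF (a group-by-construct interaction) is one reason to move to logistic regression or IRT.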

2.
Quantitative research depends on measures that collect valid data (i.e., data that reflect the phenomena of interest well) and that perform equivalently across contexts. Demonstrating validity and cross-context equivalence requires specifically designed studies, but many such studies have problems that limit their usefulness. This article explains validity and cross-context equivalence of measures (and important related concepts) and clarifies how to establish them. Validation is the process of determining whether a measure or indicator is suitable for providing useful analytical measurement for a given purpose and context. Cross-context equivalence means that a measure performs comparably across contexts. Four types of equivalence are construct, item, measurement, and scalar equivalence. Establishing validity and cross-context equivalence requires representing the errors of a measure (i.e., imprecision, undependability, and inaccuracy) mathematically and using appropriate statistical methods to quantify them. Studies aiming to provide evidence about the validity of a measure need to clarify the purpose and context for its use. Such studies should choose one of the two conceptual systems for validation; obtain data to establish the extent to which the measure is well constructed, reliable, and accurate; and use analytic methods beyond simple correlations as a basis for reasoned judgment about whether the measure provides useful analytic measurement for the particular purpose(s) and context. Establishing the accuracy of a measure requires other measures known to be accurate as comparators; if no measure understood to be more accurate is available, the study can establish agreement but not validity.
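The distinction between agreement and validity, and the warning against relying on simple correlations, can be made concrete with Bland-Altman limits of agreement. This is one common way to quantify agreement between two measures. The abstract does not name a specific method, so this is an illustrative choice. The function name and the default z = 1.96 (approximate 95% limits) are assumptions.

```python
import statistics


def limits_of_agreement(measure_a, measure_b, z=1.96):
    """Bland-Altman bias and limits of agreement between two measures.

    measure_a, measure_b: paired readings of the same quantity on the same
    subjects. Returns (bias, lower_limit, upper_limit).
    """
    if len(measure_a) != len(measure_b) or len(measure_a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    bias = statistics.mean(diffs)      # systematic difference between measures
    spread = statistics.stdev(diffs)   # sample SD of the differences
    return bias, bias - z * spread, bias + z * spread
```

A measure that always reads exactly twice the comparator correlates perfectly with it (r = 1), yet the differences between the two are large and vary from subject to subject. This is why correlation alone cannot establish agreement.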

3.
The focus on the translation and validation of measurement instruments has left a gap in the discussion of how to construct multilingual qualitative tools, such as interviews. Traditional methods of forward and backward translation have been criticized for weak conceptual equivalence, a crucial issue when interviews are conducted in multiple languages. Through a creative-arts metaphor of weaving, the authors describe an alternative process of multicentric translation used to develop an interview guide designed to explore the impact of transition on palliative care patients in six European countries. Four identified core constructs illuminate this multicentric process: Cohesion, Congruence, Clarity, and Courtesy. Reciprocity between researcher and translator offers greater possibility for constructing nuance and meaning, particularly where cultural parameters influence the collection and meaning of sensitive data from vulnerable populations. The translator therefore becomes a collaborator in the research process, which strengthens the rigor of language-based inquiry.

4.

Purpose

To develop and psychometrically evaluate the brief Public Health Surveillance Well-Being Scale (PHS-WB) that captures mental, physical, and social components of well-being.

Methods

Using data from 5,399 HealthStyles survey respondents, we conducted bi-factor, item response theory, and differential item functioning analyses to examine the psychometric properties of a pool of 34 well-being items. Based on the statistical results and content considerations, we developed a brief 10-item well-being scale and assessed its construct validity through comparisons of demographic subgroups and correlations with measures of related constructs.

Results

Based on the bi-factor analyses, the items grouped into both an overall factor and individual domain-specific factors. The PHS-WB scale demonstrated good internal consistency (alpha = 0.87) and correlated highly with scores for the entire item pool (r = 0.94). The well-being scale scores differed as expected across demographic groups and correlated with global and domain-specific measures of similar constructs, supporting its construct validity.

Conclusion

The 10-item PHS-WB scale demonstrates good psychometric properties, and its high correlation with the item pool suggests minimal loss of information with the use of fewer items. The brief PHS-WB allows for well-being assessment on national surveys or in other situations where a longer form may not be feasible.
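The PHS-WB's internal consistency figure (alpha = 0.87) is presumably Cronbach's alpha, although the abstract says only "alpha". The following minimal sketch shows what that coefficient summarizes. It assumes complete item-level data with one row per respondent and uses sample variances throughout (the n - 1 factor cancels in the ratio). The function name is hypothetical.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha from a respondents x items score matrix (no missing data)."""
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need a rectangular matrix with at least two items")
    # zip(*rows) transposes the matrix into one column per item
    item_variances = [statistics.variance(col) for col in zip(*rows)]
    total_variance = statistics.variance([sum(r) for r in rows])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Alpha rises when items covary strongly relative to their individual variances. It reaches 1 when all items are identical, which is also why a very high alpha can signal redundant items in a scale meant to be brief.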

5.
Accurate measurement requires assessment of measurement equivalence/invariance (ME/I) to demonstrate that tests or measurements perform equally well and measure the same underlying constructs across groups and over time. Structural equation modeling was used to assess the measurement properties (stability and responsiveness) of intervention measures used in a study of metabolic syndrome (MetS) treatment in primary care offices. The primary study (N = 293; mean age = 59 years) had achieved 19% reversal of MetS overall, yet neither diet quality nor aerobic capacity was correlated with declines in cardiovascular disease risk. Factor analytic methods were used to develop measurement models, and factorial invariance was tested across three time points (baseline, 3 months, 12 months), sex (male/female), and diabetes status for the Canadian Healthy Eating Index (2005 HEI-C) and for several fitness measures combined (percentile VO2 max from submaximal exercise, treadmill speed, curl-ups, push-ups). The model fit for the original HEI-C was poor, which could account for the lack of associations in the primary study. A reduced HEI-C and a 4-item fitness model demonstrated excellent model fit and measurement equivalence across time, sex, and diabetes status. Greater use of factor analytic methods increases measurement precision, controls error, and improves the ability to link interventions to expected clinical outcomes.
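The abstract judges the HEI-C and fitness models by "model fit" without naming the indices used. RMSEA and CFI are two indices commonly reported for CFA models. Both can be computed from the model chi-square, its degrees of freedom, and the sample size; CFI also needs the chi-square of the independence (baseline) model. This is an illustrative sketch with hypothetical function names. The study may have used other indices or cutoffs, and some programs use N rather than N - 1 in the RMSEA denominator.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation for a fitted CFA/SEM model."""
    if df < 1 or n < 2:
        raise ValueError("need df >= 1 and n >= 2")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to the independence (baseline) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_baseline - df_baseline, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base
```

Hu and Bentler's widely cited guidelines treat RMSEA close to .06 or below and CFI close to .95 or above as indicating good fit. These are rules of thumb rather than hard thresholds.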

6.
OBJECTIVE: To examine the relative impact of four service quality dimensions on outpatient satisfaction and to test whether the structural relationships between these dimensions and satisfaction are invariant across three patient groups that differ in the number of prior outpatient visits to the same hospital. DATA SOURCES/STUDY SETTING: Survey of 557 outpatients using a self-administered questionnaire over a 10-day period at a general hospital in Sungnam, South Korea. DATA COLLECTION: Patients answered questions related to two main constructs, patient satisfaction and health care service quality. The health care service quality measures (30 items) were developed from the results of three focus group interviews and the SERVQUAL scale, while satisfaction (3 items) was measured using a previously validated scale. STUDY DESIGN: Confirmatory factor analysis was used to assess the construct validity of the service quality scale by testing convergent and divergent validity. A structural equation model specifying the four service quality dimensions as exogenous variables and patient satisfaction as an endogenous variable was estimated to assess the relative impact of each dimension on satisfaction. This was followed by a multigroup LISREL analysis that tested the invariance of structural coefficients across three groups with different frequencies of outpatient visits to the hospital. PRINCIPAL FINDINGS: The findings support a causal relationship between service quality and satisfaction in the context of the South Korean health care environment. The four service quality dimensions showed varying patterns of impact on patient satisfaction across the three outpatient groups.
CONCLUSION: Hospital management needs to be aware of the relative importance of each service quality dimension in shaping outpatient satisfaction, which varies across hospital utilization groups, and to use this knowledge in strategic planning.
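The study tested equality of structural coefficients with a multigroup LISREL model, which compares the fit of constrained and unconstrained models. A lighter approximation sometimes used for a single path is a z-test on the difference between two groups' estimates. It is shown here as a swapped-in illustration rather than the authors' method. The sketch assumes independent groups, unstandardized coefficients on the same scale, and known standard errors. The function name is hypothetical.

```python
import math


def compare_paths(b1, se1, b2, se2):
    """Approximate z-test that one path coefficient is equal in two independent groups.

    Returns (z, two_sided_p).
    """
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return z, p
```

With three visit-frequency groups, this would mean three pairwise tests per path, so a multiplicity correction would be needed. The simultaneous multigroup model avoids that problem.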

7.
High-quality care for diabetes is based on proper prevention, coordination of care among a multidisciplinary team of health care professionals, enhanced patient-provider relationships, and patient self-management skills. This paper discusses gender differences across racial and ethnic groups in the quality of care for type 2 diabetes according to 10 measures defined by the National Healthcare Quality Report and the National Healthcare Disparities Report. These measures include 5 process measures and 1 composite measure derived from the Medical Expenditure Panel Survey and 4 outcome measures derived from the Healthcare Cost and Utilization Project. National rates for 2 process measures, HbA1c measurement (women 89.70% versus men 90.10%) and lipid profile (women 92.9% versus men 95.3%), are high, but only 28.9% of women and 33.9% of men with diabetes received all 5 recommended process measures (HbA1c, lipid profile, eye exam, foot exam, and influenza immunization). Rates of retinal and foot exams and influenza immunization should be improved for all groups, but the need is particularly urgent for Hispanics and non-Hispanic blacks. Women and men have similar rates of hospital admission for uncontrolled diabetes, but lower extremity amputation rates were higher for men, particularly among non-Hispanic blacks and Hispanics. Avoidable hospitalizations for diabetes decreased as income increased across racial/ethnic groups, but other factors (e.g., quality of primary care, age, relationship with providers, patients' self-management skills) may also influence these rates. Moreover, improvements in diabetes outcome measures may lag many years behind measurable improvements in quality of care. Well-designed interventions that reallocate resources for diabetes self-care should be developed to ensure that gender differences are addressed across racial/ethnic groups.
Because much of this care involves the management of risk factors, self-management education should be tailored to the lifestyles and beliefs specific to gender and racial/ethnic groups.
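The gap between high single-measure rates (about 90% for HbA1c testing) and the 28.9% and 33.9% of patients who received all five process measures follows from how an all-or-none composite is built: a patient counts only if every recommended service was delivered. The following minimal sketch assumes a simple all-or-none definition. The patient records, measure labels, and function name are hypothetical, not taken from the report.

```python
def care_rates(patients, measures):
    """Per-measure delivery rates and the all-or-none composite rate.

    patients: list of sets, each holding the labels of measures a patient received.
    measures: labels of the recommended measures.
    """
    if not patients:
        raise ValueError("no patients")
    n = len(patients)
    required = set(measures)
    per_measure = {m: sum(m in received for received in patients) / n for m in measures}
    # A patient counts toward the composite only if every measure was received.
    all_or_none = sum(required <= received for received in patients) / n
    return per_measure, all_or_none


# Hypothetical labels for the five process measures
MEASURES = ["hba1c", "lipid", "eye_exam", "foot_exam", "flu_shot"]

# Four hypothetical patients: each single measure reaches at least 75%,
# but only the first patient received all five (composite = 25%).
patients = [
    set(MEASURES),
    set(MEASURES) - {"eye_exam"},
    set(MEASURES) - {"foot_exam"},
    set(MEASURES) - {"flu_shot"},
]
rates, composite = care_rates(patients, MEASURES)
```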

8.
Standardizing the measurement tools that researchers use to assess the effectiveness of interventions would strengthen our ability to compare results across studies. In practice, however, standardization is difficult to implement, partly because researchers prefer measurement tools that focus specifically on the components of their interventions. This paper demonstrates the usefulness of item response modeling linking methodology for comparing groups of participants who were administered different scales intended to measure the same underlying construct. The Treatment Self-Regulation Questionnaire (TSRQ), as applied to diet improvement, provided the empirical application demonstrating how two different scales that measure the same construct can be compared. The results showed that two eight-item TSRQ scales can be linked if they have at least four items in common. As expected, varying the number of linking items did not affect the reliability of the results; however, it significantly affected the relative rating with respect to the 15-item scale. In health behavior and health education research, linking methodologies can be used to compare results across studies that use slightly different versions of a scale to measure the same construct.
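The abstract does not specify which linking method was used. Mean/sigma linking on the shared (anchor) items is one of the simplest IRT options and shows why common items matter: the linking constants are estimated only from items that appear on both forms. The sketch assumes item difficulties estimated separately on each form on a logit-type scale. The function names are hypothetical. Under a 2PL model, discriminations would also be transformed, as a / A.

```python
import statistics


def mean_sigma_link(b_source, b_target):
    """Mean/sigma linking constants from common-item difficulty estimates.

    b_source, b_target: difficulties of the same anchor items, estimated
    separately on each form and listed in the same order. Returns (A, B) such
    that a source-scale value x maps to the target scale as A * x + B.
    """
    if len(b_source) != len(b_target) or len(b_source) < 2:
        raise ValueError("need at least two paired anchor-item estimates")
    A = statistics.stdev(b_target) / statistics.stdev(b_source)
    B = statistics.mean(b_target) - A * statistics.mean(b_source)
    return A, B


def to_target_scale(theta, A, B):
    """Place a source-form ability (or difficulty) estimate on the target scale."""
    return A * theta + B
```

With only a few anchor items, A and B rest on a handful of estimates, so the linking error for any single item carries more weight. This is consistent with the abstract's finding that the number of linking items affected the relative rating.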

9.
Measurement invariance is a prerequisite for comparing measurement scores from different groups. In medical education, multi-source feedback (MSF) is used to assess core competencies, including professionalism. However, little attention has been paid to the measurement invariance of assessment instruments, that is, whether an instrument holds the same meaning across different rater groups. This study examined the measurement invariance of the National Taiwan University professionalism MSF (NTU P-MSF) to determine whether medical students' self-ratings can be compared with their peers' ratings. An eight-factor model was specified for confirmatory factor analysis to examine the construct validity of the NTU P-MSF. Cronbach's alpha was computed for the items of each domain to evaluate internal consistency reliability. The same eight-factor model was used for multi-group confirmatory factor analyses. Four hierarchical models were specified to test configural (i.e., identical factor–item relationships), metric (i.e., identical factor loadings), scalar (i.e., identical intercepts), and error variance invariance across the self-rating and peer-rating groups. One hundred and twenty second-year medical students from weekly discussion groups held as part of a medical professionalism course agreed to use the NTU P-MSF to assess themselves or their discussion group peers. NTU P-MSF assessment scores fit the eight-factor model well in both the self and peer groups. Cronbach's alpha coefficients ranged from 0.76 to 0.89 for students' self-ratings and from 0.84 to 0.91 for peers' ratings, indicating good internal consistency reliability in both groups. In addition, the same factor structure and similar factor loadings and intercepts across the two groups indicate that NTU P-MSF scores had configural, metric, and scalar invariance.
Thus, students' self-assessments and peer assessments can be compared in terms of NTU P-MSF scores, changes in those scores, and their factor scores. This study demonstrates how to investigate the measurement invariance of a professionalism MSF and contributes to the discussion on self- and peer assessment in medical education.
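The four hierarchical models (configural, metric, scalar, error variance) are typically compared in order. Each model is compared with the last one that held, and a step is rejected when the added constraints significantly worsen fit. The configural model's own fit is judged separately. The abstract does not report the comparison criterion. The sketch below assumes the classic chi-square difference test under maximum likelihood. Robust estimators such as MLR require a scaled difference instead, and many analysts also inspect the change in CFI. `chi2_sf` uses standard closed-form series valid for integer degrees of freedom, so no SciPy is needed. The function names and the fit values in the test are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("degrees of freedom must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2 * normal upper tail at sqrt(x) plus a finite series
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))
    two_phi = 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, series = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # chi^(2r-1) / (1*3*...*(2r-1))
        series += term
    return tail + two_phi * series


def highest_invariance_level(fits, alpha=0.05):
    """Walk nested invariance models from least to most constrained.

    fits: list of (level_name, chi2, df), e.g. configural, metric, scalar,
    error variance. Each model is compared with the last accepted one by a
    chi-square difference test; the walk stops at the first rejection.
    Returns the name of the most constrained level that was not rejected.
    """
    accepted = fits[0]
    for name, chi2, df in fits[1:]:
        d_chi2 = chi2 - accepted[1]
        d_df = df - accepted[2]
        if chi2_sf(d_chi2, d_df) < alpha:
            break
        accepted = (name, chi2, df)
    return accepted[0]
```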

10.
Measurement Issues in Health Disparities Research (cited 3 times: 0 self-citations, 3 by others)
Background. Racial and ethnic disparities in health and health care have been documented, and eliminating such disparities is currently part of a national agenda. Meeting this national objective requires measures that accurately identify the true prevalence of the construct of interest across diverse groups. Measurement error can bias results, for example estimates of prevalence, magnitude of risk, and differences in mean scores. Addressing measurement issues in the assessment of health status may contribute to a better understanding of health issues in cross-cultural research.
Objective. To provide a brief overview of issues regarding measurement in diverse populations.
Findings. Approaches used to assess the magnitude and nature of bias in measures applied to diverse groups include qualitative analyses, classic psychometric studies, and more modern psychometric methods. These approaches should be applied sequentially and/or iteratively during the development of measures.
Conclusions. Investigators performing comparative studies face the challenge of addressing measurement equivalence, which is crucial for obtaining accurate results in cross-cultural comparisons.

11.

Purpose

Patient-reported outcome (PRO) measures originally developed for paper administration are increasingly being administered electronically in clinical trials and other health research studies. Three published meta-analyses of measurement equivalence among paper and electronic modes aggregated findings across hundreds of PROs, but there has not been a similar meta-analysis that addresses a single PRO, partly because there are not enough published measurement equivalence studies using the same PRO. Because the SF-36(R) Health Survey (SF-36) is a widely used PRO, the aim of this study was to conduct a meta-analysis of measurement equivalence studies of this survey.

Methods

A literature search of several medical databases used search terms for variations of "SF-36" or "SF-12" and "equivalence" in the title or abstract of English-language publications. The eight scale scores and two summary measures of the SF-36 and SF-12 were transformed to norm-based scores (NBS) using developer guidelines. A margin of ±2 NBS points was set as the equivalence threshold. Analyses were performed with Comprehensive Meta-Analysis software.

Results

Twenty-five studies were included in the meta-analysis. Mean differences across domains and summary scores ranged from 0.01 to 0.39 NBS points, well within the equivalence threshold, and estimates of agreement ranged from 0.76 to 0.91. Moderator analyses showed that time between administrations, survey language, and type of electronic device did not influence equivalence.

Conclusions

The results of the meta-analysis support equivalence of paper-based and electronic versions of the SF-36 and SF-12 across a variety of disease populations, countries, and electronic modes.
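For the scale scores, norm-based scoring is a linear T-score transform (mean 50, SD 10 in the normative sample). The ±2-point margin can then be checked against a confidence interval for the paper-versus-electronic difference. The summary measures are built differently, from weighted combinations of scale z-scores. The abstract does not specify the interval used, so the 95% interval here is an assumption; a two one-sided tests procedure at alpha = 0.05 would use a 90% interval (z of about 1.645). The developer's published normative means and SDs are not given here, so `norm_mean` and `norm_sd` are placeholders, and the function names are hypothetical.

```python
def norm_based_score(raw, norm_mean, norm_sd):
    """Linear T-score transform: 50 + 10 * z relative to a normative sample."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


def within_margin(mean_diff, se, margin=2.0, z=1.96):
    """True if the CI of the paper-minus-electronic difference lies
    entirely inside (-margin, +margin) NBS points."""
    lower, upper = mean_diff - z * se, mean_diff + z * se
    return -margin < lower and upper < margin
```

Putting every scale on the same 50/10 metric is what lets a single ±2-point margin apply across all eight scales and both summary measures.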

12.
A four-step, streamlined process to adapt a large battery of measures for a study of mother–child adjustment in Arab Muslim immigrants and the lessons learned are described. The streamlined process includes adapting content, translation, pilot testing, and extensive psychometric evaluation but omits in-depth qualitative inquiry to identify the full content domain of the constructs of interest and cognitive interviews to assess how respondents interpret items. Lessons learned suggest that the streamlined process is not sufficient for certain measures, particularly when there is little published information about how the measure performs with different groups, the measure requires substantial item revision to achieve content equivalence, and the measure is both challenging to translate and has little to no redundancy. When these conditions are present, condition-specific procedures need to be added to the streamlined process.

13.
The aim of this study was to contribute to knowledge of the construct validity of the Griffiths Scales of Mental Development (Griffiths Scales) by examining the underlying dimensions tapped by the six subscales, using common factor analysis. A sample of 430 South African children from four ethnic groups (i.e., White, Mixed Race, Asian, and Black) participated. The correlation coefficients obtained for the South African groups were compared with those Griffiths obtained in her work with the British standardization sample of the Griffiths Scales. The pattern of correlations was similar for the South African and British samples, suggesting that the Scales measure a construct that is consistent across cultures and over time. A factor analysis was performed separately for each South African ethnic group, and the factor solutions were compared to determine whether the Griffiths Scales measure similar or different constructs in these groups. The results indicate that the Griffiths Scales tend to measure one factor and that, when only common variables are included, this factor appears to be similar across cultures.
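The abstract does not say how the separate factor solutions were compared across ethnic groups. One standard index for this is Tucker's congruence coefficient, computed between the loadings of the same subscales on the single factor in two groups. It is offered here as an illustration rather than the authors' method. The function name is hypothetical, and the loading vectors are assumed to list the same subscales in the same order.

```python
import math


def tucker_congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two factor-loading vectors."""
    if len(loadings_a) != len(loadings_b):
        raise ValueError("loading vectors must have the same length")
    num = sum(a * b for a, b in zip(loadings_a, loadings_b))
    den = math.sqrt(sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b))
    return num / den
```

Because the coefficient is insensitive to a uniform rescaling of loadings, it compares the pattern of a factor rather than its size. Values around .95 or higher are often read as indicating essentially the same factor.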

14.
The objective of this study was to develop an improved weight-related eating questionnaire (WREQ) that reflects recent advances in the assessment and understanding of theory-based eating behaviors. A sequential process of measurement development was used to construct this brief but comprehensive questionnaire. Factor analysis and structural equation modeling showed that a 16-item, four-factor structure best fit the data. The new questionnaire measures two constructs of dietary restraint (routine and compensatory restraint), susceptibility to external cues (external eating), and emotional eating. The WREQ demonstrated good preliminary construct validity against similar psychometric measures, BMI, and measures of fruit and vegetable and dietary fat intake. Further validation analyses, particularly against other energy intake measures, are justified. The successful development and preliminary validation of this updated questionnaire support the WREQ as an improvement over existing eating behavior measures and warrant its use in future research.

15.
There is a lack of Brazilian questionnaires for assessing physical activity in children. The Physical Activity Checklist Interview (PACI) was originally developed for North American children and assesses physical activity during the previous day. The objectives of this study were (i) to describe the procedures for choosing the PACI for cross-cultural adaptation and (ii) to assess the conceptual, item, and semantic equivalence of the Brazilian version for use with 7- to 10-year-old children. The PACI was identified from a systematic review of 18 questionnaires, and the process of choosing the instrument involved discussions with researchers. The PACI allows assessment of the construct and its dimensions. Some kinds of physical activity that are uncommon in the Brazilian population had to be eliminated. The following steps were taken to evaluate semantic equivalence: translation, back-translation, assessment of connotative and referential meaning, and a pretest with 24 children aged 7 to 10 years. We present the PACI in its adapted Brazilian version, called Lista de Atividades Físicas (LAF).

16.
The aim of this study is to describe how hermeneutic photography, and in particular one application of it, the photo-instrument, can be used as a health care intervention that fosters (re)construction of the meaning of mental illness experiences. Studies of how patients construct meaning in illness narratives indicate that aesthetic expressions of experience may play an important role in making and sharing meaning. The study is part of a larger research project devoted to understanding the photostories produced by groups of psychiatric patients using the photo-instrument. Within a focused ethnography approach, we employed a qualitative single-case study design. Text analysis of photostories was combined with observational data, and data were analyzed using hermeneutic theory. Participant observations were used for triangulation and complementarity. The interaction and collaboration between health care professionals and patients in the context of a photo group emerged as the core concept underlying the photo-instrument. This interaction triggered a reframing of meaning in the patient's illness narrative that offered new perspectives on positive identity growth. Visualizing meaning in images was found to lend a dynamic power to the process and triggered a dialectic between real-life circumstances and imagination, played out in the context of situated action. The findings suggest that the photo-instrument facilitates a positive reframing of meaning in illness narratives.

17.
The purpose of this paper is to describe the testing of a new scale to assess the perceived attributes of a federal drug prevention policy. The 17-item scale was administered to 107 Safe and Drug Free Schools (SDFS) coordinators in 12 states as part of a larger investigation of the diffusion of a federal drug prevention policy. In developing this scale, the authors drew on theory, previously validated measures, expert review, and pretesting with SDFS coordinators. Factor analysis revealed three underlying constructs: relative advantage/compatibility, complexity, and observability. The constructs were internally consistent, with Cronbach's alpha ranging from 0.89 for relative advantage/compatibility to 0.71 for observability. Each construct was correlated with a district's adoption of the policy in predictable ways, and relative advantage/compatibility appears especially useful in assessing policy adoption. This scale was developed to assess a specific innovation; however, we believe that it can easily be adapted to understand the adoption of other health education interventions.

18.
Dijkers MP. Journal of Allied Health. 2003;32(1):38-43; discussion 43-5.
Psychometrics is the name commonly used for the principles and methods of developing valid and reliable measures of intelligence, attitudes, skills, and other characteristics. One focus of psychometrics is the homogeneity of the items selected to measure the (unidimensional) latent construct of interest. Clinical scientists often use operationalizations of constructs that incorporate multiple dimensions, which may be quantified using only a single indicator. The difference between the two approaches is significant enough that Feinstein proposed a new science, clinimetrics. Homogeneity of items is of limited importance in clinimetrics, and construct indicators may be "causal" rather than "effectual." In measuring environments of individuals, the clinimetric approach seems more appropriate than the psychometric one. An article by Mackenzie et al. (J Allied Health 2002; 31:222-228) is used to show how adhering to psychometric models may suggest analytical procedures that are misleading. Some principles of the clinimetric method are set forth.

19.
This study investigated whether two instruments devised for people with mental illness, the Satisfaction with Daily Occupations (SDO) instrument and the Manchester Short Assessment of Quality of Life (MANSA), showed appropriate psychometric properties in terms of internal consistency, convergent/divergent validity, and discriminant validity when used with other samples. The study group comprised two female samples, one with a physical disability (scleroderma) and one reference sample without known illness. It was hypothesized that the SDO would show low or moderate associations with both general life satisfaction and self-rated health. The results confirmed that the associations were of equal size in both samples, although the relationship with general life satisfaction in the scleroderma sample was somewhat higher than expected. For the MANSA, the hypotheses were that the quality of life index would correlate highly with general life satisfaction and moderately with self-rated health. These hypotheses were confirmed for the reference sample, indicating that quality of life as measured by the MANSA converged with general life satisfaction but mainly diverged from self-rated health. In the scleroderma sample, the association with health was higher than expected. Both instruments appeared to reflect constructs that were stable across the two groups, and both measures could distinguish the disability group from the healthy group. The SDO showed good internal consistency in the scleroderma sample but somewhat low internal consistency in the reference group, while the quality of life aspect of the MANSA showed good internal consistency in both samples. The instruments showed promising properties, indicating that they could be used for the target groups; however, both measures need further psychometric testing.

20.
This paper outlines some of the common problems encountered by researchers conducting ethnic or cultural comparisons. The problems are considered in relation to three linked questions, illustrated with a comparison of Irish-American and mainland Puerto Rican drinking behavior. With regard to the first question, whom to compare in such research, attention is drawn to the importance of selecting groups on conceptual grounds rather than on the basis of convenience or availability. The distinction between model-driven and meaning-driven choices is then highlighted. Problems associated with group designation, inclusion criteria, and confounding are also discussed in response to this first question. With respect to the second question, what to compare in such research, the discussion focuses on model-driven measures, generalization-driven measures, and the issue of acculturation. The final question, how to ensure measure comparability, is addressed with respect to measure equivalence, the problem of cross-cultural meaning and significance, and the use of back-translation methods to ensure linguistic equivalence.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.), Beijing ICP registration No. 09084417