NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Ustekinumab (Stelara) [Internet]. Ottawa (ON): Canadian Agency for Drugs and Technologies in Health; 2017 Apr.

APPENDIX 5: VALIDITY OF OUTCOME MEASURES

Aim

To summarize the measurement properties (e.g., reliability, validity, minimal clinically important difference [MCID]) of the following outcome measures used in the studies included in this submission:

  • Crohn’s Disease Activity Index (CDAI)
  • Inflammatory Bowel Disease Questionnaire (IBDQ)
  • Short Form (36) Health Survey (SF-36)
  • Work Limitations Questionnaire (WLQ)

Findings

Crohn’s Disease Activity Index

The National Cooperative Crohn’s Disease Study Group developed the CDAI using prospective data gathered from 187 visits of 112 patients suffering from Crohn’s disease (CD).57 It is a disease-specific index and considered the standard for assessing CD activity. The CDAI consists of eight domains that are used to evaluate overall disease severity. The overall score is based on the sum of the weighted value of each item and ranges from 0 to 600; a score of 150 is defined as the threshold between remission and active disease. Scores ranging between 150 and 219 indicate mild-to-moderate CD and scores ranging between 220 and 450 indicate moderate-to-severe CD, whereas scores above 450 indicate very severe CD.58,59 Item scores are derived using patient diaries, which are based on the seven days preceding each visit. Generally, the CDAI is considered impractical for use in clinical practice, as it has no MCID clearly defined.59,60 Originally, changes of 50 points in the CDAI were associated with physician evaluation of “slightly better” and/or “slightly worse” compared with baseline.57,59,60 However, clinical trials have commonly used changes of 50, 60, 70, or 100 points in CDAI to define clinical response.59 More recently, the FDA and the European Medicines Agency (EMA) have suggested that a change of 100 points in CDAI is considered to be a more meaningful response (i.e., enhanced clinical response).59
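The severity bands and response definitions described above can be sketched in code. This is a minimal illustration, not part of the CDAI itself: the function names are hypothetical, scores below 150 are treated as remission, and a non-integer score between 219 and 220 is assigned to the mild-to-moderate band as an assumption.

```python
def classify_cdai(score: float) -> str:
    """Map a total CDAI score to the severity bands quoted in the text."""
    if score < 150:
        return "remission"
    if score < 220:  # assumption: fractional scores between 219 and 220 fall here
        return "mild-to-moderate"
    if score <= 450:
        return "moderate-to-severe"
    return "very severe"


def has_clinical_response(baseline: float, follow_up: float,
                          threshold: float = 100) -> bool:
    """Return True when the CDAI fell by at least `threshold` points.

    The default of 100 points reflects the change the text attributes to the
    FDA and EMA; trials have also used 50, 60, or 70 points.
    """
    return (baseline - follow_up) >= threshold


if __name__ == "__main__":
    print(classify_cdai(300))                            # moderate-to-severe
    print(has_clinical_response(300, 190))               # drop of 110 points
    print(has_clinical_response(300, 240, threshold=70)) # drop of 60 points
```

Keeping the response threshold as a parameter makes it easy to compare the different definitions of clinical response used across trials.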

Development of the CDAI

Gastroenterologists considered 18 parameters to inform the CDAI, including the following CD domains: subjective patient symptoms and need for symptomatic medications; objective clinical findings on physical examination; extra-intestinal manifestations of CD; complications of CD (e.g., fistulas); radiologic and endoscopic examinations; and laboratory parameters. A global assessment score was also assessed at each visit by the gastroenterologist based on the following scheme: “very well” = 1, “fair to good” = 3, “poor” = 5, “very poor” = 7.

Multiple regression and backwards stepwise deletions were utilized to assess the correlation between the 18 parameters and the physician global assessment score. Based on the results of the correlations, eight independent weighted (weighting ranges from one to 30) variables were included in the final CDAI formula.

Table 36. Final Items Included in the CDAI and Their Weights

Item (Daily Sum Per Week) | Weight
Number of liquid or very soft stools | 2
Abdominal pain score in one week (rating: 0 to 3) | 5
General well-being (rating: 0 to 4) | 7
Sum of findings per week (see list below) | 20
Antidiarrheal use (e.g., diphenoxylate hydrochloride) | 30
Abdominal mass (none = 0, equivocal = 2, present = 5) | 10
47 minus hematocrit (males) or 42 minus hematocrit (females) | 6
100 × (1 − [body weight divided by standard weight]) | 1

Findings counted in the "sum of findings per week" item:
  • Arthritis/arthralgia
  • Mucocutaneous lesions (e.g., erythema nodosum, aphthous ulcers)
  • Iritis/uveitis
  • Anal disease (fissure, fistula, etc.)
  • External fistula (enterocutaneous, vesical, vaginal, etc.)
  • Fever > 37.8°C

Source: Best et al.57
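As a worked illustration of how the weighted items combine, the sketch below sums the eight items using the weights in Table 36. The class and field names are hypothetical, and two coding choices are assumptions not stated in the table: antidiarrheal use is coded as 0 or 1, and hematocrit is entered in percent.

```python
from dataclasses import dataclass

# Weights taken from Table 36.
WEIGHTS = {
    "liquid_stools": 2,
    "abdominal_pain": 5,
    "general_wellbeing": 7,
    "findings": 20,
    "antidiarrheal_use": 30,
    "abdominal_mass": 10,
    "hematocrit_deficit": 6,
    "weight_deficit": 1,
}


@dataclass
class CdaiInputs:
    liquid_stools_week_sum: float      # daily counts summed over 7 days
    abdominal_pain_week_sum: float     # daily ratings (0 to 3) summed over 7 days
    general_wellbeing_week_sum: float  # daily ratings (0 to 4) summed over 7 days
    findings_count: int                # number of listed findings present
    antidiarrheal_use: bool            # assumption: coded 1 if used, else 0
    abdominal_mass: int                # 0 = none, 2 = equivocal, 5 = present
    hematocrit: float                  # assumption: expressed in percent
    is_male: bool
    body_weight: float                 # same unit as standard_weight
    standard_weight: float


def cdai_total(x: CdaiInputs) -> float:
    """Sum the eight weighted CDAI items."""
    if x.abdominal_mass not in (0, 2, 5):
        raise ValueError("abdominal_mass must be 0, 2, or 5")
    hematocrit_reference = 47 if x.is_male else 42
    items = {
        "liquid_stools": x.liquid_stools_week_sum,
        "abdominal_pain": x.abdominal_pain_week_sum,
        "general_wellbeing": x.general_wellbeing_week_sum,
        "findings": x.findings_count,
        "antidiarrheal_use": 1 if x.antidiarrheal_use else 0,
        "abdominal_mass": x.abdominal_mass,
        "hematocrit_deficit": hematocrit_reference - x.hematocrit,
        "weight_deficit": 100 * (1 - x.body_weight / x.standard_weight),
    }
    return sum(WEIGHTS[name] * value for name, value in items.items())


if __name__ == "__main__":
    example = CdaiInputs(21, 7, 7, 1, True, 0, 40.0, True, 63.0, 70.0)
    # 42 + 35 + 49 + 20 + 30 + 0 + 42 + 10 = 228 (up to floating-point rounding)
    print(round(cdai_total(example), 1))
```

The example patient values are invented solely to show the arithmetic.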

Reliability of the CDAI

Reliability was not originally assessed during the development of the CDAI; however, the index did provide good to very good test–retest reliability, evaluated based on two successive visits for 32 patients.57,58 The CDAI was subsequently re-evaluated and re-derived using data collected from 1,058 patients. This demonstrated little difference from the original formulation; therefore, the original version was recommended.61

Validity of the CDAI

Content validity: The items included in the CDAI were selected by gastroenterologists and are based on accepted features of CD, therefore demonstrating content validity.58

Construct validity: The CDAI appears to be able to distinguish between differing levels of CD severity.

The CDAI appears to be widely used in clinical trials, and is a measure accepted by gastroenterologists as a primary end point to assess CD activity. In contrast, the CDAI does not appear to be reflective of CD activity for pediatric patients suffering from CD, nor does the instrument address all aspects of CD such as quality of life.58

Criterion validity: Selecting a gold standard measure for comparison is difficult when considering CD because of the heterogeneous nature of its manifestations. Generally, the CDAI does not demonstrate any significant correlation between the overall score and objective measurements such as mucosal healing; however, the lack of correlation may not be indicative of a lack of criterion validity, given the multi-faceted nature of CD.58 Predictive validity is another component of criterion validity. One study demonstrated that the CDAI scores increased two months preceding exacerbations of CD and decreased one month following exacerbations of CD, therefore demonstrating criterion validity.58

The CDAI score appears to vary depending on the observer, even if the observers are evaluating the same case histories.62

Limitations of the CDAI

The overall CDAI score is derived from some subjective items such as “general well-being” and “intensity of abdominal pain” based on patient perception.

Inflammatory Bowel Disease Questionnaire

The IBDQ, developed by Guyatt et al.,32,33 is a physician-administered questionnaire to assess health-related quality of life in patients with inflammatory bowel disease (e.g., ulcerative colitis and Crohn’s disease).63 It is a 32-item Likert-based questionnaire, divided into four dimensions (i.e., bowel symptoms [10 items], systemic symptoms [5 items], emotional function [12 items], and social function [5 items]). Patients are asked to recall symptoms and quality of life from the last two weeks, with response graded on a seven-point Likert scale (1 being the worst situation, 7 being the best) with the total IBDQ score ranging between 32 and 224 (i.e., higher scores representing better quality of life). Scores of patients in remission typically range from 170 to 190.

This questionnaire has been validated in a variety of settings, countries, and languages.63 A review of nine validation studies on the IBDQ in patients with inflammatory bowel disease reported that the IBDQ was able to differentiate clinically important differences between patients with disease remission and those with disease relapse. In a randomized placebo-controlled trial of patients with ulcerative colitis, the IBDQ was found to be able to discriminate changes in the social and emotional state of patients.64 The IBDQ has high test–retest reliability in all four dimensional scores (intraclass correlation coefficient = 0.96 for CD). Six studies evaluated IBDQ for sensitivity to change and all found that changes in health-related quality of life (HRQoL) were correlated with changes in clinical activity in patients with CD.63

A study conducted by Gregor et al.34 noted that a clinically meaningful improvement in quality of life would be an increase of 16 points or more in the IBDQ total score or 0.5 points or more per question in patients with CD.
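The IBDQ scoring and the 16-point threshold can be illustrated with a short sketch. The function and variable names are hypothetical; the item counts per dimension, the 1-to-7 response range, and the 16-point threshold come from the text above. Note that 16 points across 32 items corresponds to the 0.5 points per question also quoted.

```python
from typing import Mapping, Sequence

# Number of items per dimension, as described in the text (32 in total).
IBDQ_DIMENSIONS = {
    "bowel symptoms": 10,
    "systemic symptoms": 5,
    "emotional function": 12,
    "social function": 5,
}
IBDQ_MCID_TOTAL = 16  # threshold attributed to Gregor et al. in the text


def ibdq_total(responses: Mapping[str, Sequence[int]]) -> int:
    """Sum the 32 Likert responses (1 = worst, 7 = best) into a total score."""
    total = 0
    for dimension, n_items in IBDQ_DIMENSIONS.items():
        answers = responses[dimension]
        if len(answers) != n_items:
            raise ValueError(f"{dimension}: expected {n_items} responses")
        if any(a not in range(1, 8) for a in answers):
            raise ValueError(f"{dimension}: responses must be integers from 1 to 7")
        total += sum(answers)
    return total


def is_meaningful_improvement(baseline_total: int, follow_up_total: int) -> bool:
    """True when the total score rose by at least 16 points."""
    return (follow_up_total - baseline_total) >= IBDQ_MCID_TOTAL


if __name__ == "__main__":
    best = {name: [7] * n for name, n in IBDQ_DIMENSIONS.items()}
    worst = {name: [1] * n for name, n in IBDQ_DIMENSIONS.items()}
    print(ibdq_total(worst), ibdq_total(best))  # bounds of the scale: 32 and 224
    print(is_meaningful_improvement(150, 166))
```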

Short Form (36) Health Survey (SF-36)

SF-36 is a generic health assessment questionnaire that has been used in clinical trials to study the impact of chronic disease on HRQoL. SF-36 consists of eight domains: physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional, and mental health. SF-36 also provides two component summaries: the physical component summary (PCS) and the mental component summary (MCS), which are created by aggregating the eight domains. The SF-36 PCS, MCS, and eight domains are each measured on a scale of 0 to 100, with an increase in score indicating improvement in health status. In general use of SF-36, a change of two to four points in each domain or two to three points in each component summary indicates a clinically meaningful improvement, as determined by the patient.36

Validation of the survey indicates satisfactory reliability and discriminant ability for all SF-36 dimensions in patients with ulcerative colitis.65 As symptoms increased, HRQoL scores statistically significantly reduced. In a population-based cohort in which patients were studied for 10 years, SF-36 scores of patients with ulcerative colitis were found to be comparable to those of a general population sample when adjusted for age, gender, and education.65 The study indicated that the individual domains may present with ceiling effects in patients with less severe ulcerative colitis. Individual domain scores were also found to have less responsiveness in patients with mild ulcerative colitis, although it is unclear whether this can be generalized to the broader PCS and MCS scores.65

A study by Coteur et al.35 explored MCID estimates within the CD patient population using data from multinational, multi-centre, double-blind, placebo-controlled parallel-group clinical trials in which clinical remission of CD was assessed using the CDAI measure as the primary outcome. Secondary outcomes included the IBDQ and SF-36. All end points were measured at weeks 0, 6, 16, and 26 using standardized procedures. A total of six estimates of MCID (two analyses utilizing anchor-based methods and four analyses utilizing distribution-based methods) were evaluated for each SF-36 summary score, and the two candidate anchors were compared to determine the most appropriate one. For the anchor-based estimates, a linear regression was performed using the two anchors, the CDAI and IBDQ. The MCID estimates for the SF-36 were then extracted from the regression equations considering a change of 16 points for the IBDQ total score or a score change of 50 points for the CDAI score as meaningful. The distribution-based estimates rely on the statistical distributions of HRQoL data and include effect size (ES) measures (ES of 0.2 and 0.5 were used and suggested as small-to-moderate ESs), the standard error of measurement, and the standard error of the difference. Overall, the MCID ranged from 1.6 to 7.0 for the SF-36 PCS and from 2.3 to 8.7 for the MCS summary, depending on the approach. Because score changes in the SF-36 showed greater correlations with score changes in the IBDQ than with the CDAI, the IBDQ was selected as the “best anchor,” with corresponding MCID values of 4.1 (PCS) and 3.9 (MCS). The values derived by the IBDQ anchor-based method were similar to the values obtained by the distribution-based methods and were representative of small-to-moderate ESs. However, because of a number of methodological issues with this analysis, the general SF-36 MCIDs were used in this review to assess clinical significance for the SF-36.
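The two families of MCID estimates can be sketched as follows. This is an illustration using common textbook definitions, which are assumptions here: the source does not give the exact formulas used by Coteur et al. The sketch takes the effect-size estimates as multiples of the baseline standard deviation, the standard error of measurement as SD × √(1 − reliability), and the standard error of the difference as √2 × SEM. The anchor-based estimate is the fitted SF-36 change at the anchor's meaningful change (e.g., 16 IBDQ points). All function names and input values are hypothetical.

```python
import math
from typing import Dict, Sequence


def distribution_based_estimates(baseline_sd: float,
                                 reliability: float) -> Dict[str, float]:
    """Four distribution-based MCID estimates (textbook definitions, assumed)."""
    sem = baseline_sd * math.sqrt(1.0 - reliability)
    return {
        "es_0.2": 0.2 * baseline_sd,  # small effect size
        "es_0.5": 0.5 * baseline_sd,  # moderate effect size
        "sem": sem,                   # standard error of measurement
        "sed": math.sqrt(2.0) * sem,  # standard error of the difference
    }


def anchor_based_estimate(anchor_changes: Sequence[float],
                          sf36_changes: Sequence[float],
                          anchor_mcid: float) -> float:
    """Fit SF-36 change on anchor change by least squares and read off the
    predicted SF-36 change at the anchor's meaningful change."""
    n = len(anchor_changes)
    if n != len(sf36_changes) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(anchor_changes) / n
    mean_y = sum(sf36_changes) / n
    sxx = sum((x - mean_x) ** 2 for x in anchor_changes)
    if sxx == 0:
        raise ValueError("anchor changes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(anchor_changes, sf36_changes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * anchor_mcid


if __name__ == "__main__":
    # Invented inputs, for illustration only.
    print(distribution_based_estimates(baseline_sd=10.0, reliability=0.84))
    print(anchor_based_estimate([0, 10, 20, 30], [1.0, 3.5, 6.0, 8.5], 16))
```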

Work Limitations Questionnaire (WLQ)

The WLQ is a self-reported tool used to assess and measure the on-the-job impact of chronic conditions and diseases and the treatment associated with them.39 It was developed as a generic (non–disease-specific) instrument.37–39 To develop the WLQ, focus groups were convened (for content identification), cognitive interviewing was performed (to enhance the reliability and validity of the questionnaire), and alternative forms were assessed (to assess the reliability of three different forms) in employed patients (greater than 20 hours/week) between the ages of 18 and 64 who had one of the following chronic conditions/diseases: asthma, Crohn’s disease, liver disease, psychiatric disorders, and epilepsy.39 Two studies were then performed: one to assess the scale and recall, and the other to assess scale reliability, construct validity, and relative validity.39 The WLQ consists of four domains with 25 items: time management or scheduling demands (5 items), output demands (5 items), physical demands (6 items), and mental-interpersonal demands (9 items).37–39,66 Three of the domains examine the proportion of time over the previous two weeks that the patient had difficulties; the physical demands domain has reverse instructions and assesses the proportion of time without difficulty.37–39,66 Scale responses and their corresponding scores are the same for all four domains: all of the time = 1 (100%); most of the time = 2; some of the time = 3 (about 50%); a slight bit of the time = 4; none of the time = 5; or does not apply to my job = 0.38 The scores are converted from the computed mean of the non-missing responses to values ranging from 0 (not limited) to 100 (limited all of the time); however, the response orientation is reversed for the physical demands domain.37,38

Although no studies identified the validity and reliability of the WLQ solely in patients with CD, these aspects have been assessed and verified in numerous indications, such as among cancer survivors40 and in patients with rheumatoid arthritis, osteoarthritis, and other musculoskeletal conditions.37,66,67 The WLQ has been deemed effective in assessing the responsiveness regarding work productivity changes in patients with either rheumatoid arthritis or osteoarthritis; however, Beaton et al.66 did observe a lower-than-expected correlation in productivity-oriented constructs in this patient population. In addition, in the study by Walker et al.67 the authors observed that, while the WLQ is reliable in assessing work productivity, it was not as strong as the Health Assessment Questionnaire and SF-36 in detecting functional limitations in patients with rheumatoid arthritis (in part because these patients tend to select jobs that they can perform).

Tamminga et al.40 observed that the minimal important change in improvement in their cohort of patients with cancer was 3.2 (based on the mean change method) and 4.0 (based on the receiver operating characteristic curve method); however, this was observed only at the group level and not at the individual level.

Conclusion

The CDAI, IBDQ, and SF-36 have all been assessed within the CD population, whereas the WLQ has not. Although a minimal clinically important change in the CDAI, IBDQ, and SF-36 instruments has not been defined, some regulatory agencies rely on a reduction of 100 points in the CDAI as meaningful change, while other studies suggest an MCID of 16, 4.1, and 3.9 for the IBDQ, SF-36 PCS, and SF-36 MCS, respectively. WLQ MCIDs are lacking for patients with CD; however, there are some MCIDs (Table 37) associated with studies in cancer survivors (Tamminga et al.40): 3.2 using the mean change method and 4.0 using the receiver operating characteristic curve method.

Table 37. Summary of Outcomes Measures.

Copyright © CADTH 2017.

Except where otherwise noted, this work is distributed under the terms of a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence (CC BY-NC-ND), a copy of which is available at http://creativecommons.org/licenses/by-nc-nd/4.0/

Bookshelf ID: NBK476193
