This work was produced by Kendrick et al. under the terms of a commissioning contract issued by the Secretary of State for Health and Social Care. This is an Open Access publication distributed under the terms of the Creative Commons Attribution CC BY 4.0 licence, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. See: https://creativecommons.org/licenses/by/4.0/. For attribution the title, original author(s), the publication source – NIHR Journals Library, and the DOI of the publication must be cited.
NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Kendrick T, Dowrick C, Lewis G, et al. Patient-reported outcome measures for monitoring primary care patients with depression: the PROMDEP cluster RCT and economic evaluation. Southampton (UK): National Institute for Health and Care Research; 2024 Mar. (Health Technology Assessment, No. 28.17.)
Patient-reported outcome measures for monitoring primary care patients with depression: the PROMDEP cluster RCT and economic evaluation.
Introduction
The economic evaluation was a key component of the study and involved the following stages:
- Comparison of service use over the 6 months prior to baseline in the intervention and control arms to check for differences at baseline, using the questionnaire data.
- Measurement of service use over the 6-month trial period from baseline using data extracted from participating patients’ computerised general practice medical records after their 26-week follow-up assessments.
- Calculation of the costs of service use over the 6-month trial period, using the medical records data.
- Comparison of service use and costs in the intervention and control arms over the 6-month trial period, again using the medical records data.
Services recorded included those provided in the primary care setting (face-to-face GP and nurse consultations, GP and nurse telephone contacts, and GP and nurse e-mail or e-consult contacts), secondary care mental and physical health services (inpatient, outpatient, day patient, accident and emergency), community health services (e.g. health visitors, district nurses, counselling or psychological therapists) and social care services (e.g. social workers, housing workers). The questionnaire and medical records data extraction form were identical in structure and recorded whether or not patients had used specific services, how many contacts they had received and, where relevant, the average duration of service contact (i.e. across all contacts the individual made with each service). The names of medications were recorded along with the dose, frequency and duration of use.
The data extracted from patients’ computerised general practice medical records were used in the primary analysis. The patient questionnaire data collected at the 26-week follow-up will be compared with the medical records data to examine the differences between the two sources, but this was not a primary objective of the study.
The unit costs of health service use were derived from Unit Costs of Health and Social Care45 for primary and community care; the British National Formulary48 for costs of drug treatments; and the national NHS reference cost schedules49 for secondary care costs.
Outcome measures
The outcomes were expressed as incremental cost per 1-point improvement in the BDI-II clinical outcome (cost-effectiveness analysis), and incremental cost per QALY gained (cost–utility analysis) using the EQ-5D-5L to calculate patient utilities.39
Analytical methods
A generalised linear mixed model was used to estimate the mean differences in costs and QALYs, adjusting for baseline characteristics (including baseline BDI-II score, baseline anxiety and sociodemographic factors) and including practice as a random effect. Bootstrapping methods were employed to estimate the incremental cost per 1-point improvement in BDI-II score and per QALY gained, together with the associated 95% CIs. CEACs were produced from 1000 bootstrap samples drawn with replacement.
Self-reported resource use prior to baseline by arm
Table 8 shows a comparison between the intervention and control arms of self-reported NHS and social services resource use in the 6 months leading up to the baseline assessment according to the modified CSRI resource use self-report questionnaire. Intervention arm patients reported more face-to-face contacts with GPs, but fewer telephone, online and e-mail GP contacts, than control arm patients. One control arm patient reported prior contact with a psychiatrist, compared to none in the intervention arm. Fewer hospital outpatient visits, but slightly more inpatient stays, were reported in the intervention arm than in the control arm. Three control arm patients reported receiving home care visits at baseline, compared with one patient in the intervention arm.
Recorded resource use over 6 months’ follow-up by arm
Table 9 shows a comparison of the resource use between the arms for 258 (85.4%) intervention arm and 201 (88.5%) control arm patients, based on data extracted from the patients’ general practice medical records by practice staff after the end of the 26-week follow-up.
The proportion of patients in the intervention arm receiving any medications was 6% higher than in the control arm, a difference in keeping with the findings for antidepressant use reported in Chapter 3. More intervention arm patients had face-to-face contacts with GPs but fewer had telephone contacts, similar to the pattern found in the baseline self-report data in Table 8.
A slightly greater proportion of control arm patients had recorded hospital outpatient visits, inpatient stays and other hospital services. Two patients in the control arm received home care visits during the 26 weeks, with a mean number of 33 visits each in the 6 months, compared with none in the intervention arm.
Recorded contacts with mental health and social services
Overall, 90 out of 258 intervention arm patients (34.6%) and 68 out of 201 control arm patients (33.8%) had contacts with mental health and social services recorded in their medical records during the 26-week follow-up period (including contacts with community mental health nurse, counsellor, other therapist, psychologist, psychiatrist and social worker). The difference between the arms was not statistically significant (adjusted odds ratio 1.37, 95% CI 0.71 to 2.63; p = 0.342).
Costs
Costing the intervention
Modelling the likely cost of using the PHQ-9 as a PROM in routine clinical practice included making assumptions about the extra time needed for GPs/NPs to administer the initial questionnaire themselves (rather than the researcher doing it) in the non-trial situation, as well as for administering the follow-up PHQ-9. In addition to the administration time, the cost would include the time spent going over the results of the PHQ-9 with the patient and discussing the possible implications of the score for the management of their depression.
We assumed that 10 minutes’ extra GP time would be needed in both the initial and the follow-up consultations to administer the questionnaire and go over the results with the patient, effectively making each of these a double appointment, as GP appointments currently usually last 10 minutes. This means that a total of 20 minutes’ extra GP time would be needed per patient in the intervention arm.
We estimated that 5 minutes’ extra GP time would be needed to administer the questionnaire in practice. This was a conservative estimate, as Spitzer et al., the developers of the PHQ-9, observed that physicians took ≤ 3 minutes in 85% of cases.50 In addition, we assumed that 5 minutes’ extra time would be needed at each consultation to go over the results with the patient, discussing individual symptoms as well as the overall score and the implications for treatment.
Other health economic appraisals of the use of the PHQ-9 have estimated that similar amounts of time would be needed. A study51 modelling the likely cost-effectiveness of PHQ screening and collaborative care for depression in New York City assumed that 3 minutes of physician time would be needed to go over the findings of the PHQ-9 questionnaire, in addition to 6 minutes of nurse time spent administering it. Another study52 modelling the cost–utility of screening for depression in primary care again assumed nurse time of 6 minutes and physician time of only 1 minute to view the PHQ-9 results, but it did not include physician time to go over the results with the patient.
A share of the cost of training on the PHQ-9 (a maximum of 2 hours of GP time) was also included in the cost of the intervention, spread over 100 patients, as we assumed that the training would cover the assessment of at least that number of patients before it might need to be refreshed. This gave a total cost of approximately £33 per patient whose depression was monitored with the PHQ-9.
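As a rough check on the arithmetic above, the per-patient cost can be sketched as extra GP minutes multiplied by a GP unit cost per minute. The unit cost is not stated in this section, so `HYPOTHETICAL_GP_COST_PER_MINUTE` below is a placeholder rather than the figure used in the analysis, and the function name and parameters are illustrative only.

```python
# Sketch of the per-patient intervention cost: extra GP minutes in the
# monitored consultations plus a per-patient share of the training time.
# HYPOTHETICAL_GP_COST_PER_MINUTE is a placeholder, NOT the unit cost used
# in the analysis (which was taken from Unit Costs of Health and Social Care).

HYPOTHETICAL_GP_COST_PER_MINUTE = 1.50  # £ per minute, assumed for illustration


def intervention_cost_per_patient(
    gp_cost_per_minute: float,
    extra_minutes_per_consultation: float = 10.0,
    n_consultations: int = 2,
    training_minutes: float = 120.0,
    patients_per_training: int = 100,
) -> float:
    """Return the extra GP cost (£) per monitored patient."""
    consultation_minutes = extra_minutes_per_consultation * n_consultations
    # Training time is shared equally across the patients it is assumed to cover.
    training_minutes_per_patient = training_minutes / patients_per_training
    return (consultation_minutes + training_minutes_per_patient) * gp_cost_per_minute


# Base case: 2 x 10 extra minutes plus 120 / 100 = 1.2 training minutes per patient.
base_case = intervention_cost_per_patient(HYPOTHETICAL_GP_COST_PER_MINUTE)
# Sensitivity scenario: 2 x 5 extra minutes and no training cost.
sensitivity_case = intervention_cost_per_patient(
    HYPOTHETICAL_GP_COST_PER_MINUTE,
    extra_minutes_per_consultation=5.0,
    training_minutes=0.0,
)
print(f"Base case: £{base_case:.2f}; sensitivity scenario: £{sensitivity_case:.2f}")
```

The same function covers the sensitivity analysis described later in this section by changing only the minutes and training arguments; the actual unit cost would need to be substituted to reproduce the £33 figure.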
Costs over 6 months’ follow-up by arm
Table 10 shows a comparison of the estimated costs of resources used between the intervention and control arms over the 6 months’ follow-up.
The costs of GP care (including face to face, telephone, online or e-mail, and video calls) were slightly higher in the intervention arm. Hospital costs (including outpatient, inpatient and other hospital services) were higher in the control arm. The mean cost per patient of home care was particularly high, although only two patients in the control arm used the home care service.
The total mean per-patient cost of resources used over 6 months was £1124 (SD £1371) in the intervention arm, compared with £1292 (SD £1214) in the control arm, a relatively small and non-significant unadjusted mean saving of £168 per patient.
Utility scores
The EQ-5D-5L was used to measure quality of life at the baseline assessment and at the 12- and 26-week follow-up points. The EQ-5D-5L is the measure NICE favours in determining cost-effectiveness when developing its clinical guidelines. It has five dimensions (mobility, self-care, usual activities, pain/discomfort and anxiety/depression), each scored on five levels.
Health states are converted into a single summary index by applying weights to each level in each dimension derived from the valuation of EQ-5D-5L health states in adult general population samples.39 Crosswalk methods were applied to derive utility scores using the algorithm for the EQ-5D-5L.39
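In their simplest additive form, such scoring algorithms start from full health (1.0) and subtract a decrement for the level reported on each dimension. The decrements below are invented for illustration and are not the EQ-5D-5L value set or the crosswalk algorithm used in the study; published value sets can also include constant and interaction terms that this sketch omits.

```python
# Illustrative additive scoring of an EQ-5D-5L profile.
# All decrements are HYPOTHETICAL placeholders, not a published value set.

DIMENSIONS = (
    "mobility",
    "self_care",
    "usual_activities",
    "pain_discomfort",
    "anxiety_depression",
)

# Decrement for levels 1-5 (no, slight, moderate, severe, extreme problems).
HYPOTHETICAL_DECREMENTS = {
    "mobility": (0.0, 0.04, 0.08, 0.20, 0.25),
    "self_care": (0.0, 0.04, 0.08, 0.16, 0.20),
    "usual_activities": (0.0, 0.03, 0.07, 0.15, 0.18),
    "pain_discomfort": (0.0, 0.06, 0.09, 0.25, 0.30),
    "anxiety_depression": (0.0, 0.05, 0.10, 0.22, 0.28),
}


def utility_index(profile: dict[str, int]) -> float:
    """Return 1 minus the sum of the decrements for the reported levels."""
    score = 1.0
    for dim in DIMENSIONS:
        level = profile[dim]
        if not 1 <= level <= 5:
            raise ValueError(f"{dim} level must be between 1 and 5, got {level}")
        score -= HYPOTHETICAL_DECREMENTS[dim][level - 1]
    return score


# Slight problems with usual activities, moderate anxiety/depression, otherwise none.
example = {
    "mobility": 1,
    "self_care": 1,
    "usual_activities": 2,
    "pain_discomfort": 1,
    "anxiety_depression": 3,
}
print(round(utility_index(example), 3))  # prints 0.87
```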
Table 11 and Figure 7 show the utility scores at the baseline assessment and at each follow-up point. Quality of life improved in both arms between baseline and 12-week follow-up. It then improved further in the intervention arm but went down slightly in the control arm. The difference between arms was not statistically significant at 12 weeks (estimated difference in utility score –0.002; p = 0.94). However, the difference was statistically significant at the 26-week follow-up, favouring the intervention (estimated difference 0.053, 95% CI 0.013 to 0.093; p = 0.01). The analysis was adjusted for baseline EQ-5D-5L, history of depression, baseline anxiety, sociodemographics and practice as a random effect.
Changes in the five dimensions of the EuroQol-5 Dimensions, five-level
Table 12 shows the changes in the proportions of patient responses for the five dimensions of the EQ-5D-5L from baseline to the 12- and 26-week follow-up assessments. Each dimension has five levels from 1 to 5, representing no problems, slight problems, moderate problems, severe problems and extreme problems.
Patients in the two arms were similar at baseline in mobility, self-care, and pain and discomfort, and remained so throughout the 26 weeks’ follow-up. More patients in the intervention arm were at levels 4 and 5 (severe or extreme problems) for anxiety and depression at baseline. At 26-week follow-up there were slightly more patients at level 1 (the lowest level, indicating no problems) for the usual activities dimension in the intervention arm (55.2% vs. 48.1% in the control arm). The biggest difference between the arms at 26 weeks, however, was in the proportions reporting no problems in the anxiety and depression dimension (22.6% in the intervention arm vs. 13.5% in the control arm). The improvement in the anxiety and depression dimension therefore seems to have contributed most to the overall greater improvement in the mean score for quality of life on the EQ-5D-5L in the intervention arm.
Quality-adjusted life-years
Quality-adjusted life-years were calculated using the area under the curve approach. The baseline utility score was added to the score at 12 weeks and this total was divided by 2, based on the assumption of a linear change over the 12-week period. This figure was then multiplied by 0.25, as the 12-week period is approximately one-quarter of a year, so at most one-quarter of a QALY could be gained over it. The QALY gain in the 12- to 26-week period was calculated in a similar way, and the gain over the entire 26-week follow-up period was obtained by adding the two.
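The area under the curve calculation above amounts to the trapezium rule applied to each patient's utility scores. In the sketch below, the 0.25 weight for the 12- to 26-week interval is an assumption (chosen so that the two intervals sum to the half-year of follow-up); a weight of 14/52 would be an equally plausible reading of "calculated in a similar way". The function name and example utilities are illustrative.

```python
def qaly_auc(utilities: list[float], interval_years: list[float]) -> float:
    """QALYs as the area under the utility curve, assuming linear change
    between consecutive assessments (trapezium rule)."""
    if len(interval_years) != len(utilities) - 1:
        raise ValueError("need exactly one interval length per pair of assessments")
    total = 0.0
    for start, end, years in zip(utilities, utilities[1:], interval_years):
        # Mean of the two utility scores, multiplied by the interval length in years.
        total += (start + end) / 2 * years
    return total


# Hypothetical patient: utilities at baseline, 12 weeks and 26 weeks.
utilities = [0.60, 0.70, 0.80]
# 0-12 weeks weighted 0.25 (as in the text); 12-26 weeks weighted 0.25 (assumed).
print(round(qaly_auc(utilities, [0.25, 0.25]), 4))  # prints 0.35
```

Under these weights a patient in full health (utility 1.0) throughout would accrue 0.5 QALYs over the 26 weeks, which is the ceiling against which per-arm means can be read.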
Table 11 shows that the mean QALY gain between baseline and the 26-week follow-up was 0.346 (SD 0.104) for the intervention arm and 0.344 (SD 0.098) for the control arm. The difference was not statistically significant (estimated difference 0.008; p = 0.26). The analysis again was adjusted for baseline EQ-5D-5L, baseline anxiety, history of depression, sociodemographics and practice as a random effect.
Cost-effectiveness analysis
Data on the costs of services used were linked with the BDI-II scores to assess the possible cost-effectiveness of the intervention. In Chapter 3 we showed that the improvement in depressive symptoms at the 12-week follow-up in the intervention arm on the BDI-II (the primary outcome) was very similar to that in the control arm.
However, given that the costs of service resource use in the intervention arm were lower than the costs in the control arm for the same improvement in outcome, it was necessary to compute ICERs to assist decision-makers in assessing whether adding monitoring with the PHQ-9 to usual GP/NP care for depression represents value for money.
Cost data are frequently skewed, as they are here, which can violate the assumptions of standard significance tests. Bootstrap estimation (repeated resampling, with replacement, of patients' paired cost and outcome values within each treatment arm) was therefore used so that mean costs could still be compared without imposing prior assumptions about the distribution of the data.
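The resampling described above can be sketched as a non-parametric bootstrap that draws patients with replacement within each arm and keeps each patient's cost and outcome together. This sketch is unadjusted: the study's estimates came from a generalised linear mixed model with practice as a random effect, which is not reproduced here. The percentile interval is one common choice of bootstrap CI and is an assumption, as is the example data; the function names are hypothetical.

```python
import random
from statistics import fmean

Pair = tuple[float, float]  # (cost in £, outcome) for one patient


def bootstrap_increments(
    intervention: list[Pair], control: list[Pair], n_boot: int = 1000, seed: int = 2024
) -> list[Pair]:
    """Return n_boot (delta_cost, delta_outcome) pairs, intervention minus control.

    Patients are resampled with replacement within each arm; each patient's
    cost and outcome stay paired, preserving their correlation.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        boot_i = rng.choices(intervention, k=len(intervention))
        boot_c = rng.choices(control, k=len(control))
        delta_cost = fmean(c for c, _ in boot_i) - fmean(c for c, _ in boot_c)
        delta_outcome = fmean(o for _, o in boot_i) - fmean(o for _, o in boot_c)
        replicates.append((delta_cost, delta_outcome))
    return replicates


def percentile_ci(values: list[float], level: float = 0.95) -> tuple[float, float]:
    """Simple percentile bootstrap confidence interval."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[int((1 - level) / 2 * n)]
    upper = ordered[int((1 + level) / 2 * n) - 1]
    return lower, upper


# Hypothetical right-skewed costs and QALYs, for illustration only.
data_rng = random.Random(1)
intervention = [(data_rng.lognormvariate(6.5, 1.0), data_rng.gauss(0.35, 0.1)) for _ in range(250)]
control = [(data_rng.lognormvariate(6.6, 1.0), data_rng.gauss(0.34, 0.1)) for _ in range(200)]
reps = bootstrap_increments(intervention, control)
print("95% CI for cost difference:", percentile_ci([dc for dc, _ in reps]))
```

Resampling whole patients rather than costs and outcomes separately is what lets the same replicates feed a cost-effectiveness plane and an acceptability curve.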
Table 13 shows the costs in the two arms together with the BDI-II scores at 12 weeks and the incremental cost per point change in the BDI-II score. The mean costs were estimated using bootstrap methods with 1000 resamples, which is why they differ slightly from the raw data in Table 10.
The ICER of £129 is the adjusted mean saving per point improvement in the BDI-II in the intervention arm compared with usual care in the control arm (the negative value for the incremental change is in favour of the intervention). The 95% CI is again wide and includes zero.
Cost–utility analysis
Similarly, data on the costs of services used were linked with the values found for QALYs gained to assess the possible cost–utility of the intervention. Table 14 shows the costs in the two arms together with the QALYs gained over 26 weeks and the incremental cost per QALY gained. The mean costs were again estimated using bootstrap methods with 1000 resamples, and the QALYs were estimated based on imputed quality-of-life data.
The ICER of –£5216 is the mean saving per QALY gained in the intervention arm compared with usual care in the control arm. (Here, the positive value for the incremental change is in favour of the intervention.) Again, the 95% CI is very wide and includes zero (see Table 14).
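The opposite signs of the two ICERs follow from dividing the incremental cost by the raw incremental outcome, intervention minus control. Because a lower BDI-II score is better, a saving combined with a larger fall in BDI-II gives a positive ratio, whereas a saving combined with a QALY gain gives a negative one. The minimal sketch below uses hypothetical increments to show this; it is not the study's code.

```python
def icer(delta_cost: float, delta_outcome: float) -> float:
    """Incremental cost-effectiveness ratio: (cost_I - cost_C) / (outcome_I - outcome_C)."""
    if delta_outcome == 0:
        raise ValueError("ICER is undefined when the outcomes do not differ")
    return delta_cost / delta_outcome


# Hypothetical increments, intervention minus control.
saving = -150.0     # £: intervention cheaper
bdi_change = -1.2   # BDI-II points: lower is better, so this favours the intervention
qaly_change = 0.03  # QALYs: higher is better, so this also favours the intervention

print(icer(saving, bdi_change))   # positive: saving per point of BDI-II improvement
print(icer(saving, qaly_change))  # negative: saving per QALY gained
```

A negative ICER is therefore ambiguous on its own: the same sign arises when the intervention costs more and yields fewer QALYs, which is one reason to examine the bootstrap replicates on the cost-effectiveness plane as well.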
Cost-effectiveness plane
The above calculation is based on the mean costs and differences in QALYs gained and therefore does not take into account uncertainty around these estimates. To address such uncertainty, a cost-effectiveness plane was produced to show the probability that the intervention arm would have higher or lower costs and better or worse outcomes than usual care in the control arm. Figure 8 shows the scatterplot of the comparison of intervention and control arms from the bootstrapped analyses using 1000 resamples of pairs of values, including the 95% confidence ellipse.
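One way to read such a plane is to summarise the bootstrap replicates by quadrant. The sketch below counts the share of (incremental cost, incremental QALY) pairs in each quadrant; it assumes the replicates are already available (for example from resampling of the kind described earlier) and does not draw the 95% confidence ellipse. The quadrant labels, the treatment of points on an axis and the function name are illustrative choices.

```python
def plane_quadrants(replicates: list[tuple[float, float]]) -> dict[str, float]:
    """Share of bootstrap (delta_cost, delta_qaly) pairs in each quadrant.

    Pairs lying exactly on an axis are counted as south (no extra cost)
    or west (no QALY gain).
    """
    counts = {"NE": 0, "NW": 0, "SE": 0, "SW": 0}
    for delta_cost, delta_qaly in replicates:
        north = delta_cost > 0  # intervention costs more than control
        east = delta_qaly > 0   # intervention yields more QALYs than control
        key = ("N" if north else "S") + ("E" if east else "W")
        counts[key] += 1
    n = len(replicates)
    return {key: count / n for key, count in counts.items()}


# NE: more costly, more effective (trade-off)
# SE: cheaper and more effective (intervention dominates)
# NW: more costly and less effective (intervention dominated)
# SW: cheaper but less effective (trade-off)
example = [(-120.0, 0.01), (-80.0, -0.005), (40.0, 0.02), (-200.0, 0.015)]
print(plane_quadrants(example))  # prints {'NE': 0.25, 'NW': 0.0, 'SE': 0.5, 'SW': 0.25}
```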
Cost-effectiveness acceptability curve
To further explore the uncertainty around the cost–utility estimate for the intervention, a CEAC was produced to model the likelihood of the intervention being cost-effective at varying values of societal willingness to pay placed on a QALY gained, compared with usual care in the control arm.
Figure 9 shows the cost-effectiveness acceptability curve for the intervention based on the QALY gains found over 6 months. At the lower and upper thresholds of societal willingness to pay that NICE uses to judge the relative cost-effectiveness of interventions (£20,000 and £30,000 per QALY gained), the probability that the intervention would be cost-effective compared with usual care in the control arm was 77% and 72%, respectively.
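A CEAC is commonly built from the same bootstrap replicates using net monetary benefit: at a willingness-to-pay value λ, each replicate's incremental net benefit is λ·ΔQALY − Δcost, and the curve plots the share of replicates in which that is positive. This section does not state the exact method used, so the sketch below shows this standard approach as an assumption, with hypothetical replicates and a hypothetical function name.

```python
def ceac(replicates: list[tuple[float, float]], thresholds: list[float]) -> dict[float, float]:
    """Probability that the intervention is cost-effective at each willingness-to-pay value.

    replicates: bootstrap (delta_cost, delta_qaly) pairs, intervention minus control.
    thresholds: willingness to pay per QALY gained (£).
    """
    n = len(replicates)
    curve = {}
    for wtp in thresholds:
        # The intervention is preferred when its incremental net monetary benefit is positive.
        positive = sum(
            1 for delta_cost, delta_qaly in replicates if wtp * delta_qaly - delta_cost > 0
        )
        curve[wtp] = positive / n
    return curve


# Hypothetical replicates, for illustration only.
example = [(-120.0, 0.01), (-80.0, -0.005), (40.0, 0.02), (-200.0, 0.015), (-60.0, -0.004)]
for wtp, prob in ceac(example, [0, 20_000, 30_000]).items():
    print(f"£{wtp:,}: {prob:.0%}")
```

At a threshold of zero the curve equals the share of replicates in which the intervention saves money, and as the threshold rises it approaches the share in which the intervention gains QALYs; a curve that declines between the two thresholds is therefore consistent with the cost saving being more certain than the QALY gain.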
Sensitivity analysis
We assumed above that the total time taken to administer the PHQ-9 and discuss the results with the patient in routine practice outside the trial situation would be approximately 20 minutes (an extra 10 minutes during each of the first and second consultations). We also included the training cost of up to 2 hours of GP time, spread over 100 patients. The influence of these assumptions on cost-effectiveness was tested through a sensitivity analysis assuming that only 5 minutes of extra time would be needed at each of the initial and follow-up consultations if the patient had already completed the questionnaire prior to each consultation, and assuming no extra cost attached to the intervention for the training, which in practice could take place during routine GP vocational training.
The ICER computed using this revised costing of the intervention was a mean saving of £6243 per QALY gained over 6 months (95% CI –£122,625 to £100,543), compared with £5216 (95% CI –£109,336 to £95,761) per QALY in the base case. The CEAC for the sensitivity analysis gave probabilities of the intervention being cost-effective at the £20,000 and £30,000 per QALY thresholds of 79% and 74%, respectively. The probability of cost-effectiveness was therefore not particularly sensitive to the estimated time spent by the practitioner in going over the results of the PHQ-9 with the patient in the two consultations.