Cost-effectiveness analysis

Raashid Luqmani; Ellen Lee; Surjeet Singh; Mike Gillett; Wolfgang A Schmidt; Mike Bradburn; Bhaskar Dasgupta; Andreas P Diamantopoulos; Wulf Forrester-Barker; William Hamilton; Shauna Masters; Brendan McDonald; Eugene McNally; Colin Pease; Jennifer Piper; John Salmon; Allan Wailoo; Konrad Wolfe; Andrew Hutchings

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Luqmani R, Lee E, Singh S, et al. The Role of Ultrasound Compared to Biopsy of Temporal Arteries in the Diagnosis and Treatment of Giant Cell Arteritis (TABUL): a diagnostic accuracy and cost-effectiveness study. Southampton (UK): NIHR Journals Library; 2016 Nov. (Health Technology Assessment, No. 20.90.)

Chapter 7 Cost-effectiveness analysis

Introduction

The economic evaluation of the two tests needs to consider any differences in the diagnostic accuracy between them, as well as the costs and impact of the tests in terms of the development of GCA-related complications, treatments and related side effects.

The starting point for the modelling is the statistical output showing the sensitivity and specificity of the two individual tests and any diagnostic strategies which incorporate them (see Chapter 5). Sensitivity is the proportion of patients with true GCA who are detected by the test or strategy; the remaining proportion is made up of ‘false negatives’, that is, patients who test negative despite having GCA. Specificity is the proportion of patients without GCA who are classified as negative by the test or strategy; the remaining proportion is made up of ‘false positives’, that is, patients who test positive but who do not have GCA. A problem with false-negative and false-positive results is that patients falling into these categories may initially be managed in a different way, with potentially adverse consequences, compared with how they would have been managed had their true disease status been known earlier. The economic analysis estimates the relative cost-effectiveness of the alternative tests and strategies by quantifying and trading off the following.

The different costs of the tests or strategies.
The different proportions of false negatives and false positives.
The cost and health-related quality-of-life impact of a false negative, that is, when a patient remains undetected with GCA for up to around 2 months, with the attendant risk of developing complications such as vision loss.
The cost and health-related quality-of-life impact of a false positive, that is, initiating or continuing treatment with high-dose steroids in a patient without GCA for many months and the impact that any unnecessary treatment has on the risk of AEs such as fractures, diabetes mellitus and weight gain.
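
The definitions of sensitivity and specificity given above can be illustrated with a minimal Python sketch. The counts are hypothetical and are not TABUL results; the function names are chosen here for illustration only.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of patients with GCA who are detected by the test."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of patients without GCA who are classified as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (not TABUL data).
tp, fn, tn, fp = 80, 20, 90, 10

sens = sensitivity(tp, fn)  # the remaining 1 - sens are false negatives
spec = specificity(tn, fp)  # the remaining 1 - spec are false positives
```

With these illustrative counts, one-fifth of patients with GCA would be missed and one-tenth of patients without GCA would be treated unnecessarily.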

The primary objective of the economic analysis is to estimate the cost-effectiveness of ultrasound instead of biopsy for the diagnosis of GCA. The secondary objective is to estimate the cost-effectiveness of performing a biopsy following ultrasound as an alternative to TAB alone in the diagnosis of GCA. In addition, alternative diagnostic strategies have been evaluated using estimates of sensitivity and specificity from statistical modelling (as described in Chapter 5).

Biopsy and ultrasound are also evaluated when used in conjunction with clinical judgement, that is, the clinician’s decision on the diagnosis at 2 weeks based on knowledge of the patient’s symptoms, signs and available test results such as blood tests and the biopsy. This more closely reflects current clinical practice of using biopsy results to aid the clinical diagnosis rather than to define the diagnosis.

Methods

In this section, the model structure is described, followed by details of the evidence sources used for the various parameter values in the model. These include the performance of the diagnostic testing strategies; risks of GCA-related complications and glucocorticoid-related AEs; and associated costs and health-related quality-of-life effects. The costs of the tests and medications are also covered.

The development of the economic model structure was informed by evidence from published research on GCA in order to understand the main complications of the disease and steroid-related side effects. This was supplemented with evidence from previous economic and decision-analytic studies of GCA⁷⁸,⁷⁹ and an analysis of outcomes and cost-effectiveness of a fast-track service for GCA.⁸⁰

Model structure

The model structure takes the form of a combination of three submodels: first, a decision tree for the initial diagnostic testing; second, a risk submodel of the incidence of GCA-related complications and steroid-related AEs over 2 or 3 years; and third, a submodel of the lifetime effects of these incident complications and AEs. The model structure is shown in Figure 17.

FIGURE 17

Economic evaluation model structure.

Approach to obtaining values for parameters used in the economic model

We carried out a search for review articles in GCA and key evidence sources such as guidelines on managing GCA, prescribing steroids and steroid-related complications. We also consulted the National Institute for Health and Care Excellence (NICE) Clinical Knowledge Summary⁸¹ for GCA. It became clear from an initial assessment of these sources that there was limited evidence on rates of complications in GCA. Visual complications were most commonly reported but there was much heterogeneity of reported outcomes and results were rarely for time periods relevant to our analysis. Furthermore, given the relatively similar test performance of biopsy and ultrasound (especially when used in conjunction with clinical judgement) and the low incidence of major comorbidity, it seemed likely that complication rates would not be a major driver of cost-effectiveness.

We used an iterative approach to the cost-effectiveness modelling. Further review of this evidence was not required once it became apparent that the results were unlikely to be sensitive to model parameters relating to complications of GCA and steroids and that the cost difference between biopsy and ultrasound was the major driver of the cost-effectiveness. Instead, our modelling focused on two aspects of test performance that would be more important than had been previously realised: the need to focus on the implications of using the test results in conjunction with clinical judgement and uncertainty around the reference diagnosis for GCA.

The main sources of evidence for the model are summarised in Table 57.

TABLE 57

Main sources of evidence for the model

The specific evidence sources are provided in the detailed sections that follow.

Test accuracy was derived from an analysis of data collected in the TABUL study. For other parameters, such as the risk of complications from GCA (which were relatively infrequent in TABUL), evidence was obtained from alternative sources. The precise sources of data are described in greater detail in the following sections.

Performance of diagnostic strategies

The economic analysis considered three types of diagnostic strategy, as summarised in Table 58. One type relies on the use of test results alone for the diagnosis of GCA. Such strategies may be as simple as testing biopsy positive or biopsy negative, or they may involve combinations or components of tests. The second type of strategy involves the combination of test results with clinical judgement (the clinician’s assessment based on the patient’s characteristics and available test results) after the clinician has assessed the patient at the 2-week visit. The third type, sequential diagnostic strategies, involves applying test results in combination with characteristics of patients.

TABLE 58

Types of diagnostic strategy

The sequential diagnostic strategies include those based around the three categories of pre-test risk defined in Chapter 2 and reported in Chapter 5. The high-risk group comprised patients with tongue or jaw claudication and a high ESR or CRP level at presentation or before starting steroids. A high ESR level was defined as at least 60 mm/hour. A high CRP level was defined as at least 40 mg/l. The low-risk group comprised patients with no evidence of claudication and no evidence of a high ESR or CRP level at presentation or before starting steroids. The medium-risk group comprised the remaining patients.
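
The three pre-test risk categories can be expressed as a simple rule. The sketch below is one reading of the definitions above and rests on two assumptions that the text does not state explicitly: that 'high risk' means claudication together with either a high ESR or a high CRP level, and that a missing ESR or CRP value is treated as not high. The function name is hypothetical.

```python
from typing import Optional

HIGH_ESR_MM_PER_HOUR = 60.0  # threshold stated in the text
HIGH_CRP_MG_PER_L = 40.0     # threshold stated in the text


def pretest_risk_group(claudication: bool,
                       esr: Optional[float],
                       crp: Optional[float]) -> str:
    """Return 'high', 'medium' or 'low' pre-test risk.

    Assumption: a missing marker (None) is treated as not high.
    """
    high_marker = (
        (esr is not None and esr >= HIGH_ESR_MM_PER_HOUR)
        or (crp is not None and crp >= HIGH_CRP_MG_PER_L)
    )
    if claudication and high_marker:
        return "high"
    if not claudication and not high_marker:
        return "low"
    return "medium"  # all remaining patients
```

For example, a patient with jaw claudication but normal inflammatory markers would fall into the medium-risk group under this reading.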

Central to the cost-effectiveness of the alternative test strategies are the impacts of missing some true cases of GCA (the ‘false negatives’) and incorrectly categorising some patients without GCA as having the disease (the ‘false positives’) and, therefore, receiving unnecessary treatment. These are measured by the sensitivity and specificity of the test strategies. A strategy with high sensitivity will have few false-negative cases and a strategy with high specificity will have few false-positive cases, but, invariably, the threshold chosen will act positively on one at the expense of the other.

The performance (sensitivity and specificity) of the different test strategies was, in most cases, obtained from the data analysed from the TABUL study and reported in Chapter 5. The data used to determine whether a strategy indicated a positive or negative diagnosis of GCA for each patient were obtained from the test results for biopsy and ultrasound, the clinical data collected at the baseline and 2-week assessments, and the clinician’s assessment of the GCA diagnosis at the 2-week assessment. The performance of the different test strategies was evaluated against the reference diagnosis, as reported in Chapter 4. The only exception was for test strategies involving a combination of ultrasound and clinical judgement.

The sensitivity and specificity of the set of diagnostic strategies within the economic evaluation are shown in Table 59. We included strategies specified in the protocol objectives and additional ones with the best performance from those analysed within Chapter 5.

TABLE 59

Sensitivity and specificity of alternative diagnostic strategies

Performance of ultrasound plus clinical judgement strategy

For this strategy, an additional source of diagnostic data was required because the design of the TABUL study blinded clinicians to the ultrasound result. Therefore, we were unable to determine their opinion of the diagnosis based on the ultrasound together with clinical judgement. In the study, all patients had both ultrasound and biopsy tests but only the biopsy test result was given to the clinician managing the patient. Decisions about continuing treatment and the clinician’s diagnosis were therefore based on the biopsy result and a clinical assessment of the patient after 2 weeks. The ultrasound result was made available only if the clinician intended to rapidly withdraw steroids at 2 weeks based on a negative biopsy and his or her clinical assessment of the patient. The clinician could then change their treatment decision, that is, continue with steroid treatment, and alter their diagnosis after seeing the ultrasound result. TABUL data are therefore available on the treatment decisions made after 2 weeks only on the basis of the biopsy results; it is not known what treatment decisions would have been made if the ultrasound test result, but not the biopsy result, were provided to the clinician. Some assumptions are therefore required about what diagnoses and decisions about treatment would have been made. For the purposes of the economic analysis the focus is on the treatment decision to continue or withdraw treatment with high-dose steroids because it is this decision that has implications for the risk of developing GCA complications or steroid-related AEs.

An algorithm was devised that would allow an implied treatment decision to be arrived at by considering how the availability of the ultrasound rather than the biopsy would have influenced clinicians’ decision-making. To do this, it is necessary to consider this separately according to what the biopsy and ultrasound test results were; in other words, there are four possible combinations of biopsy and ultrasound test results (both positive, both negative, only biopsy positive and only ultrasound positive).

A summary of the reasoning and inferred steroid treatment decision for each of the four combinations is shown in Table 60.

TABLE 60

Inferred outcomes for the ultrasound plus judgement strategy according to biopsy result and ultrasound result

For the case in which the biopsy is negative and the ultrasound is positive, two scenarios are described. For scenario 2 (cases for which the ultrasound result was unblinded), for consistency with the TABUL study, we allowed the ultrasound result to be over-ruled by clinical judgement, which was the case for five patients.

For the final combination, a positive biopsy and a negative ultrasound, it is not possible to infer what the treatment decision would be; therefore, in the case of these 27 patients, an alternative approach based on clinical vignettes was used to elicit the treatment decisions that would have been made.

All of these 27 patients were included in the clinical vignette exercise as part of the original random sample (as reported in Chapter 6) or in an additional sample for the economic analysis. The panel members rating the vignettes reported their assessment of the diagnosis (definite, probable, possible or not GCA) and the appropriateness of continuing treatment with high-dose steroids (on a scale from 1 = extremely inappropriate to 9 = extremely appropriate) using data collected at presentation and at 2 weeks plus the result of the ultrasound. The economic analysis used the available results from the clinical vignettes, from the first 12 clinicians who completed the exercise.

To dichotomise the continuation of high-dose steroids ratings into a yes/no outcome, a score of 5 or higher was used to indicate a decision to continue treatment. Scores of 4 or lower would indicate a decision not to continue. This threshold resulted in 63% of vignettes being categorised as ‘possible GCA’ by panel members falling into the ‘continue treatment’ group. Alternative thresholds of 3, 4 or 6 would have resulted in 100%, 87% and 18% of ‘possible GCA’ vignettes being categorised as ‘continue treatment’, respectively.
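
The dichotomisation rule can be sketched as follows. The ratings listed are hypothetical and serve only to show how the choice of threshold changes the share of vignettes classified as 'continue treatment'; they do not reproduce the percentages reported above.

```python
CONTINUE_THRESHOLD = 5  # score of 5 or higher means continue treatment


def continue_steroids(rating: int, threshold: int = CONTINUE_THRESHOLD) -> bool:
    """Map a 1-9 appropriateness rating to a yes/no treatment decision."""
    if not 1 <= rating <= 9:
        raise ValueError("rating must be between 1 and 9")
    return rating >= threshold


def share_continuing(ratings, threshold: int = CONTINUE_THRESHOLD) -> float:
    """Fraction of ratings classified as 'continue treatment'."""
    return sum(continue_steroids(r, threshold) for r in ratings) / len(ratings)


# Hypothetical ratings for illustration only (not panel data).
hypothetical_ratings = [2, 4, 5, 5, 6, 7, 3, 5]

share_at_5 = share_continuing(hypothetical_ratings)     # base-case threshold
share_at_3 = share_continuing(hypothetical_ratings, 3)  # more permissive
share_at_6 = share_continuing(hypothetical_ratings, 6)  # more restrictive
```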

A simulation was then run to model the diagnosis and treatment decisions that would have resulted if the treatment decision had been made by a single clinician for each patient, as was the case in the TABUL study. Decisions were randomly sampled using the ratings from all 12 clinicians on the panel. The simulation was repeated for each vignette 100 times in order to give equal weight to the ratings from all clinicians. By comparing the sampled results with the reference standard, the expected (average) numbers of true positives and false negatives were obtained; there were no false positives or true negatives because all 27 were biopsy positive.
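
A minimal sketch of this kind of resampling is given below. It is not the code used in the analysis. It assumes that each draw picks one clinician's rating uniformly at random with replacement, that a rating of 5 or higher means treatment is continued, and that every vignette is reference-standard positive (so a 'continue' decision counts as a true positive and a 'withdraw' decision as a false negative). The ratings are hypothetical.

```python
import random


def simulate_expected_counts(ratings_by_vignette, n_draws=100,
                             threshold=5, seed=0):
    """Return expected (true positives, false negatives) across vignettes.

    Each vignette contributes the fraction of sampled decisions that
    were 'continue treatment' (true positive); the rest are false negatives.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    expected_tp = 0.0
    for ratings in ratings_by_vignette:
        draws = [rng.choice(ratings) for _ in range(n_draws)]
        expected_tp += sum(r >= threshold for r in draws) / n_draws
    expected_fn = len(ratings_by_vignette) - expected_tp
    return expected_tp, expected_fn


# Hypothetical ratings from 12 clinicians for three vignettes.
hypothetical_panel = [
    [9] * 12,                                   # all would continue
    [1] * 12,                                   # all would withdraw
    [3, 4, 5, 5, 6, 6, 7, 4, 5, 8, 2, 6],       # mixed opinions
]

exp_tp, exp_fn = simulate_expected_counts(hypothetical_panel)
```

The expected counts are fractional because they average over the sampled single-clinician decisions.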

Application of the simulated results from the vignettes to the test strategy that combined ultrasound with clinical judgement produced a sensitivity of 89.1% and a specificity of 76.6% (see Table 59). These figures were slightly lower than the equivalent figures for the strategy involving biopsy and clinical judgement.
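
To show how these estimates feed into the decision tree, the sketch below converts a sensitivity and specificity into expected numbers of true positives, false negatives, true negatives and false positives. The cohort size of 1000 and the prevalence of 40% are hypothetical values chosen for illustration; only the sensitivity and specificity are taken from the text.

```python
def expected_outcomes(n_patients: float, prevalence: float,
                      sensitivity: float, specificity: float) -> dict:
    """Expected diagnostic outcomes for a cohort of suspected GCA cases."""
    with_gca = n_patients * prevalence
    without_gca = n_patients - with_gca
    return {
        "TP": with_gca * sensitivity,
        "FN": with_gca * (1 - sensitivity),
        "TN": without_gca * specificity,
        "FP": without_gca * (1 - specificity),
    }


# Sensitivity and specificity of ultrasound plus clinical judgement (from
# the text); cohort size and prevalence are hypothetical assumptions.
outcomes = expected_outcomes(1000, 0.40, 0.891, 0.766)
```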

Risks of complications of giant cell arteritis

Visual complications

Visual complications represent the greatest burden of complications of GCA, with about 25% of cases resulting in sight loss if left untreated.⁸⁵ The major presenting symptoms are amaurosis fugax (a transient shade, dimming, fogging, blurring or monocular blindness), transient diplopia (double vision) or unilateral or bilateral partial or complete vision loss.

For the economic model, we needed to identify the risk of onset of visual complications after patients had presented to their GP, because an estimated 92% of visual complications arise prior to the initiation of high-dose steroid treatment and therefore would not be affected by the diagnostic strategies considered in TABUL. To do this, we created a submodel of visual complications, combining and modelling data from various sources, as shown in Figure 18.

FIGURE 18

Logic modelling of evidence to obtain incidence rates of new onset of visual complications. a, relative proportion of TPs : FNs assumed to be 90 : 10 after initial diagnostic test; so, of the 1000 GCA cases, 900 would be (more...)

Blindness in both eyes is rare in GCA⁸⁶ because steroid treatment is usually started when sight loss occurs in one eye and should reduce the risk of sight loss occurring in the other eye. It is therefore assumed that there will be no cases of bilateral sight loss and that steroids will have been started in all cases of unilateral sight loss. The stages of the diagnostic and treatment pathway during which visual loss arises are illustrated in Figure 18, based on 30% of patients experiencing visual complications,⁸⁷ 15% experiencing permanent visual loss⁸⁷ and 92% of visual complications arising before treatment was initiated.⁸⁸ Of this 92%, one-fifth of complications are assumed to be attributable to an initial false-negative diagnosis. Eight per cent are estimated to arise in true positives after steroid treatment has started. The required estimates of incidence rates of new visual loss among true positives and false negatives are shown by the solid arrows.

In order to assign costs of treatment and the quality-of-life impact of visual complications, we required an assessment of severity, based on a previously reported analysis⁸⁹ (Table 61).

TABLE 61

Analysis of the severity of visual loss by initial visual acuity in one eye

Although visual acuity is the primary criterion for determining vision loss, other types of vision loss (e.g. peripheral vision loss or contrast sensitivity loss) are recognised as disabilities even if central visual acuity is 20/20. Partial sight loss in the centre of vision is different to partial sight loss in the periphery, but we have no information on the nature of GCA-induced visual loss.

Stroke

For the incidence of GCA-related stroke, the model assumes that 2.64% of cases of GCA result in a stroke, as per Amiri et al.,⁹⁰ and further assumes that strokes arise after presentation to the patient’s GP. It is also assumed that stroke occurring as a result of GCA has the same severity and likelihood of fatality as stroke unrelated to GCA. Sixty per cent of strokes were assumed to be minor; case fatality in major strokes was assumed to be 50%.
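
The stroke assumptions can be combined arithmetically, as in the following sketch. The cohort of 1000 GCA cases is a hypothetical size used for illustration, and the sketch makes no assumption about fatality in minor strokes because the text does not state one.

```python
STROKE_RISK = 0.0264         # proportion of GCA cases resulting in stroke
MINOR_SHARE = 0.60           # proportion of strokes assumed to be minor
MAJOR_CASE_FATALITY = 0.50   # case fatality assumed for major strokes


def stroke_breakdown(n_gca_cases: float) -> dict:
    """Expected numbers of strokes by severity for a cohort of GCA cases."""
    strokes = n_gca_cases * STROKE_RISK
    minor = strokes * MINOR_SHARE
    major = strokes - minor
    return {
        "strokes": strokes,
        "minor": minor,
        "major": major,
        "fatal_major": major * MAJOR_CASE_FATALITY,
    }


# Hypothetical cohort of 1000 GCA cases.
breakdown = stroke_breakdown(1000)
```

Under these assumptions, roughly 26 strokes would be expected per 1000 GCA cases, of which about 11 would be major and about 5 fatal.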

Mortality from giant cell arteritis

There have been numerous studies reporting an increased risk of mortality in the years following a diagnosis of GCA. However, we decided that it was not necessary to include this in the model because there is no evidence to suggest that a delay in the diagnosis of several weeks (as a result of an initial false-negative test result) has an impact on this mortality risk. Hence, it is unlikely to have any impact on the relative cost-effectiveness of different test strategies.

Use of steroids and risk of complications

Oral corticosteroids have potent systemic effects, including numerous side effects. Evidence on complications arising from treatment with steroids is based on studies relating to oral corticosteroids; almost all patients in TABUL were treated with oral high-dose glucocorticoid therapy. The dose schedule for individuals with GCA is shown in Table 62. The second column describes the typical dose schedule for a true positive, that is, a patient with GCA with ongoing treatment. The third column describes a shorter duration of therapy for false-positive cases; this was adopted on the basis that steroid doses are likely to be tapered more quickly in the absence of ongoing features of the disease. The data from the TABUL study placed some doubt on this assumption; therefore, we performed a sensitivity analysis to include a dose schedule for false positives that was the same as that for true positives.

TABLE 62

High-dose oral glucocorticoid regimen typically used for treating GCA, with tapering over time

The list of all possible side-effects of steroids is long, but they vary in severity and burden to the patient and the NHS. Even treatment with low-dose steroids is associated with weight gain, hyperglycaemia, diabetes mellitus, increased blood pressure and hypertension, decreased bone mineral density with increased risk of fracture, cognitive dysfunction, increased risk of infection and cataracts.⁹¹ The economic analysis focused on those AEs that were reported to have a high-cost impact or a detrimental effect on quality of life and that were clearly attributable to the use of steroids (as opposed to possibly arising, at least in part, as a result of having GCA). The AEs included in the model were fractures, diabetes mellitus and hyperglycaemia, symptomatic steroid myopathy and steroid psychosis. Hypertension was not included because data from the TABUL study showed little change in the use of antihypertensive medication. As rates of AEs in TABUL were only for a 6-month period, we sought evidence from other studies for the rates to be used in the economic model.

Fractures

The model includes vertebral body compression fractures, fractures of the hip/femoral neck, wrist/forearm and proximal humerus (shoulder). The approach to modelling incidence of fractures in a GCA cohort is to start with risks in the general population, then to apply uplift (hazard ratio) for the impact of steroid treatment, and then to apply a relative risk for the effect of bone-protection therapy (Table 63).

TABLE 63

Fracture risks per annum in the general population

The model used the fracture risks per annum shown in Table 63. These are specific to the 70–74 years age group of the general population,⁹² the average age in TABUL being 71 years, and are prior to adjustment for the effect of steroids.

We also obtained the hazard ratios for the increased risks because of the use of steroids with a dose exceeding 7.5 mg daily from the same source.⁹² These are 5.2 for vertebral fracture, 2.35 for hip fracture and 1.79 for osteoporotic fracture, which we used for fractures of the wrist/forearm and humerus. Although uncertain, the evidence and clinical opinion suggest that the excess risk of fractures disappears within 1 year of stopping steroid therapy.

Prevention of fractures

We assumed that all patients treated with high-dose steroids were classed as being at high risk of fractures and so received bone protection therapy. There are various therapies available but, for simplicity, we assumed that treatment was with a combination of a bisphosphonate, vitamin D and calcium, the standard dose and costs⁸⁴ for which are shown in Table 64. We assumed that the relative risks for fracture following bone-protection therapy were 0.57 for vertebral fractures and 0.61 for fractures of the hip, forearm or humerus.⁹²
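
The three-step approach to fracture risk (general population risk, uplift for steroid use, then adjustment for bone protection) can be sketched as a product of the three terms. The hazard ratios and relative risks are those quoted in the text; the baseline annual risk used in the example is a hypothetical placeholder and is not a Table 63 value. Multiplying an annual risk by a hazard ratio is an approximation that is reasonable only when risks are small.

```python
# Hazard ratios for steroid doses exceeding 7.5 mg daily (from the text).
STEROID_HAZARD_RATIO = {
    "vertebral": 5.2, "hip": 2.35, "wrist_forearm": 1.79, "humerus": 1.79,
}
# Relative risks with bone-protection therapy (from the text).
BONE_PROTECTION_RELATIVE_RISK = {
    "vertebral": 0.57, "hip": 0.61, "wrist_forearm": 0.61, "humerus": 0.61,
}


def adjusted_annual_risk(site: str, baseline_risk: float) -> float:
    """Annual fracture risk on steroids with bone protection."""
    return (baseline_risk
            * STEROID_HAZARD_RATIO[site]
            * BONE_PROTECTION_RELATIVE_RISK[site])


# Net multiplier applied to the general population risk at each site.
net_multiplier = {site: adjusted_annual_risk(site, 1.0)
                  for site in STEROID_HAZARD_RATIO}

# Hypothetical baseline risk of 0.2% per annum, for illustration only.
example_vertebral_risk = adjusted_annual_risk("vertebral", 0.002)
```

On these figures the net effect of steroids plus bone protection is close to neutral for wrist, forearm and humerus fractures but still almost triples the vertebral fracture risk.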

TABLE 64

Bone protection therapy

Diabetes mellitus and hyperglycaemia

In Niederkohr and Levin⁷⁸ the combined overall incidence of hyperglycaemia and diabetes mellitus was 4.8%, the majority of which was likely to be hyperglycaemia below the threshold for diabetes mellitus. Duru et al.⁹³ reported the incidence of diabetes mellitus alone to be in the range 0–3%. For the model, we used 1.5% as an estimate of the incidence of GCA-related diabetes mellitus. It was assumed that 80% of these cases might be reversible (i.e. temporary hyperglycaemia). It was assumed that episodes of temporarily raised glucose would not be given a permanent label of diabetes mellitus (such a label would result in a significant burden to the individual and resource use). For the remaining 20% of patients, in whom it was assumed the incident diabetes mellitus was permanent, a proportion were likely to have had non-diabetic hyperglycaemia before starting steroid treatment for suspected GCA. Starting steroids meant that the diagnosis of diabetes mellitus may have emerged earlier than it would otherwise have done, that is, these patients would have eventually developed diabetes mellitus at some point in the future regardless of their steroid therapy. Although it is therefore difficult to attribute a proportion of the burden of such accelerated diagnoses to the use of steroids, we judged that it would be reasonable to assume that the costs of managing diabetes mellitus would be incurred 5 years earlier than they would otherwise have been without steroid treatment, that is, the impact of steroids accelerates the occurrence of diabetes mellitus by 5 years.

Other adverse events

We assumed that the annual incidence of symptomatic steroid myopathy was 3.4% and the annual incidence of steroid psychosis was 7.6% based on a GCA study by Niederkohr and Levin.⁷⁸ For the many other common and mild AEs, for example moon face (round, puffy-shaped swollen face), there is likely to be a very small cost burden to the NHS. However, collectively there is a significant impact on quality of life; therefore, an overall adjustment to quality of life was applied (see Chapter 7, Health utilities). For the impact of diabetes mellitus on utility, based on Brown et al.,⁹⁴ we assumed a multiplier of 0.88, which leads to a decrement in quality of life because of diabetes mellitus of 0.09. This is assumed to persist indefinitely because diabetes mellitus is a progressive condition and individuals with a longer duration of diagnosed diabetes mellitus can be expected to have a greater prevalence of complications and associated loss of quality of life.

Unit costs of tests, medications and treatments

The evidence sources for the unit costs are described below. All costs are then adjusted for inflation to bring them to 2014/15 levels.

Biopsy and ultrasound

Biopsy is estimated to cost £493 based on NHS Reference Costs for 2011/12⁸³ (for lymph node biopsy/salivary gland biopsy). This is assumed to include theatre cost, surgeon time, pathologist time, sample processing, camera, microscope and other pathology equipment and administration cost. It has been pointed out that some ‘biopsy costs’ shown in NHS Reference Costs⁸³ may be understated, as they include relatively minor procedures such as the removal of warts. However, we used a specific procedure code, lymph node biopsy/salivary gland biopsy, which we expect to be robust in this case.

In the TABUL study, the typical time taken to perform ultrasound of both temporal and axillary arteries was 30 minutes, although there was considerable variation (scans took between 20 and 60 minutes, depending on the experience of the sonographer and the extent of the abnormalities to be defined). The cost of a ‘direct access’ (as opposed to outpatient) ultrasound scan taking 20 minutes or more is £57 based on the NHS Reference Costs for 2013/14.⁹⁵ This is assumed to include equipment cost, equipment maintenance and calibration, sonographer time, radiology space/room cost, radiologist interpretation cost, administration cost and a contribution for hospital overheads. Training costs for a hospital to set up a new GCA sonography service are classed as ‘implementation costs’ so, in line with NICE convention, they are excluded from the cost-effectiveness analysis. With uplifts for inflation, the costs for biopsy and ultrasound are £514 and £58, respectively.

Giant cell arteritis-related complications

The costs of vision loss shown in Table 65 are applied to the visual acuity states in the model that are worse than 6/60 m (20/200 feet), that is, those meeting the legal definition of blindness, in line with the ranibizumab and pegaptanib sodium HTA assessment.⁹⁶

TABLE 65

Costs of vision loss below best corrected visual acuity of 6/60 in the better-seeing eye

The costs of registration of blindness, provision of low-vision aids and low-vision rehabilitation are one-off rather than recurrent costs. Community care costs were estimated as the annual cost for a local authority home care worker and residential care costs were based on the annual cost of private residential care (taking into account that approximately 30% of residents pay themselves). Using the estimated annual costs in Table 65 gives a cost of £5090 for the first year of blindness and £4903 for each subsequent year.
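
The first-year and subsequent-year costs quoted above can be accumulated over a given number of years of blindness, as in the sketch below. It is undiscounted and ignores mortality, both of which the full model accounts for, so it is an illustration of the cost structure only.

```python
FIRST_YEAR_COST = 5090       # GBP, includes one-off costs
SUBSEQUENT_YEAR_COST = 4903  # GBP, recurrent costs only


def cumulative_blindness_cost(years: int) -> int:
    """Undiscounted cost of blindness over a whole number of years."""
    if years <= 0:
        return 0
    return FIRST_YEAR_COST + SUBSEQUENT_YEAR_COST * (years - 1)


# The difference between the two annual figures is the one-off element.
one_off_element = FIRST_YEAR_COST - SUBSEQUENT_YEAR_COST
three_year_cost = cumulative_blindness_cost(3)
```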

The 5-year cost of a non-fatal stroke was estimated to be £29,400 in a NICE report.⁹⁷

Steroid-related adverse events

The unit costs of AEs were obtained from published studies⁹⁸–¹⁰¹ and are shown in Table 66.

TABLE 66

Unit costs of steroid-related AEs

Inflation

All unit costs were inflated to 2014/15 values using the Hospital and Community Health Services index.¹⁰²

Health utilities

Utilities are valuations of health-related quality-of-life on a scale from 0 to 1, with 0 being equivalent to dead and 1 being equivalent to perfect health. A loss of quality-of-life attributable to a complication such as vision loss or an AE such as a fracture is called a utility decrement. The utility decrements used in the model are shown in Table 67. The baseline utility for someone of 71 years of age is 0.716, based on an age-related annual decrease in utility of 0.004.¹⁰⁷

TABLE 67

Utility decrement values for complications and AEs

For visual loss, we obtained the required utility decrement by combining data on substates of visual loss. The quality of life of various visual states was studied in Brown et al.,⁹⁴ showing a wide range of utilities associated with different levels of vision within the range of legal blindness (visual acuity < 20/200). We used the reported time trade-off (TTO) values rather than standard gamble (Table 68), as these are consistent with the EQ-5D quality-of-life instrument preferred by NICE. Multiplying these TTO values by the proportional occurrence of visual loss by severity in Table 61 gives a weighted value of 0.524 (on a scale of 0 to 1). As the TTO values are on a scale of 0 to 1, this was used as a multiplier to the age-specific utility in the model, giving an overall utility value for vision loss of 0.375, which represents a decrement of 0.34 compared with the baseline utility of 0.716.
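
The multiplier arithmetic can be checked with a short sketch. All figures are taken from the text; the function name is chosen here for illustration. The same calculation is applied to the diabetes mellitus multiplier of 0.88 described earlier in the chapter.

```python
BASELINE_UTILITY_AGE_71 = 0.716


def apply_multiplier(baseline: float, multiplier: float):
    """Return (utility with the condition, utility decrement)."""
    utility = baseline * multiplier
    return utility, baseline - utility


# Vision loss: weighted TTO value of 0.524 used as a multiplier.
vision_utility, vision_decrement = apply_multiplier(
    BASELINE_UTILITY_AGE_71, 0.524)

# Diabetes mellitus: multiplier of 0.88.
diabetes_utility, diabetes_decrement = apply_multiplier(
    BASELINE_UTILITY_AGE_71, 0.88)
```

Rounded, these reproduce the reported utility of 0.375 and decrement of 0.34 for vision loss, and the decrement of 0.09 for diabetes mellitus.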

TABLE 68

Utility values of alternative visual states

Utilities associated with vision loss tend to be higher after the first year, which we speculate is because of a degree of adjustment made to the condition.

Model time horizon

Cost-effectiveness analyses need to capture all significant costs and utility effects that are relevant to the intervention and condition of interest. As steroid treatment causes fractures and diabetes mellitus in a small minority of patients, and because these have lifetime cost and/or quality-of-life impacts, it is necessary for the model to take a long-term perspective. The model horizon is, therefore, 40 years, which is effectively a lifetime perspective for a cohort with a baseline age of 71 years.

Mortality

As the model has a lifetime perspective, it is necessary to include both the mortality rate for the general population and any excess mortality arising from GCA or steroid-related side effects. General population mortality rates were obtained from standard Office for National Statistics tables.¹⁰⁸ Stroke mortality is modelled explicitly. For fractures, excess mortality was applied when vertebral or hip fractures occurred, leading to an absolute estimated 1-year mortality of 4.4% and 6.0%, respectively (estimates for patients aged 71 years, rates derived from van Staa et al.¹⁰⁹).

Discount rates and perspective

A discount rate of 3.5% per annum is applied to both costs and health benefits, as measured in quality-adjusted life-years (QALYs), in line with NICE guidance.¹¹⁰ Discounting is undertaken to ensure that both the overall costs and overall benefits are reported in comparable terms, that is, in their present value. A sensitivity analysis is undertaken with an alternative rate of 0% for benefits (QALYs), as long-term benefits are heavily discounted when a rate of 3.5% is applied. In line with NICE guidance, the model takes a health and social care perspective. Wider societal impacts, such as time off work and private care home costs, are excluded (except for specific sensitivity analyses).
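
Discounting at a constant annual rate can be sketched as follows. The convention that the first year is undiscounted (year index 0) is an assumption made here, as the text does not state the timing convention used in the model, and the QALY stream is hypothetical.

```python
def discount_factor(year: int, rate: float = 0.035) -> float:
    """Factor converting a value in a given year to present value."""
    return 1.0 / (1.0 + rate) ** year


def present_value(values, rate: float = 0.035) -> float:
    """Sum of a yearly stream, with values[0] treated as undiscounted."""
    return sum(v * discount_factor(t, rate) for t, v in enumerate(values))


# Hypothetical stream of 0.7 QALYs per year for 20 years.
qaly_stream = [0.7] * 20

pv_base_case = present_value(qaly_stream, rate=0.035)
pv_undiscounted = present_value(qaly_stream, rate=0.0)  # sensitivity analysis
```

Comparing the two totals shows how much of a long-term benefit is removed by discounting at 3.5%.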

Sensitivity analysis

The values described so far for the main analysis are referred to as the ‘base-case’ values. However, model parameters have some uncertainty around their ‘true’ value, either because of sample sizes (as evidenced by reported 95% CIs) or because there are multiple heterogeneous studies from which it is difficult to obtain an unequivocal single ‘best estimate’. It is therefore standard practice to carry out sensitivity analyses. Here the term ‘sensitivity’ refers to how much the economic outcomes change according to changes in model parameters from their ‘base-case’ values.

Sensitivity analyses were undertaken for the following strategies:

biopsy alone
ultrasound alone
biopsy in combination with clinical judgement (current routine care)
ultrasound in combination with clinical judgement.

Uncertainty around the various parameters works both ways, so, for example, if the base-case estimate of the cost of ultrasound is £57, we could test out what happens if the cost were 20% higher or 20% lower. Given that initial analyses indicated that ultrasound is likely to be more cost-effective than biopsy, values for the sensitivity analyses have been chosen, as shown in Table 69, in the direction which is likely to make the cost-effectiveness of ultrasound and biopsy closer than in the base case.

TABLE 69

Sensitivity analyses to be undertaken

Alternative reference diagnosis of giant cell arteritis

There is currently no universally accepted reference or gold standard definition for the diagnosis of GCA. As a result, the performance (sensitivity and specificity) of each test or composite screening strategy is inevitably influenced by the choice of reference standard. In TABUL, clinical judgement played a major part in the reference standard, as did the biopsy and ultrasound results. However, there are alternative, more narrowly defined, reference standards that could be used for the purpose of sensitivity analysis, such as the ACR criteria or combinations of the tests and ACR criteria/risk factors. The concern is that if we vary the reference standard diagnosis, this will influence the relative cost-effectiveness of the potential screening strategies.

We have therefore tested the impact of three alternative reference standards, which are defined such that there are fewer ‘true’ GCA cases (Table 70). This is an exploratory analysis: the alternative reference standards serve only to explore whether or not fewer true GCA cases might alter the base-case conclusions. They have not been comprehensively evaluated and do not purport to have applicability to clinical practice.

TABLE 70

Definitions of the alternative reference standards

The outcomes of the following subset of strategies were compared against the alternative reference standard diagnoses:

biopsy alone: as per protocol
ultrasound alone: as per protocol
a composite strategy (H0M5L7) in which high-risk cases are treated as GCA and others are treated as GCA only if the ultrasound is positive
biopsy and clinical judgement.

Results

In this section, results are presented for the base case, then for the various sensitivity analyses, and the budget impact, all based around the diagnostic reference standard applicable in the TABUL study. We then investigate how varying the reference standard changes the results.

The two main measures of cost-effectiveness are the incremental cost-effectiveness ratio (ICER) and net monetary benefit (NMB). Both of these are all-encompassing measures that trade off additional costs of diagnosis, medication and treatment of complications against benefits in terms of improved life expectancy and quality of life (e.g. through reduced incidence of blindness or reduced incidence of fractures). Central to these measures are the following:

The QALY; for example, 2 years spent with a utility of 0.6 gives 1.2 QALYs.
The value placed on 1 QALY gained [often referred to as the willingness-to-pay (WTP) threshold], which, in the UK, is stated by NICE to be typically in the range £20,000 to £30,000 per QALY. We will use a threshold of £20,000 per QALY for our analysis because this is more usual for groups that are not disadvantaged.

The preferred measure is the ICER. This shows how cost-effective one strategy is compared with another by dividing the incremental costs by the incremental QALYs, but this can become complex to present when there are many strategies. We shall therefore report ICERs to compare a small number of strategies, but we shall use the NMB to compare the cost-effectiveness across all strategies. The NMB is the overall monetary value of a screening/treatment strategy taking account of both costs and health benefits, with the health benefits valued at £20,000 per QALY. The higher the NMB, the more cost-effective a strategy is; this allows easy comparison across multiple strategies.

To calculate the NMB of a strategy, the steps are:

Calculate the total costs incurred (including the cost of the tests, medications and treatment of complications and AEs).
Calculate the total QALYs over the model time horizon, in this case 40 years.
Multiply the total QALYs by the WTP threshold of £20,000 per QALY.
Deduct the costs calculated in (1) from the value in (3) to obtain the NMB.
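The four steps above can be sketched as a short function. This is an illustrative sketch, not the model used in the study: the function name is hypothetical, and the example inputs are the rounded per-patient cost (£921) and QALYs (7.6464) quoted in the base-case results for the ultrasound and judgement strategy.

```python
# Illustrative sketch of the NMB steps; names are hypothetical.
WTP_PER_QALY = 20_000  # willingness-to-pay threshold used in the analysis (GBP)


def net_monetary_benefit(total_cost: float, total_qalys: float,
                         wtp: float = WTP_PER_QALY) -> float:
    """Steps 3 and 4: value the QALYs at the WTP threshold, then deduct costs."""
    monetary_value_of_health = total_qalys * wtp   # step 3
    return monetary_value_of_health - total_cost   # step 4


# Steps 1 and 2 are model outputs; here they are taken as given
# (rounded values quoted in the base-case results).
nmb_ultrasound_judgement = net_monetary_benefit(total_cost=921,
                                                total_qalys=7.6464)
# 7.6464 * 20,000 - 921 = 152,007 (GBP per patient)
```

Because every strategy accrues a similar number of QALYs, the absolute NMB values are all of a similar size; it is the difference in NMB between strategies that is informative.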

Base-case results

In Table 71, results are shown for various alternative diagnostic strategies, all assuming base-case model parameters.

TABLE 71

Results for alternative screening strategies

Columns 2 and 3 show the performance of each screening strategy. Columns 4 and 5 show the proportion of patients who would undergo each test. Columns 6–10 are the economic outcomes. Column 8 is the NMB measure of cost-effectiveness. The NMB figures for each strategy appear to be of roughly the same magnitude, and, although this might suggest that they are all almost the same, this would be an incorrect interpretation. The higher the incremental net benefit in column 9 of a given strategy compared with the biopsy-only strategy, the more cost-effective that strategy is. It should be remembered that these monetary differences are per patient. The budgetary impact of selected strategies is explored later. Column 10 shows the ranking of each diagnostic strategy in terms of cost-effectiveness (based on the NMB); the lower the ranking the more cost-effective the strategy. The last two columns show two clinical outcome measures.

It may be easier to understand how the results compare visually on a cost-effectiveness plane, as shown in Figure 19. The most cost-effective strategy is indicated by bold font, that is, ‘2-week decision: ultrasound and judgement’. The green dotted line is known as a cost-effectiveness threshold, and it represents a line along which any point would have the same cost-effectiveness (any point has a cost-effectiveness ratio of £20,000 per QALY relative to this strategy). Any points below the line have a more favourable ratio of additional costs to additional benefits and would be a more cost-effective option (if there were any). Any points above the line are not cost-effective. In the case of the strategy ‘2-week decision: combined biopsy and ultrasound and judgement’ connected by a blue dashed line, the gradient is clearly much steeper than the green dotted line, indicating that the marginally higher QALY gains are not achieved in a cost-effective way. Numerically, the additional 0.0018 (7.6482 – 7.6464) QALYs cost an extra £485 (£1406 – £921), giving an ICER of £271,864, which far exceeds the acceptable threshold of £20,000 per QALY. For all other strategies, both the costs and QALYs are inferior (higher cost and fewer QALYs) compared with the optimal ‘2-week decision: ultrasound and judgement’ strategy, which is thereby said to dominate these strategies (including ‘Biopsy and judgement’).
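The ICER arithmetic in the preceding paragraph can be reproduced as follows. This is a minimal sketch with hypothetical function and variable names. It uses the rounded figures quoted in the text, so it gives roughly £269,000 per QALY rather than the reported £271,864; the difference is presumably because the reported figure was calculated from unrounded model outputs.

```python
# Illustrative sketch of an ICER calculation; names are hypothetical.
WTP_PER_QALY = 20_000  # GBP per QALY


def icer(cost_new: float, qalys_new: float,
         cost_ref: float, qalys_ref: float) -> float:
    """Incremental cost per incremental QALY of `new` versus `ref`."""
    delta_cost = cost_new - cost_ref
    delta_qalys = qalys_new - qalys_ref
    if delta_qalys == 0:
        raise ValueError("ICER is undefined when the QALY difference is zero")
    return delta_cost / delta_qalys


# Combined biopsy, ultrasound and judgement versus ultrasound and judgement,
# using the rounded values quoted in the text.
icer_combined = icer(cost_new=1406, qalys_new=7.6482,
                     cost_ref=921, qalys_ref=7.6464)
# 485 / 0.0018, i.e. roughly 269,000 GBP per QALY from rounded inputs

exceeds_threshold = icer_combined > WTP_PER_QALY  # True
```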

FIGURE 19

Cost-effectiveness plane showing the results for each strategy. Bold indicates the most cost-effective strategy.

In light of these results, we undertook some further refinement of the ultrasound and clinical judgement strategy as shown in Table 72.

TABLE 72

Further exploratory analyses of the ultrasound plus judgement strategy

The results lead to the following findings:

The most cost-effective strategies are those that include an element of clinical judgement.
Ultrasound and clinical judgement is the most cost-effective strategy, with the highest incremental NMB. This is largely because of the difference in the cost of the tests (Table 73).
For the strategy in (2) above, the estimated cost saving is £475 per patient and there is a very small QALY gain of 0.0005 compared with biopsy and judgement. Rather than calculating an ICER, ultrasound is said to dominate biopsy in this case as ultrasound results in both cost savings and QALY gains.
Ultrasound alone is more cost-effective than biopsy alone.
The three sequential diagnostic strategies that incorporate pre-test probabilities (those ranked 4, 5 and 6) offer a level of cost-effectiveness between those involving clinical judgement and those (ranked 7 to 13) that include neither clinical judgement nor pre-test probabilities.
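The comparison of ultrasound and judgement with biopsy and judgement in the findings above can be expressed as an incremental NMB together with a dominance check. The sketch below uses the rounded per-patient differences quoted in the text (a cost saving of £475 and a QALY gain of 0.0005); the function names are hypothetical.

```python
# Illustrative sketch; names are hypothetical.
WTP_PER_QALY = 20_000  # GBP per QALY


def incremental_nmb(delta_cost: float, delta_qalys: float,
                    wtp: float = WTP_PER_QALY) -> float:
    """Incremental NMB of one strategy versus a comparator."""
    return delta_qalys * wtp - delta_cost


def dominates(delta_cost: float, delta_qalys: float) -> bool:
    """True if the strategy is no costlier and no less effective than the
    comparator, and strictly better on at least one of the two."""
    no_worse = delta_cost <= 0 and delta_qalys >= 0
    strictly_better = delta_cost < 0 or delta_qalys > 0
    return no_worse and strictly_better


# Ultrasound and judgement versus biopsy and judgement (per patient).
delta_cost = -475      # ultrasound strategy is 475 GBP cheaper
delta_qalys = 0.0005   # and yields very slightly more QALYs

inc_nmb = incremental_nmb(delta_cost, delta_qalys)
# 0.0005 * 20,000 + 475 = 485 GBP per patient
ultrasound_dominates = dominates(delta_cost, delta_qalys)  # True
```

Almost all of the £485 incremental NMB comes from the cost saving; the QALY gain contributes only about £10 when valued at £20,000 per QALY.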

TABLE 73

Differences in costs and QALYs

A further finding from the additional analyses in Table 72 is that the ultrasound and judgement strategy may be improved slightly by undertaking a biopsy in cases in which the pre-test risk is high and the ultrasound and judgement decision would be not to treat. It should be noted that only 2% of individuals in TABUL were referred for biopsy under such a strategy, so there is some uncertainty around the benefit of a biopsy in such circumstances. It would also require the timing of the decision to perform a biopsy to be made after the outcome of the ultrasound plus judgement strategy is known. This is likely to mean that the biopsy is delayed (so may be less accurate than in our model because of the change in histology since patient presentation). Alternatively, an earlier biopsy would be possible if an ultrasound plus judgement outcome was obtained before 2 weeks. However, this would mean that there is less information available to the clinician on the patient’s symptoms and response to steroid treatment which, in turn, may lead to a less accurate outcome as a result of a more rapid ultrasound and clinical judgement strategy.

Detailed analysis of results for ultrasound plus judgement versus biopsy plus judgement

It is useful to break down the cost and QALY differences further, as shown in Table 73, to understand how they arise. As previously stated, the cost difference is largely because of the difference in cost of the tests. In terms of QALYs, compared with biopsy and judgement, ultrasound plus judgement leads to fewer false negatives and so lower loss of health due to complications of GCA (difference = 0.0023). However, approximately 75% of this QALY gain is offset by loss of health through prescribing steroids to a greater number of false-positive cases (0.0017).
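The offsetting of the QALY gain described above can be checked with a few lines of arithmetic. This sketch uses only the two rounded figures quoted in the paragraph; the variable names are hypothetical. The net figure from these rounded inputs (about 0.0006) differs slightly from the 0.0005 reported elsewhere, presumably because of rounding.

```python
# Illustrative arithmetic on the rounded figures quoted in the text.
qaly_gain_fewer_false_negatives = 0.0023   # fewer GCA complications
qaly_loss_more_false_positives = 0.0017    # more unnecessary steroid use

# Share of the gain that is offset by the loss.
offset_share = qaly_loss_more_false_positives / qaly_gain_fewer_false_negatives
# about 0.74, i.e. approximately three-quarters

# Net QALY difference implied by the two rounded components.
net_qaly_gain = qaly_gain_fewer_false_negatives - qaly_loss_more_false_positives
# about 0.0006
```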

Sensitivity analyses

Table 74 shows the NMB (based on a £20,000/QALY acceptability threshold) under various alternative model assumptions. The results relate to the biopsy plus clinical judgement strategy compared with the ultrasound plus clinical judgement. The base-case difference was £485 in favour of ultrasound plus clinical judgement.

TABLE 74

Results from sensitivity analyses

The results from the sensitivity analyses indicate that the improved cost-effectiveness of ultrasound and judgement (compared with biopsy and judgement) is not sensitive to alternative assumptions, with only alternative cost or test sensitivity assumptions reducing the incremental NMB result below £400 (from £485 base-case result). This is because the difference in the cost of the tests, in particular, is a very strong driver of the cost-effectiveness. Even doubling the cost and quality-of-life burden from steroid-related AEs did not change the outcome much.

Results based around an alternative reference standard

All of the results presented so far have been based on the reference diagnosis defined for the TABUL study. In this section, we show the impact of alternative reference diagnoses that involve fewer true GCA cases by removing some cases that rely solely on clinical judgement.

The results and their interpretation are best shown graphically (Figure 20). The x-axis shows four reference standards: the one used in the TABUL study and then the three alternatives described earlier, with increasing proportions of cases for which results might be considered more borderline. The y-axis shows the incremental NMB of the four selected alternative diagnostic strategies compared with biopsy alone.

FIGURE 20

Incremental NMB for four diagnostic strategies compared with biopsy alone according to reference standard adopted. H0M5L7, treat all high-risk patients, diagnosis for low-/medium-risk patients as per ultrasound result; US, ultrasound.

The results show that, for all alternative reference standards tested, ultrasound plus clinical judgement remains the most cost-effective strategy. It is only by adopting a reference standard with a significant reduction in cases of GCA (16% fewer GCA cases than in the TABUL cohort) that a diagnostic strategy based on pre-test risks and ultrasound might potentially become as cost-effective as ultrasound combined with clinical judgement.

Budget impact

In the UK population, the annual incidence of GCA in those aged over 40 years is about 1 per 4500 people (or 22 per 100,000),¹¹³ giving an annual incidence of about 7000 cases.

The cost savings arising at the point of testing through use of ultrasound instead of biopsy (both alongside clinical judgement) would be £456 (which represents the difference between £514 for a biopsy and £58 for ultrasound) per case or around £4,735,000 annually for the UK. Taking account of higher treatment costs for biopsy (because of slightly lower sensitivity), the cost savings would be £475 per case or around £4,933,615 annually for the UK.

If we use the strategy of ultrasound combined with clinical judgement but refer for biopsy cases that were judged to be ‘not GCA’ if they had a high pre-test probability of GCA, the cost saving would be £477 per case, or around £4,950,000 annually for the UK.

Discussion

Statement of principal findings

The results indicate that ultrasound alone is more cost-effective than biopsy alone, largely because of its much lower cost and, to a lesser extent, its higher sensitivity.

In practice, patients are stratified for the risk of having GCA or not, based on demographic factors such as age and sex, the clinical presentation and, in particular, the presence of more specific GCA-related symptoms such as jaw claudication and/or visual loss combined with the evidence of an acute phase response (elevated CRP level or ESR). Therefore, neither the biopsy nor the ultrasound test is ever used in isolation; each should be regarded as supplementary to the rest of the clinical evaluation in such patients, and this combination increases the sensitivity of the tests considerably. This is reflected in the main set (base-case) results, which show that the most cost-effective strategies are based on a test in conjunction with clinical judgement. Current clinical practice involves biopsy with clinical judgement. The results indicate that ultrasound plus clinical judgement is more cost-effective than biopsy plus clinical judgement, with a relative cost saving of £475 per patient and a small QALY gain of 0.0005; thus, ultrasound is said to dominate biopsy in this case (both in terms of cost savings and QALY gains). This is a very small difference in QALYs, however, which is equivalent to < 1 day of full health on average across presenting patients. Ultrasound plus judgement is also estimated to result in a marginally lower incidence of vision loss (owing to its slightly higher sensitivity) than biopsy and judgement.

One-way sensitivity analyses show that these findings are highly insensitive to changes in nearly all model parameters. The only parameters having any sizeable effect, in terms of partly reducing the difference in cost-effectiveness, are the cost of ultrasound and uncertainty around the sensitivity of ultrasound and biopsy.

In conjunction with clinical judgement, performing both a biopsy and an ultrasound test in all patients is less cost-effective than ultrasound alone because the additional costs of testing are not justified by the small reduction in treatment cost and increase in QALYs.

When we explored the impact of alternative diagnostic reference standards with up to 16% fewer GCA cases, ultrasound plus clinical judgement remained the most cost-effective strategy.

Drivers of cost-effectiveness

By far the most dominant driver is the cost of TAB because it is estimated to be almost nine times the cost of an ultrasound (£514 compared with £58). It is this that makes ultrasound plus clinical judgement more cost-effective than biopsy plus clinical judgement. When comparing strategies involving clinical judgement with equivalent strategies without clinical judgement (e.g. biopsy plus judgement compared with biopsy alone), the different sensitivities to GCA are the main driver of the results.

Strengths and limitations

This is, to our knowledge, the first published economic evaluation of ultrasound compared with biopsy. The evaluation not only includes costs incurred at the point of diagnostic testing, but also the costs and QALY implications of different rates of false-positive and false-negative cases. We also carried out additional analyses to allow for the fact that there is not a single universally accepted gold standard for diagnosing GCA (see Chapter 8 for further discussion on the lack of a gold standard).

No evidence source was found for the cost of a biopsy of the temporal artery. It was therefore necessary to use the cost of a procedure similar in terms of complexity and therefore resource use, a lymph node/salivary gland biopsy. Ideally, a micro-costing study could have been undertaken to arrive at an estimate specific to TAB. However, sensitivity analysis around the difference in cost between ultrasound and biopsy showed that this only had a small impact on reducing the favourable cost-effectiveness of ultrasound.

The diagnostic outcomes for ultrasound plus clinical judgement were not a formal outcome of the TABUL study so we had to use an algorithm (see Methods). Although this approach, and specifically the use of a vignette exercise to obtain the outcome for 27 patients, introduces some uncertainty around the sensitivity and specificity of ultrasound and clinical judgement, this is very unlikely to be large enough to have a material effect on the economic findings. This can be seen in the sensitivity analysis that varied the sensitivity and specificity of biopsy.

Owing to the complexity involved, our model was not sophisticated enough to include the impact of a quicker turnaround of results with ultrasound, or any benefits arising from being able to lower the steroid dose sooner for cases with a negative diagnosis. There might therefore be some further benefit to ultrasound-based strategies that is not accounted for in the modelling.

Any general limitations of the TABUL study, as discussed in Chapter 8, that pertain to the observed diagnostic yields (test sensitivity and specificity) apply to the economic analysis too. However, we carried out uncertainty (sensitivity) analysis around these parameters and this did not affect the conclusions.

Implications

The results indicate that ultrasound plus clinical judgement is the most cost-effective strategy. Such use of ultrasound rather than biopsy would result in significant reductions in costs as a result of the much lower cost of the test (£58 for ultrasound vs. £514 for biopsy). Frequently, the upfront cost can be a barrier to uptake of cost-effective technologies for which the economic benefits only materialise over the long term. This is not the case here, however, with estimated savings to the UK of £4,735,000 annually based on annual incidence of 7000 cases.

Unanswered questions and further research

We were unable to identify a study that would enable us to calculate dose-specific risks of fractures for each fracture type. Studies generally reported hazard ratios by category of average steroid dose, for example > 7.5 mg per day, rather than for specific and varying doses over time. This is a specific example of the difficulty of synthesising the range of heterogeneous evidence available on risks of steroid therapy in terms of study duration, starting dose, tapering schedule and set of AEs reported. Sensitivity analysis indicates that our results are very insensitive to uncertainty around the burden of steroid-related AEs. However, in a different context to this evaluation, for example, with a less dominant difference between the costs of the tests, the difficulties in synthesising such evidence could be a far greater limitation.

Copyright © Queen’s Printer and Controller of HMSO 2016. This work was produced by Luqmani et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK401215
