NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Clarke A, Pulikottil-Jacob R, Grove A, et al. Total hip replacement and surface replacement for the treatment of pain and disability resulting from end-stage arthritis of the hip (review of technology appraisal guidance 2 and 44): systematic review and economic evaluation. Southampton (UK): NIHR Journals Library; 2015 Jan. (Health Technology Assessment, No. 19.10.)

Total hip replacement and surface replacement for the treatment of pain and disability resulting from end-stage arthritis of the hip (review of technology appraisal guidance 2 and 44): systematic review and economic evaluation.

Chapter 9 Discussion

Decision problem and objectives

The main objective was to undertake a clinical effectiveness and cost-effectiveness analysis of different types of THR and hip RS for the treatment of pain and disability in people with end-stage arthritis of the hip. Specific aims were to compare the clinical effectiveness and cost-effectiveness of (1) different types of primary THR and hip RS for people in whom both procedures are suitable and (2) different types of primary THR for people who are not suitable for hip RS.

Methods and summary of findings

We undertook systematic reviews of the clinical effectiveness of RS and THR and of registry reporting and cost-effectiveness studies in December 2012. For the clinical effectiveness review, searches were undertaken in 12 databases including MEDLINE, Science Citation Index, The Cochrane Library and Current Controlled Trials and were limited to studies published from 2008 onwards and including sample sizes of ≥ 100 participants. Two independent reviewers screened all records, extracted data and independently assessed risk of bias. Estimates of effectiveness were pooled and the quality of the evidence was assessed using the GRADE approach.

Although we appraised and summarised a very large amount of evidence, much of it was inconclusive because of poor reporting, missing data, inconsistent results, inappropriate pooling methods, inconsistent summary findings and uncertainty in treatment effect estimates. Improvements post surgery were reported for functional/clinical measures and quality-of-life measures regardless of the type of THR or RS. Evidence on the relative benefits of RS compared with THR or of different types of THR was largely lacking. Certain types of THR appeared to confer some benefit, including larger femoral head size, use of a cemented cup, use of a cross-linked polyethylene cup liner and use of a ceramic-on-ceramic as opposed to a metal-on-polyethylene articulation, although the findings were not conclusive or reflected short-term follow-up. Systematic reviews of cost-effectiveness and registry studies worldwide provided costs for revision and follow-up, corroboratory utility data and registry data for validating the survival analysis. For both research questions we drew on our systematic reviews of clinical effectiveness and cost-effectiveness and registry data to identify inputs for the models to compare the clinical effectiveness and cost-effectiveness of RS with that of different types of THR and different types of THR with each other.

For the cost-effectiveness analyses we used the NJR to identify populations undergoing the various types of interventions. We identified the group undergoing RS, but it became clear that the number of possible categories for those undergoing THR was very large. Using a series of cross-tabulations by combinations of components, we identified the four most commonly used categories of THR (> 25,000 in the database) and our clinical advisors recommended the inclusion of a fifth, mutually exclusive category. We identified time to revision for all categories by age and sex using NJR data, investigated a large number of methods for extrapolating beyond the observed data and tested goodness of fit.

We built a Markov, multistate model to investigate both RS and THR. Health states included successful primary surgery, revision surgery, successful revision surgery and death. Cycle length was 1 year. We adopted a 10-year and a lifetime horizon from the perspective of the NHS and PSS. We applied an annual discount rate of 3.5% to both costs and outcomes and ran the model deterministically and probabilistically. We undertook a large number of sensitivity analyses. The economic model was independently reviewed and adjusted in response to this.
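The structure described above can be illustrated with a minimal cohort sketch. It assumes time-constant transition probabilities, end-of-cycle discounting and no half-cycle correction; the report's model used time-varying revision rates, so this is a simplification. All transition probabilities, utilities and costs below are hypothetical placeholders. Only the four health states, the 1-year cycle and the 3.5% discount rate come from the text.

```python
# Illustrative Markov cohort sketch. Every number below except the discount
# rate is a placeholder assumption, not an input from the report's model.
STATES = ["primary_ok", "revision", "revision_ok", "dead"]

# Hypothetical annual transition probabilities; each row sums to 1.
TRANSITIONS = {
    "primary_ok":  {"primary_ok": 0.97, "revision": 0.01, "revision_ok": 0.00, "dead": 0.02},
    "revision":    {"primary_ok": 0.00, "revision": 0.00, "revision_ok": 0.97, "dead": 0.03},
    "revision_ok": {"primary_ok": 0.00, "revision": 0.02, "revision_ok": 0.96, "dead": 0.02},
    "dead":        {"primary_ok": 0.00, "revision": 0.00, "revision_ok": 0.00, "dead": 1.00},
}

# Hypothetical per-cycle utilities and costs (GBP) attached to each state.
UTILITY = {"primary_ok": 0.75, "revision": 0.55, "revision_ok": 0.70, "dead": 0.0}
COST = {"primary_ok": 50.0, "revision": 8000.0, "revision_ok": 50.0, "dead": 0.0}

DISCOUNT_RATE = 0.035  # annual rate applied to both costs and outcomes


def run_cohort(n_cycles):
    """Return (discounted cost, discounted QALYs, final state distribution).

    The whole cohort starts in 'primary_ok'; the cost of the primary
    operation itself is deliberately left out of this sketch.
    """
    dist = {state: 0.0 for state in STATES}
    dist["primary_ok"] = 1.0
    total_cost = 0.0
    total_qalys = 0.0
    for cycle in range(1, n_cycles + 1):
        # Move the cohort through one 1-year cycle.
        dist = {
            target: sum(dist[source] * TRANSITIONS[source][target] for source in STATES)
            for target in STATES
        }
        factor = 1.0 / (1.0 + DISCOUNT_RATE) ** cycle
        total_cost += factor * sum(dist[s] * COST[s] for s in STATES)
        total_qalys += factor * sum(dist[s] * UTILITY[s] for s in STATES)
    return total_cost, total_qalys, dist


cost_10, qalys_10, dist_10 = run_cohort(10)   # 10-year horizon
cost_60, qalys_60, dist_60 = run_cohort(60)   # stand-in for a lifetime horizon
print(f"10-year: cost {cost_10:,.0f} GBP, QALYs {qalys_10:.2f}")
print(f"60-year: cost {cost_60:,.0f} GBP, QALYs {qalys_60:.2f}")
```

A probabilistic run would repeat `run_cohort` with inputs redrawn from distributions, which this sketch does not attempt.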

We found that the ages and sexes of RS and THR patients overlapped substantially such that with the data available it was impossible to identify mutually exclusive cohorts eligible for both THR and RS.

We therefore used propensity matching to compare RS with THR, drawing age–sex matched pairs from the RS data set and from the five categories of THR combined. We used NHS Supply Chain costs for ‘major hip procedures’, drawing on the same nationally available HRG4 reference costs for both RS and THR for follow-up and revision. We used age- and sex-adjusted utility values from the PROMs data set, using the same utility values for both procedures for before and after hip replacement and for revision because no separate utility values were reported for RS.
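The idea of drawing age–sex matched pairs can be sketched as follows. This is a simplified stand-in that matches exactly on age and sex, without replacement; the chapter does not describe the matching algorithm the report used, so the function, the record fields and the example records are all hypothetical.

```python
import random
from collections import defaultdict


def match_pairs(rs_patients, thr_patients, seed=0):
    """Pair each RS patient with one unused THR patient of the same age and sex.

    RS patients with no remaining THR candidate are left unmatched.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for patient in thr_patients:
        pool[(patient["age"], patient["sex"])].append(patient)
    for candidates in pool.values():
        rng.shuffle(candidates)  # avoid always picking the first record on file
    pairs = []
    for rs in rs_patients:
        candidates = pool[(rs["age"], rs["sex"])]
        if candidates:
            pairs.append((rs, candidates.pop()))  # pop => matched without replacement
    return pairs


# Hypothetical records for illustration only.
rs_records = [
    {"id": "rs1", "age": 50, "sex": "M"},
    {"id": "rs2", "age": 50, "sex": "M"},
    {"id": "rs3", "age": 62, "sex": "F"},
]
thr_records = [
    {"id": "t1", "age": 50, "sex": "M"},
    {"id": "t2", "age": 62, "sex": "F"},
    {"id": "t3", "age": 70, "sex": "F"},
]
matched = match_pairs(rs_records, thr_records)
for rs, thr in matched:
    print(rs["id"], "<->", thr["id"])
```

Matching on age and sex alone cannot balance characteristics that are not recorded, such as activity level.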

We used age- and sex-specific PROMs data and assessed estimates of cost-effectiveness for men and women aged 40, 50 and 60 years using lifetime revision rates and undertook sensitivity analyses stratified by sex and controlled for age.

We compared the five categories of THR with each other, investigating patients eligible for THR (all patients) and those less eligible for RS (aged > 65 years). For the base case we used costs supplied by the manufacturers for each of the components of THR. We used alternative costs including those supplied by local trusts when manufacturer costs were not available and alternative manufacturers’ costs in sensitivity analyses. We used age- and sex-adjusted PROMs utility values for health state utilities.

We undertook sensitivity analyses and analyses of cost drivers including investigating changes in age and sex categories, stratifying by age (< 65 years and > 65 years), investigating different methods of extrapolation of revision rates (using a log-normal model) and varying prosthesis costs (using NHS list prices) and discount rates.

The NJR included just under 420,000 patients. Approximately 31,000 (7.4%) patients had undergone RS. Our identified categories of THR covered 62% of the THR population. In total, 90% of RS patients and 23% of THR category patients were aged < 65 years. Bathtub models (predicting an increasing likelihood of revision over time) gave the best fit to the observed data. PROMs data showed large utility gains: from 0.35 pre intervention to 0.78 post intervention and from 0.53 pre revision to 0.78 post revision.

Revision rates for all RS were always higher than those for THR (all THR, all of our identified categories of THR combined, each of our THR categories separately). The mean cost of RS was £2672 and the weighted mean cost of THR was £2571.

Costs for RS were higher than those for THR and mean QALYs gained were lower. The incremental analysis showed that RS was dominated by THR (over a lifetime horizon in the base-case analysis, the incremental cost of RS was £11,490 and the incremental QALYs were –0.0879). Very similar results were obtained in the deterministic and probabilistic analyses of RS compared with THR and when THR was analysed separately in sensitivity analyses for all age and sex groups. RS remained clearly dominated by THR. CEACs showed that, for all patients, THR had an almost 100% probability of being cost-effective at any WTP level.
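The dominance finding follows directly from the signs of the incremental values, as this small sketch shows. The function `classify` is an illustrative helper written for this example, not part of the report's model; the two numbers passed to it are the base-case lifetime increments quoted above.

```python
def classify(delta_cost, delta_qalys):
    """Classify an intervention against its comparator from incremental values."""
    if delta_cost == 0 and delta_qalys == 0:
        return "equivalent"
    if delta_cost >= 0 and delta_qalys <= 0:
        return "dominated"   # no cheaper and no more effective
    if delta_cost <= 0 and delta_qalys >= 0:
        return "dominant"    # no more costly and no less effective
    # Remaining cases have costs and effects moving in the same direction,
    # so a ratio is meaningful (delta_qalys cannot be zero here).
    return f"ICER = {delta_cost / delta_qalys:,.0f} GBP per QALY"


# RS versus THR, lifetime horizon, base case (figures from the text).
rs_vs_thr = classify(delta_cost=11_490, delta_qalys=-0.0879)
print(rs_vs_thr)  # prints "dominated"
```

Because RS costs more and yields fewer QALYs, no willingness-to-pay threshold can make it the preferred option in this comparison, which is why a ratio is not reported.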

The five categories of commonly used types of THR that we investigated are cemented–cemented with a polyethylene–metal articulation (CeMoP, category A) (125,285 patients); cementless–cementless with a polyethylene–metal articulation (CeLMoP, category B) (37,874 patients); cementless–cementless with a ceramic–ceramic articulation (CeLCoC, category C) (34,754 patients); hybrid (cementless–cemented) with a polyethylene–metal articulation (HyMoP, category D) (28,471 patients); and cemented–cemented with a polyethylene–ceramic articulation (CeCoP, category E) (12,705 patients).

There were age and sex differences between the populations receiving different types of THR and variations in revision rates between category E (1.6%) and category C (3.5%) at 9 years (for all interventions, revision rates at 9 years were well under 10%). The prosthesis cost varied between £1557.38 for category A (CeMoP) and £3868.80 for category C (CeLCoC).

In the base-case analysis, for all age and sex groups combined and using a bathtub model (indicating an increasing likelihood of need for revision with time) and a lifetime horizon, category E dominated the other four categories. The mean cost for category E was slightly lower and the mean QALYs gained for category E were slightly higher than the corresponding values for all other THR categories for both the deterministic and the probabilistic analysis.

In the deterministic analysis, compared with category E, category A (CeMoP) cost £278 more (£14,801 vs. £14,523) and generated 0.0022 fewer QALYs (14.7887 vs. 14.7909) and the probabilistic results were very similar. Over a lifetime horizon, category E was 99.9% likely to be cost-effective whereas category A was 1% likely to be cost-effective at a WTP of £20,000 per QALY.
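One way to see how small these differences are is to express each category's result as a net monetary benefit (NMB), calculated as WTP × QALYs − cost. NMB is a standard reformulation introduced here for illustration; the chapter does not say that the report presented results in this form. The costs and QALYs are the deterministic lifetime figures quoted above.

```python
WTP = 20_000  # willingness to pay per QALY (GBP), as quoted in the text

# Deterministic lifetime results quoted in the text for categories A and E.
results = {
    "A (CeMoP)": {"cost": 14_801, "qalys": 14.7887},
    "E (CeCoP)": {"cost": 14_523, "qalys": 14.7909},
}


def net_monetary_benefit(cost, qalys, wtp=WTP):
    """Monetary value of the QALYs gained minus the cost incurred."""
    return wtp * qalys - cost


nmb = {name: net_monetary_benefit(**r) for name, r in results.items()}
advantage_e = nmb["E (CeCoP)"] - nmb["A (CeMoP)"]
print(f"NMB advantage of E over A: about {advantage_e:,.0f} GBP per patient")
```

At £20,000 per QALY the advantage of category E is roughly £322 per patient, of which £278 comes from the cost difference and about £44 from the 0.0022 QALY difference. Differences in cost therefore matter far more here than the small difference in QALYs.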

For patients aged > 65 years, over a 10-year time horizon, and at a WTP of £20,000 per QALY, category A was more likely to be cost-effective in all groups (category A: 99% probability of being cost-effective; categories B–E: < 1% probability of being cost-effective), although category E was more effective over a lifetime horizon for all groups (except for men aged 80 years for whom the QALYs generated by categories A and E were the same).

Sensitivity analysis for all age–sex groups combined using a log-normal model (indicating a decreasing risk of revision over time) and a lifetime horizon resulted in category A having an 85% probability of being cost-effective at a WTP threshold of £20,000 per QALY. Further sensitivity analysis using an age- and sex-adjusted log-normal model likewise showed that, over a lifetime horizon, category A had a 100% probability of being cost-effective at a WTP of £20,000 per QALY.

The main drivers of differences between category A and category E were found to be the costs of the components, discount rates and modelled revision rates.

Strengths and limitations

We undertook rigorous systematic reviews and we believe that we identified all relevant publications concerning the clinical effectiveness and cost-effectiveness of both THR and RS as well as all available registry results. However, given the wide scope and large amount of identified evidence, we limited our inclusion for clinical effectiveness studies to those with a sample size of ≥ 100 and those published since 2008. This decision was based on our sample size calculations for clinically important differences in the HHS and the fact that smaller studies tend to be underpowered to detect meaningful differences in continuous outcomes. We pooled data when possible and used the GRADE system for assessing overall quality.

We did not find any longer-term RCTs covering the comparison between RS and THR or between different types of THR that would allow us to model differences in revision rates for RS or THR relevant to a lifetime horizon. We therefore had to use nationally collected non-randomised clinical audit data from the NJR. The NJR has a high reported coverage with good quality assessment systems and NJR data were complete for patient age and sex at the time of receipt of the THR.

However, the non-randomised nature of the database means that selection bias may be operating within the data. For example, revision rates may be higher because those selected to receive one intervention rather than another (e.g. RS) belong to a group with an adverse profile in the population. We worked to reduce confounding by propensity matching RS patients with THR patients using NJR data and by undertaking extensive analyses by age and sex for the comparisons of different types of THR. However, we were unable to adjust for confounders of which we were unaware.

The number of unique prosthesis types used for THR patients was large, even without taking into account the variety of manufacturer brands available for the different components. It was necessary to reduce these to a smaller number for economic analysis. For the comparisons of different types of THR we therefore used cross-tabulations to generate the largest categories of THR. Selection was based on the frequency of use of different categories of prosthesis and on expert clinical opinion. The selection of the five THR categories was conducted pre hoc and before all analyses of revision rates. To our knowledge this is the first time that different THR components have been investigated in this comparative way – it allows for a more granular approach to assessing the cost-effectiveness of different types of THR than previously and has the advantage of more precisely reflecting current practice.

We were able to assess only a relatively small number of categories (five) as we needed to generate appropriate costings of subcomponents and to have enough patients in each category to model revision rates reliably. This meant that we were unable to include some of the less popular combinations of components for hip replacement (38% of THRs). However, we modelled revision rates and survival rates using all hip replacements to assess how our categories A–E compared with those for RS. We found that the overall revision rate was slightly higher than the revision rates for categories A–E. Given this finding we consider that our comparisons are likely to have focused on the more cost-effective THR options.

Age and sex distributions varied between categories. When populations were controlled for differences in age and sex or were stratified by sex and controlled for age, the lower revision rate for category E relative to the other categories remained. Also, when well-fitting models that predicted either an increasing or a decreasing hazard on extrapolation were used, the superiority of the category E revision rate was again upheld. There was insufficient information recorded consistently within the NJR for investigation of other potential confounders. For example, our clinical advisors suggested that selection of patients for RS may be made by surgeons based on activity levels (levels of physical fitness, athleticism, weight lifting, manual labour); however, the only characteristics that were reliably collected at the patient level in the NJR were age and sex. This means that we were unable to identify other characteristics or subpopulations in which RS might be more beneficial. However, age and sex may act as a proxy for physicality and it is of interest that revision rates for RS were higher in every age and sex group that we examined, including in the youngest category of men.

For revision rates the unit of analysis was the time to a patient’s first revision. For patients who received a THR for both hips simultaneously, only the replacement that failed first was included as an event; for those who received a THR for both hips on separate occasions, only the first primary intervention entered the analysis. To model revision rates we followed NICE DSU366 recommendations in first exploring exponential, Weibull, Gompertz, log-normal and log-logistic models of observed revision rates based on IPD. However, previous economic analyses of hip replacement, notably those of Briggs et al.,38 Higashi and Barendregt273 and Pennington et al.,44 modelled revision rates on the assumption of a U-shaped hazard. In these an assumed high hazard for failure associated with surgery is followed by a decreasing hazard that eventually plateaus during an initial recovery period and is then followed by a gradually increasing hazard as host bone deteriorates with patient age and the prosthesis accumulates wear and tear. The resulting hazard curve is commonly termed a bathtub hazard. We therefore also explored bathtub models to extrapolate revision rates beyond the observed data.
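The shape of a bathtub hazard can be sketched by adding a decreasing early term, a constant term and an increasing late term. This additive Weibull-type form and all parameter values are assumptions chosen for illustration; it is not the model fitted in the report, and it ignores the competing risk of death.

```python
import math

# Hypothetical parameters: H(t) = A * t**B + C * t + D * t**E, with B < 1 < E.
A, B = 0.01, 0.5      # early failures: hazard falls after surgery
C = 0.001             # constant background hazard
D, E = 0.0001, 2.0    # late failures: hazard rises with wear and ageing


def hazard(t):
    """Instantaneous revision hazard at t > 0 years (derivative of H)."""
    return A * B * t ** (B - 1) + C + D * E * t ** (E - 1)


def cumulative_hazard(t):
    """Cumulative hazard H(t) for t >= 0 years."""
    return A * t ** B + C * t + D * t ** E


def cumulative_revision(t):
    """Probability of at least one revision by time t, ignoring death."""
    return 1.0 - math.exp(-cumulative_hazard(t))


for years in (0.5, 5, 30):
    print(f"t = {years:>4} y: hazard {hazard(years):.4f}, "
          f"cumulative revision {cumulative_revision(years):.3f}")
```

With these placeholder values the hazard is higher at 6 months and at 30 years than at 5 years, giving the U shape. Writing the cumulative probability as 1 − exp(−H(t)) keeps it below 100% at any extrapolation horizon.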

For most age groups this offered the best fit to the observed data but, for patients aged > 85 years, the revision rate during the observation period was low and extrapolation with an increasing hazard was less appropriate. We derived the bathtub hazard directly using the Stata package developed by Crowther and Lambert.360 Pennington et al.44 employed a piecewise procedure to generate the U-shaped hazard; however, after extrapolation this predicted that > 100% of patients sustained revision and at this point the rate required capping. A strength of this work is that we tested a large number of methods for extrapolating the revision rate including competing risk analysis and flexible parametric models.

For RS a wide range of femoral head sizes is used and revision rates have been reported to vary according to head size.15 Only a narrow range of head sizes is used for THR prostheses and expert clinical opinion indicated that these are unrelated to RS head sizes, so comparisons of RS and THR according to head size were not undertaken. It is of interest that we identified only one RCT investigating different THR head sizes. This demonstrated an advantage from a larger head size (36 mm vs. 28 mm) and had a low risk of bias, although so far follow-up has continued for only 1 year.

Utilities for both models for the base-case analysis were obtained from the national PROMs database, which is comprehensive. We were unable to link NJR and PROMs data; however, we adjusted EQ-5D scores for the successful primary health state and successful revision health state to reflect age and sex differences. In our economic model we assumed that costs and utilities were the same for both RS and THR. Our model is therefore likely to represent a fair comparison but is also likely to underestimate the prosthesis cost for RS, which has been reported to be more expensive than that for THR.40 In spite of this assumption we found THR to be cost-effective (dominant) compared with RS for all age (40, 50 and 60 years) and sex groups.

Although we undertook a rigorous systematic search for cost-effectiveness studies, little information was available in the literature to estimate costs and resource usage. We could identify only one cost–utility analysis of RS compared with THR from a RCT.40 The costs of follow-up in our model were based on this trial; however, we assumed that the costs of follow-up were the same for the first and subsequent years across the lifetime of the model. This may have overestimated the cost of follow-up although it was applied equally to both comparators in the model.

The cost of the prosthesis varied between THR categories. Category A was the least expensive but category E had lower revision rates and generated more QALYs over the lifetime horizon. We used prices for prosthesis components obtained from the NHS Supply Chain. We undertook a sensitivity analysis based on the highest (category C, £3868.80) and the lowest (category A, £1557.38) prices. So as not to disadvantage any one category, the costs of the prostheses used in revision surgery were assumed to be the same across categories. This is likely to underestimate differences in the costs of revision. We were unable to incorporate adverse events that were not severe enough to lead to revision, although we were able to weight revision costs by different reasons for revision.

Ideally, outcomes, including adverse events, costs and quality-of-life data, would be collected for each patient in a single audit database. This was not the case and we had to use separate databases for outcomes and quality of life without the possibility of linking these. However, we carried out sensitivity analyses to take account of possible cost and modelled revision rate differences. We based our economic model on previous research but a strength is that we had an independent critique and assessment of our model and altered its structure in relation to these external comments.

Copyright © Queen’s Printer and Controller of HMSO 2015. This work was produced by Clarke et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK273954
