Assessment of evidence

Aileen Clarke; Ruth Pulikottil-Jacob; Amy Grove; Karoline Freeman; Hema Mistry; Alexander Tsertsvadze; Martin Connock; Rachel Court; Ngianga-Bakwin Kandala; Matthew Costa; Gaurav Suri; David Metcalfe; Michael Crowther; Sarah Morrow; Samantha Johnson; Paul Sutcliffe

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Clarke A, Pulikottil-Jacob R, Grove A, et al. Total hip replacement and surface replacement for the treatment of pain and disability resulting from end-stage arthritis of the hip (review of technology appraisal guidance 2 and 44): systematic review and economic evaluation. Southampton (UK): NIHR Journals Library; 2015 Jan. (Health Technology Assessment, No. 19.10.)

Chapter 4 Assessment of evidence

Methods for the review of clinical effectiveness

A protocol was developed and approved by NICE (www.nice.org.uk/nicemedia/live/13690/62831/62831.pdf). General principles were applied as recommended by the NHS Centre for Reviews and Dissemination (CRD).⁹⁸

This report contains reference to confidential information provided as part of the NICE appraisal process. This information has been removed from the report and the results, discussions and conclusions of the report do not include the confidential information. These sections are clearly marked in the report.

Identification of studies

Initial scoping searches were undertaken in MEDLINE in October 2012 to assess the volume and type of literature relating to the assessment question. The scoping searches also informed development of the final search strategies (see Appendix 1). An iterative procedure was used to develop these strategies with input from clinical advisors and previous HTA reports (e.g. Vale et al.,¹⁹ de Verteuil et al.¹¹). The strategies were designed to capture generic terms for arthritis, THR and RS.

Search strategies

Final searches were undertaken in November and December 2012 (see Appendix 1) and were date limited from 2002 (the date of the most recent NICE guidance in this area²⁵). Searches of the clinical effectiveness literature were restricted to RCTs and systematic reviews; additional searches were undertaken to capture literature relating to costs, resource use, utilities, cost-effectiveness, cost-effectiveness models and registries to inform the survival and cost-effectiveness analysis.

The following main sources were searched to identify relevant published and unpublished studies and studies in progress:

electronic bibliographic databases
contact with experts in the field
references of included studies
screening of relevant websites.

The following databases of published studies were searched: MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, EMBASE, Science Citation Index and Conference Proceedings Citation Index – Science, The Cochrane Library [specifically the Cochrane Database of Systematic Reviews (CDSR), Cochrane Central Register of Controlled Trials (CENTRAL), Database of Abstracts of Reviews of Effects (DARE), NHS Economic Evaluation Database (NHS EED), HTA database], Current Controlled Trials, ClinicalTrials.gov and UK Clinical Research Network (UKCRN) Portfolio Database. The search strategies were initially developed for MEDLINE and were adapted as appropriate for other databases.

The reference lists of included studies and relevant review articles were checked and the following websites of hip implant manufacturers were screened for relevant publications:

Amplitude
Biomet
B Braun/Aesculap
Comis Orthopaedics
Corin
DePuy
Exactech
Finsbury
JRI Orthopaedics
Implantcast
Implants International
Lima WG Healthcare
Mathys Orthopaedics
Medacta UK
Orthodynamics
Peter Brehm
SERF Dedienne santé
Smith & Nephew
Stanmore Implants Worldwide
Stryker
Symbios SA
Waldemar Link
Wright Medical UK
Zimmer, Inc.

Grey literature searches were undertaken using Google (Google Inc., Mountain View, CA, USA) and the online resources of the following regulatory bodies, health services, research agencies and professional societies:

British Hip Society
British Orthopaedic Association
Orthopaedic Research UK
ODEP
NJR
Arthritis Research UK
Cochrane Musculoskeletal Group
Arthritis Care
MHRA
American Association of Hip and Knee Surgeons
American Academy of Orthopedic Surgeons (AAOS)
The Hip Society
Royal College of Surgeons
Royal College of Surgeons of Edinburgh.

All bibliographic records identified through the electronic searches were collected in a managed reference database.

Inclusion criteria

Study design

RCTs.
Systematic reviews.
Meta-analyses.

Given the wide scope and large amount of identified evidence, we limited studies to those published since 2008 with a sample size of ≥ 100 participants.

Population

People with pain or disability resulting from end-stage arthritis of the hip for whom non-surgical management has failed.

Intervention

Elective primary THR.
Primary hip RS.

Comparator

Different types of primary THR compared with RS for people in whom both procedures are suitable.
Different types of primary THR compared with each other for people who are not suitable for hip RS.

Outcomes

Clinical effectiveness outcome measures were mortality, validated functional/pain and health-related quality of life total scores, revision rate, implant survival rate and femoral head penetration rate (measure of prosthesis movement). Adverse events included incidence of peri-/postprocedural complications (i.e. implant dislocation, infection, osteolysis, aseptic loosening, femoral fracture and deep-vein thrombosis).

Exclusion criteria

The exclusion criteria were as follows:

indications for hip replacement other than end-stage arthritis of the hip
revision surgery as the primary procedure of interest
abstract/conference proceedings, letters and commentaries
non-English language publications.

Study selection process

All retrieved records were collected in a specialised database. All duplicate records were identified and removed. Two reviewers pilot tested an a priori screening form based on the predefined study eligibility criteria. Afterwards, two independent reviewers applied the same inclusion/exclusion criteria and screened all identified bibliographic records for title/abstract (level I) and then for full text (level II). Disagreements over eligibility were resolved through consensus or by a third party reviewer. Reasons for exclusion of full-text papers were documented. The study flow was documented using a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagram.⁹⁹

Quality assessment strategy

Two reviewers independently assessed the risk of bias of individual studies using validated tools (see Appendix 2).¹⁰⁰^,¹⁰¹ Any disagreements between the two reviewers were resolved by a third reviewer through discussion.

Randomised controlled trials were assessed using the Cochrane Collaboration risk of bias tool,¹⁰⁰ which covers the following domains of threat to internal validity: selection bias (randomisation sequence generation, treatment allocation concealment), performance bias (blinding of participants/personnel), detection bias (blinding of outcome assessors), attrition bias (incomplete outcome data), reporting bias (selective outcome/analysis reporting) and other prespecified bias [e.g. funding source, adequacy of statistical methods used, type of analysis (intention to treat/per protocol), imbalance in the distribution of baseline prognostic factors between the compared treatment groups]. The risk of bias assessment results fall into three distinct categories of high, low and unclear risk of bias. For each RCT, the risk of bias for the performance, detection and attrition bias domains was assessed for a priori defined groups of subjective (e.g. patient-administered clinical and functional scores) and objective (e.g. mortality, revision, survival, radiography result, complications) outcomes separately. Afterwards, the within-study summary risk-of-bias rating across all of the domains was derived for subjective and objective outcomes separately. The decision for determining the within-study summary risk of bias was based on the ratings prevailing for the selection, performance and detection bias domains. At data synthesis stage, the across-study average summary risk of bias was determined and assigned to each outcome of interest.

The methodological quality of included systematic reviews was assessed with the Assessment of Multiple Systematic Reviews (AMSTAR) tool,¹⁰¹ which covers the following domains: (1) research question, (2) inclusion/exclusion criteria, (3) search strategy (at least two major electronic databases), (4) data extraction by independent reviewers, (5) assessment of risk of bias by independent reviewers, (6) consideration of risk of bias in the analysis, (7) exploration of heterogeneity and (8) publication bias. For convenience of presentation, the methodological quality of each systematic review was graded according to the number of items satisfied as follows: high (range 9–11), medium (range 5–8) and low (range 0–4).
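The grading bands described above can be expressed as a small lookup. The following is a minimal sketch; the function name `grade_amstar` is hypothetical and the cut-offs are taken directly from the ranges stated in the text (9–11, 5–8, 0–4).

```python
def grade_amstar(items_satisfied: int) -> str:
    """Map the number of AMSTAR items satisfied (0-11) to a quality grade.

    Bands follow the text: high (9-11), medium (5-8), low (0-4).
    """
    if not 0 <= items_satisfied <= 11:
        raise ValueError("AMSTAR item count must be between 0 and 11")
    if items_satisfied >= 9:
        return "high"
    if items_satisfied >= 5:
        return "medium"
    return "low"


# Example: a review satisfying 6 of the 11 items falls in the medium band.
print(grade_amstar(6))
```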

Grading the overall quality of clinical effectiveness evidence

The overall quality of evidence for each preselected (i.e. gradable) outcome across studies was assessed using the systematic approach developed by the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) Working Group (see www.gradeworkinggroup.org).

The GRADE approach¹⁰² indicates levels of confidence in the observed treatment effect estimate(s), which are categorised as high, moderate, low or very low. The grading of overall quality of evidence for each gradable outcome is based on assessments across five domains: (1) summary risk of bias across studies per gradable outcome (internal validity across studies, study limitations), (2) consistency of results (heterogeneity), (3) directness of the evidence (applicability of the results, indirect treatment comparisons), (4) precision of the results (the width of the 95% CI around the estimate) and (5) publication/reporting bias (detection of asymmetry in the funnel plot, selective outcome reporting). The definitions and explanations of the grading levels and the grading process across the five domains are presented later in this chapter (see Tables 35 and 43).

TABLE 35

Grading of Recommendations, Assessment, Development and Evaluation evidence profile for gradable outcomes reported in RCTs of THR

TABLE 43

Grading of Recommendations, Assessment, Development and Evaluation evidence profile for gradable outcomes reported in RCTs of THR vs. RS

The gradable outcomes, selected according to their meaningfulness and importance for decision-making, were the following: HHS, WOMAC score, revision, mortality, femoral head penetration rate and implant dislocation.

Data extraction strategy

The relevant data were extracted from included studies independently by one reviewer using a data extraction form informed by the CRD.¹⁰³ The extracted data were cross-checked by a second reviewer. Uncertainty and/or any disagreements with the second researcher were resolved by discussion. The extracted data were entered into summary and full extraction tables (see Appendices 3 and 4, respectively). The extracted information included the following:

Study characteristics (i.e. authors, country, design, study setting, sample size, funding source, duration of follow-up and information relevant to risk-of-bias assessment such as generation of randomisation, allocation concealment, blinding, completeness of outcome ascertainment and patient withdrawals/attrition for randomised trials; for observational studies and non-randomised trials, information on potential confounding was additionally ascertained).
Patient baseline characteristics [i.e. inclusion/exclusion criteria, number of enrolled/analysed participants, age, race, sex, body mass index (BMI), underlying conditions, concomitant conditions, co-interventions, disability, activity levels, function, pain intensity and quality of life and disease-specific measures such as the OHS³⁰ and HHS³¹].
Experimental treatment characteristics (e.g. type – THR, RS; training/experience of the operator and postoperative rehabilitation staff; method of fixation – cemented, cementless, hybrid; bearing surface material – metal-on-metal, ceramic-on-ceramic, polyethylene-on-metal; femoral head size; name/brand and country of manufacturer; postoperative rehabilitation).
Outcome characteristics [e.g. definition; timing of measurement; scale of measurement – dichotomous, continuous; measures of association – mean difference (MD), risk ratio (RR), odds ratio (OR), hazard ratio (HR)]. Statistical test results and measures of variability were also extracted [standard deviation (SD), 95% CI, standard error (SE), p-value].

Any additional relevant information found in multiple publications of included studies was also extracted. For studies of clinical effectiveness in which summary measures and 95% CIs for the association between the treatments were not reported, MDs with 95% CIs were calculated if data allowed (t-tests for independent samples and continuous outcomes and RRs for dichotomous outcomes). No RRs and 95% CIs were estimated for individual studies that observed zero events in one or both treatment arms. The 95% CIs and SEs were used to derive SDs or vice versa. All calculated parameters were entered into the data extraction sheets.
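The conversions and effect-size calculations described above can be illustrated as follows. This is a sketch under stated assumptions, not the review's own procedure: it uses a normal approximation (z = 1.96) rather than the t-distribution, an unpooled variance for the MD, and the standard log-scale SE for the RR. All function names are hypothetical.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% CI (assumption: large samples)


def sd_from_se(se: float, n: int) -> float:
    """SD of a single group from the standard error of its mean."""
    return se * math.sqrt(n)


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error recovered from the limits of a 95% CI."""
    return (upper - lower) / (2 * Z95)


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """MD between two independent groups with an approximate 95% CI."""
    md = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return md, md - Z95 * se, md + Z95 * se


def risk_ratio(events1, n1, events2, n2):
    """RR with a 95% CI on the log scale; None when either arm has zero events,
    mirroring the rule in the text that no RR is estimated in that case."""
    if events1 == 0 or events2 == 0:
        return None
    rr = (events1 / n1) / (events2 / n2)
    se_log = math.sqrt(1 / events1 - 1 / n1 + 1 / events2 - 1 / n2)
    return rr, math.exp(math.log(rr) - Z95 * se_log), math.exp(math.log(rr) + Z95 * se_log)


# Illustrative (made-up) inputs, not data from the included trials.
print(mean_difference(80.0, 10.0, 100, 78.0, 10.0, 100))
print(risk_ratio(10, 100, 5, 100))
```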

Data management

Study, treatment, population and outcome characteristics were summarised in text, evidence and summary tables. The study results were compared qualitatively and quantitatively in text and summary tables. For each outcome of interest, the effectiveness of treatments reported in individual studies was compared as follows:

different types of primary THR compared with each other for people who are not suitable for hip RS
different types of primary THR compared with RS for people in whom both procedures are suitable.

Meta-analysis

The decision to pool individual study results was based on a degree of similarity with respect to methodological and clinical characteristics of studies under consideration (e.g. design, population, comparator treatment and outcome). Estimates of post-treatment MDs for continuous outcomes and RRs for binary outcomes (except for rare events) of individual studies were pooled using a DerSimonian and Laird random-effects model.¹⁰⁴ The choice of this model was based on the assumption that some residual clinical and methodological diversity will exist across pooled studies. Dichotomous outcomes with low event rates (5.0–10.0%) were pooled as RRs using a Mantel–Haenszel fixed-effects model. Dichotomous outcomes for studies with very low event rates (≤ 5.0%) or zero events in one of the treatment arms were pooled as ORs using a Peto fixed-effects model.¹⁰⁵
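For readers unfamiliar with the random-effects approach named above, the DerSimonian and Laird estimator can be sketched in a few lines. The review itself used Review Manager; the code below is only an illustration of the method, with hypothetical names, taking study-level effect estimates (e.g. MDs or log RRs) and their standard errors as inputs.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def dersimonian_laird(effects, ses):
    """Pool study effects with a DerSimonian-Laird random-effects model.

    Returns (pooled estimate, CI lower, CI upper, tau-squared).
    """
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two studies with matching SEs")
    w = [1.0 / se ** 2 for se in ses]                  # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0    # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - Z95 * se_pooled, pooled + Z95 * se_pooled, tau2


# Illustrative (made-up) inputs: two studies with equal precision.
print(dersimonian_laird([0.0, 4.0], [1.0, 1.0]))
```

When the estimated between-study variance is zero, the random-effects weights reduce to the inverse-variance weights and the result coincides with a fixed-effect pooled estimate.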

Trials were not pooled if the mean and/or SD for the continuous outcome of interest could not be ascertained.

The degree of statistical heterogeneity across pooled studies was determined through inspection of the forest plots, Cochran’s Q and the I² statistic. The presence of heterogeneity was judged according to predetermined levels of statistical significance (chi-squared p < 0.10 and/or I² > 50%). Statistical pooling was performed using The Cochrane Collaboration software package Review Manager version 5.2 (The Cochrane Collaboration, The Nordic Cochrane Centre, Copenhagen, Denmark).
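Cochran's Q and the I² statistic mentioned above are related by a simple formula, sketched below. This is an illustrative helper with hypothetical names; it applies only the I² > 50% criterion, because the chi-squared p-value for Q would need a statistical library that is not assumed here.

```python
def heterogeneity(effects, ses):
    """Return (Q, degrees of freedom, I-squared as a percentage)."""
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two studies with matching SEs")
    w = [1.0 / se ** 2 for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributable to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


def is_heterogeneous(i2_percent: float, threshold: float = 50.0) -> bool:
    """Flag heterogeneity using the I-squared criterion only (assumption: p-value not checked)."""
    return i2_percent > threshold


# Illustrative (made-up) inputs.
q, df, i2 = heterogeneity([0.0, 4.0], [1.0, 1.0])
print(q, df, i2, is_heterogeneous(i2))
```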

Publication bias

It was planned to examine the extent of publication bias, given a sufficient number of data points, by visual inspection of funnel plots with respect to plot asymmetry as well as using linear regression tests.¹⁰⁶
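One common linear regression test for funnel plot asymmetry is an Egger-type regression, in which the standardised effect (effect/SE) is regressed on precision (1/SE) and an intercept far from zero suggests asymmetry. Whether this is the exact test the authors planned is an assumption; the sketch below, with hypothetical names, only illustrates the idea and returns the intercept, slope and the intercept's standard error.

```python
import math


def egger_regression(effects, ses):
    """Ordinary least squares of effect/SE on 1/SE (Egger-type test).

    Returns (intercept, slope, standard error of the intercept).
    """
    n = len(effects)
    if n != len(ses) or n < 3:
        raise ValueError("need at least three studies with matching SEs")
    x = [1.0 / se for se in ses]                   # precision
    y = [e / se for e, se in zip(effects, ses)]    # standardised effect
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same SE; regression is undefined")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = resid_ss / (n - 2)                        # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept


# Illustrative (made-up) inputs in which smaller studies show larger effects.
print(egger_regression([2.0, 3.0, 5.0, 9.0], [0.5, 1.0, 2.0, 4.0]))
```

The intercept divided by its standard error would be compared against a t-distribution with n − 2 degrees of freedom; that step is omitted here.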

Analysis to explore heterogeneity

If data allowed, exploration of study-level clinical and methodological sources of statistical heterogeneity of effect estimates across studies was planned through a priori-defined subgroup analysis (i.e. age, sex, function), sensitivity analysis (risk of bias item-specific ratings, intention-to-treat vs. per-protocol analysis) and meta-regression.

Data synthesis and interpretation

For both RCTs and systematic reviews, the comparison and synthesis of results for each outcome of interest was summarised and categorised as conclusive evidence (either there is a ‘difference’ or there is ‘no difference’) or inconclusive evidence (indeterminate results because of statistical uncertainty, statistical heterogeneity/inconsistency in treatment effects and/or incomplete information). This conclusion was based on several factors determined separately or in combination such as statistical significance of the observed difference (p-value), magnitude of the effect estimate, width of the 95% CIs, a minimal clinically important difference (MCID) for a given outcome, if known, and consistency in terms of effect direction and statistical significance. We ascertained the MCIDs for clinical/functional measures such as HHS (MCID range 7–10), OHS (MCID range 5–7), WOMAC score (MCID 8) and EQ-5D score (MCID 0.074) from previous empirical research evidence.¹⁰⁷^–¹⁰⁹

Evidence was considered conclusive in showing a ‘difference’ if a treatment effect estimate was statistically significant and the 95% CI included the MCID for any given outcome. Evidence was considered conclusive in showing ‘no difference’ if a treatment effect estimate was not statistically significant and the 95% CI around it was narrow enough to exclude the MCID for any given outcome. Alternatively, evidence was considered conclusive in showing ‘no difference’ if a treatment effect estimate was statistically significant but the 95% CI around it did not include the MCID for an outcome.

Evidence was considered inconclusive if a treatment effect estimate was not statistically significant and had 95% CIs that were sufficiently wide to include the MCID or any large effect size values. (Because for such studies the possibility of type II error cannot be ruled out, the observed non-significant results should not be interpreted as if there is no difference between the treatment effects. The lack of precision around the effect estimates may be a result of an insufficient sample size, a short follow-up period and/or low event counts, leading to inadequate study power and an increased chance of a type II error.)
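The decision rules above for a single effect estimate can be sketched as a small function. This is an interpretation, not the authors' code: it assumes a difference measure whose null value is zero (e.g. an MD), treats "statistically significant" as the 95% CI excluding zero, and reads "the CI includes the MCID" as the CI reaching at least the MCID in absolute value. The function name is hypothetical.

```python
def classify_evidence(ci_lower: float, ci_upper: float, mcid: float) -> str:
    """Classify an MD-type estimate as 'difference', 'no difference' or 'inconclusive'."""
    if ci_lower > ci_upper or mcid <= 0:
        raise ValueError("expected ci_lower <= ci_upper and a positive MCID")
    significant = ci_lower > 0 or ci_upper < 0           # CI excludes zero
    reaches_mcid = max(abs(ci_lower), abs(ci_upper)) >= mcid
    if significant:
        # Significant: conclusive either way, depending on clinical importance.
        return "difference" if reaches_mcid else "no difference"
    # Not significant: conclusive only if the CI is narrow enough to exclude the MCID.
    return "inconclusive" if reaches_mcid else "no difference"


# Pooled HHS MD from this chapter, 95% CI -0.88 to 5.45, lower end of the MCID range (7).
print(classify_evidence(-0.88, 5.45, 7))
# WOMAC MD from one RCT, 95% CI -7.58 to 7.34, MCID 8.
print(classify_evidence(-7.58, 7.34, 8))
```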

The results were also considered inconclusive if there were partially missing data for continuous outcomes (e.g. reporting treatment arm-specific means without SDs; reporting only p-values for the between-treatment difference) or zero events for binary outcomes in both treatment arms. Evidence from studies showing inconsistent results, that is, significant effects but in opposing directions, was also classified as inconclusive.

Evidence from systematic reviews not reporting pooled results of RCTs (i.e. reporting only narrative syntheses), those reporting inappropriate pooling methods (e.g. indirect naive comparison of single group cohorts; pooling of studies of different design) or those reporting inconsistent summary findings was also considered inconclusive.

Industry submissions regarding effectiveness of treatments

The included clinical effectiveness evidence was compared with the evidence submitted by industry. These industry submissions are discussed in Appendix 5.

Results of the review of clinical effectiveness

Search results

A total of 2469 records were identified through our searches of different sources. The removal of duplicates left 1522 records to be screened. Of these, 1281 records were excluded as irrelevant at title and abstract screening, leaving 241 potentially relevant records. Of these 241 full-text records screened, 146 were excluded, leaving 95 potentially relevant full-text records, of which 58 were additionally excluded based on publication date (published before 2008 unless a companion paper to an included study) and sample size (< 100 participants). The remaining 37 records were included in the review.¹⁰⁷^,¹¹⁰^–¹⁴⁵
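The screening counts above can be checked with simple arithmetic. The sketch below uses a hypothetical helper; the number of duplicates removed (947) is not stated in the text and is derived here as 2469 − 1522.

```python
def screening_flow(identified, exclusions):
    """Apply each exclusion step in order and record how many records remain."""
    remaining = identified
    trail = []
    for label, removed in exclusions:
        if removed < 0 or removed > remaining:
            raise ValueError(f"cannot remove {removed} records at step '{label}'")
        remaining -= removed
        trail.append((label, remaining))
    return trail


flow = screening_flow(2469, [
    ("duplicates removed", 2469 - 1522),          # derived, not stated in the text
    ("excluded at title/abstract", 1281),
    ("excluded at full text", 146),
    ("excluded on publication date or sample size", 58),
])
for label, remaining in flow:
    print(f"{label}: {remaining} records remain")
```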

The flow chart outlining the process of identifying relevant literature can be found in Figure 9.

FIGURE 9

Flow diagram of study identification for the clinical effectiveness review. a, A further 20 ongoing clinical trials were identified.

A list of records excluded at full-text screening with reasons for exclusion is provided in Appendix 6. The main reasons for exclusion were the comparison of different surgical/operative approaches (n = 42¹¹^,¹⁴⁶^–¹⁸⁶), study published before 2008 (unless a companion paper to an included study) (n = 33¹⁹^,³⁹^,¹⁸⁷^–²¹⁷) and study includes < 100 participants (n = 25⁸³^,²¹⁸^–²⁴¹).

A separate search in December 2012 of ClinicalTrials.gov, Current Controlled Trials, the UKCRN Portfolio Database and the National Library of Medicine (NLM) Gateway Health Services Research Projects in Progress (HSRProj) database retrieved 511 potential trials or health services research projects. After screening titles and full records (if available), 20 clinical trials and one health services research project were identified, one of which¹³⁰ had already been identified from the original database search (see Appendix 7). The identified clinical trials were considered potentially relevant based on the available information. The trials were ongoing or completed since 2009 or their status was unknown.

The included 37 records represent 16 RCTs¹⁰⁷^,¹¹⁰^–¹³⁶^,¹⁴⁵ and eight systematic reviews.¹³⁷^–¹⁴⁴

Six of the 16 RCTs were represented by multiple publications:

Bjørgul et al.¹¹⁰^,¹¹¹
Engh et al.¹¹³^,¹¹⁴
Capello et al.,¹¹⁵ D’Antonio et al.¹¹⁶^,¹¹⁷ and Mesko et al.¹¹⁸
Corten et al.,¹¹⁹^,¹²² Laupacis 2002¹²⁰ and Bourne and Corten¹²¹
Costa et al.¹³⁰ and Achten et al.¹⁰⁷
Vendittoli et al.,¹³²^,¹³³^,¹³⁶ Girard et al.¹³⁴ and Rama et al.¹³⁵

These six RCTs are cited as follows: Bjørgul et al.,¹¹⁰ Engh et al.,¹¹³ Capello et al.,¹¹⁵ Corten et al.,¹¹⁹ Costa et al.¹³⁰ and Vendittoli et al.¹³² Thirteen RCTs¹¹⁰^,¹¹²^,¹¹³^,¹¹⁵^,¹¹⁹^,¹²³^–¹²⁹^,¹⁴⁵ and five systematic reviews¹³⁷^–¹⁴¹ comparing different types of primary THR and three RCTs¹³⁰^–¹³² and three systematic reviews¹⁴²^–¹⁴⁴ comparing primary THR with RS were finally included in the current review.

In the following sections we will begin by reporting the findings for the comparison of different types of THR and will then report the findings for the comparison between THR and RS.

Comparison of different types of total hip replacement

Study and participant characteristics

Randomised controlled trials

The study and participant characteristics of the 13 included RCTs¹¹⁰^,¹¹²^,¹¹³^,¹¹⁵^,¹¹⁹^,¹²³^–¹²⁹^,¹⁴⁵ are summarised in Table 9. More details can be found in Appendices 3 and 4. Briefly, four RCTs were conducted in the USA,¹¹³^,¹¹⁵^,¹²⁵^,¹²⁷ one in the UK,¹¹² one in Australia,¹²³ two in Norway,¹¹⁰^,¹²⁶ two in the Republic of Korea¹²⁸^,¹²⁹ and three in Canada.¹¹¹^,¹¹⁹^,¹²⁴ A total of 3175 participants were randomised across the 13 RCTs, with the number of participants in each study ranging from 100¹²⁴^,¹²⁸^,¹⁴⁵ to 557.¹²³ The mean age of participants across the RCTs ranged from 45¹²⁹ to 72¹²³^,¹⁴⁵ years. The proportion of women across the studies ranged from 24%¹²⁹ to 73%.¹¹⁰ The length of follow-up of the studies ranged from 3 months¹¹⁹ to 20 years.¹¹⁹^,¹²⁹ The proportion of participants diagnosed with primary OA was reported for nine studies¹¹⁰^,¹¹²^,¹¹³^,¹¹⁵^,¹²³^,¹²⁴^,¹²⁷^–¹²⁹ and ranged from 14%¹²⁹ to 96%.¹²³

TABLE 9

Overall study characteristics across the 13 RCTs comparing different types of THR

Comparison of THR interventions in the included RCTs was based on differences in hip replacement implant components (e.g. acetabular cup/shell, femoral stem and femoral head) according to their composition,¹²⁷ design,¹¹⁵^,¹²⁸ bearing surface,¹¹³^,¹¹⁵^,¹²⁴^–¹²⁶^,¹⁴⁵ fixation method¹¹⁰^,¹¹²^,¹¹⁹^,¹²⁹ and component size.¹²³ Table 10 shows the distribution of RCTs across the THR comparison categories.

TABLE 10

Distribution of 13 RCTs according to basis of THR comparison

Reported outcomes across the 13 RCTs varied. Most RCTs reported HHS¹¹⁰^,¹¹²^,¹¹³^,¹¹⁵^,¹¹⁹^,¹²⁴^–¹²⁹^,¹⁴⁵ and risk of revision.¹¹²^,¹¹³^,¹¹⁵^,¹¹⁹^,¹²³^–¹²⁵^,¹²⁷^–¹²⁹ The follow-up of outcome assessments ranged from 3 months¹¹⁹ to 20 years.¹¹⁹^,¹²⁹ Outcomes reported in the included studies can be found in Appendix 8. A summary of the functional/clinical and quality of life measures/tools used is provided in Appendix 9.

Systematic reviews

The five included systematic reviews¹³⁷^–¹⁴¹ evaluated RCTs and non-RCTs of the clinical effectiveness of THR (see Appendix 3). The primary focus of these systematic reviews was the comparison of the effects of different cup fixation methods (cemented vs. cementless)¹³⁷^–¹³⁹ and materials used for implant articulations¹⁴⁰^,¹⁴¹ on postoperative clinical/functional scores (HHS, OHS)¹³⁷^,¹³⁸^,¹⁴⁰ and risk of revision.¹³⁸^,¹³⁹ Searches in these systematic reviews were undertaken between July 2007¹⁴¹ and June 2011.¹³⁹ Further details on specific outcomes reported in the included systematic reviews can be found in Appendix 8.

Risk of bias and methodological quality

Risk of bias in the randomised controlled trials

The risk-of-bias assessments for the 13 included RCTs comparing different types of THR are presented in risk-of-bias tables (see Appendix 2), the summary table (Table 11) and the risk-of-bias graph (Figure 10). Overall, four¹¹²^,¹¹⁹^,¹²³^,¹²⁸ of the 13 RCTs reported an adequate method for random sequence generation and eight¹¹⁰^,¹¹²^,¹¹⁹^,¹²³^–¹²⁶^,¹²⁹ reported adequate treatment allocation concealment (low risk of bias). A greater proportion of the RCTs were rated as having a low risk of performance and detection bias for objective (e.g. mortality, dislocation) than for subjective (e.g. patient-administered functional scores) outcomes (92–100% vs. 15–23%, respectively). For at least eight of the RCTs, it was unclear whether or not awareness of THR type would influence the ascertainment of clinical/functional scores by patients/study personnel (performance bias)¹¹⁰^,¹¹²^,¹¹³^,¹¹⁵^,¹²⁴^,¹²⁵^,¹²⁷^–¹²⁹^,¹⁴⁵ or outcome assessors (detection bias).¹¹²^,¹¹³^,¹¹⁵^,¹²⁴^–¹²⁶^,¹²⁸^,¹²⁹ Most RCTs failed to report the blinding status of the patients, study personnel and/or outcome assessors. Eight RCTs were judged as having a low risk of attrition bias. Five RCTs¹¹⁵^,¹²⁴^,¹²⁵^,¹²⁷^,¹²⁸ were judged as being at high risk for selective outcome and/or analysis bias. The risk of other biases (e.g. funding source, baseline imbalance in important characteristics, inappropriate analysis) for about one-third of the RCTs was judged to be high.

TABLE 11

Risk of bias summary for RCTs: review authors’ judgements about each risk of bias item – THR vs. THR

FIGURE 10

Risk of bias graph for RCTs: review authors’ judgements about each risk of bias item – THR vs. THR. ITT, intention to treat; NA, not applicable; PP, per protocol.

Methodological quality of the systematic reviews

The assessment of methodological quality of the five included systematic reviews comparing different types of THR is presented in Table 12 and the quality assessment sheets (see Appendix 2). Briefly, based on the number of methodological items that were satisfied, two systematic reviews¹³⁷^,¹⁴⁰ were judged to be of high quality (falling into the score range of 9–11) and two systematic reviews¹³⁸^,¹⁴¹ were of medium quality (falling into the score range of 5–8). The one remaining systematic review¹³⁹ was judged to be of low quality (falling into the score range of 0–4). The specific unmet methodological items related to inappropriate analysis, absence of duplicate study selection, limited literature search, failure to address issues of publication bias and no information on conflicts of interest.

TABLE 12

Methodological quality assessment summary for systematic reviews: THR vs. THR

Clinical effectiveness findings for the comparison of different types of total hip replacement

This section summarises the evidence from the 13 RCTs¹¹⁰^,¹¹²^,¹¹³^,¹¹⁵^,¹¹⁹^,¹²³^–¹²⁹^,¹⁴⁵ and five systematic reviews.¹³⁷^–¹⁴¹

The reported outcomes for this section were HHS (12 RCTs;¹¹⁰^,¹¹²^,¹¹³^,¹¹⁵^,¹¹⁹^,¹²⁴^–¹²⁹^,¹⁴⁵ three systematic reviews¹³⁷^,¹³⁸^,¹⁴⁰), WOMAC score (four RCTs¹¹⁹^,¹²⁴^,¹²⁹^,¹⁴⁵), McMaster Toronto Arthritis Patient Preference Questionnaire (MACTAR) score (one RCT¹¹⁹), Merle d’Aubigné and Postel hip score (one RCT¹¹⁹), University of California Los Angeles (UCLA) activity score (one RCT¹²⁹), OHS (one systematic review¹³⁷), SF-12 score (three RCTs;¹²⁴^,¹²⁵^,¹⁴⁵ one systematic review¹⁴⁰), risk of revision (10 RCTs;¹¹²^,¹¹³^,¹¹⁵^,¹¹⁹^,¹²³^–¹²⁵^,¹²⁷^–¹²⁹ five systematic reviews¹³⁷^–¹⁴¹), mortality (six RCTs¹¹⁰^,¹¹³^,¹¹⁹^,¹²³^,¹²⁸^,¹⁴⁵), femoral head penetration rate (three RCTs¹¹³^,¹²⁶^,¹⁴⁵), implant dislocation (seven RCTs;¹¹⁰^,¹¹²^,¹¹⁵^,¹²³^–¹²⁵^,¹²⁷ two systematic reviews¹³⁹^,¹⁴⁰), osteolysis (seven RCTs;¹¹²^,¹¹³^,¹¹⁵^,¹²⁵^,¹²⁷^,¹²⁹^,¹⁴⁵ two systematic reviews¹³⁸^,¹³⁹), aseptic loosening (five RCTs;¹¹²^,¹¹³^,¹¹⁹^,¹²⁴^,¹²⁷ one systematic review¹³⁹), femoral fracture (three RCTs¹¹³^,¹¹⁵^,¹²⁷), infection (four RCTs¹¹²^,¹²⁴^,¹²⁵^,¹²⁷) and deep-vein thrombosis (one RCT¹²⁵).

Neither the RCTs nor the systematic reviews reported any evidence for the following clinical effectiveness outcomes:

HOOS
LISOH
AAOS Hip and Knee Questionnaire
Arthritis Impact Measurement Scale (AIMS)
Nottingham Health Profile (NHP) questionnaire
EQ-5D
SF-36
time to revision
pain score [visual analogue scale (VAS)].

Summary results for the following outcomes are presented separately for RCTs and systematic reviews in the following sections. The outcomes of interest are as follows:

mortality
validated functional/pain (total scores): HHS, OHS, pain score (VAS), Merle d’Aubigné and Postel score, UCLA activity score, WOMAC, MACTAR, HOOS, LISOH, AAOS Hip and Knee Questionnaire, AIMS
health-related quality of life (total scores): EQ-5D, SF-36/SF-12, NHP
revision rate (risk of revision, mean time to revision)
femoral head penetration rate (measure of prosthesis movement)
adverse events (peri-/postprocedural complications): implant dislocation, infection, osteolysis, aseptic loosening, femoral fracture, deep-vein thrombosis, muscle weakness, nerve palsy and pulmonary embolism.

Functional/clinical measures

Twelve of the 13 included RCTs comparing different types of THR reported at least some results for the following functional scores measured at different postprocedure follow-up times: HHS (12 studies¹¹⁰^,¹¹²^,¹¹³^,¹¹⁵^,¹¹⁹^,¹²⁴^–¹²⁹^,¹⁴⁵), WOMAC score (four studies¹¹⁹^,¹²⁴^,¹²⁹^,¹⁴⁵), MACTAR score (one study¹¹⁹), Merle d’Aubigné and Postel score (one study¹¹⁹) and UCLA activity score (one study¹²⁹). None of these 12 studies reported measurements of the OHS.

Three of the five included systematic reviews comparing different types of THR reported at least some evidence on HHS¹³⁷^,¹³⁸^,¹⁴⁰ and OHS.¹³⁷ None of the three reviews reported any summary evidence for WOMAC, MACTAR, Merle d’Aubigné and Postel, and UCLA scores.

Harris Hip Score

Randomised controlled trials (n = 12)

Mean HHS at follow-up (range 6 months–10 years) did not differ between the following interventions: cup fixation (two studies;¹¹⁰^,¹¹² cemented vs. cementless), cup liner bearing surface (two studies;¹¹³^,¹⁴⁵ cross-linked polyethylene vs. non-cross-linked polyethylene), cup and femoral stem fixation (one study;¹¹⁹ cemented vs. cementless) and femoral head-on-cup liner bearing surfaces (one study;¹²⁶ cobalt–chromium/oxinium-on-polyethylene vs. cobalt–chromium/oxinium-on-cross-linked polyethylene) (Table 13). The pooled MD for HHS in our meta-analysis of two studies¹¹³^,¹⁴⁵ comparing cup liners made with cross-linked polyethylene compared with non-cross-linked polyethylene was 2.29 (95% CI –0.88 to 5.45), suggesting a non-significant benefit of cross-linked polyethylene cup liners (Figure 11).

TABLE 13

Harris Hip Score (range 0–100): RCTs

FIGURE 11

Harris Hip Score. XLPE, cross-linked polyethylene.

The evidence for the other comparisons based on cup shell design (porous coated vs. arc-deposited hydroxyapatite coated),¹¹⁵ femoral head bearing surface (oxinium vs. cobalt–chromium),¹²⁴ femoral head-on-cup liner bearing surfaces (ceramic-on-ceramic vs. metal-on-polyethylene or ceramic-on-polyethylene),¹¹⁵^,¹²⁵ femoral stem composition (cobalt–chromium vs. titanium),¹²⁷ femoral stem design (short metaphyseal fitting vs. conventional diaphyseal fitting)¹²⁸ and femoral stem fixation (cemented vs. cementless)¹²⁹ was judged to be inconclusive.

Systematic reviews (n = 3)

One systematic review¹⁴⁰ reported the pooled MD for the HHS (Table 14). Pooled estimates for the comparison between metal-on-metal and metal-on-polyethylene bearing surfaces at two different follow-up times were not consistent: at 2 years metal-on-metal bearing surfaces gave a significantly higher HHS than metal-on-polyethylene, but at > 2 years there was no significant difference between the two types of THR. The remaining two systematic reviews presented only narrative summaries.¹³⁷^,¹³⁸ In summary, for the HHS the systematic review-based evidence was judged to be inconclusive.

TABLE 14

Harris Hip Score (range 0–100): systematic reviews

Western Ontario and McMaster University Osteoarthritis Index score

RCTs (n = 4)

Results from all four RCTs reporting postprocedural mean WOMAC scores indicated statistically non-significant differences between the THR groups compared with respect to cup liner bearing surface (cross-linked polyethylene vs. non-cross-linked polyethylene),¹⁴⁵ cup and femoral stem fixation (cemented vs. cementless),¹¹⁹ femoral head bearing surface (oxinium vs. cobalt–chromium)¹²⁴ and femoral stem fixation (cemented vs. cementless)¹²⁹ (Table 15). The MD in WOMAC score of –0.12 (95% CI –7.58 to 7.34) observed for one RCT¹⁴⁵ suggested no difference between cross-linked polyethylene and non-cross-linked polyethylene cup liners. Results for WOMAC score in the remaining three RCTs¹¹⁹^,¹²⁴^,¹²⁹ were judged to be inconclusive because of incompletely reported data.

TABLE 15

Western Ontario and McMaster University Osteoarthritis Index (range 0–100): RCTs

Systematic reviews (n = 0)

No evidence was identified.

Other functional/clinical scores

Randomised controlled trials (n = 2)

In one RCT¹¹⁹ there was no difference in mean MACTAR scores (at 7 years: mean change difference 0.20, 95% CI –0.74 to 1.14) and Merle d’Aubigné and Postel scores (at 7 years: mean change difference –0.40, 95% CI –1.34 to 0.54) between patients who received a THR with cemented components and those who received a THR with cementless components (Tables 16 and 17). Results from one RCT¹²⁹ comparing femoral stem fixation (cemented vs. cementless) by the postoperative UCLA activity score were inconclusive because of incomplete data reporting (Table 18).

TABLE 16

McMaster Toronto Arthritis Patient Preference Questionnaire scores (range 0–30): RCTs

TABLE 17

Merle d’Aubigné and Postel scores (range 0–18): RCTs

TABLE 18

University of California Los Angeles activity scores (range 1–10): RCTs

Systematic reviews (n = 1)

The OHS was reported in one systematic review¹³⁷ comparing cup fixation methods (cemented vs. cementless), but the results were inconclusive (Table 19). This evidence was based on one RCT showing a statistically non-significant result.

TABLE 19

Oxford Hip Score (range 0–48): systematic review

Health-related quality of life

Only three RCTs¹²⁴^,¹²⁵^,¹⁴⁵ and one systematic review¹⁴⁰ reported any comparative evidence for measures of health-related quality of life.

Randomised controlled trials (n = 3)

In one RCT,¹⁴⁵ at follow-up times of 1 and 5 years, there was no difference in quality of life (on the mental and physical subscales of SF-12) between two groups of patients receiving cross-linked and non-cross-linked polyethylene cup liner bearings (Table 20).

TABLE 20

Short Form questionnaire-12 items (range 0–100): RCTs

In two other RCTs¹²⁴^,¹²⁵ there were no statistically significant differences in mean SF-12 mental and physical subscale scores between THR groups with different femoral head bearings (oxinium vs. cobalt–chromium)¹²⁴ and femoral head-on-cup liner articulations (ceramic-on-ceramic vs. ceramic-on-polyethylene).¹²⁵ This evidence was judged to be inconclusive (see Table 20).

Systematic reviews (n = 1)

One systematic review¹⁴⁰ reported evidence from two studies that compared SF-12 scores across different articulations (metal-on-metal vs. metal-on-polyethylene) (Table 21). The review did not provide any formal narrative or quantitative synthesis of the data. The evidence was considered to be inconclusive.

TABLE 21

Short Form questionnaire-12 items (range 0–100): systematic review

Revision

Evidence on revision was reported for 10 RCTs¹¹²^,¹¹³^,¹¹⁵^,¹¹⁹^,¹²³^–¹²⁵^,¹²⁷^–¹²⁹ and five systematic reviews.¹³⁷^–¹⁴¹

Randomised controlled trials (n = 10)

One RCT¹¹³ demonstrated a reduced risk of revision in patients who received cross-linked polyethylene compared with non-cross-linked polyethylene cup liners (RR 0.18, 95% CI 0.04 to 0.78) (Table 22). The evidence reported in the remaining nine RCTs showed statistically non-significant differences in the risk of revision between the different types of THR, with wide CIs compatible with large effect sizes in both directions (i.e. favouring one or other of the treatment groups). This evidence was deemed to be inconclusive (see Table 22).
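Ratio measures such as the RR are analysed on the log scale, which is why the reported interval is asymmetric around 0.18. The sketch below, which assumes a normal approximation on the log scale and uses an illustrative function name, back-calculates the log-scale standard error and checks that the point estimate sits near the geometric centre of the interval. Inputs are the rounded published values, so results are approximate.

```python
import math

def ratio_ci_check(estimate, lower, upper, z_crit=1.96):
    """For a ratio measure (RR or OR) with a 95% CI, return the SE on the
    log scale, the geometric centre of the CI, z and a two-sided p-value."""
    log_lower, log_upper = math.log(lower), math.log(upper)
    se_log = (log_upper - log_lower) / (2 * z_crit)
    centre = math.exp((log_lower + log_upper) / 2)   # should be close to the estimate
    z = math.log(estimate) / se_log                  # test against a null ratio of 1
    p = math.erfc(abs(z) / math.sqrt(2))
    return se_log, centre, z, p

# RR for revision, cross-linked vs. non-cross-linked polyethylene cup liners
se_log, centre, z, p = ratio_ci_check(0.18, 0.04, 0.78)
print(f"SE(log RR) = {se_log:.2f}, CI centre = {centre:.2f}, z = {z:.2f}, p = {p:.3f}")
```

The upper limit of 0.78 lies below 1, so the back-calculated p-value falls below 0.05, although the large log-scale standard error reflects the small number of revision events.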

TABLE 22

Revision rate: RCTs

Systematic reviews (n = 5)

Of the five systematic reviews reporting on revisions, two¹³⁷^,¹⁴¹ provided pooled estimates for risk of revision (Table 23). According to one review,¹⁴¹ at 9 years post surgery the recipients of zirconium femoral heads were at a similar risk of revision to the recipients of non-zirconium femoral heads (three pooled RCTs; risk difference 0.02, 95% CI –0.01 to 0.06). This evidence was considered conclusive in detecting no difference in revision rates between these two types of femoral head.

TABLE 23

Revision rate: systematic reviews

In another review¹³⁷ the risk of revision at 10 years after surgery did not significantly differ between cemented and cementless cup fixation THR groups (pooled RR 0.15, 95% CI 0.02 to 1.18). This result was considered inconclusive given the uninformative 95% CIs. Evidence from the remaining three reviews¹³⁸^–¹⁴⁰ was of a narrative nature, which precluded us drawing conclusions (see Table 23).

Mortality

Evidence on mortality was reported for six RCTs.¹¹⁰^,¹¹³^,¹¹⁹^,¹²³^,¹²⁸^,¹⁴⁵ None of the five systematic reviews reported on mortality.

Randomised controlled trials (n = 6)

Evidence from the six RCTs¹¹⁰^,¹¹³^,¹¹⁹^,¹²³^,¹²⁸^,¹⁴⁵ that reported mortality was inconclusive because of non-significant RR estimates and wide 95% CIs (Table 24). For example, based on a pooled RR estimate of 1.39 (95% CI 0.78 to 2.49),¹¹³^,¹⁴⁵ 5- to 10-year post-surgery mortality rates in the group receiving cross-linked polyethylene cup liners were not significantly different from those in the group receiving non-cross-linked polyethylene cup liners (Figure 12). Similarly, the rest of the studies showed non-significant results for mortality between THR groups defined by femoral stem and/or cup fixation (cemented vs. cementless)¹¹⁰^,¹¹⁹ and femoral head size (36 mm vs. 28 mm).¹²³ One RCT¹²⁸ reported no deaths in either of the treatment groups receiving femoral stems of different design.

TABLE 24

Mortality rate: RCTs

FIGURE 12

Mortality. XLPE, cross-linked polyethylene.

Systematic reviews (n = 0)

No evidence was identified.

Femoral head penetration rate (measure of prosthesis movement)

Evidence on femoral head penetration rate was reported by three RCTs.¹¹³^,¹²⁶^,¹⁴⁵ None of the five systematic reviews reported this end point.

RCTs (n = 3)

Two RCTs¹¹³^,¹⁴⁵ demonstrated reduced femoral head penetration in favour of cross-linked polyethylene cup liners compared with non-cross-linked (conventional) polyethylene cup liners at 5–10 years of follow-up (Table 25). Similarly, in another RCT,¹²⁶ cross-linked polyethylene cup liners with either metal or oxinium femoral heads outperformed conventional polyethylene cup liners in reducing femoral head penetration during 2 years of follow-up.

TABLE 25

Femoral head penetration rate: RCTs

Systematic reviews (n = 0)

No evidence was identified.

Complications

Evidence on the occurrence/absence of complications was reported by nine RCTs¹¹²^,¹¹³^,¹¹⁵^,¹²³^–¹²⁵^,¹²⁷^,¹²⁹^,¹⁴⁵ and three systematic reviews.¹³⁸^–¹⁴⁰ In most studies¹¹²^,¹¹³^,¹¹⁵^,¹²³^–¹²⁵^,¹²⁹^,¹⁴⁵ the reported complications were classified as postoperative. In one RCT¹²⁷ some of the complications were classified as perioperative.

Implant dislocation

Randomised controlled trials (n = 7)

Evidence on the occurrence/absence of implant dislocation was reported by seven RCTs¹¹⁰^,¹¹²^,¹¹⁵^,¹²³^–¹²⁵^,¹²⁷ (Table 26). Our pooled estimate of two studies¹¹⁰^,¹¹² (Figure 13) indicated a reduced risk of implant dislocation at 10 years’ follow-up in recipients of cemented compared with cementless cups (pooled OR 0.34, 95% CI 0.13 to 0.89). Moreover, in one RCT¹²³ after 1 year of follow-up, the THR recipients with a larger size of femoral head experienced a lower risk of implant dislocation than those with a smaller size of femoral head (36 mm vs. 28 mm: RR 0.17, 95% CI 0.04 to 0.78). Evidence on implant dislocation for the remaining four RCTs¹¹⁵^,¹²⁴^,¹²⁵^,¹²⁷ was inconclusive because of incomplete data and non-significant results.
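The general idea of pooling odds ratios across two trials can be sketched with an inverse-variance fixed-effect calculation on the log OR scale. This is a generic illustration only: the pooling model behind Figure 13 is not stated in this section, the function names are illustrative, and the 2 × 2 counts below are hypothetical placeholders, not data from the included RCTs.

```python
import math

def log_or_and_se(events_a, total_a, events_b, total_b):
    """Log odds ratio (group A vs. group B) and its standard error.
    Assumes no zero cells; a continuity correction would be needed otherwise."""
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def pool_fixed_effect(effects):
    """Inverse-variance fixed-effect pooled estimate and SE from (effect, se) pairs."""
    weights = [1 / se ** 2 for _, se in effects]
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# HYPOTHETICAL counts: (events A, total A, events B, total B) per study
studies = [(5, 80, 9, 82), (4, 60, 7, 58)]
effects = [log_or_and_se(*s) for s in studies]
pooled_log_or, pooled_se = pool_fixed_effect(effects)

pooled_or = math.exp(pooled_log_or)
ci = (math.exp(pooled_log_or - 1.96 * pooled_se),
      math.exp(pooled_log_or + 1.96 * pooled_se))
print(f"Pooled OR = {pooled_or:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```

Each study is weighted by the inverse of its variance, so the pooled OR always lies between the individual study ORs and its CI is narrower than that of either study alone.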

TABLE 26

Implant dislocation rate: RCTs

FIGURE 13

Implant dislocation.

Systematic reviews (n = 2)

Overall, no conclusions on implant dislocation could be drawn from the two systematic reviews, given the narrative evidence summary¹⁴⁰ and the mixed study designs¹³⁹ (Table 27). The pooled data from one review¹³⁹ were based on nine studies, most of which were not randomised, and indicated a lower risk of dislocation in the groups receiving cemented compared with cementless cups.

TABLE 27

Implant dislocation rate: systematic reviews

Osteolysis

Randomised controlled trials (n = 7)

Evidence on osteolysis was reported by seven RCTs¹¹²^,¹¹³^,¹¹⁵^,¹²⁵^,¹²⁷^,¹²⁹^,¹⁴⁵ (Table 28). In one RCT¹¹⁵ comparing different femoral head-on-cup liner bearing surfaces, recipients of ceramic-on-ceramic articulations had a reduced risk of osteolysis compared with recipients of metal-on-polyethylene articulations at 10 years post operation (RR 0.10, 95% CI 0.02 to 0.32).

TABLE 28

Osteolysis: RCTs

For seven RCTs, the evidence for osteolysis was inconclusive across the comparisons based on different methods of cup fixation (cemented vs. cementless),¹¹² cup liner bearing surface (cross-linked polyethylene vs. non-cross-linked polyethylene),¹¹³^,¹⁴⁵ cup shell design (porous coated vs. arc-deposited hydroxyapatite coated),¹¹⁵ femoral head-on-cup liner bearing surface (ceramic-on-ceramic vs. ceramic-on-polyethylene),¹²⁵ femoral stem composition (cobalt–chromium vs. titanium)¹²⁷ and femoral stem fixation (cemented vs. cementless).¹²⁹

Systematic reviews (n = 2)

Overall, no conclusions could be drawn on the incidence of osteolysis from two low-quality systematic reviews¹³⁸^,¹³⁹ comparing cemented and cementless methods of cup fixation, given the narrative evidence summaries, mixed study designs and inconsistent findings (Table 29).

TABLE 29

Osteolysis: systematic reviews

Other complications

Randomised controlled trials (n = 7)

Seven RCTs reported other complications such as aseptic loosening (Table 30),¹¹²^,¹¹³^,¹¹⁹^,¹²⁴^,¹²⁷ femoral fracture (Table 31),¹¹³^,¹¹⁵^,¹²⁷ infection (Table 32),¹¹²^,¹²⁴^,¹²⁵^,¹²⁷ and deep-vein thrombosis (Table 33).¹²⁵ This evidence was judged to be inconclusive because of low event or zero event counts and CIs indicating great uncertainty.

TABLE 30

Aseptic loosening: RCTs

TABLE 31

Femoral fracture: RCTs

TABLE 32

Infection: RCTs

TABLE 33

Deep-vein thrombosis: RCTs

Systematic reviews (n = 1)

Of other complications, only aseptic loosening was reported in one low-quality systematic review¹³⁹ (Table 34). Pooled data from 11 studies, most of which were not randomised, pointed towards a greater risk of aseptic loosening with cemented compared with cementless cups; however, the evidence is inconclusive given the lack of numerical data and the evidence synthesis being based on mixed study designs.

TABLE 34

Aseptic loosening: systematic review

Grading the overall quality of the evidence

The results for graded outcomes are presented in the evidence profile (Table 35). For a meaningful grading process and for consistency, only the THR comparison categories that included at least two studies (cup fixation – cemented vs. cementless and cup liner bearing surface: cross-linked polyethylene vs. non-cross-linked polyethylene) were selected. The overall quality for gradable outcomes across the THR comparison categories (cup fixation and cup liner bearing surface) was as follows: HHS – moderate grade; WOMAC score – not graded and very low grade, respectively; revision – very low grade; mortality – very low grade and low grade, respectively; femoral head penetration – not graded and moderate grade, respectively; and implant dislocation – high grade and not graded, respectively.

Summary conclusions for the comparison between different types of total hip replacement

Randomised controlled trials

The majority of the evidence comparing THRs was rated as inconclusive by us (Table 36). In three RCTs there was evidence of a reduced risk of implant dislocation with the use of a cemented cup (vs. a cementless cup)¹¹⁰^,¹¹² or a larger femoral head size (36 mm vs. 28 mm)¹²³ (high-grade evidence for the cup fixation comparison). In three other RCTs, patients who received a THR with a cross-linked polyethylene cup liner experienced a reduced (i.e. improved) femoral head penetration rate (moderate-grade evidence)¹¹³^,¹²⁶^,¹⁴⁵ and risk for revision (very low-grade evidence)¹¹³ compared with recipients of conventional polyethylene cup liners. In one RCT¹¹⁹ the use of cementless fixation of the cup and femoral stem (vs. cemented fixation) was associated with a better implant survival rate. Moreover, the recipients of ceramic-on-ceramic articulations (vs. metal-on-polyethylene) experienced a reduced risk of osteolysis.¹¹⁵ For half of the studies,¹¹⁰^,¹¹²^,¹¹³^,¹¹⁹^,¹²⁶^,¹⁴⁵ the mean post-THR clinical and functional scores (i.e. HHS, WOMAC score, SF-12 score, MACTAR score, Merle d’Aubigné and Postel score) measured at different follow-up times were similar between the different THR treatment groups (moderate-grade evidence for no difference in HHS across the comparisons for cup fixation and cup liner surface types).

TABLE 36

Summary of evidence regarding the differences between the different types of THR for each reported outcome: RCTs

Evidence from studies reporting the UCLA activity score,¹²⁹ mortality (very low-grade evidence),¹¹⁰^,¹¹³^,¹¹⁹^,¹²³^,¹²⁸^,¹⁴⁵ aseptic loosening,¹¹²^,¹¹³^,¹¹⁹^,¹²⁴^,¹²⁷ femoral fracture,¹¹³^,¹¹⁵^,¹²⁷ infection¹¹²^,¹²⁴^,¹²⁵^,¹²⁷ and deep-vein thrombosis¹²⁵ was all inconclusive. Also, the evidence reported in four studies was considered inconclusive for all outcomes (very low-grade evidence).¹²⁴^,¹²⁵^,¹²⁷^,¹²⁸ Results were considered inconclusive by us because of partial reporting (missing data for effect estimates, CIs, SEs, SDs, p-values), great uncertainty (wide CIs), zero event counts and/or inconsistency in estimates.

Systematic reviews

The majority of evidence from the five systematic reviews comparing different types of THR¹³⁷^–¹⁴¹ was considered inconclusive. This was because of unreported pooled results across RCTs (i.e. reporting only narrative syntheses), the reporting of inappropriate pooling methods (e.g. indirect naive comparison of single-group cohorts; pooling of studies of different design)¹³⁸^,¹³⁹^,¹⁴¹ or the reporting of inconsistent summary findings¹⁴⁰ (Table 37). The evidence from one review¹⁴¹ indicated no difference in the risk of revision between zirconium-on-polyethylene and non-zirconium-on-polyethylene articulations.

TABLE 37

Summary of evidence regarding the differences between the different types of THR for each reported outcome: systematic reviews

Other analysis

Publication bias

The extent to which publication bias could have influenced the pooled treatment effect estimates (i.e. degree of funnel plot asymmetry) could not be explored because of an insufficient number of data points in the forest/funnel plots.

Heterogeneity, subgroup effects and sensitivity analysis

The data reviewed from RCTs were too sparse and heterogeneous (in terms of different types of THR) to allow exploration of whether or not the relative effect of any given THR differed by study-level methodological characteristics (i.e. risk of bias, type of data analysis) or patient-related characteristics (i.e. age, sex or functional status). None of the included RCTs reported within-study subgroup effects of the different THRs compared.

Comparison between total hip replacement and resurfacing arthroplasty

Study and participant characteristics

Randomised controlled trials

Study and participant characteristics of the three included RCTs¹³⁰^–¹³² are summarised in Table 38. More details can be found in Appendices 3 and 4. Two RCTs¹³¹^,¹³² were conducted in Canada and one¹³⁰ was conducted in the UK. A total of 422 participants were randomised across the three RCTs, ranging from 104¹³¹ to 192¹³² participants. The mean age of participants ranged from 50¹³² to 56¹³⁰ years and the proportion of female participants across the studies ranged from 10.5%¹³¹ to 41%.¹³⁰ The total length of follow-up across the studies ranged from 1 year¹³⁰ to 6 years.¹³² The proportion of participants diagnosed with primary OA was reported for two studies¹³⁰^,¹³² and was 33%¹³² and 95%.¹³⁰

TABLE 38

Overall study characteristics across the three RCTs comparing THR and RS

The three RCTs reported on clinical/functional scores (e.g. HHS, OHS, UCLA activity score, WOMAC score), health-related quality of life and risk of revision. Follow-up of outcome assessments ranged from 3 weeks¹³⁰ to 5 years.¹³² Outcomes reported in the included studies can be found in Appendix 8.

Systematic reviews

Three systematic reviews¹⁴²^–¹⁴⁴ were included that evaluated the clinical effectiveness of THR compared with RS with respect to postoperative clinical/function (HHS, WOMAC score), risk of revision, mortality and complications. Searches for these systematic reviews were undertaken between March 2008¹⁴⁴ and January 2010.¹⁴³ Evidence was synthesised from both RCTs and non-RCTs (see Appendices 3 and 4). Further details on specific outcomes reported (or not reported) in the included systematic reviews can be found in Appendix 8.

Risk of bias and methodological quality

Risk of bias in randomised controlled trials

The risk of bias assessment for the three included RCTs¹³⁰^–¹³² comparing THR with RS is presented in risk of bias tables (see Appendix 2), the summary table (Table 39) and the risk of bias graph (Figure 14). Overall, two studies¹³⁰^,¹³² reported an adequate method for random sequence generation and all three studies¹³⁰^–¹³² reported treatment allocation concealment (low risk of bias). Two of the three studies¹³⁰^,¹³² were rated as having a low risk of performance and detection bias for objective outcomes (e.g. revision, dislocation). The same two studies had a high risk of performance bias for subjective outcomes (e.g. patient-administered functional scores). Patients and study personnel were blinded in only one study.¹³¹ For two studies¹³⁰^,¹³² the influence of attrition bias on objective outcomes was judged to be of low risk. All three studies were judged as being at low risk for selective outcome and/or analysis bias. Risk of other biases (e.g. funding source, balance/imbalance in important characteristics, inappropriate analysis) for one of the three studies was judged to be high.¹³¹

TABLE 39

Risk of bias summary for RCTs: review authors’ judgements about each risk of bias item – THR vs. RS

FIGURE 14

Risk of bias graph for RCTs: review authors’ judgements about each risk of bias item – THR vs. RS. ITT, intention to treat; NA, not applicable; PP, per protocol.

Methodological quality of systematic reviews comparing total hip replacement with resurfacing arthroplasty

The assessment of methodological quality of the three included systematic reviews¹⁴²^–¹⁴⁴ is presented in Table 40 and the data extraction sheets (see Appendices 3 and 4). Given the number of methodological items that were satisfied, one of the three reviews was judged as being of high quality (falling into the score range 9–11),¹⁴³ one was judged as being of medium quality (falling into the score range 5–8)¹⁴² and one was judged as being of low quality (falling into the score range 0–4).¹⁴⁴ The specific unmet methodological items related to inappropriate analysis, failure to address issues of publication bias and no information on conflicts of interest.

TABLE 40

Methodological quality assessment summary for systematic reviews: THR vs. RS

Clinical effectiveness findings for the comparison between total hip replacement and resurfacing arthroplasty

This section summarises the findings from the three RCTs¹³⁰^–¹³² and three systematic reviews.¹⁴²^–¹⁴⁴

The reported outcomes for this section were the HHS (one RCT;¹³⁰ two systematic reviews¹⁴²^,¹⁴³), WOMAC score (two RCTs;¹³¹^,¹³² two systematic reviews¹⁴²^,¹⁴³), Merle d’Aubigné and Postel score (one RCT;¹³² one systematic review¹⁴²), UCLA activity score (two RCTs;¹³¹^,¹³² one systematic review¹⁴²), OHS (one RCT¹³⁰), health-related quality of life scales (SF-36 and EQ-5D; two RCTs¹³⁰^,¹³¹), risk of revision (one RCT;¹³² two systematic reviews¹⁴²^,¹⁴³), mortality (two systematic reviews¹⁴²^,¹⁴³), infection (two RCTs;¹³⁰^,¹³² one systematic review¹⁴²), aseptic loosening (one RCT;¹³² two systematic reviews¹⁴²^,¹⁴³), implant dislocation (two RCTs;¹³⁰^,¹³² one systematic review¹⁴²) and deep-vein thrombosis (two RCTs¹³⁰^,¹³²).

Neither the RCTs nor the systematic reviews reported any evidence for the following clinical effectiveness outcomes:

HOOS
LISOH
AAOS Hip and Knee Questionnaire
AIMS
MACTAR
NHP questionnaire
SF-12
time to revision
pain score (VAS)
femoral head penetration.

Summary results for the included outcomes are presented separately for RCTs and systematic reviews.

Evidence from randomised controlled trials

Functional/clinical measures

All three included RCTs comparing THR and RS reported some evidence for the following functional scores measured at 12–24 months after the procedure: HHS,¹³⁰ OHS,¹³⁰ WOMAC score,¹³¹^,¹³² UCLA activity score¹³¹^,¹³² and Merle d’Aubigné and Postel score.¹³²

In two RCTs there was no difference between the THR group and the RS group in mean postoperative OHS (12 months; MD –2.23, 95% CI –5.98 to 1.52),¹³⁰ Merle d’Aubigné and Postel score (24 months; MD 0.0, 95% CI –1.06 to 1.06)¹³² or WOMAC score (12 months; MD 2.20, 95% CI –1.57 to 5.97).¹³² One of these RCTs showed a significantly improved mean WOMAC score for the RS group compared with the THR group at 24 months of follow-up; however, this difference was not deemed to be clinically important (MD 3.30, 95% CI 0.01 to 6.58).¹³²

Health-related quality of life

Two RCTs reporting quality of life measures showed statistically non-significant differences between the THR group and the RS group for both the SF-36 (p = 0.55 and p = 0.97 for mental and physical components, respectively)¹³¹ and the EQ-5D (MD –0.08, 95% CI –0.18 to 0.03).¹³⁰ These results were deemed to be inconclusive given the wide CI¹³⁰ and incomplete data reporting.¹³¹

Revision

The occurrence of implant revision was reported for only one RCT.¹³² There was no statistically significant difference between the THR group and the RS group for risk of revision at 6 months (RR 1.01, 95% CI 0.06 to 15.92), 24 months (RR 0.50, 95% CI 0.04 to 5.48) or 56 months (RR 0.54, 95% CI 0.10 to 2.91) post surgery. The 95% CIs around the effect estimates embraced the value 1.00 and therefore did not allow definitive conclusions to be made regarding the effectiveness of THR compared with RS.
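The reasoning in this paragraph can be made concrete with a short check of whether each reported interval contains the null value of 1 and how wide it is in relative terms. This is a minimal sketch using the rounded RRs quoted above; the function name and the upper-to-lower ratio used as an informal measure of imprecision are illustrative choices, not criteria applied in the review.

```python
def summarise_ratio_ci(label, estimate, lower, upper, null_value=1.0):
    """Flag whether a ratio CI contains the null value and report its
    relative width (upper limit divided by lower limit)."""
    return {
        "label": label,
        "estimate": estimate,
        "includes_null": lower <= null_value <= upper,
        "fold_width": upper / lower,
    }

# RRs for revision (THR vs. RS) as reported in the text
reported = [
    ("6 months", 1.01, 0.06, 15.92),
    ("24 months", 0.50, 0.04, 5.48),
    ("56 months", 0.54, 0.10, 2.91),
]
summaries = [summarise_ratio_ci(*row) for row in reported]
for s in summaries:
    print(f"{s['label']}: RR {s['estimate']:.2f}, includes 1: {s['includes_null']}, "
          f"upper/lower = {s['fold_width']:.0f}")
```

All three intervals contain 1 and span more than an order of magnitude, so they are compatible with a substantial effect in either direction.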

Mortality rate

No evidence on mortality rates was identified from the RCTs.

Complications

Evidence on complications was reported for two RCTs.¹³⁰^,¹³² Meta-analysis of the data on risk of infection from the two RCTs indicated that, at 12–56 months post operation, THR recipients were at an increased risk of infection compared with RS recipients (pooled OR 7.94, 95% CI 1.78 to 35.40) (Figure 15). In addition, evidence on the differences between groups for the risk of deep-vein thrombosis (Figure 16; pooled OR 0.60, 95% CI 0.15 to 2.42),¹³⁰^,¹³² implant dislocation (Figure 17; pooled OR 3.97, 95% CI 0.79 to 19.90),¹³⁰^,¹³² wound complications (RR 4.01, 95% CI 0.92 to 18.18)¹³⁰ and aseptic loosening (RR not estimable)¹³² was judged to be inconclusive by us.

FIGURE 15

Risk of infection.

FIGURE 16

Risk of deep-vein thrombosis.

FIGURE 17

Risk of implant dislocation.

A summary of the results for the different outcomes is presented in Table 41.

TABLE 41

Summary of the results for THR vs. RS: RCTs

Evidence from systematic reviews

Functional/clinical measures

Two of the three included systematic reviews comparing THR with RS reported evidence on HHS,¹⁴²^,¹⁴³ WOMAC score,¹⁴²^,¹⁴³ Merle d’Aubigné and Postel score¹⁴² and UCLA activity score¹⁴² (Table 42). The evidence was inconclusive because of the lack of pooled MD estimates for all four scores as well as the inconsistent results for the mean HHS and WOMAC score.

TABLE 42

Summary results for RS vs. THR: systematic reviews

Health-related quality of life

No evidence was identified.

Revision

Both systematic reviews¹⁴²^,¹⁴³ found a higher risk of revision in patients receiving RS than in those receiving THR. One review meta-analysed data from four RCTs that compared risk of revision in RS and THR recipients, reporting a pooled RR estimate of 2.60 (95% CI 1.31 to 5.15) (see Table 42).¹⁴²

Mortality

Overall, evidence on mortality reported by both systematic reviews¹⁴²^,¹⁴³ was inconclusive because of great uncertainty in the effect estimates and the variability around them. For example, the pooled RR for mortality in one review¹⁴³ for the comparison between RS and THR was 1.10 (95% CI 0.10 to 17.8) (see Table 42).

Failure rate

One systematic review¹⁴⁴ reported an indirect naive comparison analysis (i.e. analysis without a common comparator) based on data from 15 studies of RS and 19 studies of THR (see Table 42). The analysis suggested a reduced risk of failure in the RS recipients compared with the THR recipients (3.70% vs. 11.60%). Given the well-recognised problems with validity of such methodology, this evidence was judged to be inconclusive.

Complications

Evidence on complications was reported by both systematic reviews¹⁴²^,¹⁴³ (i.e. implant dislocation, infection and component loosening) (see Table 42). The evidence consistently showed an increased risk for component loosening¹⁴²^,¹⁴³ but a reduced risk for implant dislocation¹⁴² among RS recipients compared with THR recipients. One review,¹⁴² which provided the risk of infection pooled across three studies, was not informative enough to draw any conclusions (RR 2.25, 95% CI 0.61 to 8.31).

Grading the overall quality of the evidence

The results for graded outcomes are presented in the evidence profile (Table 43). The overall quality for gradable outcomes across the reviewed evidence comparing THR with RS was as follows: HHS – very low grade; WOMAC score – low grade; revision – very low grade; mortality – not graded because of absence of evidence; and implant dislocation – very low grade.

Summary conclusions for the comparison between total hip replacement and resurfacing arthroplasty

The majority of the evidence from three RCTs¹³⁰^–¹³² (Table 44) and three systematic reviews¹⁴²^–¹⁴⁴ (Table 45) comparing THR and RS was rated as inconclusive (RCTs – very low-grade evidence). Nevertheless, the evidence from two RCTs and two systematic reviews indicated a reduced risk of infection¹³⁰^,¹³² and implant dislocation¹⁴²^,¹⁴³ among RS patients compared with THR patients. However, the evidence from the same reviews also indicated that recipients of RS were at higher risk of revision and component loosening than patients who received a THR. In three RCTs¹³⁰^–¹³² the mean postoperative OHS, WOMAC score (low-grade evidence) and Merle d’Aubigné and Postel score were not different between patients who received THR and those who received RS.

TABLE 44

Summary of evidence regarding the differences between THR and RS for each reported outcome in the RCTs

TABLE 45

Summary of evidence regarding the differences between THR and RS for each reported outcome in the systematic reviews

There was inconclusive evidence on mortality (three RCTs¹³⁰^–¹³² and two systematic reviews¹⁴²^,¹⁴³), HHS (one RCT¹³⁰ and two systematic reviews¹⁴²^,¹⁴³), UCLA activity score (two RCTs¹³¹^,¹³² and one systematic review¹⁴²) and selected complications (i.e. infection, wound complication, deep-vein thrombosis; two RCTs¹³⁰^,¹³² and one systematic review¹⁴²).

Results from individual RCTs were considered inconclusive because of the partial reporting (missing data for effect estimates, CIs, SEs, SDs, p-values) and great uncertainty in the estimates (wide CIs). The findings from the systematic reviews were inconclusive because of great uncertainty in the pooled estimates (wide CIs), lack of reporting of pooled results across RCTs (i.e. only narrative synthesis reported) or inconsistent summary findings.

Other analysis

Publication bias

The extent to which publication bias could have influenced the pooled treatment effect estimates (i.e. degree of funnel plot asymmetry) could not be explored because of insufficient numbers of data points in the forest/funnel plots.

Heterogeneity, subgroup effects and sensitivity analysis

The reviewed data from RCTs were too sparse (only three RCTs) to allow an exploration of whether or not the effect of any given THR relative to RS differed by study-level methodological (i.e. risk of bias, type of data analysis) or patient-related (i.e. age, sex or functional status) characteristics. None of the included RCTs reported within-study subgroup effects of the THR relative to RS (or vice versa).

Overall summary of the clinical effectiveness findings

A large proportion of evidence appraised and summarised in this review has been judged to be inconclusive (very low to low grade) because of poor reporting, missing data, inconsistent results and/or great uncertainty in the treatment effect estimates. Nevertheless, results from most studies suggested significantly improved post-surgery scores for functional/clinical measures (HHS, OHS, WOMAC score, MACTAR score, Merle d’Aubigné and Postel score and SF-12 score), regardless of the type of THR or RS received. Some moderate- or lower-grade evidence indicated no difference for these measures between different types of THR (or between THR and RS) at different follow-up times. There was a reduced risk of implant dislocation for participants receiving a THR with a larger femoral head size (vs. a smaller head size) or with a cemented cup (vs. cementless; high-grade evidence). Moreover, the evidence suggested a reduced femoral head penetration rate (moderate-grade evidence) and risk of implant revision (very low-grade evidence) for participants who received cross-linked polyethylene compared with conventional polyethylene cup liner bearings. Participants with ceramic-on-ceramic articulations (vs. metal-on-polyethylene articulations) experienced a reduced risk of osteolysis. Recipients of RS had a lower risk of infection than recipients of a THR. The evidence on mortality and other complications (e.g. loosening, femoral fracture and deep-vein thrombosis) was inconclusive (very low grade).

Limitations of the reviewed evidence and pitfalls in interpretation

The review findings warrant cautious interpretation given the limitations of the available evidence. Specifically, great uncertainty in the treatment effect estimates (i.e. wide 95% CIs) because of limited sample sizes and/or small numbers of events (especially for deaths, revisions and complications), as well as incomplete or poor reporting (e.g. missing effect measures, SDs/SEs, 95% CIs, p-values), rendered some of the reviewed evidence inconclusive. Moreover, reported evidence on complications was scarce. It is unclear whether this is because of the absence or rarity of these events or because of under-reporting. In light of poor reporting, it was not possible to explore contextual factors that might have influenced the study results. For example, lack of blinding of participants and study personnel may have led to systematic differences in caregiving or co-interventions across implant groups, which would independently influence outcome measures. Furthermore, none of the studies reported the between-group distribution of experience and skills of study personnel, including surgeons, physicians, physiotherapists and occupational therapists. Any imbalance between the study treatment groups in the above-mentioned factors would influence the participants’ prognosis apart from treatment.

The paucity of data did not allow the exploration of any variation in the treatment effect across the predefined subgroups of patients or methodological features of studies; likewise, the extent of publication bias could not be examined using funnel plots because of the small numbers of studies in the meta-analyses.

Scenario analysis around revision rates

We did not consider it appropriate to use data from other clinical trials/registries to check the findings of our economic modelling, because the clinical effectiveness studies we identified that reported revision rates were based on low event counts and/or small trials and carried a great deal of uncertainty. Overall, across the THR/THR and THR/RS comparisons, trials were often based on selected populations or interventions and provided inconclusive data on revision rates, often with wide CIs.

Comparison of the results from randomised controlled trials and systematic reviews

The findings of the RCTs and systematic reviews could be compared only with regard to implant fixation methods (cemented vs. cementless) and femoral head-on-cup articulations (e.g. metal-on-metal vs. metal-on-polyethylene, ceramic-on-ceramic vs. metal-on-polyethylene, ceramic-on-ceramic vs. ceramic-on-polyethylene). In summary, the effect estimates for differences between the above-mentioned THR groups in risk of revision, mortality and complications reported in RCTs and systematic reviews were statistically non-significant and had wide, uninformative CIs. The evidence from both RCTs and systematic reviews was therefore judged inconclusive because of the wide variability around the estimates and/or missing data. The reviewed evidence from RCTs suggested that there was no difference in postoperative HHS between cemented and cementless THR groups. The evidence for HHS reported in the included systematic reviews was judged inconclusive.

Our update search identified four new relevant systematic reviews.²⁴²^–²⁴⁵ Of these four systematic reviews, three compared the effectiveness of THRs using different articulations (metal-on-metal vs. metal-on-polyethylene),²⁴² implant fixation methods (cemented vs. cementless)²⁴⁵ or femoral stem coating materials (hydroxyapatite coated vs. non-hydroxyapatite coated)²⁴⁴ for risk of revision,²⁴⁵ HHS,²⁴²^,²⁴⁴^,²⁴⁵ mortality²⁴⁵ and complications.²⁴²^,²⁴⁵ The remaining systematic review compared THR with RS for risk of revision.²⁴³

Briefly, the review by Voleti et al.²⁴² presented a meta-analysis based on three RCTs and found no significant difference in HHS between the two articulations (metal-on-metal vs. metal-on-polyethylene) at 6 years post-surgery follow-up (pooled MD –1.05; p = 0.37). However, the risk of complications (dislocation, aseptic loosening, trochanteric/iliopsoas bursitis, femoral fracture and wound dehiscence) was greater in the metal-on-metal articulation group than in the metal-on-polyethylene articulation group (OR 3.37, 95% CI 1.57 to 7.26).²⁴² Similarly, another review²⁴⁵ presented a meta-analysis of seven RCTs showing a statistically non-significant difference in the mean postoperative HHS between the cemented and the cementless THR groups (pooled MD 1.12, 95% CI –1.17 to 3.41). In the same review, the meta-analytic estimates for risk of revision (six RCTs; pooled RR 1.44, 95% CI 0.88 to 2.36), mortality (five RCTs; pooled RR 1.06, 95% CI 0.73 to 1.52) and complications (four RCTs; pooled RR 1.54, 95% CI 0.21 to 11.03) between the cemented and the cementless groups of THR were also statistically non-significant. In the review by Li et al.,²⁴⁴ the postoperative pooled mean HHS was not statistically significantly different between the hydroxyapatite-coated and the non-hydroxyapatite-coated THR groups (four RCTs; pooled MD 3.04, 95% CI –4.47 to 10.54). The review by Pailhe et al.²⁴³ included a qualitative synthesis of three RCTs and eight non-RCTs, providing no definitive conclusions regarding the differences between THR and RS in terms of implant survival or risk of revision.
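As a reading aid, the pooled ratio estimates quoted above can be checked against their 95% CIs. The following is a minimal sketch, assuming the intervals are symmetric on the log scale (a Wald-type interval); the cited reviews may have used other methods, so the back-calculated z and p-values are approximate. The function name `p_from_ratio_ci` and the dictionary labels are illustrative, not taken from the reviews.

```python
from math import log
from statistics import NormalDist

def p_from_ratio_ci(estimate, lower, upper, level=0.95):
    """Back-calculate z and a two-sided p-value from a ratio and its CI.

    Assumes the CI is symmetric on the log scale (an assumption,
    not something the source reviews state).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for 95%
    se_log = (log(upper) - log(lower)) / (2 * z_crit)   # SE of the log ratio
    z = log(estimate) / se_log
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Pooled estimates as quoted in the text (estimate, lower, upper).
pooled = {
    "complications, MoM vs. MoP (OR)": (3.37, 1.57, 7.26),
    "revision, cemented vs. cementless (RR)": (1.44, 0.88, 2.36),
    "mortality, cemented vs. cementless (RR)": (1.06, 0.73, 1.52),
    "complications, cemented vs. cementless (RR)": (1.54, 0.21, 11.03),
}

for label, (est, lo, hi) in pooled.items():
    z, p = p_from_ratio_ci(est, lo, hi)
    verdict = "CI excludes 1" if lo > 1 or hi < 1 else "CI includes 1"
    print(f"{label}: z = {z:.2f}, p = {p:.3f} ({verdict})")
```

Only the first interval excludes 1. The other three span 1 and are wide enough to be compatible with appreciable effects in either direction, which is why these estimates are described as statistically non-significant.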

In summary, the findings from the newly identified systematic reviews²⁴²^–²⁴⁵ are in agreement with those of this review in showing no difference in postoperative HHS between the cemented and the cementless THR groups. Also in agreement with our findings, the pooled estimates for revision, mortality and complications did not differ statistically significantly between the groups, and their 95% CIs were wide enough (because of low event counts and the small sample size of trials) to be compatible with a moderate-to-large effect size in either direction, rendering these findings inconclusive.²⁴⁵ Future well-designed RCTs need to corroborate or refute the finding of one systematic review²⁴² which suggests that there is an increased risk of complications in the metal-on-metal articulation group compared with the metal-on-polyethylene articulation group.

Strengths and limitations of the review

One strength of this review is that the reviewers used systematic and independent strategies to minimise bias in searching, identifying, selecting, extracting and appraising the relevant evidence. The search strategy was applied to multiple electronic sources. Apart from the limitations of the evidence itself, the scope of this review was limited to a predefined set of outcomes ascertained from recently published evidence (2008 or later); evidence from non-English publications was not included. Given the wide scope and large amount of evidence identified, we limited inclusion to studies with a sample size of at least 100 that were published since 2008. The rationale for the size limitation was that smaller studies tend to be underpowered to detect meaningful differences in outcomes.²⁴⁴^,²⁴⁵ The results of such studies are usually inconclusive because their statistically non-significant estimates have wide CIs that include large treatment effect sizes compatible with both a better and a worse outcome for any given treatment compared with the control treatment. Therefore, to minimise this problem, we calculated the minimum sample size for a study that would have 90% power at a two-tailed significance level of 0.05 to detect a MD of 10 on the HHS (we selected a SD of 15 based on external sources).¹⁰⁷^,²⁴⁶ This calculation yielded a total sample size of 100 participants.
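The sample size calculation described above can be approximated with the standard normal-approximation formula for comparing two means. This is a minimal sketch, assuming two equally sized groups and a common SD; the report does not state which formula or software was used, and the function name `n_per_group` is illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Participants per group needed to detect a mean difference `delta`.

    Normal approximation for a two-sided, two-sample comparison with
    equal allocation and a common SD (assumptions of this sketch).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)            # quantile for the target power
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2

n = n_per_group(delta=10, sd=15)   # MD of 10 on the HHS, SD of 15
per_group = ceil(n)
print(f"per group: {per_group}, total: {2 * per_group}")
```

Under these assumptions the formula gives about 48 participants per group (96 in total), which is consistent with the threshold of 100 participants once rounded up.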

Future research

Because the evidence for any given comparison of two types of THR was sparse (maximum of two trials), the observed findings need to be replicated in larger, long-term pragmatic trials comparing the same THRs with each other (or with RS) before more definitive conclusions or recommendations are made. Large, multicentre, long-term pragmatic trials would help to reliably evaluate relative treatment effects and their variation(s) across patients, as well as manufacturer-based subgroups, and maximise generalisability of the findings to larger populations in clinical practice settings. For a more complete picture to aid health-care policy decisions, trials are also needed to investigate the cost-effectiveness of alternative THR (or RS) techniques. Study authors are encouraged to specify MCIDs and power calculations for their primary outcome(s). This information would help in the interpretation of the study findings in both clinical and statistical terms. Better reporting of future trial results is also warranted.

Methods for the review of cost-effectiveness

Identification of studies

Initial scoping searches were undertaken in MEDLINE in October 2012 to assess the volume and type of literature relating to the assessment question. These scoping searches also informed development of the final search strategies (see Appendix 1). An iterative procedure was used to develop these strategies with input from clinical advisors and previous HTA reports (e.g. Vale et al.,¹⁹ de Verteuil et al.¹¹). The strategies have been designed to capture generic terms for arthritis, THR and RS. Searches were limited by the addition of economic and quality of life terms, which were selected with reference to previous research.²⁴⁷^,²⁴⁸

Searches were date limited from 2002 (the date of the most recent NICE guidance in this area²⁵). The searches were undertaken in November 2012 (for exact search dates see Appendix 1).

All bibliographic records identified through the electronic searches were collected in a managed reference database.

The following main sources were searched to allow for identification of relevant published and unpublished studies and studies in progress:

electronic bibliographic databases, including research in progress
references of included studies.

The following databases of published studies were searched: MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, EMBASE, Science Citation Index and Conference Proceedings Citation Index – Science, The Cochrane Library (specifically CDSR, CENTRAL, DARE, NHS EED and HTA database) and the Cost-effectiveness Analysis Registry (CEA Registry) (Articles).

The following databases of research in progress were searched: Current Controlled Trials, ClinicalTrials.gov, UKCRN Portfolio Database and NLM Gateway (HSRProj).

The reference lists of included studies were checked for additional studies.

Inclusion and exclusion criteria

The following inclusion and exclusion criteria were used to identify eligible studies reporting costs and/or effects of THR and RS useful for the economic model and decision analysis:

Inclusion criteria

Study design

RCTs.
Observational designs, cohort studies and registry-based studies.
Decision-analytic modelling studies.
Systematic reviews.
Meta-analyses.

Population

People with pain or disability resulting from end-stage arthritis of the hip for whom non-surgical management has failed.

Intervention

Elective primary THR.
Primary hip RS.

Comparator

Different types of primary THR compared with RS for people in whom both procedures are suitable.
Different types of primary THR compared with each other for people who are not suitable for hip RS.
Studies reporting costs or utilities without a comparator were also included.

Record

Full-text articles of completed or in-progress studies (protocols) published in English.

Outcomes

Cost-effectiveness outcomes were costs (cost of resources/devices, quantitative use of resources reported) and clinical effectiveness measures or utility measures (utility, EQ-5D score or QALYs), incremental cost-effectiveness ratios (ICERs), uncertainty measures, the ceiling willingness-to-pay (WTP) ratios and probabilities of cost-effectiveness from cost-effectiveness acceptability curves (CEACs).

Exclusion criteria

Non-English-language publications.
Abstract/conference proceedings, letters and commentaries.
Quality of life reported without utilities or QALYs.
Hip/knee data not reported separately.
Studies including only patients aged < 35 years.

Assessment of eligibility

All retrieved records were collected in a specialist database and duplicate records were identified and removed. An initial sift was undertaken by one reviewer to exclude clearly non-relevant records using the following exclusion criteria:

non-hip only papers
papers on animals
papers on children
papers on surgery for hip fracture only
non-English full-text papers.

This was followed by a formal sift by title and abstract by two reviewers using the inclusion/exclusion criteria. All identified relevant studies were read in full by two reviewers to identify eligible studies. Disagreement was resolved by a third reviewer. Reasons for exclusion of full-text papers were documented. The study flow was documented using a PRISMA diagram.⁹⁹

Data extraction

Data extraction was carried out in two stages by one reviewer using the data extraction sheets (see Appendix 4) and was checked by a second reviewer. Stage one considered all eligible studies and stage two considered studies assessed for usefulness for populating the economic model and decision analysis. Data extracted during stage one included the following:

study characteristics [i.e. author names, country, design, study aim, type of economic evaluation (cost-effectiveness analysis, cost–utility analysis), perspective (e.g. societal, health-care payer, patient) and study currency]
patient characteristics (i.e. number of participants, age, sex, OA)
outcomes [i.e. utilities, resource use and costs (both direct and indirect), ICERs].

Data extraction also included the overall study conclusion and a comment on the type of data included in the study that are relevant for the economic model. Studies were subsequently categorised by topic (THR or RS) and outcomes (costs or utilities) and cost studies were also ordered by country and year using the following hierarchy:

UK study published in 2008 or later
UK study published before 2008
non-UK study published in 2008 or later
non-UK study published before 2008.
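The four-tier ordering of cost studies above can be expressed as a simple sort key. The sketch below is illustrative only: the study labels are hypothetical placeholders, and ordering by most recent year within a tier is an assumption rather than something the hierarchy specifies.

```python
from dataclasses import dataclass

@dataclass
class CostStudy:
    label: str   # hypothetical identifier, not a study from the review
    uk: bool     # True if the study was conducted in the UK
    year: int    # publication year

def cost_tier(study):
    """Return the tier (1 = highest priority) in the cost-study hierarchy."""
    recent = study.year >= 2008          # "published in 2008 or later"
    if study.uk:
        return 1 if recent else 2
    return 3 if recent else 4

# Hypothetical examples used only to exercise the ordering.
studies = [
    CostStudy("non-UK 2010", uk=False, year=2010),
    CostStudy("UK 2005", uk=True, year=2005),
    CostStudy("non-UK 2003", uk=False, year=2003),
    CostStudy("UK 2012", uk=True, year=2012),
]

# Sort by tier first; within a tier, newer studies first (assumption).
ranked = sorted(studies, key=lambda s: (cost_tier(s), -s.year))
for s in ranked:
    print(cost_tier(s), s.label)
```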

Utility studies were ordered by study size and ‘patient-reported utility data’ (utilities derived prospectively using patient questionnaires or from databases that prospectively collected utilities) using the following hierarchy:

> 100 THR/RS patients and primary data
< 100 THR/RS patients and primary data
> 100 THR/RS patients and secondary data
< 100 THR/RS patients and secondary data.

Data extracted during the second stage considered the costs of THR (cost of the device, cost of surgical time/hospital stay), follow-up for successful THR, revision THR, follow-up for successful revision THR, RS (cost of the device, cost of surgical time/hospital stay), follow-up for successful RS, revision RS, follow-up for successful revision RS and utilities at baseline, post surgery up to 12 months and > 12 months post surgery. Information on definition of costs, source of costs, cost year and currency was also extracted.

Quality assessment

The key cost-effectiveness papers that were identified as relevant for the economic model were assessed by one reviewer and checked by a second reviewer using the Consensus on Health Economic Criteria (CHEC);²⁴⁹ cost-effectiveness studies with decision-analytic models were also assessed using the criteria of Philips et al.²⁵⁰

Results of the review of cost-effectiveness

Identification of studies

The flow chart outlining the process of identifying relevant literature can be found in Figure 18. The database search identified 1650 records, with an additional 14 records identified through screening of reference lists of included studies. Removal of duplicates left 913 studies to be screened for inclusion. The initial sift excluded 283 studies that were clearly not relevant, with a further 525 records excluded on title and abstract (κ = 0.89). The remaining 105 full-text articles were assessed for eligibility, of which 35 were excluded with reasons (see Appendix 13). This resulted in a total of 70 eligible articles,⁸^,¹¹^,¹⁹^,³⁷^,³⁸^,⁴⁰^,⁴³^,⁴⁴^,¹²⁰^,¹³⁰^,¹⁴⁸^,²⁰⁸^,²⁵¹^–³⁰⁸ in which 66 studies were reported and subsequently included in the review. Of these, 35 were observational studies with or without an economic analysis,³⁷^,²⁰⁸^,²⁵¹^,²⁵²^,²⁵⁴^,²⁵⁵^,²⁵⁸^,²⁶⁴^,²⁶⁶^–²⁶⁸^,²⁷⁰^–²⁷²^,²⁷⁴^,²⁷⁶^,²⁷⁷^,²⁷⁹^–²⁸²^,²⁸⁴^–²⁸⁷^,²⁹⁰^,²⁹⁴^,²⁹⁵^,²⁹⁷^,²⁹⁸^,³⁰⁰^–³⁰²^,³⁰⁵^,³⁰⁶^,³⁰⁸ 22 were economic analyses¹¹^,¹⁹^,³⁸^,⁴⁴^,¹⁴⁸^,²⁵³^,²⁵⁶^,²⁵⁷^,²⁵⁹^–²⁶²^,²⁶⁹^,²⁷³^,²⁷⁵^,²⁷⁸^,²⁸⁸^,²⁹¹^–²⁹³^,²⁹⁹^,³⁰⁴^,³⁰⁷ including three HTAs,¹¹^,¹⁹^,¹⁴⁸^,²⁹⁹ four were reviews⁸^,⁴³^,²⁸⁹^,²⁹⁶ (three non-systematic⁴³^,²⁸⁹^,²⁹⁶ and one systematic⁸), four were RCTs⁴⁰^,¹²⁰^,¹³⁰^,²⁶³^,²⁸³^,³⁰³ and one was a before-and-after trial.²⁶⁵ Study location covered the UK (n = 13⁸^,¹¹^,¹⁹^,³⁷^,³⁸^,⁴⁰^,⁴³^,⁴⁴^,¹³⁰^,²⁵¹^,²⁵²^,²⁵⁷^,²⁹²^,²⁹⁵^,²⁹⁹^,³⁰⁴), other European countries (n = 22²⁵⁶^,²⁵⁸^,²⁶⁰^,²⁶¹^,²⁶³^,²⁶⁵^,²⁶⁶^,²⁷¹^,²⁷⁴^–²⁷⁶^,²⁷⁸^,²⁸⁰^,²⁸¹^,²⁸³^,²⁸⁷^,²⁸⁸^,²⁹⁷^,²⁹⁸^,³⁰²^,³⁰³^,³⁰⁵^,³⁰⁶), North America (n = 21¹²⁰^,¹⁴⁸^,²⁰⁸^,²⁵³^–²⁵⁵^,²⁵⁹^,²⁶²^,²⁶⁷^,²⁶⁸^,²⁷⁷^,²⁸⁴^,²⁸⁵^,²⁸⁹^,²⁹¹^,²⁹³^,²⁹⁴^,²⁹⁶^,³⁰⁰^,³⁰¹^,³⁰⁷), Australia and New Zealand (n = 6²⁶²^,²⁶⁷^,²⁷¹^,²⁸⁰^,²⁸⁴^,²⁸⁸) and Asia (n = 4²⁷⁰^,²⁷²^,²⁷⁹^,³⁰⁸). 
Costs/resource use were reported by 30 studies,⁴³^,¹⁴⁸^,²⁵⁴^,²⁵⁶^,²⁶¹^,²⁶⁴^,²⁶⁷^,²⁶⁸^,²⁷⁰^,²⁷¹^,²⁷³^,²⁷⁴^,²⁷⁶^–²⁸⁰^,²⁸³^,²⁸⁵^–²⁹³^,³⁰⁰^,³⁰⁴^,³⁰⁸ utilities/QALYs by 15 studies¹²²^,²⁵¹^,²⁵²^,²⁵⁸^,²⁶⁶^,²⁷²^,²⁸⁴^,²⁹⁴^–²⁹⁸^,³⁰¹^,³⁰²^,³⁰⁵^,³⁰⁶ and both costs/resource use and utilities/QALYs by 21 studies.⁸^,¹¹^,¹⁹^,³⁷^,³⁸^,⁴⁰^,⁴⁴^,¹³⁰^,²⁰⁸^,²⁵³^,²⁵⁵^,²⁵⁷^,²⁵⁹^,²⁶⁰^,²⁶²^,²⁶³^,²⁶⁵^,²⁶⁹^,²⁷⁵^,²⁸¹^,²⁸²^,²⁹⁹^,³⁰³^,³⁰⁷ Seven of the 14 economic models reported transition probabilities.⁸^,¹¹^,¹⁹^,²⁵³^,²⁵⁹^,²⁶¹^,²⁷⁵^,²⁹⁹
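The record counts reported for the study flow can be checked with simple arithmetic. The sketch below uses only the figures quoted above; the number of duplicates removed is derived here and is not stated in the text.

```python
# Figures quoted in the text.
database_records = 1650
reference_list_records = 14
after_duplicates_removed = 913
excluded_initial_sift = 283
excluded_title_abstract = 525
excluded_full_text = 35

identified = database_records + reference_list_records        # all records
duplicates = identified - after_duplicates_removed             # derived, not reported
after_initial_sift = after_duplicates_removed - excluded_initial_sift
full_text_assessed = after_initial_sift - excluded_title_abstract
eligible_articles = full_text_assessed - excluded_full_text

print(f"identified: {identified}, duplicates removed: {duplicates}")
print(f"full text assessed: {full_text_assessed}, eligible: {eligible_articles}")
```

The derived totals match the 105 full-text articles assessed and the 70 eligible articles reported.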

FIGURE 18

Flow diagram of study identification for the cost-effectiveness review.

A separate search (December 2012) of the ClinicalTrials.gov, Current Controlled Trials, UKCRN Portfolio and HSRProj databases retrieved 511 potential trials or health services research projects. After screening titles and full records (if available), eight clinical trials were identified as potentially relevant from the cost-effectiveness point of view (see Appendix 7). All were either ongoing or completed since 2009.

Description of included studies

Resurfacing arthroplasty

Evidence on RS was scarce with only five of the 66 included studies investigating hip RS (see Appendix 10). A 2012 UK RCT including 126 OA patients suitable for RS investigated the cost-effectiveness of RS compared with THR.⁴⁰^,¹³⁰ At the end of this 12-month trial small benefits of RS in terms of QALYs could be shown for a selected patient group, resulting in an ICER of £17,451 per QALY. This evidence was stronger for male than for female patients. In a comparison between ceramic-on-ceramic THR and RS at 3 months post surgery, evidence was not as strong, favouring THR over RS.²⁰⁸ However, longer-term follow-up in a study comparing hybrid THR with RS confirmed that, after 5 and 9 years, the revision rates for RS were lower than for hybrid THR (9.3% and 16.7% at 9 years post surgery, respectively) and patients were more active.²⁵¹^,²⁵²

A retrospective economic decision analysis of published data over a 30-year time horizon showed the cost-effectiveness of RS compared with THR for women aged < 55 years and men aged < 65 years.²⁵³ The main drivers of cost-effectiveness were the cost of the implant and length of hospital stay.⁴⁰^,²⁰⁸ However, Vale et al.¹⁹ reported in their HTA that RS would be cost-effective compared with THR only if RS revision rates could be shown to be 80–88% lower than revision rates for THR. They further concluded that RS could be cost-effective compared with ‘watchful waiting’ followed by THR or an extended period of ‘watchful waiting’ over 20 years.

Total hip replacement

The majority of studies investigated THR (n = 61) (see Appendix 10). Of these, five compared minimally invasive techniques with standard THR, reporting perioperative advantages, better short-term outcomes and reduced costs in favour of minimally invasive techniques.¹¹^,¹⁴⁸^,²⁵⁴^–²⁵⁶ However, Coyle et al.¹⁴⁸ concluded that there is little evidence of a difference between the two surgical techniques in the long term, mainly because of lack of data.

Ten of the THR studies focused on the comparison of different types of THR or specific components/brands of THR. Briggs et al.,³⁸ Davies et al.,⁴³ Fordham et al.²⁵⁷ and Hulleberg et al.²⁵⁸ assessed different brands of THR, Bozic et al.²⁵⁹ investigated alternative bearings including metal-on-metal, ceramic-on-ceramic and ceramic-on-polyethylene and Laupacis et al.,¹²⁰ Marinelli et al.,²⁶⁰ Pennington et al.⁴⁴ and di Tanna et al.²⁶¹ compared cemented, cementless and hybrid THR more generally and reported inconsistent findings. The most recent economic model by Pennington et al.⁴⁴ used PROMs and showed that (1) cemented prostheses were the least costly type for THR, (2) hybrid prostheses were the most cost-effective and (3) cementless prostheses did not provide sufficient improvement in health outcomes to justify their additional costs. Similarly, Davies et al.⁴³ identified cemented prostheses as the least costly type of prosthesis in their review. However, they concluded that there is a lack of observed long-term prosthesis survival data and particularly limited up-to-date evidence for the UK, which led them to call for more trials with longer-term follow-up. Cummins et al.²⁶² reported that use of antibiotic-impregnated bone cement can result in an overall decrease in costs. For more detail on the studies investigating the different types of THR see Appendix 12.

Patient management and rehabilitation was the focus of four studies,²⁶³^–²⁶⁶ which reported that perioperative management and rehabilitation programmes could improve patient outcomes and reduce costs.

Almost half of the THR studies (30/61⁸^,³⁷^,²⁶⁷^–²⁸⁵^,²⁹⁸^,³⁰⁰^–³⁰²^,³⁰⁴^–³⁰⁸) assessed the costs and/or effectiveness of THR without a specific focus on a rehabilitation programme, surgical intervention, implant brand or prosthesis type. Of these, two US studies²⁶⁷^,²⁶⁸ concentrated on obese patients and reported that, even though operative costs are higher for obese patients, overall care costs and in-hospital outcomes for THR are comparable across all BMI groups. Eleven studies²⁶⁹^–²⁷⁹ evaluated the cost-effectiveness of THR in a specific country, and two multicentre studies²⁸⁰^,²⁸¹ aimed to assess the costs and outcomes of THR comparatively across a number of European member states. These two studies concluded that improvement after surgery is associated with high preoperative expectations. Stargardt et al.²⁸⁰ reported further that the total cost of treatment ranged from €1290 (Hungary) to €8739 (the Netherlands) and that the two main cost drivers were the cost of the implants and ward costs.

The overall findings of the cost-effectiveness studies were that (1) THR resulted in greater benefits than conservative treatment and (2) longer waiting times incurred greater costs and resulted in physical deterioration.²⁷¹^,²⁸²^,²⁸³ Further, agreement was reached on the long-term cost-effectiveness and sustained benefits of THR.³⁷^,¹²⁰^,²⁵⁷^,²⁷³^,²⁷⁵ However, Bozic et al.²⁸⁴ stated that, although THR improved quality of life, failed THR could lead to health states worse than chronic OA. Resource use might be increased as patients with a THR were shown to have a 10% increase in hospital stay compared with patients pre surgery.²⁸⁵

In contrast, two studies²⁸⁶^,²⁸⁷ that took a patient perspective rather than a health-care perspective concluded that out-of-pocket costs (including hospital costs, medication costs, rehabilitation costs, costs of health professional visits, costs of tests, costs of special equipment, costs of household alterations, use of private and community services and transportation costs that are not paid for by the health system), as well as use of health services, fell dramatically in the first year post surgery and that costs as well as resource use depended on pre-surgery health status.

Studies that focused on revision THR concluded that revision THR seems cost-effective but that it is resource intensive and has important implications for the allocation of health-care funding as the number of revisions is expected to increase with increasing demand for THR.²⁸⁸^–²⁹³ Vanhegan et al.²⁹² evaluated the costs associated with revision THR for different indications and reported that costs vary significantly by indication and that these variations were not reflected in the NHS tariffs. Durable implants and reduction in complications such as early dislocations have been suggested to be the solutions to reduce revision rates.²⁸⁹ However, the highest revision costs were reported for revision as a result of infection,²⁹² with infections caused by methicillin-resistant strains of bacteria (41% of periprosthetic joint infections) incurring significantly higher costs than infections with sensitive strains of bacteria.²⁹³

Four studies evaluated the usefulness of different outcome measures for measuring quality of life after THR or revision THR, which showed that there was no consistency in the tools used to assess quality of life. Feeny et al.²⁹⁴ reported that there is low agreement between certain outcome measures [SF-36, standard gamble, Health Utilities Index (HUI)-2 and HUI-3]. Dawson et al.²⁹⁵ and Jones et al.²⁹⁶ found that disease-specific measures reported larger changes than generic and utility measures. Ostendorf et al.²⁹⁷ recommended the use of the OHS and the SF-12 in the assessment of THR and the EQ-5D in situations in which utility values are needed.

Overall, studies confirmed the long-standing claims that THR and RS are cost-effective interventions for patients with OA of the hip. However, there is little evidence from long-term trials on differences between implant brands and types of prostheses. This limits the conclusions that can be drawn with regard to the most cost-effective type of prosthesis. Studies used different methodologies to estimate costs (reference costs vs. prices actually paid by health-care centres), the definitions of the costs included varied extensively and many studies did not clearly report how costs were broken down. Although this review concentrates on clinical outcomes measured by the EQ-5D, the included studies tended to use more than one outcome measure, with great variation across studies. In summary, THR, more so than RS, is a widely researched topic and receives great interest in many countries; however, further research should set out to include an assessment of the cost-effectiveness of different treatments.

Core studies for the cost-effectiveness analysis

After eligible cost studies were ranked by year and country (most recent UK studies first) and utility studies by number of participants, 11 studies were identified that were potentially useful to inform the decision model. These included one HTA and a further four cost-effectiveness studies. The HTA assessed the cost-effectiveness of hip RS compared with watchful waiting and THR.¹⁹ The cost-effectiveness studies included three models that compared the cost-effectiveness of RS and THR,²⁵³ the cost-effectiveness of cemented, cementless and hybrid prostheses⁴⁴ and the cost-effectiveness of two particular prosthesis types,³⁸ respectively. One cost-effectiveness study was included that evaluated THR and RS but did not use a model.⁴⁰

The remaining six studies included partial economic evaluations that examined either costs or consequences but not both. Vanhegan et al.²⁹² reported costs for revision THR; Baker et al.²⁵² and Hulleberg et al.²⁵⁸ reported medium- to long-term utilities in small populations; Dawson et al.²⁹⁵ investigated quality of life post revision THR; and Bozic et al.²⁸⁴ measured health state utilities for chronic OA of the hip, successful primary THR, failed primary THR, successful revision THR, failed revision THR and chronically infected THR. Rolfson et al.²⁹⁸ evaluated the Swedish patient-reported outcomes data, reporting utilities for close to 35,000 THR patients.

Of the 11 studies three reported costs for THR,¹⁹^,⁴⁰^,⁴⁴ two reported costs for follow-up of successful THR¹⁹^,⁴⁰ and three reported costs of revision THR¹⁹^,⁴⁴^,²⁹² (see Appendix 12). Costs for RS were reported in three studies.¹⁹^,⁴⁰^,²⁵³ Of these, Edlin et al.⁴⁰ and Vale et al.¹⁹ also reported follow-up costs after successful RS and Bozic et al.²⁵³ reported costs for revision RS (see Appendix 11).

The studies reporting the most useful data on utilities following THR were those by Pennington et al.,⁴⁴ Rolfson et al.,²⁹⁸ Hulleberg et al.,²⁵⁸ Dawson et al.²⁹⁵ and Bozic et al.²⁸⁴ (see Appendix 14). Utilities for RS were reported in only three studies⁴⁰^,²⁵²^,²⁵³ (see Appendix 15). No data were identified on quality of life at > 12 months post RS or for post-revision RS. Follow-up costs reported by Vale et al.¹⁹ were the same for THR, RS and revision THR. Similarly, Bozic et al.²⁵³ made no distinction between revision following THR or RS in terms of costs.

Quality assessment of core studies

Of the 11 core studies, five²⁵²^,²⁵³^,²⁵⁸^,²⁹⁵^,²⁹⁸ provided useful information on EQ-5D utility scores only and one²⁹² provided useful data on costs only. These partial economic evaluations were not included in the critical appraisal.³⁰⁹

Five studies¹⁹^,³⁸^,⁴⁰^,⁴⁴^,²⁵³ were full economic evaluations and have been critically appraised using the CHEC-list.²⁴⁹ Of these five studies, four¹⁹^,³⁸^,⁴⁴^,²⁵³ included models. These studies have also been critically appraised using an adapted checklist for models developed by Philips et al.²⁵⁰

Table 46 shows that all studies met ≥ 16 of the 19 criteria in the CHEC-list.²⁴⁹

TABLE 46

Critical appraisal of the economic evaluation studies using the CHEC-list

Table 47 shows that all studies met ≥ 20 of the 32 criteria for economic models provided by Philips et al.²⁵⁰ All studies correctly reported the time horizon and the perspective of the model, and the inputs used within the models were consistent with the perspectives that were chosen. In terms of costs and outcomes used in the model, these were appropriate to the specific study data set that was used. All studies conducted subgroup analyses. None of the studies applied a half-cycle correction and no justification was given for its exclusion. In addition, Pennington et al.⁴⁴ did not provide a clear definition of all of the options under evaluation and Briggs et al.³⁸ did not specify the cycle length of the model.

TABLE 47

Critical appraisal of the economic models using an adapted checklist from Philips et al.
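The half-cycle correction noted in the appraisal above adjusts for the fact that, in a discrete-time Markov model, transitions happen throughout a cycle rather than only at its boundaries. The sketch below is a generic illustration using a hypothetical cohort trace and a utility of 1.0; none of the numbers come from the appraised models.

```python
def qalys_uncorrected(trace, utility, cycle_years=1.0):
    """Count state membership at the start of each cycle (no correction)."""
    return sum(trace[:-1]) * utility * cycle_years

def qalys_half_cycle(trace, utility, cycle_years=1.0):
    """Average membership at the start and end of each cycle.

    This trapezoidal form is equivalent to giving the first and last
    time points half weight, which is the usual half-cycle correction.
    """
    pairs = zip(trace[:-1], trace[1:])
    return sum((start + end) / 2 for start, end in pairs) * utility * cycle_years

# Hypothetical proportion of the cohort in a health state at years 0 to 3.
trace = [1.0, 0.9, 0.8, 0.7]

print(round(qalys_uncorrected(trace, utility=1.0), 3))  # start-of-cycle counting
print(round(qalys_half_cycle(trace, utility=1.0), 3))   # half-cycle corrected
```

With a declining trace, start-of-cycle counting overstates time spent in the state (2.7 vs. 2.55 years in this hypothetical example), which is why an omitted correction is usually expected to be justified.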

Core studies for the economic model

Of the 11 core studies, Edlin et al.,⁴⁰ Pennington et al.,⁴⁴ Vale et al.¹⁹ and Vanhegan et al.²⁹² provided data for the model in Chapter 9 (see Chapter 9 for the rationale for the selection procedure). This section provides a brief description of the four core studies (Table 48).

TABLE 48

Characteristics of key cost-effectiveness studies informing the Markov model

Edlin et al.⁴⁰ reported a cost–utility analysis of RS compared with THR as part of a RCT of 126 adult patients with severe arthritis of the hip. Patients were randomised on a 1 : 1 basis between THR and RS. All RS patients received a Cormet™ (Corin Group, Cirencester, UK) metal-on-metal RS prosthesis. The THR patients received one of three types of prosthesis (ceramic-on-ceramic, metal-on-metal or metal-on-polyethylene) depending on the surgeon’s preference. The study took the NHS perspective and considered the within-trial period without any extrapolation past the 12-month trial period. The costs were reported in 2009/10 UK pounds and EQ-5D 3 Levels (EQ-5D-3L) outcomes were measured as secondary outcomes of the trial.

The study used Healthcare Resource Group v4 (HRG4) reference costs combined with NHS trust finance department list prices for implants and IPD on length of stay (LOS). Resource use data and personal costs were obtained from patient-reported data. Univariate sensitivity analyses included an assessment of the impact of using the cheapest THR type (metal-on-metal) for all THR operations. The study reported NHS and Personal Social Services (PSS) costs after 12 months by type of hip replacement (THR vs. RS), including the costs of the initial operation/care, subsequent inpatient, outpatient, primary and community care, aids and medication [THR £7217 (£1320); RS £6653 (£917)], as well as private and social costs. The main results of this analysis included a difference in QALYs of 0.033 in favour of RS after 12 months and a greater cost of RS (difference of £564) in the first 12 months following surgery. This resulted in an ICER for RS of £17,451 per QALY. These results are based on a short-term trial using a single RS prosthesis type. The study did not explore variation in costs within each type of prosthesis used in THR. Variation in prosthesis costs by hospital, a change in current practice regarding the choice of THR implant, longer follow-up (including higher revision rates for RS than for THR) and use of different RS implants may affect the reported cost-effectiveness in this study.
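As a reminder of how an ICER is formed, the short Python sketch below divides the incremental cost by the incremental QALYs using the rounded figures quoted above (a cost difference of £564 and a QALY difference of 0.033). The function name is hypothetical. The result is close to, but not identical with, the reported £17,451 per QALY; the gap presumably reflects rounding of the published inputs, which is an assumption rather than something stated in the study.

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    if delta_qaly == 0:
        raise ValueError("QALY difference is zero; the ICER is undefined")
    return delta_cost / delta_qaly


# Rounded inputs as quoted in the text (RS vs. THR at 12 months).
icer_from_rounded_inputs = icer(564, 0.033)
print(f"£{icer_from_rounded_inputs:,.0f} per QALY")
```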

Pennington et al.⁴⁴ used IPD from three data sources (national PROMs programme, the NJR and Hospital Episode Statistics) to compare the cost-effectiveness of cemented, cementless and hybrid THR in adult patients with hip OA. They developed a probabilistic Markov model over patients’ lifetime, taking the NHS perspective. Implant prices were based on prices paid by English NHS centres. Costs for surgery plus hospital stay were taken from the literature and adjusted for LOS by prosthesis type, and costs of revision were varied by reason for revision. Costs were reported as 2010/11 prices. The national data sources provided data on quality of life, LOS, rates of revision and rerevision and mortality for 30,203 patients.

Patients receiving different prosthesis types were matched by age, sex, number of comorbidities, ASA grade, BMI, deprivation, preoperative quality of life, surgeon experience and hospital type. The study reported data on the combined cost of the prosthesis, operating theatre and hospital stay, quality of life at 6 months post surgery and 5- and 10-year revision rates by prosthesis type, age group and sex. Overall, the study concluded that in patients aged 70 years the ICER for a hybrid prosthesis compared with a cemented prosthesis was £2100 for men and £2500 for women, with hybrid prostheses resulting in higher quality of life in all subgroups except women aged 80 years and cemented prostheses being the least costly option. The initial costs of a cementless prosthesis were highest in all subgroups. One of the limitations of the study was that it assumed that the observed quality of life at 6 months post surgery would remain unchanged for the patients’ lifetime. Furthermore, the study did not consider different revision rates by brand for the three different THR types.

Vale et al.¹⁹ undertook an assessment of the clinical effectiveness and cost-effectiveness of RS compared with watchful waiting (i.e. patient monitoring, drug-based treatment and supportive activities including physiotherapy), THR and other bone-conserving treatments. The HTA comprised a systematic review of the clinical effectiveness and cost-effectiveness of RS compared with any of the treatments above and a Markov model comparing these options from the NHS perspective, over a period of up to 20 years, for patients suitable for RS. Cost data (in 2000/1 UK pounds) for THR and revision THR were taken from the literature (£4195 and £6027, respectively) and prostheses costs for RS were obtained from manufacturers. The model considered the lower of the two RS implant costs obtained (£1730 vs. £1890), resulting in an overall cost of £5515 for RS. LOS was estimated to be 10 or 12 days for THR and 8 or 10 days for RS. All other costs including use of the operating theatre and staff, radiography, outpatient visits and first-year follow-up costs were assumed to be the same for RS and THR. First-year follow-up included two outpatient visits with one radiography scan, totalling £118.74. Quality-of-life estimates considered pain levels and quality-of-life scores for mild, moderate and severe OA and were combined with revision and mortality rates to generate QALYs.

The main conclusion from the systematic review was that evidence from the literature on the effectiveness of RS was limited. Revision rates were reported to range between 0% and 14% over a 3-year follow-up period for RS compared with ≤ 10% over 10 years for THR. Patients with RS experienced less pain than patients managed by watchful waiting. Results from the model showed that RS was dominated by THR based on assumptions about revision rates for RS and the lower cost of THR. In subsequent sensitivity analyses the revision rates for RS had to be reduced to < 80–88% of the THR revision rates before RS was no longer dominated by THR. However, RS dominated watchful waiting within the 20-year follow-up. The study was limited because of the lack of data for the parameters of the model, particularly revision rates for different RS brands and effectiveness data for revision THR following RS. Furthermore, available data for RS originated from a small number of surgeons.

Vanhegan et al.²⁹² investigated the costs of 305 consecutive revision THRs by reason for revision in 286 patients, with a diagnosis of hip OA in 64% of revisions (n = 195). Revision THR was carried out in a single tertiary centre by one of three experienced surgeons. Costs were obtained from the finance department of the tertiary centre (in 2007/8 UK pounds) and included costs of the implant, materials and augmentation, use of the operating theatre and recovery room, the inpatient stay and laboratory tests, radiology, pharmacy, physiotherapy and occupational therapy. The study provided cost data on 13 different implants and data on resource use and costs by reason for revision (aseptic loosening, deep infection, periprosthetic fracture and dislocation).

The mean costs of revision for aseptic loosening, deep infection, periprosthetic fracture and dislocation were reported to be £11,897 (SD £4629), £21,937 (SD £10,965), £18,185 (SD £9124) and £10,893 (SD £5476), respectively. Higher complication rates as well as reoperation rates were associated with revisions for deep infection, periprosthetic fracture and dislocation. However, the numbers of revisions for these three indications were relatively small (n = 76, n = 24 and n = 11, respectively). Although the cost estimates can be assumed to be very accurate, they are limited by their lack of generalisability as they were based on one single tertiary centre. Furthermore, the study did not consider the cost of readmission for complications and other direct and indirect medical and social costs.

Summary of the cost-effectiveness evidence

We found that four¹⁹,⁴⁰,⁴⁴,²⁹² of the 11 core cost-effectiveness studies were able to provide utility and cost data for the model. We assessed these using the checklists developed by Evers et al.²⁴⁹ and Philips et al.²⁵⁰ and found them to be of varying quality. All studies met ≥ 16 of the 19 criteria for economic analyses provided by Evers et al.²⁴⁹ and ≥ 20 of the 32 criteria for economic models provided by Philips et al.²⁵⁰

Methods for the review of registries

Identification of studies

Initial scoping searches were undertaken in MEDLINE in October 2012 to assess the volume and type of literature relating to national joint registries for hip replacement procedures. These scoping searches informed the development of the final search strategy (see Appendix 1). The registry search strategy was designed to capture the generic terms for ‘arthritis’, ‘total hip replacement’ and ‘resurfacing arthroplasty’ in addition to the word ‘registry’. The registry searches were not date limited and were undertaken in November 2012 (see Appendix 1). All bibliographic records identified through the electronic searches were collected in a managed reference database.

Inclusion and exclusion criteria

The following inclusion and exclusion criteria were used to identify eligible papers reporting joint replacement studies. The aim was to identify any studies that reported survival, utilities and outcomes that would potentially be useful for the economic model and survival analysis.

Inclusion criteria

Study design (registries)

Reporting of the results of joint replacement registry data collection.
All study designs.
Most recent publication in the series.

Population

People with pain or disability resulting from end-stage arthritis of the hip for whom non-surgical management has failed.

Intervention

Elective primary THR.
Primary hip RS.

Comparator

Different types of primary THR compared with hip RS for people in whom both procedures are suitable.
Different types of primary THR compared with each other for people not suitable for hip RS.

Record

Full-text articles of completed studies published in English and annual reports of national registries.

Outcomes

All reported outcomes.

Exclusion criteria

Abstract/conference proceedings, letters and commentaries.
Non-English-language publications.
< 1000 patients included in the registry study at the time of publication.
Hip/knee data not reported separately.

Assessment of eligibility

All retrieved records were collected in a referencing database and all duplicate records were identified and removed. The search returned 541 records. An initial sift was undertaken by one reviewer to exclude clearly non-relevant records using the following exclusion criteria:

non-hip only papers
papers on animals
papers on children
non-registry papers
papers on surgery for hip fracture only
non-English full-text papers.

This was followed by a formal sift of 329 papers by title and abstract by two reviewers using the inclusion/exclusion criteria. All identified relevant studies were read in full by one reviewer to identify eligible studies, with cross-checking by a second reviewer. Disagreement was resolved by a third reviewer. Reasons for exclusion of full-text papers were documented.

Data extraction

Data extraction was carried out on the final eligible papers by one reviewer in two stages. In stage one all eligible studies were considered and in stage two the studies that would provide useful input to the economic model and survival analysis were identified. Data extracted in stage one included the following:

author surname
publication year
country of registry
year that registry data were collected
type of registry data collected
size of the registry database
description of the patient population
results of key outcomes.

Data extraction of the overall aim and conclusion of each paper was also conducted to help identify inputs for the economic model and survival analysis. During stage two data extraction, registry studies were ordered by their publication year to ensure that the most recent data were extracted. Stage two extraction included the following additional exclusion criteria:

not the most recent paper in a publication series
not the most recent annual report from a national joint registry.

Results of the registry review

Identification of studies

The PRISMA flow diagram outlining the identification of registry studies is shown in Figure 19.⁹⁹ The database search for registry studies identified 538 publications, with an additional record identified through other sources. A total of 326 papers remained once duplicates were removed and these were screened for relevance. This process resulted in the exclusion of a further 230 papers, with 96 papers screened at title and abstract level. A further 47 studies were excluded with a reason provided (see Appendix 16), resulting in the inclusion of 49 studies in the review.¹⁵,¹⁶,⁴⁹,²⁶¹,²⁹⁸,³¹⁰–³⁵³

FIGURE 19

Flow diagram of study identification for the registry review.

Of the 49 papers included in the review, 42 were carried out in the following 10 countries: Japan (n = 1³¹⁰), Australia (n = 5³¹¹–³¹⁵), the UK (n = 7¹⁵,¹⁶,³¹⁶–³¹⁹,³⁵³), Italy (n = 2²⁶¹,³²⁰), Finland (n = 10³²¹–³³⁰), Norway (n = 5³³¹–³³⁵), the USA (n = 4⁴⁹,³³⁶–³³⁸), Denmark (n = 4³³⁹–³⁴²), Sweden (n = 3²⁹⁸,³⁴³,³⁴⁴) and Slovakia (n = 1³⁴⁵). In addition, seven papers³⁴⁶–³⁵² reported outcomes from multinational registries.

In stage two, 19⁴⁹,²⁹⁸,³¹⁰,³¹²,³¹⁶,³¹⁹,³²¹,³²²,³²⁴–³²⁶,³³¹,³³²,³³⁶,³³⁸,³⁴⁰,³⁴²,³⁴⁶,³⁵² of the 49 papers were excluded (not the most recent publication in a series or not the most recent annual report from a national joint registry). Therefore, 30 papers were included in the narrative review, reflecting the most recent publication in a series from each particular registry for both THR and RS.¹⁵,¹⁶,²⁶¹,³¹¹,³¹³–³¹⁵,³¹⁷,³¹⁸,³²⁰,³²³,³²⁷–³³⁰,³³³–³³⁵,³³⁷,³³⁹,³⁴¹,³⁴³–³⁴⁵,³⁴⁷–³⁵¹,³⁵³

Review of included studies following stage two exclusion

A narrative review of the included papers by intervention type (THR, RS) and country is given in the following sections. The 30 papers did not report similar patient populations, interventions, comparator groups or outcomes and therefore they are reported separately. For the purposes of the economic model and survival analysis, revision rate and implant survival were the key outcomes to be extracted.

Resurfacing arthroplasty

Eight registry studies provided evidence on RS.¹⁵,³¹¹,³¹³,³¹⁸,³²⁹,³⁴⁹,³⁵¹,³⁵³ The majority of these studies investigated various comparisons between THR and RS. Table 49 provides a summary of the RS studies.

TABLE 49

Summary table of registry studies on RS

England and Wales

Jameson et al.³⁵³ conducted a retrospective cohort study and reported survival time to revision for RS procedures from 2003 to 2013. The study explored the risk factors independently associated with failure. Mean time to revision for each group was not reported. Data were taken from the NJR for England and Wales. The study concluded that women were at greater risk of revision than men (HR 1.30, 99% CI 1.01 to 1.76; p = 0.007), independent of age. Smaller femoral head components were also significantly more likely to require revision than medium or large heads (≤ 44 mm: HR 2.14, 99% CI 1.53 to 3.00; p < 0.001; 45–47 mm: HR 1.48, 99% CI 1.09 to 2.00; p = 0.001), as was surgery performed by low-volume surgeons (HR 1.36, 99% CI 1.09 to 1.71; p < 0.001).

McMinn et al.³¹⁸ examined mortality and revision rates among patients with OA undergoing THR, both cemented and uncemented procedures, or RS. The authors used data from the NJR database for the analysis [154,996 patients receiving cemented THR, 120,017 receiving uncemented THR and 8352 receiving RS (in particular, Birmingham hip RS)]. The baseline characteristics recorded include age (cemented mean 73.2 years, uncemented mean 66.7 years), sex (cemented: men 53,409, women 101,587; uncemented: men 50,529, women 69,488) and ASA grade. The analysis took into account the age of patients at primary surgery and their length of follow-up. Survival analysis was used to compare the cemented and uncemented procedures with adjustment for sex, age at primary surgery, ASA grade before the operation, complexity of the procedure and ‘both sides’ (surgery on both hips at the same time).

The multivariable survival analyses demonstrated a higher mortality rate for patients undergoing cemented THR than for those undergoing uncemented THR (adjusted HR 1.11, 95% CI 1.07 to 1.16). There was a lower revision rate for cemented procedures (unadjusted HR 0.53, 95% CI 0.50 to 0.57). The authors stated that these findings translate into small predicted differences in the population-averaged absolute survival probability at all time points. At 8 years post surgery the predicted probability of death in the cemented group was 0.013 higher (95% CI 0.007 to 0.019) than that in the uncemented group and the predicted probability of revision was 0.015 lower (95% CI 0.012 to 0.017). In multivariable analyses that included only men, there was a higher mortality rate in the cemented group and the uncemented group than in the RS group. RS had a similar revision rate to uncemented THR and both had a higher revision rate than cemented THR. The authors concluded that there was a small but significant increased risk of revision with uncemented THR compared with cemented THR, and a small but significant increased risk of death with cemented procedures.

A study from Smith et al.¹⁵ reported that, in women, RS resulted in worse implant survival than THR, regardless of head size. The predicted 5-year revision rates in 55-year-old women were 8.3% (95% CI 7.2% to 9.7%) for a 42-mm RS head, 6.1% (95% CI 5.3% to 7.0%) for a 46-mm RS head and 1.5% (95% CI 0.8% to 2.6%) for a 28-mm cemented metal-on-polyethylene stemmed THR. In men with smaller femoral heads, RS resulted in poor implant survival. Predicted 5-year revision rates in 55-year-old men were 4.1% (95% CI 3.3% to 4.9%) for a 46-mm RS head, 2.6% (95% CI 2.2% to 3.1%) for a 54-mm RS head and 1.9% (95% CI 1.5% to 2.4%) for a 28-mm cemented metal-on-polyethylene stemmed THR. Of the male RS patients, only 23% (5085/22,076) had a head size ≥ 54 mm. The authors concluded that RS resulted in similar implant survival to other surgical options in men with large femoral heads, and worse implant survival in other patients, particularly women.

Finland

Seppanen et al.³²⁹ analysed the risk of revision of 4401 RS procedures in the Finnish Arthroplasty Register compared with the risk of revision of 48,409 THRs performed during the same time period. The median follow-up time was 3.5 (range 0–9) years for RS and 3.9 (range 0–9) years for THRs. The study reported no statistically significant difference in risk of revision between RS and THR (risk of revision 0.93, 95% CI 0.78 to 1.10). The 4-year unadjusted Kaplan–Meier survival rate was 96% (95% CI 96% to 97%) for both the RS group and the THR group. Female patients had about double the risk of revision of male patients (risk of revision 2.0, CI 1.4 to 2.7).

Australia

Buergi et al.³¹¹ reported the use of RS based on the Australian National Joint Replacement Registry. A total of 7205 RS procedures were carried out between 1999 and 2005. The study concluded that, in the database, early revision rates were higher for RS than for THR. At 3 years, the revision rate after RS was 2.8% and that after THR was 2.0%.

Multinational

Corten et al.³¹³ compared RS survivorship reported by registries in Australia, England and Wales and Sweden with the failure of THR between 2006 and 2009. RS was associated with an overall increased failure rate compared with THR. The cumulative revision rates in the Australian registry were 3.7% for RS and 2.7% for THR. The 3-year revision rate for RS was 1.8% in England and Wales and 3.4% in Sweden.

A study using data from the Nordic Arthroplasty Registry compared the outcome of RS (n = 1638) with that of THR (n = 309,290) between 1995 and 2007.³⁴⁹ Results indicated that RS had an almost threefold increased revision risk compared with THR (RR 2.7, 95% CI 1.9 to 3.7). The difference was greater when RS was compared with cemented THR (RR 3.8, 95% CI 2.7 to 5.3). In men aged < 50 years the difference in revision risk was less (RS vs. THR: RR 1.9, 95% CI 1.0 to 3.9; RS vs. cemented THR: RR 2.4, 95% CI 1.1 to 5.3). However, the difference in revision risk was higher in women of the same age group (RS vs. THR: RR 4.7, 95% CI 2.6 to 8.5; RS vs. cemented THR: RR 7.4, 95% CI 3.7 to 15). In the Cox regression analysis, RS showed an increased risk of early aseptic revision compared with THR (RR 2.7, 95% CI 1.9 to 3.7; p < 0.001) and cemented THR (RR 3.8, 95% CI 2.7 to 5.3; p < 0.001).

The purpose of one recent study³⁵¹ was to evaluate the outcome of Birmingham hip RS using revision rates as reported in national joint replacement registry studies (categorised as from the UK, Australia, Asia and the USA). In total, 9806 RS procedures were analysed (reported as 44,294 observed component-years). The analysis revealed a significant difference in revisions per 100 observed component-years between studies authored by specialist clinical centres (defined by the number of patients treated, staff training and personal expertise) (0.27, 95% CI 0.14 to 0.40) and the register data (0.74, 95% CI 0.72 to 0.76). The average revision rate from register data was 3.41% (SD 1.79%).

Summary of resurfacing arthroplasty in registry studies

In summary, the eight studies that reported data from joint registries had mixed results. There is little evidence from long-term studies; generally, 5-year revision rates (or less) were reported. No two studies had the same comparators for analysis, which makes drawing conclusions from the eight studies difficult. The reported benefits of RS include preservation of the bone on the femoral side, greater physiological stress transfer at the proximal femur and lower risk of dislocation because of the larger femoral head compared with conventional THR.³⁵¹ However, the majority of studies included in this review found that RS had a higher revision rate than THR, particularly in female patients. Only one study found no significant difference between the procedures.³²⁹ No studies were included that reported RS implant survival as better than that for THR. One study of men only reported that RS had a similar revision rate to that of uncemented THR, but that both had a higher revision rate than that of cemented THR.³¹⁸

Total hip replacement

In total, 22 registry studies reported evidence on THR, with the majority of these studies investigating various types of THR surgery or demographic differences regarding the specific countries. Table 50 provides a summary of the THR studies.

TABLE 50

Summary table of registry studies on THR

England and Wales

Jameson et al.³¹⁷ reported survival time to revision following primary cemented THR in 34,721 THRs recorded in the NJR for England and Wales between 2003 and 2010. The authors reported the 7-year rate of revision for any reason as 1.70% (99% CI 1.28% to 2.12%). The overall risk of revision was independent of age, sex, ASA grade, BMI, surgeon volume, surgical approach, brand of cement/presence of antibiotic, femoral head material (stainless steel/alumina) and stem taper size/offset.

Smith et al.¹⁶ assessed the use of metal-on-metal bearing surfaces in the NJR between 2003 and 2011. They reported that metal-on-metal THR failed at high rates and that this was linked to head size. Analysis of the 31,171 metal-on-metal THRs showed that larger heads failed earlier (cumulative incidence of revision: 3.2%, 95% CI 2.5% to 4.1% for 28-mm heads and 5.1%, 95% CI 4.2% to 6.2% for 52-mm heads at 5 years in men aged 60 years). The 5-year revision rates in younger women were 6.1% (95% CI 5.2% to 7.2%) for 46-mm metal-on-metal THR and 1.6% (95% CI 1.3% to 2.1%) for 28-mm metal-on-polyethylene THR. This finding contrasted with findings for ceramic-on-ceramic bearing surfaces, for which larger head sizes were associated with improved survival (5-year revision rate: 3.3%, 95% CI 2.6% to 4.1% for 28-mm heads and 2.0%, 95% CI 1.5% to 2.7% for 40-mm heads for men aged 60 years).

Denmark

Johnsen et al.³³⁹ examined the association between patient-related factors and the risk of initial, short-term and long-term failure after primary THR using data from the Danish Hip Arthroplasty Registry (n = 36,984). The study concluded that in Denmark between 1995 and 2002 male sex and comorbidity index score (Charlson Comorbidity Index) were strongly predictive of THR failure. The Charlson Comorbidity Index includes 19 disease categories, which correspond to International Classification of Diseases, Eighth Edition (ICD-8) and International Classification of Diseases, Tenth Edition (ICD-10) codes used in the national registries. A total of 1132 primary THRs were revised (3.1% of the 36,984 procedures) during this time period.

A more recent study from Denmark³⁴¹ evaluated short-term (0–90 days) and longer-term (up to 12.7 years) mortality of patients undergoing primary THR compared with mortality in the general population. Each THR patient (n = 44,558) was matched at the time of surgery with three people from the general population (n = 133,674). The findings suggested that there was a 1-month period of increased mortality immediately after surgery among THR patients (adjusted mortality rate ratio 1.4, 95% CI 1.2 to 1.7); however, overall short-term mortality (0–90 days) was significantly lower (adjusted mortality rate ratio 0.8, 95% CI 0.7 to 0.9). THR surgery was associated with increased short-term mortality in subjects aged < 60 years and among THR patients without comorbidity. Long-term mortality was lower among THR patients than in the general population control group (adjusted mortality rate ratio 0.7, 95% CI 0.7 to 0.7).

Sweden

Lazarinis et al.³⁴³ analysed patient data (n = 8043) on cementless cups with or without a hydroxyapatite coating that had been recorded in the SHAR between 1992 and 2007. The primary end point was revision because of aseptic loosening; the secondary end points were cup revision for any reason and cup revision because of infection. The results reported that the hydroxyapatite coating was a risk factor for cup revision because of aseptic loosening (adjusted RR 1.7, 95% CI 1.3 to 2). Age at primary THR of < 50 years, paediatric hip disease, a cemented stem and the cup brand were also associated with a statistically significantly increased risk of cup revision due to aseptic loosening.

A more recent study from Sweden reported data from 1999 to 2010.³⁴⁴ The authors investigated revision rates of monoblock cups used in primary THR that were registered in the SHAR. Kaplan–Meier and Cox regression analyses with adjustment for age, sex and other variables were used to calculate survival rates and adjusted HRs of the revision risk for any reason. The cumulative 5-year survival rate with any revision as the end point was 95% (95% CI 91% to 98%) for monoblock cups and 97% (95% CI 96% to 98%) for modular cups (p = 0.6). The adjusted HR for revision of monoblock cups compared with modular cups was 2 (95% CI 0.8 to 6, p = 0.1). The authors concluded that there was no clinically relevant difference in risk of revision between monoblock and modular acetabular cups in the medium term.
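Several of the registry studies report implant survival estimated with the Kaplan–Meier method. As a minimal sketch of how such an estimate is built up, the Python function below computes the product-limit survival curve with revision as the end point. The follow-up times and revision indicators in the example are hypothetical and are not taken from any registry; confidence intervals and covariate adjustment (as in Cox regression) are not covered.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of implant survival.

    times  : follow-up time for each implant (e.g. years)
    events : True if the implant was revised at that time,
             False if follow-up was censored (e.g. death, end of study)
    Returns a list of (time, survival probability) at each revision time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        revised = 0
        leaving = 0
        # Group all implants sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            revised += int(data[i][1])
            leaving += 1
            i += 1
        if revised:
            survival *= 1 - revised / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data for six implants.
example_curve = kaplan_meier(
    times=[1, 2, 2, 3, 4, 5],
    events=[True, False, True, False, True, False],
)
```

Censored implants contribute to the number at risk up to their last follow-up time but never reduce the survival estimate themselves, which is why registries with very different follow-up lengths can still report comparable 5-year or 10-year survival rates.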

Australia

Luo et al.³¹⁴ analysed the effect of the AOANJRR on the cost of joint arthroplasty through identification of implants with higher than expected failure rates between 2003 and 2007. A total of 242,454 primary joint arthroplasties were performed in Australia at a cost of AU$4.1B. The authors state that if the poor-performing THRs had been conducted using average longevity designs, the number of THR revisions could have been reduced by 47%.

One study³¹⁵ investigated the relationship between the bearing surface and the risk of revision because of dislocation using 110,239 records in the AOANJRR from 1999 to 2007. The authors reported that 2621 (2.4%) primary THRs were revised for any reason; 862 (0.78%) THRs were revised because of dislocation. Ceramic-on-ceramic bearing surfaces had a lower risk of revision for dislocation than metal-on-polyethylene and ceramic-on-polyethylene bearing surfaces at 7 years’ follow-up. The authors reported a significantly higher rate of revision for dislocation with ceramic-on-ceramic bearing surfaces than with metal-on-polyethylene bearing surfaces when smaller head sizes (≤ 28 mm) were used in younger patients (< 65 years) (HR 1.53, p = 0.041) and also with larger head sizes (> 28 mm) in older patients (≥ 65 years) (HR 1.73, p = 0.016).

Italy

Di Tanna et al.²⁶¹ reported data from the Emilia-Romagna Regional Registry on Orthopaedic Prosthesis from 2000 to 2007. This registry collects information on all orthopaedic interventions performed in Emilia-Romagna, Italy. The study assessed the cost-effectiveness of cementless prostheses compared with hybrid prostheses in 41,199 THRs and concluded that there were differences in the revision rate and impact on costs between the two groups. The authors estimated that, considering two cohorts of 1000 subjects, 243 revisions would be expected in the cementless group compared with 300 in the hybrid group. This was equal to a 19% difference and a number needed to treat of 18.
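The reported 19% difference and number needed to treat of 18 can be reproduced from the two revision counts if each cohort is taken to contain 1000 subjects. The sketch below shows the arithmetic; the cohort size of 1000 is an assumption chosen because it reproduces the reported figures, and the function name is hypothetical.

```python
import math


def revision_comparison(revisions_a, revisions_b, cohort_size):
    """Relative difference, absolute risk reduction and number needed to treat
    for group A compared with group B, given equal cohort sizes."""
    risk_a = revisions_a / cohort_size
    risk_b = revisions_b / cohort_size
    absolute_reduction = risk_b - risk_a
    relative_difference = absolute_reduction / risk_b
    # NNT is conventionally rounded up to the next whole patient.
    nnt = math.ceil(1 / absolute_reduction)
    return relative_difference, absolute_reduction, nnt


# Cementless (243 revisions) vs. hybrid (300 revisions); cohort size assumed.
relative, absolute, nnt = revision_comparison(243, 300, cohort_size=1000)
print(f"relative difference {relative:.0%}, NNT {nnt}")
```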

A second paper reporting on the Emilia-Romagna Regional Registry on Orthopaedic Prosthesis³²⁰ conducted survival analysis using the Kaplan–Meier method to analyse survival rates for THRs in Italy between 2000 and 2006 (35,042 THRs, 5878 revisions). The reported cumulative survival rate for THR at 7 years was 96.8% (95% CI 96.4% to 97.1%). Multivariate analysis demonstrated that THR survival was affected by pathology, for example the presence of RA. Women comprised 66.4% of patients and > 54.0% of patients were overweight (BMI > 25 kg/m²). Mean age at primary surgery was 66.9 years (range 16–101 years) and at revision was 70.0 years (range 22–98 years).

Finland

Eskelinen et al.³²³ evaluated the population-based survival of cementless THR in patients aged < 55 years using data from the Finnish Arthroplasty Register. All cementless stems studied showed a survival rate of > 90% at 10 years.

Makela et al.³²⁷ analysed population-based survival rates for cemented and cementless THRs in patients aged ≥ 55 years in Finland between 1980 and 2006. The 15-year survival rate for cementless THR (80%) was comparable with the rates for the cemented groups [86% in cemented group 1 (a cemented, loaded-taper stem combined with a cemented, all-polyethylene cup) and 79% in cemented group 2 (a cemented, composite-beam stem with a cemented, all-polyethylene cup)] when revisions for any reason were used as the end point. The authors concluded that both cementless stems and cementless cups, analysed separately, had a significantly lower risk of revision for aseptic loosening than cemented implants.

The same authors reported revision outcomes in primary OA.³²⁸ The 15-year survival rate of group 1 cementless THR (implants with a cementless, straight, proximally circumferentially porous-coated stem and a porous-coated press-fit cup) performed in 1987–96 (62%, 95% CI 57% to 67%) and group 2 cementless THR (implants with a cementless, anatomic, proximally circumferentially porous-coated stem, with or without hydroxyapatite, and a porous-coated press-fit cup with or without hydroxyapatite) performed during the same time period (58%, 95% CI 52% to 66%) was worse than that of cemented THR (71%, 95% CI 62% to 80%), although the difference was not statistically significant. The risk of revision for aseptic loosening of group 1 cementless THR (0.49, 95% CI 0.32 to 0.74) was lower than that of cemented THR (p = 0.001).

Slovakia

One study³⁴⁵ reported findings from Slovakia from 2003 to 2010, including a total of 4970 primary THRs and 457 revisions. Cement was used for all components in 35.45% of all arthroplasties, 53.25% were cementless and 11.28% were hybrids. By 2010, the revision rate reached 9.20%, representing an annual increase of 1.1%. The revision rate in the whole observed period from 2003 to 2010 was 9.15%.

Norway

Espehaug et al.³³² studied differences by county and regional health authority over a 20-year period (1989–2008) using data from the Norwegian Arthroplasty Register. The authors observed an increase in the number of THR operations, from 109 operations per 100,000 inhabitants in the years 1991–5 to 140 in 2006–8. Variations were found across the four regions studied.

A second study from Norway³³³ reported the risks of revision after THR during a 21-year period among hip replacements reported to the Norwegian Arthroplasty Register. The risks of revision during the time periods 1993–7, 1998–2002 and 2003–7 were compared with that of the reference period 1987–92. The risk of revision was lower overall in each of the three later periods than in the reference period. The improved results were due to a reduction in the incidence of aseptic loosening of the femoral and acetabular components in all time periods and in all subgroups of prostheses. The best results were obtained with the use of cemented prostheses. Analyses of revision for any cause were carried out for all prostheses together and separately for cemented, hybrid, reverse hybrid and cementless prostheses. The major cause of revision was aseptic loosening of one or both implant components.

One study used data from the Norwegian Arthroplasty Register³³⁵ to compare the risk of THR revision due to infection and the change in that risk over time. The data covered 1987 to 2008.³³³ Of the 84,492 THRs, 534 (0.6%) were revised for infection. Women had a significantly lower risk of revision for infection than men (RR 0.41, 95% CI 0.34 to 0.48). With revision for infection as the end point, the cumulative 5-year survival rate was 99.5% in RA patients and 99.4% in OA patients (RR 0.98, 95% CI 0.65 to 1.48 for RA vs. OA patients). The risk of revision for infection from 6 years postoperatively was higher in patients with RA.

USA

One study reported registry data from the USA.³³⁷ It examined patient and surgical factors associated with deep surgical site infection (SSI) following THR using data from the Kaiser Permanente Total Joint Replacement Registry between 2001 and 2009. A total of 30,491 THRs were included in the analysis, of which 17,474 (57%) were performed on women. The incidence of deep SSI was 0.51% (155/30,491); these infections occurred at a mean of 72 days (median 28 days, SD 93.3 days) after the procedure. Patient factors associated with SSIs included female sex, obesity and ASA grade ≥ 3.
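As a quick arithmetic check on the proportions quoted above, the following minimal Python sketch recomputes them from the reported counts. The helper name `percentage` and the variable names are illustrative assumptions and do not come from the study.

```python
def percentage(count: int, total: int) -> float:
    """Return count expressed as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Counts as reported for the Kaiser Permanente registry analysis (2001 to 2009).
total_thrs = 30_491   # THRs included in the analysis
women = 17_474        # THRs performed on women
deep_ssis = 155       # deep surgical site infections

pct_women = percentage(women, total_thrs)
ssi_incidence = percentage(deep_ssis, total_thrs)

print(f"Women: {pct_women:.0f}% of THRs")
print(f"Deep SSI incidence: {ssi_incidence:.2f}%")
```

Rounded to the precision used in the text, the recomputed values agree with the reported 57% and 0.51%.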

Multinational

Sadoghi et al.³⁵⁰ compared primary THRs between different countries in terms of THR number per inhabitant, age and procedure type and compared survival curves including all THRs using data from nine registries. On average, the annual number of primary THRs per 100,000 inhabitants was found to be 133 for all ages, 26 for those aged < 55 years, 269 for those aged 55–64 years, 520 for those aged 65–74 years and 531 for those aged ≥ 75 years. The fixation method varied by country; for example, in Sweden 67% of THRs are cemented, whereas in Emilia-Romagna (Italy) 89% are cementless. Cementless fixation was more popular in Australia, Denmark, Emilia-Romagna, New Zealand and Portugal (50%) and cemented fixation was used more in Sweden and Norway (50%). Cemented and cementless fixations were used equally in England and Wales and Slovakia. The use of hybrid fixation was more uniform across countries and ranged from 8% in Portugal to 34.5% in New Zealand. Denmark showed the lowest survival rate within the first 15 years; however, THRs performed between 2006 and 2009 in Norway had similarly low survival rates. All survival curves calculated in the study (except for Danish data) varied by < 1% within the first 9 years. Multivariate or subgroup analyses were not performed to compare the survival curves. The use of primary RS was not reported separately in the registries from Norway and Slovakia. Use of RS in the other countries varied from 1% in Portugal to between 2% and 3% in Denmark, Emilia-Romagna, New Zealand and Sweden to approximately 5% and 6% in Australia and England and Wales, respectively.

Graves et al.³⁴⁷ performed an investigation of the use of metal-on-metal THRs in the National Arthroplasty Registries of Australia, England and Wales and New Zealand. All registries reported an increased revision rate associated with larger femoral head size when metal-on-metal bearing surfaces were used.

The Nordic Registry includes the joint registries of Denmark, Sweden and Norway. One study³⁴⁸ aimed to compare demographics, choice of implant, fixation techniques and results between the countries, including a total of 280,201 THRs performed between 1995 and 2006. The study reported that 9596 THRs (3.4%) had later been revised. RS accounted for ≤ 0.5% of procedures in all countries. The 10-year survival rate was 92% (95% CI 91.6% to 92.4%) in Denmark, 94% (95% CI 93.6% to 94.1%) in Sweden and 93% (95% CI 92.3% to 93.0%) in Norway.

A second study reporting data from the Nordic Registry compared the survival of cemented THRs with metal femoral heads made from various materials (cobalt–chromium, aluminium and zirconium).³³⁴ The study reported prosthesis survival and relative revision risks, adjusted for age, sex and diagnosis, between 1987 and 2010. In total, 132,000 cases of THR were included in the analysis. At 12 years the survival rate was 88.1% for cobalt–chromium heads and 74.8% for zirconium heads. Aluminium femoral heads provided no advantage over cobalt–chromium heads for prosthesis survival. The authors concluded that cemented polyethylene THR with aluminium heads had a similar survival rate to the same THR with ceramic-on-ceramic heads when any revision was the end point.

Summary of the total hip replacement studies

The 22 THR studies reported the analysis of registry data from nine countries. These studies examined various aspects of the THR procedure, including revision and survival rates; different implants and combinations of implant bearing surfaces; and outcome measures such as reason for failure and patient differences associated with failure. Four of the 22 THR studies used registry data from multinational databases. Sadoghi et al.³⁵⁰ provided an extensive review of registries worldwide. They stated that fixation methods varied by country, with the cemented THR being most popular in Sweden and Norway and the cementless THR being most common in Emilia-Romagna (Italy) but also popular in Australia, Denmark, New Zealand and Portugal. Cemented and cementless fixations were used equally in England and Wales and Slovakia. In terms of survival rates, THRs carried out in Denmark showed the lowest survival rate within the first 15 years.

Core articles included in the economic model and survival analysis

The prioritisation of the eligible studies resulted in the identification of 30 papers that were deemed to be potentially useful for the economic model and survival analysis. The final number of core papers that helped to inform the survival analysis in this report was three.¹⁵,¹⁶,³¹⁸ This was in addition to the annual reports from the Swedish Arthroplasty Registry,⁹⁶ the NJR³⁶ and the AOANJRR,⁹⁵ which were used for comparison of survival analysis methods.

Summary of the registry evidence

Thirty papers were identified in the registry review and were included in the narrative synthesis. Eight of the studies reported registry data investigating the use of RS for the treatment of arthritis. Five of these studies reported findings from three individual countries and three used multinational data. The final number of THR papers included was 22. These papers reported various aspects of the THR procedure, including revision and survival rates; however, the time periods over which the analyses were carried out varied between 3 years and 15 years. Comparison of different implants and combinations of implant bearing surfaces was also conducted. Finally, additional outcome measures analysed included reason for failure (e.g. infection) and patient/demographic differences associated with failure.

Copyright © Queen’s Printer and Controller of HMSO 2015. This work was produced by Clarke et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK273967
