NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Guideline Centre (UK). Cirrhosis in Over 16s: Assessment and Management. London: National Institute for Health and Care Excellence (NICE); 2016 Jul. (NICE Guideline, No. 50.)

Cirrhosis in Over 16s: Assessment and Management.

6. Diagnostic tests

6.1. Introduction

Clinical evaluation can identify patients with cirrhosis when these individuals exhibit clinical signs of decompensated liver disease such as jaundice, ascites or hepatic encephalopathy. However, it is recognised that cirrhosis is not always clinically apparent, even to an experienced hepatologist,18 because people with compensated cirrhosis may experience few or no symptoms or signs of liver disease. Consequently, people identified to be at risk of cirrhosis require a confirmatory test.

Liver biopsy is considered the ‘gold standard’ to assess the stage of liver fibrosis in people with chronic liver disease and is the definitive method for confirming a diagnosis of cirrhosis. However, liver biopsy is expensive, is unpopular with patients, and carries a small risk of severe complications such as bleeding and death.25 It requires skilled practitioners to perform the procedure and to interpret liver histology; consequently, its use is confined to secondary and tertiary care settings. Sampling error reduces the precision of liver biopsy for assessing fibrosis, and a biopsy specimen shorter than 25 mm increases the risk of inaccurate categorisation of liver fibrosis.16 Indeed, liver biopsy may fail to detect cirrhosis in up to 15% of cases.170 Given these problems, simple non-invasive tests are often the preferred option for assessing whether a person has cirrhosis,76 especially tests that can be used in primary as well as secondary care. Patients need to be fully informed of the potential benefits and drawbacks of invasive and non-invasive test options so that, with the support of their clinician, they can choose the best method for them.

Routine laboratory liver blood tests have been evaluated as predictors of cirrhosis, but normal values of bilirubin, albumin, aspartate aminotransferase (AST) and alanine aminotransferase (ALT) do not exclude cirrhosis. Combinations of routine laboratory blood tests are used to predict cirrhosis, including the AST/ALT ratio, the AST to platelet ratio index (APRI) and FIB-4. Proprietary test panels employing blood tests that are surrogate markers of fibrogenesis include FibroTest and the Enhanced Liver Fibrosis (ELF) panel. Since increased liver fibrosis is associated with a greater degree of ‘stiffness’ of the liver, recent work has focused on measuring liver stiffness with elastography techniques, namely transient elastography (TE) and Acoustic Radiation Force Impulse (ARFI) imaging, to assess liver fibrosis. These tests are performed in an outpatient setting and the results are available immediately. Magnetic resonance elastography has also been used to assess liver fibrosis. The GDG decided to compare the clinical and cost-effectiveness of routine laboratory blood tests, blood fibrosis tests and imaging tests, both individually and in combination, to determine their performance characteristics for the diagnosis of cirrhosis against the reference standard of liver histology.

6.2. Review question: In people with suspected (or under investigation for) cirrhosis

  1. What is the most accurate blood fibrosis test to identify whether cirrhosis is present?
  2. What is the most accurate non-invasive imaging test to identify whether cirrhosis is present?
  3. Is the most accurate blood fibrosis test more accurate than an individual blood test to identify whether cirrhosis is present?
  4. Is a combination of 2 non-invasive tests more accurate than a blood fibrosis test alone or an imaging test alone to identify whether cirrhosis is present?

For full details see the review protocol in Appendix C.

Table 16. Characteristics of review question.

6.3. Clinical evidence

Fifty-three studies were included in the review10,14,22,23,29-32,39,40,52,55,57,58,72,73,75,77,78,80,81,94,96,105,108,109,116,120,128-130,132,138,164,173,186,202,205,210-212,215,243-245,247,248,23,24,62,69,70,136,200,249. Study characteristics are summarised in Table 18 and evidence from these studies is summarised in the clinical evidence profiles below (Table 19, Table 20, Table 21, Table 22, Table 23, Table 24, Table 25, Table 26, Table 27, Table 28). See also the study selection flow chart in Appendix E, sensitivity/specificity forest plots, summary receiver operating characteristic (sROC) curves and ROC AUC plots in Appendix K, study evidence tables in Appendix H and the exclusion list in Appendix L.

Prospective and retrospective cohort studies in which the index test(s) and the reference standard test were applied to the same patients in a cross-sectional design were included in the review. The included population comprised people suspected of having cirrhosis because of 1 of the specified risk factors. Two-gate study designs (sometimes referred to as case-control designs) are cross-sectional studies that compare the index test results of patients with an established diagnosis of cirrhosis with those of healthy controls in order to assess the diagnostic accuracy of a test. This design was excluded because it is unrepresentative of practice and is unlikely to cover the full spectrum of health and disease over which the test would be used. Studies of this design may selectively include cases with more advanced disease, overestimating sensitivity, and the inclusion of healthy controls is likely to overestimate specificity.

Table 18. Summary of studies included in the review.

The reference standard for all included studies was liver biopsy, with the level of fibrosis (and therefore the diagnosis of cirrhosis) scored using one of the fibrosis scoring systems specified in the protocol. Liver biopsy has known limitations as a reference standard for cirrhosis: its accuracy can be affected by sampling error and by heterogeneity of fibrosis within the liver, and these inaccuracies are accentuated in biopsy samples of inadequate size. The UK standard criterion for an adequate biopsy is a length of ≥25 mm containing at least 10 portal tracts. The GDG were aware that many studies fall below this operational definition. Studies including biopsies below this standard were not automatically excluded, but the quality of their evidence was downgraded, as the accuracy of the reference standard would be compromised. The GDG also set a lower limit for biopsy size: any study in which all or a proportion of biopsies fell below this limit was excluded. This lower limit was set at 15 mm and 6 portal tracts, as the GDG felt that below this level the accuracy of the biopsy would be severely compromised and the level of fibrosis could not be assessed accurately. Studies including all or a proportion of biopsies below this level (or not stating the biopsy length) were excluded. The GDG considered that setting these lower and upper limits would strike the right balance between including only the higher quality evidence and not excluding a high proportion of the available studies, which would leave conclusions resting on only a small proportion of the evidence. If a study reported that the biopsies were at least 15 mm ‘or’ contained at least 6 portal tracts, the study was included even if the other measure fell below the lower limit, as the GDG thought that accurate staging of fibrosis and a diagnosis of cirrhosis would be possible as long as 1 of these parameters was met.
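The biopsy-size rule described above can be expressed as a small decision function. This is a minimal sketch, not the GDG's actual screening tool: the function name `classify_study`, its inputs (the smallest biopsy length and portal tract count reported by a study), and the handling of an unreported portal tract count (treated as 0) are assumptions made for illustration. A study that reports portal tracts but not length is excluded, following the ‘not stating the biopsy length’ rule.

```python
from typing import Optional

# UK standard for an adequate biopsy and the GDG lower limit (from the text)
STANDARD_LENGTH_MM = 25.0
STANDARD_PORTAL_TRACTS = 10
LOWER_LENGTH_MM = 15.0
LOWER_PORTAL_TRACTS = 6


def classify_study(min_length_mm: Optional[float],
                   min_portal_tracts: Optional[int]) -> str:
    """Hypothetical encoding of the biopsy-size inclusion rule.

    Returns 'include' (meets the UK standard), 'include_downgrade'
    (below the standard but meets the lower limit) or 'exclude'.
    """
    # Studies not stating the biopsy length were excluded.
    if min_length_mm is None:
        return "exclude"
    # Assumption: an unreported portal tract count is treated as 0.
    tracts = min_portal_tracts if min_portal_tracts is not None else 0
    # Adequate biopsy: length AND portal tracts both meet the UK standard.
    if min_length_mm >= STANDARD_LENGTH_MM and tracts >= STANDARD_PORTAL_TRACTS:
        return "include"
    # Lower limit: meeting EITHER parameter is enough for inclusion,
    # but the evidence is downgraded for reference-standard quality.
    if min_length_mm >= LOWER_LENGTH_MM or tracts >= LOWER_PORTAL_TRACTS:
        return "include_downgrade"
    return "exclude"
```

For example, `classify_study(12, 7)` returns `"include_downgrade"`: the length is below 15 mm, but the ‘or’ rule is satisfied by the portal tract count.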

The fibrosis scoring using these systems is normally performed by an experienced histopathologist and has an element of subjectivity in the diagnosis. It is also a process that is subject to intra- and inter-observer variability. The ideal reference standard is a diagnosis of cirrhosis scored by a single pathologist, blinded to the patient's clinical data and blinded to the diagnosis made using the index test. Therefore, when assessing the risk of bias for each study, the evidence quality was downgraded if the assessor was not blinded to patient clinical information or results of the index test, or if different people assessed biopsies from different patients, possibly introducing inter-observer variability.

Population strata for the different underlying aetiologies of liver disease, which were pre-specified in the protocol, were not combined in the analyses. These strata were kept separate because factors distinct to each aetiology (for example, alcohol consumption, portal hypertension, hepatic inflammation and obesity) are known to interfere with the results of the non-invasive tests. The diagnostic accuracy of the non-invasive tests, and the optimal cut-off threshold for a positive result, will therefore vary between aetiological groups. For this reason, studies reporting the diagnostic accuracy of index tests in mixed populations (without subgroup analysis by aetiology) were excluded from the review. The following population strata were separated within the analysis:

For the multiple aetiologies stratum, evidence was only identified for people with HIV/HCV co-infection. No evidence was identified for the population stratum of primary sclerosing cholangitis specified in the protocol.

Forest plots showing the sensitivity and specificity values from the individual studies for each index test at relevant cut-off thresholds are presented in Appendix K. The corresponding pooled sensitivity and specificity values for each index test at relevant cut-off thresholds are summarised in the clinical evidence profiles in Table 19, Table 21, Table 23, Table 25 and Table 27 (1 table for each population stratum, sectioned into individual blood tests, blood fibrosis tests, imaging tests, and combinations of tests). Where evidence was available from 3 or more studies for an index test at the specified cut-off threshold, a diagnostic meta-analysis was performed and the pooled sensitivity and specificity values are presented in the clinical evidence profile (with the summary sensitivity and specificity point displayed in ROC space in Appendix K). Where evidence was available from fewer than 3 studies, the median sensitivity value is presented along with the corresponding specificity value from the same study, together with the range of sensitivity and specificity values; as no meta-analysis was performed, these values are not displayed in ROC space in Appendix K.
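The summary rule above can be sketched as follows. This is an illustrative sketch only: `TwoByTwo`, `sens_spec` and `summarise` are hypothetical names, the diagnostic meta-analysis itself (typically a bivariate model) is not implemented, and the choice of the lower-median study when only 2 studies are available is an assumption so that the reported specificity comes from the same study as the reported sensitivity.

```python
from statistics import median_low
from typing import List, NamedTuple, Tuple


class TwoByTwo(NamedTuple):
    """One study's 2x2 table against the liver biopsy reference standard."""
    tp: int
    fp: int
    fn: int
    tn: int


def sens_spec(t: TwoByTwo) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return t.tp / (t.tp + t.fn), t.tn / (t.tn + t.fp)


def summarise(tables: List[TwoByTwo]) -> dict:
    """Choose the summary approach from the number of studies."""
    pairs = [sens_spec(t) for t in tables]
    if len(pairs) >= 3:
        # A diagnostic meta-analysis would be fitted here (not implemented).
        return {"method": "meta-analysis", "n_studies": len(pairs)}
    sens_values = [s for s, _ in pairs]
    spec_values = [sp for _, sp in pairs]
    # Assumption: with 2 studies, take the lower-median study so that the
    # paired specificity comes from a real study.
    med_sens = median_low(sens_values)
    med_spec = next(sp for s, sp in pairs if s == med_sens)
    return {
        "method": "median study",
        "sensitivity": med_sens,
        "specificity": med_spec,
        "sensitivity_range": (min(sens_values), max(sens_values)),
        "specificity_range": (min(spec_values), max(spec_values)),
    }
```

With a single study, the ‘median study’ is simply that study, so its own sensitivity and specificity are reported.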

Studies may report sensitivity and specificity values at a pre-specified published cut-off threshold, or they may determine the optimal threshold from an ROC analysis. This resulted in a range of thresholds being reported for some index tests. Pooling all the sensitivity and specificity values from a range of cut-off thresholds can overestimate diagnostic accuracy compared with another index test for which values are reported at only one cut-off threshold. For the tests below, the range of thresholds was considered too wide to pool the studies together, and the cut-off thresholds were separated into the following categories before analysis:

  • Transient elastography: low (9 to <13 kPa), medium (13 to <15 kPa), high (≥15 kPa)
  • APRI: low (0.5 to <1.5), high (1.5 to 2.5)
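The threshold categories listed above can be written as simple lookup functions. This is a sketch; the function names are hypothetical, and returning `None` for cut-offs outside the stated ranges is an assumption, since the text does not say how such thresholds were handled.

```python
from typing import Optional


def te_category(cutoff_kpa: float) -> Optional[str]:
    """Map a transient elastography cut-off (kPa) to its analysis category."""
    if 9 <= cutoff_kpa < 13:
        return "low"
    if 13 <= cutoff_kpa < 15:
        return "medium"
    if cutoff_kpa >= 15:
        return "high"
    return None  # below 9 kPa: no category defined in the text


def apri_category(cutoff: float) -> Optional[str]:
    """Map an APRI cut-off to its analysis category."""
    if 0.5 <= cutoff < 1.5:
        return "low"
    if 1.5 <= cutoff <= 2.5:
        return "high"
    return None  # outside 0.5 to 2.5: no category defined in the text
```

For instance, `te_category(15.9)` returns `"high"` and `apri_category(1.0)` returns `"low"`.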

For the following tests, the range of thresholds was considered narrow enough to pool together in the analysis:

  • FibroTest: range 0.56–0.75
  • ELF: range 9.3–10.44
  • ARFI: range 1.55–2.0 m/s

In addition to reporting the sensitivity and specificity of a test at a particular cut-off threshold, some individual studies also reported the AUC from an ROC analysis for each index test investigated. Where available, the mean AUC value with its 95% CI was plotted on a graph for each index test. The AUC value and its 95% CI from the median study (along with the range of AUC values from all the studies) for each index test were summarised in the clinical evidence profiles in Table 20, Table 22, Table 24, Table 26 and Table 28 (1 table for each population stratum, sectioned into individual blood tests, blood fibrosis tests, imaging tests, and combinations of tests).

Some studies reported AST and ALT results on a transformed scale whereas others did not. This reflects a change in laboratory reference levels introduced in 2003: because detected enzyme activity depends on temperature, all ALT and AST assays were required to be performed at 37°C, which changed the upper limit of normal (ULN) for both enzymes. Studies performed during the changeover period in 2003 did not always report whether they had taken this change into account. However, ratio measures such as the AST/ALT ratio would not be affected, as both measures would be expected to increase by the same proportion. All studies were included even if they did not transform the data, as the results were either normalised to the ULN or expressed as a ratio measure.
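The argument above can be checked numerically. The sketch uses the published definitions of the AST/ALT ratio and APRI ((AST / AST ULN) / platelet count in 10^9/L × 100); the laboratory values and the 30% proportional shift are hypothetical, chosen only to show that a common scaling of both measurements (or of AST and its ULN) cancels out.

```python
def ast_alt_ratio(ast_u_per_l: float, alt_u_per_l: float) -> float:
    """AST/ALT ratio (both in U/L)."""
    return ast_u_per_l / alt_u_per_l


def apri(ast_u_per_l: float, ast_uln_u_per_l: float,
         platelets_10e9_per_l: float) -> float:
    """APRI = (AST / AST upper limit of normal) / platelet count (10^9/L) x 100."""
    return (ast_u_per_l / ast_uln_u_per_l) / platelets_10e9_per_l * 100


# Hypothetical values: a 30% proportional shift in measured activity
# (and in the ULN used for normalisation) after an assay change.
scale = 1.3
before = (ast_alt_ratio(60, 40), apri(60, 40, 120))
after = (ast_alt_ratio(60 * scale, 40 * scale), apri(60 * scale, 40 * scale, 120))
print(before, after)  # the two tuples agree up to floating-point rounding
```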

Table 17. Summary of index tests: components of non-invasive tests and applicable aetiology.

6.3.1. Hepatitis C

Table 19. Clinical evidence profile (sensitivity and specificity): HCV population.

Table 20. Clinical evidence profile (AUC): HCV population.

6.3.2. NAFLD

Table 21. Clinical evidence profile (sensitivity and specificity): NAFLD population.

Table 22. Clinical evidence profile (AUC): NAFLD population.

6.3.3. ALD

Table 23. Clinical evidence profile (sensitivity and specificity): ALD population.

Table 24. Clinical evidence profile (AUC): ALD population.

6.3.4. Primary biliary cholangitis (PBC)

Table 25. Clinical evidence profile (sensitivity and specificity): PBC population.

Table 26. Clinical evidence profile (AUC): PBC population.

6.3.5. Multiple aetiologies

Table 27. Clinical evidence profile (sensitivity and specificity): HIV/HCV population.

Table 28. Clinical evidence profile (AUC): HIV/HCV population.

6.4. Economic evidence

6.4.1. Published literature

One economic evaluation was identified that compared a ‘no testing’ strategy with liver biopsy and transient elastography in chronic hepatitis C patients with no fibrosis.27

One economic evaluation was identified that compared liver biopsy and transient elastography in 3 relevant patient subgroups: hepatitis B, hepatitis C and NAFLD.213

One economic evaluation was identified that compared liver biopsy, transient elastography, ELF and FibroTest in patients with suspected liver fibrosis related to alcohol consumption.214

These are summarised in the economic evidence profile below (Table 29) and the economic evidence tables in Appendix I.

Table 29. Economic evidence profile: Comparisons of diagnostic tests for cirrhosis.

Table 30. Stevenson 2012 – Cost-effectiveness results.

Figure 3. Stevenson 2012 – Cost-effectiveness of diagnostic tests.

One economic evaluation relating to this review question was identified but was excluded due to limited applicability.43 This is listed in Appendix M, with reasons for exclusion given.

See also the economic article selection flow chart in Appendix F.

6.4.2. Unit costs

See Table 57 in Appendix N.

6.4.3. New cost-effectiveness analysis

Original cost-effectiveness modelling was undertaken for this question using the NGC liver disease pathway model developed for this guideline. A summary is included here. Evidence statements summarising the results of the analysis can be found below. The full analysis can be found in Appendix N.

6.4.3.1. Aim and structure

The aim of the health economic modelling for this question was to determine the most cost-effective diagnostic test to detect cirrhosis in 4 aetiology groups: NAFLD, ALD, HBV and HCV. HBV patients were further separated into 2 cohorts: hepatitis B e antigen (HBeAg) positive and HBeAg negative. HCV patients were further separated by disease genotype (genotypes 1–4).

For these purposes a lifetime health state transition (Markov) model was constructed, following the NICE reference case,146 which depicted the patient pathway from advanced fibrosis to liver transplantation.
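The mechanics of a health state transition (Markov) cohort model can be illustrated with a toy example. This is a minimal sketch, not the NGC liver disease pathway model: the 5 states, annual transition probabilities, costs and utilities are hypothetical placeholders, and costs and QALYs are counted at the start of each annual cycle without half-cycle correction. Discounting at 3.5% a year follows the NICE reference case; the 60-cycle horizon stands in for a lifetime horizon.

```python
from typing import List, Sequence, Tuple

# Hypothetical 5-state model with annual cycles. States, probabilities,
# costs and utilities are illustrative placeholders, NOT guideline inputs.
STATES = ["advanced fibrosis", "compensated cirrhosis",
          "decompensated cirrhosis", "liver transplant", "dead"]
P = [  # P[i][j]: probability of moving from state i to state j in one year
    [0.90, 0.07, 0.00, 0.00, 0.03],
    [0.00, 0.88, 0.07, 0.01, 0.04],
    [0.00, 0.00, 0.75, 0.05, 0.20],
    [0.00, 0.00, 0.00, 0.90, 0.10],
    [0.00, 0.00, 0.00, 0.00, 1.00],
]
COSTS = [500.0, 1500.0, 10000.0, 20000.0, 0.0]  # £ per year in each state
UTILITIES = [0.80, 0.75, 0.50, 0.60, 0.00]      # QALY weight per year


def run_cohort(start: Sequence[float], p: List[List[float]],
               costs: Sequence[float], utilities: Sequence[float],
               cycles: int = 60, discount: float = 0.035) -> Tuple[float, float]:
    """Return (discounted cost, discounted QALYs) for a cohort trace.

    Costs and QALYs are counted at the start of each cycle (no half-cycle
    correction) and discounted at 3.5% a year.
    """
    dist = list(start)
    total_cost = total_qalys = 0.0
    n = len(dist)
    for t in range(cycles):
        factor = 1.0 / (1.0 + discount) ** t
        total_cost += factor * sum(d * c for d, c in zip(dist, costs))
        total_qalys += factor * sum(d * u for d, u in zip(dist, utilities))
        # Move the cohort one cycle: new[j] = sum over i of dist[i] * p[i][j]
        dist = [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]
    return total_cost, total_qalys


cost, qalys = run_cohort([1.0, 0.0, 0.0, 0.0, 0.0], P, COSTS, UTILITIES)
print(f"cost £{cost:,.0f}, QALYs {qalys:.2f}")
```

Each diagnostic strategy would feed a different starting distribution (or pathway) into such a model, and the resulting costs and QALYs are then compared.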

The number of diagnostic strategies compared differed among the 4 aetiologies examined, reflecting the amount of evidence identified for each group in the review of diagnostic studies.

Table 31. Tests included in the model by disease aetiology.

In each population group, each of the diagnostic tests in Table 31 was compared with the following options:

  • Liver biopsy (reference standard)
  • No test, monitor all patients in the relevant population assuming they have cirrhosis.
  • No test, monitor no-one, assuming none have cirrhosis until later clinical presentation.

People who tested negative were retested with the same test every 2 years. The model used diagnostic accuracy data from studies identified in the present guideline review. When fewer than 3 studies reported the diagnostic accuracy of a specific test, so that pooled sensitivity and specificity estimates could not be obtained, the corresponding 2×2 diagnostic table was taken from the single study believed to represent the best quality evidence. Test costs were obtained from published literature and GDG sources. Health state costs were constructed under GDG guidance specifically for the purposes of the model. Utilities and transition probabilities were mostly obtained from published literature, with extrapolation from other liver diseases where evidence was lacking (mainly in the NAFLD and ALD cohorts). The model was built probabilistically to take account of the uncertainty around input parameter point estimates.
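How a test's 2×2 diagnostic table translates into the starting pathways of a modelled cohort can be sketched as below. The function names and inputs are hypothetical. The retest function assumes each 2-yearly retest misses cirrhosis independently with probability (1 − sensitivity); repeated results of the same test are probably correlated, so this is an optimistic simplification and not necessarily how the guideline model handled retesting.

```python
def classify_cohort(n: float, prevalence: float,
                    sensitivity: float, specificity: float) -> dict:
    """Split a cohort into the cells of a 2x2 diagnostic table."""
    with_cirrhosis = n * prevalence
    without_cirrhosis = n - with_cirrhosis
    tp = with_cirrhosis * sensitivity      # cirrhosis correctly detected
    fn = with_cirrhosis - tp               # cirrhosis missed; retested later
    tn = without_cirrhosis * specificity   # correctly classed as no cirrhosis
    fp = without_cirrhosis - tn            # incorrectly classed as cirrhosis
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


def still_missed_after(rounds: int, sensitivity: float) -> float:
    """Fraction of people with cirrhosis still undetected after repeated tests.

    Assumes independent misses at each retest (an optimistic simplification).
    """
    return (1.0 - sensitivity) ** rounds
```

For example, `classify_cohort(1000, 0.2, 0.8, 0.9)` splits a cohort of 1,000 into roughly 160 true positives, 40 false negatives, 80 false positives and 720 true negatives.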

Cost-effectiveness was defined by the net monetary benefit (NMB) of each test. The decision rule applied is that the comparator with the highest NMB is the most cost-effective option at the specified threshold of £20,000 per QALY gained. For ALD, incremental cost-effectiveness ratios (ICERs) comparing all strategies with ‘no test – no monitoring’ were also calculated because of the high uncertainty shown by the confidence intervals.
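The NMB decision rule and the ICER can be illustrated with 2 hypothetical strategies. In this sketch, NMB = threshold × QALYs − cost, and the strategy with the highest NMB ranks first. The costs and QALYs below are invented for illustration and are not results from the guideline model. For a strategy that is both more costly and more effective than its comparator, it has the higher NMB exactly when its ICER is below the threshold.

```python
def nmb(qalys: float, cost: float, threshold: float = 20_000) -> float:
    """Net monetary benefit: threshold x QALYs - cost."""
    return threshold * qalys - cost


def icer(cost_new: float, qalys_new: float,
         cost_ref: float, qalys_ref: float) -> float:
    """Incremental cost per QALY gained of a strategy versus a reference."""
    return (cost_new - cost_ref) / (qalys_new - qalys_ref)


def rank_by_nmb(strategies: dict, threshold: float = 20_000) -> list:
    """Strategy names ordered from highest to lowest NMB."""
    return sorted(strategies,
                  key=lambda name: nmb(strategies[name][1], strategies[name][0], threshold),
                  reverse=True)


# Hypothetical (cost £, QALYs) per person; not results from the guideline model.
strategies = {"no test - no monitoring": (10_000.0, 5.00),
              "test A": (12_000.0, 5.09)}
print(rank_by_nmb(strategies, 20_000))
print(rank_by_nmb(strategies, 30_000))
print(round(icer(12_000.0, 5.09, 10_000.0, 5.00)))
```

Here the ICER of test A is about £22,000 per QALY gained, so ‘no test’ ranks first at £20,000 per QALY but test A ranks first at £30,000 per QALY. This is the same pattern as the ALD result.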

6.4.3.2. Results

6.4.3.2.1. NAFLD cohort
Table 32. Test ranking in NAFLD cohort.

Across the different strategies compared, transient elastography at a threshold of 15.9 kPa ranked first, mainly because it had the highest diagnostic accuracy among the non-invasive tests. ARFI ranked second: it was slightly less accurate but also had lower unit test costs. Transient elastography at 10.0–<13.0 kPa ranked third, having similar specificity to the other 2 tests but lower sensitivity. All 3 non-invasive tests had similarly wide confidence intervals (ranging from first to fourth place).

In the deterministic sensitivity analysis, rankings were sensitive to increases in the unit costs of transient elastography and ARFI and to a decrease in the diagnostic accuracy of transient elastography. Therefore, no firm conclusion can be drawn about the most cost-effective option among the top 3 comparators.

6.4.3.2.2. ALD cohort
Table 33. Test ranking and ICERs in ALD cohort.

Testing people with alcohol-related liver disease for cirrhosis was not cost-effective compared with ‘no test – no monitoring’ and ‘no test – monitor all’ at a cost-effectiveness threshold of £20,000 per QALY gained. However, it was cost-effective at a threshold of £30,000 per QALY gained: the ICERs for the 3 non-invasive liver tests were £22,438–£22,977. All 3 non-invasive tests had similarly wide confidence intervals (from first or second to fifth place).

In none of the deterministic sensitivity analysis scenarios did a test strategy rank higher than third. The ranking among the 3 non-invasive liver tests varied slightly across scenarios, with transient elastography at 11.0–<13.0 kPa remaining third in the ranking in 9 of the 10 scenarios tested.

6.4.3.2.3. HBV cohorts
Table 34. Test ranking in HBeAg negative cohort.

For the HBeAg negative group, APRI at 1.0 ranked first, most probably because of its low unit test cost and its moderate diagnostic accuracy (second best after transient elastography). Transient elastography and FibroTest ranked second and third. APRI at 2.0 ranked last among the non-invasive liver tests, mainly because of its considerably lower sensitivity. All non-invasive liver tests had similarly wide 95% confidence intervals.

Table 35. Test ranking in HBeAg positive cohort.

In the HBeAg positive group, FibroTest ranked first with TE and APRI at 1.0 ranking second and third. All non-invasive liver tests had similarly wide 95% confidence intervals. In the probabilistic analysis, the 3 tests also shared similar probabilities ranking first (20–23%).

Deterministic sensitivity analyses were conducted for the HBeAg negative group. Rankings differed between the deterministic and the probabilistic analyses, particularly for FibroTest and transient elastography, highlighting how incorporating the uncertainty of the input parameters in the model affects the cost-effectiveness results. APRI at 1.0 ranked first or second in all scenarios, while FibroTest and transient elastography each ranked between first and fourth across scenarios. The cost-effectiveness of APRI at 1.0 was sensitive to a decrease in HBV prevalence, the presence of varices at the point of cirrhosis diagnosis, and changes to the cost and accuracy of transient elastography.

6.4.3.2.4. HCV cohorts

Full results are presented only for genotypes 1 and 3, as the results for genotypes 2 and 4 were consistent with these. The top 3 ranked tests are presented for all 4 genotypes.

Table 36. Test ranking in HCV genotype 1 cohort.

Table 37. Test ranking in HCV genotype 3 cohort.

Table 38. HCV diagnostic tests – top 3 ranked tests in every genotype.

For all 4 genotypes, liver biopsy ranked first, with substantially higher NMB values than the second-ranked options. This is mainly because liver biopsy was assumed to have perfect sensitivity and specificity, and because misdiagnosis of cirrhosis is associated with incorrect administration of high-cost polymerase inhibitor drugs. As a result, the model particularly favoured the test with the highest diagnostic accuracy, irrespective of its unit cost. In genotypes 1 and 3, liver biopsy ranked first in 90% and 97% of the simulations respectively. Transient elastography at 13.0–<15.0 kPa ranked second or third in genotypes 1–4, and the ‘TE or ARFI’ strategy ranked third in genotype 4.

Deterministic sensitivity analyses were conducted for the genotype 3 group. Liver biopsy remained first in all but 2 scenarios. These were the ‘no HCV treatment’ and the ‘diagnostic accuracy for transient elastography at 13.0 kPa at high 95% CI’ scenarios, also highlighting how crucial the drug treatment element is for the HCV diagnostic model.

For more details on all the analyses see Appendix N.

6.5. Evidence statements

6.5.1. Clinical

  • Fifty-three studies were included in the review, covering 5 aetiologies of cirrhosis. Thirty-three studies looked at the hepatitis C population, 10 studies at the NAFLD or NASH population, 3 studies at the HCV/HIV co-infected population, 1 study at the PBC population and 2 studies at ALD. No evidence was identified for the population stratum of PSC specified in the protocol.
  • Of the index tests listed in the protocol, no evidence was identified for albumin, prothrombin time (INR), bilirubin, γGT, ultrasound or MR elastography.
  • Studies were identified relating to the accuracy of platelet count, AST, ALT, FibroTest, ELF, APRI, FIB-4, AST/ALT ratio, transient elastography, ARFI and combinations of these tests in diagnosing cirrhosis. Data presented to the GDG were in the form of paired sensitivity and specificity values and AUC values. Data relating to transient elastography were reported in threshold ranges: low (9 to <13 kPa), medium (13 to <15 kPa) and high (≥15 kPa). Similarly, data relating to APRI were divided into low (0.5 to <1.5) and high (1.5–2.5) threshold ranges.

Hepatitis C

  • In the hepatitis C population, Moderate quality evidence from 5 studies indicated a high sensitivity (76 and 87), a high specificity (84 and 88) and a high AUC (range 82.7–89.9) for platelet count.
  • Moderate quality evidence from 1 study indicated a high AUC (75.2) for AST.
  • Moderate quality evidence from 1 study indicated a moderate AUC (62.6) for ALT.
  • Very Low quality evidence from 4 studies indicated a high sensitivity (80.3) and a moderate specificity (69.3) for FibroTest. Moderate quality evidence from 4 studies indicated a high AUC value (86.5) for FibroTest.
  • Low quality evidence from 3 studies indicated a high sensitivity (83.0) and specificity (82.0) for ELF.
  • Very Low quality evidence from 7 studies indicated a high sensitivity (83.8) and specificity (77.8) for APRI at low cut-offs. Very Low quality evidence from 5 studies indicated a low sensitivity (36.5) and high specificity (94.4) for APRI at high cut-offs. Low quality evidence from 8 studies indicated a high AUC (88.0) for APRI.
  • Very Low quality evidence from 1 study indicated a high sensitivity (80) and high specificity (78) for FIB-4. Low quality evidence from 4 studies indicated a high AUC (84.8) for FIB-4.
  • Low quality evidence from 2 studies indicated a low sensitivity (30 and 35) and high specificity (90 and 97) for AST/ALT ratio. Moderate quality evidence from 3 studies indicated a high AUC (73) for AST/ALT ratio.
  • Low quality evidence from 7 studies indicated a high sensitivity (81.5) and specificity (90.4) for transient elastography at a low cut-off. Very Low quality evidence from 7 studies indicated a high sensitivity (93.4) and a high specificity (92.9) for transient elastography at medium thresholds. Very Low quality evidence from 1 study indicated high sensitivity (86) and specificity (91) of transient elastography at a high threshold. Low quality evidence from 10 studies indicated a high AUC (92.6) for transient elastography.
  • Very Low quality evidence from 6 studies indicated a high sensitivity (88.1) and specificity (84.3) for ARFI. Low quality evidence from 2 studies indicated a high AUC (86.1) for ARFI.
  • Low quality evidence from 1 study indicated a high sensitivity (90) and specificity (89) for point shear wave elastography (pSWE). Low quality evidence from 1 study indicated a high AUC (95) for pSWE.
  • Low quality evidence from 1 study indicated a high sensitivity (85) and specificity (94) for transient elastography plus ARFI. Very Low quality evidence from 1 study indicated a high sensitivity (96) and specificity (83) for transient elastography or ARFI.
  • Moderate quality evidence indicated a high sensitivity (87), high specificity (90) and AUC (87) of the SAFE algorithm (sequential use of APRI, FibroTest and liver biopsy).
  • Moderate quality evidence indicated a high sensitivity (89) and specificity (98) of the Castera algorithm (a combination of transient elastography and FibroTest: when transient elastography and FibroTest agree, no biopsy is performed, whereas when they disagree, liver biopsy is needed). Low quality evidence from 1 study indicated a high AUC (93) for the Castera algorithm.
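The agreement logic of the Castera algorithm described above can be sketched as a 3-way decision. The function name is hypothetical, and so are the cut-offs used in the example: the text does not state the thresholds applied in the included study, so they are required arguments. Treating a value at or above the cut-off as positive is also an assumption.

```python
def castera_algorithm(te_kpa: float, fibrotest: float,
                      te_cutoff_kpa: float, fibrotest_cutoff: float) -> str:
    """Combine transient elastography and FibroTest results.

    Returns 'cirrhosis' or 'no cirrhosis' when the 2 tests agree, and
    'liver biopsy' when they disagree. A value at or above a cut-off is
    treated as positive (assumption).
    """
    te_positive = te_kpa >= te_cutoff_kpa
    ft_positive = fibrotest >= fibrotest_cutoff
    if te_positive and ft_positive:
        return "cirrhosis"
    if not te_positive and not ft_positive:
        return "no cirrhosis"
    return "liver biopsy"  # discordant results: biopsy needed
```

With hypothetical cut-offs of 13.0 kPa and 0.75, `castera_algorithm(18.0, 0.30, 13.0, 0.75)` returns `"liver biopsy"` because the 2 tests disagree.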

NAFLD

  • In the NAFLD population, Very Low quality evidence from 2 studies indicated a moderate sensitivity (78 and 76) and high specificity (95 and 91) for transient elastography at a low cut-off, and a high sensitivity (100 and 100) and specificity (97 and 98) for transient elastography at a high cut-off.
  • Very Low quality evidence from 2 studies indicated a high sensitivity (92 and 100) and specificity (92 and 96) for ARFI.
  • Low quality evidence indicated a moderate AUC for APRI (76.8), FIB-4 (81) and AST/ALT ratio (73.7).
  • There was a high AUC for transient elastography (94) and ARFI (98.4) from Very Low and Low quality evidence, respectively.

ALD

  • In the ALD population, Very Low quality evidence from 1 study indicated a low sensitivity (40) and moderate specificity (61) for APRI at a high cut-off.
  • Very Low quality evidence from 1 study indicated a high sensitivity (100) and specificity (79) of transient elastography at a low cut-off. Very Low quality evidence from 1 study indicated a high sensitivity (80) and specificity (76) of transient elastography at a high cut-off. Very Low quality evidence from 1 study indicated a high AUC for transient elastography (92.1).

Primary biliary cholangitis (PBC)

  • In the PBC population, Low quality evidence from 1 study indicated a high sensitivity (100) and specificity (94) of transient elastography at a low cut-off.
  • Low quality evidence from 1 study indicated a high AUC (84) of APRI.
  • Moderate quality evidence from 1 study indicated a high AUC (74) of FIB-4.
  • Low quality evidence from 1 study indicated a moderate AUC (58) for AST/ALT ratio.
  • Low quality evidence from 1 study indicated a high AUC (99) for transient elastography.
  • Low quality evidence from 1 study indicated high AUC values (99) for 3 combinations of tests (TE plus APRI, TE plus FIB-4 and TE plus AST/ALT ratio).

HCV/HIV co-infection

  • In the HCV/HIV co-infected population, Very Low quality evidence from 1 study indicated a moderate sensitivity (63) and a high specificity (77) for platelet count. Low quality evidence from 2 studies indicated a high AUC (79.5) for platelet count.
  • Low quality evidence from 1 study indicated a high sensitivity (78) and a moderate specificity (57) for APRI at a low cut-off. Very Low quality evidence from 1 study indicated a low sensitivity (53) and a high specificity (89) for APRI at a high cut-off. Low quality evidence from 2 studies indicated a high AUC (77.5) for APRI.
  • Moderate quality evidence from 1 study indicated a high AUC (73) for FIB-4.
  • Very Low quality evidence from 1 study indicated a low sensitivity (38) and a high specificity (77) for AST/ALT ratio. Very Low quality evidence from 2 studies indicated a low AUC (52.5) for AST/ALT ratio.
  • Low quality evidence from 1 study indicated a high sensitivity (100) and specificity (93) for transient elastography at a low cut-off. Very Low quality evidence from 2 studies indicated a high sensitivity (88 and 100) and specificity (93 and 96) for transient elastography at medium thresholds. Low quality evidence from 2 studies indicated a high AUC (80) for transient elastography.

6.5.2. Economic

  • One cost-utility analysis that compared annual liver biopsy, annual transient elastography and no testing for diagnosis of cirrhosis in chronic hepatitis C patients found that:
    • annual transient elastography was cost-effective compared to no testing (ICER: £6,557 per QALY gained)
    • annual liver biopsy was dominated by both alternatives (more expensive and less effective).
    This analysis was assessed as directly applicable with potentially serious limitations.
  • One cost analysis that compared liver biopsy and transient elastography for diagnosis of cirrhosis in 3 relevant patient subgroups found that liver biopsy had additional costs of £1,136, £2,001 and £3,841 per additional correct diagnosis when compared to transient elastography for the HBV, HCV and NAFLD subgroups respectively. This analysis was assessed as partially applicable with potentially serious limitations.
  • One cost-utility analysis that compared 6 diagnostic strategies for diagnosis of cirrhosis in adults with ALD found that liver biopsy was cost-effective at a cost-effectiveness threshold of £20,000 per QALY gained compared to the following strategies:
    • triage with transient elastography, biopsy in those who tested positive with transient elastography
    • triage with FibroTest, biopsy in those who tested positive with FibroTest
    • triage with ELF, biopsy in those who tested positive with ELF
    • transient elastography alone
    • ELF alone.
    This analysis was assessed as partially applicable with potentially serious limitations.
  • One original cost-utility analysis that compared 6 strategies to diagnose cirrhosis in people with NAFLD and advanced fibrosis with a retest frequency of 2 years found that transient elastography ranked first compared to the following diagnostic strategies, using relevant thresholds for each test, with reference to a cost-effectiveness threshold of £20,000 per QALY gained:
    • ARFI
    • transient elastography (lower threshold)
    • no test – no surveillance
    • no test – surveillance for all
    • liver biopsy.
    This analysis was assessed as directly applicable with minor limitations.
  • One original cost-utility analysis that compared 6 strategies to diagnose cirrhosis in people with ALD, with a retest frequency of 2 years, found that:
    • The ‘no test – no surveillance’ strategy ranked first compared to the following diagnostic strategies, using relevant thresholds for each test, with reference to a cost-effectiveness threshold of £20,000 per QALY gained:
      • no test – surveillance for all
      • transient elastography (low threshold)
      • transient elastography (high threshold)
      • APRI
      • liver biopsy.
    • When compared to the ‘no test – no surveillance’ strategy, the 3 non-invasive tests had ICERs between £22,438 and £22,977 per QALY gained.
    This analysis was assessed as directly applicable with minor limitations.
  • One original cost-utility analysis that compared 7 strategies to diagnose cirrhosis in people with HBeAg-negative hepatitis B, with a retest frequency of 2 years, found that APRI ranked first compared to the following diagnostic strategies, using relevant thresholds for each test, with reference to a cost-effectiveness threshold of £20,000 per QALY gained:
    • transient elastography
    • FibroTest
    • APRI (higher threshold)
    • no test – no surveillance
    • no test – surveillance for all
    • liver biopsy.
    This analysis was assessed as directly applicable with minor limitations.
  • One original cost-utility analysis that compared 7 strategies to diagnose cirrhosis in people with HBeAg-positive hepatitis B, with a retest frequency of 2 years, found that FibroTest ranked first compared to the following diagnostic strategies, using relevant thresholds for each test, with reference to a cost-effectiveness threshold of £20,000 per QALY gained:
    • transient elastography
    • APRI (low threshold)
    • no test – no surveillance
    • APRI (high threshold)
    • no test – surveillance for all
    • liver biopsy.
    This analysis was assessed as directly applicable with minor limitations.
  • One original cost-utility analysis that compared 20 strategies to diagnose cirrhosis in people with hepatitis C with a retest frequency of 2 years found that liver biopsy ranked first compared to the following diagnostic strategies, using relevant thresholds for each test, with reference to a cost-effectiveness threshold of £20,000 per QALY gained:
    • Castera algorithm
    • transient elastography (medium threshold)
    • transient elastography and ARFI
    • transient elastography or ARFI
    • transient elastography (high threshold)
    • SAFE algorithm
    • point shear wave elastography
    • transient elastography (low threshold)
    • ARFI
    • platelet count
    • APRI
    • ELF
    • FIB-4
    • FibroTest
    • APRI
    • AST/ALT ratio
    • no testing – surveillance for all, treat HCV using medication for people with cirrhosis
    • no testing – no surveillance, treat HCV using medication for people with fibrosis
    • no testing – no surveillance, no treatment for HCV.
    This analysis was assessed as directly applicable with minor limitations.

6.6. Recommendations and link to evidence

Recommendations
2. Discuss with the person the accuracy, limitations and risks of the different tests for diagnosing cirrhosis.

3. Offer transient elastography to diagnose cirrhosis for:

  • people with hepatitis C virus infection
  • men who drink over 50 units of alcohol per week and women who drink over 35 units of alcohol per week and have done so for several months
  • people diagnosed with alcohol-related liver disease.

4. Offer either transient elastography or acoustic radiation force impulse imaging (whichever is available) to diagnose cirrhosis for people with NAFLD and advanced liver fibrosis (as diagnosed by a score of 10.51 or above using the enhanced liver fibrosis [ELF] test). Also see the assessment for advanced liver fibrosis section in NICE's NAFLD guideline.

5. Consider liver biopsy to diagnose cirrhosis in people for whom transient elastography is not suitable.

6. For recommendations on diagnosing cirrhosis in people with hepatitis B virus infection, see the assessment of liver disease in secondary specialist care section in NICE's hepatitis B (chronic) guideline.

7. Do not offer tests to diagnose cirrhosis for people who are obese (BMI of 30 kg/m2 or higher) or who have type 2 diabetes, unless they have NAFLD and advanced liver fibrosis (as diagnosed by a score of 10.51 or above using the ELF test). Also see the assessment for advanced liver fibrosis section in NICE's NAFLD guideline.

8. Ensure that healthcare professionals who perform or interpret non-invasive tests are trained to do so.

9. Do not use routine laboratory liver blood tests to rule out cirrhosis.

10. Refer people diagnosed with cirrhosis to a specialist in hepatology.

11. Offer retesting for cirrhosis every 2 years for:

  • people diagnosed with alcohol-related liver disease
  • people with hepatitis C virus infection who have not shown a sustained virological response to antiviral therapy
  • people with NAFLD and advanced liver fibrosis.

12. For recommendations on reassessing liver disease in hepatitis B virus infection, see the assessment of liver disease in secondary specialist care section in NICE's hepatitis B (chronic) guideline.
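The eligibility and retesting rules in recommendations 3, 4, 7 and 11 can be written down as a few explicit conditions. The sketch below is a hypothetical illustration only, not a clinical decision tool: the `Person` record and the functions `first_line_test` and `retest_every_2_years` are invented names, "several months" is captured as a yes/no judgement rather than a fixed duration, and hepatitis B (recommendations 6 and 12) and liver biopsy (recommendation 5) are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in recommendations 3, 4 and 7.
MEN_UNITS_PER_WEEK = 50
WOMEN_UNITS_PER_WEEK = 35
ELF_ADVANCED_FIBROSIS = 10.51


@dataclass
class Person:
    """Hypothetical record of the risk factors named in the recommendations."""
    sex: str                                     # "male" or "female"
    units_per_week: float = 0.0
    sustained_for_several_months: bool = False   # clinical judgement, not a fixed number
    has_hcv: bool = False
    hcv_sustained_virological_response: bool = False
    has_ald_diagnosis: bool = False
    has_nafld: bool = False
    elf_score: Optional[float] = None
    # Obesity or type 2 diabetes alone never triggers testing (recommendation 7),
    # so these fields are recorded but deliberately not used by the rules below.
    bmi: Optional[float] = None
    has_type2_diabetes: bool = False


def harmful_drinking(p: Person) -> bool:
    """Over 50 (men) or 35 (women) units per week, sustained for several months."""
    limit = MEN_UNITS_PER_WEEK if p.sex == "male" else WOMEN_UNITS_PER_WEEK
    return p.units_per_week > limit and p.sustained_for_several_months


def nafld_with_advanced_fibrosis(p: Person) -> bool:
    """NAFLD plus an ELF score of 10.51 or above."""
    return p.has_nafld and p.elf_score is not None and p.elf_score >= ELF_ADVANCED_FIBROSIS


def first_line_test(p: Person) -> Optional[str]:
    """Non-invasive test to offer under recommendations 3, 4 and 7, or None.

    Hepatitis B is not handled: recommendations 6 and 12 defer to NICE's hepatitis B guideline.
    """
    if p.has_hcv or p.has_ald_diagnosis or harmful_drinking(p):
        return "transient elastography"
    if nafld_with_advanced_fibrosis(p):
        return "transient elastography or ARFI"
    return None


def retest_every_2_years(p: Person) -> bool:
    """Recommendation 11, for people whose previous test was negative for cirrhosis."""
    return (
        p.has_ald_diagnosis
        or (p.has_hcv and not p.hcv_sustained_virological_response)
        or nafld_with_advanced_fibrosis(p)
    )
```

For example, `first_line_test(Person(sex="female", units_per_week=40, sustained_for_several_months=True))` returns `"transient elastography"`, whereas a person with a BMI of 35 and type 2 diabetes but no NAFLD with advanced fibrosis gets `None`, mirroring recommendation 7.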

Relative values of different outcomes
The GDG was interested in the performance of various blood and imaging tests for diagnosing cirrhosis in people with risk factors for cirrhosis (for example HCV, NAFLD or alcohol misuse), even in the absence of signs and symptoms. The GDG did not consider the performance of these tests as screening tools in the general population. Therefore, test performance was assessed in studies whose populations matched those in whom the test would be used clinically; studies of healthy populations without suspected chronic liver disease were not included. Because NICE guidance already exists for the assessment of fibrosis and cirrhosis in people with HBV, no new clinical evidence was searched for in this population.
To assess test performance, we first searched for diagnostic RCTs reporting patient outcomes, in which patients are randomised to diagnosis by one test or another and then receive identical therapeutic interventions based on their test results. This is regarded as the gold standard study design because it measures the clinically important consequences of diagnostic accuracy for patients. No studies of this design were identified in the clinical evidence review. Therefore, the GDG reviewed evidence from diagnostic accuracy studies.
The reference standard test used to define the presence or absence of cirrhosis was liver biopsy. The GDG specified in the protocol the most commonly used biopsy scoring systems for cirrhosis, including Knodell F4, Ishak F5/6, METAVIR F4 or, for NAFLD populations, the Kleiner or Brunt F4 scoring systems.
For decision-making, the GDG focused on diagnostic accuracy measures, in particular the sensitivity and specificity of each test for a diagnosis of cirrhosis. These data were also used to inform the health economic model, in order to identify the most cost-effective test, or combination of tests, for diagnosing cirrhosis. The GDG discussed that, for a condition such as cirrhosis where early identification is essential for effective management (including treatment of the underlying cause or monitoring for life-threatening complications), it is crucial to have a highly sensitive test, especially early in the patient pathway if multiple tests are used. This is because a sensitive test will miss very few people with cirrhosis (few false negative results). The GDG noted that the cut-off threshold used to define a positive test can vary and assessed the accuracy of the tests at a variety of published thresholds. A threshold set to increase the sensitivity of a test will consequently reduce its specificity. The GDG also discussed the importance of a test with high specificity, which would result in very few people without cirrhosis being incorrectly labelled as having cirrhosis (false positive results). This is particularly important if the test result determines who goes on to an invasive or costly intervention.
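The sensitivity–specificity trade-off at different cut-offs can be made concrete with a small worked example. The sketch below uses invented liver stiffness readings (not data from any included study) and treats a reading at or above the cut-off as test-positive; `sens_spec` is a hypothetical helper name.

```python
from typing import List, Tuple

# Invented liver stiffness readings (kPa) for illustration only; not study data.
STIFFNESS_KPA = [11.0, 13.5, 16.0, 22.0, 30.0,   # people with cirrhosis on biopsy
                 5.0, 6.5, 8.0, 12.0, 14.0]      # people without cirrhosis
HAS_CIRRHOSIS = [True] * 5 + [False] * 5


def sens_spec(scores: List[float], truth: List[bool], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a reading >= cutoff counts as test-positive."""
    tp = sum(1 for s, d in zip(scores, truth) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, truth) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, truth) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, truth) if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


for cutoff in (10.0, 13.0, 15.0):
    sens, spec = sens_spec(STIFFNESS_KPA, HAS_CIRRHOSIS, cutoff)
    print(f"cut-off {cutoff} kPa: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

With these invented readings, lowering the cut-off from 15 to 10 kPa raises sensitivity from 60% to 100% while specificity falls from 100% to 60%; the 13 kPa cut-off gives 80% for both.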
Trade-off between clinical benefits and harms

Hepatitis C
The majority of available clinical evidence meeting the protocol criteria was in people with chronic HCV infection. The only data available for individual blood tests were for platelet count, with a sensitivity ranging from 76% to 87% and a specificity ranging from 84% to 88%. The GDG noted the lack of any evidence to support ALT or AST as individual blood tests in the diagnosis of cirrhosis. This supports the view that HCV-related cirrhosis should not be ruled out on the basis of these individual blood tests alone. Data were available for the accuracy of AST/ALT as a ratio measure, showing a very low sensitivity. Therefore, despite its high specificity, the AST/ALT ratio would not be a good first-line test, because it would produce a high number of false negative results and people with cirrhosis would be missed. However, the GDG discussed whether such a test could be combined with a highly sensitive test.
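Why a test with low sensitivity but high specificity makes a poor first-line test is easiest to see in natural frequencies. This sketch uses a hypothetical helper, `outcomes_per_1000`, with invented accuracy figures and an invented 20% prevalence; none of these numbers comes from the review.

```python
def outcomes_per_1000(sensitivity: float, specificity: float, prevalence: float, n: int = 1000) -> dict:
    """Expected test outcomes in n people tested, given accuracy and prevalence."""
    with_cirrhosis = n * prevalence
    without_cirrhosis = n - with_cirrhosis
    return {
        "true positive": with_cirrhosis * sensitivity,
        "false negative": with_cirrhosis * (1 - sensitivity),
        "true negative": without_cirrhosis * specificity,
        "false positive": without_cirrhosis * (1 - specificity),
    }


# Hypothetical accuracy figures and a hypothetical 20% prevalence, for illustration only.
low_sens_high_spec = outcomes_per_1000(sensitivity=0.40, specificity=0.95, prevalence=0.20)
high_sens = outcomes_per_1000(sensitivity=0.90, specificity=0.85, prevalence=0.20)
print(round(low_sens_high_spec["false negative"]), round(high_sens["false negative"]))  # 120 20
```

Of the 200 people with cirrhosis in this invented cohort, the low-sensitivity test misses 120 while the more sensitive test misses 20, even though the former produces fewer false positives.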
For blood fibrosis tests there were data available for FibroTest, ELF, APRI and FIB-4. The GDG noted a relatively high sensitivity and specificity for all these blood fibrosis tests.
For imaging tests, accuracy data were available for transient elastography and ARFI. Both had a relatively high sensitivity and specificity, with transient elastography at a cut-off threshold of between 13 kPa and 15 kPa performing the best (pooled sensitivity and specificity of 93.4% [95% CI 87.9, 97.0] and 92.9% [95% CI 86.5, 97.0], respectively). The GDG also noted that the tests using transient elastography and ARFI in combination (a positive result on both, or a positive result on one or the other) also gave relatively high sensitivities and specificities. One study also assessed transient elastography within an algorithm of tests, the best performing of which was the Castera algorithm consisting of a combination of transient elastography and FibroTest with a sensitivity of 89% and a specificity of 98%.
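Combining two tests with an "either positive" or "both positive" rule shifts accuracy in opposite directions. The sketch below uses the textbook formulas, which assume the two tests err independently given true cirrhosis status. That assumption is unlikely to hold for transient elastography and ARFI, which both measure liver stiffness, which is why directly observed accuracy of the combination (as reported above) is preferable. The function names and input values are hypothetical, and this is not a model of the Castera algorithm.

```python
from typing import Tuple


def either_positive(se1: float, sp1: float, se2: float, sp2: float) -> Tuple[float, float]:
    """'Or' rule: call cirrhosis if either test is positive (raises sensitivity)."""
    sensitivity = 1 - (1 - se1) * (1 - se2)   # missed only if both tests miss it
    specificity = sp1 * sp2                    # negative only if both tests are negative
    return sensitivity, specificity


def both_positive(se1: float, sp1: float, se2: float, sp2: float) -> Tuple[float, float]:
    """'And' rule: call cirrhosis only if both tests are positive (raises specificity)."""
    sensitivity = se1 * se2
    specificity = 1 - (1 - sp1) * (1 - sp2)
    return sensitivity, specificity


# Hypothetical inputs: two tests that are each 90% sensitive and 90% specific.
print(either_positive(0.9, 0.9, 0.9, 0.9))  # approximately (0.99, 0.81)
print(both_positive(0.9, 0.9, 0.9, 0.9))    # approximately (0.81, 0.99)
```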

NAFLD
No relevant studies were identified looking at either individual blood tests or blood fibrosis tests in a NAFLD population. For imaging tests, accuracy data were available for transient elastography and ARFI. The GDG noted the anomalous results seen for transient elastography in the NAFLD population. Normally, increasing the threshold of a test will result in a decrease in the sensitivity and an increase in the specificity. This pattern was not always observed in the evidence. This may be due to the differing aims of the studies included in the clinical evidence review. Some studies aimed to assess the diagnostic accuracy of transient elastography in a NAFLD population, using the most appropriate probe size (M or XL probe) for each individual's BMI, as per the manufacturer's instructions. Other studies aimed to compare the accuracy of the M probe with the XL probe, assessing both probes in each individual regardless of their BMI. It was agreed that the latter type of study was not appropriate for the clinical evidence review, as the GDG were interested in the overall accuracy of transient elastography in this population, on the assumption that the test is performed according to the manufacturer's instructions using the most appropriate probe for each patient. Therefore, the following studies were removed from the analysis: Wong 2012, Myers 2012 and Friedrich-Rust 2010A.
For the overall accuracy of transient elastography in the NAFLD population, Wong 2010B and Gaia 2011 were available for the lower threshold range and showed a sensitivity of 76–78% and a specificity of 91–95%. Yoneda 2008 and Yoneda 2010 were available for the higher threshold range and showed a sensitivity of 100% and a specificity of 97–98%. No studies were available for the medium threshold range. The GDG noted that the higher thresholds had an unexpectedly high sensitivity and discussed that this may be due to the smaller size of these studies compared with those included for the lower threshold range. The prevalence of cirrhosis in these studies was also low, so even a single misclassification in either direction would considerably change the accuracy estimates. In practice, the manufacturer suggests a threshold of around 10–11 kPa for diagnosis of cirrhosis in this population. The GDG discussed that it is often difficult to interpret the transient elastography reading in this population, but that the introduction of the XL probe has helped.
Data on the accuracy of ARFI were available from 2 studies, both of which showed similarly high sensitivity and specificity. Overall, the accuracy of ARFI was higher than that of transient elastography in the NAFLD population.

ALD
No relevant studies were identified looking at individual blood tests in a population with ALD. The GDG discussed the limitations of tests using AST in this population, as people with high alcohol consumption may have a raised AST. For blood fibrosis tests, data were available only for APRI, which showed a low sensitivity of 40% and a specificity of 61% in this population. Transient elastography proved to be accurate in this population at both the lower and the higher threshold ranges. The lower threshold range gave a very high sensitivity of 100%, but the GDG noted the wide confidence intervals for all the accuracy measures in this population, perhaps due to the very small sample sizes of the studies. The GDG agreed that transient elastography should be included in the modelling for this population, but that the wide confidence intervals in the estimate should be reflected in the sensitivity analyses. It was discussed that for both blood fibrosis tests and imaging tests, care needs to be taken when interpreting results in people who are actively drinking. Alcohol consumption per se may alter the circulating blood levels of the individual markers irrespective of the degree of hepatic fibrosis. In addition, active alcohol consumption causes swelling and protein retention in liver cells, which increases liver stiffness, so imaging tests may overestimate the degree of fibrosis. The presence of steatosis and inflammation in the liver has similar effects. Although diagnostic tests should be performed at the point of first contact, they should be repeated after a period of abstinence in this population subgroup.

Primary biliary cholangitis (PBC)
In the PBC population, data were only available for the accuracy of transient elastography, which showed a high sensitivity and specificity. The GDG noted that this population would not be modelled, due to the limited data available.

HIV and HCV
The only evidence identified for the accuracy of the tests in people with multiple aetiologies was in people with HIV and HCV co-infection. The only data available for individual blood tests were for platelet count, which had a sensitivity of 63% and a specificity of 77%. For blood fibrosis tests there were data available for APRI, FIB-4 and AST/ALT ratio. The GDG noted the poor sensitivity of both AST/ALT ratio and APRI using the high cut-off threshold, but the improved performance of APRI using a lower cut-off threshold. Transient elastography showed a very high sensitivity and specificity at both the low and medium threshold ranges. The limited number of patients and the low prevalence of cirrhosis were noted, as these reduce the precision of the accuracy estimates.
Trade-off between net clinical effects and costs
Three relevant published economic evaluations were identified for this review.
Canavan 2013 was a cost-utility analysis that compared annual liver biopsy, annual transient elastography and a no testing strategy in a cohort of patients with chronic hepatitis C. It found that liver biopsy was dominated by both alternatives (more expensive and less effective) and that annual transient elastography was cost-effective (ICER: £6,557 per QALY gained) when compared to no testing. The GDG noted that although the model structure was considered representative of the condition, it did not include the new polymerase inhibitor treatments as a model parameter.
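The terms "dominated" and "ICER" used in these evaluations follow a standard incremental analysis: options are sorted by cost, any option that costs more but yields no more QALYs than a cheaper one is dominated, options with a higher ICER than the next more effective option are removed (extended dominance), and the ICER of each remaining step is the extra cost divided by the extra QALYs. The sketch below implements that procedure with invented costs and QALYs, chosen only so that the ICER reproduces the £6,557 reported above; they are not Canavan 2013's inputs, and `efficiency_frontier` is a hypothetical helper name.

```python
from typing import Dict, List, Optional, Tuple

Strategy = Tuple[str, float, float]  # (name, cost in £, QALYs)


def _icer(a: Strategy, b: Strategy) -> float:
    """Incremental cost per QALY gained of b over a."""
    return (b[1] - a[1]) / (b[2] - a[2])


def efficiency_frontier(options: Dict[str, Tuple[float, float]]) -> List[Tuple[str, Optional[float]]]:
    """Drop dominated options and return the rest with the ICER of each step."""
    ranked = sorted(options.items(), key=lambda kv: (kv[1][0], -kv[1][1]))
    frontier: List[Strategy] = []
    best_qalys = float("-inf")
    for name, (cost, qalys) in ranked:
        # Strict dominance: costs at least as much as a cheaper option without gaining more QALYs.
        if qalys > best_qalys:
            frontier.append((name, cost, qalys))
            best_qalys = qalys
    # Extended dominance: ICERs must rise along the frontier.
    i = 1
    while i < len(frontier) - 1:
        if _icer(frontier[i - 1], frontier[i]) > _icer(frontier[i], frontier[i + 1]):
            del frontier[i]
            i = max(1, i - 1)
        else:
            i += 1
    if not frontier:
        return []
    result: List[Tuple[str, Optional[float]]] = [(frontier[0][0], None)]
    for prev, cur in zip(frontier, frontier[1:]):
        result.append((cur[0], _icer(prev, cur)))
    return result


# Invented costs and QALYs, chosen only so that the ICER reproduces the reported £6,557.
options = {
    "no testing": (1000.0, 10.0),
    "annual transient elastography": (4278.5, 10.5),
    "annual liver biopsy": (6000.0, 9.75),
}
print(efficiency_frontier(options))
# [('no testing', None), ('annual transient elastography', 6557.0)]
```

Annual liver biopsy drops out at the strict-dominance step because it is both more costly and less effective than each alternative, which matches the description "dominated by both alternatives".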
Steadman 2013 was a cost analysis that compared liver biopsy and transient elastography in 3 relevant patient subgroups. It found that liver biopsy had additional costs of £1,136, £2,001 and £3,841 per additional correct diagnosis when compared to transient elastography for the HBV, HCV and NAFLD groups respectively. The GDG could not conclude whether these additional costs represent a cost-effective price per correct diagnosis, because the study did not take into account any further follow-up health costs or savings arising from each test result. A further limitation of this study was the use of observational studies to determine the accuracy of transient elastography.
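A "cost per additional correct diagnosis" is an incremental cost divided by the extra proportion of people correctly classified. The sketch below shows that arithmetic with invented costs, accuracy and prevalence (not Steadman 2013's inputs; the study's exact method may differ). It also shows why the GDG could not judge cost-effectiveness from this figure alone: nothing in the calculation captures the downstream costs or health effects of a true or false result.

```python
def proportion_correct(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Expected share of people correctly classified (true positives plus true negatives)."""
    return prevalence * sensitivity + (1 - prevalence) * specificity


def cost_per_extra_correct(cost_a: float, correct_a: float, cost_b: float, correct_b: float) -> float:
    """Extra cost of strategy a over b divided by the extra correct diagnoses it yields."""
    return (cost_a - cost_b) / (correct_a - correct_b)


# Invented inputs: biopsy treated as perfect (as a reference standard would be),
# transient elastography 90% sensitive and specific, 20% prevalence.
biopsy_correct = proportion_correct(1.0, 1.0, 0.20)
te_correct = proportion_correct(0.90, 0.90, 0.20)
print(round(cost_per_extra_correct(1000.0, biopsy_correct, 100.0, te_correct)))  # about 9000
```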
Stevenson 2012 was a cost-utility analysis that compared 10 strategies (6 of which were relevant to this review) in patients with alcohol-related liver disease. The relevant comparators were triage with transient elastography, FibroTest or ELF followed by liver biopsy to confirm a positive first test, and transient elastography, ELF or liver biopsy as single-test strategies. The study found that only liver biopsy was cost-effective at a threshold of £20,000 per QALY gained. The GDG noted several limitations in this analysis: most quality of life values were obtained from an HCV cohort and some QALYs were based on assumptions. In addition, results were not subjected to probabilistic sensitivity analysis.

Original cost-effectiveness analysis was conducted for this guideline to address the cost-effectiveness of diagnostic tests for cirrhosis in adults with HBV, HCV, NAFLD and ALD.

Hepatitis C
This analysis found that liver biopsy was the most cost-effective test of the 17 tests compared in all 4 genotypes at a cost-effectiveness threshold of £20,000 per QALY gained. Depending on the genotype, second and third places were taken by transient elastography at a threshold of 13.0 to <15.0 kPa, the Castera algorithm, or a combination of transient elastography (at 12.2 kPa) and ARFI (at 1.8 m/s) in which a positive result on either test was treated as positive. There was minimal uncertainty in the ranking of liver biopsy, which ranked first in more than 90% of the simulations. There was moderate to high uncertainty in the rankings of the remaining tests.
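Statements such as "ranked first in more than 90% of the simulations" come from probabilistic sensitivity analysis. One common way to produce such a figure (assumed here; the guideline model's exact procedure is not described in this section) is to draw costs and QALYs repeatedly from their distributions and count how often each strategy has the highest net monetary benefit (QALYs valued at the threshold, minus cost). All distributions and names below are invented for illustration.

```python
import random
from typing import Dict, List, Tuple

Draw = Dict[str, Tuple[float, float]]  # strategy name -> (cost in £, QALYs) in one simulation


def net_monetary_benefit(cost: float, qalys: float, threshold: float = 20000.0) -> float:
    """QALYs valued at the cost-effectiveness threshold, minus cost."""
    return threshold * qalys - cost


def probability_ranked_first(draws: List[Draw], threshold: float = 20000.0) -> Dict[str, float]:
    """Share of simulations in which each strategy has the highest net monetary benefit."""
    wins: Dict[str, int] = {}
    for draw in draws:
        best = max(draw, key=lambda name: net_monetary_benefit(*draw[name], threshold=threshold))
        wins[best] = wins.get(best, 0) + 1
    return {name: count / len(draws) for name, count in wins.items()}


# Invented input distributions, for illustration only.
rng = random.Random(2016)
draws = [
    {
        "liver biopsy": (rng.gauss(9000, 500), rng.gauss(10.60, 0.05)),
        "transient elastography": (rng.gauss(8000, 500), rng.gauss(10.50, 0.05)),
    }
    for _ in range(1000)
]
print(probability_ranked_first(draws))
```

A strategy whose share is close to 1 has a stable ranking; shares spread across several strategies correspond to the "moderate to high uncertainty" described for the remaining tests.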
The GDG acknowledged that this result is mainly attributable to the fact that misdiagnosis of cirrhosis is associated with incorrect administration of the very costly polymerase inhibitor drugs. As a result, the economic model appeared to favour the test with the highest diagnostic accuracy, irrespective of its unit cost. The GDG also noted that the diagnostic accuracy of liver biopsy was set to 100% sensitivity and specificity in the model because it served as the reference standard for the test comparisons. Therefore any model bias regarding the diagnostic accuracy of the tests is in favour of liver biopsy.
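Why treating liver biopsy as perfect biases comparisons in its favour can be shown with a simple calculation. If the reference standard itself misclassifies some people, an index test measured against it appears less accurate than it really is. The sketch below assumes that the index test and the reference standard err independently given true cirrhosis status (a simplifying assumption not taken from the guideline), and all input values and the name `apparent_accuracy` are hypothetical.

```python
from typing import Tuple


def apparent_accuracy(prevalence: float, se_index: float, sp_index: float,
                      se_ref: float, sp_ref: float) -> Tuple[float, float]:
    """Sensitivity and specificity of an index test as measured against an imperfect
    reference standard, assuming the two tests err independently given true status."""
    ref_pos = prevalence * se_ref + (1 - prevalence) * (1 - sp_ref)
    both_pos = prevalence * se_index * se_ref + (1 - prevalence) * (1 - sp_index) * (1 - sp_ref)
    ref_neg = 1 - ref_pos
    both_neg = prevalence * (1 - se_index) * (1 - se_ref) + (1 - prevalence) * sp_index * sp_ref
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical index test with true sensitivity and specificity of 90%, prevalence 30%.
print(apparent_accuracy(0.30, 0.90, 0.90, 1.00, 1.00))  # perfect reference: about (0.90, 0.90)
print(apparent_accuracy(0.30, 0.90, 0.90, 0.95, 0.95))  # imperfect reference: about (0.81, 0.88)
```

Under these invented inputs, a reference standard that is 95% sensitive and specific makes a 90%/90% index test look roughly 81%/88% accurate, so a model that also credits the reference with perfect accuracy widens the apparent gap between them.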

Hepatitis B
This analysis, using clinical effectiveness data from the clinical review conducted for NICE CG165,141 found that in HBeAg-negative patients, APRI at 1.0 was the most cost-effective of the 5 tests compared at a cost-effectiveness threshold of £20,000 per QALY gained, with transient elastography at 11.0 kPa and FibroTest at 0.74 very close behind in second and third place respectively. For HBeAg-positive patients, FibroTest at 0.74, transient elastography at 11.0 kPa and APRI at 1.0 ranked in the first 3 positions, with very similar cost-effectiveness, at the same threshold of £20,000 per QALY gained. There was considerable uncertainty in the results: all the non-invasive tests had wide confidence intervals, and the confidence interval for each of the 3 strategies listed included first place.

NAFLD
This analysis found that in patients with NAFLD, transient elastography at >15.0 kPa was the most cost-effective of the 4 tests compared at a cost-effectiveness threshold of £20,000 per QALY gained, with ARFI at 1.636–1.9 m/s in second place. There was considerable uncertainty in the results, with the 3 non-invasive test strategies having similarly wide confidence intervals (first to fourth place).

ALD
This analysis found that in patients with ALD, none of the diagnostic tests was cost-effective at a cost-effectiveness threshold of £20,000 per QALY gained, with ‘no test – no monitoring’ and ‘no test – monitor all’ ranking first and second. However, the incremental cost-effectiveness ratios comparing the 3 non-invasive tests with the ‘no test – no monitor’ strategy were only just above the £20,000 threshold (£22,438 to £22,977 per QALY gained), meaning that testing was cost-effective at a threshold of £30,000 per QALY gained. There was considerable uncertainty in the cost-effectiveness rankings, with all strategies except liver biopsy (consistently ranked last) having similarly wide confidence intervals (first or second to fifth place).
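Whether a strategy counts as cost-effective depends on the threshold: the incremental net monetary benefit (threshold multiplied by the QALYs gained, minus the extra cost) changes sign exactly where the threshold equals the ICER. The sketch below uses invented increments chosen only so that the ICER equals the £22,438 reported above; `incremental_nmb` is a hypothetical helper name.

```python
def incremental_nmb(delta_cost: float, delta_qalys: float, threshold: float) -> float:
    """Incremental net monetary benefit: positive means cost-effective at this threshold."""
    return threshold * delta_qalys - delta_cost


# Invented increments chosen so that the ICER is £22,438 per QALY gained.
delta_cost, delta_qalys = 224.38, 0.01
print(round(delta_cost / delta_qalys))  # ICER, about 22438
for threshold in (20000.0, 30000.0):
    print(threshold, incremental_nmb(delta_cost, delta_qalys, threshold) > 0)
# 20000.0 False, then 30000.0 True
```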

Conclusions
After taking into account the overall cost-effectiveness results of the original analysis for all 4 examined populations, the GDG acknowledged that there is significant variation in the cost-effectiveness of the diagnostic tests across the different aetiologies. The economic model suggested the use of a non-invasive test for NAFLD and HBV, the use of liver biopsy for patients with HCV and no testing for ALD patients. However, for the ALD cohort, the GDG concluded that testing is an appropriate strategy, as this cohort comprises the largest group of people with cirrhosis, and the population which has the highest risk of dying from cirrhosis. The GDG noted that defining who is most at risk of cirrhosis due to alcohol use is difficult due to the lack of a universally agreed definition, but that the criterion of exceeding 50 units per week for men or 35 units for women, constituting ‘harmful drinking’, was the highest threshold that could be set and thus the population selected will be at higher risk of cirrhosis than if a lower threshold of alcohol use had been chosen. The base case ICER for transient elastography compared to no testing was £22,438 per QALY gained, which is only slightly above a cost-effectiveness threshold of £20,000 per QALY. The range of the confidence interval around this base case showed that testing could be below £20,000 per QALY within the range of uncertainty, but could not be above £30,000 per QALY. One source of uncertainty is the effect of cirrhosis diagnosis on drinking behaviour. There are no clear data in this area, but if the diagnosis of cirrhosis has the effect of substantially increasing a person's likelihood of abstaining from or reducing consumption of alcohol, in addition to the positive effects on health of treating the cirrhosis, then this would make testing people with ALD for cirrhosis substantially more cost-effective.
As for the selection of the most appropriate non-invasive cirrhosis test, the GDG noted the practicality of recommending a common test for all aetiologies, and that there is an existing recommendation for people with hepatitis B in CG165. Taking these factors into account, the GDG recognised that there was adequate evidence across all aetiologies to conclude that transient elastography (at the appropriate threshold for each aetiology) is a cost-effective option for the diagnosis of cirrhosis irrespective of the underlying cause since, after taking combined model parameter uncertainty into consideration, transient elastography could rank first in 3 out of the 4 examined cirrhosis aetiologies.
For the hepatitis B cohort the GDG noted the similar cost-effectiveness of APRI and transient elastography. Transient elastography has been recommended in CG165 for assessing the stage of liver disease in people with hepatitis B, and these results are consistent with that recommendation with regard to testing for cirrhosis. For the NAFLD cohort, the GDG felt there was too much uncertainty in the model results to exclusively recommend transient elastography since ARFI exhibited similar cost-effectiveness and is similar in its availability, cost and ease of use. Some centres currently have access to transient elastography and others to ARFI, and there is no reason why whichever technique is most easily available should not be used.
The GDG acknowledged that liver biopsy ranked as the most cost-effective option modelled for people with hepatitis C, principally because of the very high price of drugs to treat hepatitis C and thus the high cost of misdiagnosis. However, this assumes that liver biopsy has 100% sensitivity and specificity. Although regarded as the reference standard, liver biopsy is acknowledged not to have perfect sensitivity and specificity, and it does misclassify some people. Without a more objective test to compare it against, this misclassification cannot be quantified, but the lower the quality of the liver biopsy (determined by the number and length of the samples taken), the greater the risk of misclassification. This imperfection does, however, mean that the results of the economic model are biased slightly in favour of liver biopsy. In addition, the invasive nature of liver biopsy means that it causes adverse events, including a small risk of death. It is also considered unpleasant by patients, and so has very low acceptability among them. The GDG considered that if liver biopsy were the only option offered to people with hepatitis C, a large majority would refuse any testing and so would not be diagnosed with (or without) cirrhosis. Diagnosis of cirrhosis status is also required to determine the correct drugs to treat hepatitis C itself; therefore, if cirrhosis status cannot be determined, hepatitis C cannot be treated. Recommending only liver biopsy would therefore be likely to have a severe negative impact on people with hepatitis C.
The GDG also highlighted the fact that many people have multiple aetiologies (for example hepatitis C and ALD) and that recommendations should have some consistency across the different aetiologies for this reason.
However, liver biopsy should be permitted if people with hepatitis C wish to choose it, aware of the risks and benefits. Therefore the GDG recommended that liver biopsy should be considered in hepatitis C patients when transient elastography is not suitable, for example if the person opts for liver biopsy given an informed choice.
For people with NAFLD and for people suspected of having ALD or drinking at harmful levels, liver biopsy is not the most cost-effective option. It is expected that most of these people will prefer and choose transient elastography. However, liver biopsy remains the most accurate and authoritative test and may be appropriate in some cases, including when transient elastography cannot be successfully used. Therefore, as with the hepatitis C population, the GDG recommended that liver biopsy should be considered in these people when transient elastography is not suitable, for example in someone who has not abstained from alcohol for at least 6 weeks prior to testing. Again, this must be an informed choice by the patient.
Further results of the original economic model showed that retesting annually was not cost-effective compared to retesting every 2 years in any of the populations. The GDG hence advised that people with hepatitis C, ALD or NAFLD with advanced fibrosis found to be negative for cirrhosis should be retested for cirrhosis using the same tests every 2 years if they still have the underlying condition. People whose hepatitis C has been cured in the meantime, people who have stopped drinking or whose NAFLD has improved, do not require retesting. The GDG noted the recommendation of CG165141 that people with hepatitis B not undergoing antiviral treatment should be retested annually with transient elastography for assessment of progression of liver disease.
Quality of evidence
The majority of the evidence was of Very Low or Low quality, with some exceptions. The main reason for downgrading the quality of the evidence was risk of bias. The majority of the included studies were at high risk of bias because of perceived inadequacy of the liver biopsy. The GDG felt that variation in biopsy length in the reference standard would strongly affect the measured accuracy of the index test, and that any heterogeneity between studies might be attributed to this. The other reason for downgrading the quality of the evidence was imprecision around the effect estimates, as shown by wide confidence intervals.
Of the 252 full-text articles ordered, the main reason for excluding evidence was that the biopsy length criteria did not match the review protocol. The GDG discussed that there was a balance to strike between excluding too many studies and including only studies for which they could be confident that the evidence represented the true accuracy of the test. The GDG agreed that the biopsy standard was important and that relaxing it further would reduce their confidence in the evidence. The current recommended biopsy length in the UK is 25 mm containing at least 10 portal tracts. It was agreed that including all evidence, even from studies using biopsies smaller than 15 mm or with fewer than 6 portal tracts, or not stating the biopsy criteria, would have a profound effect on the accuracies of the diagnostic tests, as they would be compared with a reference standard of lower accuracy and one that does not reflect the measure of adequacy of liver biopsy used in the UK today. The GDG also reviewed the number of studies excluded for this reason, and whether a different standard would substantially increase the available evidence. They found that the evidence base for HCV would be increased, but there would not be a large effect for the other aetiologies where evidence is lacking. Therefore, it was agreed that it was not worth compromising the quality of the evidence.
The other main reason for exclusion was that studies included populations with cirrhosis of varying aetiology. The GDG confirmed that these studies should be excluded because test performance differs between aetiologies, as outlined above.
Other considerations
In determining who should be tested using the diagnostic tests investigated, the GDG had recourse to recommendation 1 (see Section 5.7). They noted that:
  • All people diagnosed with hepatitis B are recommended by the NICE Hepatitis B guideline (CG165)141 to be tested for stage of liver disease using transient elastography.
  • All people diagnosed with hepatitis C require knowledge of their cirrhosis status to determine the appropriate drugs to use to treat their hepatitis C.
  • There is no widely accepted definition of who has, or should be suspected of having, alcohol-related liver disease, short of a histopathological diagnosis by liver biopsy after a history of alcohol use has been taken. The GDG agreed that people who should be suspected of being at high risk of cirrhosis due to alcohol misuse are men drinking more than 50 units per week and women drinking more than 35 units per week over a period of at least several months. In addition, anyone already diagnosed with alcohol-related liver disease by a specialist should also be tested for cirrhosis. Retesting people who drink excessively but have not been diagnosed with alcohol-related liver disease was not recommended, because no adequate evidence was identified to support it. Therefore, people who receive an initial negative cirrhosis diagnosis should be monitored for alcohol-related liver disease according to existing NICE guidance (CG100).
  • People with obesity or type 2 diabetes are only at risk of cirrhosis through developing NAFLD. People with obesity or type 2 diabetes without NAFLD are not at immediate risk of cirrhosis and do not need to be tested.
  • ‘Non-alcoholic fatty liver disease (NAFLD): assessment and management’142 recommends testing people diagnosed with NAFLD for advanced fibrosis. That guideline recommends using the ELF test, with a threshold of 10.51, to test for advanced fibrosis. It found ELF to be the most diagnostically accurate test and to be cost-effective compared with all other testing and non-testing strategies at a cost-effectiveness threshold of £20,000 per QALY gained. Because everyone who goes on to develop cirrhosis first develops advanced fibrosis, it is sufficient to test for cirrhosis only in people with both NAFLD and advanced fibrosis; people with NAFLD but without advanced fibrosis do not need to be tested for cirrhosis. The GDG therefore adopted the subgroup of people with NAFLD and advanced fibrosis (as determined by ELF testing) as the population of interest when testing people with NAFLD for cirrhosis.
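The selection criteria listed above can be summarised as a single decision rule. The following is a minimal illustrative sketch, not a clinical decision tool. The `Person` fields and function names are hypothetical. "At least several months" is represented as a clinician-judged boolean rather than a number of months, because the text gives no figure. The sketch also assumes, as an interpretation, that an ELF score at or above 10.51 counts as advanced fibrosis.

```python
from dataclasses import dataclass
from typing import Optional

# ELF threshold for advanced fibrosis taken from the NAFLD guideline cited above;
# treating ">= 10.51" as positive is an assumption of this sketch.
ELF_ADVANCED_FIBROSIS_THRESHOLD = 10.51

# Weekly alcohol limits above which the GDG suspected high risk of cirrhosis.
WEEKLY_UNIT_LIMITS = {"male": 50, "female": 35}


@dataclass
class Person:
    # Hypothetical fields, chosen only for this illustration.
    has_hepatitis_b: bool = False
    has_hepatitis_c: bool = False
    sex: str = "male"                      # "male" or "female"
    weekly_alcohol_units: float = 0.0
    drinking_sustained: bool = False       # pattern held for at least several months (clinical judgement)
    specialist_diagnosed_ald: bool = False
    has_nafld: bool = False
    elf_score: Optional[float] = None      # None if ELF has not been performed


def alcohol_high_risk(p: Person) -> bool:
    """Strictly more than the sex-specific weekly limit, sustained over months."""
    limit = WEEKLY_UNIT_LIMITS[p.sex]      # raises KeyError for an unexpected value
    return p.weekly_alcohol_units > limit and p.drinking_sustained


def should_test_for_cirrhosis(p: Person) -> bool:
    # Everyone diagnosed with hepatitis B or C is tested.
    if p.has_hepatitis_b or p.has_hepatitis_c:
        return True
    # Specialist-diagnosed ALD, or drinking above the GDG thresholds.
    if p.specialist_diagnosed_ald or alcohol_high_risk(p):
        return True
    # Obesity or type 2 diabetes alone does not trigger testing;
    # only NAFLD with advanced fibrosis (by ELF) does.
    if p.has_nafld and p.elf_score is not None:
        return p.elf_score >= ELF_ADVANCED_FIBROSIS_THRESHOLD
    return False


if __name__ == "__main__":
    print(should_test_for_cirrhosis(Person(has_hepatitis_c=True)))
    print(should_test_for_cirrhosis(Person(has_nafld=True, elf_score=9.8)))
```

In this sketch, a person with NAFLD and no ELF result returns `False`. This reflects the assumption that, under the NAFLD guideline, ELF testing for advanced fibrosis comes before any decision about testing for cirrhosis.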
The GDG discussed the use of these tests and their applicability in more detail. It was agreed that combining a blood fibrosis test with an imaging test would be theoretically beneficial: because the two measure different biological aspects of the disease, they should give independent results and complement each other. Imaging tests assess the current level of fibrosis in the liver but give little indication of whether fibrosis is ongoing, whereas blood fibrosis tests are more dynamic. The GDG noted another benefit of performing a combination of tests in people with ALD: seeing multiple test results could encourage people to make necessary lifestyle changes, for example abstaining from alcohol or at least reducing their alcohol consumption. A similar positive response may be seen in patients with hepatitis B or C, who may be encouraged to be concordant with antiviral treatment. For a combination of 2 tests to be more accurate than a single test alone, the first test would need high sensitivity and the second test high specificity. It would be preferable for the first test to be a blood test (cheap and easy to conduct on a large number of people) and the second an imaging test, conducted only on the smaller group testing positive on the first test. In practice, however, no such combinations of tests were available given the diagnostic accuracy results of the individual tests. No combinations were therefore considered apart from the algorithms tested within a single study included in the clinical review.
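The reasoning about sequencing 2 tests can be made concrete with the standard formulas for serial testing. In serial testing, the second test is run only on people who test positive on the first, and a person is called positive only if both tests are positive. The sketch below assumes the two tests are conditionally independent given disease status, as the paragraph above anticipates for a blood test plus an imaging test. Under that assumption, combined sensitivity is Se1 × Se2 and combined specificity is 1 − (1 − Sp1)(1 − Sp2). Combined sensitivity can therefore never exceed that of either test, which is why the first test needs high sensitivity. The function names and the accuracy values in the example are hypothetical and are not results from this review.

```python
def serial_accuracy(se1: float, sp1: float, se2: float, sp2: float) -> tuple[float, float]:
    """Combined (sensitivity, specificity) for a 'positive on both' serial strategy.

    Test 2 is applied only to test-1 positives. Assumes the tests are
    conditionally independent given true disease status.
    """
    sensitivity = se1 * se2                    # must be detected by both tests
    specificity = 1 - (1 - sp1) * (1 - sp2)    # false positive only if both tests err
    return sensitivity, specificity


def fraction_needing_second_test(prevalence: float, se1: float, sp1: float) -> float:
    """Proportion of those tested who screen positive on test 1 (true + false positives)."""
    return prevalence * se1 + (1 - prevalence) * (1 - sp1)


if __name__ == "__main__":
    # Hypothetical values: a sensitive but less specific blood test first,
    # followed by a more specific imaging test.
    se, sp = serial_accuracy(se1=0.95, sp1=0.60, se2=0.85, sp2=0.90)
    share = fraction_needing_second_test(prevalence=0.10, se1=0.95, sp1=0.60)
    print(f"combined sensitivity={se:.4f}, specificity={sp:.4f}, "
          f"second test needed by {share:.1%} of those tested")
```

With these illustrative inputs, combined specificity rises well above that of the blood test alone, but combined sensitivity falls below both individual sensitivities. If neither available test has high enough sensitivity, the combination loses too many true cases, which is consistent with the GDG's conclusion that no suitable pairing was available.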
There was a general discussion that many of the papers the GDG were familiar with were not included in this review. It was noted that the current review looked at the accuracy of the tests specifically for the diagnosis of cirrhosis (for example F4 stage only when the METAVIR scoring system was used). This excluded a number of papers which grouped F3 and F4 together. In particular, the GDG noted that there is an evidence base looking at the accuracy of ELF for the diagnosis of ‘advanced fibrosis’ (F3 and F4 grouped together), but not for cirrhosis alone.
The GDG discussed which tests should be considered in the economic modelling. It was noted that in HCV there was no need for a diagnostic test of fibrosis in primary care, because all patients are referred to secondary care for assessment and treatment. The GDG felt that the preferred diagnostic test for an HCV population should be transient elastography or ARFI.
The GDG considered 2 additional methods of scoring liver biopsies, the Batts-Ludwig and Scheuer scoring systems, and questioned whether these should have been included in the review protocol as alternative reference standards. However, the GDG agreed that studies using these reference standards are in the minority compared with those using METAVIR. It also noted that changes in the reference standard can alter the sensitivity and specificity values of the tests. The GDG agreed that these reference standard methods should remain excluded from the review.
It was agreed that transient elastography needs to be performed according to the manufacturer's instructions by fully trained operators. There was also concern that, with the introduction of non-invasive tests, new clinicians may lose skills in performing liver biopsies before the non-invasive tests have been developed to an acceptable standard and become widely available.
The GDG agreed that anyone diagnosed with cirrhosis should be referred to a hepatologist (or someone with more than 50% work time commitment to hepatology) for initial assessment. In some situations it may be appropriate for the patient's clinical and supportive care to be offered within primary care or by means of a shared-care model.
Copyright © National Institute for Health and Care Excellence 2016.
Bookshelf ID: NBK385207