12 Methotrexate and monitoring for hepatotoxicity

The risk of liver fibrosis associated with methotrexate is accepted, but its magnitude is unknown. Histological evaluation of a liver biopsy specimen is currently the gold standard for diagnosing, staging and monitoring liver fibrosis due to any cause, but liver biopsy carries significant morbidity and mortality and is disliked by patients. The need for liver biopsy is commonly cited by patients as a reason for dissatisfaction with treatment, or for discontinuing therapy when biopsy is felt to be necessary347. In addition, the technique is subject to sampling error, since the samples collected are very small and pathological change may not be evenly distributed, and interpretation varies amongst histologists depending on level of experience, size of biopsy and the staging/scoring system used. Given the limitations of liver biopsy, significant effort has been invested in identifying clinically useful, non-invasive markers that allow identification and quantification of liver fibrosis24. Fibroelastography (achieved using the FibroScan®) gives a measure of liver elasticity (and therefore fibrosis) by measuring reflected ultrasound echoes before and during compression of the liver. The degree of displacement is related to the elasticity (stiffness) of the tissue. This method has been used to evaluate and track fibrosis in chronic liver disease404 and, as indicated in a recent systematic review and economic analysis by the NHS Centre for Evidence-based Purchasing60, may have clinical utility for the detection and monitoring of fibrosis due to other causes. Serum biomarkers of liver fibrosis focus on indirect markers of liver function or direct markers of extracellular matrix components or the enzymes involved in their turnover.
Indirect markers of liver function include aspartate aminotransferase (AST), alanine aminotransferase (ALT), γ-glutamyl transpeptidase (γ-GT), hyaluronic acid, apolipoprotein A1, bilirubin, α2-macroglobulin, haptoglobin, cholesterol, homeostasis model assessment of insulin resistance, platelets and prothrombin time. Direct markers include collagen IV, collagen VI, tissue inhibitor of metalloproteases-1 (TIMP-1), laminin, human cartilage glycoprotein-39 (YKL-40), tenascin, undulin, matrix metalloproteinase-2 (MMP-2) and pro-collagen III propeptide (PIIINP)253. Some of these biomarkers have been combined to improve clinical utility (for example, the European Enhanced Liver Fibrosis (ELF) panel, which combines hyaluronic acid, TIMP-1 and PIIINP measurements).

Over the last 5–10 years, serial measurement of PIIINP has become standard practice34 for monitoring for liver fibrosis in patients on methotrexate, with elevated levels indicating the need for treatment cessation and/or consideration of liver biopsy. Given the high level of concern amongst clinicians and patients about methotrexate-associated liver dysfunction and the plethora of new indirect markers of liver disease, the GDG agreed it was important to review the evidence for the clinical utility and validity of these markers of liver fibrosis in the context of psoriasis and treatment with methotrexate, in order to optimise the safe use of this drug and minimise the need for liver biopsy.

The GDG agreed to pose the following question: in people with psoriasis (all types) who are being treated with methotrexate or who are about to begin treatment with methotrexate, what is the optimum non-invasive method of monitoring hepatotoxicity (fibrosis or cirrhosis) compared with liver biopsy?

12.1. Methodological introduction

12.1.1. Review methods

A literature search was conducted for diagnostic cohorts or case control studies that assessed the accuracy of non-invasive diagnostic tools to detect liver fibrosis or cirrhosis in people with psoriasis being treated or considered for treatment with methotrexate, compared with diagnosis by the reference standard of liver biopsy.

No time limit was placed on the literature search and there were no limitations on sample size or duration of follow-up. Indirect populations were excluded.

The relevant population for these diagnostic tools will be those with psoriasis who are at risk of developing liver damage as a result of exposure or planned exposure to methotrexate. The intended role of the index test would be for use by dermatologists to identify those suspected of having clinically significant liver damage in order to refer only these people on for expert assessment and, therefore, reduce the need for the invasive procedure of liver biopsy. Consequently, it is most important that the test is able to accurately rule out a diagnosis, so that very few people with liver damage are missed for referral, although a reasonable accuracy for ruling in a diagnosis would also be desirable to avoid referring too many people inappropriately.

The outcomes considered were:

  • Sensitivity
  • Specificity
  • Positive predictive value (PPV)
  • Negative predictive value (NPV)
  • Likelihood ratios (LRs)

The comparisons considered were any of the following diagnostic tests compared with liver biopsy:

  • imaging techniques: liver ultrasound, liver scintigraphy, ultrasound elastography (achieved using the FibroScan)
  • serum markers: serial pro-collagen III (PIIINP), the enhanced liver fibrosis (ELF) panel (tissue inhibitor of matrix metalloproteinase 1 (TIMP 1), hyaluronic acid (HA) and pro-collagen III), and FibroTest
  • AST to platelet ratio index (APRI)
  • Standard liver function tests (e.g., alanine aminotransferase (ALT), alkaline phosphatase (AP), aspartate aminotransferase (AST), total bilirubin, albumin, total protein, lactate dehydrogenase (LDH), gamma-glutamyl transferase (GGT) and prothrombin time (PT))

It was recognised that there was great variability in the literature regarding definitions of abnormal results on both liver biopsy and non-invasive tests. For the liver biopsy findings, any definition of fibrosis or cirrhosis, regardless of the classification scale, was accepted as indicating clinically significant liver damage. However, studies that limited the definition to at least marked fibrosis were excluded as they may overestimate the sensitivity by removing the potentially more difficult to diagnose milder end of the fibrosis spectrum. Additionally, fibrosis and cirrhosis were considered together as there were few cases of cirrhosis reported and many studies did not give the number with fibrosis and cirrhosis separately, although it is accepted that cirrhosis represents a greater clinical burden. The experience of the pathologist assessing the biopsy sample and the adequacy of sampling of the histological specimen are probably more important in terms of accurate diagnosis than the classification system used, but these were rarely stated in the studies. For the non-invasive tests, the definition of abnormal liver function provided in the study was accepted for use in the analysis, because, for example, there are no universally accepted reference ranges for liver function tests and the ranges may differ according to the population being studied (anything above the upper limit of normal was accepted as an abnormal reading in this review).

It was not possible to analyse the data using diagnostic meta-analysis (because there were no cases with at least 5 studies addressing the same reference standard and index tests, population and outcomes) or the standard version of GRADE. Therefore, a modified version of GRADE has been used and a narrative summary provided. The statistics used for this diagnostic review differ from those used in intervention reviews, and a definition for each of them is provided below (Table 155). Although no meta-analysis has been performed, forest plots are provided presenting the sensitivity and specificity of the tools compared with biopsy findings as reported in the studies individually (Appendix J). There are no forest plots for one study278, as insufficient raw data were available.

Table 155. Definitions of summary statistics for diagnostic accuracy studies.
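
As a rough illustration of how the summary statistics used in this review relate to one another, the following Python sketch derives each of them from the four cells of a 2×2 table of index test result against biopsy result. The function name `diagnostic_accuracy` and the counts are hypothetical and are not taken from any included study. The negative likelihood ratio is expressed as in this review (how many times more likely a negative result is in a person without fibrosis/cirrhosis than in a person with it), which is the reciprocal of the form often quoted elsewhere.

```python
import math


def diagnostic_accuracy(tp, fp, fn, tn):
    """Summary statistics from a 2x2 table (index test vs. liver biopsy).

    tp, fn: biopsy-positive people who test positive / negative.
    fp, tn: biopsy-negative people who test positive / negative.
    Assumes every row and column total is non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # A likelihood ratio is infinite when its denominator (fp or fn) is zero.
    lr_positive = math.inf if fp == 0 else sensitivity / (fp / (fp + tn))
    lr_negative = math.inf if fn == 0 else specificity / (fn / (fn + tp))
    return {
        "prevalence": (tp + fn) / (tp + fp + fn + tn),  # pre-test probability
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # probability of disease given a positive test
        "npv": tn / (tn + fn),  # probability of no disease given a negative test
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,  # review's convention: specificity / (1 - sensitivity)
    }


# Hypothetical counts, for illustration only.
stats = diagnostic_accuracy(tp=30, fp=10, fn=20, tn=40)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

A test with no false positives gives an infinite positive likelihood ratio, which is how the "infinity" entries reported for some tests in this chapter can arise.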

Positive and negative predictive values are dependent on disease prevalence (pre-test probability) and so need to be interpreted together with prevalence, in the context of how test results modify the probability of disease (post-test probabilities). The lower the prevalence of disease, the more certain we can be that a negative test indicates no disease, and the less certain that a positive result truly indicates the presence of disease. A note on how to interpret post-test probabilities/predictive values in the light of the disease prevalence is provided in Appendix Q.

Fifteen diagnostic studies28,34,64,117,149,217,239,241,254,278,288,305,335,432,433 were found that addressed the question and were included in the review. No studies were available from an exclusively paediatric population.

These studies differed in terms of:

  • Mean age (range 46 to 55 years)
  • Gender: % male (range 52 to 71.4%)
  • Sample size (range N=15 to N=168)
  • Prevalence of fibrosis and cirrhosis (6.9–69.5%)
  • Unit of analysis
    • 8 studies used only one index test and one reference standard per person34,117,149,254,305,335,432,433
    • 3 studies included multiple paired index and reference tests per person64,217,241
    • 1 study included only single pre-MTX tests but multiple paired tests post-MTX288
    • In 2 studies it was unclear whether the results were based upon single tests or multiple paired tests per person28,278
    • 1 study included more than one index and reference test per patient, and also more than one index test per reference standard (i.e. the biopsy was paired with more than one index test)239

A summary of the methodological quality of the included studies according to QUADAS II criteria is provided in Table 156.

Table 156. Summary of study quality.

12.1.2. Study details – methods and results

The study methods are graded in the evidence profile (Table 157) and a summary of the study results is provided in Table 158. In the narrative below, methodological flaws according to the QUADAS II criteria are noted as points to suggest caution when interpreting results.

12.1.2.1. Liver function tests

Methods

Five studies were found that investigated the diagnostic accuracy of liver function tests in people with psoriasis eligible to receive methotrexate. The reference standard biopsy classification varied between the studies; two studies278,288 used the Roenigk classification system, two studies149,305 used a system similar to Robinson grading and in one paper217 the classification system was unclear.

Two of the studies limited the population to those with known217 or suspected149 fibrosis. Two of the studies149,278 had an unclear method for determining the index test threshold, which could have meant that a cut-off was chosen in a post-hoc manner to optimise the apparent sensitivity of the test. Three of the studies149,217,305 had an unclear period of time between the index test and reference standard.

Results

Sensitivity: of patients with fibrosis or cirrhosis on biopsy, the proportion expected to test positive

  • Albumin: 19–29%
  • ALT: 5–40%
  • AP: 38–57%
  • AST: 20–43%
  • Bilirubin: 0–20%
  • Galactose: 14%
  • GGT: 33%
  • Prothrombin time: 1%

Specificity: of patients without fibrosis or cirrhosis on biopsy, the proportion expected to test negative

  • Albumin: 76–100%
  • ALT: 85–92%
  • AP: 71–76%
  • AST: 86–100%
  • Bilirubin: 86–96%
  • Galactose: 94%
  • GGT: 63%
  • Prothrombin time: 99%

Positive predictive value (figure in brackets is value-added PPV; the improvement in ability to determine a positive diagnosis over and above the known prevalence): if the liver function test was positive the probability of having liver fibrosis or cirrhosis (PPV) was:

  • AP: 15–60% (5 to 16%)
  • ALT: 22–67% (22–39%)
  • Albumin: 33–100%
  • Bilirubin: 0–41% (−47 to 23%)
  • Prothrombin time: 25% (NA)
  • AST: 29–100% (19–53%)
  • GGT: 40% (−2.9%)
  • Galactose: 83% (13.8%)

Negative predictive value: if the liver function test was negative the probability of not having liver fibrosis or cirrhosis (NPV) was:

  • Albumin: 61–62% (38–39% chance of having liver fibrosis or cirrhosis despite having a negative test)
  • ALT: 52–80% (20–48% chance of having liver fibrosis or cirrhosis despite having a negative test)
  • AP: 60–92% (8–40% chance of having liver fibrosis or cirrhosis despite having a negative test)
  • AST: 62–93% (7–38% chance of having liver fibrosis or cirrhosis despite having a negative test)
  • Bilirubin: 50–91% (9–50% chance of having liver fibrosis or cirrhosis despite having a negative test)
  • Galactose: 32% (68% chance of having liver fibrosis or cirrhosis despite having a negative test)
  • GGT: 56% (44% chance of having liver fibrosis or cirrhosis despite having a negative test)
  • Prothrombin time: 66% (34% chance of having liver fibrosis or cirrhosis despite having a negative test)

Positive likelihood ratio: in a person with compared to a person without liver fibrosis or cirrhosis, the number of times more likely a positive test result is:

  • Albumin: infinity
  • AP: 1.71–2.03
  • ALT: 2.6–5.2
  • AST: 3.13–infinity
  • Bilirubin: 1.57–4.7
  • Galactose: 2.19
  • GGT: 0.89

Negative likelihood ratio: in a person without compared to a person with liver fibrosis or cirrhosis, the number of times more likely a negative test result is:

  • Albumin: 1.4
  • AP: 1.3–1.7
  • ALT: 1.4–1.5
  • AST: 1.4–1.5
  • Bilirubin: 0.88–1.2
  • Galactose: 1.1
  • GGT: 0.93

Additional information

  • One study288 assessed subgroups before and during methotrexate treatment and showed no consistent trend among the different liver function tests for differing accuracy before and after treatment was commenced
  • One study288 assessed the statistical association between abnormal liver function tests and biopsy grade III or IV, adjusted for age and history of cholecystitis. This study found that there was a significant association between grade III or IV biopsy findings and abnormal AST levels, but not abnormal AP or bilirubin levels
  • In one study149, the one case of cirrhosis was not detected by abnormal liver function tests

12.1.2.2. Liver scintigraphy

Methods

Three studies117,241,254 were found that investigated the diagnostic accuracy of liver scintigraphy in people with psoriasis eligible to receive methotrexate. The reference standard biopsy classification varied between the studies; one study117 used the Roenigk classification system, one study241 graded fibrosis as none, very mild, mild, moderate or severe based on the method of Warin et al (abnormal was defined as at least moderate fibrosis, which maps on to the fibrosis assessed on the Roenigk scale) and the final study254 graded the biopsy according to steatosis, inflammation, fibrosis (graded mild, moderate or severe) and cirrhosis. The definition of abnormal on the liver scan also varied between the studies: one study117 counted the presence of any one from heterogeneous uptake, hepatomegaly, extra hepatic uptake and focal defects; another254 assessed the size of the liver and spleen, the pattern of uptake in these organs and the degree of extrahepatic uptake; and the third241 classified abnormal as a portal contribution of <50% of total hepatic uptake of colloid at 30s. None of the studies specified whether the assessors were blinded to the results of the first test.

Results

Sensitivity and specificity: The findings for the sensitivity and specificity of liver scans varied between the studies. The sensitivity ranged from 50.0 to 83.3% and specificity from 64.7 to 81.5%. Sensitivity and specificity were highest in the study that defined abnormal results on the scan as <50% portal contribution, which also had by far the lowest prevalence of liver fibrosis or cirrhosis and used the definition of at least moderate fibrosis.

Positive predictive value/negative predictive value: If the scan was positive the probability of having liver fibrosis or cirrhosis (PPV or proportion of patients with a positive test who are correctly diagnosed) ranged from 25 to 40% and if the scan was negative the probability of not having liver fibrosis or cirrhosis (NPV or proportion of patients with a negative test who are correctly diagnosed) ranged from 78.6 to 98.5% (1.5 to 21.4% chance of having fibrosis or cirrhosis despite having a negative test).

Given that the pre-test probabilities of having fibrosis/cirrhosis were 29.2, 6.9 and 24.5% in the three populations, this means that the liver scan improves the ability to determine a positive diagnosis (over and above the known prevalence) by 10.8 to 18.8% and a negative diagnosis by 5.3 to 7.8%.
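
The value-added figures quoted in this chapter appear to be the predictive value minus the corresponding pre-test probability. The short Python sketch below reproduces that arithmetic. The function name `value_added` is hypothetical, and attributing a PPV of 40%, an NPV of 78.6% and a pre-test probability of 29.2% to the same study is an assumption made for illustration (it reproduces the 10.8% and 7.8% bounds reported above).

```python
def value_added(ppv, npv, prevalence):
    """Value-added predictive values, all expressed as percentages.

    ppv, npv: positive and negative predictive values.
    prevalence: pre-test probability of fibrosis/cirrhosis.
    """
    # Gain in certainty of disease after a positive test.
    added_ppv = ppv - prevalence
    # Gain in certainty of no disease after a negative test;
    # the pre-test probability of no disease is 100 - prevalence.
    added_npv = npv - (100 - prevalence)
    return round(added_ppv, 1), round(added_npv, 1)


# Assumed pairing of reported scintigraphy figures (see lead-in).
added_ppv, added_npv = value_added(ppv=40.0, npv=78.6, prevalence=29.2)
print(added_ppv, added_npv)
```

A negative value indicates that the test result leaves the reader less certain than the prevalence alone would, as happens when a test with a PPV of 0% is applied in a population where fibrosis is present.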

Likelihood ratio: A positive test result was 1.62 to 4.50 times more likely in a person with compared to a person without fibrosis/cirrhosis, and a negative test result was 1.5 to 5.0 times more likely in a person without compared to a person with fibrosis/cirrhosis. Both the positive and negative likelihood ratios were much more favourable in the study that defined abnormal results on the scan as <50% portal contribution, which also had by far the lowest prevalence of liver fibrosis or cirrhosis and used the definition of at least moderate fibrosis241.

Additional information

One study241 noted that the one false negative result had a portal contribution of 51%, so a slight alteration in the threshold would have resulted in all patients with portal fibrosis being detected by the scan.

In one study117, the two cases of cirrhosis were correctly identified.

12.1.2.3. Liver ultrasound

Methods

Two studies64,254 were found that investigated the diagnostic accuracy of liver ultrasound in people with psoriasis eligible to receive methotrexate. The reference standard biopsy classification varied between the studies; one study254 graded the biopsy according to steatosis, inflammation, fibrosis (graded mild, moderate or severe) and cirrhosis and the other study64 graded the biopsy by subjective microscopic assessment based on the method of Warin et al of fat, inflammation, fibrosis (each graded 0, 0.5, 1, 2, or 3) and cirrhosis (not graded). The definition of abnormal on the ultrasound scan also varied between the studies: one study counted the presence of abnormalities in any one from liver size, shape, echo pattern and information about the biliary and vascular system according to a standard proforma while the other assessed fatty change and fibrosis (only those showing fibrosis were counted as positive tests).

One study254 did not specify whether the assessors were blinded to the results of the first test.

Results

Sensitivity and specificity: The findings for the sensitivity and specificity of ultrasound scans varied between the studies. The sensitivity ranged from 0 to 19% and specificity from 86 to 100% for detecting any degree of fibrosis and were 25% and 100%, respectively, for detecting portal fibrosis (in accordance with Roenigk criteria).

Positive predictive value/negative predictive value: If the ultrasound scan was positive the probability of having liver fibrosis or cirrhosis (PPV or proportion of patients with a positive test who are correctly diagnosed) ranged from 0 to 100% and if the scan was negative the probability of not having liver fibrosis or cirrhosis (NPV or proportion of patients with a negative test who are correctly diagnosed) ranged from 57 to 73% (27 to 43% chance of having fibrosis or cirrhosis despite having a negative test).

Given that the pre-test probabilities of having fibrosis/cirrhosis were 24.5, 48.2 and 37.0% in the three populations, this means that the ultrasound scan improves the ability to determine a positive diagnosis (over and above the known prevalence) by −24.5 to 63.0% and a negative diagnosis by −2.5 to 6.0%.

Likelihood ratio: A positive test was infinitely more likely in a person with compared to a person without fibrosis/cirrhosis in two studies but equally likely in another study, and a negative test result ranged from 0.86 to 1.2 times more likely in a person without compared to a person with fibrosis/cirrhosis.

The difference in accuracy for detecting any compared with portal fibrosis was less pronounced than with scintigraphy.

Additional information
  • In one study254 ultrasound failed to detect any of the three cases of fibrosis or cirrhosis.

12.1.2.4. PIIINP

Methods

Four studies34,239,335,432,433 were found that investigated the diagnostic accuracy of PIIINP assays in people with psoriasis eligible to receive methotrexate. The reference standard biopsy classification varied between the studies; one study239 used the Roenigk classification system, one study34 graded the biopsy according to steatosis, inflammation, fibrosis and cirrhosis and the other two studies did not define the classification systems used432,433. All studies conducted more than one assessment of PIIINP per person and the threshold for an abnormal PIIINP assay was >4.2 μg/l (based on the reference range in Finnish blood donors); however, the manufacturer’s information leaflet states that the reference range is 2.3–6.4 μg/l based on PIIINP values of apparently healthy adults (19–65 years), although variations in population demographics may mean that slightly different reference limits apply across populations.

Although all studies performed more than one PIIINP assay per person, for the analysis of diagnostic accuracy not all of the test results were always included:

  • One study34 serially assessed PIIINP and used only the PIIINP assay taken at the time of first biopsy
  • One study335,433 had serial PIIINP assays in 11 out of 74 participants and used the PIIINP assay taken at the time closest to biopsy
  • One study239 included multiple PIIINP assays from serial assessments and multiple biopsies per patient in the analysis (with some biopsies counted more than once as they were paired with more than one PIIINP assay), and only included biopsies with PIIINP tests within 6 months before and 6 months after biopsy
  • The final study432 serially assessed PIIINP but classed participants as positive on biopsy or PIIINP if at least one of their tests was abnormal (but it is unclear how many abnormal test results they may also have had).

Two studies432,433 had an unclear period of time between the measurement of the index test and the reference standard, which may have meant that the clinical condition of the individual had changed in the time that elapsed between the assessments.

One study432 performed serial analyses of PIIINP and multiple biopsies per patient but did not include all of the PIIINP or biopsy results in the analysis; therefore, those who tested positive (based on at least one abnormal result) could also have had several negative tests. This study was still considered eligible for inclusion as those classed as negative would not have had even a single elevated PIIINP or abnormal biopsy result among the multiple test results, which is informative as we are interested in a screening test most able to accurately determine those who do not have liver abnormalities.

Results

Note that PIIINP elevation can be due to an increase in fibrosis (and so cleaving of pro-collagen) anywhere in the body. Therefore, in those with psoriasis and arthritis it is possible that any elevation in PIIINP is due to the arthritis rather than the liver. In the available studies the proportion with PsA ranged from 22–46%, but was unclear in two studies34,335.

In one study34 the range of PIIINP values in a control group of 11 people with PsA and no MTX exposure was 2.2–4.6 ng/ml.

In the study239 with 22% PsA, 4 of 6 grade II biopsies from 4 patients with inflammatory arthritis had elevated PIIINP in all associated readings and the other two biopsies had some abnormal PIIINP readings.

In one pilot study335 one out of 11 participants with PsA gave a false positive result, and this participant had steatosis on biopsy. This was the only false positive in the study. Note that in a sub-group analysis of 10 people with PsA and 13 people with psoriasis but no arthritic component the accuracy for ruling out was actually higher in the group with PsA (sensitivity 100% vs 33% and NPV 100% vs 40%); however, the sample sizes in the subgroups were very small.

In the final study432 38.6% had PsA and one of the two false positives was a participant with PsA.

Sensitivity and specificity: The findings for the sensitivity and specificity of PIIINP varied between the studies. The sensitivity ranged from 62.5 to 100% and specificity from 63.6 to 97.9%. Note that the sensitivity and specificity were high in the study with the highest risk of bias and the lowest prevalence432, which did not include all of the PIIINP assay results in the analysis.

Positive predictive value/negative predictive value: If the PIIINP assay was positive the probability of having liver fibrosis or cirrhosis (PPV or proportion of patients with a positive test who are correctly diagnosed) ranged from 23.4 to 95.0% and if the assay was negative the probability of not having liver fibrosis or cirrhosis (NPV or proportion of patients with a negative test who are correctly diagnosed) ranged from 88.5 to 100% (0 to 11.5% chance of having fibrosis or cirrhosis despite having a negative test).

Given that the pre-test probabilities of having liver fibrosis or cirrhosis were 24.1, 5.8, 13.7 and 34.7% in the four populations, this means that the PIIINP assay improves the ability to determine a positive diagnosis (over and above the known prevalence) by 9.7 to 60.3% and a negative diagnosis by 5.6 to 23.2%. Note that the value-added PPV was markedly higher in the two Zachariae studies432,433.

Likelihood ratio: A positive test result was 1.93 to 36 times more likely in a person with compared to a person without fibrosis/cirrhosis, and a negative test result was from 1.79 times to infinitely more likely in a person without compared to a person with fibrosis/cirrhosis.

The two Zachariae studies432,433 demonstrated markedly higher values for sensitivity and PPV than the other two studies.

Additional information
  • One study239 noted that three liver biopsies in two morbidly obese patients who also had maturity-onset diabetes were graded II on Roenigk classification but showed signs of NASH (rather than portal fibrosis, which is more often associated with MTX use).
  • In one study34 the three cases of cirrhosis were all correctly identified and the sensitivity and specificity for detecting fibrosis alone were 81% and 62%, respectively, based on one biopsy per patient.

12.1.2.5. Fibrotest and Fibroscan

Methods

One study28 was found that investigated the diagnostic accuracy of Fibrotest and Fibroscan in people with psoriasis eligible to receive methotrexate. The reference standard biopsy classification was based on the Metavir system and the definition of abnormal was Metavir >F2. An abnormal result was defined by a cut-off of 0.31 on Fibrotest and 7.1 kPa on Fibroscan, based on the literature.

This study did not state whether the population was based on a consecutive sample and there could have been up to 18 months between the index test and reference standard being undertaken, which could be long enough for the liver to develop fibrosis or cirrhosis. Additionally, for Fibroscan there was some discrepancy between the details in the text and the reported diagnostic accuracy statistics.

Results

Sensitivity and specificity: The sensitivity was 83% for Fibrotest and 50% for Fibroscan, while the specificities were 61% and 88%, respectively.

Positive predictive value/negative predictive value: If the Fibrotest was positive the probability of having liver fibrosis or cirrhosis (PPV or proportion of patients with a positive test who are correctly diagnosed) was 42% and if the test was negative the probability of not having liver fibrosis or cirrhosis (NPV or proportion of patients with a negative test who are correctly diagnosed) was 92% (8% chance of having fibrosis or cirrhosis despite having a negative test). The PPV for Fibroscan was 33% while the NPV was 86% (14% chance of having fibrosis or cirrhosis despite having a negative test).

Given that the pre-test probability of having fibrosis/cirrhosis was 25% for the Fibrotest population, this means that Fibrotest improves the ability to determine a positive diagnosis (over and above the known prevalence) by 16.7% and a negative diagnosis by 16.7%. It was not possible to calculate the value-added predictive values for Fibroscan as the population sample used for the calculation of PPV and NPV was unclear.

Likelihood ratio: For Fibrotest, a positive test was 2.14-times more likely in a person with compared to a person without fibrosis/cirrhosis, and a negative test was 3.7-times more likely in a person without compared to a person with fibrosis/cirrhosis. Again, it was not possible to calculate this statistic for Fibroscan as the 2×2 table could not be verified.
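
The link between the likelihood ratios and the predictive values reported for Fibrotest can be sketched by converting the pre-test probability to odds, multiplying by the likelihood ratio and converting back. In the Python sketch below, the cell counts are not reported in the source: they are an assumption, inferred from the 24 participants, 25% prevalence, 83% sensitivity and 61.1% specificity quoted in this chapter. The function name `post_test_probability` is hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a probability with a likelihood ratio via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# ASSUMED cell counts (inferred from the reported percentages, not stated in the study).
tp, fn = 5, 1    # biopsy-positive: test positive / test negative
fp, tn = 7, 11   # biopsy-negative: test positive / test negative

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
prevalence = (tp + fn) / (tp + fn + fp + tn)

# Positive likelihood ratio, and the negative likelihood ratio in the
# review's convention (negative result in a person without vs. with disease).
lr_positive = sensitivity / (1 - specificity)
lr_negative = specificity / (1 - sensitivity)

# Probability of disease after a positive test (PPV) and of no disease
# after a negative test (NPV), starting from the pre-test probabilities.
ppv = post_test_probability(prevalence, lr_positive)
npv = post_test_probability(1 - prevalence, lr_negative)

print(f"LR+ {lr_positive:.2f}, LR- {lr_negative:.2f}")
print(f"PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumed counts the results agree, after rounding, with the Fibrotest likelihood ratios and predictive values given above, and subtracting the pre-test probabilities of 25% and 75% gives the 16.7% value-added figures.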

Additional information

In nine patients, Fibroscan and Fibrotest resulted in different Metavir scores with a discordance of two stages. In four of them, the total Fibroscan procedure failed because of the presence of obesity. In the remaining five, biopsy length was significantly shorter than the biopsy length of the remaining patients.

12.2. Non-invasive liver tests vs. liver biopsy

12.2.1. Evidence profile

Table 157. Modified GRADE profile for the diagnostic accuracy of tools to detect liver fibrosis or cirrhosis.

12.2.2. Evidence summary

Table 158. Summary statistics for diagnostic accuracy of tools for fibrosis and cirrhosis.

12.2.3. Evidence statements

The following statements are organised by outcome and ordered to list the tests in approximate order from the best to the worst diagnostic accuracy according to that measure.

Sensitivity: of patients with fibrosis or cirrhosis on biopsy, the proportion expected to test positive

  • PIIINP: 62.5 to 100% [4 studies; 264 participants; moderate to very low quality evidence]34,239,432,433
  • Scintigraphy (portal contribution): 83% [1 study; 63 participants; very low quality evidence]241
  • Fibrotest: 83% [1 study; 24 participants; very low quality evidence]28
  • Scintigraphy (abnormalities): 50–57% [2 studies; 73 participants; very low quality evidence]117,254
  • AP: 38–57% [3 studies; 200 participants; low to very low quality evidence]278,288,305
  • Fibroscan: 50% [1 study; 24 participants; very low quality evidence]28
  • AST: 20–43% [3 studies; 235 participants; low to very low quality evidence]278,288,305
  • Gamma-glutamyl transferase: 33% [1 study; 15 participants; very low quality evidence]305
  • ALT: 5–40% [2 studies; 186 participants; very low quality evidence]149,278
  • Ultrasound (portal fibrosis): 25% [1 study; 28 participants; very low quality evidence]64
  • Albumin: 19–29% [2 studies; 183 participants; low to very low quality evidence]278,305
  • Bilirubin: 0–20% [3 studies; 200 participants; low to very low quality evidence]278,288,305
  • Ultrasound (any fibrosis): 0 to 19% [2 studies; 77 participants; low to very low quality evidence]64,254
  • Galactose: 14% [1 study; 45 participants; very low quality evidence]217
  • Prothrombin time: 1% [1 study; 168 participants; low quality evidence]278

Specificity: of patients without fibrosis or cirrhosis on biopsy, the proportion expected to test negative

  • Ultrasound (portal fibrosis): 100% [1 study; 28 participants; very low quality evidence]64
  • Prothrombin time: 99% [1 study; 168 participants; low quality evidence]278
  • Ultrasound (any fibrosis): 86 to 100% [2 studies; 77 participants; low to very low quality evidence]64,254
  • AST: 86–100% [3 studies; 235 participants; low to very low quality evidence]278,288,305
  • Bilirubin: 86–96% [3 studies; 200 participants; low to very low quality evidence]278,288,305
  • Galactose: 94% [1 study; 45 participants; very low quality evidence]217
  • ALT: 85–92% [2 studies; 186 participants; very low quality evidence]149,278
  • Albumin: 76–100% [2 studies; 183 participants; low to very low quality evidence]278,305
  • Fibroscan: 88% [1 study; 24 participants; very low quality evidence]28
  • Scintigraphy (portal contribution): 82% [1 study; 63 participants; very low quality evidence]241
  • PIIINP: 63.6 to 97.9% [4 studies; 264 participants; moderate to very low quality evidence]34,239,432,433
  • Alkaline phosphatase: 71–77% [3 studies; 200 participants; low to very low quality evidence]278,288,305
  • Scintigraphy (abnormalities): 65–73% [2 studies; 73 participants; very low quality evidence]117,254
  • Gamma-glutamyl transferase: 63% [1 study; 15 participants; very low quality evidence]305
  • Fibrotest: 61.1% [1 study; 24 participants; very low quality evidence]28

Positive predictive value (figure in brackets is value-added PPV; the improvement in ability to determine a positive diagnosis over and above the known prevalence): if the liver function test was positive the probability of having liver fibrosis or cirrhosis (PPV) was:

  • Galactose: 83% (13.8%) [1 study; 45 participants; very low quality evidence]217
  • Albumin: 33–100% (53%) [2 studies; 183 participants; low to very low quality evidence]278,305
  • AST: 29–100% (19–53%) [3 studies; 235 participants; low to very low quality evidence]278,288,305
  • PIIINP: 23.4 to 95.0% (9.7 to 60.3%) [4 studies; 264 participants; moderate to very low quality evidence]34,239,432,433
  • ALT: 22–67% (22–39%) [2 studies; 186 participants; low to very low quality evidence]149,278
  • Alkaline phosphatase: 15–60% (5.4 to 16%) [3 studies; 200 participants; low to very low quality evidence]278,288,305
  • Fibrotest: 42% (16.7%) [1 study; 24 participants; very low quality evidence]28
  • Gamma-glutamyl transferase: 40% (−2.9%) [1 study; 15 participants; very low quality evidence]305
  • Scintigraphy (abnormalities): 37.5–40.0% (10.8 to 13.0%) [2 studies; 73 participants; very low quality evidence]117,254
  • Bilirubin: 0–41% (−47 to 23%) [3 studies; 200 participants; low to very low quality evidence]278,288,305
  • Fibroscan: 33% (NA) [1 study; 24 participants; very low quality evidence]28
  • Scintigraphy (portal contribution): 25% (18.8%) [1 study; 63 participants; very low quality evidence]241
  • Prothrombin time: 25% (NA) [1 study; 168 participants; low quality evidence]278
  • Ultrasound: 0 to 100% (−24.5 to 63.0%) [2 studies; 77 participants; low to very low quality evidence]64,254

Negative predictive value (figure in brackets is value-added NPV; the improvement in ability to determine a negative diagnosis over and above the known prevalence): if the liver function test was negative the probability of not having liver fibrosis or cirrhosis (NPV) was:

  • PIIINP: 88.5 to 100% (5.6 to 23.2%) [4 studies; 264 participants; moderate to very low quality evidence]34,239,432,433
  • Scintigraphy (portal contribution): 98.5% (5.4%) [1 study; 63 participants; very low quality evidence]241
  • Fibrotest: 92% (16.7%) [1 study; 24 participants; very low quality evidence]28
  • Fibroscan: 86% (NA) [1 study; 24 participants; very low quality evidence]28
  • Scintigraphy (abnormalities): 78.6 to 81.8% (5.3 to 7.8%) [2 studies; 73 participants; very low quality evidence]117,254
  • AST: 62–93% (2.6 to 8.7%) [3 studies; 235 participants; low to very low quality evidence]278,288,305
  • Alkaline phosphatase: 60–92% (1.6 to 8.2%) [3 studies; 200 participants; low to very low quality evidence]278,288,305
  • Bilirubin: 50–91% (−3.3 to 0.6%) [3 studies; 200 participants; low to very low quality evidence]278,288,305
  • ALT: 52–80% (6.4–7.8%) [2 studies; 186 participants; very low quality evidence]149,278
  • Ultrasound: 57 to 73% (−2.5 to 6.0%) [2 studies; 77 participants; low to very low quality evidence]64,254
  • Prothrombin time: 66% (NA) [1 study; 168 participants; low quality evidence]278
  • Albumin: 61–62% (8.7%) [2 studies; 183 participants; low to very low quality evidence]278,305
  • Gamma-glutamyl transferase: 56% (−1.5%) [1 study; 15 participants; very low quality evidence]305
  • Galactose: 32% (1.8%) [1 study; 45 participants; very low quality evidence]217
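
As an illustration of how the measures listed above relate to one another, the following minimal Python sketch computes them from a 2×2 table of test result against biopsy. The counts used (5 true positives, 7 false positives, 1 false negative, 11 true negatives) are not taken from the source study; they are inferred here as one set of counts for 24 participants that is consistent with the figures reported above for Fibrotest. The value-added PPV is taken to be the PPV minus the prevalence, and the value-added NPV to be the NPV minus the proportion without disease, which is an interpretation of the wording above. The function name and dictionary keys are illustrative.

```python
import math


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def accuracy_summary(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Summarise a 2x2 table of index test result against liver biopsy.

    tp: test positive, fibrosis/cirrhosis on biopsy
    fp: test positive, no fibrosis/cirrhosis on biopsy
    fn: test negative, fibrosis/cirrhosis on biopsy
    tn: test negative, no fibrosis/cirrhosis on biopsy
    """
    total = tp + fp + fn + tn
    prevalence = _ratio(tp + fn, total)
    ppv = _ratio(tp, tp + fp)
    npv = _ratio(tn, tn + fn)
    return {
        "prevalence": prevalence,
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": ppv,
        "npv": npv,
        # Assumed definitions: gain over what prevalence alone would predict.
        "value_added_ppv": ppv - prevalence,
        "value_added_npv": npv - (1 - prevalence),
    }


# Hypothetical counts inferred to be consistent with the Fibrotest figures above.
summary = accuracy_summary(tp=5, fp=7, fn=1, tn=11)
for name, value in summary.items():
    print(f"{name}: {value * 100:.1f}%")
```

With a prevalence of 25%, a positive result in this example raises the probability of fibrosis only to about 42%, while a negative result raises the probability of no fibrosis from 75% to about 92%. This shows how a test can have a high NPV despite a modest PPV.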

Positive likelihood ratio: in a person with compared to a person without liver fibrosis or cirrhosis, the number of times more likely a positive test result is:

  • Albumin: infinity [2 studies; 183 participants; low to very low quality evidence]278,305
  • AST: 3.13 to infinity [3 studies; 235 participants; low to very low quality evidence]278,288,305
  • PIIINP: 1.93 to 36 [4 studies; 264 participants; moderate to very low quality evidence]34,239,432,433
  • Scintigraphy (portal contribution): 4.50 [1 study; 63 participants; very low quality evidence]241
  • Ultrasound: 0 to infinity [2 studies; 77 participants; low to very low quality evidence]64,254
  • ALT: 2.6–5.2 [2 studies; 186 participants; very low quality evidence]149,278
  • Bilirubin: 1.57–4.7 [3 studies; 200 participants; low to very low quality evidence]278,288,305
  • Galactose: 2.19 [1 study; 45 participants; very low quality evidence]217
  • Fibrotest: 2.14 [1 study; 24 participants; very low quality evidence]28
  • Alkaline phosphatase: 1.71–2.03 [3 studies; 200 participants; low to very low quality evidence]278,288,305
  • Scintigraphy (abnormalities): 1.62 to 1.85 [2 studies; 73 participants; very low quality evidence]117,254
  • Gamma-glutamyl transferase: 0.89 [1 study; 15 participants; very low quality evidence]305

Negative likelihood ratio: in a person without compared to a person with liver fibrosis or cirrhosis, the number of times more likely a negative test result is:

  • Scintigraphy (portal contribution): 5.0 [1 study; 63 participants; very low quality evidence]241
  • PIIINP: 1.79 to infinity [4 studies; 264 participants; moderate to very low quality evidence]34,239,432,433
  • Fibrotest: 3.7 [1 study; 24 participants; very low quality evidence]28
  • Alkaline phosphatase: 1.3–1.7 [3 studies; 200 participants; low to very low quality evidence]278,288,305
  • AST: 1.4–1.5 [3 studies; 235 participants; low to very low quality evidence]278,288,305
  • ALT: 1.4–1.5 [2 studies; 186 participants; very low quality evidence]149,278
  • Scintigraphy (abnormalities): 1.4 to 1.5 [2 studies; 73 participants; very low quality evidence]117,254
  • Albumin: 1.4 [2 studies; 183 participants; low to very low quality evidence]278,305
  • Galactose: 1.1 [1 study; 45 participants; very low quality evidence]217
  • Bilirubin: 0.88–1.2 [3 studies; 200 participants; low to very low quality evidence]278,288,305
  • Gamma-glutamyl transferase: 0.93 [1 study; 15 participants; very low quality evidence]305
  • Ultrasound: 0.86 to 1.2 [2 studies; 77 participants; low to very low quality evidence]64,254
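
The likelihood ratios above can be derived from sensitivity and specificity alone. The sketch below is illustrative only: the function names are hypothetical, and the example values (sensitivity 5/6, specificity 11/18) are fractions inferred to be consistent with the Fibrotest figures reported in this section rather than data taken from the source study. Note that the negative likelihood ratio is computed as worded in this section (a negative result in a person without disease compared with a person with disease), which is the reciprocal of the more common convention, so larger values indicate a better test. A specificity of 100% gives an infinite positive likelihood ratio, which is why "infinity" appears in some of the ranges above.

```python
import math


def positive_lr(sensitivity: float, specificity: float) -> float:
    """P(test positive | disease) / P(test positive | no disease)."""
    false_positive_rate = 1 - specificity
    if false_positive_rate == 0:
        # No false positives: the ratio is unbounded (undefined if sensitivity is also 0).
        return math.inf if sensitivity > 0 else math.nan
    return sensitivity / false_positive_rate


def negative_lr_as_listed(sensitivity: float, specificity: float) -> float:
    """P(test negative | no disease) / P(test negative | disease).

    This follows the wording used in this section; the conventional
    negative likelihood ratio is the reciprocal, (1 - sensitivity) / specificity.
    """
    false_negative_rate = 1 - sensitivity
    if false_negative_rate == 0:
        # No false negatives: the ratio is unbounded (undefined if specificity is also 0).
        return math.inf if specificity > 0 else math.nan
    return specificity / false_negative_rate


# Hypothetical fractions consistent with the Fibrotest figures in this section.
sens, spec = 5 / 6, 11 / 18
print(f"LR+ = {positive_lr(sens, spec):.2f}")
print(f"LR- (as listed) = {negative_lr_as_listed(sens, spec):.1f}")
```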

Conclusions

  • The available studies mainly have small samples, which, combined with the relatively low prevalence of fibrosis and cirrhosis, mean that the estimates of diagnostic accuracy are imprecise, leading to uncertainty (particularly around the sensitivity of the tests)
  • All of the tests generally perform better in terms of specificity compared with sensitivity, meaning that they are of greater value for confidently ruling in a diagnosis of clinically significant liver damage if the non-invasive test is positive, but there is less certainty that those who test negative actually do not have fibrosis or cirrhosis
  • Ruling in a diagnosis:
    • The specificity was consistently over 75% for the majority of the tests (ultrasound, prothrombin time, AST, bilirubin, galactose, ALT, albumin, Fibroscan, and scintigraphy when abnormality was assessed using the % portal contribution to total hepatic uptake of colloid)
    • However, there was great variability in the PPV for each test, with no test showing values consistently above 50% across the different studies (except the galactose tolerance test which was only assessed in one study217)
    • The positive likelihood ratio was best for AST, albumin, ultrasound and PIIINP
  • Ruling out a diagnosis:
    • Accepting the uncertainty, the tests that may give a useful level of sensitivity are PIIINP, scintigraphy for detecting portal fibrosis and Fibrotest
    • Similarly, the NPV was only consistently over 75% for PIIINP, scintigraphy, Fibrotest and Fibroscan
    • The negative likelihood ratio was best for PIIINP, scintigraphy for detecting portal fibrosis and Fibrotest.

12.3. Economic evidence

One study54 was included that evaluated different methods of monitoring for hepatotoxicity in people with psoriasis being treated with methotrexate. Chalmers and colleagues compared two monitoring strategies:

  1. Serial PIIINP testing with selective liver biopsy
  2. Routine liver biopsy.

This study is summarised in the economic evidence profile below (Table 159 and Table 160). See also the full study evidence tables in Appendix I.

Table 159. Serial PIIINP versus routine liver biopsy – Economic study characteristics.

Table 160. Serial PIIINP versus routine liver biopsy – Economic summary of findings.

No relevant economic evaluations comparing other non-invasive liver monitoring methods were identified. No studies were excluded.

The monitoring strategies evaluated by Chalmers and colleagues were defined as follows:

  1. Serial PIIINP testing with selective liver biopsy:
    • Where possible serum should be collected for PIIINP measurement prior to starting methotrexate. It should subsequently be measured every 2–3 months during continued treatment. Indications for considering liver biopsy:
      • Elevation of pre-treatment PIIINP above 8.0 μg L−1
      • Elevation of PIIINP above the normal range (1.7 to 4.2 μg L−1) in at least three samples over a 12 month period
      • Elevation of PIIINP above 8.0 μg L−1 in two consecutive samples
    • Indications for considering withdrawal of methotrexate:
      • Elevation of PIIINP above 10.0 μg L−1 in at least three samples over a 12 month period
    • The decision whether to perform liver biopsy, withdraw treatment or continue treatment despite raised PIIINP levels must also take into account other factors such as disease severity, patient age and the ease with which alternative therapies may be used in place of methotrexate.
  2. Routine liver biopsy:
    • In patients without risk factors for liver damage, perform first liver biopsy after cumulative dose of 1.0 to 1.5 g methotrexate
    • Provided no significant abnormalities are found, repeat liver biopsy after each additional 1.5 g methotrexate
    • When cumulative dose >4.0 g, perform biopsy after each additional 1.0 g methotrexate
    • In patients with risk factors for liver damage, perform liver biopsy within 2–4 months of starting methotrexate and after each additional 0.5 to 1.0 g thereafter.
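
The PIIINP criteria in strategy 1 can be expressed as a simple rule check. The following Python sketch is a hedged illustration only, not the study's implementation: the function and constant names are hypothetical, a "12 month period" is assumed to mean any rolling window of 365 days, and "consecutive samples" is assumed to mean adjacent samples in date order. The sample values in the example are invented. The output is a pair of prompts for clinical review; as stated above, the actual decision must also take into account disease severity, patient age and the availability of alternative therapies.

```python
from datetime import date, timedelta
from typing import Optional

UPPER_NORMAL = 4.2       # upper limit of the normal range, in micrograms per litre
BIOPSY_LEVEL = 8.0       # level prompting consideration of liver biopsy
WITHDRAWAL_LEVEL = 10.0  # level prompting consideration of methotrexate withdrawal
WINDOW = timedelta(days=365)  # assumption: "12 month period" = rolling 365 days

Sample = tuple[date, float]


def raised_in_window(samples: list[Sample], threshold: float, needed: int) -> bool:
    """True if at least `needed` samples above `threshold` fall within one window."""
    dates = sorted(d for d, value in samples if value > threshold)
    return any(
        dates[i + needed - 1] - dates[i] <= WINDOW
        for i in range(len(dates) - needed + 1)
    )


def two_consecutive_above(samples: list[Sample], threshold: float) -> bool:
    """True if two adjacent samples (in date order) both exceed `threshold`."""
    values = [value for _, value in sorted(samples)]
    return any(a > threshold and b > threshold for a, b in zip(values, values[1:]))


def review_flags(baseline: Optional[float], samples: list[Sample]) -> dict[str, bool]:
    """Apply the listed PIIINP criteria to a pre-treatment value and serial samples."""
    consider_biopsy = (
        (baseline is not None and baseline > BIOPSY_LEVEL)
        or raised_in_window(samples, UPPER_NORMAL, needed=3)
        or two_consecutive_above(samples, BIOPSY_LEVEL)
    )
    consider_withdrawal = raised_in_window(samples, WITHDRAWAL_LEVEL, needed=3)
    return {"consider_biopsy": consider_biopsy, "consider_withdrawal": consider_withdrawal}


# Invented example: three values above the normal range within 12 months.
example = [
    (date(2020, 1, 10), 3.9),
    (date(2020, 4, 10), 4.5),
    (date(2020, 7, 10), 4.8),
    (date(2020, 10, 10), 4.4),
]
print(review_flags(baseline=3.5, samples=example))
```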

Based on the findings of the study, and assuming that a PIIINP measurement costs £22.50:

  • Monitoring with serial PIIINP and selective liver biopsy is likely to be cost-saving if liver biopsy costs more than £375
  • Monitoring with serial PIIINP and selective liver biopsy may be more costly if liver biopsy costs less than £375.
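
The break-even logic behind this threshold can be sketched as follows. This is a simplified illustration under a stated assumption: the only cost differences between the two strategies are the PIIINP tests performed and the liver biopsies avoided. The study's actual cost components and patient numbers are not reproduced here, and the function names are hypothetical. Under that assumption, the break-even biopsy cost equals the PIIINP unit cost multiplied by the number of PIIINP tests per biopsy avoided, so a threshold of £375 at £22.50 per test would correspond to roughly 17 tests per biopsy avoided.

```python
def break_even_biopsy_cost(
    piiinp_unit_cost: float, piiinp_tests: float, biopsies_avoided: float
) -> float:
    """Biopsy unit cost at which the two strategies cost the same.

    Assumes the only cost differences are the extra PIIINP tests and the
    biopsies avoided (complications and staff time are ignored).
    """
    return piiinp_unit_cost * piiinp_tests / biopsies_avoided


def implied_tests_per_biopsy_avoided(threshold: float, piiinp_unit_cost: float) -> float:
    """Number of PIIINP tests per biopsy avoided implied by a break-even threshold."""
    return threshold / piiinp_unit_cost


def saving_per_biopsy_avoided(biopsy_cost: float, threshold: float) -> float:
    """Net saving per biopsy avoided under the same simplified model."""
    return biopsy_cost - threshold


# Figures quoted in this section: PIIINP at £22.50 and a threshold of £375.
print(round(implied_tests_per_biopsy_avoided(375.0, 22.50), 1))
# Example with the day-case biopsy cost of £553 quoted in this section.
print(saving_per_biopsy_avoided(553.0, 375.0))
```

A positive saving means the serial PIIINP strategy is cheaper in this simplified model; a negative value means it is more costly.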

None of these cost estimates take into account the additional costs of managing potential complications of liver biopsy. With the risk of developing significant hepatic injury from liver biopsy being approximately 1–2% and the risk of mortality being around 0.01–0.1%, these costs (and the impact on health-related quality of life) could be significant. If these costs were included, it is likely that the cost of liver biopsy at which monitoring with serial PIIINP becomes cost-saving would be much lower. Table 161 below shows that the current cost of liver biopsy (excluding the cost of potential complications) ranges from £553 for a day case to £816 for patients requiring an overnight stay in hospital.

Table 161. Unit costs of monitoring tests – exclusive of labour costs.

In the event that a monitoring strategy of serial PIIINP measurement with selective liver biopsy is more costly than routine liver biopsy, the additional costs could be justified by improved health outcomes in terms of mortality and morbidity avoided. These would have to be weighed against the risk that some patients with significant liver abnormalities may be missed.

The authors investigated whether changing the threshold value above which PIIINP was counted as predictive of liver fibrosis would increase the specificity of the test. They found that raising the threshold from 4.2 to 4.9 μg L−1 would have reduced the number of false positives (i.e. those undergoing a liver biopsy who turn out to have a normal result or only minor abnormalities) by more than half, but at the risk of failing to identify some patients with significant liver damage (i.e. false negatives).

The study does not indicate whether any significant abnormalities were missed in the serial PIIINP strategy and what the consequences for these patients might be. The authors assert that the risk of serious harm from liver biopsy outweighs the risk of missing significant liver damage in patients monitored using serial PIIINP.

12.3.1. Unit costs

In the absence of a recent UK cost-effectiveness analysis, relevant unit costs are provided in Table 161 to aid consideration of cost effectiveness.

12.3.2. Evidence statements

One partially applicable cost-consequence analysis with very serious limitations found that for patients with psoriasis undergoing treatment with methotrexate, a strategy of monitoring hepatotoxicity with serial PIIINP and selective liver biopsy was likely to be cost saving compared to routine liver biopsy if the unit cost of liver biopsy was greater than £375.

12.4. Recommendations and link to evidence

Before and during methotrexate treatment, offer the person with any type of psoriasis an evaluation for potential risk of hepatotoxicity. Use standard liver function tests and serial serum procollagen III levels to monitor for abnormalities during treatment ...