Results of the review of prognostic test accuracy and clinical impact

Steven J Edwards; Samantha Barton; Mariana Bacelar; Charlotta Karner; Peter Cain; Victoria Wakefield; Gemma Marceniuk

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Edwards SJ, Barton S, Bacelar M, et al. Prognostic tools for identification of high risk in people with Crohn’s disease: systematic review and cost-effectiveness study. Southampton (UK): NIHR Journals Library; 2021 Mar. (Health Technology Assessment, No. 25.23.)

Chapter 3 Results of the review of prognostic test accuracy and clinical impact

The sections that follow discuss the quantity and quality of evidence available, including the characteristics and risk of bias of the identified studies, retrieved through literature searches to identify data on the prognostic accuracy and clinical impact of PredictSURE-IBD and IBDX.

Quantity and quality of the available evidence

Results of the systematic literature search

Searches of electronic databases retrieved 6258 records (post deduplication) that were of possible relevance to the review (Figure 3). The initial screening of titles and abstracts led to the identification of 36 publications for review of full texts. Of the 36 articles evaluated, 16 publications, including systematic reviews, were deemed to be relevant to the review.³⁸,⁵⁰,⁶⁶–⁷⁹ Four records (three full texts³⁸,⁶⁶,⁷⁰ and one conference abstract⁶⁸) provided details for three systematic reviews, the reference lists of which were screened for potentially relevant studies. Additionally, documents supplied by the companies marketing the prognostic tools were reviewed.

FIGURE 3

The PRISMA flow chart.

Limited evidence is available from the included full-text publications on the prognostic accuracy of PredictSURE-IBD, and no evidence is available on the prognostic accuracy of IBDX, in identifying those at high risk of following a severe course of CD, as determined by measures such as sensitivity and specificity (the prognostic outcomes of interest listed in Table 3). Most of the evidence on the tools’ utility is derived from observational studies that report estimates of the risk of experiencing a clinical outcome associated with an aggressive course of CD, for example need for treatment escalation, development of a complication or surgery. Estimates are presented of an increased risk for those categorised, based on test results, as being at higher risk compared with those determined to be at lower risk of following a severe disease course. No study retrieved reported on the clinical impact of the use of IBDX or PredictSURE-IBD in terms of influencing the treatments given in the management of active CD.

The authors of two studies⁷⁹,⁸⁰ were contacted to verify that the kit used in their research was the IBDX tool and not a comparable kit produced by another company. One author confirmed that they had used a kit that was not captured in the scope of this review, and the study was therefore excluded from the review.⁸⁰

Summaries of the studies included in the review are presented by prognostic tool evaluated and key characteristics of studies (Table 4). See Report Supplementary Material 3 for a list of full-text publications screened but subsequently excluded (with reasons for exclusion) from the review.

TABLE 4

Characteristics of studies included in the prognostic test accuracy review

Ongoing studies

From searches of prespecified sources, together with information supplied by the companies, ongoing studies were identified that were of potential relevance to the review, all of which assess the use of PredictSURE-IBD.

The PROFILE study is a prospective, multicentre randomised study set in the UK.⁵¹ PROFILE has been designed to compare the clinical efficacy of TD and accelerated SU treatment regimens in people with newly diagnosed CD who have first been stratified into subgroups based on the risk of following a severe, relapsing course of CD (high vs. low risk) using the PredictSURE-IBD tool. Within the biomarker-stratified groups, people are randomised (1 : 1) to either TD or accelerated SU treatment. Treatment allocation is open label, but clinicians and patients are masked to subgroup classification. The authors propose that those designated as being at high risk of a severe course of CD will experience a greater benefit from receiving early TD treatment. Conversely, those likely to experience a more indolent course of disease could be managed with the accelerated SU approach and avoid the risk of adverse effects associated with biological therapies. Thus, a goal of the study is to determine whether or not using the PredictSURE-IBD tool can facilitate personalised therapy in CD and improve clinical outcomes. The primary outcome is the incidence of sustained surgery- and glucocorticosteroid-free remission from the completion of induction treatment through to study completion (48 weeks). Recruitment began in December 2017, with a planned enrolment of 400 people, generating 100 people in each of the four groups.⁵¹ The estimated end date for the trial listed on the ISRCTN (International Standard Randomised Controlled Trials Number) registry is March 2022.⁸¹

PRECIOUS is a multicentre observational study based in the USA and sponsored by PredictImmune.⁸² Set in referral centres and community hospitals, PRECIOUS (Predicting Crohn’s and Colitis Outcomes in the United States) is designed to assess the efficacy of the PredictSURE-IBD tool in stratifying those newly diagnosed with active IBD, including CD, into cohorts at high or low risk of following an aggressive disease course requiring frequent treatment escalations. Patients’ blood will be collected at enrolment and will be tested with PredictSURE-IBD at a later date. Ideally, participants will be treatment naive. Those enrolled will receive treatment as per local standard of care with a SU or accelerated SU regimen, and will be followed prospectively for 12 months. The participants enrolled and the clinicians will be masked to test results. With a planned recruitment of 200 people, the estimated end date for the study listed on ClinicalTrials.gov is June 2021.⁸²

Two additional studies evaluating PredictSURE-IBD were highlighted by PredictImmune in its response to a request for information as part of the Diagnostics Assessment Programme process:

a prospective, masked study stratifying a paediatric cohort with incident IBD (n = 80)
a head-to-head comparison of PredictSURE-IBD with IBDX for stratification of those at higher risk of following a severe course of CD using samples from cohorts previously assessed as part of a study evaluating PredictSURE-IBD.

Results for the head-to-head comparison of PredictSURE-IBD and IBDX are now available in a conference abstract.⁸³

Evidence provided by the companies

Glycominds

Glycominds provided a list of bibliographic details of the key publications outlining the evidence in support of the IBDX tool. All studies reporting results on the effectiveness of the kit in stratifying those at high risk of following a severe course of CD were retrieved, and subsequently reviewed, by the EAG.

PredictImmune

PredictImmune provided a list of bibliographic details for several publications relating to PredictSURE-IBD, including references describing the research underpinning the development of the signature gene sequence. All studies flagged by the company were retrieved, and subsequently reviewed, by the EAG.

Additionally, in response to queries from the EAG, PredictImmune supplied anonymised individual patient data (IPD) for results from the cohort that provided results for validation of PredictSURE-IBD, together with data for the head-to-head comparison of PredictSURE-IBD with IBDX. The results provided by PredictImmune for this direct comparison are presented and critiqued in Comparison of IBDX and PredictSURE-IBD.

Assessment of prognostic test accuracy

Characteristics of included studies

All studies informing the evidence base on the prognostic accuracy of the IBDX and PredictSURE-IBD biomarker stratification tests were observational in design. Key characteristics of the included studies are summarised in Table 4, with validated data extraction forms for studies available in Report Supplementary Material 5. Twelve publications, describing eight studies, retrieved from electronic searches were included in the assessment of the prognostic accuracy of the tests, with seven of the studies (11 publications) reporting results on the utility of the IBDX kit and one on the utility of PredictSURE-IBD in stratifying those at high risk of a severe course of CD (see Table 4). Several studies included a mixed population of participants with CD and ulcerative colitis, and reported results separately for those with CD. Most studies included predominantly adults with CD, with one study (three publications) reporting data for an adolescent or a paediatric population. No additional potentially relevant study was identified from hand-searching the bibliographies of three systematic reviews.³⁸,⁶⁶,⁶⁸,⁷⁰

All included studies assessed outcomes in people reported to have a diagnosis of CD. However, across the studies relating to IBDX, reporting of the stage of diagnosis (newly diagnosed vs. established) at the time of the test was limited. Baseline characteristics suggest that the samples analysed were provided predominantly by people with established CD (see Report Supplementary Material 5). By contrast, most people enrolled in the study on PredictSURE-IBD had received a recent diagnosis of CD.

Prespecified inclusion criteria for the systematic review presented here required that people have active disease (see Table 3). Although most of the included studies outlined criteria to be met for a diagnosis of CD, only the study evaluating the PredictSURE-IBD tool required people to have active disease to be eligible for enrolment and reported how presence of active disease was determined.⁵⁰ In retrospect, given the biomarker targets of the two prognostic tests, the reviewers consider that the criterion of active CD is appropriate for studies assessing PredictSURE-IBD but is not essential for studies reporting on IBDX. As outlined in Chapter 1, Description of the technologies under assessment, the PredictSURE-IBD tool detects a gene sequence associated with CD8+ T-cell exhaustion that arises from an autoimmune response to active disease, and, therefore, it is appropriate to require that people have active CD when blood is taken for analysis; it has been reported that in people with inactive disease after treatment, as determined by endoscopy, the level of CD8+ T-cells increases to a level that is comparable with those observed in healthy controls.⁸⁴ By contrast, the IBDX kit detects serum levels of specific anti-glycan antibodies, with specified cut-off values for allocating positive or negative status to each biomarker. Although serum levels of each antibody can change over time, it is purported that status for positivity or negativity for that antibody remains stable throughout the course of disease.⁷⁴ Therefore, for IBDX, the reviewers decided to include those studies not specifying a measure of active disease if they met all of the other inclusion criteria and reported an assessment of the six biomarkers included in the IBDX panel.

Analyses presented for evaluation of the six biomarkers forming the IBDX kit typically reported the association of positivity for individual biomarkers, or the positive status for a larger number of biomarkers, with the increased risk of following a severe course of CD, and not the evaluation of all six biomarkers as a collective.

Considering PredictSURE-IBD, the included study described use of the tool in three cohorts: two training cohorts and one validation cohort.⁵⁰ Samples from one training cohort (n = 66) were used in biomarker discovery and samples from the second (n = 39) were used in whole blood classifier development. Estimates of prognostic accuracy are available for the validation cohort only. Based on IPD supplied by the company, the reviewers consider the validation cohort together with the second training cohort (n = 39) to be the most appropriate data set to inform the evidence base for economic analysis; this is discussed in greater detail in Chapter 4, Development of the health economic model.

Caveats to interpretation of the results for prognostic accuracy of both tests are discussed in Accuracy of prognostic tests.

Quality assessment of included studies

Included studies were assessed for risk of bias and applicability using the QUIPS tool.⁵⁹,⁶⁰ A summary of the results of the assessment of risk of bias and generalisability concerns across studies is presented in Table 5 (see Report Supplementary Material 4 for the full critique of each study).

TABLE 5

The QUIPS assessment of prognostic studies

The QUIPS tool encompasses six domains for the assessment of the validity and bias of studies evaluating prognosis and factors influencing the course of a condition:⁵⁹,⁶⁰

participation
attrition
prognostic factor measurement
confounding measurement and account
outcome measurement
analysis and reporting.

Each domain comprises prompting items (between three and seven) for consideration in the overall rating for an item of high, moderate or low risk of bias.⁵⁹,⁶⁰

The IBDX and PredictSURE-IBD tools were designed with the goal of predicting a course of disease based on the levels of biomarkers produced in response to the presence of CD, with stratification to high or low risk of a severe course of the disease determined by the results of laboratory analysis. The extent to which biomarker levels in blood and serum samples change over time in individual people and what factors influence these fluctuations in levels is uncertain. Additionally, as production of the biomarkers assayed is triggered by changes in cellular processes, the effect of physical characteristics that could influence prognosis in CD, for example smoking status and age, on biomarker levels is unclear. Thus, for the studies informing the evidence on prognostic test accuracy reported here, the EAG considers that the importance of the ‘confounding measurement and account’ domain as a determinant of the risk of bias associated with the studies is also unclear. To reflect the ambiguity around the importance of confounding factors, and to capture uncertainty where limited reporting in the publication precluded an assessment of risk for a particular domain, the EAG adapted the QUIPS tool to include an overall assessment of unclear risk.

Around half of the included studies were deemed to have at least one domain with an unclear risk of bias (see Table 5); for conference abstracts, an unclear rating was predominantly associated with the limited reporting of details as a result of space constraints.

Most studies reporting results for the IBDX tool were determined to be at a moderate risk of bias for the population domain as the studies included those with a recent diagnosis and those with an established diagnosis of CD, and, in some studies, those with presence of severe disease at baseline. Data were not analysed separately for the individual subgroups. The population of greatest relevance to the economic evaluation is those with a new diagnosis of CD and who have moderate or severe disease activity. The study assessing the prognostic accuracy of PredictSURE-IBD enrolled those with a recent diagnosis of CD but included any level of disease activity at sample assessment, with the severity of disease activity determined by endoscopy for some people; severity of disease activity at baseline was not available for all those forming the validation cohort.

Most studies were considered to be at a low risk of bias for attrition and for measurement of prognostic factors because all samples taken were analysed with the relevant tool and results were generated as per the company’s individual protocols. Additionally, outcome assessment was deemed to be at a low risk of bias across many studies as the clinicians were masked to the results of the biomarker assessment.

Accuracy of prognostic tests

The EAG notes that limited data were available from the included studies on the prognostic accuracy of the tools in stratifying the risk of a severe course of CD in terms of standard measures of test accuracy, for example sensitivity and specificity. The EAG is unaware of a validated definition for determining whether or not an individual’s CD has followed a severe course, for example a set number of treatment escalations or the development of a complication or a need for surgery. Thus, the EAG considers the criterion required for a true-positive or false-positive result for IBDX and PredictSURE-IBD to be unclear. The EAG considers that it would be challenging to ascertain an accurate estimate of prognostic accuracy of IBDX and PredictSURE-IBD in stratifying a course of CD. Establishing the prognostic accuracy of the tools would require carrying out a prospective study that included a group that received only SU treatment after determination of their risk of course of CD, using clear prespecified criteria for following a severe course. The ongoing PROFILE RCT randomises people to accelerated SU or TD treatment after they are determined to be at high or low risk of following a severe course of CD, and so the two SU groups will provide additional data to inform estimates of prognostic accuracy.⁵¹ Additionally, no study included in the review prospectively followed people whose treatment was determined by results from IBDX and PredictSURE-IBD; the ongoing PROFILE RCT assesses whether or not early treatment with TD strategy affords clinical benefit to those categorised as being at high risk of severe course of CD and should provide data on the clinical impact of using PredictSURE-IBD.

IBDX

No identified study reported the accuracy of the IBDX kit as a whole (six biomarkers) as per the prespecified prognostic outcome of interest to this review of stratification by risk of following a severe course of CD (see Table 3). One study reported that positivity for ASCA and AMCA had the best prognostic validity for differentiating a severe course of CD from a non-severe course of CD, with areas under the curve of 0.63 and 0.65, respectively. The combination of ASCA and AMCA increased the precision of the differentiation, with an area under the curve of 0.71.⁶⁹
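For readers less familiar with the measure, an area under the curve can be read as the probability that a randomly chosen person with a severe course has a higher marker value than a randomly chosen person with a non-severe course, with ties counted as one half. The sketch below computes it that way. The function name and the marker values are hypothetical and purely illustrative; they are not data from the cited study, and the study's own method of estimating the area under the curve is not described here.

```python
def area_under_curve(severe_scores, non_severe_scores):
    """Rank-based AUC: share of (severe, non-severe) pairs in which the
    severe case has the higher score, counting ties as half a pair."""
    favourable = 0.0
    for s in severe_scores:
        for n in non_severe_scores:
            if s > n:
                favourable += 1.0
            elif s == n:
                favourable += 0.5
    return favourable / (len(severe_scores) * len(non_severe_scores))


# ASSUMPTION: made-up antibody levels, for illustration only.
severe = [12.0, 30.0, 45.0, 51.0]
non_severe = [10.0, 25.0, 30.0, 48.0]

print(f"AUC = {area_under_curve(severe, non_severe):.2f}")
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which places the reported values of 0.63 to 0.71 towards the lower end of that range.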

In its submission to the Diagnostics Assessment Programme (DAP), Glycominds reported a sensitivity for IBDX of 78%, and a specificity of 85–98% depending on the number of positive biomarkers. Data or details of references to support the reported sensitivity and specificity were not provided in the documentation. None of the studies included by the EAG provided estimates of sensitivity or specificity for the IBDX panel. Additionally, it is unclear whether the reported estimates relate to the sensitivity and specificity of the diagnosis of CD, including differentiation of CD from ulcerative colitis, or that of the stratification of risk of severe course of CD.

The typical test time for IBDX is reported by Glycominds to be around 90 minutes and all samples can be run in parallel.

The instructions on the use of the IBDX kit advise that, in cases of an equivocal test result, the individual biomarker should be tested again. Details on the frequency of an equivocal result are not available from the identified studies.

A longitudinal analysis assessed whether or not levels of the individual biomarkers fluctuate over time.⁷⁴ Between two and seven serum samples were available from each person forming the cohort for analysis. Over a median follow-up of 17.4 months (interquartile range 8.0–31.6 months), the authors noted that, despite marked changes in overall immune response and levels in individual biomarkers, the status of positivity or negativity for an individual biomarker remained mostly stable over time.

PredictSURE-IBD

One publication⁵⁰ assessing the PredictSURE-IBD tool was deemed to meet the inclusion criteria for the review. Several related papers were identified and determined not to be relevant because they described the research underpinning the identification of the signature genetic profile (15 target genes and two control genes) that stratifies those with active CD by high or low risk of a severe course of disease and did not discuss the use of PredictSURE-IBD (see Report Supplementary Material 5 for data extraction).

The included study enrolled people aged ≥ 18 years with active CD or ulcerative colitis who were not receiving concomitant glucocorticosteroids, IMs or biological therapy. Participants were recruited from a specialist IBD clinic before treatment started. Diagnosis of CD or ulcerative colitis was based on standard endoscopic, histological and radiological criteria. Active disease was confirmed by one or more objective markers (raised C-reactive protein, raised calprotectin or endoscopic evidence of active disease) in addition to active symptoms and/or signs. People were treated using a conventional SU strategy in accordance with national and international guidelines.

In the publication, the results on stratification to high or low risk of a severe course of CD are presented for a training cohort (N = 118; CD, n = 66; ulcerative colitis, n = 52) and a validation cohort (N = 123; CD, n = 66; ulcerative colitis, n = 57).⁵⁰ Additionally, the full-text publication refers to a second training cohort (n = 39) from whom samples were used in the development of a whole blood classifier. Results from the training cohort (n = 66) used in biomarker discovery were used to finalise the signature gene sequence, which was subsequently applied to analysis of the validation cohort. Two different source cells were used in the process, with mRNA extracted from unseparated peripheral blood mononuclear cells for the training cohort informing biomarker discovery and from a venous blood sample for the validation cohort, as would be the case in clinical practice. Both unseparated peripheral blood mononuclear cells and blood samples were processed for the second training cohort (n = 39), but it is unclear from the full publication whether or not the whole blood samples were analysed using the signature gene sequence identified during biomarker discovery. As part of the DAP, the company clarified that blood samples from the second training cohort were analysed using the finalised gene sequence. Thus, the EAG considers results from the validation cohort and the smaller training cohort to be the most appropriate data set to inform the evidence base on the accuracy of PredictSURE-IBD. However, data on specificity and sensitivity are available for the validation cohort only.

Of the 66 people in the validation cohort, 27 (40.9%) were categorised as being at high risk of following a severe course of CD and 39 (59.1%) were categorised as being at low risk. Of the 39 people in the training cohort, 19 (48.7%) and 20 (51.3%) were categorised as being at high risk and low risk, respectively. Baseline characteristics for the validation cohort indicate that most people had newly diagnosed CD (61/66; 92.4%). The EAG notes that level of disease activity at enrolment (mild, moderate or severe) was not reported, and details on the proportion of people with complications of CD (e.g. fistulae and perianal disease) at baseline are not available in the full publication, but were provided by PredictImmune in its response to a request for information as part of the DAR process (see Report Supplementary Material 5);⁵⁰ complications of CD at baseline could indicate an earlier requirement for surgery in the SU algorithm.

Data on the number of test failures and the number of inconclusive test results were not available.

Sensitivity and specificity

The study by Biasci et al.⁵⁰ reports a sensitivity and specificity for predicting the need for multiple escalations within the first 18 months of 72.7% and 73.2%, respectively. The full-text publication does not state the cut-off value used to derive the sensitivity and specificity for multiple escalations. As noted earlier, the EAG is unaware of a validated definition for determining whether or not a person has followed a severe course of CD, and, as a consequence, considers the criterion required for a true positive or false positive to be unclear for the prognostic tests assessed in this review.

As part of the DAP process, PredictImmune provided anonymised IPD for the validation cohort, including the 2 × 2 table for calculation of sensitivity and specificity for multiple escalations at 12 and 18 months (Table 6). PredictImmune applied a cut-off point of two or more treatment escalations to categorise people as having followed a more aggressive course of CD. The EAG considers the company’s approach reasonable. However, the EAG notes that people in the validation cohort and second training cohort underwent treatments at the discretion of the treating clinician and so a proportion (29/105; 27.6%) received a therapy other than glucocorticosteroid at entry, including elemental diet, anti-TNF alone or in combination with IMs, and IMs alone. The EAG recognises that the study is of a more pragmatic design but considers that induction treatment would be likely to influence the timing and frequency of treatment escalation and, consequently, sensitivity and specificity. Moreover, some people included in the calculation of sensitivity and specificity for predicting multiple escalations received surgery as a first treatment escalation (7/66; 10.6%) and continued to be monitored for subsequent treatments, including IMs and biological therapies. Given that RCTs assessing clinical effectiveness of treatment strategies in the management of CD typically report CD-related complications (e.g. need for surgery or hospitalisation or development of fistula or stenosis) as a composite clinical outcome or separately, the EAG considers it important to assess the time to and occurrence of surgery independently of other treatment escalations to reflect the outcomes in other studies, including those assessing the effectiveness of IBDX; the EAG’s clinical experts supported the proposal that it would be appropriate to assess CD-related surgery as a separate outcome. The inclusion of people who underwent surgery as a first treatment escalation and received subsequent treatment escalations could influence the accuracy of sensitivity and specificity as assessed by the number of treatment escalations. The EAG notes that the sample size for the validation cohort is small (n = 66) and, moreover, that not all people in the validation cohort were included in analyses at 12 or 18 months. Additionally, a proportion of people in the validation cohort received an anti-TNF biologic with or without an IM (11/66; 16.7%) as their first escalation.⁵⁰ The EAG appreciates that the study is pragmatic and is likely to reflect treatment approaches in clinical practice in the UK, but the EAG also considers that analysing those who receive TD or surgery as their first treatment escalation together with those who followed the SU treatment algorithm or were treated at the discretion of the treating clinician is unlikely to reflect the true estimate of the number of treatment escalations that would occur with the SU or accelerated SU strategy.

TABLE 6

Data informing the calculation of sensitivity and specificity for PredictSURE-IBD based on predicting the need for multiple treatment escalations

Predictive value

The included study reports a negative predictive value of 90.9% for PredictSURE-IBD in predicting multiple escalations within the first 18 months.⁵⁰ Based on the 2 × 2 table supplied by PredictImmune (see Table 6), the EAG calculates a positive predictive value of 42.1% for predicting multiple escalations within the first 18 months.
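Sensitivity, specificity and the two predictive values all derive from the same 2 × 2 table, and the arithmetic can be sketched as follows. The cell counts used here are an assumption: they are whole numbers chosen only because they reproduce the percentages reported in the text (72.7%, 73.2%, 42.1% and 90.9%), and they should not be read as the contents of Table 6, which remains the source for the actual data. The function and variable names are likewise illustrative.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Standard test accuracy measures from the four cells of a 2 x 2 table.

    tp: classified high risk, had multiple escalations
    fp: classified high risk, did not have multiple escalations
    fn: classified low risk, had multiple escalations
    tn: classified low risk, did not have multiple escalations
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive predictive value": tp / (tp + fp),
        "negative predictive value": tn / (tn + fn),
    }


# ASSUMPTION: illustrative counts consistent with the reported percentages,
# not values copied from Table 6.
illustrative_counts = {"tp": 8, "fp": 11, "fn": 3, "tn": 30}

measures = accuracy_measures(**illustrative_counts)
for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

With counts of this size, a single reclassified person moves sensitivity by several percentage points, which is consistent with the EAG's caution about the small validation cohort.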

Results for clinical outcomes

The EAG notes that the results presented in this section are on the risk of experiencing an event among those categorised by the tools as being at high or low risk of following a severe course of CD, and are not related to the clinical outcome of treatment decisions based on the stratification of risk using IBDX and PredictSURE-IBD.

IBDX

Results are reported based on positive status for increasing number of biomarkers, as per the company’s recommendations on the interpretation of outputs from the test (see Figure 2). As noted, all included studies evaluated the full panel of biomarkers constituting the IBDX kit, but there is no single measure of accuracy or clinical outcome for the six biomarkers as a collective.

Clinical and methodological heterogeneity across the identified studies precluded meta-analysis and the results are presented in a narrative review.

Developing a complication

Two studies reported an effect estimate for the risk of experiencing a complication by the number of biomarkers testing positive (the results are available in Appendix 1, Table 26).⁷⁵,⁷⁶ Both studies prospectively followed a cohort of people with CD.

Severe disease behaviour was defined in both studies as the occurrence of fistulae or stenosis.⁷⁵,⁷⁶ In one study, 68% of people (249/363) had a complication before or at the time of sample procurement.⁷⁵ The second study enrolled people with or without prior complication and with or without prior CD-related surgery but focused reporting on those with no prior complications and no CD-related surgery before or within 20 days of obtaining the sample (n = 76).⁷⁶ Median follow-up was 59 months for one cohort⁷⁵ and 53.7 months for the other.⁷⁶

The median duration of CD was disparate between the two studies, with one study reporting a median of 66.8 months (interquartile range 11–141 months),⁷⁵ compared with a much shorter 10.6 months (interquartile range 1.7–52.3 months)⁷⁶ in the other. The EAG’s clinical experts advised that 10.6 months may be insufficient follow-up to monitor the development of a CD-related complication.

In the study including people with complications at baseline,⁷⁵ an odds ratio (OR) of 1.5 (95% CI 1.3 to 1.9, p < 0.001; see Appendix 1, Table 26) was reported for experiencing a complication compared with not experiencing a complication, with increased risk associated with a positive status for a larger median number of biomarkers. During follow-up, an additional 28 people developed a fistula or stenosis, or both.

Among people with no prior complication, 20 experienced a fistula or stenosis, with a higher risk of experiencing a complication noted for those with positive status on at least two or three biomarkers (see Appendix 1, Table 26), with the risk reaching statistical significance for those testing positive for at least two of the six antibodies [hazard ratio (HR) 2.5, 95% CI 1.03 to 6.1; p = 0.043].⁷⁶ The EAG notes the small sample size informing the estimate of risk.

An increasing number of positive antibodies was reported to be significantly associated with severe disease behaviour and/or surgery (OR 3.3, 95% CI not reported; p = 0.0005) for a cohort of people with CD from the USA;⁶⁷ the results were presented in a conference abstract and limited details are available. Severe disease behaviour was defined as intestinal fistula and/or stricture.

One study of a cross-sectional design analysed serum samples from children and adolescents aged ≤ 18 years.⁷¹–⁷³ The authors reported results for this younger cohort that were aligned with those derived from an adult cohort, with a larger number of positive serum biomarkers associated with an increased risk of experiencing severe CD and requiring CD-related surgery (estimates of effect not reported).⁷² Additionally, the authors assessed differences in the cut-off levels used to indicate the positivity of biomarkers between the paediatric cohort and adults evaluated in a related study⁷⁵ and found that lower cut-off points denoted positivity in paediatric samples. In a related conference abstract, the authors reported that in paediatric patients with CD, positivity on at least one marker out of the whole panel compared with no positive marker was independently associated with fibrostenotic or fistulising disease behaviour (p = 0.036) and ileal disease location (p = 0.014).⁷¹ Although the accuracy of the biomarker panel in diagnosing CD and differentiating it from other gastrointestinal conditions was reported to decrease with age at sample procurement, when assessing CD behaviour, the ability of the panel to stratify disease phenotypes remained constant over time.⁷²

Requirement for surgery

Two out of the three studies reporting on the risk of complications also provided information on the increased likelihood of requiring surgery among people with a higher risk of a severe course of CD.⁷⁵,⁷⁶ A third study⁷⁸ with a cross-sectional design evaluated serum samples from 517 people with CD who had a median duration of disease of 8.9 years (range 0.02–46.30 years).

One study reported an OR of 1.5 (95% CI 1.3 to 1.8, p < 0.001; see Appendix 1, Table 27) for requiring surgery compared with no requirement for surgery, with increased risk associated with a positive status for a larger median number of biomarkers.⁷⁵ At the time of sample procurement, 224 people had undergone surgery related to IBD, with an additional 33 people requiring surgery during follow-up.

For the cohort of people who had not undergone surgery at enrolment, 14 people required surgery, with a statistically significantly higher risk of surgery for those testing positive for at least two biomarkers (HR 3.6, 95% CI 1.2 to 11.0, p = 0.023; see Appendix 1, Table 27).⁷⁶ The EAG notes the small sample size informing the analysis, and the wide CI accompanying the estimate of risk.

The third study identified a trend towards a larger proportion of people requiring surgery with increasing number of biomarkers testing positive (see Appendix 1, Table 27).⁷⁸ A statistically significant difference across the categories assessed was identified (p < 0.0001).

A conference abstract provided results for a cohort of people (n = 118) who had undergone one surgical intestinal resection related to CD.⁷⁹ Most people evaluated (92%) underwent first surgery for internal penetrating and/or stricturing disease. Serum samples for analysis with the IBDX kit were taken after surgery. After a median follow-up of 100 months, the authors reported that, when considering the full panel of six biomarkers, neither the quartile sum score nor the number of positive biomarkers combined predicted a shorter time to repeat intestinal surgery. After adjustment for ileal disease location and use of IMs or anti-TNF biologic after first surgery, analysis of individual biomarkers identified that positivity for AMCA (HR 2.6, 95% CI 1.1 to 5.9; p = 0.026) and ALCA (HR 2.3, 95% CI 1.04 to 5.3; p = 0.039) predicted a shorter time to second surgery.⁷⁹ Another study reported that, of the panel of tested antibodies, only AMCA tended to be associated with higher risk of CD-related surgery, with an OR of 2.1 (95% CI 0.8 to 5.1; p = 0.10), but the association did not reach statistical significance.⁶⁹

PredictSURE-IBD

Time to treatment escalation

The full-text publication⁵⁰ reported that those categorised as at high risk of following a severe course had a statistically significantly higher risk of first treatment escalation than those categorised as at low risk, with a HR of 2.65 (95% CI 1.32 to 5.34; p = 0.006).
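As an aid to reading estimates of this kind, the sketch below shows how a reported 95% CI for a ratio measure relates to its p-value. It assumes a Wald-type test on the log scale with a normal approximation; the source does not state how the p-value was derived, so this is an illustrative consistency check rather than the authors' method. The function name p_value_from_ci is hypothetical.

```python
import math

def p_value_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value for a ratio (HR or OR) from its 95% CI.

    Assumption: the CI is symmetric on the log scale (Wald-type interval).
    """
    # Standard error of the log ratio, recovered from the width of the CI.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    # z statistic against the null hypothesis of a ratio of 1 (log ratio 0).
    z = math.log(estimate) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# HR and 95% CI as reported in the text for first treatment escalation.
p = p_value_from_ci(2.65, 1.32, 5.34)
print(f"approximate p = {p:.3f}")
```

Under these assumptions the approximation agrees, to rounding, with the reported p = 0.006. Small discrepancies would be expected wherever an interval was derived by another method.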

The EAG notes that, based on the IPD supplied by PredictImmune, people in the validation cohort underwent treatments at the discretion of the treating clinician, and so a proportion (14/66; 21.2%) received a therapy other than glucocorticosteroid at entry.⁵⁰ Choice of and time to first treatment escalation is likely to be influenced by the response to treatment at study entry, which in turn is likely to be affected by the risk of following a severe course of CD. The EAG recognises that the study is of a more pragmatic design but considers that, as people in the validation cohort have not followed a standardised algorithm of treatment, analysis of time to first treatment escalation is subject to a level of bias, the direction of which is unclear.

The EAG analysed IPD provided by PredictImmune for incorporation into the economic model, with a focus on those with a new diagnosis of CD as per the protocol.

Comparison of IBDX and PredictSURE-IBD

For the head-to-head comparison of PredictSURE-IBD and IBDX, the cohort analysed comprised those with active CD as confirmed by one objective marker (i.e. raised C-reactive protein, raised calprotectin or endoscopic signs of active disease) in addition to active symptoms. Participants had been recruited from a single site in the UK for an observational study evaluating PredictSURE-IBD. All those enrolled were treated with the accelerated SU regimen in accordance with UK guidelines. Samples for analysis by the two biomarker tests were taken concurrently from the same bleed: PredictSURE-IBD requires whole-blood RNA and IBDX uses serum. A conference abstract outlining the results of the comparison has now been published.⁸³ Results reported in the conference abstract indicate that those categorised as being at high risk of following a severe course of disease using PredictSURE-IBD experienced a more aggressive disease, characterised by a shorter time to treatment escalation, compared with those designated as at low risk.⁸³ The authors also commented that seropositivity for antiglycan antibodies at diagnosis did not predict the need to escalate treatment due to frequently relapsing or chronically active disease.⁸³

Summary of findings for prognostic test accuracy

Sensitivity, specificity and negative predictive value

The evidence base on the prognostic accuracy of the IBDX and PredictSURE-IBD tools in identifying those at high risk of following a severe course of CD is limited. No study was identified that provided an assessment of the prognostic accuracy of the full panel of six biomarkers for the IBDX. Only one observational study provided results for PredictSURE-IBD, in stratifying those with a recent diagnosis of CD and disease of any level of activity at the time of sample procurement. Severity of disease activity was determined by endoscopy for some people but was not available at baseline for all those forming the validation cohort.

Use of PredictSURE-IBD was associated with a sensitivity and specificity of 77.8% and 70.6%, respectively, in stratifying by need for multiple treatment escalations within 12 months. The corresponding sensitivity and specificity for multiple escalations within 18 months were 72.7% and 73.2%, respectively. A negative predictive value of 90.9% for PredictSURE-IBD in predicting multiple escalations within the first 18 months was also reported.

The EAG notes that the cut-off point for multiple escalations applied in the determination of sensitivity and specificity was two treatment escalations, and comprised any type of treatment, including surgery. The EAG is unaware of a validated definition for determination of whether a person has followed a severe course of CD and considers the choice of two escalations to be an arbitrary value. Additionally, the EAG’s clinical experts fed back that it would be appropriate to consider escalation to CD-related surgery separately from progression to drug treatment, and also to use development of a complication of CD (fistula or stenosis) as another marker of sensitivity and specificity.

The full-text publication presenting results for PredictSURE-IBD indicates that those in the validation cohort were treated at the discretion of the treating clinician. The IPD provided by PredictImmune indicate that, of those in the validation cohort, 21.2% (14/66) received a therapy other than glucocorticosteroid at entry. Choice of and time to first treatment escalation is likely to be influenced by the response to treatment at study entry, which in turn is likely to be affected by the risk of following a severe course of CD. The EAG recognises that the study is of a more pragmatic design but considers that, as people within the validation cohort have not followed a standardised algorithm of treatment, induction treatment would likely influence the timing and frequency of subsequent escalations, and consequently sensitivity and specificity.

The risk of bias of the study as assessed by the QUIPS tool was determined to be low across most domains. Considering the caveats highlighted by the EAG, together with the small sample size (n = 66) informing calculation of prognostic accuracy for PredictSURE-IBD, the EAG considers that the results are potentially unreliable and should be interpreted with caution.
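For readers less familiar with the accuracy measures summarised above, the sketch below shows how sensitivity, specificity and negative predictive value are calculated from a 2 × 2 table of test result against observed outcome. The counts are hypothetical: they are not reported in this section and were chosen only so that the formulas give percentages of the same size as those quoted for multiple escalations within 18 months. The function name accuracy_measures is likewise an assumption made for illustration.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Return sensitivity, specificity and NPV from 2 x 2 table counts.

    tp: test high risk, severe course observed (true positive)
    fp: test high risk, severe course not observed (false positive)
    fn: test low risk, severe course observed (false negative)
    tn: test low risk, severe course not observed (true negative)
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of severe courses flagged high risk
        "specificity": tn / (tn + fp),  # share of non-severe courses flagged low risk
        "npv": tn / (tn + fn),          # share of low-risk results that were correct
    }

# HYPOTHETICAL counts for illustration only; not data from the study.
measures = accuracy_measures(tp=8, fp=11, fn=3, tn=30)
for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

With small cell counts such as these, moving a single person between cells shifts sensitivity by several percentage points, which illustrates why the EAG advises caution given the small sample size.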

Clinical outcomes

Clinical outcomes that could be considered proxies for predicting prognosis are those that are typically associated with following a severe course of CD, including higher risk of developing a complication of CD (fistula or stenosis), of needing CD-related surgery, and a shorter time to and increased frequency of treatment escalations.

Seven studies⁶⁷,⁶⁹,⁷²,⁷⁵,⁷⁶,⁷⁸,⁷⁹ evaluating the IBDX kit were deemed to be of relevance to the review, all of which were observational in nature: three studies were prospective cohorts⁷⁵,⁷⁶,⁷⁹ and three were of a cross-sectional design.⁶⁹,⁷²,⁷⁸ Of those studies reporting estimates of effect, people enrolled in the studies predominantly had an established, rather than a recent, diagnosis of CD. Clinical heterogeneity across studies in terms of various characteristics (prior complication versus no complication, previous IBD-related surgery or no surgery, and unclear whether people had active disease at baseline) was noted, which led to a determination of moderate risk of bias for the population domain based on the QUIPS tool. Two prospective cohort studies reported increased risk of experiencing a complication or of requiring surgery for those testing positive for at least two of the six biomarkers included in the IBDX kit. In addition, some estimates were informed by small sample sizes. Risks of experiencing a complication by positive biomarker status were reported to be:

OR 1.5 (95% CI 1.3 to 1.9; p < 0.001; n unclear) based on positivity for a median of two biomarkers
HR 2.5 (95% CI 1.03 to 6.1; p = 0.043; n = 20 with no prior complication or surgery) based on positivity for at least two biomarkers
HR 2.6 (95% CI 0.92 to 7.2; p = 0.072; n = 20 with no prior complication or surgery) based on positivity for at least three biomarkers.

Three studies reported on the increased risk of surgery. One study reported a trend towards a larger proportion of people with CD requiring abdominal surgery with increasing number of positive biomarkers (n = 517; p < 0.0001 across the groups). Other estimates of higher risk of requiring surgery were:

OR 1.5 (95% CI 1.3 to 1.8; p < 0.001; n unclear) based on positivity for a median of two biomarkers
HR 3.6 (95% CI 1.2 to 11.0; p = 0.023; n = 14 with no prior complication or surgery) based on positivity for at least two biomarkers
HR 2.8 (95% CI 0.80 to 9.6; p = 0.11; n = 14 with no prior complication or surgery) based on positivity for at least three biomarkers.
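The estimates listed above for complications and for surgery can be read against the null value of 1: a 95% CI that excludes 1 corresponds to statistical significance at the 5% level, and one that includes 1 does not. The short sketch below applies that check to the six reported estimates. The helper ci_excludes_null and the row labels are illustrative names, not terms from the source.

```python
# (outcome and biomarker threshold, measure, estimate, lower, upper) as reported.
estimates = [
    ("complication, median of two biomarkers", "OR", 1.5, 1.3, 1.9),
    ("complication, at least two biomarkers", "HR", 2.5, 1.03, 6.1),
    ("complication, at least three biomarkers", "HR", 2.6, 0.92, 7.2),
    ("surgery, median of two biomarkers", "OR", 1.5, 1.3, 1.8),
    ("surgery, at least two biomarkers", "HR", 3.6, 1.2, 11.0),
    ("surgery, at least three biomarkers", "HR", 2.8, 0.80, 9.6),
]

def ci_excludes_null(lower, upper, null=1.0):
    """True when the interval lies wholly above or wholly below the null value."""
    return lower > null or upper < null

# Rows whose interval excludes 1, i.e. significant at the 5% level.
significant = [row for row in estimates if ci_excludes_null(row[3], row[4])]

for label, measure, est, lower, upper in estimates:
    verdict = "excludes 1" if ci_excludes_null(lower, upper) else "includes 1"
    print(f"{label}: {measure} {est} (95% CI {lower} to {upper}) {verdict}")
```

The two estimates based on positivity for at least three biomarkers have intervals that include 1, in line with their reported p-values of 0.072 and 0.11.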

An estimate of the increased risk of treatment escalation by number of positive biomarkers was not available for IBDX.

In a study evaluating IBDX in an adolescent population, results for adolescents aligned with those derived from an adult cohort, with a higher number of positive serum biomarkers associated with an increased risk of experiencing severe CD and requiring CD-related surgery. Research suggests that, although the levels of biomarkers fluctuate over time, the positive or negative status for an individual biomarker remains constant.

Estimates of increased risk of developing a complication or requirement for surgery were not available for PredictSURE-IBD. The study evaluating PredictSURE-IBD reported that those categorised as at high risk of following a severe course of CD had a statistically significantly higher risk of first treatment escalation compared with those designated as at low risk, with a HR of 2.65 (95% CI 1.32 to 5.34; p = 0.006). As noted earlier, based on the IPD supplied by PredictImmune, some of the validation cohort received a therapy other than glucocorticosteroid at entry. The EAG considers that choice of and time to first treatment escalation is likely to be influenced by the response to treatment at study entry, which in turn is likely to be affected by the risk of following a severe course of CD. As people in the validation cohort have not followed a standardised algorithm of treatment, the EAG considers analysis of time to first treatment escalation as subject to a level of bias, the direction of which is unclear. The EAG reiterates that clinical experts fed back that it would be useful to assess CD-related surgery as an independent outcome.

Given the disparity in the clinical outcomes assessed for the IBDX and PredictSURE-IBD, the EAG considers that no conclusions can be drawn on the comparative effectiveness of the two tools in stratifying people by the risk of a severe course of CD.

Copyright © 2021 Edwards et al. This work was produced by Edwards et al. under the terms of a commissioning contract issued by the Secretary of State for Health and Social Care. This is an Open Access publication distributed under the terms of the Creative Commons Attribution CC BY 4.0 licence, which permits unrestricted use, distribution, reproduction and adaption in any medium and for any purpose provided that it is properly attributed. See: https://creativecommons.org/licenses/by/4.0/. For attribution the title, original author(s), the publication source – NIHR Journals Library, and the DOI of the publication must be cited.

Bookshelf ID: NBK569065
