NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Atkin W, Wooldrage K, Shah U, et al. Is whole-colon investigation by colonoscopy, computerised tomography colonography or barium enema necessary for all patients with colorectal cancer symptoms, and for which patients would flexible sigmoidoscopy suffice? A retrospective cohort study. Southampton (UK): NIHR Journals Library; 2017 Nov. (Health Technology Assessment, No. 21.66.)

Is whole-colon investigation by colonoscopy, computerised tomography colonography or barium enema necessary for all patients with colorectal cancer symptoms, and for which patients would flexible sigmoidoscopy suffice? A retrospective cohort study.

Chapter 2 Methods

The SOCCER study was proposed as a follow-on study from the SIGGAR multicentre randomised controlled trials.18,19,23 The SOCCER study is a retrospective analysis of a cohort of patients referred to secondary care who were assessed as potentially eligible for the SIGGAR trials, and includes patients regardless of whether or not they had been subsequently randomised. This approach was used to enhance the generalisability of the SOCCER study findings relating to symptoms at presentation, and subsequent cancer diagnosis, to the wider secondary care population. The clinical trial report for the SIGGAR trials, which contains information pertaining to trial design and full methodology, has been published elsewhere.19 Methodology relevant to the SOCCER study cohort and analyses will be presented in this report. The reporting of this study is in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines.72

Research governance and ethics arrangements

The SIGGAR trials were registered in the International Standard Randomised Controlled Trial Number registry under ISRCTN95152621. Imperial College London was the nominated sponsor for the SIGGAR and SOCCER studies. The research governance procedures in place at Imperial College London ensured that all appropriate regulations and guidelines were followed.

A study steering committee was convened to provide independent oversight of the SOCCER study and expert advice on aspects of the study. This committee also included a patient representative who provided input on study plans from the patient perspective.

Ethics approval and permission to use patient data without consent

Ethics approval for the SIGGAR randomised controlled trials was obtained from the Northern and Yorkshire Multi-Centre Research Ethics Committee on 15 January 2004 and, subsequently, from individual participating centres. Research ethics approval for the SOCCER study was granted as an extension to the SIGGAR randomised controlled trials by the North East (York) National Research Ethics Service. The SOCCER study was also granted Section 251 support under the National Health Service Act 2006 (ref. 73) for the processing of patient identifiable information without consent [references ECC 5–04(E) 2011 and 14/CAG/1043]. To comply with the conditions of Section 251 support, the Cancer Screening and Prevention Research Group at Imperial College London (responsible for all aspects of trial and data management for this study) assessed its data handling procedures against Department of Health information governance standards. The Cancer Screening and Prevention Research Group holds an Information Governance toolkit to demonstrate compliance with these standards.74

Recruitment

Selection of participating hospitals

Patients were recruited to the SIGGAR trials from hospital trusts in which a radiologist member of SIGGAR had expressed a prior interest in participating. Centres were expected to have an established and efficient fast-track referral system for patients with suspected CRC (usually an identified diagnostic clinic) to facilitate recruitment, and a named colorectal nurse specialist or researcher who would take responsibility for recruitment.

The final 21 NHS hospitals were selected via a ‘sham randomisation’ that identified centres likely to achieve a minimum monthly recruitment target (at least 18 patients).23 These 21 hospital centres included teaching and general hospitals and were distributed across England (see Appendix 1).

SOCCER eligibility criteria

Patients who were considered potentially eligible for the SIGGAR trials were considered eligible for the SOCCER study, irrespective of whether or not they were randomised, unless they met the SOCCER study exclusion criteria.

SIGGAR trial eligibility assessment

Patients were assessed for eligibility for the SIGGAR trials between March 2004 and December 2007. Consecutive potentially eligible patients were identified by colorectal nurse specialists, research nurses or radiographers at these centres from CRC and gastroenterology outpatient clinics (including fast-track CRC clinics) and procedural lists (endoscopy and radiology). Patients who met the following SIGGAR trials inclusion criteria and did not meet the exclusion criteria were considered potentially eligible for inclusion in the SOCCER study.

SIGGAR trials inclusion criteria
  • Had been referred to hospital for symptoms or signs suggestive of CRC.
  • Were aged ≥ 55 years.
  • Were clinically judged to need a WCI.
  • Were clinically judged as fit to undergo full bowel preparation.
SIGGAR trials exclusion criteria
  • Had a known genetic predisposition to cancer, for example familial adenomatous polyposis or hereditary non-polyposis CRC.
  • Had a known diagnosis of ulcerative colitis or Crohn’s disease.
  • Had undergone a WCI in the previous 6 months.
  • Had been referred for a WCI to follow up a previously diagnosed CRC.
SOCCER study exclusion criteria

Patients were randomised during the SIGGAR trials (CT colonography vs. colonoscopy or CT colonography vs. barium enema) only if they met eligibility criteria and had given informed consent, and if a consultant had consented to their participation. Some patients who were potentially eligible were, therefore, not randomised during the SIGGAR trials. These patients were included in the SOCCER study analysis unless they fulfilled the following exclusion criteria:

  • declined consent
  • gave consent and were randomised but subsequently dissented
  • were judged unable to give informed consent
  • had no symptoms recorded at presentation
  • were untraceable for follow-up CRC diagnoses through the Health and Social Care Information Centre (HSCIC)
  • had a duplicate study record.
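
The inclusion and exclusion rules listed above can be read as a two-stage filter. The following is a minimal sketch only, assuming a hypothetical `Patient` record whose field names are invented for illustration; the report does not describe how eligibility was actually stored or computed.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical field names; not taken from the SIGGAR or SOCCER databases.
    age: int
    referred_with_crc_symptoms: bool
    needs_wci: bool
    fit_for_bowel_prep: bool
    genetic_predisposition: bool = False
    ibd_diagnosis: bool = False  # ulcerative colitis or Crohn's disease
    wci_in_previous_6_months: bool = False
    referred_for_crc_follow_up: bool = False
    declined_consent: bool = False
    dissented_after_randomisation: bool = False
    unable_to_consent: bool = False
    symptoms_recorded: bool = True
    traceable_through_hscic: bool = True
    duplicate_record: bool = False


def potentially_eligible_for_siggar(p: Patient) -> bool:
    """SIGGAR trials inclusion criteria met and no SIGGAR exclusion criterion met."""
    included = (p.referred_with_crc_symptoms and p.age >= 55
                and p.needs_wci and p.fit_for_bowel_prep)
    excluded = (p.genetic_predisposition or p.ibd_diagnosis
                or p.wci_in_previous_6_months or p.referred_for_crc_follow_up)
    return included and not excluded


def included_in_soccer(p: Patient) -> bool:
    """Potentially eligible for SIGGAR and no SOCCER exclusion criterion met."""
    soccer_excluded = (p.declined_consent or p.dissented_after_randomisation
                       or p.unable_to_consent or not p.symptoms_recorded
                       or not p.traceable_through_hscic or p.duplicate_record)
    return potentially_eligible_for_siggar(p) and not soccer_excluded
```

Note that randomisation status does not appear in either function, which mirrors the point that non-randomised patients remained in the SOCCER cohort.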

Data collection

Patient data used in the SOCCER study were sourced from the SIGGAR trials and additional data were obtained from hospital records. All data were held in a de-identified format in a separate SOCCER study database.

Baseline characteristics

Patient baseline characteristics were collected when patients were originally assessed for eligibility for the SIGGAR trials, and data were collected for both randomised and non-randomised patients. This information had been recorded on the bespoke SIGGAR trials pro forma and included patient age, sex, date of referral, the urgency of the referral (‘2-week wait’, ‘urgent’, ‘soon’ or ‘routine’), the referral route, the diagnostic investigations requested, the outpatient clinic type (if applicable), other relevant diagnoses and whether or not the patient had initially been investigated by FS. For randomised patients, details of the main SIGGAR trial interventions (barium enema, CT colonography and colonoscopy) and outpatient appointments were also recorded on the trial pro forma.

Symptoms and clinical signs

Clinical features at presentation were recorded for potentially eligible patients at baseline during eligibility assessment for the SIGGAR trials. The SIGGAR trials pro forma contained tick boxes to record symptoms and clinical signs under ‘details/reason for referral’. Tick boxes were included for ‘rectal bleeding’, ‘abdominal pain’, ‘anaemia’, ‘weight loss’, ‘CIBH’ and ‘positive FOBt’. A free-text field to record additional symptoms was also included on the pro forma. Entries in the free-text field were manually coded by the trial team for use in the analysis. They were categorised into ‘abdominal mass’, ‘bloating/flatulence’, ‘tiredness/weakness’, ‘anal symptoms’, ‘nausea/vomiting’, ‘back pain’, ‘upper GI symptoms’, ‘rectal mass’, ‘family history’, ‘history of polyps’, ‘presence of cancer antibodies’, ‘elevated C-reactive protein’ and ‘liver problems’. A second free-text field to record the details of the CIBH was also included on the pro forma and was manually coded and categorised to ‘looser and/or more frequent’, ‘harder and/or less frequent’, ‘variable’ or ‘unspecified’.

Data pertaining to clinical features at presentation were also sourced from hospital records. Radiology, endoscopy and pathology records were requested for patients in the SOCCER study cohort and were interrogated for information concerning symptoms/clinical signs (specifically abdominal mass, rectal bleeding, abdominal pain, weight loss, a CIBH and rectal mass). Relevant data were extracted from text fields. For further details see Data extraction.

Anaemia

Anaemia and IDA are clinical signs that have been associated with proximal colon cancer in previous clinical studies33,56 and were therefore of key importance to the SOCCER study. Iron deficiency is the most common cause of anaemia and reflects more severe stages of the disease, when the body is no longer able to replenish iron stores.64 Decreased MCV (microcytic anaemia) is often assumed to result from iron deficiency but is relatively non-specific for IDA;64 nonetheless, decreased MCV can be diagnostically useful in the investigation of GI causes of iron deficiency,65 for example when serum ferritin levels are not available. However, decreased serum ferritin levels are the most reliable sign for the diagnosis of IDA.64

Because anaemia status was so important to our study, we would ideally have had full blood count data for all patients, so that we could apply a uniform definition of anaemia and consistently classify the anaemia status of each patient based on their blood test results. Although a tick box for anaemia as a reason for referral had been included on the SIGGAR trials pro forma, the classification of anaemia was not necessarily consistent between hospitals. Therefore, we separated patients into those with blood test data and those without.

For patients for whom blood test data were available, we used laboratory data to confirm anaemia and excluded the tick box from our definition of anaemia. For these patients, anaemia status at presentation was determined from blood tests taken between 6 months before and 3 months after the date of referral (in the SIGGAR trials). For patients with a diagnosis of CRC, any blood tests dated on or after the date of diagnosis were excluded. Blood test parameters [Hb level (g/dl), MCV (fl) and serum ferritin (µg/l)] were collected from hospital haematology databases (for further details see Data extraction). When multiple results for a parameter were available for an individual patient, the lowest recorded value (within the relevant time period) was selected.
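
The selection rule described above (a time window around referral, exclusion of tests on or after a CRC diagnosis, then the lowest value) could be sketched as follows. This is an illustration rather than the study's code: the function name is hypothetical, and approximating 6 months as 183 days and 3 months as 91 days, with inclusive window boundaries, is an assumption.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple


def lowest_value_in_window(
    results: Iterable[Tuple[date, float]],
    referral_date: date,
    diagnosis_date: Optional[date] = None,
    days_before: int = 183,  # assumed approximation of 6 months
    days_after: int = 91,    # assumed approximation of 3 months
) -> Optional[float]:
    """Return the lowest result for one parameter within the referral window."""
    start = referral_date - timedelta(days=days_before)
    end = referral_date + timedelta(days=days_after)
    eligible = [
        value
        for test_date, value in results
        if start <= test_date <= end
        # Tests dated on or after a CRC diagnosis are excluded.
        and (diagnosis_date is None or test_date < diagnosis_date)
    ]
    return min(eligible) if eligible else None
```

The function would be applied once per parameter (Hb, MCV, ferritin) and returns `None` when no result falls inside the window.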

We considered four different definitions of anaemia: ‘broad anaemia’, ‘strict anaemia’, ‘broad IDA’ and ‘strict IDA’. Broad anaemia was defined solely by Hb level: < 13 g/dl in males and < 12 g/dl in females. Strict anaemia was defined as a Hb level of < 11 g/dl in males and < 10 g/dl in females, or a Hb level of ≥ 11 g/dl but < 13 g/dl in males or ≥ 10 g/dl but < 12 g/dl in females accompanied by microcytosis (MCV < 80 fl/cell) or low ferritin (< 20 µg/l). Broad IDA was defined as a Hb level of < 13 g/dl in males and < 12 g/dl in females accompanied by microcytosis (MCV < 80 fl/cell) or low ferritin (< 20 µg/l). Strict IDA was defined as a Hb level of < 13 g/dl in males and < 12 g/dl in females accompanied by low ferritin (< 20 µg/l).

For patients without blood test data, in the absence of any available full blood counts, we used the anaemia tick box on the SIGGAR trials pro forma to define the presence or absence of anaemia. In the analysis of the overall SOCCER study cohort, anaemia was defined as a Hb level of < 13 g/dl in men or < 12 g/dl in women for patients with blood test data and by using the anaemia tick box on the pro forma for patients without blood test data.
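
The four laboratory definitions and the tick-box fallback for the overall cohort can be expressed compactly. The sketch below is illustrative only: the function names are hypothetical, and treating a missing MCV or ferritin result as "not abnormal" is an assumption, since the report does not say how missing parameters were handled.

```python
from typing import Dict, Optional


def classify_anaemia(sex: str, hb: float,
                     mcv: Optional[float] = None,
                     ferritin: Optional[float] = None) -> Dict[str, bool]:
    """Apply the four definitions; sex is 'M' or 'F', Hb in g/dl, MCV in fl, ferritin in µg/l."""
    broad_cutoff, strict_cutoff = (13.0, 11.0) if sex == "M" else (12.0, 10.0)
    # Assumption: a missing value cannot support microcytosis or low ferritin.
    microcytosis = mcv is not None and mcv < 80
    low_ferritin = ferritin is not None and ferritin < 20
    broad = hb < broad_cutoff
    return {
        "broad_anaemia": broad,
        "strict_anaemia": hb < strict_cutoff or (broad and (microcytosis or low_ferritin)),
        "broad_ida": broad and (microcytosis or low_ferritin),
        "strict_ida": broad and low_ferritin,
    }


def anaemia_overall_cohort(sex: str, hb: Optional[float], tick_box: bool) -> bool:
    """Broad anaemia from Hb when a blood test exists, otherwise the pro forma tick box."""
    if hb is not None:
        return classify_anaemia(sex, hb)["broad_anaemia"]
    return tick_box
```

For example, a man with Hb 12.0 g/dl and MCV 75 fl would meet the broad anaemia, strict anaemia and broad IDA definitions, but not strict IDA unless ferritin was also below 20 µg/l.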

Flexible sigmoidoscopy

Details of FS procedures performed at the time of referral had been recorded on a separate pro forma during the SIGGAR trials and included room entry and exit times; procedure start and stop times; overall assessment of the examination by the endoscopist (‘very easy’, ‘quite easy’, ‘quite difficult’ or ‘very difficult’); assessment of bowel preparation quality by the endoscopist (‘excellent’, ‘good’, ‘adequate’ or ‘poor’); the segment of the colon reached and reasons (if any) the examination could not be completed; overall findings and details of polyps, cancers or biopsies and diverticula (with a severity rating of ‘none’, ‘mild’, ‘moderate’ or ‘severe’); and adverse events occurring during the procedure. Unfortunately, during scrutiny of these records, it was discovered that in many cases the information included had been taken from the electronic endoscopy record and that many items were missing.

Data extraction

Additional pathology, endoscopy, radiology and haematology data were collected from the relevant hospital databases for the SOCCER study patient cohort. When possible, data were bulk extracted; when this was not possible, data were extracted manually, either by staff at participating hospitals or by members of the study team who had been granted permission to do so.

A few databases at participating centres had reporting systems that permitted bulk extraction of the data according to specific criteria. When possible, data were extracted with the help of hospital staff who were familiar with the systems. For most hospital databases, the application interface was not designed for bulk data extraction, so acquiring and processing the data was complex and a number of problems were encountered; for example:

  • When the maintenance and support of the hospital databases had been outsourced to the database manufacturers, often only the manufacturers could help, either by extracting the data themselves or by writing software that enabled the study programmer to do so.
  • Some of the data were held on legacy systems; therefore, specialist support was required to extract data from these systems.
  • Information technology staff at the hospitals sometimes had to restore archived data temporarily so that they could be extracted.
  • Most hospitals had replaced databases over the intervening years and, therefore, some data were overlapping or were duplicated (e.g. records for the same patient were found on more than one system).
  • The data outputs from these databases were in a combination of structured and unstructured formats. Structured data could be cleaned easily and converted into a standardised format for uploading. In the case of unstructured data (usually large text fields), bespoke programs had to be written to extract, clean and convert the data into a suitable format.

Manually collected data

Data were collected manually in the following scenarios.

  • The hospital did not have the facilities or specialists to bulk extract the data for us.
  • The quoted cost for bulk extracting the data obtained from the suppliers of the system was excessive, making manual data collection more cost-effective.
  • It was possible to bulk extract the data only from a data warehouse/reporting system (not the main databases in which the raw data were held) and our findings showed that the data warehouse was not always up to date. In this scenario we collected the data manually from the applications that were linked to the main databases.
  • The hospital was unable to find specialists to help with bulk extraction within our required time frame, so we manually collected the data in order to meet our data collection deadlines.
  • Some hospitals were able to extract the type of test/examination and date but not provide a report. We used this information to identify the records of interest and narrowed down the task of manual data collection to the selected records.
  • The data were held on legacy systems and the hospital did not have a maintenance contract with the suppliers, with the result that there was no option but to extract the data manually.

Study researchers visited hospitals to manually collect data in a bespoke Microsoft Access® database (2010, Microsoft Corporation, Redmond, WA, USA) or spreadsheet which included patient study numbers. Patient identifiers from the SIGGAR trials were held at hospitals and were used to search for patients on hospital databases. De-identified data were returned to the study team, and the study programmer cleaned and uploaded them to a master SOCCER Oracle database (Oracle Database 11g Enterprise Edition, Oracle Corporation, Redwood Shores, CA, USA).

Data handling and quality assurance

The SOCCER database was created to store data in a standardised, structured format using a schema structure similar to the SIGGAR database. To facilitate statistical analysis, the data were classified into quantitative and qualitative variables, ensuring that data from different hospitals were classified in the same way as in the SIGGAR database, as there was wide variation in the raw data (e.g. field names were different, some data were coded or semicoded, whereas other data were in free-text fields, and data types varied).

The study programmer cleaned and uploaded the data from different hospitals into a standard database schema, and this involved several steps:

  • identifying the fields containing information required for the study, taking into account varying field names, data types and value representations
  • extracting information from free-text fields using programming techniques such as ‘regular expressions’ and ‘fuzzy matching’ and translating them into the codes used on the master database
  • translating values in the raw data into those used on the master database, if the information was already in a coded structured format (e.g. converting units for blood tests)
  • identifying and consolidating overlapping data and removing any redundancies (e.g. the same endoscopy or pathology reports extracted from two different systems)
  • identifying and correcting errors in the data (e.g. misspellings, different date formats or truncated data fields)
  • requesting missing data (e.g. missing patients, missing time periods, missing procedure types).
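
As an illustration of the free-text step in the list above, the sketch below combines regular expressions (to split and normalise text) with fuzzy matching (to tolerate misspellings) using only the Python standard library. The code table, the function name and the similarity cut-off of 0.8 are all hypothetical; the report does not state which tools, codes or thresholds the study programmer used.

```python
import re
from difflib import get_close_matches
from typing import List

# Hypothetical look-up table; the real master database codes are not given in the report.
SYMPTOM_CODES = {
    "rectal bleeding": "RB",
    "abdominal pain": "AP",
    "weight loss": "WL",
    "abdominal mass": "AM",
    "rectal mass": "RM",
}


def code_symptoms(free_text: str, cutoff: float = 0.8) -> List[str]:
    """Map phrases in a free-text field to symptom codes, tolerating small misspellings."""
    codes = set()
    # Regular expression: split the field on common separators.
    for phrase in re.split(r"[,;/.\n]+", free_text.lower()):
        # Regular expression: drop anything that is not a letter or a space.
        phrase = " ".join(re.sub(r"[^a-z ]", " ", phrase).split())
        if not phrase:
            continue
        # Fuzzy matching: closest known symptom above the similarity cut-off.
        match = get_close_matches(phrase, list(SYMPTOM_CODES), n=1, cutoff=cutoff)
        if match:
            codes.add(SYMPTOM_CODES[match[0]])
    return sorted(codes)
```

A stricter cut-off reduces false matches but misses abbreviations, so in practice unmatched phrases would still need manual review, consistent with the manual coding described in this chapter.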

A graphical user interface that linked to the SOCCER database was designed, allowing the study researchers to efficiently read, interpret, check and manually code the endoscopy, pathology and symptoms data sets. Study researchers interrogated and linked the clinical reports and categorised the data in the same way as in the SIGGAR database. Reference data (sometimes referred to as look-up tables) were used to categorise and define permissible values for data fields on the database. This method restricted the values to be recorded in a data field, thereby preventing coding errors and also ensuring uniformity of data from different hospitals. The study researchers systematically reviewed a blinded random sample of records that had been coded by other study researchers to ensure accuracy and consistency.

Health and Social Care Information Centre colorectal cancer diagnoses

Colorectal cancer diagnoses within 3 years of referral were obtained from the HSCIC. A unique study number was allocated to all patients during the SIGGAR trials and the same study number was used for the SOCCER study cohort. This unique study number was used to collect cancer registrations from the HSCIC through their data linkage service. For patients who had not been randomised in the SIGGAR trials, participating hospitals provided the HSCIC with patient identifiers (name, date of birth, NHS number, etc.) to enable data linkage, as identifiers were not held by the central trial office for the non-randomised cohort. Hospital teams worked under instruction of the central trial team to prepare the data in the electronic format specified by the HSCIC. When local assistance was not available to collate the data required by the HSCIC, central trial team staff members were issued with letters of access by the hospitals concerned and visited sites personally to complete this task. For the cohort of patients who were randomised in the SIGGAR trials, the HSCIC already held the records and so no new information needed to be supplied to them. Following data linkage by the HSCIC, the central trial office received cancer registrations from the HSCIC for the full SOCCER study cohort in a de-identified format for analysis, which were linked only by study number.

Statistical methods

Sample size

Our original sample size calculation assumed that we would have a total cohort of 8484 patients, in whom 421 distal cancers and 68 proximal cancers would be diagnosed. The analysis plan presented estimates of the precision of the estimated sensitivity under specific regimens, with the precision being conditional on the number of cancers diagnosed. We assumed that under a regimen offering WCI to patients with IDA and/or an abdominal mass we would detect 470 of the total 489 cancers, giving a sensitivity estimate of 96.1% with a 95% confidence interval (CI) of 94.0% to 97.6%. Although the final analysed cohort of 7380 patients was smaller than the proposed sample size, the number of cancers diagnosed was greater than expected, with a total of 429 distal cancers and 127 proximal cancers, thus providing a greater level of precision than originally estimated.
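
The precision estimate quoted above can be checked with an exact (Clopper–Pearson) binomial interval. The sketch below is one standard-library way of doing this by bisection on the binomial distribution function; it is not the study's own calculation, which would have been done in statistical software, and the function names are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _solve_decreasing(f, target: float, iterations: int = 60) -> float:
    """Bisection on [0, 1] for f(p) = target, where f decreases as p increases."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(successes: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson interval for a proportion."""
    lower = 0.0 if successes == 0 else _solve_decreasing(
        lambda p: binom_cdf(successes - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if successes == n else _solve_decreasing(
        lambda p: binom_cdf(successes, n, p), alpha / 2)
    return lower, upper


lower, upper = exact_binomial_ci(470, 489)
print(f"sensitivity {470 / 489:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
```

With 470 of 489 cancers detected, the interval should land close to the 94.0% to 97.6% range stated in the text.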

Primary outcome

The primary outcome was the diagnostic yield of distal or proximal cancer within 3 years of presentation at clinic, by symptom category at presentation.18,19,23 CRC diagnoses were sourced from the HSCIC and from patient medical records. For cancers confirmed by a hospital pathology report but without corresponding verification by HSCIC, the local pathology report was taken as conclusive evidence of cancer. For the purposes of this study, CRCs included all cancers with International Classification of Diseases and Related Health Problems, Tenth Edition,75 site codes C18–C21 and with an International Classification of Diseases for Oncology, Third Edition,76 morphology code of 8000/3, 8010/3, 8070/3, 8123/3, 8140/2, 8140/3, 8144/3, 8210/3, 8261/2, 8261/3, 8263/2, 8263/3, 8480/3, 8481/3, 8490/3, 8510/3 or 8560/3. CRCs were classified as ‘distal’ if they were located in the anus, rectum, sigmoid colon or descending colon. Cancers located proximal to the descending colon were classed as ‘proximal’. Synchronous distal and proximal CRCs were included as separate cancers in the analysis.
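
The case definition above combines a site code with a morphology code. A minimal sketch of that rule follows. The morphology list is copied from the text; the mapping of individual ICD-10 subsite codes to 'distal' or 'proximal' is an assumption made for illustration (for example, treating C19, the rectosigmoid junction, as distal and C18.1, the appendix, as proximal), as the report describes the split anatomically rather than by code. Dotted code formatting (e.g. `C18.7`) is also assumed.

```python
from typing import Optional

# Morphology codes listed in the text (ICD-O-3).
CRC_MORPHOLOGY = {
    "8000/3", "8010/3", "8070/3", "8123/3", "8140/2", "8140/3", "8144/3",
    "8210/3", "8261/2", "8261/3", "8263/2", "8263/3", "8480/3", "8481/3",
    "8490/3", "8510/3", "8560/3",
}

# Assumed mapping: descending colon, sigmoid colon, rectosigmoid junction, rectum, anus.
DISTAL_PREFIXES = ("C18.6", "C18.7", "C19", "C20", "C21")
# Assumed mapping: caecum through splenic flexure.
PROXIMAL_PREFIXES = ("C18.0", "C18.1", "C18.2", "C18.3", "C18.4", "C18.5")


def classify_crc(site_code: str, morphology_code: str) -> Optional[str]:
    """Return 'distal', 'proximal' or 'unclassified'; None if not a CRC by this definition."""
    site = site_code.strip().upper()
    if not site.startswith(("C18", "C19", "C20", "C21")):
        return None
    if morphology_code.strip() not in CRC_MORPHOLOGY:
        return None
    if site.startswith(DISTAL_PREFIXES):
        return "distal"
    if site.startswith(PROXIMAL_PREFIXES):
        return "proximal"
    # e.g. C18.8 (overlapping) or C18.9 (unspecified): side cannot be inferred from the code.
    return "unclassified"
```

Synchronous distal and proximal cancers would simply be two records passed through the function separately, matching the statement that they were counted as separate cancers.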

Secondary outcomes

The secondary outcomes were the sensitivity of symptoms and symptom categories for distal and proximal cancer, the percentage of patients with cancer who had distal CRC by symptom and symptom category, the number needed to be examined to diagnose one distal or proximal cancer by symptom and symptom category at presentation, the miss rate for CRC at FS in the subgroup of patients with FS performed at baseline and the prevalence of proximal and distal CRC in the study cohort.

Analysis

Outcomes were first analysed separately in the cohort with blood test data and the cohort without blood test data. The findings in the two cohorts were then compared and outcomes analysed in the total combined cohort.

Sensitivity was calculated as the proportion of CRCs by cancer site (proximal/distal) that were identified by a particular symptom or symptom combination. Specificity was defined as the proportion of patients without CRC by cancer site who presented without a particular symptom/symptom combination.
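
The two definitions above correspond to the usual 2 × 2 table of symptom status against cancer status. The sketch below computes both for one symptom and one cancer site from patient-level records; the function name and the tuple layout are assumptions for illustration.

```python
from typing import Iterable, Optional, Tuple


def sensitivity_specificity(
    records: Iterable[Tuple[bool, bool]],
) -> Tuple[Optional[float], Optional[float]]:
    """records holds (has_symptom, has_cancer_at_site) for each patient."""
    tp = fn = tn = fp = 0
    for has_symptom, has_cancer in records:
        if has_cancer:
            if has_symptom:
                tp += 1  # cancer identified by the symptom
            else:
                fn += 1  # cancer without the symptom
        else:
            if has_symptom:
                fp += 1  # symptom present, no cancer
            else:
                tn += 1  # no symptom, no cancer
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity
```

For a symptom combination, `has_symptom` would be set to true when the patient meets the combination being evaluated.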

Diagnostic yields were presented as percentages. The number needed to be examined was calculated as the inverse of the diagnostic yield. Binomial exact 95% CIs were calculated for key outcomes. The distributions of categorical variables (patient characteristics, referral details, symptoms, signs, indications and cancer outcomes) were compared between cohorts using Pearson’s chi-squared test or Fisher’s exact test, as appropriate, and all tests were two-tailed. Comparisons were made between: cohorts with and without blood test data; men and women; patients with distal cancer and patients with proximal cancer; and patients with and without FS performed at the time of referral. Data were analysed using Stata version 13.1 (StataCorp LP, College Station, TX, USA).
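
The relationship between diagnostic yield and the number needed to be examined is a simple reciprocal. A short sketch, using made-up counts rather than study results:

```python
import math


def diagnostic_yield(n_cancers: int, n_examined: int) -> float:
    """Proportion of examined patients in whom a cancer was diagnosed."""
    return n_cancers / n_examined


def number_needed_to_examine(n_cancers: int, n_examined: int) -> float:
    """Inverse of the diagnostic yield; infinite when no cancers were found."""
    if n_cancers == 0:
        return math.inf
    return n_examined / n_cancers


# Hypothetical counts for illustration only.
cancers, examined = 25, 500
print(f"yield {diagnostic_yield(cancers, examined):.1%}, "
      f"number needed to examine {number_needed_to_examine(cancers, examined):.0f}")
```

Here 25 cancers among 500 patients gives a yield of 5.0%, so 20 patients would need to be examined to diagnose one cancer.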

Copyright © Queen’s Printer and Controller of HMSO 2017. This work was produced by Atkin et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK464601
