Theme 2: design and adherence to protocol

James Raftery; Amanda Young; Louise Stanton; Ruairidh Milne; Andrew Cook; David Turner; Peter Davidson

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Raftery J, Young A, Stanton L, et al. Clinical trial metadata: defining and extracting metadata on the design, conduct, results and costs of 125 randomised clinical trials funded by the National Institute for Health Research Health Technology Assessment programme. Southampton (UK): NIHR Journals Library; 2015 Feb. (Health Technology Assessment, No. 19.11.)

Chapter 5 Theme 2: design and adherence to protocol

This chapter considers questions regarding the reporting of HTA-funded trials. The relevant literature is noted before summarising the piloting of the questions. The degree to which trial reports met the CONSORT checklist is examined along with how well they reported on trial design, interventions and controls. Comparisons are made between what was planned in protocols and what was reported in the monographs.

Introduction

Well-conducted RCTs have become the ‘gold standard’ for evaluating interventions in health care. The WHO defines a clinical trial as ‘any research study that prospectively assigns human participants or groups of humans to one or more health-related interventions to evaluate the effects on health outcomes’ [reproduced, with the permission of the publisher, from Bulletin of the World Health Organization – Guidelines for Contributors. Geneva: WHO; 2006. URL: www.who.int/bulletin/volumes/84/current_guidelines/en/ (accessed 3 October 2014)].

An explanatory paper to the CONSORT 2010 statement summarises why particular design features help reduce bias and improve study power.⁴⁴ Randomisation should be rigorously done and allocation to groups should be adequately concealed from participants and researchers. Blinding (or masking) should be maintained when possible for participants and clinicians, and particularly for observers who measure outcomes. Participants lost to follow-up should be minimised, accounted for and analysed in their randomised groups. Properly designed trials reduce susceptibility to bias, whether due to selection, outcome reporting or attrition bias. The sample size should be sufficient to give adequate precision to estimate the effect of the intervention in the relevant wider population.

The design of the trial should be fully recorded in the study protocol. That protocol should be carefully followed. Failure to follow a protocol happens for many reasons, some beyond the control of the investigators. For example, the introduction or removal of outcome measures later in a trial (‘post hoc’) raises the possibility of outcome reporting bias and increases the play of chance in the trial through multiplicity of analysis.

For a reader to judge the quality (validity) of a completed trial, the design, methods, results and interpretation must be fully and fairly reported.

Good evidence in the literature shows that many trials are poorly planned, conducted or reported, or all of these.⁴⁴,⁴⁵ This chapter describes our investigation of the design, conduct and reporting of HTA-funded trials. We provide a descriptive analysis of the design of the interventions tested. We compare the planned (in the protocol) and reported (published) methods to identify deviations from protocol and post-hoc analysis. We assess the quality of reporting of the trials against the CONSORT statement.

A tool developed to enhance the quality of reporting and reduce methodological flaws, CONSORT was established in 1996 in response to concern about the quality of reporting (www.consort-statement.org/about-consort/history). The CONSORT statement was developed by evidence and expert consensus. Widely used by authors and journals, it has been adopted by the HTA monograph series. Extensions to the main CONSORT statement include the reporting of special types of trials, such as pragmatic trials, non-inferiority or equivalence trials and cluster trials. For this study, we focus on the main CONSORT statement, which applies particularly to parallel arm trials.

In 2008, Enhancing the Quality and Transparency of Health Research (EQUATOR) was established to promote accurate reporting of health research and provide an international network to improve the quality of scientific publications. The EQUATOR website acts as a library of reporting guidelines in health research (www.equator-network.org/).

The CONSORT statement (and particularly the checklist), although useful, has limitations. It can only list the categories of information that should be reported in most trials; it cannot assess the completeness of that information. Selective reporting of whole trials⁴⁶ or outcomes¹⁰ leads to biased estimates of effectiveness through ‘publication bias’ and (closely related) ‘outcomes reporting bias’, usually resulting in overestimates of effectiveness.

Absence of publication or partial publication can only be assessed by knowing what research has or should have been reported. Concerns over the lack of availability of trial results prompted the development of clinical trials registers such as the ICTRP (the Registry Platform) based at the WHO. Ghersi et al.²² emphasised the need for greater accessibility of trial data such as protocols and final reports. Transparency is needed to overcome academic and/or commercial vested interests. We address transparency by comparing the planned research with the reported methods and results. The planned research is usually taken from the study protocol which, for recent trials, is generally in the public domain through the HTA website. For the early trials in the cohort, conducted when submission of detailed protocols was not normally required, we used the full application forms for grant funding. These contain a detailed description of the planned study, but not changes that may legitimately occur before the trial starts, for instance at the request of the ethics committee.

Any assessment of a trial needs to cover not only the completeness of its reporting, but also the design and conduct of the study. The Cochrane Collaboration’s handbook⁴⁷ suggests that the assessment of validity of trials is best done in a framework rather than using a specific tool (available from www.cochrane-handbook.org). A full assessment of the trials using such a framework was beyond the scope of this study. Instead, we provide a descriptive analysis of the design of the trials (including the study interventions). We also compared key features of the study design in the protocol and final report to assess adherence to protocol and completeness of reporting of planned primary outcomes and analyses. More detailed analysis and statistical methods are provided in Chapter 7.

Questions addressed

Box 3 shows the questions explored in this section.

BOX 3

The research questions answered under this theme

T2.1. Was the trial adequately reported? (Using the revised 2010 CONSORT checklist for core trial information, methods and results).

Methods

Seventeen questions were piloted, as shown in Box 3. Two were considered not feasible: one on pragmatic–explanatory continuum indicator summary (PRECIS),⁴⁸ the other on complex interventions. The PRECIS question has 10 headings with up to six subheadings requiring approximately 68 data fields, or around one-third of the total fields for this theme. Besides requiring considerable data extraction, this question was also likely to require matters of judgement.

The issue of whether or not the intervention was complex as defined by the MRC involved four headings, each with three subheadings, thus requiring 12 questions as well as judgements on the interactions between them. Given the lack of data on basic aspects of the HTA RCTs, such as the number and types of interventions, more detailed work such as that required by PRECIS⁴⁸ or complex interventions seemed a task for further work.

The main information sources were the final protocol (if this was not available, then the funding application form) and the published monograph.

Denominators

The unit of analysis for reporting was 123 trials, that is the 125 trials identified from the 109 monographs but excluding two pilot trials.

Results

Question T2.1: was the trial adequately reported? (Using the revised 2010 Consolidated Standards of Reporting Trials checklist for core trial information, methods and results)

The CONSORT checklist has six sections/topics and 37 items (the CONSORT statement checklist lists 25 items, but there are subsections listed as a or b for 12 items, making a total of 37 CONSORT items). We extracted data on four sections: ‘title and abstract’, ‘introduction’, ‘methods’ and ‘results’. No data were extracted for 16 of the 37 items. Out of these 16, six items were under ‘discussion’ and ‘other information’, four were under ‘results’, five were under ‘methods’ and one was under ‘introduction’. Data on sample size and primary outcome are further discussed in Chapter 7. The 21 items included in this chapter are listed in Table 12.
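The item accounting in the paragraph above can be checked with a short calculation. The following is a minimal sketch that uses only the counts quoted in the text; the variable names are illustrative and are not taken from the report.

```python
# Counts quoted in the paragraph above; names are illustrative only.
numbered_items = 25      # items listed in the CONSORT 2010 checklist
items_split_a_b = 12     # items divided into 'a' and 'b', each adding one sub-item

total_items = numbered_items + items_split_a_b

# Items for which no data were extracted, grouped as in the text.
not_extracted = {
    "discussion and other information": 6,
    "results": 4,
    "methods": 5,
    "introduction": 1,
}

items_not_extracted = sum(not_extracted.values())
items_included = total_items - items_not_extracted

print(total_items, items_not_extracted, items_included)
```

Under these counts the checklist total is 37 items, 16 are excluded and 21 remain, which agrees with the number of items listed in Table 12.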

TABLE 12

Consolidated Standards of Reporting Trials items included in this chapter

Consolidated Standards of Reporting Trials checklist items: 1 and 2

These two sections cover the title, abstract and introduction. Of the 123 trials, 100 reported that they were randomised clinical trials in their titles, 122 had a structured summary and all trials included their objectives and hypotheses in the introduction (Table 13).

TABLE 13

Consolidated Standards of Reporting Trials items 1 and 2

Consolidated Standards of Reporting Trials checklist items: methods (items 3–12)

Trial design

Of the 123 included trials, the unit of analysis in 111 was the individual patient. Twelve trials were cluster randomised.

Participants

One hundred and twenty-one trials reported the eligibility criteria and provided details about the setting and location where data were collected (98.4% for both CONSORT items).

Interventions and controls

All trials provided sufficient information about the intervention groups. For drug interventions, the drug name, dose and method of administration were provided. The reporting of the control group was, however, less complete.

Most trials compared the intervention with ‘standard care’ (52.8%, 65/123). Next most common were placebo (8.1%, 10/123), ‘next best service’ (2.4%, 3/123) and ‘no treatment’ (1.6%, 2/123). Forty-three trials (35%) were classified as ‘control undefined’ as they provided insufficient detail (see Table 14).

TABLE 14

Consolidated Standards of Reporting Trials items 3–12

Outcomes

Forty trials reported more than one primary outcome. Of these, it was not possible to determine the ‘main’ primary outcome from the monograph for two trials. Reporting of the primary time point was a weakness in some trials. Eighty out of 123 trials covered this CONSORT item sufficiently (65%); the rest did not (see Table 14).

Sample size

For the 109 superiority trials, the four elements of the sample size required by the CONSORT guidelines (2010) were considered. Sixty per cent of superiority trials (66/109) reported all elements as detailed by the CONSORT guidelines (see Table 14). For each of the four elements:

94.5% (103/109) reported the statistical power.
90.8% (99/109) reported the alpha error level.
Out of the 109 trials, 45 had a binary outcome and 84.4% (38/45) sufficiently reported the estimated outcomes in each group.
Of the 109 trials, 53 had a continuous outcome and 56.6% (30/53) sufficiently reported the standard deviation (SD) of the measurements.

[The remaining trials were classified as ‘time to event’ (n = 3) or ‘effect size’ (n = 6). Two trials had missing data for the comparative analysis.]
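Because the four sample size elements do not share a denominator (two apply to all 109 superiority trials, one only to trials with a binary outcome and one only to trials with a continuous outcome), it can help to keep each count next to the number of trials it applies to. The sketch below recomputes the percentages from the counts quoted above; the dictionary layout and names are assumptions made for illustration.

```python
SUPERIORITY_TRIALS = 109

def percentage(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

# (trials reporting the element adequately, trials to which the element applies)
elements = {
    "statistical power": (103, SUPERIORITY_TRIALS),
    "alpha error level": (99, SUPERIORITY_TRIALS),
    "estimated outcomes in each group (binary outcome)": (38, 45),
    "standard deviation (continuous outcome)": (30, 53),
}

# Breakdown of the 109 superiority trials by type of outcome, as quoted in the text.
outcome_types = {
    "binary": 45,
    "continuous": 53,
    "time to event": 3,
    "effect size": 6,
    "missing data": 2,
}

element_percentages = {
    name: percentage(count, total) for name, (count, total) in elements.items()
}
all_elements_reported = percentage(66, SUPERIORITY_TRIALS)

for name, value in element_percentages.items():
    print(f"{name}: {value}%")
print(f"all elements reported: {all_elements_reported}%")
```

The outcome-type counts sum to the 109 superiority trials, so every trial is accounted for in exactly one category.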

Sequence generation

The method used to generate the random allocation sequence was adequately described in 94.3% of trials (116/123) (CONSORT statement 8a) and the type of randomisation including details about the restriction (CONSORT statement 8b) was adequately reported by 86.2% of trials (106/123) (Table 14). This item was not included in the 1996 version of CONSORT;⁵⁰ however, 18 out of 21 trials using this version still reported the item.

Consolidated Standards of Reporting Trials checklist items: results (items 13–19)

Most trials (97.6%, 120/123) included a flow diagram. Clear improvements were evident in the reporting of the losses and exclusions after randomisation for each intervention between the 1996⁵⁰ and 2001⁵¹ CONSORT statements. Only 61.9% of trials (13/21) using the 1996 CONSORT statement reported group losses and exclusions after randomisation. By comparison, of those using the 2001 revised CONSORT statement, 85.1% of trials (86/101) reported such losses and exclusions in the participant flow diagram (one trial used the 2010 CONSORT statement) (see Table 15). The reasons were not explored here.
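The comparison between CONSORT versions can be reproduced from the counts in the paragraph above. This is a minimal descriptive sketch, not a statistical test; the names are illustrative and the percentage-point difference is a derived figure that the report itself does not quote.

```python
def percentage(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

# (trials reporting losses and exclusions, trials using that CONSORT version)
reported = {
    "CONSORT 1996": (13, 21),
    "CONSORT 2001": (86, 101),
}
trials_using_consort_2010 = 1

rates = {
    version: percentage(count, total)
    for version, (count, total) in reported.items()
}

# Difference in percentage points, computed from the unrounded proportions.
difference = round(100 * 86 / 101 - 100 * 13 / 21, 1)

total_trials = sum(total for _, total in reported.values()) + trials_using_consort_2010

print(rates, difference, total_trials)
```

The three version groups together cover all 123 trials, so the two percentages describe the whole cohort apart from the single trial that used the 2010 statement.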

TABLE 15

Consolidated Standards of Reporting Trials items 13–19

Baseline data

Most trials provided information about the demographic and clinical characteristics of the patient groups (91.9%, 113/123).

Harms

A notable improvement in the reporting of harms and unintended effects was found between the different CONSORT statements. Only 38.1% of trials (8/21) using CONSORT 1996 reported all important harms, compared with 64.4% of trials (65/101) reporting using the 2001 revised version (the one trial using the 2010 CONSORT statement did report all important harms or unintended effects). The results of 18 trials were unclear (Table 15).

Questions T2.2–T2.11: what were the design characteristics of the included trials?

Question T2.2: trial design framework

All 123 trials reported the design of the trial in the published monograph, with none indicating a change in design from that planned. More than four-fifths were designed as parallel arm trials (87%, 107/123), 10 were factorial (8.1%, 10/123) and six were designed as crossovers (4.9%, 6/123). Thirteen trials reported having included a preference arm to the main clinical trial (10.6%, 13/123).

Question T2.3: type of comparison

One hundred and nine trials (88.6%, 109/123) reported the type of comparison as superiority at the planning stage of the trial (as reported in the protocol or proposal). Of the remaining 14 trials, eight were equivalence trials (6.5%, 8/123), five were non-inferiority trials (4.1%, 5/123) and one did not report the planned type of comparison (0.8%, 1/123). The reported type of comparison was as planned for all non-inferiority trials (n = 5). There were three discrepant trials: two designed as superiority at the planning stage were actually reported as equivalence trials (ID36 and ID37) and one designed as an equivalence trial at the planned stage was actually reported as a superiority design (ID61).

Question T2.4: type of care

Eleven trials (8.9%) were reported to have been conducted in both primary and secondary care. More than half of the trials (56.1%, 69/123) were in secondary care and one-third (33.3%, 41/123) were in primary care. Two trials (1.6%, 2/123) were conducted in neither primary nor secondary care, one in a leisure centre (trial ID64) and the other in a school setting (trial ID66). These were classified as ‘other’ (see Table 16).

TABLE 16

Summary data of the trial characteristics

Question T2.5: type of setting

Almost half of the trials (44.7%, 55/123) were conducted only in a hospital setting, one-quarter (24.4%, 30/123) only in a general practitioner (GP) setting and eight (6.5%) in a community setting. Thirteen were categorised as ‘other type of setting/place’ (10.6%), which included settings such as non-NHS acupuncture clinics (trial ID39), community mental health services (trial ID42) and a health psychology department for chronic illness (trial ID60). In the remaining 17 trials, more than one type of setting was reported in the monograph (13.8%) (see Table 16).

Question T2.6: pilot and feasibility study

The NETSCC definitions of pilot and feasibility studies were used to determine whether the trial involved a pilot or feasibility study prior to conducting the main clinical trial (www.netscc.ac.uk/glossary). A pilot study is a rehearsal for the main study, whereas a feasibility study estimates important parameters for a main trial. Almost half of the trials (48%, 59/123) included a pilot study prior to conducting the main clinical trial. Six (4.9%, 6/123) conducted a feasibility study (Table 16).

Question T2.7: number of interventions

Three hundred and twenty-one interventions were reported from the 123 clinical trials (mean 2.6 interventions per trial).

Question T2.8: whether the intervention was an ‘add-on’ or ‘substitute’

Almost half (48%, 59/123) of the interventions described in the HTA monograph were reported as substitutions for another intervention and one-quarter (26%, 32/123) were reported as additional (Table 17).

TABLE 17

Whether the intervention group was an ‘add-on’ or ‘substitute’

Questions T2.9 and T2.10: type of intervention using the Health Research Classification System and Chalmers’ classification

Two classification systems were used to report the clinical trial interventions. Tables 18 and 19 illustrate the two classification systems used (UKCRC HRCS³⁰ and Chalmers’ classification⁵²) by trial.

TABLE 18

The intervention classifications using the UKCRC HRCS (by trial)

TABLE 19

The intervention classification using Chalmers’ classification system for clinical trial intervention by trial

The UKCRC HRCS³⁰ was used to classify interventions in the included clinical trials. Twenty-two trials were classified as treatment evaluation of pharmaceuticals (17.9%), followed closely by organisation and delivery of services (‘health services’) (14.6%, 18/123) and treatment evaluation of medical devices (13.8%, 17/123).

The system of Chalmers et al.⁵² was also used to classify interventions. For those trials in which the technologies compared fell into the same class, the two commonest interventions were drugs (21.7%) and devices (15.7%).

It was not possible using this classification to classify 8 of the 123 trials as the interventions included two or more categories, such as ‘drug’ and ‘mixed and complex’; ‘drug’ and ‘service delivery’; ‘drug’ and ‘psychological therapy’; ‘drug’ and ‘education and training’; ‘surgery’ and ‘mixed and complex’; and ‘surgery’ and ‘drug’.

The type of intervention for three trials was classified as ‘other’, which referred to ‘nutritional supplement in addition to the normal hospital diet’ (trial ID54), ‘self-monitoring intervention’ (trial ID80) and ‘intravenous fluids were to be administered following primary patient assessment/to be withheld for the first hour of pre-hospital care’ (trial ID122).

Question T2.11: type of control

More than one-third (35%, 43/123) of controls could not be defined as placebo, standard care, no treatment or next best (Table 20). There appeared to be no improvement over time. It was not possible, based on the monograph, to report or extract data with certainty about the type of control used. For the 80 clinical trials where the type of control could be defined, 65 (81.3%) reported the control as standard care and 10 (12.5%) as placebo.
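The paragraph above reports percentages on two different denominators: all 123 trials, and the 80 trials whose control could be defined. The sketch below makes the change of denominator explicit, using the control counts given earlier in this chapter. It rounds half up with the decimal module, because Python's built-in round would turn 81.25 into 81.2; the rounding rule actually used in the report is an assumption, and the function and variable names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

def percentage(count, total):
    """Percentage to one decimal place, rounding half up (assumed convention)."""
    value = Decimal(count) * 100 / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# Control types as reported in this chapter.
controls = {
    "standard care": 65,
    "placebo": 10,
    "next best service": 3,
    "no treatment": 2,
    "control undefined": 43,
}

all_trials = sum(controls.values())
defined_trials = all_trials - controls["control undefined"]

undefined_share = percentage(controls["control undefined"], all_trials)
standard_care_of_defined = percentage(controls["standard care"], defined_trials)
placebo_of_defined = percentage(controls["placebo"], defined_trials)

print(all_trials, defined_trials)
print(undefined_share, standard_care_of_defined, placebo_of_defined)
```

Standard care therefore accounts for 52.8% of all controls but 81.3% of the controls that could be classified, which is why the two figures in this chapter differ.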

TABLE 20

Control type

Questions T2.12–T2.17: did the trial conform to the protocol?

To assess whether or not the design of the trial, as described in the protocol (or the application form) differed from that published in the HTA monograph, we compared the type of comparison, the number of arms and the primary outcomes.

Question T2.12: type of comparison

As shown in Table 21, 119 of 122 trials reported the type of comparison as had been planned. Three trials changed: two designed as superiority at the planning stage reported as equivalence trials (trials ID36 and ID37), and one designed as an equivalence trial reported as a superiority design (trial ID61). This trial reported a protocol change relating to the design of the trial: ‘A protocol amendment to amalgamate the two arms of the study and to compare cost-effectiveness of endoscopies in general rather than by the site of endoscopy was approved by MREC [Multicentre Research Ethics Committee] and the HTA programme.’⁵³

TABLE 21

Planned and actual type of comparison discrepancies

The planned type of comparison was not reported in one trial. The importance of this change was not clear because the status of the change was not recorded in the monograph. The change of comparison could have been agreed as part of the analysis plan by the trial Data Monitoring Committee before data were examined.

Question T2.13: the number of proposed and reported arms

The mean number of planned arms was 2.67 (n = 328) and the number reported in the monograph was 2.63 (n = 323). One hundred and seventeen trials had the same number of arms as planned (Table 22). The six discrepant trials (4.9%, 6/123) (trials ID2, ID4, ID19, ID21, ID86 and ID95) are explored in Table 23.
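The means quoted above follow directly from the total numbers of arms and the 123 trials. A minimal sketch of the arithmetic, with illustrative variable names:

```python
TRIALS = 123

planned_arms = 328    # total arms across all protocols or application forms
reported_arms = 323   # total arms across all monographs

trials_as_planned = 117
trials_discrepant = 6

mean_planned = round(planned_arms / TRIALS, 2)
mean_reported = round(reported_arms / TRIALS, 2)
discrepant_share = round(100 * trials_discrepant / TRIALS, 1)

print(mean_planned, mean_reported, discrepant_share)
```

The net difference of five arms across the cohort is small relative to the total, which is consistent with only six trials changing their number of arms.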

TABLE 22

Proposed and published number of arms

TABLE 23

Design characteristics of six trials with discrepant number of arms

The number of arms for the 123 trials was 323. The number of interventions was 321. The difference was due to one trial having a factorial design in which two interventions were tested on two different groups.

The design characteristics of the six trials with a discrepant number of arms (see Table 23) show that four were published before 2005 (trials ID2, ID4, ID19 and ID21) and the remaining two in 2009 (trials ID82 and ID95). Of the four trials published before 2005, none submitted a project protocol.

Question T2.14: the number of proposed and reported trial centres

One hundred and three trials reported the number of proposed centres (Table 24), with a mean of 17.05 and a median of 5. One hundred and nineteen trials with available data reported the actual mean number of centres as 26.82, median 11.

TABLE 24

Summary data for the number of arms and number of centres

Thirty-nine trials (31.7%) had unchanged numbers of proposed centres. It was not possible to compare the planned and actual for 22 trials (17.9%). For the remaining trials (50.4%, 62/123), the number of planned and actual trial centres differed. Four-fifths of these (80.6%, 50/62) increased the number of centres from that planned, whereas the rest reduced it. Table 25 provides an overview of these discrepancies.
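The breakdown of planned against actual centres uses two denominators: all 123 trials for the three main groups, and the 62 trials with a changed number of centres for the direction of change. The sketch below recomputes the figures from the counts in the paragraph above. The number of trials that reduced their centres (12) is derived here as the remainder and is not quoted directly in the text; the names are illustrative.

```python
TRIALS = 123

def percentage(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

centre_groups = {
    "unchanged": 39,
    "not comparable": 22,
    "changed": 62,
}

increased = 50
decreased = centre_groups["changed"] - increased  # derived: 'the rest reduced it'

group_shares = {
    name: percentage(count, TRIALS) for name, count in centre_groups.items()
}
increased_share = percentage(increased, centre_groups["changed"])
decreased_share = percentage(decreased, centre_groups["changed"])

print(group_shares)
print(increased_share, decreased_share)
```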

TABLE 25

Discrepancies reported between the proposed and published numbers of centres

Change in the number of centres plausibly reflects difficulties with recruitment. For those trials using fewer centres than planned, this may reflect difficulties in obtaining the support hoped for from centres. Changes in the number of centres may reduce generalisability if the centres lost or gained were unrepresentative.

Question T2.15: number of primary outcomes

Two hundred and one planned primary outcomes were reported from 122 trials (no data for one trial) (mean 1.65 and median 1 planned primary outcome per trial). Two hundred and twenty-eight primary outcomes were reported in the monographs (mean 1.85 and median 1 primary outcome per trial) (Table 26). Ninety-five trials (78%) reported the planned number of primary outcomes. Eighty trials (65%) reported one primary outcome as planned. In 27 trials (22.1%, 27/122), discrepancies were reported between the proposed number of primary outcomes and that reported in the published monograph. Fourteen trials (52%, 14/27) increased the number of primary outcomes in the published monograph from the original proposed number; the other 13 (48%) reduced it. [The number of primary outcomes differs from that reported in Chapter 7 because the denominators of the analyses differ: this chapter reports on 123 trials whereas Chapter 7 reports on the full cohort (n = 125 trials).]

TABLE 26

Proposed number of primary outcomes compared with the number actually reported in the published monograph

Three trials reported in one monograph (trials ID128, ID129 and ID134) had proposed four primary outcomes but actually reported 15. Another trial (ID79) planned seven primary outcomes but reported only two in the monograph.

Other changes in the primary outcomes are explored in Chapter 7 regarding statistical analysis.

Questions T2.16 and T2.17: reporting and specifying the time points of primary outcomes

Out of the 123 clinical trials, 68 (55.3%) did and 55 (44.7%) did not specify the planned primary time point in the protocol or application form. Of the 55 that did not specify the planned primary time point in the protocol or application form, 36 (65.5%) did not report the actual primary time point in the published monograph. Five trials (4.1%) reported discrepancies between the proposed and published reporting of the primary time point. In two, the proposed time point was 12 months, yet in the published monograph two primary time points (12 and 24 months, and 4 and 12 months, respectively) were reported (Table 27).

TABLE 27

Primary outcome time-point data

Analysis

Adherence to those sections of the CONSORT checklist that were examined was fairly high, but with some exceptions, including lack of detail on interventions, prespecified outcomes and sample size calculation. About one-third of trials failed on each of these. This was a greater problem for older trials.

A high proportion of trials (88.6%, 109/123) were designed as superiority trials, and most (87%, 107/123) had parallel arms. Almost all the rest were equivalence or non-inferiority trials. Almost half of all trials conducted pilots but few had feasibility studies.

Around half of all interventions were substitutes for standard care and about one-quarter were add-ons. More than half of controls (53%, 65/123) were standard care, but one-third of controls (35%, 43/123) could not be classified.

Both the Chalmers and HRCS classification systems could be applied. Although some categories in the latter were not relevant, the Chalmers system provided less detail. The more comprehensive HRCS system³⁰ should probably be used in future.

Most trials were conducted in line with protocol and followed both the study framework and the planned type of comparison. In six trials, the number of arms changed but it could not be ascertained if those changes had been agreed with the programme. These trials were all in the early years of the programme.

The number of primary outcomes changed in 27 trials (half increased and half decreased). Changes were mainly in early trials. The outcome used to plan the sample size (the most important primary outcome) was unchanged in 106 (82%) trials.

The time point at which the primary outcome was measured was not specified in 45% (55/123) of proposals and 40% (49/123) of monograph reports.

On average, trials needed about twice as many centres as planned to complete the study, reflecting the difficulties in recruiting.

Discussion

Strengths and weaknesses of the study

Overall, sufficient data existed for the trials to be assessed against a selection of core CONSORT criteria. Comparison of planned and reported analyses showed the HTA trials reporting more faithfully to protocol than the cohort examined by Chan et al.¹⁰

The HRCS classification of interventions of the trial proved slightly more comprehensive than Chalmers’ classification and should therefore be adopted. Further work is required on the classification of controls.

Recommendations for future work

Any such further work should include 14 questions, eight to remain as they are (T2.2, T2.3, T2.4, T2.5, T2.7, T2.8, T2.12 and T2.13) and six to be amended (T2.1, T2.6, T2.9, T2.11, T2.14 and T2.15).

Unanswered questions and future research

Should similar work be continued, the HTA programme might usefully clarify:

the extent to which it wishes to assess compliance with CONSORT
the importance it attaches to trials classifying the control group
how it wishes to classify cluster trials
the number of primary outcomes.

Copyright © Queen’s Printer and Controller of HMSO 2015. This work was produced by Raftery et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK274328
