Search Strategy
Literature searches were conducted in PubMed, Embase, the Cochrane Central Register of Controlled Trials and the Cochrane Database of Systematic Reviews on 16 February 2017. An additional search was conducted in the Cumulative Index to Nursing and Allied Health Literature on 4 April 2017. The searches yielded 11 196 citations. Additional manual searches for existing systematic reviews were conducted on the Cochrane website and at https://guidelines.gov/.
Independent duplicate screening of citations resulted in preliminary acceptance of 454 primary articles and 41 existing systematic reviews. After full-text assessment, 195 randomized controlled trials (RCTs) were considered eligible for one or more of the PICO questions; of these, 129 had been included in 19 existing systematic reviews (1–19). The original plan was to rely fully on the existing systematic reviews for study descriptions, results data and assessment of study methodological quality (risk of bias). However, the accessible data from the existing systematic reviews were generally too incomplete or too poorly reported to allow this approach; in addition, the systematic review team found many instances of incorrect data, or data that could not be found in the original study articles. Therefore, for the vast majority of primary studies from existing systematic reviews, the review team obtained data from the original publications.
Assessment of Study Quality and Methods of Review Synthesis
The methodological quality of each study was assessed with the Cochrane risk of bias tool. However, when existing systematic reviews provided study-level quality ratings, the systematic review team used those, regardless of the quality assessment method used. For the evidence profiles, the team conducted two additional steps to allow determination of overall risk of bias, consistent with GRADE methodology, as follows (20):
First, the overall quality of each RCT was determined.
If a study had a high risk of bias due to inadequate randomization or allocation concealment methodology, the study was deemed to have very serious limitations.
If the randomization and allocation concealment methodologies were at low risk of bias (or unclear owing to inadequate reporting) but the study did not mask outcome assessors, had a high attrition rate (or a high percentage of study participants not analysed), showed evidence of selective outcome reporting, or had another important potential bias, it was rated overall as having serious limitations.
However, if the study had two or more of these limitations, it was deemed to have very serious limitations.
Otherwise, studies were rated as having no serious limitations.
Studies could have different overall study quality assessments for different outcomes (e.g. if there was high attrition for only one outcome of interest).
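The per-study rules above form a simple decision procedure. The review team applied these judgements manually; the following is only an illustrative sketch of that logic, with hypothetical function and parameter names not drawn from the report.

```python
# Hypothetical sketch of the per-study (per-outcome) rating rules described
# above. The actual assessments were expert judgements, not code.

def rate_study(randomization: str, allocation: str,
               assessors_masked: bool, high_attrition: bool,
               selective_reporting: bool, other_bias: bool) -> str:
    """Return 'very serious', 'serious' or 'no serious' limitations.

    randomization / allocation take 'low', 'unclear' or 'high'
    (risk of bias for that domain).
    """
    # Inadequate randomization or allocation concealment alone is enough
    # for very serious limitations.
    if randomization == "high" or allocation == "high":
        return "very serious"
    # Count the remaining limitation types (unmasked assessors, high
    # attrition, selective reporting, other important bias).
    limitations = sum([not assessors_masked, high_attrition,
                       selective_reporting, other_bias])
    if limitations >= 2:
        return "very serious"
    if limitations == 1:
        return "serious"
    return "no serious"
```

Because attrition (for example) can differ by outcome, the same study can receive different ratings for different outcomes, as the text notes.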
Second, for each outcome within an evidence profile, the risks of bias of all studies were assessed together.
If more than half the studies (or the larger, dominant studies) were deemed to have very serious limitations, then the overall evidence base was also deemed to have very serious limitations.
If this was not the case, but more than half the studies (or the larger, dominant studies) were deemed to have serious (or very serious) limitations, then the overall evidence base was deemed to have serious limitations.
Otherwise the evidence base was deemed to have no serious limitations.
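The second step aggregates the per-study ratings into one rating per outcome. A minimal sketch of the majority rule follows; the "larger, dominant studies" weighting is a judgement call and is deliberately omitted here, so this simple-majority version is an assumption, not the report's exact procedure.

```python
# Illustrative aggregation of per-study ratings for one outcome, using a
# simple majority; the review team additionally weighted larger, dominant
# studies, which this sketch does not capture.

def rate_evidence_base(study_ratings: list[str]) -> str:
    """study_ratings holds 'very serious', 'serious' or 'no serious'."""
    n = len(study_ratings)
    very = sum(r == "very serious" for r in study_ratings)
    serious_or_worse = sum(r in ("serious", "very serious")
                           for r in study_ratings)
    if very > n / 2:
        return "very serious"
    if serious_or_worse > n / 2:
        return "serious"
    return "no serious"
```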
Study findings were assessed for consistency primarily of the direction of effect, with lesser emphasis on the magnitude of effect and minimal emphasis on differences in statistical significance. When meta-analysis was conducted, the statistical heterogeneity of the treatment effect was assessed with the statistical significance of the heterogeneity and the I-squared statistic. However, if the direction of effect was consistent across studies, heterogeneity of the actual effect size alone did not yield a determination of inconsistency.
Given the strict eligibility criteria, all eligible trials were deemed to be directly applicable to adults (or adolescents) with cancer pain; studies of non-applicable populations were not included. Consequently, assessment of indirectness was based primarily on whether the outcomes being assessed were directly relevant to the outcome of interest. The primary reasons for downgrading based on indirectness related to studies that assessed pain outcomes that were not full (or near-full) pain relief but only a decrease in pain scores (e.g. by 2 points out of 10). Some studies that included quality-of-life and functional outcome measures were also downgraded if the measurement tools were deemed inadequate. Ideally, such indirect outcomes or measures would not have been included but, where direct evidence was limited, the systematic review team included them.
The evidence was downgraded for imprecision based mostly on small sample size (for continuous outcomes) with an arbitrary total sample size (across arms and studies) of 300 as a threshold and, separately, wide confidence intervals in relation to the measure (or scale). However, if a small study provided a precise estimate, the evidence was not downgraded.
Other considerations were also noted. The main one applied where only a single study evaluated a given outcome for a given question: a single study’s estimate of an effect size requires corroboration before it can be considered adequate evidence for making a clinical decision with any confidence. However, if a study is large (i.e. well powered), rigorously conducted, and evaluates the outcome as a primary outcome, it may provide stronger evidence.
Where feasible, the systematic review team conducted meta-analyses of categorical and continuous data when there were at least two trials with the same comparisons. The systematic review team was liberal in what it allowed for meta-analysis, taking account of the nature of the review questions. The review team ignored cancer types or other differences in study populations and differences in follow-up durations. The team combined sets of interventions, such as all bisphosphonates or all opioids; it also ignored differences in doses, routes, strengths and other related factors. For categorical outcomes the review team mostly ignored differences in outcome definitions (such as pain relief being complete [“no pain”] or great [e.g. <3/10 on a visual analogue scale]). For categorical outcomes, the team calculated or meta-analysed the risk ratio (RR). The direction of the RR was determined by the outcome being assessed (i.e. for “good” outcomes – e.g. pain relief – higher RR favours the intervention over control; for “bad” outcomes – e.g. skeletal-related events (SREs) – lower RR favours the intervention). Absolute differences were based on meta-analysed risk ratios and meta-analysed control rates.
For continuous measures of pain, quality of life and functional outcomes, the systematic review team first converted the reported measures to uniform scales of 0 to 100. Following standard convention, for pain control 100 = worst pain, and for quality of life and functional outcomes 100 = best status. When necessary, reported scales were reversed to ensure uniform directionality. Other continuous outcomes (e.g. time) were meta-analysed only if comparable units could be used across studies (e.g. studies reporting pain relief in hours were not meta-analysed with studies reporting pain relief in days).
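The 0–100 rescaling with optional reversal is a simple linear transformation; a minimal sketch follows, assuming the reported scale's minimum and maximum are known (function and parameter names are illustrative).

```python
# Illustrative rescaling of a reported score to a uniform 0-100 scale.
# reverse=True flips directionality so that, per the convention above,
# pain runs 0 = no pain to 100 = worst pain, and quality-of-life or
# functional scores run 0 = worst to 100 = best status.

def to_0_100(value: float, scale_min: float, scale_max: float,
             reverse: bool = False) -> float:
    rescaled = (value - scale_min) / (scale_max - scale_min) * 100.0
    return 100.0 - rescaled if reverse else rescaled
```

For example, a pain score of 7 on a 0–10 scale maps to 70, and a quality-of-life score of 2 on a 1–5 scale where lower is better maps to 75 once reversed.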
Methods for the network meta-analyses for certain systematic review questions are discussed in Annex 7.