Systematic Reviews and Meta-Analyses

Suzanne West; Valerie King; Timothy S Carey; Kathleen N Lohr; Nikki McKoy; Sonya F Sutton; Linda Lux

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

West S, King V, Carey TS, et al. Systems to Rate the Strength Of Scientific Evidence. Rockville (MD): Agency for Healthcare Research and Quality (US); 2002 Apr. (Evidence Reports/Technology Assessments, No. 47.)

This publication is provided for historical reference only and the information may be out of date.

Literature Searches

Searches need to be comprehensive to ensure that all relevant studies are included in a systematic review. Searches that rely only on computerized databases such as MEDLINE are unlikely to find all relevant studies.¹⁴⁹ Two related issues are publication bias and the language and country of origin of studies.
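One way to see why a single database is rarely enough is to measure a search's recall (sensitivity) against a reference set of known relevant studies. The sketch below is illustrative only: the record IDs, the source names (`database_a`, `hand_search`, `reference_lists`), and the assumption that a complete reference set is known are all hypothetical. In a real review the full set of relevant studies is never known with certainty.

```python
# Hypothetical reference set standing in for "all relevant studies"
# (in practice this set is never fully known).
relevant = {"s01", "s02", "s03", "s04", "s05", "s06", "s07", "s08", "s09", "s10"}

# Hypothetical hits from different search sources; names are illustrative only.
sources = {
    "database_a": {"s01", "s02", "s03", "s04", "s05", "s06", "x91"},
    "hand_search": {"s05", "s07", "s08"},
    "reference_lists": {"s02", "s09", "x92"},
}


def recall(found: set, reference: set) -> float:
    """Fraction of the reference set that a search retrieved (its sensitivity)."""
    return len(found & reference) / len(reference)


if __name__ == "__main__":
    single = recall(sources["database_a"], relevant)
    # The union of all sources also collapses duplicate records found more than once.
    combined_hits = set().union(*sources.values())
    combined = recall(combined_hits, relevant)
    print(f"database only: {single:.0%}")   # database only: 60%
    print(f"all sources:   {combined:.0%}")  # all sources:   90%
```

Irrelevant hits (`x91`, `x92`) do not affect recall; they only add screening work, which is the usual trade-off of broader searching.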

Publication Bias

Publication bias refers to the phenomenon that "positive" studies (e.g., studies finding that a particular therapy works) are more likely to be published than "negative" studies (those that do not find the therapy effective); unpublished studies are, in turn, difficult to locate.¹⁵⁰⁻¹⁵² Studies funded by the pharmaceutical industry may be published less often than studies with other sources of funding, which is another form of publication bias.¹⁵¹ Thus, a systematic review or meta-analysis of only the published studies may be misleading, producing a more favorable summary estimate than would have been obtained had the entire body of literature, published and unpublished, been summarized.
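The mechanism can be made concrete with a small simulation. The sketch assumes a single true effect, normally distributed sampling error, and a crude selection rule under which only statistically significant favorable results are published; real publication processes are far less tidy, and every parameter value here is arbitrary. Pooling only the "published" studies gives a summary estimate noticeably larger than the true effect, while pooling every study recovers it.

```python
import random


def simulate(n_studies=2000, true_effect=0.1, seed=1):
    """Draw hypothetical studies around one true effect (larger = more benefit)."""
    rng = random.Random(seed)
    studies = []
    for _ in range(n_studies):
        se = rng.uniform(0.05, 0.5)            # small trials get large standard errors
        estimate = rng.gauss(true_effect, se)  # observed effect varies by chance alone
        studies.append((estimate, se))
    return studies


def pooled(studies):
    """Fixed-effect, inverse-variance weighted summary estimate."""
    weights = [1 / se**2 for _, se in studies]
    return sum(w * est for w, (est, _) in zip(weights, studies)) / sum(weights)


def published_only(studies, z_crit=1.96):
    """Assumed selection rule: only significant, favorable ("positive") results are published."""
    return [(est, se) for est, se in studies if est / se > z_crit]


if __name__ == "__main__":
    studies = simulate()
    published = published_only(studies)
    print("true effect:            0.100")
    print(f"pooled, all studies:    {pooled(studies):.3f}")
    print(f"pooled, published only: {pooled(published):.3f} ({len(published)} of {len(studies)})")
```

The seeded `random.Random` instance keeps the run reproducible; changing the seed changes the exact numbers but not the direction of the bias.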

Language and Country of Origin

For a variety of reasons, including cost and simplicity, searches are often restricted to English-language publications. Moher and colleagues found no significant differences in the completeness of reporting of key study elements between randomized controlled trials (RCTs) published in English and those published in other languages.¹⁵³ Another study by Moher et al.¹⁵⁴ found no evidence that language-restricted meta-analyses produced biased estimates of efficacy, although adding non-English RCTs did yield more precise estimates of effect.
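Why adding trials improves precision follows from inverse-variance pooling: each trial contributes a weight of 1/SE², and the pooled standard error is the square root of the reciprocal of the summed weights. The sketch below uses made-up log odds ratios for "English" and "non-English" trials (all values hypothetical) and a fixed-effect model, which is our simplifying assumption and not necessarily the model used in the studies cited.

```python
import math


def fixed_effect(estimates, ses):
    """Inverse-variance pooled estimate and its standard error."""
    weights = [1 / se**2 for se in ses]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, estimates)) / total
    return estimate, math.sqrt(1 / total)


# Hypothetical log odds ratios and standard errors.
english = ([-0.30, -0.10, -0.25], [0.15, 0.20, 0.25])
non_english = ([-0.20, -0.35], [0.20, 0.30])

if __name__ == "__main__":
    est_en, se_en = fixed_effect(*english)
    est_all, se_all = fixed_effect(english[0] + non_english[0], english[1] + non_english[1])
    print(f"English only: {est_en:.3f} (SE {se_en:.3f})")
    print(f"All trials:   {est_all:.3f} (SE {se_all:.3f})")
```

Under this fixed-effect model, adding any trial can only shrink the pooled standard error; whether the point estimate moves depends on the added results. Under a random-effects model the gain is not guaranteed, because added trials may also increase the estimated between-study variance.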

For at least some types of studies, the results appear to reflect where the study was conducted. Vickers et al. found that trials of acupuncture from China, Japan, Hong Kong, Taiwan, and USSR/Russia were positive in all but one case.¹⁵⁵ Studies of interventions other than acupuncture originating from these countries were also overwhelmingly likely to find a positive effect. Most experts believe that this pattern is a form of publication bias, as discussed above. However, it is not clear how a systematic review should handle a body of literature that contains studies from these countries. Our criterion in Table 7 specified that investigators who restrict their searches on the basis of language or country of origin should provide some justification for that decision.

Masking (Blinding) of Reviewers

Evidence is conflicting about whether masking quality-assessment reviewers to the authorship of a study minimizes bias in a systematic review. Jadad et al. found that quality scores were lower and more consistent when reviewers were masked,³⁴ but Moher et al. found that quality scores were higher with masked quality assessment.⁴¹ Two other methodological studies found that quality scores did not differ significantly between masked and open assessment,⁹⁵,¹⁵⁶ and a further study found no effect of reviewer masking on the summary measure of effect in meta-analysis.¹⁵⁷ Overall, we concluded that the evidence was insufficient to establish reviewer masking as a necessary, empirically supported quality element.

Quality Assessment

Some form of quality assessment of the individual studies included in a systematic review is needed; however, techniques for assessing study quality have not been well defined, and the studies addressing this issue provide conflicting evidence. Emerson and colleagues found that differences between treatments were related neither to quality scores on the Chalmers scale nor to individual quality components assessed separately.¹⁵⁸

A study by Juni and colleagues of quality assessment for RCTs comparing standard heparin with low-molecular-weight heparin (LMWH) for preventing postoperative deep vein thrombosis (DVT) provided evidence that quality assessment scales weight the components of quality differently.² They applied 25 different scales to each of the 17 RCTs in the meta-analysis and found that the summary relative risk differed from scale to scale, depending on whether the trials rated as high quality or as low quality by that scale were analyzed. Whether LMWH appeared superior to standard heparin depended on which quality scale was used and on the quality score assigned. Using meta-regression techniques, they also performed a component-only analysis focusing on randomization, allocation concealment, and handling of withdrawals, and found that these quality components were not significantly associated with treatment effect. However, masking of outcome assessment is a critical quality component when comparing LMWH and standard heparin, because tests to detect DVT are somewhat subjective.
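The scale dependence described above can be illustrated with a toy example. The sketch below uses six hypothetical trials, each scored on two hypothetical quality scales (`A` and `B`) that weight quality components differently, and pools only the trials each scale classifies as high quality, using an assumed cut-off of 6 points. None of the numbers come from the heparin trials; they are chosen only to show the mechanism.

```python
import math

# Hypothetical trials: (log relative risk, standard error, score on scale A, score on scale B).
# A negative log RR favors the experimental treatment.
TRIALS = [
    (-0.60, 0.25, 8, 3),
    (-0.45, 0.20, 7, 4),
    (-0.05, 0.15, 4, 9),
    (0.02, 0.18, 3, 8),
    (-0.30, 0.22, 6, 6),
    (-0.10, 0.30, 5, 7),
]
SCALE_COLUMN = {"A": 2, "B": 3}


def pooled_rr(rows):
    """Fixed-effect, inverse-variance pooled relative risk."""
    weights = [1 / row[1] ** 2 for row in rows]
    log_rr = sum(w * row[0] for w, row in zip(weights, rows)) / sum(weights)
    return math.exp(log_rr)


def high_quality(rows, scale, cutoff=6):
    """Trials that the given (hypothetical) scale rates at or above the assumed cut-off."""
    column = SCALE_COLUMN[scale]
    return [row for row in rows if row[column] >= cutoff]


if __name__ == "__main__":
    for scale in ("A", "B"):
        subset = high_quality(TRIALS, scale)
        print(f"scale {scale}: {len(subset)} high-quality trials, pooled RR {pooled_rr(subset):.2f}")
```

With these inputs, the trials that scale A rates highly suggest a clear benefit, whereas those that scale B rates highly suggest little or none, even though the underlying trials are identical.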

Khan and colleagues reported that lower-quality studies were more likely to find a positive effect of fertility treatment, whereas higher-quality studies did not.³⁵ An extensive methodological study by Moher et al. also found that meta-analyses using only low-quality RCTs produced significantly higher effect estimates than meta-analyses using only high-quality RCTs.⁴¹ On average, low-quality RCTs showed a 52% treatment benefit, whereas high-quality RCTs showed only a 29% benefit. Because it cuts across types of interventions and fields of medicine, Moher's study offers the strongest evidence on this topic.
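A comparison between low- and high-quality trials is often summarized as a ratio of odds ratios (ROR). The sketch below is a hedged illustration only: for the sake of arithmetic it assumes that a "benefit" of X% corresponds to a pooled odds ratio of 1 − X/100, which may not be how the original analysis defined benefit, and it treats the two pooled estimates as given rather than re-deriving them. The helper names are hypothetical.

```python
def ratio_of_odds_ratios(or_low_quality: float, or_high_quality: float) -> float:
    """ROR below 1 means low-quality trials report a larger apparent benefit
    (assuming odds ratios below 1 favor treatment)."""
    return or_low_quality / or_high_quality


def odds_ratio_from_benefit(benefit_pct: float) -> float:
    """Assumed mapping (ours, not the source's): an X% benefit read as an odds ratio of 1 - X/100."""
    return 1 - benefit_pct / 100


if __name__ == "__main__":
    or_low = odds_ratio_from_benefit(52)   # 0.48 under the assumed mapping
    or_high = odds_ratio_from_benefit(29)  # 0.71 under the assumed mapping
    ror = ratio_of_odds_ratios(or_low, or_high)
    print(f"ROR = {ror:.2f}")  # ROR = 0.68
```

Under this reading, the pooled odds ratio from low-quality trials is roughly a third lower than that from high-quality trials, which is the sense in which lower quality exaggerates apparent benefit.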

Although no one scale is likely to provide the best quality assessment in all cases, some aspects of study design, conduct, and analysis are related to study bias, and these quality items should be assessed as part of conducting a systematic review or meta-analysis. We acknowledge, however, that most of the empirical evidence supporting these quality components comes from the RCT literature, some of which was addressed in our discussion above and more of which is presented in the following section on empirical evidence relating to RCTs.

Heterogeneity

One reason that apparently similar studies do not reach similar results is heterogeneity among them. Heterogeneity refers to differences in estimates of effect that are related to particular characteristics of the population or intervention studied. Thompson evaluated meta-analyses of cardiac and cancer outcomes and studies of cholesterol lowering.¹⁵⁹ He found that the conclusions of meta-analyses might differ if heterogeneity (due to such factors as the age of study participants or the duration of treatment) is not considered. This finding supports what has long been considered good practice for systematic reviews: the similarities and differences among studies should be carefully assessed before the studies are combined in a systematic review or meta-analysis. Statistical pooling of study results using meta-analytic techniques may not be advisable when substantial heterogeneity is present, but heterogeneity may provide important clues to treatment variation among subgroups of the population.¹⁵⁷
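A careful assessment of heterogeneity usually pairs clinical judgment with a statistical check. The sketch below computes Cochran's Q, the I² statistic, and the DerSimonian-Laird estimate of between-study variance (τ²) for a set of study estimates. These are common choices made here for illustration, not methods prescribed by this report, and the input values are hypothetical.

```python
def heterogeneity(estimates, ses):
    """Cochran's Q, I^2, and DerSimonian-Laird tau^2, plus fixed- and random-effects means."""
    weights = [1 / se**2 for se in ses]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / total_w
    # Q: inverse-variance weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of total variation attributed to between-study differences (floored at 0).
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird method-of-moments estimator of the between-study variance.
    c = total_w - sum(w**2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's within-study variance.
    re_weights = [1 / (se**2 + tau2) for se in ses]
    random_effects = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    return {"Q": q, "df": df, "I2": i2, "tau2": tau2, "fixed": fixed, "random": random_effects}


if __name__ == "__main__":
    # Hypothetical log odds ratios from four studies that disagree markedly.
    result = heterogeneity([-0.8, -0.1, 0.3, -0.5], [0.1, 0.1, 0.1, 0.1])
    for key, value in result.items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

With these inputs I² is about 0.96, meaning almost all of the observed variation reflects differences between studies rather than chance; in that situation exploring the sources of heterogeneity is usually more informative than reporting a single pooled number.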

Funding and Sponsorship

We found sufficient empirical evidence that the funding and sponsorship of systematic reviews are related to the reporting of treatment effect. Barnes and Bero reported that systematic reviews of observational studies of the effects of passive exposure to tobacco smoke were more likely to conclude that there was no adverse health effect if their authors had affiliations with the tobacco industry.³ A similar study by Stelfox and colleagues found that authors with financial ties to the pharmaceutical industry were significantly more likely to endorse the safety of calcium channel blockers.¹¹⁰ Nevertheless, we do not hold that the results of studies whose authors received support from non-government sources are inherently biased. Rather, we believe the important question is whether the authors of a study have competing interests sufficient to bias its results; financial relationships are clearly only one such potential competing interest.
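Studies of this kind typically cross-tabulate author affiliation against a review's conclusion and summarize the association as an odds ratio. The sketch below shows that calculation with a Woolf (log-scale) 95% confidence interval; the counts are invented for illustration and do not reproduce the Barnes and Bero or Stelfox data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: affiliated authors, conclusion favorable to the sponsor
    b: affiliated authors, conclusion not favorable to the sponsor
    c: unaffiliated authors, conclusion favorable to the sponsor
    d: unaffiliated authors, conclusion not favorable to the sponsor
    Assumes no zero cells (otherwise a continuity correction would be needed).
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts of reviews, for illustration only.
    or_, lo, hi = odds_ratio_ci(a=20, b=5, c=15, d=45)
    print(f"OR = {or_:.1f} (95% CI {lo:.1f} to {hi:.1f})")
```

A confidence interval that excludes 1 indicates an association in this hypothetical table; as the text stresses, such an association does not by itself show that any particular affiliated review is biased.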

Bookshelf ID: NBK33871
