NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Nelson HD, Fu R, Humphrey L, et al. Comparative Effectiveness of Medications To Reduce Risk of Primary Breast Cancer in Women [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2009 Sep. (AHRQ Comparative Effectiveness Reviews, No. 17.)
This publication is provided for historical reference only and the information may be out of date.
Appendix C-1. Quality Rating Criteria* and Applicability Assessment with PICOTS
Quality Rating Criteria
Randomized Controlled Trials (RCTs) and Cohort Studies
Criteria:
- Initial assembly of comparable groups: RCTs—adequate randomization, including concealment and whether potential confounders were distributed equally among groups; cohort studies—consideration of potential confounders with either restriction or measurement for adjustment in the analysis; consideration of inception cohorts
- Maintenance of comparable groups (includes attrition, cross-overs, adherence, contamination)
- Important differential loss to follow-up or overall high loss to follow-up
- Measurements: equal, reliable, and valid (includes masking of outcome assessment)
- Clear definition of interventions
- Important outcomes considered
- Analysis: adjustment for potential confounders for cohort studies, or intention-to-treat analysis for RCTs; for cluster RCTs, correction for correlation coefficient
Definition of ratings based on above criteria:
Good: Meets all criteria: Comparable groups are assembled initially and maintained throughout the study (follow-up at least 80 percent); reliable and valid measurement instruments are used and applied equally to the groups; interventions are spelled out clearly; important outcomes are considered; and appropriate attention to confounders in analysis.
Fair: Studies will be graded “fair” if any or all of the following problems occur, without the important limitations noted in the “poor” category below: Generally comparable groups are assembled initially but some question remains whether some (although not major) differences occurred in follow-up; measurement instruments are acceptable (although not the best) and generally applied equally; some but not all important outcomes are considered; and some but not all potential confounders are accounted for.
Poor: Studies will be graded “poor” if any of the following major limitations exists: Groups assembled initially are not close to being comparable or maintained throughout the study; unreliable or invalid measurement instruments are used or not applied at all equally among groups (including not masking outcome assessment); and key confounders are given little or no attention.
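The three rating definitions above can be read as a simple decision rule. The following is a minimal sketch of that reading, not the reviewers' actual procedure, which relies on judgment. All names (`StudyAssessment`, `rate_study`, and every field) are hypothetical and introduced only for illustration; the 80 percent follow-up threshold is taken from the "Good" definition.

```python
from dataclasses import dataclass


@dataclass
class StudyAssessment:
    """Reviewer judgments for one RCT or cohort study (hypothetical field names)."""
    groups_comparable_at_baseline: bool
    groups_maintained: bool
    follow_up_fraction: float  # e.g. 0.85 means 85 percent follow-up
    measurements_reliable_and_equal: bool
    interventions_clearly_defined: bool
    important_outcomes_considered: bool
    confounders_addressed: bool
    # Flags for the major limitations listed under "Poor"
    groups_far_from_comparable: bool = False
    measurements_invalid_or_unequal: bool = False
    key_confounders_ignored: bool = False


def rate_study(a: StudyAssessment) -> str:
    """Return 'Good', 'Fair', or 'Poor' under the simplified reading above."""
    # Any single major limitation is enough for a "Poor" rating.
    if (a.groups_far_from_comparable
            or a.measurements_invalid_or_unequal
            or a.key_confounders_ignored):
        return "Poor"
    # "Good" requires every criterion, including at least 80 percent follow-up.
    meets_all = (
        a.groups_comparable_at_baseline
        and a.groups_maintained
        and a.follow_up_fraction >= 0.80
        and a.measurements_reliable_and_equal
        and a.interventions_clearly_defined
        and a.important_outcomes_considered
        and a.confounders_addressed
    )
    # Anything in between falls into the "Fair" category.
    return "Good" if meets_all else "Fair"


# Hypothetical study: all criteria met, 98.5 percent follow-up.
example = StudyAssessment(True, True, 0.985, True, True, True, True)
print(rate_study(example))  # Good
```

In this sketch "Fair" is the residual category, which mirrors the wording that fair studies have some problems but none of the major limitations listed under "poor".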
Studies of Risk Assessment Tools
Adapted from the United States Preventive Services Task Force Quality Rating Criteria for Diagnostic Accuracy Studies
Criteria:
- Risk assessment tool appropriate for a primary care screening tool
- Tool evaluates diagnostic test performance in a population other than the one used to derive the instrument
- Study evaluates a consecutive clinical series of patients or a random subset
- Study adequately describes the population in which the risk instrument was tested
- Study adequately describes the instrument evaluated
- Study includes appropriate criteria in the instrument (must include age, family history and/or some other measure of risk)
- Study adequately describes the method used to calculate the risk index
- Study uses appropriate criterion to assess the risk factors (uses either a validated questionnaire or other corroborated method)
- Study evaluates outcomes or the reference standard in all patients enrolled (up to 20% loss considered acceptable)
- Follow up with standard diagnostic testing (mammogram/biopsy/pathology) performed consistently without regard for the results of the risk assessment
- Study evaluates outcomes blinded to results of the screening instrument
Definition of ratings based on above criteria:
Good: Evaluates relevant screening test appropriate for primary care setting; risk instrument is validated in a population other than the one used to derive the instrument; risk instrument adequately described; uses an appropriate reference standard (e.g., SEER data); handles indeterminate results in a reasonable manner; broad spectrum of patients and adequate number of incident cases; use of primary data; appropriate duration of follow up and standardized diagnostic screening in follow up (mammogram).
Fair: Evaluates relevant available screening test; moderate sample size; medium spectrum of patients; risk instrument not validated in a population other than the one used to derive the instrument; handling of indeterminate results not reported or inadequate; inadequate follow up - either inadequate duration or inconsistent use of standardized diagnostic screening (mammogram); instrument not derived from primary data.
Poor: Has important limitations such as inappropriate reference standard, very small sample size, very narrow spectrum of patients; not appropriate for primary care.
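One criterion in the list above is numeric: outcomes or the reference standard should be evaluated in all enrolled patients, with up to 20% loss considered acceptable. A minimal sketch of that check follows; the function name and parameters are hypothetical, and integer arithmetic is used so that exactly 20% loss counts as acceptable.

```python
def follow_up_acceptable(enrolled: int, evaluated: int, max_loss_percent: int = 20) -> bool:
    """Return True if loss to follow-up is at most max_loss_percent of those enrolled."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    if not 0 <= evaluated <= enrolled:
        raise ValueError("evaluated must be between 0 and enrolled")
    lost = enrolled - evaluated
    # Compare lost/enrolled <= max_loss_percent/100 without floating point.
    return lost * 100 <= max_loss_percent * enrolled


# Hypothetical cohort: 1,000 enrolled, 800 with outcomes evaluated (20% loss).
print(follow_up_acceptable(1000, 800))  # True
```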
Applicability Assessment with PICOTS: Limitations that Reduce Applicability
Population:
- Narrow eligibility criteria and/or high exclusion rate.
- Large differences between demographics of study population and that of patients in the community.
- Narrow or unrepresentative severity or stage of illness.
- Run-in period with high exclusion rate for non-adherence or side effects.
- Event rates much higher or lower than observed in population-based studies.
- Study size too small to represent the population of interest.
Intervention:
- Doses or schedules not reflected in current practice.
- Intensity of behavioral interventions that is not likely to be feasible for routine use.
- Co-interventions that are likely to modify effectiveness of therapy.
- Monitoring practices or visit frequency not used in typical practice.
- Highly selected intervention team or level of training/proficiency not widely available.
Comparator:
- Inadequate dose of comparison therapy.
- Use of sub-standard alternative therapy.
Outcomes:
- Surrogate rather than clinical outcomes.
- Failure to measure most important outcomes.
- Failure to distinguish minor from serious adverse effects.
Timing of Outcomes Measurement:
- Follow-up too short to detect important benefits or harms.
- Lack of long-term follow-up for interventions requiring long-term interventions.
Setting:
- Settings where standards of care differ markedly from setting of interest.
- Specialty population or level of care that differs importantly from that seen in primary care.
Appendix C-2. EPC GRADE Domains and Definitions for Assessing the Strength of Evidence
Domain | Definition and Elements | Score and Application |
---|---|---|
Risk of Bias | Risk of bias is the degree to which the included studies for a given outcome or comparison have a high likelihood of adequate protection against bias (i.e., good internal validity), assessed through two main elements: | Use one of three levels of aggregate risk of bias: |
Consistency | The principal definition of consistency is the degree to which reported effect sizes from included studies appear to have the same direction of effect. This can be assessed through two main elements: | Use one of three levels of consistency: |
Directness | The rating of directness relates to whether the evidence links the interventions directly to health outcomes. For a comparison of two treatments, directness implies that head-to-head trials measure the most important health or ultimate outcomes. Two types of directness, which can coexist, may be of concern: Evidence is indirect if: Directness may be contingent on the outcomes of interest. EPC authors are expected to make clear the outcomes involved when assessing this domain. | Score dichotomously as one of two levels of directness: |
Precision | Precision is the degree of certainty surrounding an effect estimate with respect to a given outcome (i.e., for each outcome separately). If a meta-analysis was performed, this will be the confidence interval around the summary effect size. | Score dichotomously as one of two levels of precision: |
Printed from: Lohr K, Helfand M, Owens D, et al. Grading the strength of a body of evidence. J Clin Epidemiol in press. [PubMed: 19595577]
Appendix C-3. EPC GRADE Criteria for Assigning Strength of Evidence
Grade | Definition |
---|---|
High | High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect. |
Moderate | Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate. |
Low | Low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and is likely to change the estimate. |
Insufficient | Evidence either is unavailable or does not permit estimation of an effect. |
Printed from: Lohr K, Helfand M, Owens D, et al. Grading the strength of a body of evidence. J Clin Epidemiol in press. [PubMed: 19595577]
Appendix C-4. Optional EPC GRADE Domains and Definitions for Assessing the Strength of Evidence
Domain | Definition and Elements | Score and Application | Explanation of Non-use in Report |
---|---|---|---|
Coherence | Coherence is the degree of plausibility of results in relation to epidemiology or, in some cases, biology and pathophysiology. | This additional domain does not need to be described or noted unless something “implausible” has emerged, in which case EPC authors should comment on it. Use one of two levels: | No “implausible” findings emerged in this report. |
Dose-response association | This association, either across or within studies, refers to a pattern of a larger effect with greater exposure (dose, duration, adherence) | This additional domain should be rated if studies in the evidence base have noted levels of exposure. Use one of three levels: | No multiple dose effects were tested in the trials included in this report. |
Impact of plausible residual confounders | Occasionally, in an observational study, residual confounders would work in the direction opposite that of the observed effect. A case in point is when a study is biased against finding an effect and yet it finds an effect. Thus, had these confounders not been present, the observed effect would have been even larger than the one observed. | This additional domain should be considered if a plausible impact of residual confounding exists. Use one of three levels: | Few observational studies were included and had little impact in the GRADE table. |
Strength of association (magnitude of effect) | Strength of association refers to the likelihood that the observed effect is large enough that it cannot have occurred solely as a result of bias from potential confounding factors. | This additional domain should be considered if the effect size is particularly large. Use one of two levels: | Effect sizes were not particularly large and came from well-designed RCTs. |
Publication bias | Publication bias indicates that studies may have been published selectively with the result that the estimated effect of an intervention based on published studies does not reflect the true effect. The finding that only a small proportion of relevant trials (or other studies) has been published or reported in a results database may indicate a higher risk of publication bias, which in turn may undermine the overall robustness of a body of evidence. | Publication bias need not be formally scored. However, it can influence ratings of consistency, precision, magnitude of effect (and, to a lesser degree, risk of bias and directness). If EPCs identify unpublished trials, and if those results differ from those of published studies, they can take these factors into account in their rating for consistency and in calculating a summary confidence interval for an effect. We encourage authors to comment on publication bias when circumstances suggest that relevant empirical findings, particularly negative or no-difference findings, have not been published or are not otherwise available. | No unpublished trials identified. Only very large, well known trials could provide the breast cancer outcomes needed for this report. |
Printed from: Lohr K, Helfand M, Owens D, et al. Grading the strength of a body of evidence. J Clin Epidemiol in press. [PubMed: 19595577]
Appendix C-5. Quality and Applicability Ratings of Included Trials
Columns 2 through 10 give the criteria for quality and the resulting rating/limitations; columns 11 through 17 give the criteria for applicability and the resulting rating.
Trial, author, year | Adequate randomization? | Blinding? | Maintenance of comparable groups? | Loss to follow-up? | Measures equal, reliable, valid? | Clear definition of interventions? | Important outcomes considered? | Intention-to-treat analysis? | Quality rating/limitations | Population | Intervention | Comparator | Outcomes | Timing of outcomes measures | Setting | Applicability rating |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Primary Prevention Trials | ||||||||||||||||
STAR Vogel, 2006 [12] | Method not described | Yes | 68% tamoxifen, 72% raloxifene completed study | 1.5% loss tamoxifen; 1.3% raloxifene | Yes | Yes | Yes | Yes | Good | Increased risk for breast cancer; broad inclusion criteria | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center, relevant to primary care | Good |
IBIS Cuzick, 2002 [19] | Yes | Yes | 64% tamoxifen, 74% placebo completed study (p<0.001); 25% completed 5 yrs | NR; assume all included in analysis | Yes | Yes | Yes | Yes | Fair; 40% estrogen use may confound | Increased risk for breast cancer; broad inclusion criteria | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center, relevant to primary care | Good |
NSABP P-1 Fisher, 1998 [24] | Yes | Yes | 76% tamoxifen, 80% placebo completed study | 1.6% loss in both groups | Yes | Yes | Yes | Yes | Good | Increased risk for breast cancer; broad inclusion criteria | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center, relevant to primary care | Good |
Royal Marsden Powles, 1998 [25] | Yes | Yes | 53% tamoxifen, 63% placebo completed study (p<0.0005) | 11% loss in both groups | Yes | Yes | Yes | Yes | Fair; unequal use of estrogen in groups | Increased risk for breast cancer; broad inclusion criteria | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center, relevant to primary care | Good |
Italian Veronesi, 1998 [28] | Method not described | Yes | 69% tamoxifen, 73% placebo completed study | <1% loss overall | Yes | Yes | Yes | Yes | Fair; hysterectomy, estrogen use may confound | Increased risk for breast cancer; prior hysterectomy | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center, relevant to primary care | Fair; women in study have hysterectomy modifying risk |
RUTH Barrett-Connor, 2006 [46] | Yes | Yes | 80% raloxifene, 79% placebo completed study | NR; assume all included in analysis | Yes | Yes | Yes | Yes | Good | Heart disease or increased heart risk | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center, relevant to primary care | Good |
MORE Cummings, 1999 [34] | Yes | Yes | 78% raloxifene, 75% placebo completed study | NR; assume all included in analysis | Yes | Yes | Yes | Yes | Good | Osteoporosis | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center, relevant to primary care | Good |
LIFT Cummings, 2008 [10]; Ettinger, 2008 [87] | Yes | Yes | 91% overall received 80% of doses | NR; assume all included in analysis | Yes | Yes | Yes | Yes | Good | Osteoporosis | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center, relevant to primary care | Good |
Raloxifene Trials | ||||||||||||||||
Cohen, 2000* [73] | Yes | Yes | Yes | 35% discontinued therapy | Yes | Yes | Yes but not all harms are reported | NR | Fair | Healthy women, average risk | Appropriate | Appropriate | Appropriate | Appropriate | 2 multi-center trials | Fair |
Delmas, 1997 [74] | Yes | NR | Yes | NR | Yes | Yes | Yes but not all harms are reported | Yes | Fair | Healthy women | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center; no US sites | Poor |
Goldstein, 2005 [76] | Yes | Yes | Yes | 40% discontinued therapy | Yes | Yes | Yes but not all harms are reported | Yes | Fair | Healthy women with prior hysterectomy | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center trial; includes US sites | Fair |
Johnston, 2000* [77] | Yes | Yes | Yes | 23–42% | Yes | Yes | Yes but not all harms are reported | Yes | Fair | Healthy women | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center trial; includes US sites | Fair |
Jolly, 2003* [78] | Yes | No | Yes | NR | Yes | Yes | Yes but not all harms are reported | No | Poor; only includes those continuing therapy | Healthy women | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center; includes US sites | Fair |
Lufkin, 1998† [79] | Yes | Yes | NR | ~10% | Yes | Yes | Yes but not all harms are reported | Yes | Fair | Osteoporosis | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center | Fair |
McClung, 2006 [80] | Yes | Yes | NR | ~30% | Yes | Yes | Yes but not all harms are reported | NR | Fair | Healthy | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center; includes US sites | Fair |
Meunier, 1999 [81] | Yes | Yes | Yes | ~16% | Yes | Yes | Yes but not all harms are reported | Yes | Fair | Osteoporosis | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center; France | Poor |
Morii, 2003 [82] | Yes | Yes | Yes | ~15% | Yes | Yes | Yes but not all harms are reported | NR | Fair | Japan; osteoporosis; narrow inclusion criteria | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center; Japan | Poor |
Nickelson, 1999† [83] | NR | Yes | Yes | 9.1% discontinued | Yes | Yes | Yes but not all harms are reported | Yes | Fair | Osteoporosis | Appropriate | Appropriate | Appropriate | Appropriate | 2 centers; US | Fair |
Palacios, 2004 [84] | Yes | Yes | Yes | 11–13% | Yes | Yes | Yes but not all harms are reported | Yes | Fair | Healthy women | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center; no US sites | Poor |
Walsh, 1998 [85] | Yes | Yes | Yes | 16% | Yes | Yes | Yes but not all harms are reported | Yes | Fair | Healthy women | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center; includes US sites | Fair |
Tibolone Trials | ||||||||||||||||
OPAL; Bots, 2001 [89]; Langer, 2006 [90] | Yes | Yes for treatment group; NR for other outcomes | Yes | No; 31% tx, 30% placebo | Yes | Yes | Yes | Yes | Fair | Healthy | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center; includes US sites | Fair |
Landgren, 2002 [91] | Yes | NR | Yes | No; 11% tx, 20% placebo | Yes | Yes | Yes | NR | Fair | Healthy; vasomotor symptoms | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center; no US sites | Poor |
Gallagher, 2001 [92] | Yes | Yes for treatment group; NR for other outcomes | Yes | No; 34% tx, 29% placebo | Yes | Yes | Yes | Yes | Fair | Healthy | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center; US | Fair |
Swanson, 2006 [93] | Yes | NR | Yes | No | Yes | Yes | Yes | Yes | Fair | Healthy; vasomotor symptoms | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center; US | Poor |
Hudita, 2003 [94] | NR | NR | Yes | No | Yes | Yes | Yes | No | Poor | Healthy; symptoms | Appropriate | Appropriate | Appropriate | Appropriate | 1 center; Romania | Poor |
Onalan, 2005 [96] | Yes | NR | NR | No; 18% tx, 9% placebo | Yes | Yes | Yes | No | Poor | Healthy | Appropriate | Appropriate | Appropriate | Appropriate | 1 center; Turkey | Poor |
Lundstrom, 2002 [95] | Yes | NR | Yes | No | Yes | Yes | Only breast density | No | Fair | Healthy | Appropriate | Appropriate | Appropriate | Appropriate | 1 center; Sweden | Poor |
Million Women Study Beral, 2003 [98]; Beral, 2005 [97] | NA | NA | NA | No | Yes | Yes | Yes | NA | Fair | Healthy; symptoms | Appropriate | Appropriate | Appropriate | Appropriate | Multi-center | Poor |
Appendix C-6. Quality of Risk Assessment Tools
Study | Primary care tool? | Tested in secondary population? | Population adequately described? | Instrument adequately described? | Appropriate criteria? | Risk calculation adequately described? | Results appropriately handled? | Reference standard? | Adequate sample size? | Adequate duration of follow up? | Quality rating |
---|---|---|---|---|---|---|---|---|---|---|---|
Gail, 1989 [49] | Yes | No* | Yes | Yes | Yes | Yes | Yes | No* | Yes | Yes | Good |
Costantino, 1999 [124] | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Good |
Rockhill, 2001 [122] | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Good |
Chlebowski, 2007 [125] | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Good |
Gail M, 2007 [126] | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Good |
Adams-Campbell, 2007 [127] | Yes | Yes | Yes | Yes | Yes | Yes | NR | Yes | Yes | Yes | Good |
DeCarli, 2006 [121] | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Good |
Boyle, 2004 [118] | Difficult† | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Fair |
Chen, 2006 [128] | Yes | No* | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Good |
Barlow, 2006 [129] | Yes | No* | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Fair |
Tice, 2008 [130] | Yes | No* | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Good |
Rockhill, 2003 [131] | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NR | Yes | Yes | Good |
Colditz, 2000 [119] | Yes | No* | Yes | Yes | Yes | Yes | Yes | NR | Yes | Yes | Good |
Colditz, 2004 [120] | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NR | Yes | Yes | Good |
Tyrer, 2004 [123] | Yes | No* | No* | Yes | No‡ | Yes | Yes | Yes | Yes | NR | Fair |
Amir, 2003 [132] | Yes | Yes | No§ | Yes | Yes | Yes | Yes | Yes | No | Yes | Fair |
- * Appropriate due to study purpose.
- † Logistically difficult due to an extensive dietary questionnaire.
- ‡ Tyrer, 2004 did not use primary data.
- § Amir, 2003 did not use a primary care population.
Footnotes
- * Reference: Harris RP, Helfand M, Woolf SH, et al. Current methods of the US Preventive Services Task Force: a review of the process. Am J Prev Med. 2001;20(3S):21–35.