Crandall CJ, Newberry SJ, Diamant A, et al. Treatment To Prevent Fractures in Men and Women With Low Bone Density or Osteoporosis: Update of a 2007 Report [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Mar. (Comparative Effectiveness Reviews, No. 53.)

This publication is provided for historical reference only and the information may be out of date.

Methods

Topic Development

The topic for the original report was nominated in a public process involving input from technical experts and the AHRQ Effective Health Care Program. For this update, a new technical expert panel reviewed the key questions that guided the original report and suggested modifications as well as the addition of a new question. After approval from AHRQ, these revised questions were posted to a public Web site to permit public comment. Comments were reviewed by the research team and the technical expert panel; although no changes were made to the questions (except to clarify the parameters of long-term treatment), the comments are addressed within this report.

Search Strategy

As described in the first report,14 we used a three-pronged approach to searching for relevant literature. First, we conducted three main searches. Our basic search strategy used the National Library of Medicine's Medical Subject Headings (MeSH) key word nomenclature developed for MEDLINE® and adapted for use in the other databases. Using the same basic search rules used for the original report (with the addition of several new terms for additional drugs), we searched MEDLINE® for the period from January 2005 to March 2011. We also searched Embase, the American College of Physicians (ACP) Journal Club database, the Cochrane controlled trials register, and relevant pharmacological databases. For the drugs not included in the original report, we also rescreened titles from the searches conducted for that report and mined references from articles identified in the update searches.

In searching for efficacy and effectiveness studies, we used terms for osteoporosis, osteopenia, low bone density, and the drugs listed in Key Question 1. In our search for the key adverse events (AE), we used terms for the AE and each of the drugs of interest. In our search for studies of adherence and persistence, we used terms for adherence and persistence and the drugs of interest. In all cases, both generic and trade names were used. In our search for studies on the effects of monitoring, we searched on terms related to monitoring and DXA in combination with the drugs of interest.

Searches for all KQ15 commenced from 2006. For new drugs, we reviewed the list of excluded studies from the original report to retrieve articles that had been rejected on the basis of drugs that were now included within the scope of the update, to find studies prior to 2006. The search was not limited to English-language publications and not limited by study design (e.g., reports of randomized controlled trials (RCT), observational studies, systematic reviews). The texts of the major search strategies are given in Appendix A.

To identify additional systematic reviews not captured in our primary search strategy, we also searched MEDLINE®, the Cochrane Database of Systematic Reviews, the websites of the National Institute for Clinical Excellence, and the NHS Health Technology Assessment Programme. We also manually searched the reference lists of review articles obtained as part of our search (“reference mining”).

To augment those searches, the EPC's Scientific Resource Center (SRC), which provides a variety of scientific support services for the comparative effectiveness reviews, conducted several “grey literature” searches for us. First, they conducted a search of relevant trials in the NIH Clinical Trials database. For completed clinical trials of interest, we noted any reported publications; if no publications were mentioned, we searched MEDLINE® for published results. All such publications were checked against the results of our MEDLINE® searches. Second, they searched the Web of Science to identify abstracts presented at relevant meetings; although we would not include meeting abstracts in the report, we identified relevant abstracts and searched MEDLINE® for peer-reviewed publications of the results. Finally, the SRC searched the FDA Medwatch and Health Canada files for warnings and changes in indications.

For the third prong of our approach, we identified any relevant systematic reviews that have appeared since the original report was released and added the pooled findings of new meta-analyses to the tables of pooled results created for the original report.

Study Eligibility Criteria

Populations: Studies were limited to those recruiting adults over 18 (not children); healthy adults, those with low bone density, or those with osteoporosis (but not those with Paget's disease, cancer, or any other disease of bone metabolism); those using drugs indicated for the treatment of osteoporosis (but not if the drugs were being used to treat cancer); adults who had low bone density or were at high risk of developing low bone density as a result of chronic use of glucocorticoids (GC) or a condition associated with the chronic use of glucocorticoids (such as asthma, organ transplant, rheumatoid arthritis); adults who had low bone density or were at high risk of developing low bone density as a result of having a condition associated with low bone density (e.g., rheumatoid arthritis, cystic fibrosis, Parkinson's disease).

Interventions: Studies were included if they examined pharmacological interventions for prevention or treatment of osteoporosis approved (or expected to be soon approved for use in the United States) or if they assessed the effects of calcium, vitamin D, or physical activity.

Comparators: Studies included for assessing effectiveness were those that compared the effects of the intervention in question to that of placebo or another potency or dosing schedule for the same agent or another agent in the same or another class.

Outcomes: For effectiveness analysis, only studies that assessed vertebral, hip, and/or total fractures (and did not state that they were not powered to detect a change in risk for fracture) were included. Studies that reported fracture as an adverse event were excluded from effectiveness analysis because the way that adverse events are typically ascertained does not ensure systematic identification of these events across or even within study groups; however, fractures reported as adverse events, for example atypical (low-stress subtrochanteric or femur) fractures, were included in the adverse event analysis.

Duration: Studies that had a minimum followup time of 6 months were included.

Design: Only RCTs and published systematic reviews of RCTs that met inclusion criteria were included in the assessment of effectiveness; however, for the assessment of effects in subgroups for which no RCTs were available, for the assessment of the effect of adherence on effectiveness, and for the assessment of particular serious adverse events, large (more than 1,000 participants) observational studies and systematic reviews were included.

Study Selection

Each title list was screened separately by two reviewers with clinical training and experience in systematic review to eliminate obviously irrelevant titles (e.g., a study pertaining to treatment of Paget's disease or a study of dietary calcium requirements in children). Abstracts were obtained for all selected titles. Full text articles were then obtained for all selected abstracts. The reviewers then conducted a second round of screening, using a specially designed screening form (Appendix B) to ascertain which articles met the inclusion criteria and would go on to data abstraction. Selections at this stage were reconciled, and disagreements were settled by consensus (with the project leaders resolving remaining disagreements).

During the second round of screening, we imposed inclusion criteria based on the particular key question(s) addressed by the study. For effectiveness/efficacy questions (KQ1, 2, and 5), we accepted any abstracts that indicated the manuscript might include information on the treatment/prevention of osteoporotic fracture (but not bone density alone). Controlled clinical trials and large observational studies (N>1,000) that reported fracture outcomes for one or more of the drugs of interest were accepted for the efficacy analysis and went on to data extraction.

For assessing comparative effectiveness, we included only studies that compared two or more interventions within the same study, rather than attempting to compare treatment effects across studies. The differences in study design and baseline participant characteristics between studies would make interpretation of such comparisons suspect.

For KQ2, we identified studies that analyzed treatment efficacy and effectiveness by subgroups in several different ways. First, during the initial screening of full-text articles, we noted any articles that reported the results of post hoc analyses of trial efficacy data by a subgroup of interest (e.g., age, sex, menopausal status, comorbidity such as prior or concurrent treatment with glucocorticoids, presence or absence of prevalent fractures, baseline T-score, lag time between hip fracture and treatment initiation). In some cases, these articles analyzed pooled data from multiple studies. Second, while extracting primary effectiveness results from clinical trial reports and large observational studies (over 1,000 participants), we assessed whether any subgroup analyses were reported and extracted those data separately. To ensure no subgroup analyses were missed, we rescreened all articles that included any subgroup of interest to assess whether data were reported for those particular subgroups. Finally, we sought observational studies of any size that assessed effects of the agents of interest in populations not well represented in controlled trials and included reports of post hoc analyses and open-label extensions of trials. As with the head-to-head comparisons for KQ1, we did not attempt to compare treatment effects across studies because of the vast baseline differences between populations in characteristics considered to be potentially important, such as average age, body mass index, and race/ethnicity.

For KQ3 (adherence), articles of any study design that reported rates of adherence/persistence, factors influencing adherence/persistence, or the effects of adherence on effectiveness for any of the drugs of interest were included for further evaluation.

For KQ4 (adverse events), any articles were accepted if they suggested that the manuscript included information on the relationship between the adverse event and the drug. Controlled clinical trials and large case control or cohort studies (n > 1,000) that reported fracture or BMD or markers of bone turnover for one or more of the drugs of interest and that reported one or more AE, as well as studies of any design that described any of a number of rare adverse events (e.g., osteonecrosis of the jaw, atrial fibrillation, low stress subtrochanteric and femur fracture) in association with any of the drugs of interest, were initially included in adverse event analyses.

For KQ5 (Effects of Monitoring and Long-term Use), to ensure we identified all articles that examined the effect of bone density monitoring in predicting treatment effectiveness or efficacy, we searched for these articles in the following ways. During the initial screening of articles, we included any clinical trials that reported fracture results and mentioned monitoring. We also included any trials that reported both BMD and fracture and subsequently assessed whether changes in BMD were compared to fracture outcomes. Where they existed, we also included reports of followups to trials included in the original report to assess the effect of long-term use.

Data Extraction

Using forms specially created for each study design, we extracted the following data. From included trials, we extracted study name (if named trial); setting (treatment and/or residential, e.g., long-term care facilities); population characteristics (including sex, age, race/ethnicity, diagnosis [osteoporosis/low bone density], comorbidities); eligibility and exclusion criteria; interventions (dose and duration); participant numbers screened, eligible, enrolled, and lost to followup; method and schedule of outcome ascertainment; description and adequacy of randomization and blinding; description and adequacy of concealment of allocation; funding source and role of funder; monitoring of adherence/persistence and cross-over; and results for each outcome. From observational studies, we extracted study name (if named trial); setting; population characteristics (including sex, age, ethnicity, diagnosis, comorbidities); eligibility and exclusion criteria; interventions (dose and duration); recruitment method; numbers screened, eligible, enrolled, and lost to followup; method and schedule of outcome or diagnosis ascertainment; funding source and role of funder; monitoring of adherence and contamination; method of adjustment for confounders; and results for each outcome. For studies of adherence, we extracted, in addition to the above, whether measures included adherence, compliance, and/or persistence; the method of assessment of adherence; barriers to adherence; and effects of adherence on fracture risk.

Data Synthesis

We performed three main analyses: one to evaluate efficacy and effectiveness, one to evaluate adherence, and one to evaluate adverse events. Comparisons of interest for all analyses were single drug versus placebo for each of the drugs of interest, and single drug versus single drug comparisons for drugs within the same class and across classes. In addition, we evaluated comparisons between estrogen combined with progesterone and placebo or single drugs. Studies that included either calcium or vitamin D in both study arms were classified as being comparisons between the other agents in each arm, e.g., alendronate plus calcium versus risedronate plus calcium would be classified as alendronate versus risedronate.

Efficacy and Effectiveness

The outcome of interest for assessing effectiveness for this report is fractures, based on FDA requirements. We report data about the following types of fractures (as reported in the studies reviewed): vertebral, nonvertebral, hip, wrist, and humerus. For each of the drug comparisons, we first summarized fracture data from published systematic reviews in tables. Data abstracted from individual controlled clinical trials were grouped by fracture type within each drug comparison of interest. Based on the recommendation of subject matter experts, we did not combine data on different types of fracture; hence we report findings for total fractures only if a study reported data on total fractures (likewise for nonvertebral fractures). The primary outcome for our analysis of effectiveness is the number of people who reported at least one fracture. Wherever possible, data were presented separately for subgroups of interest. We provide narrative descriptions of the outcomes of each study not included in a prior (published) meta-analysis in Chapter 3. The data relevant to each outcome are presented in individual tables and subsequently in an evidence table (Appendix C).

Adherence

The terms adherence and persistence are defined based on principles outlined by the International Society for Pharmacoeconomics and Outcomes Research (ISPOR).16 Adherence (or compliance) is defined as “the extent to which a patient acts in accordance with the prescribed interval and dose of a dosing regimen.” Although not specifically stated in the ISPOR definition, we view adherence to specific dosing instructions (which for bisphosphonates can affect both effectiveness and risk of adverse events) as an important component of adherence. Persistence is defined as “the duration of time from initiation to discontinuation of therapy.”16

Studies that included information on adherence and/or persistence of medications for osteoporosis, as indicated in the initial article screening, formed the basis for this section of the review. Each of these studies was reviewed by one investigator to determine which adherence key question is discussed. Observational studies went on to the adherence long form, collecting detailed information on how adherence was defined, assessed, and measured and what barriers or predictors were included in each study. The investigators also abstracted the rates of adherence and persistence from each study.

The randomized and controlled clinical trials contributed evidence to the adherence analysis but did not go on to an adherence long form. Conclusions about adherence and persistence in all randomized trials are severely limited for three reasons: (1) trials restrict their patient populations in several ways, which often creates a group of patients who would be more adherent to a medicine than the general population; (2) patients are, by definition, in a clinical trial and therefore receive added attention and information that is not commonly received by the general population; (3) patients in a clinical trial who would otherwise be termed nonadherent to their medications may instead simply drop out of the trial, and thus adherence rates reported in trials may not account for patient drop out from the study. We summarized the rates of adherence in clinical trials and included any trials that discussed adherence and fracture risk, but the clinical trials were not searched for information about barriers/predictors of adherence using the detailed adherence long form.

Systematic reviews on the topic of adherence/persistence with osteoporosis medications that were identified in the literature search were reviewed by an investigator, and the most recent and relevant reviews were qualitatively summarized. Because each of these reviews was limited to very specific populations and study types, we did not eliminate studies from our review of adherence simply because they were mentioned in the prior systematic reviews.

We collected adherence and persistence rates from the randomized trials and observational studies and reviewed them qualitatively, without any meta-analyses or pooling, because of the substantial heterogeneity in measurements and definitions of adherence in each study and population differences across studies.

Several methods of measuring adherence are used in the medical literature. Self-reported adherence is commonly used, although self-report measures suffer from recall bias and may overestimate adherence. Electronic devices can monitor medication adherence and are quite accurate but expensive. Pill counts are another method of measuring the amount of medication taken: Patients bring in their pill bottles, and study staff will count pills that are remaining; this method is limited in that the use of pills is assumed if not counted in the bottle, and the method can overestimate adherence and cannot give any information about timing or pattern of doses taken.17 Another commonly used method to measure adherence uses administrative databases from pharmacies or health plans to capture the amount of medication obtained by patients. These methods have the advantage of being objective and providing information over a large time span, but they are limited in that they include only what is in the database: If patients fill their prescriptions by mail, or at another pharmacy, or another health plan, or receive samples, these fills will not be captured. There are several different ways to measure adherence from these databases. Commonly used is the medication possession ratio (MPR), which is a ratio of the days of medication supplied divided by the days between the first fill and the last fill of the medication. Also measured are the proportion of days covered (PDC), for which pharmacy fills are used to determine what proportion of all days within a specified time period a patient had enough medication, and the percentage of doses taken as prescribed, which is the percentage of prescribed doses taken as directed by the patient during a specified time. 
Persistence, on the other hand, is typically measured either as a continuous variable and reported as the number of days on a medication until discontinuation or as a dichotomous variable, reporting the proportion of study subjects still on the medication after a period of time.
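
The two claims-based measures described above can be sketched in code. The following minimal Python example uses hypothetical fill records and function names (none of them come from the report). It makes two assumptions that the definitions above leave open: the supply from the last fill is excluded from the MPR numerator, and overlapping supply from early refills is counted only once in the PDC.

```python
from datetime import date, timedelta

# Hypothetical pharmacy fills: (fill date, days of medication supplied).
fills = [
    (date(2010, 1, 1), 30),
    (date(2010, 2, 5), 30),
    (date(2010, 3, 20), 30),
]


def medication_possession_ratio(fills):
    """Days supplied divided by the days between the first and last fill."""
    ordered = sorted(fills)
    first_fill, last_fill = ordered[0][0], ordered[-1][0]
    interval_days = (last_fill - first_fill).days
    if interval_days <= 0:
        raise ValueError("need at least two fills on different dates")
    # Assumption: the last fill's supply is excluded because it is
    # consumed after the measurement interval ends.
    days_supplied = sum(days for _, days in ordered[:-1])
    return days_supplied / interval_days


def proportion_of_days_covered(fills, period_start, period_end):
    """Share of days in the period on which the patient had medication on hand."""
    covered = set()
    for fill_date, days in fills:
        for offset in range(days):
            day = fill_date + timedelta(days=offset)
            if period_start <= day <= period_end:
                # Assumption: overlapping supply is counted once, not carried forward.
                covered.add(day)
    period_days = (period_end - period_start).days + 1
    return len(covered) / period_days


mpr = medication_possession_ratio(fills)
pdc = proportion_of_days_covered(fills, date(2010, 1, 1), date(2010, 3, 31))
print(f"MPR = {mpr:.3f}, PDC = {pdc:.3f}")
```

With these hypothetical fills, 60 days of supply cover a 78-day interval for the MPR, and 72 of the 90 days in the period are covered for the PDC. Other published variants of both measures exist, which is one source of the heterogeneity noted in this section.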

For those studies that provided information on the barriers and/or predictors to medication adherence in osteoporosis, we identified those barriers and predictors using the adherence long form and determined the number of studies discussing each factor and the characteristics of the study, including population characteristics, specifics on how adherence/persistence are measured, and funding source. For the analysis of adherence/persistence and fracture, we qualitatively review each of these studies and prior systematic reviews addressing this topic.

The methodologic quality of each article was assessed based on the study characteristics above, although there were no formal criteria or scales used for quality assessment of these articles. To our knowledge, there are no accepted quality metrics for grading the quality of adherence measurement. Many of these observational studies use prescription claims data in a retrospective fashion. As discussed above, these studies varied in their methods of analysis, study population, and outcome variables (adherence/persistence). The result is tremendous heterogeneity in these studies, so no attempt was made to combine these results into a meta-analysis, and our results are thus qualitative.

Adverse Events

Two main analyses were performed for adverse events: analyses to assess the relationship between a group of adverse events that were identified a priori as particularly relevant and exploratory analyses of all adverse events that were reported for any of the drugs. For the analyses of adverse events, we examined (where possible given the available data) comparisons of drug versus placebo, and comparisons of drug versus drug, for drugs within the same class and across classes.

A list of all unique adverse events that were reported in any of the studies was compiled, and a physician grouped adverse events into clinically sensible categories and subcategories, including a category for each of the adverse events that were identified a priori as being of interest. For groups of events that occurred in three or more trials, we performed an exact logistic regression meta-analysis to estimate the pooled OR and its associated 95% confidence interval. Given that many of the events were rare, we used exact conditional inference to perform the pooling rather than applying the usual asymptotic methods that assume normality. Asymptotic methods require corrections if zero events are observed; generally, half an event is added to all cells in the outcome-by-treatment (two-by-two) table in order to allow estimation, because these methods are based on assuming continuity. Such corrections can have a major impact on the results when the outcome event is rare. Exact methods do not require such corrections. We conducted the meta-analyses using the statistical software package StatXact Procs for SAS Users.18 For events that were reported in only one trial, an OR is calculated and reported.

Any significant OR greater than one indicates the odds of the adverse event associated with the bone density drug is larger than the odds associated with an adverse event among patients in the comparison group (placebo, vitamin D, estrogen, calcium, or other bone density drug). We note that if no events were observed in the comparison group, but events were observed in the intervention group, the OR is infinity (denoted in the tables as Inf+) and the associated confidence interval is bounded from below only. In such a case, we report the lower bound of the confidence interval.
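
To illustrate why the half-event correction matters for rare events, and how an infinite OR arises when the comparison group has no events, the following minimal Python sketch computes a crude OR from a single two-by-two table with and without the correction. The counts and the function name are hypothetical, and this is not the exact conditional method used for the pooled analyses in this report.

```python
import math


def odds_ratio(events_trt, n_trt, events_ctl, n_ctl, correction=0.0):
    """Crude odds ratio from one two-by-two table.

    `correction` is added to all four cells (0.5 is the usual
    continuity correction described in the text).
    """
    a = events_trt + correction            # events, intervention group
    b = n_trt - events_trt + correction    # non-events, intervention group
    c = events_ctl + correction            # events, comparison group
    d = n_ctl - events_ctl + correction    # non-events, comparison group
    if b * c == 0:
        # No events in the comparison group (or no non-events in the
        # intervention group): the OR is unbounded above.
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


# Hypothetical rare event: 3 of 1,000 versus 1 of 1,000 participants.
uncorrected = odds_ratio(3, 1000, 1, 1000)
corrected = odds_ratio(3, 1000, 1, 1000, correction=0.5)
print(f"uncorrected OR = {uncorrected:.2f}, corrected OR = {corrected:.2f}")

# Hypothetical table with no events in the comparison group.
print(odds_ratio(3, 1000, 0, 1000))
```

In this hypothetical table the correction moves the estimate from about 3.0 to about 2.3, although only four events were observed in total.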

Because the occurrence of adverse events was fairly rare, and zero events were often observed in at least one of the treatment groups, odds ratios (ORs) were calculated using the Peto method.19 When analyzing outcomes with rare events, the Peto method has been shown to give the least biased estimate.20 An OR with a value less than one indicates that the odds of having a fracture are lower in the intervention group than in the comparison group. Because fractures are rare events, the OR approximates the relative risk (RR) of fracture.
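
As an illustration of the Peto one-step method cited above, the sketch below pools hypothetical two-by-two tables using the standard Peto quantities (observed minus expected events and the hypergeometric variance). The function name, the input layout, and the counts are assumptions for illustration. It is not the authors' code.

```python
import math


def peto_odds_ratio(tables, z=1.96):
    """Pooled Peto OR with an approximate 95% confidence interval.

    `tables` is an iterable of (events_trt, n_trt, events_ctl, n_ctl).
    """
    sum_o_minus_e = 0.0
    sum_variance = 0.0
    for events_trt, n_trt, events_ctl, n_ctl in tables:
        n = n_trt + n_ctl                  # total participants
        m = events_trt + events_ctl        # total events
        expected = n_trt * m / n           # expected events in intervention group
        variance = n_trt * n_ctl * m * (n - m) / (n ** 2 * (n - 1))
        sum_o_minus_e += events_trt - expected
        sum_variance += variance
    if sum_variance == 0:
        raise ValueError("no events (or no non-events) in any table")
    log_or = sum_o_minus_e / sum_variance
    half_width = z / math.sqrt(sum_variance)
    return (
        math.exp(log_or),
        math.exp(log_or - half_width),
        math.exp(log_or + half_width),
    )


# Hypothetical single trial with no events in the comparison group.
or_hat, lower, upper = peto_odds_ratio([(3, 1000, 0, 1000)])
print(f"Peto OR = {or_hat:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Unlike the crude OR, the Peto estimate stays finite when one group has zero events, which is why no continuity correction is needed in that situation.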

Some adverse events are so rare that the relative risks may not accurately portray differences between active- and placebo-treated groups. Thus, we calculated the risk differences for each of the adverse event reports, which take into account the proportions of participants reporting the events.
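
The contrast between a relative and an absolute measure can be sketched as follows. The counts and function name are hypothetical, and the Wald-type confidence interval is an assumption for illustration rather than the interval method used in this report.

```python
import math


def risk_difference(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk difference (intervention minus comparison) with a Wald interval."""
    p_trt = events_trt / n_trt
    p_ctl = events_ctl / n_ctl
    rd = p_trt - p_ctl
    se = math.sqrt(p_trt * (1 - p_trt) / n_trt + p_ctl * (1 - p_ctl) / n_ctl)
    return rd, rd - z * se, rd + z * se


# Hypothetical rare event: 3 of 1,000 versus 1 of 1,000 participants.
rd, rd_lower, rd_upper = risk_difference(3, 1000, 1, 1000)
print(f"risk difference = {rd * 1000:.1f} per 1,000 participants")
```

In this hypothetical example the relative risk is 3, but the absolute difference is only 2 additional events per 1,000 participants.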

Quality Assessment

The methods used for quality assessment were determined by the design of included studies. The quality of RCTs was assessed using the Jadad scale, which was developed for drug trials and which we feel is well suited to the evaluation of quality in this report. The Jadad scale ranges from 0–5 based on points given for randomization, blinding, and accounting for withdrawals and dropouts (two points are awarded for randomization and two for double-blinding).21 Across a broad array of meta-analyses, an evaluation found that studies scoring 0–2 report exaggerated results compared with studies scoring 3–5.22 The latter have been called “good” quality and the former called “poor” quality. We also added an assessment of concealment of allocation.
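
The scoring and classification described above can be sketched as follows. This is a simplified illustration: the function names and the way points are passed in are assumptions, and the judgment about how many points a trial earns for each item is left to the reviewer.

```python
def jadad_score(randomization_points, blinding_points, withdrawals_described):
    """Total Jadad score (0 to 5) from the three components.

    Up to two points each for randomization and double-blinding, and one
    point for accounting for withdrawals and dropouts.
    """
    for points in (randomization_points, blinding_points):
        if points not in (0, 1, 2):
            raise ValueError("component points must be 0, 1, or 2")
    return randomization_points + blinding_points + (1 if withdrawals_described else 0)


def quality_label(score):
    """Scores of 3 to 5 are labeled "good"; scores of 0 to 2 are labeled "poor"."""
    if not 0 <= score <= 5:
        raise ValueError("Jadad score must be between 0 and 5")
    return "good" if score >= 3 else "poor"


# Hypothetical trial: adequate randomization, blinding mentioned but not
# described, withdrawals reported.
score = jadad_score(2, 1, True)
print(score, quality_label(score))
```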

The need to include observational studies was carefully assessed according to the guidelines presented in the Methods Reference Guide for Effectiveness and Comparative Effectiveness Reviews. Specifically, we assessed whether clinical trials provided sufficient data to reach conclusions and, where they did not, we included observational data. In practice, this meant we included observational data in two topic areas: adverse events and the assessment of adherence and outcomes. The quality of prospective cohort and case-control studies that reported rare adverse events of particular concern was assessed using relevant portions of the Newcastle-Ottawa Scales for cohort and for case-control studies.23 Items assessed for cohort studies included the following:

  • Are primary outcomes assessed using valid and reliable measures?
  • Are outcome measures implemented consistently across all study participants?
  • Were the important confounding and modifying variables taken into account in the design and analysis?
  • How was the nonexposed cohort selected?
  • How was exposure to drugs/exercise ascertained?
  • Was it demonstrated that the outcome of interest was not present at the start of the study?

Items assessed for case-control studies included the following:

  • Was the case definition adequate?
  • Were cases representative?
  • How were controls selected and defined?
  • On what basis were cases matched to controls?
  • How were outcomes assessed?
  • Was followup of adequate length?
  • What proportion of cases was followed up completely?

For observational studies of adherence, no standardized assessment of quality currently exists. The Newcastle-Ottawa Scale for observational cohorts does not apply to most of the adherence studies. Thus we abstracted and report objective factors for each study that might be related to both quality and generalizability, such as how adherence (outcome) was measured and size and location of study (generalizability); however, we did not apply particular scales to those studies that focused solely on adherence.

Applicability

As was done for the original report, we assessed the applicability of each included study based on the similarity of the target populations to those for which this report is intended. This assessment was separate from other quality assessments.

Although people may use the terms “efficacy” and “effectiveness” interchangeably when describing whether an intervention works, these terms have important differences both clinically and for policy. The fundamental distinction between efficacy and effectiveness studies lies in the populations enrolled and control over the intervention(s). Efficacy studies tend to be performed on referred patients and in specialty settings, and to exclude patients with comorbidities. Effectiveness studies are larger and more generalizable to practice. The efficacy of an intervention is the extent to which the treatment works under ideal circumstances, and the effectiveness of the intervention is the extent to which the treatment works on average patients in average settings.

Comparative Effectiveness Reviews (CERs) assess internal validity and external validity (e.g., applicability or generalizability) of included studies. Efficacy studies emphasize internal validity, whereas effectiveness studies emphasize applicability.

Ideally, effectiveness studies compare a new drug with viable alternatives rather than with placebos and produce health, quality-of-life, and economic outcomes data under real-world conditions. For example, an effectiveness trial of a new asthma drug would include asthma-related emergency room visits, the frequency and costs of physician visits, patients' quality of life, patient compliance with the medications, acquisition costs of the medications, and frequency and costs of short-term and long-term adverse events.24

Based on the method of Gartlehner et al.,25 the characteristics we used to distinguish efficacy from effectiveness, and therefore to rate applicability, were study setting, study population (stringency of eligibility criteria), duration and attempt to assess treatment compliance, health outcome assessment, adverse event assessment, sample size, and use of intention-to-treat analysis (see Appendix C).

In addition, it should be noted that the majority of studies included in our report are efficacy studies to the extent that they were large clinical trials. However, our analysis of adherence and persistence provides some information about effectiveness in that adherence and persistence influence effectiveness.

Rating the Body of Evidence

We assessed the overall strength of evidence for intervention effectiveness using guidance suggested by the U.S. Agency for Healthcare Research and Quality (AHRQ) for its Effective Health Care Program.26 This method is based on one developed by the GRADE Working Group,27 and classifies the grade of evidence according to the following criteria:

High = High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect.

Moderate = Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate.

Low = Low confidence that the evidence reflects the true effect. Further research is likely to change our confidence in the estimate of effect and is likely to change the estimate.

Insufficient = Evidence either is unavailable or does not permit a conclusion.

The evidence grade is based on four primary domains (required) and four optional domains. The required domains are risk of bias, consistency, directness, and precision; the additional domains are dose-response, plausible confounders that would decrease the observed effect, strength of association, and publication bias. A brief description of the required domains is displayed in Table 2 below. For this report, we used both this explicit scoring scheme and the global implicit judgment about “confidence” in the result. Where the two disagreed, we went with the lower classification.
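
The rule for reconciling the two judgments can be sketched as follows. The function name is hypothetical, and treating "Insufficient" as the lowest grade is an assumption made for this illustration.

```python
# Assumed ordering from weakest to strongest evidence grade.
GRADE_ORDER = ["Insufficient", "Low", "Moderate", "High"]


def reconcile_grades(explicit_grade, implicit_grade):
    """Return the lower of the explicit-scheme grade and the implicit judgment."""
    # GRADE_ORDER.index raises ValueError for an unrecognized grade.
    return min(explicit_grade, implicit_grade, key=GRADE_ORDER.index)


print(reconcile_grades("High", "Moderate"))
```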

Table 2. Grading the strength of a body of evidence: Required domains and their definitions.

Peer Review and Public Commentary

Experts on osteoporosis therapy and various stakeholder communities performed an external peer review of this CER. The AHRQ Effective Health Care Program Scientific Resource Center (SRC), located at Oregon Health & Science University (OHSU), oversaw the peer review process. Peer reviewers were charged with commenting on the content, structure, and format of the evidence report and encouraged to suggest any relevant studies we may have missed. We compiled all comments and addressed each one individually, revising the text as appropriate. AHRQ and the SRC also requested review from their own staff. The draft report was posted on the EHC website for public comment. We also requested review from each member of our Technical Expert Panel (TEP).
