Methods

Tatyana Shamliyan; Jean Wyman; Robert L Kane

This publication is provided for historical reference only and the information may be out of date.

Input From Stakeholders

We developed research questions and an analytic framework (Figure 1) after discussions with key informants and technical experts. Research questions for the systematic review were posted for public comment, based on which we identified interventions eligible for this review. Stakeholders recommended a focus on patient-centered outcomes and interventions most relevant for ambulatory care and not evaluated in previous systematic reviews. Stakeholders also recommended reviewing nonsurgical interventions relevant to women with refractory UI. Comprehensive information about all nonsurgical treatment choices can lead to evidence-based referral practices for women with refractory UI.

This figure presents a conceptual model and the analytical framework for the key questions within the context of the Population, Interventions, Comparators, Outcomes, and Settings. The framework includes eight headers: target population for question 1 as adult and elderly women with symptoms of UI, target population for question 2 as adult and elderly women diagnosed with UI, diagnostic methods for UI, pharmacological and nonpharmacological nonsurgical treatments for women with UI, intermediate outcomes such as pad test and frequency of UI episodes, and clinical outcomes such as continence and quality of life. In general, the figure illustrates how adult and elderly women with symptoms of UI may have diagnostic procedures with different diagnostic value and probability of accurate urodynamic and medical diagnosis of stress, urge, or mixed UI. Using different diagnostic methods may be indirectly associated with treatment outcomes. Different diagnostic methods may result in false-positive and false-negative results. The figure also illustrates how pharmacological or nonpharmacological treatments versus placebo, regular care, or an active control may result in intermediate outcomes (e.g., frequency and severity of incontinence) and clinical outcomes (e.g., continence or quality of life). Adverse events may occur at any point after treatment is received. Treatment effects may be modified by age, baseline cause for UI, type of incontinence, and baseline severity of incontinence. Rounded-corner rectangles provide information about intermediate outcomes, and square-corner rectangles contain patient-important clinical outcomes. Ovals represent the possibility of false results from diagnostic methods and harms after interventions. Arrows describe diagnostic tests and treatments. The dotted line describes the association between intermediate and clinical outcomes. The figure also gives information about the research questions.
Key Question 1: What constitutes an adequate diagnostic evaluation for women in the primary care setting on which to base treatment of urinary incontinence (UI)? Key Question 2: How effective is the pharmacologic treatment of UI in women? Key Question 3: How effective is the nonpharmacologic treatment of UI in women?

Figure 1

Analytic framework of diagnosis and comparative effectiveness of treatments for urinary incontinence (UI) in adult women.

Candidates to serve as key informants, technical experts, and peer reviewers were approved by the Task Order Officer from AHRQ after disclosure of conflicts of interest. The protocol was developed with input from the Technical Expert Panel.

Literature Search Strategy and Eligibility Criteria

Search Strategy

We sought studies from a wide variety of sources, including MEDLINE® via OVID and via PubMed®, the Cochrane Library, SCIRUS, Google Scholar, and manual searches of reference lists from systematic reviews, the proceedings of the ICS, and systematic reviews by the ICI. We also reviewed grey literature packets from the Scientific Resource Center (SRC) (Appendix Table A1). This search included regulatory documents and completed clinical trials. The regulatory documents included medical and statistical reviews from the U.S. FDA, Health Canada - Drug Monographs, and Authorized Medicines for the European Union - Scientific Discussions. We searched the Web site www.ClinicalTrials.gov on May 20, 2010, to find closed studies of urinary incontinence or overactive bladder. In addition, the following clinical trial registries were searched for completed trials related to the key questions: Current Controlled Trials (United Kingdom), Clinical Study Results (Pharmaceutical Research and Manufacturers of America), and World Health Organization Clinical Trials (International). Scopus and the Physical Education Index were searched for conference papers and abstracts related to UI. We identified ongoing studies in ClinicalTrials.gov and on the National Institutes of Health Research Portfolio Online Reporting Tools (RePORT) Web site (http://report.nih.gov/index.aspx).

The search strategies for the three research questions are described in Appendix A. Exact search strategies were developed through consultation with qualified librarians and guided by the SRC. We developed an a priori search strategy based on relevant medical subject headings (MeSH) terms, text words, and weighted word frequency algorithms to identify related articles. We documented each recommended, included, and excluded study in the master library. We identified studies published in English from 1990 until December 30, 2011.

Excluded references are shown in Appendix B. Our analysis of the results from ongoing studies is presented in Appendix C.

Eligibility

Three investigators independently determined the eligibility of the studies according to recommendations from the Cochrane Manual for Systematic Reviews.¹⁴⁷ The algorithm to define study eligibility was developed for each research question (Appendix Table D1). We followed the Comparative Effectiveness Manual to select evidence from controlled trials and observational studies.¹⁴⁸ We defined the target population, eligible independent and dependent variables, outcomes, time, and setting following the PICOS framework (Appendix Table D2). We formulated a list of eligible interventions following the discussion with key informants and technical experts, and after considering public comments (Appendix Table D3). We included nonsurgical, nonpharmacological treatments for UI. We included the drugs available in the United States for predominant stress UI (topical estrogens and antidepressants) and those approved by the FDA for overactive bladder (Appendix Table D4). We excluded systemic estrogens⁹ and selective estrogen receptor modulators¹²²,¹²³ that failed to prevent or improve UI. We included bulking agents and injectable neurotoxins to review all nonsurgical treatment options for women with refractory UI. We reviewed abstracts to exclude news, reviews, letters, comments, and case reports. Then we confirmed eligible target populations of adult women residing in the community.

Inclusion Criteria

Studies published in English after 1989.
Studies that examined eligible interventions of drug therapies or nonsurgical treatments for women with UI (Appendix D).
Studies that examined eligible outcomes of UI (total, mixed, stress, urgency), quality of life in women with UI, and harms of the treatments.

We included all RCTs, pooled individual patient data from RCTs, nonrandomized multicenter clinical trials, and observational studies that used strategies to reduce bias (adjustment, stratification, matching, or propensity scores).

For Key Question 1 we included studies that evaluated different diagnostic methods for UI in women that are applicable to ambulatory care settings. We applied criteria for assessing whether a body of study data was sufficient to answer the question of diagnostic methods.¹⁴⁹ We included any observational studies that reported true and false positive and negative cases, sensitivity, and specificity of diagnostic methods for different types of female UI.

For Key Questions 2 and 3 we defined efficacy and effectiveness trials following criteria from the CER manual.¹⁴⁹ We compared the results from observational studies and RCTs on positive clinical outcomes and harms.¹⁴⁹ We included randomized controlled trials (RCTs) that combined men and women if they reported outcomes in women separately or included more than 75 percent women. We examined unpublished RCTs from the medical and statistical reviews that were conducted by the FDA. We included observational studies of treatments that were not examined in RCTs.

Exclusion Criteria

Studies of children, adolescents, or men.
Studies of incontinence caused by neurological disease.
Studies of dual fecal and urinary incontinence.
Studies of surgical treatments for UI or urogenital prolapse.
Studies of drugs not available in the United States.
Studies with no clinical outcomes relevant to UI.
Case series with fewer than 100 subjects that reported short-term (less than 4 weeks) crude rates of the outcomes and/or did not use strategies to reduce bias.
Secondary data analysis, nonsystematic reviews, letters, or comments.
Studies that reported absolute values of the diagnostic tests in incontinent women.
Studies that did not report true and false positive and negative cases of diagnostic tests.

To assess harms of the treatments we followed the recommendations from the CER manual¹⁴⁹,¹⁵⁰ and reviewed published and unpublished evidence of the adverse effects of eligible drugs and nonsurgical treatments for female urinary incontinence, including:

Randomized controlled trials.
Unpublished supplemental trials data from the Web site http://www.clinicalstudyresults.org.
Observational cohort and case control studies.
Observational studies based on patient registries or large databases.
Case reports and post-marketing surveillance.

We defined harms as the totality of all possible adverse consequences of an intervention.¹⁵⁰ We analyzed harms regardless of how authors perceived the causality of treatments.

We did not contact the investigators of the primary studies.

Quality Assessment

We rated the quality of studies according to recommendations from the Methods Guide for Effectiveness and Comparative Effectiveness Reviews.¹⁴⁹ We classified the studies by design to distinguish randomized and nonrandomized controlled clinical trials from observational studies. We evaluated reporting and methodological quality of the studies for Key Question 1 with predefined criteria for assessing the quality of diagnostic accuracy studies.¹⁵¹–¹⁵⁶ We evaluated the quality of therapeutic studies using predefined criteria, which included randomization, adequacy of randomization and allocation concealment, masking of the treatment status, intention to treat principles, and justification of the sample size.¹⁴⁷ We evaluated disclosure of conflict of interest by the authors of individual studies and funding sources but did not use this information to downgrade quality of individual studies. We did not downgrade methodological quality of poorly reported studies. We synthesized evidence from poorly reported studies separately.

We defined well-designed RCTs with adequate allocation concealment, intention to treat principles in analysis, and appropriate measurements of clinically important outcomes as studies with low risk of bias.

We defined studies as having a medium risk of bias if they were susceptible to some bias but not sufficient bias to invalidate the results. Examples of studies with medium risk of bias include open label RCTs, RCTs with unclear allocation concealment, RCTs with a short term of followup, and crossover RCTs without assessment of carryover effect.

We defined studies as having a high risk of bias if they had significant flaws that imply biases of various types that may invalidate the results, including nonrandom treatment allocation, no strategies to reduce bias, and ignoring randomization in analysis.

Grading the Evidence for Each Key Question

We assessed strength of evidence following the guidelines in the CER Manual.¹⁵⁷ We judged the strength of evidence according to the domains of risk of bias, consistency, directness, and precision for each major outcome.¹⁴⁹ When appropriate, we also included dose response association, presence of confounders that would diminish an observed effect, and strength of association. We evaluated strength of the association by defining a priori a large effect (relative risk >2 or <0.5) and a very large effect (relative risk >5 or <0.2).¹⁴⁷ We defined low magnitude of the effect when relative risk was significant but less than 2.
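The a priori thresholds above can be expressed as a small classification rule. The following Python sketch is illustrative only and is not the review's own code: the function name, the "not significant" label, and the handling of relative risks exactly equal to a threshold (treated here as falling in the lower category) are assumptions, since the text does not specify them.

```python
def classify_effect(relative_risk: float, significant: bool) -> str:
    """Label the magnitude of an association from its relative risk.

    Thresholds follow the text: large if RR > 2 or RR < 0.5,
    very large if RR > 5 or RR < 0.2, low if significant but smaller.
    The "not significant" label is an assumption for illustration.
    """
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    if not significant:
        return "not significant"
    if relative_risk > 5 or relative_risk < 0.2:
        return "very large"
    if relative_risk > 2 or relative_risk < 0.5:
        return "large"
    # Significant, but between 0.5 and 2 (boundaries included by assumption).
    return "low"
```

For example, a significant relative risk of 0.4 would be labeled a large (protective) effect under this rule.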

We defined evidence as strong when several well-designed RCTs with a low risk of bias demonstrated consistent treatment effects. These are findings for which future research would be very unlikely to change the estimate of effect. We assigned a moderate level of evidence when RCTs with medium risk of bias reported consistent treatment effects or large observational studies reported consistent associations. We assigned a low level of evidence to data from RCTs with serious flaws in design/analysis, and from post hoc subgroup analysis; these are findings for which further research is likely to change the estimate. We defined insufficient evidence when a single study examined treatment effects or associations. We graded the level of evidence for primary outcomes across studies as illustrated in Table 2.

Table 2

Overall ranking of evidence.

Applicability

Applicability of the population was estimated by evaluating the female population from which samples have been selected in observational studies and clinical trials.¹⁵⁸ We examined settings of the studies including ambulatory care or specialized clinics, recruitment in clinical settings or in the community, inclusion age and type of UI, and exclusion criteria for each study. The studies that recruited women from the population had better applicability.

We assumed the presence of publication bias and did not use statistical tests for bias defined as the tendency to publish positive results.¹⁵⁹–¹⁶² We used several strategies to reduce bias, including a comprehensive literature search of published and unpublished evidence in several databases, reference lists of systematic reviews, proceedings of scientific meetings, contacts with experts for additional references, and agreement on the eligibility status by several investigators.

Data Extraction

Four researchers manually and independently evaluated the studies and extracted data. The data abstraction forms are shown in Appendix E. We performed multiple quality control checks of all data from RCTs and of a 30 percent random sample of observational studies. We assessed errors in data extraction by comparing the extracted values with the established ranges for each variable and by comparing the data charts with the original articles. Any discrepancies detected were discussed. We abstracted the number of positive (true and false) and negative (true and false) results after index diagnostic tests when compared to multichannel urodynamics or diary. We abstracted descriptive information about populations, interventions, controls, outcomes, settings, and time to measure outcomes in relation to the randomization or beginning of the treatment. We abstracted the number randomized into active and control treatments, doses of the drugs, events or rates, or means and standard deviations after active and control treatments. We abstracted sponsorship of the studies, sponsor participation in design and data analysis and presentation, and conflict of interest by the authors of the studies. We abstracted inclusion of minorities in the studies, inclusion of women who failed prior therapy for UI, inclusion of mixed UI, baseline daily UI, and presence of urogenital prolapse or hysterectomy in women who participated in the studies. Adjustments for age, race, comorbidities, socioeconomic status, previous treatments, and baseline severity of UI were extracted from observational studies.

Data Synthesis

For Key Question 1 results of individual studies were summarized in evidence tables to analyze sensitivity, specificity, predictive values, diagnostic odds ratios, and predictive likelihood ratios for correct diagnosis of any, stress, and urgency UI (Appendix Table D5). We focused on the predictive likelihood ratios of UI in women examined with index tests when compared to women who had urodynamic or clinical diagnosis.¹⁶³–¹⁶⁶ Ratios of 1 indicated that the tests likely do not provide accurate UI diagnosis.¹⁶⁷ Ratios of more than 10 provided large and often conclusive increases in the likelihood of UI.¹⁶⁷ Tabulation was performed for each article regarding symptoms or results of diagnostic tests and the diagnosis of stress incontinence or detrusor overactivity, using either urodynamic testing or clinical final diagnosis separately as the criterion standard. Specifically, the diagnostic value of history of three symptoms was evaluated: symptoms of stress incontinence for stress UI and symptoms of urgency incontinence and urgency for detrusor overactivity. We pooled diagnostic test data with random effects models using Meta-Analyst software.¹⁶⁸ In cases of heterogeneity, we used bivariate pooling methods.¹⁶⁶,¹⁶⁹,¹⁷⁰
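The per-study diagnostic measures named above follow from the four cells of a 2x2 table of index test result against the criterion standard. The Python sketch below uses the standard textbook formulas and is offered only as an illustration: the function name and dictionary keys are assumptions, it requires all four cells to be nonzero, and it does not reproduce the pooling performed in Meta-Analyst.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute per-study diagnostic measures from a 2x2 table.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives of the index test versus the criterion standard.
    """
    if min(tp, fp, fn, tn) <= 0:
        # Zero cells would need a continuity correction, not shown here.
        raise ValueError("all four cells must be positive in this sketch")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        # Positive likelihood ratio: values near 1 add little information;
        # values above 10 give large increases in the likelihood of disease.
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
    }


# Hypothetical counts, for illustration only.
example = diagnostic_accuracy(tp=80, fp=10, fn=20, tn=90)
```

With these hypothetical counts, sensitivity is 0.8, specificity is 0.9, and the positive likelihood ratio is about 8, which falls short of the threshold of 10 described in the text.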

Urodynamic evaluation detects the presence of UI but not its severity or frequency. However, doctors need information about frequency and severity of UI to make treatment decisions and evaluate treatment effectiveness. To address methods for assessing the frequency and severity of UI, we synthesized the content and applicability of checklists and scales that assess symptom frequency and bothersomeness, quality of life, and women’s satisfaction with treatments. We evaluated validation, reliability, and the proposed minimal important differences in total scores when this information was available.

For Key Questions 2 and 3 we calculated relative risk, absolute risk differences, number needed to treat (NNT), and the number of events attributable to active treatment per 1,000 persons treated for binary outcomes. We used the number of randomized subjects as the denominator, forcing intention to treat principles regardless of the analyses reported in the original studies. We calculated mean differences from the reported means and standard deviations among subjects randomized to active and control treatments. We used correction coefficients, forced intention to treat, and recommended calculations for missing data.¹⁴⁷ We used Meta-Analyst¹⁶⁸ and STATA (Statistics/Data analysis, 10.1) software to calculate individual study estimates with a 95 percent confidence interval (CI).
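The binary-outcome measures listed above can be illustrated for a single two-arm trial. The Python sketch below uses conventional large-sample (Wald) formulas and takes the number randomized in each arm as the denominator, in line with the forced intention to treat approach. It is an assumption-laden illustration rather than the calculation performed in Meta-Analyst or STATA: the function name is hypothetical, and trials with zero events in either arm are not handled.

```python
import math


def binary_outcome_effects(events_t, n_t, events_c, n_c, z=1.96):
    """Relative risk, absolute risk difference, NNT, and events per 1,000.

    n_t and n_c are the numbers RANDOMIZED to active and control arms.
    Both event counts must be positive in this sketch.
    """
    p_t, p_c = events_t / n_t, events_c / n_c

    # Relative risk with a CI computed on the log scale.
    rr = p_t / p_c
    se_log_rr = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    rr_ci = (math.exp(math.log(rr) - z * se_log_rr),
             math.exp(math.log(rr) + z * se_log_rr))

    # Absolute risk difference with a Wald CI.
    ard = p_t - p_c
    se_ard = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    ard_ci = (ard - z * se_ard, ard + z * se_ard)

    # NNT is reported only when the ARD interval excludes zero.
    significant = ard_ci[0] > 0 or ard_ci[1] < 0
    nnt = 1 / abs(ard) if significant else None
    nnt_ci = (tuple(sorted((1 / abs(ard_ci[0]), 1 / abs(ard_ci[1]))))
              if significant else None)

    return {"rr": rr, "rr_ci": rr_ci, "ard": ard, "ard_ci": ard_ci,
            "nnt": nnt, "nnt_ci": nnt_ci, "events_per_1000": ard * 1000}


# Hypothetical trial: 30 of 100 vs 15 of 100 women achieve the outcome.
result = binary_outcome_effects(30, 100, 15, 100)
```

In this hypothetical trial the relative risk is 2, the absolute risk difference is 0.15 (150 attributable events per 1,000 treated), and the NNT is about 7.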

Following guidelines⁶⁹,¹⁰⁸ and recommendations from key informants and Technical Expert Panel members we focused on patient-centered outcomes including continence, improvement in UI, quality of life, adverse effects, and discontinuation due to adverse effects. We used the definitions of signs and symptoms of UI promoted by the IUGA/ICS (Appendix Table D2), including mixed, stress, and urgency UI.¹⁰ We defined continence when the authors reported cure, absence of incontinent episodes in bladder diaries, or negative pad or stress tests (Table 1). We defined improvement in UI when the authors reported reduction by more than 50 percent in frequency of UI in diaries or patient-reported significant improvement in UI. We defined failure when frequency of UI did not change or became worse in diaries or according to patient reported worsening of UI. We relied on patient outcomes rather than continuous measures of UI episodes or urine loss.¹⁰⁸ We analyzed discontinuation rates independent of investigator judgments about association with tested drugs. We analyzed adverse effects as reported by the authors.

Pooling criteria included the same operational definitions of clinical populations, incontinence outcomes, the same clinical interventions, and the time of the assessment of the outcomes.¹⁷¹ Meta-analysis was used to assess the consistency of the association between treatments and incontinence outcomes with random effects models using an inverse variance weighting method (Appendix Table D5).¹⁶⁸,¹⁷² We chose the random effects model to incorporate in the pooled analysis differences across trials in patient populations, baseline rates of the outcomes, dosage of drugs, and other factors.¹⁷³ For pooled relative risks (RR) and absolute risk difference (ARD) we excluded trials with no events in both groups and added a correction coefficient of 0.5 in the trials with no events only in one group.¹⁷³ We used pooled ARD to calculate the number needed to treat and the number of events attributable to active treatment per 1,000 persons treated.¹⁷⁴,¹⁷⁵ We calculated means and 95 percent CI for the number needed to treat as reciprocal to pooled ARD when ARD was significant.¹⁷⁶ We calculated means and 95 percent CI for treatment events per 1,000 treated, multiplying pooled absolute risk difference by 1,000.¹⁶⁸,¹⁷²,¹⁷⁴–¹⁷⁶ We assessed missing data across studies, including loss to followup and dropout patterns, and forced intention-to-treat analysis using the number of randomized subjects for all calculations. We also used the maximum likelihood method for pooling continence, clinically important improvement in UI, and treatment discontinuation due to adverse effects.¹⁶⁸ We split placebo sample sizes and events in multi-arm drug trials proportionally to the randomization ratio to avoid double counting control groups. We synthesized sparse data, defined as rates less than 2 percent, by calculating fixed-effect Mantel-Haenszel relative risks and Peto odds ratios.¹⁷⁷ We analyzed adverse effects with drugs for urgency UI using double arcsine transformation for event rates. When studies had no events with active, control, or both treatments, we used correction coefficients and calculated odds ratios from random-effects generalized nonlinear mixed-effect models.¹⁶⁸,¹⁷⁸–¹⁸¹
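The inverse variance random effects pooling of relative risks, together with the handling of trials without events, can be sketched in Python as follows. This is an illustration under stated assumptions and not the Meta-Analyst or STATA implementation: the between-study variance is estimated with the DerSimonian-Laird method of moments, which the text does not name explicitly; the 0.5 correction is applied to every cell of the 2x2 table of a trial with no events in one group; and all function names are hypothetical.

```python
import math


def log_rr_and_variance(e_t, n_t, e_c, n_c):
    """Return (log RR, variance), or None if neither arm had events."""
    if e_t == 0 and e_c == 0:
        return None  # trials with no events in both groups are excluded
    if e_t == 0 or e_c == 0:
        # Add 0.5 to each cell: events +0.5, non-events +0.5, so n +1.
        e_t, e_c = e_t + 0.5, e_c + 0.5
        n_t, n_c = n_t + 1, n_c + 1
    log_rr = math.log((e_t / n_t) / (e_c / n_c))
    variance = 1 / e_t - 1 / n_t + 1 / e_c - 1 / n_c
    return log_rr, variance


def pool_random_effects(trials, z=1.96):
    """Pool (events_t, n_t, events_c, n_c) tuples into a random effects RR."""
    kept = [x for x in (log_rr_and_variance(*t) for t in trials) if x]
    if not kept:
        raise ValueError("no trials with events to pool")
    y = [effect for effect, _ in kept]
    v = [var for _, var in kept]

    # Fixed effect (inverse variance) estimate, needed for Cochran's Q.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # DerSimonian-Laird estimate of between-study variance (assumed here).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random effects weights add tau2 to each within-study variance.
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"rr": math.exp(pooled),
            "ci": (math.exp(pooled - z * se), math.exp(pooled + z * se)),
            "tau2": tau2, "k": len(y)}


# Hypothetical trials; the third has no events in either group.
pooled = pool_random_effects([(30, 100, 15, 100),
                              (60, 200, 30, 200),
                              (0, 50, 0, 50)])
```

In this hypothetical example the third trial is dropped, two trials are pooled, and because both report a relative risk of 2 the estimated between-study variance is zero.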

We examined the association of age, race, obesity, comorbidities, UI type, baseline severity, and response to prior treatments with clinical outcomes as reported by the authors of the original studies. We synthesized the evidence by the baseline type of UI as pure or predominant stress, pure or predominant urgency, and mixed UI. We compared clinical outcomes by the type of UI within each study and across the studies. We evaluated inclusion and exclusion criteria and baseline characteristics of the subjects to determine whether all or a proportion of the subjects had mixed UI. Then we conducted quantitative meta-regression and subgroup analysis to determine treatment effects by baseline type of UI. When exploring heterogeneity, we did not use subject level variables to avoid an ecological fallacy.

We examined consistency in results across the studies with chi-square tests and I² statistics.¹⁸²,¹⁸³ We explored heterogeneity with meta-regression, subgroup, and sensitivity analysis and reported the results from random effects models only.¹⁷³ Using a standard preplanned algorithm, we explored heterogeneity by clinical diversity, comprising the proportion of women, proportion of minority population, age of women, severity of UI, failure after prior treatments, concomitant treatments, inclusion of women with urogenital prolapse, and inclusion of women with mixed UI.¹⁷³ We explored heterogeneity by dose (when applicable), by duration of the treatments, and by control rate of the outcomes. We explored heterogeneity by quality criteria of individual studies and by whether conflict of interest was disclosed by study authors.¹⁷³ We explored heterogeneity by each quality criterion rather than the global quality score.¹⁸⁴,¹⁸⁵ We calculated pooled relative risk, absolute risk difference with 95 percent CI, and Bayesian odds ratios with 95 percent credible intervals using STATA 10.1 and Meta-Analyst software.¹⁶⁸,¹⁷⁴ We analyzed the probability that active treatments increased the chances of continence, improvements of UI, or adverse effects with the Bayesian approach using noninformative prior probability of the events.¹⁶⁸ The analytic framework and algorithms for the meta-analysis are shown in Appendix Table D5.
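The consistency statistics named above can be illustrated with the conventional definitions of Cochran's Q (the chi-square heterogeneity statistic) and I². The Python sketch below is a minimal illustration, not the review's software: the function name is hypothetical, the inputs are study effects on an additive scale (for example, log relative risks) with their variances, and the chi-square p-value is omitted because it would require a statistical library.

```python
def heterogeneity(effects, variances):
    """Return (Q, degrees of freedom, I-squared in percent)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching lists with at least two studies")
    weights = [1.0 / v for v in variances]  # inverse variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I-squared: share of total variation beyond chance, floored at zero.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical log relative risks with equal variances.
q, df, i2 = heterogeneity([0.1, 0.5, 0.9], [0.04, 0.04, 0.04])
```

For these hypothetical inputs Q is 8 on 2 degrees of freedom and I² is 75 percent, a level that would usually prompt the meta-regression and subgroup analyses described in the text.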

Publication Details

Publisher

Agency for Healthcare Research and Quality (US), Rockville (MD)

NLM Citation

Shamliyan T, Wyman J, Kane RL. Nonsurgical Treatments for Urinary Incontinence in Adult Women: Diagnosis and Comparative Effectiveness [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Apr. (Comparative Effectiveness Reviews, No. 36.) Methods.

Grade	Definition
High	High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect.
Moderate	Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate.
Low	Low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and is likely to change the estimate.
Insufficient	Evidence either is unavailable or does not permit a conclusion.
