Methods

Stanley Ip; Peter Bonis; Athina Tatsioni; Gowri Raman; Priscilla Chew; Bruce Kupelnick; Linda Fu; Deirdre DeVine; Joseph Lau

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Ip S, Bonis P, Tatsioni A, et al. Comparative Effectiveness of Management Strategies For Gastroesophageal Reflux Disease [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2005 Dec. (Comparative Effectiveness Reviews, No. 1.)

See 2011 update of this review

This publication is provided for historical reference only and the information may be out of date.

2. Methods

Technical Expert Panel

This report on the management strategies for GERD is based on a systematic review of the literature. The Tufts-NEMC EPC held teleconferences with a Technical Expert Panel (TEP) formed for this project. The TEP served in an advisory capacity for this report, helping to refine key questions, identify important issues, and define parameters for the review of evidence.

Analytic Framework

We applied the analytic framework depicted in Figure 1 to answer the key questions in the evaluation of the treatment modalities for GERD. This framework addressed relevant subjective and objective outcomes. It also examined clinical factors that affected treatment outcomes. While evidence from high quality randomized controlled trials was preferred, when there was a paucity of data or when they were unavailable, non-randomized and uncontrolled studies were used to augment the evidence.

Figure 1. Analytic framework for evaluating the effectiveness and safety of treatments for chronic GERD.

Search Strategy

A comprehensive search of the scientific literature was conducted to identify relevant studies addressing the key questions. Results from previously conducted meta-analyses and systematic reviews on these topics were sought and used where appropriate and updated when necessary. When this evidence was not adequate, systematic reviews on the specific topics were conducted. Evidence tables of study characteristics and results were compiled, and the methodological quality of the studies was appraised.

We searched Medline (1966-February 15, 2005) for English language studies of adult humans to identify articles relevant to each key question. We conducted a supplemental search of the Cochrane Database of Systematic Reviews on March 29, 2005. We also searched reference lists of all review articles. In electronic searches, we combined terms for gastroesophageal reflux and relevant research designs (see Appendix A for complete search strategy). We invited TEP members to provide additional citations. Because the literature on endoscopic therapies was evolving rapidly, we supplemented our data on endoscopic therapy with the latest information on additional and ongoing studies provided by technical experts. Additional studies recommended by our technical experts were included if they were relevant and were published prior to June 30, 2005. We included one study that reported five-year comparative data of different doses of rabeprazole and placebo that was published after our literature review. We also asked peer reviewers to provide relevant unpublished data that could be made publicly available. We did not search systematically for unpublished data.

We compared lists of authors and study centers and contacted authors as needed to identify reports that included patients who we suspected had been described elsewhere. When such reports were identified, they were considered together to identify study features as completely as possible but patients were analyzed only once. Such reports are identified in the evidence tables.

Study Selection

We assessed titles and/or abstracts of citations identified from literature searches for inclusion, using the criteria described below. Full-text articles of potentially relevant abstracts were retrieved and a second review for inclusion was conducted by reapplying the inclusion criteria. Results published only in abstract form are generally not included in our reviews because adequate information is not available to assess the validity of the data.

Population and condition of interest

According to a national consensus statement, GERD has been defined as symptoms or mucosal damage produced by the abnormal reflux of gastric contents into the esophagus.¹ GERD is considered a chronic and recurrent disease. There are several potential complications related to GERD including esophageal strictures, Barrett's esophagus, and esophageal adenocarcinoma, which together are considered to represent “complicated” GERD.

There is substantial variability in how GERD has been defined in different reports. To be as inclusive as possible, we considered studies that based the diagnosis of GERD on any commonly used criteria including an abnormal ambulatory pH study while off medications, endoscopy showing esophagitis* in patients with symptoms suggestive of GERD, typical symptoms of GERD (heartburn or regurgitation), a response to a therapeutic trial of a proton pump inhibitor, and other definitions, such as ICD-9 codes. The stringency of the diagnosis was recorded for each study.

We included comparative, randomized and non-randomized, and cohort studies of adults (≥18 years) with chronic GERD using the above definitions. Some studies did not explicitly state that they had recruited only adult patients; they were accepted provided that the median age for the population was at least 40. We also included comparative and cohort studies that specifically examined the incidence of Barrett's esophagus or esophageal adenocarcinoma in patients with complicated GERD.

We excluded studies that focused exclusively on patients with extra-esophageal manifestations of GERD (eg, reflux laryngitis, asthma), those with post-surgical GERD, pregnancy-induced GERD, duodenal or peptic ulcer, gastritis, primary esophageal motility disorder, scleroderma, diabetic gastroparesis, radiation esophagitis, Zollinger-Ellison syndrome, Zenker's diverticulum, previous antireflux surgery, or infectious, pill, or chemical burn esophagitis.

Intervention of interest

For studies on medical treatment, we included meta-analyses of RCTs, in which a PPI was used for treatment of acute symptoms or for maintenance therapy. Acute treatment is considered the short-term therapy - usually up to 8 weeks or, in some trials, 12 weeks - until symptom resolution or esophagitis healing. Maintenance treatment is considered the long-term treatment - at least 6 months - for preventing symptoms or esophagitis relapse. We included studies using any type of PPI given at any dose. We excluded reports that combined a PPI with antibiotic treatment for H. pylori.

For studies with surgical procedures, we accepted only studies examining total (Nissen and Nissen-Rossetti) or partial (Toupet) fundoplication either as an open or as a laparoscopic procedure. These techniques represent the most commonly used surgical approaches for treatment of GERD. We excluded studies on surgical treatment of achalasia, esophageal strictures or rings, esophageal adenocarcinoma, hiatal hernia repair (unless the indication was for reflux), and colon interposition. We also excluded procedures that are no longer in use, such as the Angelchik prosthesis.

We included all endoscopic procedures, such as endoscopic suturing, radiofrequency energy delivery to the gastroesophageal junction, or implantation of inert polymers; but we limited these articles to products approved in the United States (eg, Stretta™, EndoCinch™ Suturing System, NDO Plicator™, and Enteryx™) (see Appendix F). One of the procedures, Enteryx™, was voluntarily recalled from the market due to safety concerns during final preparation of this report (Boston Scientific recalls Enteryx Products, <http://www.bostonscientific.com/common_templates/procedureOverview.jsp?task=tskProcedureOverview.jsp&sectionId=4&relId=7,323,324&procedureId=7004&uniqueId=MPPO1216>, accessed on 9/28/2005). However, we elected to include the data pertaining to Enteryx™ since it was the method used in one of the only two sham-controlled trials and because of the relatively large number of reports, which allowed for a better understanding of how various endpoints in the endoscopic studies correlated with one another.

Comparators of interest

For studies comparing one medical treatment with another, we included only those comparing a PPI versus another PPI or an H2RA irrespective of type or dose. Trials including other medical treatments (eg, prokinetic agents, antacids, sucralfate), combinations of other medical treatment with a PPI or an H2RA, or placebo as the only comparative group to a PPI group were excluded. These options are not considered to represent a typical medical approach for patients with GERD in the United States.

For studies comparing a surgical or endoscopic procedure with a medical treatment, we set no restrictions as to the medication used in the control arm. We also accepted a sham procedure as a potential control group.

For studies comparing one surgical procedure with another, the control arm was considered to be eligible if it included a total (Nissen) or partial (Toupet) fundoplication, either as an open or as a laparoscopic procedure.

No restrictions were set for control groups in studies that compared different endoscopic procedures.

Outcomes of interest

To evaluate the comparative efficacy of different therapies (question 1), we analyzed subjective and objective outcomes that are generally considered to represent clinically important endpoints in the management of GERD.

Subjective outcomes included:

change in symptoms based on the clinical methods and scales that were described in each study;
quality of life (QOL) when it was based on a validated quality-of-life instrument such as the Medical Outcomes Study Short-Form-36 or the GERD-Health Related Quality of Life Instrument (see Appendix G); in addition, we recorded any outcome related to a systematic assessment of patient satisfaction.

Objective outcomes included:

esophageal pH exposure either as a change from baseline exposure or as the proportion of patients who achieved “normal” acid exposure whenever it was provided; since there is variability in the techniques for performing and interpreting esophageal pH studies, we accepted each study's definition of “normal” (for details see Streets 2003¹⁵);
lower esophageal sphincter (LES) competence as described in each study;
esophagitis healing rate based on the proportion of patients without esophagitis after treatment as assessed visually by endoscopy; to evaluate the medical maintenance treatment we used esophagitis relapse rate as the proportion of patients who developed esophagitis again after healing as assessed visually by endoscopy;
continued need for antisecretory medications, as the proportion of patients who continued to require medications after treatment; we sought reporting of the proportion of patients who no longer required any antisecretory medications but also recorded the proportion who were freed from requiring PPIs or in whom the daily requirement for PPIs was reduced;
development of Barrett's esophagus or esophageal carcinoma.

We focused on the results with the longest follow-up when an endpoint was measured more than once and the trial reported results from different time points. We excluded cost-effectiveness or cost-benefit outcomes. We also excluded outcomes on extra-esophageal GERD symptoms.

For question 2, we focused on the following baseline patient characteristics that may influence treatment efficacy of GERD: age, sex, smoking status, presence of obesity or not, severity of GERD symptoms (as described in each study), type and response to previous medication, presence and severity of esophagitis, presence and size of hiatal hernia, presence of esophageal motility abnormality or not (as assessed in each study), and presence of abnormal esophageal acidification (abnormal pH study) or not among patients off medication.

To evaluate adverse events and complications (question 3), we extracted from each study the rate for each adverse event of medical treatments and the rate for every reported complication of surgical and endoscopic procedures. In addition, we looked at the length of in-hospital stay and assessed the rate for re-operation after a surgical procedure and, specifically for laparoscopic operations, the conversion rate to an open procedure. Whenever possible, we attempted to differentiate complications of surgical and endoscopic procedures that occurred intra-operatively or resolved within 30 days of the procedure from long-term complications that presented or persisted after the first 30 days.

Study designs of interest

To address question 1, we used information from recent meta-analyses of RCTs comparing efficacy between medical therapies for acute and maintenance treatment of GERD. Among the recent meta-analyses of good quality, we chose the most comprehensive in terms of included comparisons and number of primary studies. For comparing efficacy between a medical and a surgical treatment, we retrieved all the comparative studies - randomized and non-randomized - between medical and surgical treatments. For comparing efficacy between different surgical techniques, we retrieved all RCTs that recruited at least 50 participants and had a mean or median follow-up duration of at least 5 years; we also included non-randomized comparative studies that had at least 100 participants and a mean or median follow-up of at least 5 years. To supplement data on long-term efficacy of surgery, we also included surgical cohort studies - prospective and retrospective - that recruited at least 100 participants and had a mean or median follow-up of at least 5 years. To assess the efficacy of endoscopic procedures, we collected all endoscopic papers, including comparative and cohort studies.
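As an illustration only, the size and follow-up thresholds stated above for surgical studies can be written as a small eligibility check. This is a minimal sketch and not part of the review's actual workflow: the class name `SurgicalStudy`, the function name `meets_surgical_criteria`, and the design labels are hypothetical, and only the numeric thresholds (50 or 100 participants, 5 years of mean or median follow-up) come from the text. It does not apply to the medical versus surgical comparisons, for which all comparative studies were retrieved.

```python
from dataclasses import dataclass

# Thresholds are taken from the text above; all names are illustrative.
MIN_FOLLOW_UP_YEARS = 5
MIN_PARTICIPANTS = {
    "rct": 50,                         # RCTs comparing surgical techniques
    "nonrandomized_comparative": 100,  # non-randomized comparative studies
    "cohort": 100,                     # prospective or retrospective surgical cohorts
}


@dataclass
class SurgicalStudy:
    design: str             # one of the keys of MIN_PARTICIPANTS
    participants: int       # number of participants recruited
    follow_up_years: float  # mean or median follow-up duration


def meets_surgical_criteria(study: SurgicalStudy) -> bool:
    """Return True if the study meets the size and follow-up thresholds."""
    if study.design not in MIN_PARTICIPANTS:
        return False
    return (
        study.participants >= MIN_PARTICIPANTS[study.design]
        and study.follow_up_years >= MIN_FOLLOW_UP_YEARS
    )
```

Keeping the thresholds in a single lookup table makes it easy to see that the randomized design is held to a smaller minimum sample size than the observational designs, while the follow-up requirement is the same for all three.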

To address question 2, we included data on specific patient characteristics of interest from the studies collected to address question 1. In addition, we retrieved comparative studies and cohorts that specifically investigated the relationship between certain patient characteristics and the efficacy of a treatment modality for GERD. To assess whether hospital setting influences the efficacy of surgical therapy for GERD, we included all studies that directly compared the surgical efficacy in an academic versus a community setting.

To address question 3, we examined all the studies already included in addressing questions 1 and 2. We also collected all studies, including case reports, cohorts, comparative studies, and reviews in which the specific focus was on adverse events and complications after medical, surgical, or endoscopic interventions for GERD. For surgical procedures, we also retrieved papers that were designed to compare the complication rate at different institutions with different volumes of patients. In addition, we used the Food and Drug Administration's MAUDE (Manufacturer and User Facility Device Experience) database (accessed May 31, 2005) to identify adverse events, complications, and interactions.¹⁶

Data Extraction

Items extracted included first author, year, country, setting, funding source, study design, and inclusion and exclusion criteria. For RCTs, we recorded the method of randomization, allocation concealment, blinding, and whether results were reported on an intention-to-treat basis. Specific population characteristics included demographics such as age and sex, presence of obesity (as assessed by BMI), and smoking status. For studies that reported short-term and long-term data in separate publications, we used the short-term publication to extract baseline data if the baseline data were not reported in the long-term publication.

To help interpret the results, we also extracted the following factors that are related to the diagnosis of GERD and disease severity (if they were reported at study entry): presenting symptoms and quality of life for patients on medication (as described in the paper); whether patients underwent endoscopy; whether patients with a hiatal hernia, esophagitis, esophageal stricture, or Barrett's esophagus were included. For hiatal hernia, we also extracted the size of hiatal hernia that the study used to exclude patients from participation. We also recorded whether pH or esophageal motility tests were performed as well as their results (as described in the study). For pH studies, we clarified, if possible, whether patients were receiving or abstaining from PPIs during the study. Finally, we recorded whether patients had previously tried any medical treatment or lifestyle modifications, the type of medication, and their response to these therapies. For all population-related factors that were extracted, we investigated whether their baseline values differed significantly among the comparison groups.

We extracted information on treatment modality and the comparator. Primary and secondary outcomes were also extracted. For each outcome of interest, we reported the number of patients enrolled and analyzed, and the results (including baseline value, final value, within-treatment change, or between-treatment difference, with their variability estimate) as provided by the study. Duration of in-hospital stay after a surgical or an endoscopic procedure was also recorded. We collected the duration of follow-up, as well as the number and the reasons for the dropouts during the follow-up period.

Quality Assessment

We assessed the methodological quality of studies based on predefined criteria. For the assessment of meta-analyses, the criteria for methodological quality were based on the QUOROM Guidelines for Meta-analyses and Systematic Reviews of RCTs.¹⁷ For the assessment of RCTs, the criteria were based on the CONSORT statement for reporting RCTs.¹⁸,¹⁹ We mainly considered the methods used for randomization, allocation concealment, and blinding as well as the use of intention-to-treat analysis, the report of dropout rate and the extent to which valid primary outcomes were described. For non-randomized trials, we used the report of eligibility criteria, and the similarity of the comparative groups in terms of baseline characteristics and prognostic factors. We also considered the report of intention-to-treat analysis, and the crossovers, as well as important differential loss to follow-up between the comparative groups or overall high loss to follow-up. The validity and the adequacy of the description of outcomes and results were also assessed. For the assessment of prospective and retrospective cohorts, as well as case-control studies, we used the Newcastle-Ottawa Quality Assessment scales for cohort and case-control studies. Items assessed included selection of cases or cohorts and controls, comparability, and exposure or outcome.

We applied a three-category quality grading system (A, B, C) to studies within each of the study designs. This grading scheme applies to meta-analyses, RCTs, cohorts, and case-control studies. A grade assigned to a study of one design is not equivalent to the same grade assigned to a study of a different design. This grading system does not attempt to assess the comparative validity of studies across different design strata. For example, a “B” rated RCT is not judged to have the same methodological quality as a “B” rated case-control study. Thus, both study design and quality grade should be noted when interpreting the methodological quality of a study.

A (good)
Category A studies have the least bias and results are considered valid. Such a study adheres mostly to the commonly held concepts of high quality, including the following: a rigorously conducted meta-analysis; a formal randomized study; clear description of the population, setting, interventions, and comparison groups; appropriate measurement of outcomes; appropriate statistical and analytic methods and reporting; no reporting errors; less than 20% dropout; clear reporting of dropouts; and no obvious bias.

B (fair/moderate)
Category B studies are susceptible to some bias, but not sufficient to invalidate the results. They do not meet all the criteria in category A because they have some deficiencies, but none likely to cause major bias. The study may be missing information, making it difficult to assess limitations and potential problems.

C (poor)
Category C studies have significant bias that may invalidate the results. These studies have serious errors in design, analysis, or reporting; have large amounts of missing information, or discrepancies in reporting.

Data Synthesis

Review of meta-analyses

We used the results reported in meta-analyses on comparative efficacy of medical treatment. We considered the outcomes on acute and maintenance medical treatment as combined by the meta-analyses. Meta-analyses reported dichotomous outcomes, which included, for acute treatment: esophagitis healing and complete heartburn resolution, and for maintenance treatment: esophagitis relapse and symptom relapse. To combine these outcomes, meta-analyses applied the random effects model to estimate risk difference or relative risk with 95% confidence interval. Compared with the fixed effects model, the random effects model is more conservative in that it results in broader confidence intervals when between-study heterogeneity is present. We used the estimates as reported by the meta-analyses. We also used any attempt reported by the meta-analyses to explore heterogeneity using sub-group analyses or meta-regression.
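The contrast between the two models can be illustrated with a short calculation. The sketch below is an assumption-laden example, not a reproduction of any included meta-analysis: it uses the DerSimonian-Laird estimator of between-study variance, which is one common random effects approach (the text above does not say which estimator the meta-analyses used), the function names are hypothetical, and the trial counts are invented for illustration.

```python
import math


def risk_difference(events_a, n_a, events_b, n_b):
    """Risk difference (arm A minus arm B) and its variance for one trial.

    The variance is zero, and therefore unusable as a weight, when both
    arms have event proportions of exactly 0 or 1.
    """
    p_a, p_b = events_a / n_a, events_b / n_b
    variance = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
    return p_a - p_b, variance


def pool(effects, variances, random_effects=True):
    """Inverse-variance pooled estimate with a 95% confidence interval.

    With random_effects=True, the DerSimonian-Laird between-study
    variance (tau squared) is added to each trial's variance.
    """
    weights = [1 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    tau2 = 0.0
    if random_effects and len(effects) > 1:
        # Cochran's Q measures dispersion of trial effects around the fixed estimate.
        q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
        c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
        tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    adjusted = [1 / (v + tau2) for v in variances]
    estimate = sum(w * y for w, y in zip(adjusted, effects)) / sum(adjusted)
    se = math.sqrt(1 / sum(adjusted))
    return estimate, estimate - 1.96 * se, estimate + 1.96 * se


# Invented counts: (healed on drug A, n on A, healed on drug B, n on B).
trials = [(80, 100, 60, 100), (45, 50, 40, 50), (150, 200, 90, 200)]
effects, variances = zip(*(risk_difference(*t) for t in trials))
fixed_result = pool(effects, variances, random_effects=False)
random_result = pool(effects, variances, random_effects=True)
```

Because the between-study variance is never negative, each random effects weight is no larger than the corresponding fixed effects weight, so the random effects interval is at least as wide. When the trials agree closely, the estimated between-study variance is zero and the two models coincide.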

Evidence and summary tables

The evidence tables offer a detailed description of the studies that addressed each of the key questions. The tables (see Appendix C) provide detailed information about the study design, the sample size, the intervention and comparison group treatments, the patient characteristics, the follow-up, the major outcomes, and the quality. In addition, for systematic reviews and meta-analyses, we reported the databases searched and for which time period, the number and the type of primary studies included, and the type of comparison addressed (medical versus medical; medical versus surgery; or endoscopic versus sham procedure).

Summary tables succinctly report summary measures of the main outcomes evaluated. They include information regarding study design, intervention and comparison group, therapeutic modality, study duration or follow-up, whether patients with severe esophagitis were also recruited, sample size (subjects enrolled and analyzed in each arm), results of major outcomes, and methodological quality. These tables were developed by condensing information from the evidence tables. They are designed to facilitate comparisons and synthesis across studies. A methodological quality grade was assigned to each study as described previously.

We reported medication usage data as described by the study authors without attempting to standardize the definitions. Some authors reported medication usage as the proportion of patients off PPIs while others reported the proportion of patients on PPIs or the number of days that patients regularly used antisecretory medications.

We also included an overall synthesis table in the results section to succinctly report the findings. The table included information on the data sources, populations studied, limitations of the included studies, a summary on major outcomes (symptoms, quality of life, esophagitis healing, esophageal acid exposure, medication use), treatment-related factors with or without an association on outcomes, the type and frequency of major adverse events, and complications for the three treatment modalities.

Adverse events reporting

We reported the main adverse events of medical treatments in a summary table. We grouped studies according to the type of comparison (PPI versus H2RA or placebo; PPI maintenance dose versus healing dose), and the adverse event reported. For adverse events in each comparison, we reported the total number of patients included in the studies, the number of studies, and the total percent adverse event rate for each of the comparative arms, whenever the data were available.

We summarized complications of surgical and endoscopic procedures in evidence and summary tables. We considered studies of Nissen and Nissen-Rossetti fundoplication within the same category. In evidence tables, we grouped studies reporting complications according to the type of procedure and the complication reported. In these tables, for each study we reported the absolute number and the percentage of subjects with the complication. In summary tables, we reported the number of studies and the event rate for each complication and for each procedure. The mean event rate was calculated when two or more studies were available. Separate evidence and summary tables were created for studies that reported complications occurring within 30 days of the procedure, for studies with complications after 30 days from the procedure, and for studies in which the time period between the procedure and a complication was unclear. We did not include case reports in the evidence or the summary tables.
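The grouping described above could be sketched as follows. This is a hedged illustration rather than the procedure actually used to build the tables: the record fields, the function name `summarize_complications`, and the sample numbers are hypothetical, and the mean is taken as a simple unweighted average of study-level percentages, which is an assumption because the text does not state whether studies were weighted.

```python
from collections import defaultdict
from statistics import mean

# Nissen and Nissen-Rossetti fundoplication are treated as one category.
PROCEDURE_CATEGORY = {"Nissen-Rossetti": "Nissen"}


def summarize_complications(records):
    """Group study-level complication data and average the event rates.

    Each record is a dict with keys: procedure, complication, timing
    ("within 30 days", "after 30 days", or "unclear"), events, and n.
    Returns, per (procedure, complication, timing), the number of
    studies and the unweighted mean event rate in percent.
    """
    rates = defaultdict(list)
    for r in records:
        procedure = PROCEDURE_CATEGORY.get(r["procedure"], r["procedure"])
        key = (procedure, r["complication"], r["timing"])
        rates[key].append(100 * r["events"] / r["n"])
    return {
        key: {"studies": len(values), "rate_percent": mean(values)}
        for key, values in rates.items()
    }


# Invented example records, for illustration only.
example_records = [
    {"procedure": "Nissen", "complication": "dysphagia",
     "timing": "within 30 days", "events": 2, "n": 100},
    {"procedure": "Nissen-Rossetti", "complication": "dysphagia",
     "timing": "within 30 days", "events": 4, "n": 100},
    {"procedure": "Toupet", "complication": "dysphagia",
     "timing": "unclear", "events": 1, "n": 50},
]
example_summary = summarize_complications(example_records)
```

An unweighted mean gives a small study the same influence as a large one; pooling events and patients across studies would be an alternative design with different behavior.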

Overall comparative synthesis table

To aid discussion, we summarized the comparative data across treatment modalities (medical, surgical, and endoscopic) in one table in the section on conclusions/discussion/future research. Separate cells were constructed for each key question. Important comparative findings for each key question were summarized whenever the data were available.

Grading a body of evidence for each key question

We assigned an overall grade describing the body of evidence for each key question that was based on the number and quality of individual studies, duration of follow-up and the consistency across studies. To assess the evidence for the first key question on comparative efficacy, we relied on direct and indirect comparative data between treatment modalities. We provided separate grades that assessed the body of evidence on medical versus surgical treatments and surgical versus endoscopic treatments. No studies compared medical with endoscopic treatments, and we did not assign a grade to this comparison. For the second key question on factors influencing outcomes, we relied mainly on observational studies. For the third question on adverse events, we relied on direct and indirect comparative studies, cohort studies, and various databases that reported adverse events. The grades corresponded to the following definitions:

Robust - There is a high level of assurance with validity of the results for the key question based on at least two high quality studies with long-term follow-up of a relevant population. There is no important scientific disagreement across studies in the results for the key question.

Acceptable - There is a good to moderate level of assurance with validity of the results for the key question based on fewer than two high quality studies or in high quality studies that lack long-term outcomes of relevant populations. There is little disagreement across studies in the results for the key question.

Weak - There is a low level of assurance with validity of results for the key question based on either moderate to poor quality studies or on studies of a population that may have little direct relevance to the key question. There could be disagreement across studies in the results for the key question.

The grades provide a shorthand description of the strength of evidence supporting the major questions we addressed. However, they may oversimplify the many complex issues involved in appraising a body of evidence. The individual studies involved in formulating the composite grade differed in their design, reporting, and quality. As a result, the strengths and weaknesses of the individual reports addressing each key question should also be considered, as described in detail in the text and tables.

Peer Review

A draft version of this report was reviewed by a panel of expert reviewers, including representatives from professional organizations, pharmaceutical companies, and manufacturers of endoscopic devices used in the management of GERD. Revisions of the draft were made, where appropriate, based on their comments. (See Appendix D.**) The draft and final reports were also reviewed by staff from the Scientific Resource Center at Oregon Health and Science University. However, the findings and conclusions are those of the authors, who are responsible for the contents of the report.

Footnotes

* Several grading systems have been proposed to evaluate the severity of esophagitis in GERD, the most common of which are the Savary-Miller Classification and the Los Angeles Grade. Patients were considered to have mild to moderate esophagitis if they were categorized as Savary-Miller class I-II or Los Angeles grade A-B, while they were considered to have severe esophagitis if they were categorized as Savary-Miller class III-IV or Los Angeles grade C-D (see Appendix E).
** Appendix D (Peer Reviewers) is available electronically at www.ahrq.gov/clinic/epcindex.htm.

Bookshelf ID: NBK42943