NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Aronson N, Flamm CR, Mark D, et al. Endoscopic Retrograde Cholangiopancreatography. Rockville (MD): Agency for Healthcare Research and Quality (US); 2002 Jun. (Evidence Reports/Technology Assessments, No. 50.)
This publication is provided for historical reference only and the information may be out of date.
Part I: Common Bile Duct Stones
This chapter reviews evidence on the following questions:
In patients with known or suspected common bile duct stones,
a. What is the diagnostic performance of ERCP in detecting common bile duct stones in comparison to alternatives (e.g., EUS, MRCP, or CTC)? (Part I, Section 1: Diagnostic Performance of ERCP in Detecting Common Bile Duct Stones - Comparison to Alternatives)
b. What are the outcomes of treatment using ERCP strategies compared to using surgical or medical management? (Part I, Section 2: Outcomes of Treatment Using ERCP for Common Bile Duct Stones - Comparison of Strategies Using ERCP, Surgery, or Medical Management)
c. What is the diagnostic value of individual risk factors or predictive models for assessing the likelihood of having a common bile duct stone? (Part I, Section 3: Diagnostic Value of Individual Risk Factors or Predictive Models for Assessing the Likelihood of Having a Common Bile Duct Stone)
Part I, Section 1: Diagnostic Performance of ERCP In Detecting Common Bile Duct Stones -- Comparison With Alternatives
Introduction
The literature review identified three techniques that could be used as alternatives to diagnostic ERCP in the diagnosis of common bile duct stones: magnetic resonance cholangiopancreatography (MRCP), endoscopic ultrasound (EUS), and computed tomography cholangiography (CTC, with and without oral or intravenous biliary contrast). This section of the review assesses only diagnostic performance and does not consider costs, availability, or adverse effects.
All included studies enrolled patients who underwent both the diagnostic test under consideration and ERCP. However, the choice of reference standard varied between studies and needs to be taken into account when interpreting the test characteristics calculated in each study, particularly if the goal is to determine which test is superior. Although ERCP had traditionally been considered the most accurate test for diagnosis of common bile duct stones, the test can produce both false-negative and false-positive results. The studies reviewed here generally used one of three different types of reference standards.
Ideally, ERCP and the alternative diagnostic test are both compared to a perfect reference standard such as actual examination of the common bile duct, producing unbiased estimates of test characteristics for both tests. Such a reference standard would not be ethical in most circumstances. Short of that, there may be selective confirmation of positive ERCP or other tests, producing estimates of test characteristics that are slightly biased upward. However, the relative performance of ERCP and the alternative diagnostic test can still be examined.
If ERCP is used as the reference standard, then the comparator test can only be worse. In such a case, the analysis cannot determine which test is superior, but only the degree of concordance between the two tests.
Finally, a few studies (Neitlich, Topazian, Smith et al., 1997; Jimenez Cuenca, del Olmo Martinez, Perez Homs et al., 2001; Sugiyama, Atomi, and Hachiya, 1998) used ERCP images and sphincterotomy findings as the reference standard. This does not really allow an evaluation of the comparison between ERCP and the diagnostic test of interest, because the unreported diagnostic errors of ERCP images are "corrected" by the sphincterotomy findings. The performance of diagnostic ERCP cannot be evaluated in such studies unless the interpretation of the diagnostic ERCP is reported separately.
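The effect of the choice of reference standard can be illustrated with a minimal sketch. The patient counts below are hypothetical and are not drawn from any study reviewed here; the function name `sens_spec` is likewise only illustrative. The sketch computes sensitivity and specificity for an alternative test twice: once against the true state of the duct, and once against ERCP used as the reference standard.

```python
from collections import Counter

# Hypothetical cohort of 100 patients (NOT data from any reviewed study).
# Each row: (truth, ercp, alt, number_of_patients); 1 = stone, 0 = no stone.
PATIENTS = [
    (1, 1, 1, 27),  # stone present, both tests positive
    (1, 1, 0, 1),   # stone present, alternative test misses it
    (1, 0, 1, 2),   # stone present, ERCP misses it, alternative test finds it
    (0, 0, 0, 70),  # no stone, both tests negative
]

def sens_spec(rows, test_idx, ref_idx):
    """Sensitivity and specificity of column test_idx against column ref_idx."""
    cells = Counter()
    for row in rows:
        cells[(row[ref_idx], row[test_idx])] += row[3]
    tp, fn = cells[(1, 1)], cells[(1, 0)]
    tn, fp = cells[(0, 0)], cells[(0, 1)]
    return tp / (tp + fn), tn / (tn + fp)

# Alternative test against the true state of the duct (index 0).
alt_vs_truth = sens_spec(PATIENTS, test_idx=2, ref_idx=0)
# Alternative test against ERCP as the reference standard (index 1).
alt_vs_ercp = sens_spec(PATIENTS, test_idx=2, ref_idx=1)

print("alternative vs. truth:", alt_vs_truth)
print("alternative vs. ERCP: ", alt_vs_ercp)
```

In this hypothetical cohort, the two stones that ERCP missed and the alternative test detected are counted as false positives of the alternative test when ERCP is the reference standard, so the alternative test's specificity is understated and it can never appear better than ERCP.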
Given that the expected differences in diagnostic performance between ERCP and the diagnostic alternatives reported here are relatively small and the number of cases with the outcome of interest is generally small, these studies may have very limited power to detect statistically significant differences in test performance. None of the studies actually calculated any statistical significance values. Thus, it is not possible to determine with confidence whether the diagnostic performance of the alternative is similar to or poorer than that of ERCP, or to accurately quantify any difference.
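The power limitation can be made concrete with a rough sample-size sketch. It uses the standard normal-approximation formula for comparing two independent proportions, which is a simplifying assumption: the reviewed studies applied both tests to the same patients, so a paired analysis would be more appropriate and would give different numbers. The sensitivities of 95 percent and 90 percent, the significance level, and the power are illustrative values, not figures taken from the studies.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate number of stone-positive patients needed per group to
    detect a difference between two sensitivities p1 and p2 (two-sided test,
    independent groups, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Illustrative (assumed) sensitivities: 95 percent versus 90 percent.
print(n_per_group(0.95, 0.90))
```

Under these assumptions the required number of patients with stones runs to several hundred per group, whereas the individual studies reviewed here typically enrolled on the order of 100 patients or fewer in total, only some of whom had stones.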
Evidence Base
The search and selection process yielded 10 studies on MRCP (total n=834), 9 studies on EUS (total n=601), and 6 studies with 7 sets of findings on CTC (total n=266). In addition to these studies reporting diagnostic performance specific to common duct stones, 2 studies on MRCP which reported only on overall detection of obstructive abnormalities (total n=121) are also presented here. Study quality assessment is outlined in Table 1.
Review of Evidence: MRCP Performance
Ten studies with a total of 834 patients were selected that examined the performance of MRCP compared to ERCP for the diagnosis of common bile duct stones (Table 2). Nine of the studies used ERCP as the reference standard, and thus measure the concordance of the two techniques rather than their relative performance. Only one study (Sugiyama, Atomi, and Hachiya, 1998) confirmed positive tests and allowed a comparison between the two tests. All the studies were rated as good quality with the exception of Guibaud, Bret, Reinhold, et al. (1995) and Sugiyama, Atomi, and Hachiya (1998).
Seven of the 9 studies that used ERCP as a reference standard showed high concordance between the two tests, with both sensitivity and specificity greater than 90 percent. Two studies showed lesser degrees of concordance (Guibaud, Bret, Reinhold, et al. [1995], sensitivity 81 percent and specificity 98 percent, and Stiris, Tennoe, Aadland et al. [2000], sensitivity 88 percent and specificity 94 percent).
Sugiyama, Atomi, and Hachiya (1998) conducted the only study that confirmed positive ERCP tests and allowed a comparison between the two tests. In that study of 97 patients, ERCP had 100 percent sensitivity, and MRCP had 91 percent sensitivity. Specificity for both tests was 100 percent. This was the only study that analyzed sensitivity by subgroups of stone diameter. Sensitivity of MRCP was 100 percent for stone diameters from 11-27 mm, 89 percent for stone diameters from 6-10 mm, and 71 percent for stone diameters from 3-5 mm.
Two studies reporting on a total of 121 patients had a mixed category of outcomes that included common duct stones (Table 3). In the study by Adamek, Albert, Weitz et al. (1998), the abnormalities included benign and malignant strictures, cholangiocarcinoma, and choledochal cyst in addition to common duct stones. MRCP had a sensitivity and specificity for detecting any abnormality of 89 percent and 92 percent, whereas ERCP had a sensitivity and specificity of 91 percent and 92 percent.
In the study by Holzknecht, Gauger, Sackmann et al. (1998), the abnormalities detected included common bile duct dilatation and stenosis, in addition to common duct stones. Only the concordance with ERCP was evaluated. According to an image interpretation performed on-site, the sensitivity was 91 percent and the specificity was 80 percent. An off-site interpretation showed similar results.
In conclusion, most of the evidence on MRCP allows only conclusions as to whether MRCP and ERCP are concordant, rather than which test is superior. Most studies show fairly good concordance, with sensitivities and specificities both higher than 90 percent. Evidence limited to one study may indicate that ERCP is slightly better than MRCP.
Review of Evidence: Endoscopic Ultrasound Performance
There are 9 studies (total n=601) reporting on the capability of endoscopic ultrasound to diagnose common duct stones compared to ERCP (Table 4). In all the studies except 1 (Sugiyama and Atomi, 1998), positive tests of either method were confirmed with sphincterotomy, allowing for inferences regarding comparative performance. The study by Prat, Amouyal, Amouyal et al. (1996) stands out in this regard by subjecting all patients to sphincterotomy and endoscopic exploration, and thus is the only study in this whole section examining common bile duct stones with a truly independent reference standard. Chak, Hawes, Cooper et al. (1999) and Canto, Chak, Stellato et al. (1998) were also rated as "good" quality studies.
Given the small differences in performance noted in most of the studies, none of the studies is likely to detect statistically significant differences in test performance. In three of the studies, the sensitivity of EUS was higher than ERCP (Prat, Amouyal, Amouyal et al., 1996; Norton and Alderson, 1997; Burtin, Palazzo, Canard et al., 1997). In three studies, the sensitivity of ERCP was higher than EUS (Canto, Chak, Stellato et al., 1998; Dancygier and Nattermann, 1994; Sugiyama and Atomi, 1997) and in two other studies the sensitivities were within 1 percent (Polkowski, Palucki, Regula et al., 1999; Chak, Hawes, Cooper et al., 1999). The specificities were very close in all studies except Chak, Hawes, Cooper et al. (EUS 100 percent, ERCP 87 percent).
Although most of the studies are small, within the limits of the evidence available, it appears that EUS is similar to ERCP in the detection of common bile duct stones.
Review of Evidence: CTC Performance
Seven sets of findings report the diagnostic characteristics of CTC compared to ERCP for the diagnosis of common bile duct stones (Table 5). The studies varied considerably in the reference standard used. Three studies used ERCP as a reference standard, 2 studies used an independent reference standard, and 2 studies used ERCP and sphincterotomy findings as a reference standard. Three variations of CTC were used -- no biliary contrast (3 studies, total n=142), intravenous biliary contrast (2 studies, total n=95) and oral contrast (2 studies, total n=80). This results in a body of literature in which, at most, 2 studies share the same CT technique and reference standard. The studies by Jimenez Cuenca, del Olmo Martinez, Perez Homs et al. (2001), Neitlich, Topazian, Smith et al. (1997), and Soto, Alvarez, Munera et al. (2000) were rated as "good" quality.
Three sets of findings from 2 studies, all from the same principal author (Soto, Velez, Guzman et al., 1999 and Soto, Alvarez, Munera et al., 2000), used ERCP images as the reference standard. Soto, Alvarez, Munera et al. (2000, n=51), which used no biliary contrast, showed poor concordance with ERCP (sensitivity 65 percent and specificity 84 percent). The other two sets of findings (Soto, Velez, Guzman et al., 1999, n=29 and Soto, Alvarez, Munera et al., 2000, n=51) found higher concordance with ERCP when using oral biliary contrast (sensitivities and specificities both greater than 90 percent).
Two studies (Ishikawa, Tagami, Toyota et al., 2000, n=45 and Polkowski, Palucki, Regula et al., 1999, n=50) examined CTC with IV biliary contrast, and both studies used methods where ERCP findings were confirmed. In both studies ERCP was more sensitive and specific than CTC (Ishikawa, Tagami, Toyota et al., 2000, ERCP 100 percent sensitivity, 100 percent specificity, CTC 71 percent sensitivity, 95 percent specificity; Polkowski, Palucki, Regula et al., 1999, ERCP 91 percent sensitivity, 100 percent specificity, CTC 85 percent sensitivity, 88 percent specificity).
Finally, the two studies that use ERCP sphincterotomy results as the reference standard (Jimenez Cuenca, del Olmo Martinez, Perez Homs et al., 2001, n=40 and Neitlich, Topazian, Smith et al., 1997, n=51) showed sensitivities of 80 percent and 88 percent, respectively, and specificities of 100 percent and 97 percent. A direct comparison to ERCP cannot be done with these data, but these sensitivities are lower than generally has been shown for ERCP.
In conclusion, most studies show a fair concordance with ERCP diagnosis of common bile duct stones, but in studies that allow a determination of which test is superior, ERCP seems to have better sensitivity and specificity. However, no estimate of the magnitude of this superiority can be made from this evidence.
Conclusion
The evidence about the relative performance of EUS compared to ERCP is the strongest, because most of the studies used reference standards which allowed inferences regarding comparative performance. With some studies showing EUS is better, other studies showing ERCP is better, and no remarkable outlying results, the weight of the evidence suggests that EUS is similar to ERCP in detecting common bile duct stones.
MRCP has a concordance with ERCP that results in sensitivities and specificities greater than 90 percent in most studies when using ERCP as a reference standard. Along with evidence limited to one study regarding comparative performance of MRCP and ERCP, MRCP may be slightly worse than ERCP in detecting common bile duct stones.
CTC also has reasonable concordance with ERCP, but the range of sensitivities and specificities is lower, with sensitivities dipping down to the 80 percent level in some studies. Again with evidence limited to only 2 small studies on the relative performance of CTC to ERCP, it appears that CTC is not as good as ERCP in detecting common bile duct stones.
Although some tests may not perform quite as well as ERCP, the role of these tests in the management of patients with suspected common bile duct stones cannot be determined strictly by an examination of their test characteristics. The costs and risks of the tests, and the costs and risks of actions based on their results, along with the pretest probability of a stone, need to be taken into account to determine the optimal strategy that most efficiently treats patients with suspected common duct stones.
Part I, Section 2: Outcomes of Treatment Using ERCP for Common Bile Duct Stones - Comparison of Strategies Using ERCP, Surgery, or Medical Management
Introduction
ERCP can provide both diagnosis and treatment of common bile duct stones in one session in a less invasive manner than an open surgical procedure. Commonly performed in conjunction with cholecystectomy, it can be performed before, after, or, rarely, during surgery. However, there are risks from the procedure and it may not be successful at removing the common bile duct stones. Common bile duct exploration was the traditional surgical treatment to remove stones. This used to be performed with an open surgical incision. Then laparoscopic cholecystectomy became a common operation, and in order to avoid an open incision, ERCP was used in the diagnosis and removal of common duct stones. Recently, laparoscopic methods of exploring the common bile duct and removing stones have evolved, making for even more varied potential treatment options.
In order to appropriately evaluate ERCP treatment strategies, studies must properly account for the patients throughout the diagnostic and treatment process, including additional procedures needed for failed initial procedures. Alternatively, studies can assess outcomes through identical stages of the diagnostic or treatment process. Complication rates in and of themselves may not be fair measures of outcomes between treatment strategies if the baseline morbidity of the procedures (e.g., open common bile duct exploration versus ERCP common duct stone extraction) is very different. Ideally, a measure of morbidity that fairly captures both the number of procedures and the total morbidity endured during each procedure would allow a fair comparison between treatment strategies.
Evidence Base
For the purposes of this evidence review, the literature remaining after selection criteria were applied was very thin and spread out over many different research questions. Generally, there were only one or, at most, two studies on a specific comparison of interest. Study quality assessment is outlined in Table 6.
Review of Evidence: ERCP with Laparoscopic Cholecystectomy to Remove Common Bile Duct Stones
Three randomized controlled trials enrolling a total of 289 patients compared alternative strategies for removal of common bile duct stones in patients undergoing laparoscopic cholecystectomy (Tables 7-9). Although all 3 trials were judged to be of good quality, the evidence is limited because there is only a single study addressing each comparison of interest. Each trial reported on a different comparison, with respect to both the procedures compared and the patient population selected.
Overall, both arms in each of these 3 studies reported similar rates of stone clearance and morbidity, although morbidity was not well defined in two of these trials (Chang, Lo, Stabile et al., 2000; Rhodes, Sussman, Cohen et al., 1998). Thus, the main outcome of interest is relative resource utilization for each pair of alternative strategies for stone removal.
Mandatory Preoperative ERCP versus Selective Postoperative ERCP
Chang, Lo, Stabile et al. (2000) randomized 59 patients undergoing cholecystectomy during recovery from acute gallstone pancreatitis. Selective postoperative ERCP was based on findings from intraoperative cholangiogram. Resource utilization was lower in the selective postoperative ERCP group as measured by mean total hospital stay (9.0 vs. 11.7 days, p=0.04) and total costs ($8,586 vs. $10,210, p=0.049).
Preoperative ERCP versus intraoperative cholangiogram and laparoscopic common bile duct exploration (LCBDE)
Cuschieri, Lezoche, Morino et al. (1999) randomized 300 patients undergoing laparoscopic cholecystectomy who had suspected common bile duct stones. In one treatment arm, preoperative ERCP was performed, and sphincterotomy and stone removal were attempted if stones were detected. In the other treatment arm, LCBDE was performed if stones were detected on intraoperative cholangiogram. Mean hospital stay was reduced in the LCBDE treatment group (6 versus 9 days, p<0.05).
LCBDE versus Postoperative ERCP
Rhodes, Sussman, Cohen et al. (1998) randomized 80 patients with common bile duct stones found on intraoperative cholangiography during laparoscopic cholecystectomy. The hospital stay was reduced in the LCBDE group (median days, 1 vs. 3.5, p<0.01).
Summary
There is insufficient evidence to determine whether there is an optimal strategy for common bile duct stone removal in patients undergoing cholecystectomy. The available evidence suggests that resource utilization is lower when:
- selective postoperative ERCP is performed, as compared to routine ERCP prior to cholecystectomy; and
- laparoscopic common bile duct exploration is performed during laparoscopic cholecystectomy, as compared to adjunctive pre- or postoperative ERCP.
However, since success and complications of ERCP and laparoscopic cholecystectomy with LCBDE may be operator dependent, findings may not be generalizable across clinical settings. The availability of expertise in LCBDE may be limited at present.
Review of Evidence: ERCP Sphincterotomy alone versus Definitive Surgery for suspected common duct stones
Patients at High Surgical Risk
One randomized, controlled trial (Targarona, Ayuso, Bordas et al., 1996) and an observational study derived from the Targarona trial (Trias, Targarona, Ros et al., 1997) addressed whether removal of common duct stones with endoscopic sphincterotomy alone has lower morbidity and mortality than approaches which also remove the gall bladder during initial treatment (Table 10 and Table 11). The population of interest is patients at high surgical risk if subjected to cholecystectomy. For patients at high surgical risk, there may be advantages to a nonsurgical approach for removing common duct stones during acute symptomatic episodes. However, there may be differences in long term outcome if the gall bladder is not removed. Study quality was judged to be "Good" for the Targarona, Ayuso, Bordas et al. (1996) trial, and "Fair" for the Trias, Targarona, Ros et al. (1997) study.
The Targarona and Trias studies included high-risk surgical candidates based on age, cardiac risk, and pulmonary disease. The technique used in the Targarona, Ayuso, Bordas et al. (1996) study may not be representative of current surgical practice as the investigators performed open cholecystectomy for the definitive surgery arm; only the observational study by Trias, Targarona, Ros et al. (1997) used laparoscopic cholecystectomy.
Targarona, Ayuso, Bordas et al. (1996; n=98) found that both groups had similar short-term treatment failure, mortality, and morbidity, but initial postoperative length of stay favored endoscopic sphincterotomy alone (5 versus 11 days, p<0.001). However, over the longer term, the cholecystectomy patients had fewer biliary complications (6 percent versus 21 percent, p=0.04) and fewer readmissions (4 percent versus 23 percent, p<0.01). Eventually, 15 percent of patients in the sphincterotomy group underwent cholecystectomy.
Trias and colleagues performed laparoscopic cholecystectomy with preoperative ERCP as needed in 60 high-risk patients, and compared outcomes to the endoscopic sphincterotomy arm of the Targarona, Ayuso, Bordas et al. (1996) trial. Short-term and long-term results were similar to the Targarona trial, but initial hospital length of stay no longer favored the endoscopic sphincterotomy group when compared to laparoscopic, rather than open, cholecystectomy.
Patients Not at High Surgical Risk
One randomized controlled trial by Hammarstrom, Holmin, Stridbeck et al. (1995) enrolled 80 patients with intact gallbladders diagnosed with common bile duct stones on ERCP (Table 12). Patients either received sphincterotomy alone or open cholecystectomy and common bile duct exploration. Patients were followed for 5 years.
For the most part, the study does not coherently define and compare outcomes between treatment groups; rather, various post-procedure events are unsystematically enumerated, making it difficult to form any overall sense of outcomes. Total hospital stay (short term and follow up stays) was compared between the groups and was not statistically significantly different (median stay, 13 days sphincterotomy, 16 days surgery, p=ns). Of patients who received sphincterotomy, 13 were subsequently treated with cholecystectomy, 4 urgently because of acute cholecystitis. The authors also noted that the death rate from non-biliary related causes was higher in the endoscopic sphincterotomy group (30 percent vs. 10 percent, p=0.02). The authors conclude that the two alternatives are equally effective in the long term, but that, due to the difference in heart disease mortality, surgery might be the better option.
Summary
The very limited available evidence shows that definitive treatment prevents long term recurrence of biliary symptoms, hospitalization, and need for further treatment. In high-risk patients as defined in these studies, definitive treatment can be performed with acceptable short term morbidity and with mortality equivalent to that of sphincterotomy alone. Not all patients develop recurrent problems, so the choice of definitive treatment versus sphincterotomy alone involves weighing the short term morbidity of treatment, be it sphincterotomy alone or open or laparoscopic surgery, against the probability of recurrent biliary symptoms.
Review of Evidence: ERCP versus surgery for patients with acute cholangitis
Two studies compared ERCP treatment to open surgery for patients with acute cholangitis due to common bile duct stones (Table 13 and Table 14). Lai, Mok, Tan et al. (1992) randomized 82 patients diagnosed with common bile duct stones by ERCP to endoscopic nasobiliary drainage or open common bile duct exploration. This study is from Hong Kong, where oriental cholangiohepatitis is a common cause of common duct stones, and may not generalize to populations with a different spectrum of disease. Leese, Neoptolemos, Baker et al. (1986) conducted a retrospective review comparing 43 patients treated with endoscopic sphincterotomy to 28 contemporaneous patients undergoing surgical decompression for relief of cholangitis.
The Leese, Neoptolemos, Baker et al. (1986) study was judged to be of poor quality due to imbalance of patient characteristics between groups.
Acute severe cholangitis is a condition of very high mortality; thus the important outcome is the acute mortality rate. Both studies show that short-term mortality from acute cholangitis is lower in the ERCP-treated group compared to open surgery. Lai, Mok, Tan et al. (1992) reported lower hospital mortality (10 percent versus 32 percent, p<0.05) in the group treated with endoscopic nasobiliary drainage. Despite prognostic factors favoring the open surgery group, Leese, Neoptolemos, Baker et al. (1986) found that mortality at 30 days was lower in the endoscopic sphincterotomy group (5 percent versus 21 percent, p<0.02).
Review of Evidence: Endoscopic lithotripsy vs. extracorporeal shock wave lithotripsy (ESWL) in stones not removable with standard endoscopic techniques
Two studies compared endoscopic lithotripsy techniques to extracorporeal shock wave lithotripsy (ESWL) in removing common bile duct stones that cannot be removed with standard endoscopic techniques (which include mechanical lithotripsy) (Neuhaus, Zillinger, Born et al., 1998 and Adamek, Maier, Jakobs et al., 1996; Table 15 and Table 16). In these studies, successful removal of stones is the important outcome.
Neuhaus, Zillinger, Born et al. (1998) randomized 60 patients to ESWL or intracorporeal laser lithotripsy. Adamek, Maier, Jakobs et al. (1996) performed an observational comparison between ESWL (n=79) and intracorporeal electrohydraulic lithotripsy (n=46).
Neuhaus, Zillinger, Born et al. (1998) found that intracorporeal laser lithotripsy was more successful than ESWL in clearing the bile duct of stones (97 percent versus 73 percent, p<0.05). Adamek, Maier, Jakobs et al. (1996) found no significant difference between ESWL and electrohydraulic lithotripsy.
Review of Evidence: Endoscopic balloon dilation versus endoscopic sphincterotomy
Two randomized controlled trials (Bergman, Rauws, Fockens et al., 1997 and Ochi, Mukawa, Kiyosawa et al., 1999) compared endoscopic balloon dilation to endoscopic sphincterotomy for removal of common bile duct stones in a total of 312 patients (Table 17). Study quality was judged as "Good" for both trials.
Concern about possible long term effects of sphincterotomy on biliary function, plus concern about hemorrhage induced by sphincterotomy have led to consideration of dilation of the biliary sphincter as an alternative method to remove common bile duct stones. Dilation would potentially preserve the function of the biliary sphincter. However, concern has been raised that pancreatitis may occur more often as a complication after balloon dilation.
Neither study assesses long term outcomes, so the only outcomes that can be assessed are success in removing common bile duct stones and early complications. Both studies found that balloon dilation ultimately produces equivalent stone removal rates (Bergman, Rauws, Fockens et al., 1997, balloon 89 percent success, sphincterotomy 91 percent success; Ochi, Mukawa, Kiyosawa et al., 1999, balloon 93 percent success, sphincterotomy 98 percent success), although some patients in the balloon treatment arm must either cross over or be subject to additional procedures such as mechanical lithotripsy to compensate for the lower initial success rate. Early complications and follow-up complications were not statistically significantly different in the Bergman, Rauws, Fockens et al. (1997) study. In the Ochi, Mukawa, Kiyosawa et al. (1999) study, early complications were not statistically significantly different. Late complications were reported (balloon 4 percent, sphincterotomy 15 percent), but statistical significance tests were not reported.
DiSario, Freeman, Bjorkman et al. (1998) also completed a randomized controlled trial comparing balloon dilation to sphincterotomy, but this trial had been reported only in abstract form in 1998. The results of this study are summarized here because it is commonly cited in reviews and the findings on post-procedure pancreatitis are striking. In this randomized controlled trial of 240 patients, stone clearance was achieved in 99 percent of patients. However, morbidity occurred in 15 percent of balloon dilation patients and 4 percent of sphincterotomy patients (p=0.014). Most of the morbidity in the dilation group was due to moderate or severe pancreatitis, which occurred in 4 patients and resulted in 2 deaths.
Review of Evidence: Needle-knife fistulotomy versus needle-knife precut papillotomy for the treatment of common bile duct stones in patients with difficult cannulations
Mavrogiannis, Liatsos, Romanos et al. (1999) performed a randomized, controlled trial (n=153) comparing two precutting techniques for gaining access to the common bile duct when cannulation is difficult (Table 18). Needle-knife fistulotomy (NKF) has been proposed as a safer method of precutting than traditional needle-knife precut papillotomy (NKPP), with the potential disadvantage of a smaller opening into the bile duct, which may prevent successful stone removal.
Overall success in cannulating the common bile duct (after second attempts) was equivalent between the two techniques (NKF 91 percent, NKPP 89 percent, p=n.s.). Stone removal without use of lithotripsy was greater for NKPP than for NKF (98 percent versus 83 percent), but final stone removal rates were 100 percent for both groups. Overall complications were not statistically significantly different (NKF 11 percent, NKPP 15 percent, p=n.s.), but NKPP had a greater pancreatitis rate (7.6 percent versus 0 percent, p<0.05) and a higher rate of hyperamylasemia (17.7 percent versus 2.7 percent, p<0.01). Both methods appear to be similar in the management of patients with common bile duct stones.
Review of Evidence: Endoscopic biliary endoprosthesis versus endoscopic sphincterotomy and stone extraction for common bile duct stones in high risk patients
One randomized study (Chopra, Peters, O'Toole, et al., 1996) compared biliary endoprosthesis placement to conventional endoscopic sphincterotomy and stone extraction for patients with common duct stones who were at high risk because of old age or serious debilitating disease. It was theorized that placement of the endoprosthesis might successfully prevent biliary complications with lower short term morbidity than endoscopic sphincterotomy.
Early complications arising within 72 hours after the procedure occurred in 3/43 patients in the endoprosthesis group and 7/43 in the endoscopic sphincterotomy group (p=0.18). Among the 82 patients followed long term for a median of 16 to 20 months, 9 patients in the endoprosthesis group had 11 episodes of cholangitis, and 6 patients in the endoscopic sphincterotomy group developed cholangitis. Overall, a higher proportion of the sphincterotomy group (86 percent) remained free of biliary complications at 20 months than the endoprosthesis group (64 percent, p=0.03). Thus, although endoprosthesis placement is as effective and safe as sphincterotomy over the short term, complications and cholangitis are more frequent over the long term.
Conclusion
Overall, a very thin literature spread out over many different comparisons of interest prevents strong conclusions about any specific treatment comparison. Keeping in mind this thin literature base, the available evidence suggests that:
- Laparoscopic common bile duct exploration may be better than ERCP strategies to manage cholecystectomy patients with the least resource use.
- Definitive surgery prevents long term complications at acceptable short-term morbidity when compared to sphincterotomy alone in high-risk surgical patients.
- Endoscopic treatment of acute cholangitis reduces short-term mortality when compared to emergency surgery.
- Limited evidence suggests that intracorporeal and extracorporeal lithotripsy methods show similar outcomes in removing large common bile duct stones.
- Limited evidence suggests similar stone removal rates and short-term complications when comparing balloon dilation and sphincterotomy.
- Limited evidence suggests similar stone removal rates and complications when comparing needle-knife fistulotomy to needle-knife precut papillotomy.
- Limited evidence suggests that endoscopic sphincterotomy and duct stone clearance is more effective than biliary endoprosthetic placement for prevention of long term complications in patients considered to be high surgical risks.
Part I, Section 3: Diagnostic Value of Individual Risk Factors or Predictive Models for Assessing the Likelihood of Having a Common Bile Duct Stone
Introduction
In trying to determine optimum diagnostic and treatment strategies, many investigators have analyzed individual risk factors and combinations of risk factors that may predict the presence or absence of common bile duct stones. With information about the probability of a common bile duct stone, it may be possible to design a diagnostic and treatment strategy that minimizes patient morbidity and/or minimizes medical resource utilization.
The data reviewed here cannot be directly translated into optimum diagnostic and treatment strategies because there are many possible strategies, given the variety of methods possible to diagnose common bile duct stones (ERCP, MRCP, endoscopic ultrasound, intraoperative cholangiogram) and treat them (preoperative ERCP, laparoscopic common bile duct exploration, postoperative ERCP, expectant management).
However, a few simple principles surface. From the perspective of the individual patient, the probability of a common duct stone is the key factor in determining which approach may be best. If the preoperative probability of a common bile duct stone is high enough, ERCP tends to become efficient and effective because both diagnosis and therapy can be carried out in a single procedure in one setting. If the preoperative probability of a common duct stone is low enough, then it may be possible to avoid any diagnostic procedure to diagnose common duct stones and rely on expectant postoperative management with ERCP to manage any stones that were missed. In the middle range of probability, diagnostic tests such as EUS, MRCP, or intraoperative cholangiogram may be efficient methods to select patients for treatment.
All the risk factors or decision rules evaluated in this section have potentially variable cutoff thresholds, so that sensitivity or specificity can be manipulated with the expected trade-offs to produce a particular positive or negative predictive value. However, at a particular cutoff point that produces the desired predictive value, a superior risk factor or decision rule will have higher sensitivities and specificities than other decision rules, and thus better performance in discriminating between those patients who do and do not have stones.
For example, suppose that a probability of stone of 60 percent or greater makes preoperative ERCP the optimum strategy for a particular patient. Suppose further that risk factor A at a particular cutoff produces a positive predictive value of 60 percent, and risk factor B at a particular cutoff also produces a positive predictive value of 60 percent in the same population. However, risk factor A identifies only 40 percent of the patients with stones at that cutoff (40 percent sensitivity), whereas risk factor B identifies 80 percent of the patients with stones at that cutoff (80 percent sensitivity). Thus, using risk factor B, 80 percent of the patients with stones can be managed by a strategy that requires a 60 percent probability of stone to be optimal.
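The arithmetic behind this example can be sketched in Python. The number of patients with stones (150, out of a notional cohort of 1,000) is a hypothetical value chosen only to give round numbers; the sensitivities and the 60 percent positive predictive value come from the example above. The function name `counts_at_cutoff` is illustrative.

```python
def counts_at_cutoff(n_with_stones, sensitivity, ppv):
    """Patients flagged by a risk factor at a cutoff with the given
    sensitivity and positive predictive value (PPV)."""
    true_pos = sensitivity * n_with_stones      # patients with stones who are flagged
    # PPV = TP / (TP + FP), so FP = TP * (1 - PPV) / PPV
    false_pos = true_pos * (1 - ppv) / ppv      # flagged patients without stones
    missed = n_with_stones - true_pos           # patients with stones not flagged
    return {"true_pos": true_pos, "false_pos": false_pos, "missed": missed}


N_WITH_STONES = 150  # hypothetical: 150 of 1,000 patients have stones

factor_a = counts_at_cutoff(N_WITH_STONES, sensitivity=0.40, ppv=0.60)
factor_b = counts_at_cutoff(N_WITH_STONES, sensitivity=0.80, ppv=0.60)

for name, c in (("A", factor_a), ("B", factor_b)):
    print(f"Risk factor {name}: {c['true_pos']:.0f} stones identified, "
          f"{c['false_pos']:.0f} flagged without stones, "
          f"{c['missed']:.0f} stones missed")
```

Under these assumptions, both risk factors flag groups in which 60 percent of patients have stones, but risk factor B captures twice as many of the patients with stones as risk factor A.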
In sum, because the cutoff threshold can be varied to meet desired criteria, the exact sensitivity and specificity calculated in any single study are not important. The critical factor differentiating these risk factors or decision rules is the capability to have both the highest sensitivity and specificity or, in the parlance of diagnostic decision-making, the best receiver operating characteristic (ROC). The cutoff point can then be defined that produces the sensitivities and specificities that result in the desired positive predictive value. The studies reviewed here did not in general calculate ROC curves. A risk factor or decision rule with both high sensitivity and specificity would have the best ROC.
Evidence Base
A total of 13 studies with a total of 7,409 patients contributed to the findings reported here. Most studies reported on several of the individual risk factors; some reported on both individual risk factors and a multivariate risk prediction model.
Review of Evidence: Univariate Risk Factors for Common Bile Duct Stones
The single risk factors commonly examined in studies included clinical jaundice or elevated bilirubin, liver function tests, and ultrasound findings of a dilated common bile duct. Studies varied in the definitions and cutoff thresholds for the various tests.
Five studies (total n=2,661) reported on clinical jaundice as a risk factor (Table 19). Positive predictive values ranged from 29 percent to 86 percent, sensitivity from 24 percent to 56 percent, and specificity from 87 percent to 99 percent. Clinical jaundice does not have an exact threshold cutoff value, nor is the reliability of measurement certain. In general, though, sensitivities are low, specificities are higher, and in the situation of a low prevalence condition such as common bile duct stones, the high specificity drives the predictive values to be high.
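The dependence of positive predictive value on specificity can be illustrated with Bayes' theorem. In the sketch below, the 10 percent prevalence is an assumption made for illustration, not a figure taken from the reviewed studies, and the sensitivity and specificity values are hypothetical points chosen within the ranges reported above.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from Bayes' theorem: P(stone present | test positive)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


PREVALENCE = 0.10  # assumed prevalence of common bile duct stones

# Hold sensitivity at 40 percent and vary specificity only.
for spec in (0.87, 0.95, 0.99):
    ppv = positive_predictive_value(0.40, spec, PREVALENCE)
    print(f"sensitivity 40%, specificity {spec:.0%}: PPV = {ppv:.0%}")
```

With sensitivity held fixed, the positive predictive value in this sketch rises as specificity approaches 100 percent, because the number of false positives among the large stone-free group shrinks.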
Six studies (total n=2,369) reported on bilirubin levels. At varying cutoff levels, positive predictive values ranged from 42 percent to 95 percent, sensitivity from 31 percent to 56 percent, and specificity from 48 percent to 99 percent. In general, sensitivities were low, specificities were higher, and the resulting positive predictive values were reasonably high.
Eight studies (total n=3,551) reported on various liver function tests (Table 20). Some studies examined more than 1 cutoff level. There was a broad range of predictive values, sensitivities and specificities for all the different liver function tests examined. In general, the trade off between sensitivity and specificity can be noted in all the studies. The studies with cutoff values that produce high specificity tend to have low sensitivity, but this type of cutoff produces the highest positive predictive values.
Ten studies (total n=4,321) reported on the finding of a dilated common bile duct seen on ultrasound (Table 21). The threshold for a dilated duct varied from 5 to 10 mm, and was undefined in a few studies. Predictive values ranged from 28 percent to 91 percent, sensitivities from 28 percent to 94 percent, and specificities from 72 percent to 98 percent. Studies with high sensitivity tend to have low specificity, and vice versa.
In sum, although all the previously mentioned single risk factors for common duct stones have significant associations with the presence of stones, none of them have outstanding ROC characteristics. The presence of any of these factors certainly increases the probability of the presence of a common bile duct stone, possibly high enough to change clinical decision-making. However, changing the cutoff value to increase the positive predictive value (by increasing the specificity) usually results in poor sensitivity.
Review of Evidence: Multivariable Predictors for Common Bile Duct Stones
Four studies (total n=1,461) examined the use of multiple risk factors for prediction of the presence of common bile duct stones (Table 22). Many studies that simply used the criterion of "any one risk factor" as a prediction rule were not included in this evidence review, as such a criterion has been used for many years to select patients for ERCP and has a known poor specificity and low positive predictive value.
The four studies varied in the analytic technique used to develop the prediction rule. Hawasli, Lloyd, Pozios et al. (1993) did not use any quantitative technique but defined combinations of risk factors to classify patients at high risk of stones. Menezes, Marson, Debeaux et al. (2000) developed a logistic model based on age, sex, jaundice, presence of cholangitis, liver function tests, and ultrasound examination of the common bile duct. Trondsen, Edwin, Reiertsen et al. (1995) used a discriminant analysis technique based on age, bilirubin, alanine aminotransferase, and gamma glutamyltransferase. In Trondsen, Edwin, Reiertsen et al. (1998), a new rule was not developed, but the previously developed discriminant analysis rule was prospectively validated in a new population of patients.
Thus, except for Trondsen, Edwin, Reiertsen et al. (1998), the findings of the three other studies should be viewed as optimistic estimates of stone prediction, since the performance of the rules was only evaluated on the set of patients used to develop the rule.
All the studies produced decision rules in which both the sensitivity and specificity were greater than 80 percent. However, these findings should be viewed cautiously, since only one of the rules has been validated in an independent population. The prospective validation study by Trondsen, Edwin, Reiertsen et al. (1998) is a particularly strong finding, since the rule was derived from an independent population: the sensitivity was 94 percent and the specificity was 88 percent in an independent set of patients. The discriminant function cutoff could be varied to increase sensitivity at the expense of specificity or vice versa, but since both are high, the discriminative capability of the rule was far superior to that of individual risk factors.
In conclusion, multivariable modeling of risk factors for prediction of common duct stones shows promise as a method of triage for determining appropriate treatments, given that such models appear to have superior discriminatory power. These prediction models have yet to be integrated into clinical decision models to determine optimal cutoffs.
Review of Evidence: Absence of Any Risk Factor as A Predictor of Common Bile Duct Stone Absence
Seven studies (total n=599) examined the prediction of absence of common duct stones (Table 23). Usually, the absence of any of the known risk factors (all the individual factors reviewed previously) was used as the indicator. Trondsen, Edwin, Reiertsen et al. (1995) and Trondsen, Edwin, Reiertsen et al. (1998), reviewed previously, are also included here because the discriminant function used to predict stones can also be used to predict the absence of stones.
If the prevalence of stone is low enough in some patients, then some clinicians might avoid use of any diagnostic test to diagnose common duct stones. Such a case would be very compelling if the probability of stone is in the same range as, or lower than, that following a negative ERCP examination. Although ERCP is selectively performed on patients with higher risk of common duct stones, if physicians are willing to believe a negative ERCP, they should be willing to believe a prediction rule if the probabilities of stones are equally low.
The seven studies reported a probability of common duct stones in those predicted not to have stones ranging from 0.25 percent to 7 percent. In all studies, a reasonable sensitivity for stone-free patients was shown, from 60 percent to 98 percent, and reasonable specificity, 60 percent to 96 percent. Thus, the decision rules can all identify more than half of the patients who do not have stones.
The strongest finding is Trondsen, Edwin, Reiertsen et al. (1998), in which the same discriminant function which identifies stones can rule out stones with both high sensitivity (88 percent) and specificity (94 percent). This study is also a validation study of an independently developed discriminant function, which further increases its validity.
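The 88 percent sensitivity and 94 percent specificity for ruling out stones appear to mirror the 94 percent sensitivity and 88 percent specificity reported for detecting stones with the same discriminant function, because changing the target condition from stone presence to stone absence exchanges the two measures. A minimal sketch follows; the patient counts are hypothetical and were chosen only to reproduce the reported percentages.

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity and specificity for whichever condition is treated as 'positive'."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts chosen to reproduce the reported percentages.
stones_flagged, stones_missed = 94, 6        # 100 patients with stones
clear_flagged, clear_not_flagged = 48, 352   # 400 patients without stones

# Target condition: stone present.
sens_present, spec_present = sens_spec(
    tp=stones_flagged, fn=stones_missed, tn=clear_not_flagged, fp=clear_flagged)

# Target condition: stone absent (the roles of the two groups are exchanged).
sens_absent, spec_absent = sens_spec(
    tp=clear_not_flagged, fn=clear_flagged, tn=stones_flagged, fp=stones_missed)

print(f"Predicting stones:  sensitivity {sens_present:.0%}, specificity {spec_present:.0%}")
print(f"Predicting absence: sensitivity {sens_absent:.0%}, specificity {spec_absent:.0%}")
```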
These probabilities of stones compare quite favorably to the probabilities of stones in patients having a negative ERCP. If the probability is calculated, using the equation "1-NPV" and some of the reported NPVs of the ERCP studies in the section of this report comparing ERCP to EUS, a range of stone probabilities is calculated from 0 percent to 17 percent.
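The "1-NPV" calculation can be written out as a short sketch. The negative predictive values below are illustrative: 1.00 and 0.83 are simply the values implied by the 0 percent and 17 percent endpoints of the range above, and 0.95 is an arbitrary intermediate value.

```python
def stone_probability_after_negative(npv):
    """Probability that a stone is present despite a negative test: 1 - NPV."""
    return 1.0 - npv


# Illustrative NPVs; 1.00 and 0.83 correspond to the 0 and 17 percent endpoints.
for npv in (1.00, 0.95, 0.83):
    residual = stone_probability_after_negative(npv)
    print(f"NPV {npv:.0%} -> probability of stone after negative test {residual:.0%}")
```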
In conclusion, the absence of any risk factors for stones (or a discriminant function indicating absence of stone) is a very strong predictor of the absence of stones, producing probabilities of stones that are in the same range as a negative ERCP exam in a patient with risk factors for stones.
Conclusions
The probability of a common duct stone is the key factor in determining diagnostic and treatment strategies. When preoperative probability of a common bile duct stone is high enough, ERCP may be preferred because diagnosis and therapy can be carried out in a single procedure. If the preoperative probability of a common duct stone is low enough, then expectant management may be preferred in order to avoid unnecessary procedures. In the middle range of probability, diagnostic tests such as EUS, MRCP, or intraoperative cholangiogram may be used to further discriminate patients with high or low probability of common bile duct stones.
Thirteen studies, with a total of 7,409 patients, were reviewed. These studies reported multiple estimates of sensitivity and specificity for single risk factors or combinations of risk factors used to predict the presence of common bile duct stones.
The single risk factors most commonly assessed were clinical jaundice or elevated bilirubin, liver function tests, and ultrasound findings of a dilated common bile duct. All have significant associations with the presence of common duct stones, but none have both high sensitivity and specificity.
Four studies tested prediction rules based on combinations of risk factors for the presence of stones. All the studies produced decision rules in which both the sensitivity and specificity were greater than 80 percent. These findings must be viewed cautiously, since only one study was a validation of an independently developed prediction rule. Presently, multivariable modeling of risk factors for prediction of common duct stones is a promising approach.
The absence of any risk factors for stones (or a discriminant function indicating absence of stone) is a very strong predictor of the absence of stones, producing probabilities of stones that are in the same range as a negative ERCP exam in a patient with risk factors for stones (0 percent to 17 percent).
Results and Conclusions, Part II: Pancreaticobiliary Malignancy
This chapter reviews evidence on the following questions:
In patients with known or suspected pancreaticobiliary malignancy,
a. What is the diagnostic performance of ERCP tissue sampling techniques in establishing a tissue biopsy diagnosis of pancreaticobiliary malignancy in comparison to each other or alternative nonsurgical tissue sampling techniques (e.g., endoscopic ultrasound-guided fine-needle aspiration (FNA) or percutaneous FNA)? (Section 1: Diagnostic Performance of Nonsurgical Tissue Sampling Techniques in Pancreaticobiliary Malignancy - Comparison of Strategies Using ERCP, EUS, or Percutaneous Approach)
b. What is the diagnostic performance of ERCP in diagnosing the presence of malignant pancreaticobiliary obstruction in comparison to other imaging alternatives (e.g., EUS or MRCP)? (Section 2: Diagnostic Performance of ERCP in Pancreaticobiliary Malignant Obstruction - Comparison To Alternatives)
c. What are the outcomes of treatment using ERCP strategies to treat malignant pancreaticobiliary obstruction compared to using surgical or interventional radiology treatment? (Section 3: Outcomes of Treatment Using ERCP for Palliation of Pancreaticobiliary Malignancy - Comparison of Strategies Using ERCP, Surgery, or Interventional Radiology; A. Comparison of ERCP stent versus Surgical Bypass; B. Comparison of Metal vs. Plastic stents During ERCP; C. Additional Comparisons of ERCP Strategies)
(Section 4: Outcomes of Treatment Using Preoperative ERCP Drainage for Relief of Malignant Obstructive Jaundice)
Part II, Section 1: Diagnostic Performance of Nonsurgical Tissue Sampling Techniques in Pancreaticobiliary Malignancy -- Comparison of Strategies Using ERCP, EUS, or Percutaneous Approach
Introduction
When a malignant cause is suspected for biliary obstruction, preoperative tissue confirmation of malignancy may be helpful in guiding management decisions. Nonsurgical tissue sampling methods include endoscopic and percutaneous approaches. Cytologic assessment can be performed on endoscopically acquired specimens such as aspirated biliary or pancreatic fluid, wire brushing specimens, or fine-needle aspiration (FNA) specimens. FNA specimens can be obtained during ERCP, EUS, or through a percutaneous approach using imaging guidance. Endoscopic tissue biopsy can be performed during ERCP with a forceps device.
The goal of tissue sampling techniques is to provide sufficient cellular material to make an accurate pathologic diagnosis. Theoretically, increasing the numbers of samples and/or the types of samples might yield more cellular tissue for assessment and might improve diagnostic accuracy, but the extent to which combinations of different sampling techniques increase the diagnostic accuracy is still being investigated (Lee and Leung 1998).
It is outside the scope of this systematic review to determine whether biliary versus pancreatic location of sampling is related to differences in diagnostic performance of sampling techniques. A recent review summarized the diagnostic sensitivity of brush cytology for detection of pancreatic cancer (Lee and Leung 1998). In a total sample of 362 patients who had pancreatic cancer, brush cytology samples diagnosed 55% of cases with a range among studies of 0-85%. When the subset of 190 brush cytology samples taken from the pancreatic duct was analyzed separately, 66% of pancreatic cancers were detected. The few studies using blinded readings reported a lower range of sensitivity (0-40%).
Cytology findings may be interpreted as definite malignancy or may be reported according to the degree of atypia. The sensitivity and specificity of cytology will be dependent on where the criterion is set for calling the test positive. Using a strict criterion where only definite malignancy is counted as positive will achieve the highest specificity, but the associated sensitivity will usually be the lowest. Likewise, considering any degree of atypia as a positive test will increase the test's sensitivity, but the specificity will generally be reduced.
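The effect of moving the positivity criterion can be sketched with a small example. The grade labels, the ordinal coding, and the ten cases below are all hypothetical and serve only to show the direction of the trade-off; they are not data from any reviewed study.

```python
# Ordinal cytology grades (labels and coding are illustrative).
BENIGN, ATYPIA, HIGH_GRADE_ATYPIA, MALIGNANT = 0, 1, 2, 3

# Hypothetical cases: (cytology grade, malignancy confirmed by reference standard)
cases = [
    (MALIGNANT, True), (MALIGNANT, True), (HIGH_GRADE_ATYPIA, True),
    (ATYPIA, True), (BENIGN, True), (BENIGN, True),
    (HIGH_GRADE_ATYPIA, False), (ATYPIA, False), (BENIGN, False), (BENIGN, False),
]


def sens_spec_at(threshold, data):
    """Call the test positive when the cytology grade is >= threshold."""
    tp = sum(1 for grade, malignant in data if malignant and grade >= threshold)
    fn = sum(1 for grade, malignant in data if malignant and grade < threshold)
    tn = sum(1 for grade, malignant in data if not malignant and grade < threshold)
    fp = sum(1 for grade, malignant in data if not malignant and grade >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


for label, threshold in (("definite malignancy only", MALIGNANT),
                         ("high-grade atypia or worse", HIGH_GRADE_ATYPIA),
                         ("any atypia", ATYPIA)):
    sens, spec = sens_spec_at(threshold, cases)
    print(f"{label}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

In this toy data set, loosening the criterion from definite malignancy to any atypia raises sensitivity at each step while specificity falls, which is the pattern described above.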
This systematic review selected studies comparing the diagnostic performance of at least 2 of the available nonsurgical tissue sampling techniques in patients with pancreaticobiliary malignancy. Comparative studies including at least one ERCP tissue sampling technique compared to an alternative technique were the primary focus defined prospectively in the systematic review protocol. None of the studies identified with this set of selection criteria included any comparison of ERCP tissue techniques and EUS sampling techniques. Upon discussion of this result with the Technical Advisory Group, a supplementary request was made to review single arm studies reporting the diagnostic performance of endoscopic ultrasound (EUS) fine-needle aspiration (FNA). Studies included in this secondary analysis were not selected using a formalized systematic review, but were identified by manually searching for recent reports on EUS-FNA and carefully reviewing prior articles referenced in these studies to identify additional studies.
Evidence Base
Twelve studies comparing at least two tissue sampling techniques were identified in this systematic review. Quality ratings are displayed in Table 24. Five of these studies were rated as "Good" quality, signifying the use of blinded interpretation of test results. Only three studies include over 100 patients, and six studies include fewer than 50 subjects.
There is considerable variation in reported estimates of sensitivity for each tissue sampling technique, and comparison of results for the same technique across studies may be limited due to differences in populations with regard to distribution of tumor types as well as differences in tissue sampling technique and interpretation methods. To minimize this problem, this analysis will focus primarily on within-study comparisons of the relative sensitivity of alternative sampling techniques. However, this problem is not completely avoided because the selected comparative studies frequently reported diagnostic performance for individual sampling techniques being compared on a different number of patients and thus slight differences in the population characteristics may be present.
Given that the expected differences in diagnostic performance between tissue sampling techniques and the diagnostic alternatives reported here are frequently relatively small and the number of cases with the outcome of interest is generally small, these studies may have limited power to detect statistically significant differences in test performance. Only 4 of 12 studies (Jaiwala, Fogel, Sherman et al., 2000; Sugiyama, Atomi, Wada et al., 1996; Ponchon, Gagnon, Berger et al., 1995; Kurzawinski, Deery, Dooley et al., 1993) actually reported any statistical comparisons, and all of these reported only chi-square comparisons of sensitivity.
The specificity estimates for cytology techniques reported in these studies were generally close to 100%, though Jaiwala, Fogel, Sherman et al. (2000; n=133) found that specificity fell to 90% when any atypia was considered equivalent to malignancy.
The nonsurgical tissue sampling techniques being evaluated in these studies are measured against a reference standard incorporating the best available information from surgical findings, surgical or nonsurgical pathology, autopsy, imaging follow-up, and clinical follow-up.
Review of Evidence: Diagnostic Performance
Bile Aspiration Cytology Compared to Brush Cytology
Five studies (total n=approximately 178), including 3 with "Good" quality (Kurzawinski, Deery, Dooley et al., 1993; de Peralta-Venturina, Wong, Purslow et al., 1996; Foutch et al. 1991; Mansfield et al. 1997; Sugiyama, Atomi, Wada et al., 1996), provided comparisons between bile cytology and brush cytology for biliary strictures (Table 25 and Table 26). In each comparison, brush cytology provided higher sensitivity than bile aspirate cytology, although only one study reported a statistical assessment. The absolute increase in sensitivity ranged from 16 to 50%. Reported range of bile cytology sensitivity was 6-50% and that for brush cytology was 33-100%.
Two studies reported comparative data for tissue sampling using an ERC approach versus a percutaneous transhepatic cholangiographic (PTC) approach. de Peralta-Venturina, Wong, Purslow et al. (1996) noted lower sensitivity with PTC compared with ERC, 43% versus 100%. Kurzawinski, Deery, Dooley et al. (1993) observed similar sensitivity for brush cytology techniques using either approach and possibly lower sensitivity for bile aspirates with PTC.
In sum, the available studies are relatively small and most are limited by lack of statistical analysis but do provide suggestive evidence that brush cytology is more sensitive than bile aspiration cytology.
Brush Cytology Compared to FNA Cytology
Three studies (total n=approximately 193), all rated "Fair" (Jaiwala, Fogel, Sherman et al., 2000; Howell, Beveridge, Bosco et al., 1992; Ferrari, Lichtenstein, Slivka et al., 1994), compare brush cytology with FNA cytology (Table 27 and Table 28). The first two studies use ERCP to obtain both the FNA specimen and the brush cytology specimens, while Ferrari, Lichtenstein, Slivka et al. (1994) compares ERCP brush cytology with percutaneous CT-guided FNA. The largest study (Jaiwala, Fogel, Sherman et al., 2000, n=133) reports similar sensitivity for FNA and for brush cytology, and the combination of both techniques increased overall sensitivity by about 9%. This difference was not statistically significant in 2 of 3 comparisons and was found significant (p<0.05) only when high-grade atypia was considered equivalent to malignancy.
The study by Howell, Beveridge, Bosco et al. (1992, n=31) notes a higher sensitivity for FNA than for brush cytology (62% vs. 8%), but the combination of both techniques yielded only a slight increase to 65% sensitivity. Ferrari, Lichtenstein, Slivka et al. (1994, n=29 with FNA and 70 for brush cytology) found percutaneous CT-guided FNA to be more sensitive than brush cytology (91% versus 56%), but the large difference in sample sizes limits direct comparison. Furthermore, the small size and lack of statistical analysis of these two studies limit the interpretation of these findings.
Among these studies, the findings of Jaiwala, Fogel, Sherman et al. (2000) provide the more reliable information and suggest that brush cytology and ERCP-FNA may be similar in sensitivity. When used together, the available evidence does not demonstrate a statistically significant increase in sensitivity.
Forceps Biopsy Sampling Compared to Brush Cytology
Six studies (total n=approximately 437), including the 3 largest studies and 3 "Good" quality studies, compared forceps biopsy sampling to brush cytology (Tables 25-28). Gmelin and Weiss (1981) exclusively studied papillary tumors and found an increase in sensitivity of about 30% using forceps biopsy over brush cytology (86% versus 55%), but statistical analysis was not reported. Sugiyama, Atomi, Wada et al. (1996) specifically excluded papillary tumors and also found a large increase in sensitivity with forceps biopsy, 81% versus 48%, p<0.05. The remaining studies (Jaiwala, Fogel, Sherman et al., 2000; Ponchon, Gagnon, Berger et al., 1995; Schoefl, Haefner, Wrba et al., 1997; Pugliese, Antonelli, Vincenti et al., 1997) included a mixture of pancreaticobiliary malignancies. These studies reported generally similar sensitivity with forceps biopsy compared with brush cytology, though one study (Jaiwala, Fogel, Sherman et al., 2000) noted statistically significant increases for forceps biopsy over brush cytology when atypia was not interpreted as malignancy.
In addition, each of these studies reports that the combination of forceps biopsy and brush cytology increases the sensitivity in detecting malignancy by 5-20%. Jaiwala, Fogel, Sherman et al. (2000) and Ponchon, Gagnon, Berger et al. (1995) both reported the increase in sensitivity for the combination of forceps biopsy plus brush cytology compared to forceps biopsy alone to be statistically significant (p<0.05).
In sum, the available evidence suggests that forceps biopsy provides similar, or higher, sensitivity compared to brush cytology, and both tests used in combination may slightly increase sensitivity over that achieved with either technique alone.
Combination of Three Sampling Techniques
Jaiwala, Fogel, Sherman et al. (2000; n=133) also report on the combination of brush cytology, FNA cytology, and forceps biopsy (Table 28). This study reports increases in overall sensitivity for detecting pancreaticobiliary malignancy as more sampling techniques are added together. The size of the incremental gains in sensitivity and the statistical significance associated with adding the third sampling technique vary depending on the criteria used to interpret positive results on cytology. The largest gains are observed when forceps biopsy is added as the third procedure (approximately 18-23% higher sensitivity, p<0.05), but smaller gains are still noted when one of the cytology techniques is added as the third procedure (approximately 4-13%).
Comparison of ERCP-FNA with EUS-FNA
In the absence of studies directly comparing EUS-FNA and ERCP-FNA, an indirect comparison of single-arm studies was attempted. Ten articles were identified, including one large multicenter report (Wiersema, Vilmann, Giovannini et al., 1997), three reports from Indiana University (Gress, Gottlieb, Sherman et al., 2001; Gress, Hawes, Savides et al., 1997; Wiersema, Kochman, Cramer et al., 1994), one report from Massachusetts General Hospital (Brandwein, Farrell, Centano et al., 2001), two reports from University of South Carolina (Williams, Sahai, Aabakken et al., 1999; Bhutani, Hawes, Baron et al., 1997), two reports from University of California (Chang, Nguyen, Erickson et al., 1997; Chang, Katz, Durbin et al., 1994), and one report from University of Pennsylvania (Bentz, Kochman, Faigel et al., 1998) (Table 29). Overlap of patient populations and data from separate reports from the same institution is difficult to assess due to limitations in reported detail. An attempt was made to minimize duplicate reporting of subjects. Earlier reports of studies from the same institution that were later published with more subjects have been omitted from Table 29. However, some duplication of results likely remains between the multicenter report and separate reports from contributing institutions. The two reports by Gress et al. (Gress, Gottlieb, Sherman et al., 2001 and Gress, Hawes, Savides et al., 1997) address differently selected, but probably overlapping, patient groups; however, both are included as they address slightly different questions.
All of these studies reported results separately for diagnosis of pancreatic mass. Additional results on lymph node evaluation and intestinal lesions were not relevant to this review. Despite uncertainties over the exact number of subjects included among the reports detailed in Table 29, the available studies include at least 400 subjects with pancreatic mass and report a range of sensitivity in detecting pancreatic malignancy of 60-94% with a specificity of 100%. Brandwein, Farrell, Centano et al. (2001; n=93) reported results separately for cystic versus solid pancreatic masses and found slightly lower sensitivity for cystic lesions, 50% versus 60%.
The sensitivity estimates for ERCP-FNA derived from the two studies identified in the systematic review (Jaiwala, Fogel, Sherman et al., 2000, n=133; Howell, Beveridge, Bosco et al., 1992, n=31) were obtained in subjects with a mixture of pancreaticobiliary malignancy and included subjects with pancreatic cancer, ampullary tumors, cholangiocarcinoma, and metastases. While the reported range of sensitivity of 25-62% for ERCP-FNA appears to be lower than that reported for EUS-FNA, direct comparisons do not seem appropriate due to differences in the case mix of tumors between studies. Further limitations secondary to relatively small numbers of subjects in ERCP-FNA studies and potential differences in cytology techniques and interpretations between studies preclude direct comparison of these estimated ranges of sensitivity.
Summary
There is a modest body of evidence directly comparing the diagnostic performance of nonsurgical tissue sampling techniques for the evaluation of suspected pancreaticobiliary malignancy. The available studies are limited by small size and do not consistently compare techniques in the same group of patients. Most studies do not report statistical tests, so it is not possible to determine with confidence whether reported differences in sensitivity are statistically significant. While available evidence is suggestive, larger studies are needed to draw conclusions on relative performance of tissue sampling techniques.
The available evidence suggests that sensitivity for detecting malignancy is similar or higher for brush cytology versus bile aspiration cytology, similar for FNA cytology versus brush cytology, and similar or higher for forceps biopsy versus brush cytology. Using combinations of two or more sampling techniques may increase the overall sensitivity. No comparative studies evaluated whether incremental improvement could also be achieved by repeated sampling using the same technique.
In the absence of comparative studies of EUS-FNA and ERCP-FNA, indirect comparison of single-arm studies was attempted. Results from 10 studies of EUS-FNA including at least 400 subjects with pancreatic mass suggest a range of sensitivity in detecting pancreatic malignancy of 60-94% with a specificity of 100%. Two studies of ERCP-FNA including 164 subjects with various pancreatobiliary tumors reported sensitivities ranging from 25% to 62%. While the sensitivity reported in these studies appears to be lower than that for EUS-FNA, such a comparison is not valid due to differences in study populations, cytology techniques, and study settings.
Part II, Section 2: Diagnostic Performance of ERCP In Pancreaticobiliary Malignant Obstruction -- Comparison To Alternatives
Introduction
The evaluation of suspected malignant obstructive jaundice includes imaging evaluation to determine if there is an anatomic narrowing or stricture of the biliary or pancreatic ducts. If a stricture is identified, the appearance and location of the stricture are characterized to determine the likelihood of malignancy and to guide subsequent treatment decisions.
Images of the pancreaticobiliary system can be obtained using a variety of techniques. Direct cholangiopancreatography performed via an ERCP approach is the subject of this systematic review, and the primary diagnostic alternatives to ERCP are magnetic resonance cholangiopancreatography (MRCP), endoscopic ultrasonography (EUS), computed tomography cholangiography (CTC), and percutaneous transhepatic cholangiography (PTC). Both ERCP and PTC are minimally invasive procedures involving injection of contrast directly into the biliary tree. EUS involves endoscopy, but does not directly invade the biliary system. MRCP and CTC are both noninvasive procedures, though oral or intravenous biliary contrast agents may be used to enhance CTC while MRCP does not require the administration of a contrast agent to visualize the biliary tree.
This systematic review selected studies that directly compared the diagnostic performance of ERCP with at least one of the primary alternative diagnostic tests. Given that the expected differences in diagnostic performance between tissue sampling techniques and the diagnostic alternatives reported here are relatively small and the number of cases with the outcome of interest is generally small, these studies may have very limited power to detect statistically significant differences in test performance.
Evidence Base
ERCP vs. MRCP
Eight studies (total n=538) were identified that compared ERCP with MRCP and that used current MRCP technique. Five studies utilized an independent reference standard consisting of best available information derived from surgery, biopsy, imaging, and clinical follow-up to establish the final diagnosis, thus providing comparative data for ERCP and MRCP. The remaining three studies considered ERCP to be the reference standard against which MRCP was measured, yielding concordance of findings of MRCP with ERCP. Four studies were rated "Good" quality, signifying use of blinded interpretation of tests (Table 30). Four of these studies included over 100 subjects and the smallest study contained 46 subjects.
ERCP vs. EUS
Seven studies (total n=466) were identified that compared ERCP with EUS. Six of these employed an independent reference standard consisting of best available information derived from surgery, biopsy, imaging, and clinical follow-up to establish the final diagnosis, and therefore reported data for both EUS and ERCP. Only one study was rated "Good" (Glasbrenner, Schwarz, Pauls et al., 2000, n=90-91) (Table 30). Three studies addressed populations with obstructive jaundice, two studies addressed populations with suspected pancreatic cancer, and two studies addressed patients with either known or suspected intraductal papillary mucinous tumors of the pancreas.
Review of Evidence: Diagnostic Performance
Presence of Malignant Stricture/Lesion
ERCP vs. MRCP
Five studies including a total of 379 patients reported on diagnostic performance of MRCP in identifying and characterizing a malignant stricture (Table 31). In the two studies where ERCP was the reference standard (Guibaud, Bret, Reinhold et al., 1995, n=126; Lomas, Bearcroft, and Gimson 1999, n=69; both rated "Fair"), MRCP showed 86% and 92% sensitivity and 98% and 100% specificity. These data suggest good concordance between MRCP and ERCP results.
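For readers less familiar with these measures, the sketch below shows how sensitivity and specificity are derived from a 2x2 table of index test results against the reference standard. The class name `TwoByTwo` and the counts are hypothetical and are not drawn from any cited study. When ERCP itself serves as the reference standard, the figures for MRCP describe concordance with ERCP rather than accuracy against an independently established diagnosis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test results cross-classified against the reference standard."""
    true_pos: int   # index test positive, reference positive
    false_neg: int  # index test negative, reference positive
    false_pos: int  # index test positive, reference negative
    true_neg: int   # index test negative, reference negative

    def sensitivity(self) -> float:
        # Proportion of reference-positive patients the index test detects
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Proportion of reference-negative patients the index test clears
        return self.true_neg / (self.true_neg + self.false_pos)


# Hypothetical counts for illustration only
table = TwoByTwo(true_pos=43, false_neg=7, false_pos=1, true_neg=49)
print(f"sensitivity = {table.sensitivity():.0%}")
print(f"specificity = {table.specificity():.0%}")
```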
The three studies comparing MRCP and ERCP with an independent reference standard reported slight differences in estimates of sensitivity and specificity, but none of these differences was statistically significant. The one study rated "Good" quality (Adamek, Albert, Weitz et al., 1998, n=60) reported slightly lower sensitivity (81% vs. 93%) and higher specificity (100% vs. 94%) for MRCP compared with ERCP, but both tests were considered equivalent. The largest study (Arslan, Geitung, Viktil et al., 2000, n=78) found similar sensitivity (86% vs. 89%) and lower specificity (82% vs. 94%) for MRCP, but the 95% confidence intervals overlapped substantially. Finally, Lee et al. (1998, n=46) reported higher sensitivity (81% vs. 71%) and similar specificity (92% vs. 92%) for MRCP, but overall accuracy was not statistically different.
ERCP vs. EUS
Three studies, all rated "Fair" quality and including a total of 129 patients with obstructive jaundice, reported on the diagnostic performance of EUS in identifying the presence of a malignant lesion/stricture (Table 32). One study (Burtin, Palazzo, Canard et al., 1997, n=34) reported similar diagnostic performance for ERCP and EUS, with both tests achieving 89% sensitivity and similar specificity (96% for EUS and 92% for ERCP). Dancygier and Nattermann (1994, n=41) reported complete concordance between EUS and ERCP. One study (Snady, Cooperman, Siegel et al., 1992, n=54-60) compared EUS with the combination of ERCP plus CT and reported higher sensitivity (85% vs. 75%) and higher specificity (80% vs. 65%) for EUS, but these differences were not statistically significant.
In summary, individual studies were relatively small and did not identify significant differences in diagnostic performance between ERCP and either MRCP or EUS. These data permit preliminary conclusions that MRCP and EUS provide diagnostic assessment similar to that of ERCP for detection of malignant pancreaticobiliary obstruction.
Diagnosis of Pancreatic Cancer
MRCP vs. ERCP
Diagnostic performance for demonstrating pancreatic cancer, present in 37 of 124 patients, was reported by Adamek, Albert, Breer et al. (2000; Table 31). This study compared MRCP and ERCP and reported slightly higher sensitivity (84% vs. 70%) and similar specificity (97% vs. 94%) for MRCP and ERCP, respectively, but these differences did not reach statistical significance (McNemar p=0.059). This study was rated "Good" for quality.
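McNemar's test, cited above, compares two tests applied to the same patients by looking only at the discordant pairs, that is, patients in whom one test was correct and the other was not. The sketch below shows an exact (binomial) version. The function name `mcnemar_exact` and the discordant counts are hypothetical; the source does not report the discordant counts or state which variant of the test was used, so this is not a reproduction of the reported p-value.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: pairs where only the first test was correct
    c: pairs where only the second test was correct
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each discordant pair is a fair coin flip
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts among patients with cancer (illustration only)
p_value = mcnemar_exact(b=8, c=2)
print(f"exact McNemar p = {p_value:.3f}")
```

Because only discordant pairs contribute, a study with many patients but few disagreements between tests can still yield a non-significant result.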
EUS vs. ERCP
Diagnostic performance for pancreatic cancer was reported in two studies specifically addressing populations with suspected pancreatic disease (Table 32). Rosch, Schusdziarra, Born et al. (2000) retrospectively evaluated 184 patients who had ERCP, EUS, and CT and compared the diagnostic performance of clinical assessment with the various imaging tests. This study finds similar performance for clinical assessment, ERCP, or EUS in distinguishing pancreatic cancer from chronic pancreatitis and in distinguishing pancreatic cancer from inflammatory tumor. Interpretation of Rosch, Schusdziarra, Born et al. (2000) is somewhat limited by the retrospective selection of patients on the basis of having all three imaging tests, which might bias the study toward cases where findings were inconclusive. Glasbrenner, Schwarz, Pauls et al. (2000; n=95) noted ERCP and EUS to have similar sensitivity (81% vs. 78%, respectively) and specificity (88% vs. 93%, respectively), and the combination of the two tests yielded 92% sensitivity and 86% specificity, but these differences were not statistically significant.
Summary
In summary, there is little evidence directly comparing ERCP with either MRCP or EUS in diagnosing pancreatic cancer. The available evidence does not demonstrate statistically significant differences between ERCP and either MRCP or EUS.
Presence of Stricture
ERCP vs. MRCP
Three studies reported diagnostic performance in demonstrating the presence of stricture (either benign or malignant) (Table 31). One of the two studies rated as "Good" independently verified results and found 100% sensitivity and 100% specificity for both MRCP and ERCP (Varghese, Farrell, Courtney et al., 1999, n=98-100). The other (Holzknecht, Gauger, Sackmann et al., 1998, n=61) used ERCP as reference standard and reported 89% sensitivity and 85% specificity for MRCP relative to ERCP, though this study utilized only projection ("snapshot") MRCP techniques without additional multislice techniques, which may limit its comparability. One additional study (Lomas, Bearcroft, and Gimson 1999, n=69), rated as "Fair" quality because of uncertainties with regard to complete blinding of interpretation, noted 100% concordance for MRCP with ERCP.
ERCP vs. EUS
No studies reported this specific analysis.
Summary
In summary, the evidence specifically evaluating MRCP in relation to ERCP for detecting strictures is sparse and suggests similar results for MRCP and ERCP in identifying the presence of a stricture. However, these studies do not report full statistical analysis. The relative performance of EUS and ERCP in this setting has not been reported.
Level of Stricture
ERCP vs. MRCP
One study comparing ERCP and MRCP (Varghese, Farrell, Courtney et al., 1999, n=98-100, "Good") specifically reported 100% sensitivity and specificity for both MRCP and ERCP in defining the level of the stricture (Table 31). Lomas, Bearcroft, and Gimson (1999, n=69, "Fair") also reported complete concordance for MRCP with ERCP in defining the level of malignant strictures.
ERCP vs. EUS
Only one study comparing ERCP and EUS (Dancygier and Nattermann 1994, n=41, "Fair") specifically reported sensitivity and specificity in defining the level of the stricture (Table 32). This study reports 100% sensitivity and specificity for both ERCP and EUS.
Summary
In summary, there is little evidence specifically reporting the diagnostic accuracy of MRCP or EUS relative to ERCP in defining the level of stricture, but the available studies suggest that all three tests provide highly accurate localization of pancreaticobiliary stricture.
Evaluation of Suspected Intraductal Papillary Mucinous Tumors (IPMT) of the Pancreas
ERCP vs. MRCP
No studies reported this specific analysis.
ERCP vs. EUS
Two studies evaluated EUS in comparison with endoscopic retrograde pancreatography (ERP) in patients with either known or suspected IPMT of the pancreas (Table 32). Kaneko, Nakao, Inoue et al. (2001; n=27, "Fair") found that EUS and ERP were similarly sensitive (59% vs. 50%, respectively) in detecting mural nodules while both tests were 100% specific for this finding. Cellier, Cuillerier, Palazzo et al. (1998; n=47, "Fair") compared ERCP and EUS in defining the presence of invasive tumor and reported EUS to be more sensitive (78% vs. 55%) and less specific (75% vs. 90%), but no statistical analysis was reported.
These two small studies, reporting estimates of diagnostic performance relating to different diagnostic endpoints, suggest that EUS may provide information similar to that of ERCP in patients with known or suspected intraductal papillary mucinous tumors of the pancreas, but confirmation of these findings would be helpful.
Conclusions
The body of evidence directly comparing ERCP with either MRCP or EUS is modest in size and of varying methodological quality. The evidence comparing ERCP with MRCP is slightly stronger than that comparing ERCP with EUS both in terms of number of subjects and study quality. The available studies do not demonstrate statistically significant differences in diagnostic performance for ERCP versus MRCP or for ERCP versus EUS for characterizing malignant strictures. In sum, the available studies suggest that either MRCP or EUS provides diagnostic performance similar to that of ERCP in detecting pancreaticobiliary malignant obstruction.
Part II, Section 3: Outcomes of Treatment Using ERCP and Endoscopic Sphincterotomy and Endoscopic Stent for Palliation of Pancreaticobiliary Malignancy -- Comparison of Strategies Using ERCP, Surgery, or Interventional Radiology
Introduction
Biliary obstruction is a frequent presenting feature of pancreaticobiliary malignancy. Unfortunately, patients with pancreaticobiliary malignancy are usually incurable at the time of diagnosis (Conio, Demarquay, De Luca et al., 2001; England and Martin 1996). Whether surgical resection for attempted cure is feasible or not, management of biliary obstruction is desirable to palliate the morbidity of jaundice. Endoscopic stent drainage has been proposed as an alternative to biliary-enteric bypass surgery to palliate malignant biliary obstruction. In addition, alternative approaches to biliary stenting have been compared, with particular interest in determining optimal stent material, design, and placement strategies.
Part II, Section 3A. Comparison of ERCP Stent Versus Surgical Bypass
Body of Evidence
Five studies compared results of surgical bypass with endoscopic stent drainage for palliation of malignant obstructive jaundice. Quality assessments are described in Table 33. Results of these studies are detailed in the "Evidence Tables" section and summarized in Tables 34-37. Three randomized, controlled trials were identified comparing surgical biliary bypass with endoscopic biliary stent placement. Two of these (Smith, Dowsett, Russell et al., 1994, n=204; Andersen, Sorensen, Kruse et al., 1989, n=50) were rated as "Good" quality, and Shepherd, Royal, Ross et al. (1988, n=52) was rated as "Fair." Two retrospective comparisons (Raikar, Melin, Ress et al., 1996, n=66; Leung, Emergy, Cotton et al., 1983, n=98) were both rated as "Poor."
Review of Evidence: Treatment Outcomes
All studies reported that there was no significant difference in overall patient survival between the ERCP and the surgery groups (Table 35). Two randomized controlled trials reported both treatments to have high rates for relief of jaundice but no statistically significant difference. A third study reported on quality of life, as measured by mean percentage of survival time with normal activity or limited activity with no aid; there were no significant differences.
Review of Evidence: Adverse Outcomes
There were no significant differences in perioperative mortality (Table 36). The randomized controlled trial by Smith, Dowsett, Russell et al. (1994) was designed to show a 5-20% decrease in 30-day mortality at 95% power with 115 patients entered into each arm. Accrual was stopped at 204 patients when interim analysis indicated that additional accrual would not change the outcome. While this trial did not show a statistically significant difference in perioperative (30-day) mortality, intent-to-treat analysis showed significantly greater procedure-related mortality in the surgery arm (14% vs. 3%, p=0.006). Smith, Dowsett, Russell et al. (1994) also found that major complications were significantly more frequent in the surgery group than in the ERCP group (29% vs. 11%, p=0.02). Andersen, Sorensen, Kruse et al. (1989) reported severe infections in 36% of ERCP patients compared to 20% of surgical patients, but the difference was not statistically significant. Shepherd, Royal, Ross et al. (1988) found twice the rate of complications in the surgical group, but again this was not statistically significant.
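The design statement for the Smith, Dowsett, Russell et al. (1994) trial can be related to a standard sample-size approximation for comparing two proportions. The sketch below is illustrative only and rests on assumptions not stated in this review: it reads the design as detecting 30-day mortality of 20% in one arm versus 5% in the other, uses a two-sided significance level of 0.05, and applies the unpooled normal approximation. The function name `n_per_arm` is hypothetical, and the trial's own calculation may have used a different formula or different inputs.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.95) -> int:
    """Approximate patients per arm to detect p1 vs. p2 (two-sided test)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = norm.inv_cdf(power)          # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


# Assumed reading of the design: 20% vs. 5% 30-day mortality
required = n_per_arm(0.20, 0.05)
print(f"about {required} patients per arm")
```

Under these assumptions the approximation gives a figure of the same order as the 115 patients per arm reported for the trial. The same function shows how quickly the required sample grows as the difference to be detected shrinks, which is relevant to the smaller trials discussed in this section.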
Review of Evidence: Resource Utilization
The two randomized controlled trials rated as good quality, including the largest trial in this group of studies (Smith, Dowsett, Russell et al., 1994, n=203), found no significant difference in total days of hospitalization (Table 37). Three studies reported on initial hospitalization, including 1 randomized controlled trial (Shepherd, Royal, Ross et al., 1988, n=52). All showed fewer days of initial hospitalization with ERCP, and 2 reported that the difference was statistically significant. Readmissions were more common with ERCP, but tests of statistical significance were not reported. The randomized controlled trial by Shepherd, Royal, Ross et al. (1988) reported significantly fewer initial and total hospitalization days with ERCP, despite a readmission rate twice that of surgery. However, this randomized controlled trial was judged of lesser quality ("Fair"), largely due to lack of clarity in the method of analysis.
Stent replacement was reported in the Smith, Dowsett, Russell et al. (1994) study as necessary in 37% of patients, in all but 1 case due to recurrence of obstructive jaundice. Raikar, Melin, Ress et al. (1996) reported an average of 1.7 stent replacements per patient.
Summary
The most robust evidence is provided in the randomized controlled trial by Smith, Dowsett, Russell et al. (1994). There were no significant differences in overall survival, relief of jaundice, technical success, total hospitalization days or perioperative mortality. Major complications were more frequent in the surgery group than in the ERCP group (29% vs. 11%, p=0.02), presumably reflecting the more invasive nature of surgical versus endoscopic treatment. Stent replacement was required in 37% of ERCP patients.
Part II, Section 3B. Comparison of Metal vs. Plastic Stents During ERCP
Evidence Base
Three studies were identified comparing endoscopically placed metal or plastic stents for palliation of biliary obstruction due to malignancy. Quality ratings are described in Table 38. Results are detailed in the "Evidence Tables" chapter and summarized in Tables 39-42. Two randomized, controlled trials (total n=206) were identified. Davids, Groen, Rauws et al. (1992, n=105, "Fair" quality) compared metal versus plastic stents. Prat, Chapat, Ducot et al. (1998, n=101, "Fair" quality) randomized patients into 3 arms (either metal stents, plastic stents with exchange as needed for stent dysfunction, or plastic stents with routine exchange every 3 months). In addition, Schmassmann, Von Gunten, Knuchel et al. (1996, n=165, "Poor" quality) retrospectively compared results with metal versus plastic stents.
Review of Evidence: Treatment Outcomes
Metal stents showed statistically significantly longer patency than plastic stents in all three studies (Table 40). Two of the studies reported that median duration of patency with metal stents was about twice as long as with plastic stents (9.1-10 months versus 4-4.2 months, p<0.006), but one of the randomized trials showed a smaller benefit for metal stents (4.8 months versus 3.2 months, p<0.05).
The two randomized studies reported no significant difference in overall survival for patients treated with metal or plastic stents, with median survival ranging from 4.5-5.8 months. In contrast, the retrospective study found slightly longer median survival in the metal stent group (6.5 months versus 4 months, p<0.05), but related this observation to increased mortality in 18% of subjects (predominantly plastic stent group) who did not receive treatment for stent dysfunction.
All studies reported both treatments to have high rates for relief of jaundice with no statistically significant differences reported.
Review of Evidence: Adverse Outcomes
Two studies (Prat, Chapat, Ducot et al., 1998; Schmassmann, Von Gunten, Knuchel et al., 1996) reported no significant difference in perioperative mortality (Table 41). The randomized, controlled trial by Davids, Groen, Rauws et al. (1992) noted a higher perioperative mortality rate in the metal stent group (14% vs. 4%, p=0.047), but the causes of death in 6 of 7 cases were completely unrelated to biliary pathology. No significant differences in complications were noted in the two randomized studies, and the retrospective study did not specifically report complications other than perioperative mortality.
Review of Evidence: Resource Utilization Outcomes
All studies examined the relative utilization of ERCP procedures and found patients receiving metal stents to require the fewest ERCP procedures (Table 42). Patients receiving metal stents required 1.2-1.3 ERCP procedures on average and those receiving plastic stents and undergoing stent exchange only when needed required 1.58-1.8 ERCP procedures. The study by Prat, Chapat, Ducot et al. (1998) examined the strategy of routine plastic stent exchange every 3 months which necessitated an average of 2.5 ERCP procedures per patient. The differences in ERCP utilization between metal and plastic stents were reported to be statistically significant in two studies and a statistical comparison was not reported in the third study.
Prat, Chapat, Ducot et al. (1998) also examined utilization of total hospital days and found the metal stent group averaged 5.5 days while the plastic stent groups required 7.4 days on average with "as needed" stent exchange and 10.6 days with routine stent exchange. The difference between metal stents and routinely exchanged plastic stents was statistically significant (5.5 ± 1.4 versus 10.6 ± 1.7, p=0.01) while the difference between metal stents and plastic stents exchanged as needed was not statistically significant.
Prat, Chapat, Ducot et al. (1998) also reported lower average total costs for the metal stent group than costs associated with either of the plastic stent strategies, but statistical analysis was not reported for these results.
Summary
Three studies including a total of 371 subjects provide consistent evidence that metal stents remain patent longer than plastic stents. Both types of stents offer initial relief of jaundice and the available evidence does not conclusively show any difference in perioperative adverse events. Overall patient survival is not significantly different when stent occlusions are treated with stent exchange as needed. Total resource utilization including need for repeat ERCP, total hospital days, and costs was reported to be lower with metal stents compared with plastic stents.
Part II, Section 3C. Additional Comparisons of ERCP Strategies
Evidence Base
The ERCP literature systematically reviewed for this report also included nine studies comparing various alternative ERCP treatment techniques. The comparisons reported in these studies were sufficiently dissimilar from the studies reviewed in preceding sections on palliative treatments of pancreaticobiliary malignancy that they are briefly summarized separately in this section. The quality assessments of these studies are detailed in Table 43 and the results of these studies are in Tables 44-46.
Review of Evidence: Stent Material and Design
Four studies, including two randomized controlled trials (one quality rated as "Good" and one as "Fair") and two nonrandomized studies (both rated "Poor" quality), compared different features of endoscopically placed stents for palliation of pancreaticobiliary malignancy (Tables 44-46).
van Berkel, Boland, Redekop et al. (1998, n=84, "Fair") randomized patients to receive stents made of Teflon™ versus stents made of polyethylene and found no significant differences in efficacy or complications (Table 44). Median stent patency duration was 83 days for Teflon™ stents and 80 days for polyethylene stents (p=0.93).
Pedersen (1993, n=89, "Poor") and Speer, Cotton, MacRae et al. (1988, n=79, "Poor") both compared outcomes using different caliber stents, but neither of these studies used a randomized, controlled design (Table 45). Speer, Cotton, MacRae et al. (1988) found significantly longer median stent patency for 10 Fr stents compared with 8 Fr stents (32 weeks vs. 12 weeks, p<0.001). Complications reported included a lower rate of cholangitis with 10 Fr stents (5% vs. 34%, p<0.05), and similar rates of local perforation and stent migration. However, the 8 Fr stents had pigtail-shaped ends compared with straight-shaped 10 Fr catheters, a potential confounding factor in interpreting this study. Pedersen (1993) did not find a statistically significant difference in stent patency between 10 Fr and 7 Fr stents, and did not show significant differences in total complication rates. However, this study also suffered from baseline differences in age, with younger patients receiving 7 Fr stents, increasing concerns over interpretation of findings.
Sung, Chung, Tsui et al. (1994, n=70, "Good") randomized patients to receive 10Fr stents with or without sideholes (Table 46). No statistically significant differences were noted in stent patency and reported complications appeared similar, although statistical analysis was not reported.
None of these studies provides a sufficient basis for a conclusion regarding the relative efficacy of the stent features being compared.
Review of Evidence: Comparisons of Stent Placement
Five studies, including three randomized controlled trials (two quality rated as "Good" and one as "Fair") and two retrospective studies (one "Fair" and one "Poor" quality), looked at issues of stent placement (Tables 47-49).
Speer, Cotton, Russell et al. (1987, n=75, "Good") randomized patients to undergo percutaneous transhepatic placement of 12 Fr stents or endoscopic placement of 10 Fr stents (Table 47). This trial was terminated early when a prespecified statistical criterion was reached: increased perioperative mortality was observed in subjects randomized to percutaneous stent insertion (33% vs. 15%, p=0.016). Early complications also favored endoscopic over percutaneous placement (19% vs. 67%, p=n.r.). Patient survival and stent patency results did not demonstrate statistically significant differences.
Pedersen, Lassen, De Muckadell et al. (1998, n=34, "Fair") randomized patients to have 10Fr stents placed with the inferior tip above the sphincter of Oddi or across the sphincter of Oddi (Table 48). Stents placed across the sphincter of Oddi were less likely to become dislocated (12% vs. 53%, p=0.026). Otherwise, no statistically significant differences were observed between the two groups with regard to patient survival, stent patency, procedure-related mortality, or complications.
Three studies compared results of unilateral versus bilateral stent placement in patients with biliary obstruction secondary to hilar malignancy (Table 49). DePalma, Galloro, Iovino et al. (2001, n=157, "Good") provides the best evidence derived from a randomized controlled trial. This study finds no statistically significant differences in overall patient survival, perioperative mortality, procedure-related mortality, or late complications between those randomized to receive a unilateral versus bilateral stent. Moreover, the significant results reported favored unilateral stent placement over bilateral stents. Those randomized to receive bilateral stents had significantly lower rates of successful drainage (73% versus 81%, p=0.049), significantly more early complications (26.9% versus 18.9%, p=0.026), and significantly higher rates of cholangitis (16.6% versus 8.8%, p=0.013).
The two earlier retrospective studies, Chang, Kortan, and Haber (1998, n=141, "Fair") and Deviere, Baize, de Toeuf et al. (1988, n=70, "Poor"), both examined patients who all had hilar malignancy and compared outcomes for those receiving unilateral or bilateral stents. Chang, Kortan, and Haber (1998) further considered subgroups who had different combinations of unilateral versus bilateral diagnostic biliary opacification and unilateral versus bilateral stent drainage. Deviere, Baize, de Toeuf et al. (1988) restricted analysis to deceased patients. The results of these studies are complex. The primary findings reported were longer median survival in patients receiving bilateral drainage procedures, and higher perioperative mortality and an increased rate of acute cholangitis in two subgroups: patients with unilateral stent placement in Deviere, Baize, de Toeuf et al. (1988), and patients with unilateral drainage but bilateral diagnostic opacification in Chang, Kortan, and Haber (1998). However, the reported analyses do not fully account for various possible confounding influences, and in light of the findings of the randomized controlled trial, these retrospective findings are likely related to unmeasured differences in the groups being compared.
Summary
Several additional comparative studies addressing variations in stent design and stent placement were identified in this systematic review. Since each research comparison has only one or no randomized controlled trial available, the results of these studies support only preliminary conclusions regarding the relative efficacy of these alternative approaches to stent palliation of pancreaticobiliary malignancy.
Part II, Section 4: Outcomes of Treatment Using Preoperative ERCP Drainage for Relief of Malignant Obstructive Jaundice
Introduction
Biliary obstruction results in a variety of biochemical and physiological disturbances such as elevated bilirubin and other liver function tests, as well as impaired hepatic and renal function with associated coagulation problems. In patients who are scheduled for potentially curative surgery, it has been postulated that using a course of preoperative biliary drainage to alleviate biliary obstruction may result in reduced surgical morbidity and mortality.
Evidence Base
Six studies addressed preoperative stenting compared to no stenting prior to surgery for malignant obstruction. Quality assessments are described in Table 50. Results are displayed in detail in the "Evidence Tables" chapter and summarized in Tables 51 and 52. The four nonrandomized series (Sewnath, Birjmohun, Rauws et al., 2001, n=290; Karsten, Allema, Reinders et al., 1996, n=241; ten Hoopen-Neumann, Gerhards, van Gulik et al., 1998, n=52; Heslin, Brooks, Hochwald et al., 1998, n=74) were judged to be of poor quality, largely due to lack of between-group comparability of patients or performance of intervention; and the randomized controlled trial by Lygidakis, van der Heyde, Lubbers et al. (1987, n=38) suffered from inappropriate use of statistical tests. Accompanying letters to the editor suggest that the conclusions as stated in the Lygidakis, van der Heyde, Lubbers et al. (1987) paper are not substantiated by the reported data. The randomized controlled trial by Lai, Mok, Fan et al. (1994, n=87) was judged to be of "Fair" quality, but is limited by insufficient sample size, which is the reason the trial was terminated by the investigators after initial analysis. Outcomes reported in these studies are largely limited to laboratory values and perioperative mortality and morbidity and postoperative hospital stay.
Review of Evidence: Treatment Outcomes
One randomized trial (Lygidakis, van der Heyde, Lubbers et al., 1987) and two nonrandomized comparisons reported on hospital days (Table 52). Lygidakis, van der Heyde, Lubbers et al. (1987) reported that the preoperative ERCP group had more initial hospital days (7 vs. 3.7) and fewer total hospital days (23 vs. 26.7) than the no stent group. Tests of statistical significance were not reported. Heslin, Brooks, Hochwald et al. (1998, n=74) found patients receiving preoperative stents had slightly longer postoperative hospital stay (median of 11 versus 10 days, p=0.04), but Sewnath, Birjmohun, Rauws et al. (2001, n=290) reported slightly shorter postoperative stays in the stented groups that did not reach statistical significance (median of 13-15 days versus 16 days, p=0.09).
Lai, Mok, Fan et al. (1994) reported on technical success of preoperative stenting, which was 87%.
Comparison of changes in laboratory values before and after placement of a preoperative stent consistently showed a reduction in serum bilirubin and liver function tests. One study showed a significant increase in white blood cell count in the preoperative stent group after stenting. These changes were significantly different from the pattern of laboratory values seen in the "no stent" group that went immediately to surgery. No significant changes were noted in hemoglobin, hematocrit, creatinine, blood urea nitrogen, albumin or coagulation profiles.
Review of Evidence: Adverse Outcomes
The available data show no apparent differences in perioperative mortality (Table 52). Lygidakis, van der Heyde, Lubbers et al. (1987) reported no deaths in the stent group and 2 (11%) in the "no stent" group; and Lai, Mok, Fan et al. (1994) reported 14% mortality for both groups. However, the sample sizes (n=34 and n=87, respectively) in these randomized controlled trials are likely too small to make a meaningful comparison. A larger but nonrandomized comparative study (Sewnath, Birjmohun, Rauws et al., 2001, n=290) and a smaller retrospective comparison (Heslin, Brooks, Hochwald et al., 1998, n=74) also reported no statistically significant differences in mortality.
Only Lai, Mok, Fan et al. (1994) reported on total complications, including complications from preoperative endoscopic stenting plus those from surgery. Total complications were greater in the preoperative stent group (56% vs. 41%), but the difference was not statistically significant. Of patients in the preoperative stent group who had complications, 30% had complications from both preoperative endoscopic stenting and surgery. Sewnath, Birjmohun, Rauws et al. (2001) reported no significant difference in postoperative complications (50% for stented versus 55% without stent, p=0.69) but also reported that 6% of those receiving preoperative stenting experienced a stent-related complication. Lygidakis, van der Heyde, Lubbers et al. (1987), Karsten, Allema, Reinders et al. (1996), and Heslin, Brooks, Hochwald et al. (1998) reported only postoperative complications. The nonrandomized comparison by Heslin, Brooks, Hochwald et al. (1998) reported a higher complication rate in the stent group (59% versus 34%, p=0.04), and the study by Karsten, Allema, Reinders et al. (1996) reported the same rate of infective complications (39%) in the no-drainage group as in the preoperative ERCP papillotomy plus stent group.
The retrospective series by ten Hoopen-Neumann, Gerhards, van Gulik et al. (1998) reports that implantation metastases (i.e., metastases presumed to be attributable to an invasive procedure) occurred in 20% of patients with a preoperative stent and in no patients without a stent, but the difference was not statistically significant. Moreover, this study did not control for whether patients received postoperative radiation therapy.
Summary
The evidence available is limited by poor methodological quality and fails to demonstrate that preoperative stenting improves health outcomes. Five of the six studies were judged to be of poor quality and the sixth, a randomized controlled trial judged to be of fair quality, is limited by insufficient sample size. Few studies report overall complications including both those related to the preoperative stent and the surgery, and these suggest that when complications of preoperative endoscopic stenting are considered along with the perioperative complications of surgery, preoperative stenting is associated with more complications. The other studies did not report on total complications, and thus fail to account for the morbidity associated with undergoing two procedures rather than one. Preoperative stenting does appear to significantly improve elevated bilirubin and liver function tests, but the available evidence does not suggest that surgical outcomes are improved as a result.
Results and Conclusions, Part III: Pancreatitis
This chapter reviews evidence on the following questions:
In patients with pancreatitis,
a. What is the diagnostic performance of ERCP in detecting underlying causes or complications of pancreatitis that are amenable to treatment in comparison to alternatives (e.g., EUS or MRCP)? (Section 1: Diagnostic Performance of ERCP in Detecting Underlying Causes or Complications of Pancreatitis Amenable to Treatment - Comparison to Alternatives)
b. What are the outcomes of treatment using ERCP strategies compared to using surgical or medical therapy? (Section 2: Outcomes of Treatment Using ERCP for Pancreatitis - Comparison of Strategies Using ERCP, Surgery, or Medical Management)
Part III, Section 1: Diagnostic Performance of ERCP in Detecting Underlying Causes or Complications of Pancreatitis Amenable to Treatment -- Comparison to Alternatives
Introduction
In this section, studies were sought that compared the diagnostic performance of ERCP with that of another diagnostic modality in diagnosing treatable causes or complications of pancreatitis. Studies that demonstrate the utility of a single diagnostic modality in detecting treatable conditions did not meet selection criteria; only studies comparing ERCP with an alternative method were included. Studies whose aim was to diagnose or characterize chronic pancreatitis itself by two diagnostic modalities also did not meet selection criteria. Common duct stones can cause pancreatitis, but studies of this condition were included in the review of studies evaluating diagnosis of common duct stones (see "ERCP Evidence Report Results and Conclusions, Part I: Common Bile Duct Stones").
Evidence Base
Only 3 studies were found that met selection criteria. Study quality is outlined in Table 53.
Review of Evidence
Duvnjak, Rotkvic, Vucelic et al. (1991, n=43, "Fair to Poor"; Table 54) compared ERCP to percutaneous cystopancreatography with measurement of pseudocyst amylase concentration to detect whether the pseudocyst communicates with the pancreatic duct. Knowledge of such a communication would help determine appropriate treatment for the pseudocyst. Although cystopancreatography alone has poor sensitivity compared to ERCP, measurement of the amylase concentration showed that amylase concentration greater than 64 WU had a sensitivity of 100 percent and a specificity of 90 percent compared to ERCP. It is not stated whether the 64 WU cutoff was prospectively defined. These results require further prospective validation.
Bret, Reinhold, Taourel et al. (1996, n=108, "Good"; Table 55) compared ERCP to MRCP for the diagnosis of pancreas divisum. Of the 108 patients undergoing both ERCP and MRCP, pancreas divisum was demonstrated by both techniques in 6 patients, with complete concordance. The clinical significance of this finding is uncertain, as it is not reported or known whether the demonstration of pancreas divisum alone determined the etiology or treatment of the clinical problem.
Takehara, Ichijo, Tooyama et al. (1994, n=39, "Fair"; Table 56) compared ERCP to MRCP to examine morphology of the pancreatic ducts in 39 patients with chronic pancreatitis. Ductal narrowing is potentially treatable with surgery or endoscopy, although evidence supporting effectiveness is lacking. In the area of the pancreas with the highest prevalence of stenosis, MRCP had only fair sensitivity, 57 percent, and fair specificity, 73 percent. The prevalence of lesions in other parts of the pancreas is too low to make any conclusions comparing MRCP to ERCP.
Conclusion
In sum, there is an inadequate literature base to compare ERCP and other diagnostic modalities for the identification of treatable complications of pancreatitis.
Part III, Section 2: Outcomes of Treatment Using ERCP for Pancreatitis -- Comparison of Strategies Using ERCP, Surgery, or Medical Management
Introduction
This chapter reviews the evidence on ERCP for the treatment of pancreatitis. Pancreatitis encompasses a number of distinct entities with differing etiologies, clinical expression, and treatment options. Each will be addressed separately to the extent allowed by the available literature. Also, there are a number of different endoscopic techniques employed for varying clinical situations. For the purposes of this chapter, "ERCP" will refer to the spectrum of interventional endoscopic techniques that are employed in the treatment of pancreatitis.
Evidence Base
Pancreatitis was classified as "acute," "acute recurring," and "chronic," and evidence was sought to address a total of 9 separate indications within these classifications (Table 57). However, evidence meeting study selection criteria for this systematic review was available for only 4 of the 9 indications of interest: acute biliary pancreatitis, pancreas divisum, idiopathic recurrent pancreatitis, and pancreatic pseudocyst. Table 58 shows the quality and type of available evidence on pancreatitis together with the number of studies that met our inclusion criteria for each indication. A more detailed account of the reason(s) for excluding each of the excluded studies can be found in Table 59.
For acute pancreatitis, comparative studies are included that evaluate ERCP in the treatment of acute biliary pancreatitis. For acute recurrent pancreatitis (ARP) and chronic pancreatitis, there is a notable lack of comparative and/or prospective studies. To address the paucity of evidence on these indications, study selection criteria were relaxed to include retrospective, single-arm studies that met a minimum threshold for reporting outcome measurements. Chronic pain, one of the most important outcome measures in chronic pancreatitis, is a subjective outcome that is prone to bias, especially when assessed in the absence of a comparison group. Therefore, retrospective single-arm studies of acute recurrent and chronic pancreatitis were restricted to those that reported quantifiable pre- and post-treatment measurements of pain and/or other similar outcomes such as analgesic use or hospitalization rates.
Review of Evidence: Acute Pancreatitis
Three randomized controlled trials compared early ERCP to delayed or selective ERCP. One associational study of a Veterans Administration database compared ERCP to surgery (Aiyer, Burdick, Sonnenberg et al., 1999).
Early ERCP Vs. Delayed or Selective ERCP for Acute Biliary Pancreatitis
Three randomized controlled trials included in this review compare early ERCP vs. delayed or selective ERCP for acute biliary pancreatitis. Two of these three trials were rated as "Good" (Fan, Lai, Mok et al., 1993; Folsch, Nitsche, Ludtke et al., 1997) by the quality assessment; the third was rated as "Fair" (Neoptolemos, Carr-Locke, London et al., 1988). Among the three randomized controlled trials, there are differences in the patient eligibility criteria, severity of pancreatitis, and application of ERCP intervention that are important to interpretation of the results (Table 60, Table 61). With respect to patient population: Neoptolemos, Carr-Locke, London et al. (1988, n=121) is restricted to patients with acute biliary pancreatitis; Fan, Lai, Mok et al. (1993, n=195) includes patients with nonbiliary pancreatitis; and Folsch, Nitsche, Ludtke et al. (1997, n=238) excluded patients with signs of obstructive jaundice, and the remaining population largely represented patients with mild pancreatitis. Thus, the likelihood that pancreatitis was associated with ongoing biliary obstruction was highest in the Neoptolemos, Carr-Locke, London et al. (1988) study; lower in the Fan, Lai, Mok et al. (1993) study because patients with nonbiliary causes of pancreatitis were included; and lowest in the Folsch, Nitsche, Ludtke et al. (1997) study, which excluded patients with obvious obstruction.
In all three studies, patients were classified with mild or severe pancreatitis based on commonly used scales. These scales use readily available clinical information to predict prognosis in acute pancreatitis, but are not specifically meant to select patients for ERCP or to identify patients with biliary obstruction. Given the sophistication of contemporary imaging techniques, such classification systems may be of less clinical significance in predicting which patients are likely to benefit from ERCP treatment.
In these studies, ERCP was performed in 20-28 percent of patients in the delayed or selective groups. Thus a substantial minority of patients in the control groups actually underwent ERCP, but this is a much lower percentage than in the early ERCP groups, where almost all patients had the procedure.
Treatment Outcomes
No study reported statistically significant differences in mortality between groups (Table 62). Neoptolemos, Carr-Locke, London et al. (1988) and Fan, Lai, Mok et al. (1993) found numerically greater mortality in the delayed or selective ERCP group, but only for patients with severe pancreatitis. Consistent with these data, in a study population with milder disease, Folsch, Nitsche, Ludtke et al. (1997) found numerically greater mortality in the early ERCP group. This trial was terminated prematurely as the question of interest was whether early ERCP might lead to reduced mortality in the study population.
The lack of benefit for early ERCP in Folsch, Nitsche, Ludtke et al. (1997) is seen in conjunction with the exclusion of patients with ongoing biliary obstruction. This implies that the potential mortality benefit of ERCP is limited to patients with obstruction. Additionally, the overall magnitude of benefit among these studies appears to be related to the likelihood of biliary obstruction in the population. Neoptolemos, Carr-Locke, London et al. (1988), which reports the greatest benefit, also has the highest likelihood of obstruction in its population, while the study with the least benefit, Folsch, Nitsche, Ludtke et al. (1997), has a population with the lowest likelihood of obstruction. The population in the Fan, Lai, Mok et al. (1993) study had a higher likelihood of obstruction than that of Folsch, Nitsche, Ludtke et al. (1997) but a lower likelihood than that of Neoptolemos, Carr-Locke, London et al. (1988), and Fan, Lai, Mok et al. (1993) reported a degree of benefit intermediate between those studies.
For total complications, Neoptolemos, Carr-Locke, London et al. (1988) reported a statistically significant reduction for the early ERCP group. Fan, Lai, Mok et al. (1993) and Folsch, Nitsche, Ludtke et al. (1997) reported no significant difference in total complication rates. However, Fan, Lai, Mok et al. (1993) observed a total complication rate half as high with early ERCP (22 percent of 41 patients vs. 44 percent of 40 patients) among the subgroup of patients with severe pancreatitis, but did not report statistical significance. In a subgroup analysis of patients with severe pancreatitis and documented common bile duct stone, Fan, Lai, Mok et al. (1993) reported a significantly lower rate of total complications for the early ERCP group (3/19 vs. 10/16, p=0.005). In a study population presenting mainly with mild pancreatitis, Folsch, Nitsche, Ludtke et al. (1997) reported a significantly higher rate of respiratory failure (15/126 vs. 5/112, p=0.03) with early ERCP.
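As an illustration of how a subgroup comparison such as 3/19 vs. 10/16 can be tested, the following minimal sketch computes a two-sided Fisher's exact p-value using only the Python standard library. The choice of Fisher's exact test is an assumption made here for illustration; this report does not state which test the original investigators used, so the result need not match the published p-value exactly. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two study groups; columns are event / no event.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    row1, row2 = a + b, c + d
    events = a + c
    total = row1 + row2

    def weight(k):
        # Unnormalized hypergeometric probability of k events in group 1,
        # kept as an exact integer to avoid floating-point ties.
        return comb(row1, k) * comb(row2, events - k)

    observed = weight(a)
    k_min = max(0, events - row2)
    k_max = min(events, row1)
    # Sum every possible table that is no more likely than the observed one.
    extreme = sum(
        weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed
    )
    return extreme / comb(total, events)


# Subgroup quoted above from Fan, Lai, Mok et al. (1993):
# complications in 3 of 19 (early ERCP) vs. 10 of 16 (control).
p_value = fisher_exact_two_sided(3, 19 - 3, 10, 16 - 10)
print(f"two-sided Fisher exact p = {p_value:.4f}")
```

Counting as "extreme" every table whose probability does not exceed that of the observed table is one common definition of the two-sided p-value; other definitions exist and can give slightly different results.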
In summary, the interpretation of this group of studies is that early ERCP reduces complications in patient populations with acute pancreatitis and biliary obstruction. In studies that report benefit for patients with severe pancreatitis, but not mild pancreatitis, this finding likely represents the correlation of biliary obstruction with more severe disease. In patients with low likelihood of biliary obstruction, a clinical approach that includes delayed or selective ERCP may result in lower complications, and permits many patients to avoid the procedure.
Previous meta-analysis
Sharma and Howden (1999) pooled four randomized controlled trials of early vs. delayed or selective ERCP for acute biliary pancreatitis, three of which are the studies discussed here. The fourth randomized controlled trial, Nowak, Nowakowska-Dulawa, Marek et al. (1995), has been published only in abstract form. This meta-analysis is flawed because it combines studies that have different patient populations and interventions. Also, these studies report subgroup analyses suggesting that aggregate outcomes may be misleading when applied to subsets of patients that are stratified on the severity of pancreatitis or the likelihood of biliary obstruction.
The authors computed summary estimates for total mortality and complications, and reported the relative risk reduction associated with the early ERCP strategy. For overall mortality, the combined relative risk reduction associated with early ERCP was 42.9 percent. For total complications, there was a 34.6 percent relative risk reduction associated with early ERCP. These summary results are driven largely by the results of Neoptolemos, Carr-Locke, London et al. (1988) and Nowak, Nowakowska-Dulawa, Marek et al. (1995), neither of which allowed selective early ERCP in the control group for clinical indications. The authors did not perform sensitivity analyses or stratified analysis of the data.
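For readers unfamiliar with the measure, relative risk reduction is one minus the ratio of the event rate in the treated group to the event rate in the control group. The minimal sketch below shows the arithmetic. Because the pooled event counts behind the meta-analysis figures are not given in this report, the example instead uses the Fan, Lai, Mok et al. (1993) subgroup counts quoted earlier in this section (3/19 vs. 10/16); it does not reproduce the 42.9 or 34.6 percent estimates. The function name `risk_measures` is hypothetical.

```python
def risk_measures(events_treated, n_treated, events_control, n_control):
    """Return relative risk, relative risk reduction, and absolute risk reduction."""
    if n_treated <= 0 or n_control <= 0:
        raise ValueError("group sizes must be positive")
    if events_control == 0:
        raise ValueError("relative risk is undefined when the control risk is zero")
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    relative_risk = risk_treated / risk_control
    return {
        "relative_risk": relative_risk,
        # RRR: proportion of the control-group risk removed by treatment.
        "relative_risk_reduction": 1 - relative_risk,
        # ARR: simple difference in event rates between groups.
        "absolute_risk_reduction": risk_control - risk_treated,
    }


# Illustration only: subgroup counts from Fan, Lai, Mok et al. (1993),
# not the pooled meta-analysis data.
measures = risk_measures(3, 19, 10, 16)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

A large relative risk reduction computed from a small subgroup, as here, carries wide uncertainty and should be read alongside the absolute difference and the group sizes.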
The authors concluded that all patients with acute biliary pancreatitis should undergo early ERCP. Given the differences in the methodology of these studies and the lack of rigor in the meta-analysis, this conclusion is not supported by a critical analysis of the data.
ERCP vs. Surgery for Acute Pancreatitis
There was a single study that met the inclusion criteria for this comparison (Table 63, Table 64). This study (Aiyer, Burdick, Sonnenberg et al., 1999) was a retrospective comparison of outcomes for patients with biliary pancreatitis that were treated initially either by ERCP or surgery, using the United States Veterans Administration computerized database. Investigators identified all hospitalizations in the VA database that had simultaneous diagnoses of pancreatitis and cholelithiasis. Outcomes for 650 patients treated initially with ERCP were compared with 1,425 patients treated initially with surgery.
This study was assigned a quality rating of "Poor" by quality assessment. The major methodologic limitation of this study is that the two groups being compared are likely to differ substantially on a variety of clinical factors. Limited information contained in the database on severity of illness indicated that the patients in the ERCP group were older and had a higher baseline Charlson score than patients initially treated with surgery. Also, a higher percentage of patients in the ERCP group had cholangitis, choledocholithiasis, and pancreatic cysts.
Outcomes for the two groups were generally similar or favorable towards ERCP, despite the fact that the ERCP group appeared to be more severely ill. Mortality was 4 percent for the surgery group and 2 percent for the ERCP group (p=0.08), while the rate of total complications was identical for the two groups at 2 percent.
Conclusions
Early ERCP Vs. Delayed or Selective ERCP for Acute Biliary Pancreatitis
Evidence from three randomized controlled trials suggests that early ERCP reduces complications in patient populations with acute pancreatitis and signs and symptoms suggesting biliary obstruction. In patients with low likelihood of biliary obstruction, delayed or selective ERCP permits many patients to avoid the procedure, and may result in lower complications.
ERCP vs. Surgery for Acute Pancreatitis
A single retrospective study suggests that outcomes from ERCP are at least as good as those from surgery. This study reported comparable outcomes for the two groups despite evidence for a higher severity of illness in the ERCP group. However, this is a retrospective database study and confidence in the conclusions is limited by a number of methodologic factors, especially the potential for imbalances between the groups that are compared. Also, given the limited clinical information available, this study cannot ascertain the best strategy to employ given particular patient characteristics and/or clinical presentation.
Review of Evidence: Acute Recurrent Pancreatitis
Four studies, two randomized controlled trials and two single-arm retrospective series, met the inclusion criteria for this category. The main outcomes reported in these studies were pain, episodes of recurrent pancreatitis and/or hospitalization (Table 65).
Acute, Recurrent Pancreatitis Associated with Pancreas Divisum
Three studies, one randomized controlled trial (Lans, Geenen, Johanson et al., 1992) and two retrospective single-arm studies (Lehman, Sherman, Nisi et al., 1993; Kozarek, Ball, Patterson et al., 1995), reporting on a total of 110 patients, evaluated ERCP treatment for acute, recurrent pancreatitis associated with pancreas divisum. Lans, Geenen, Johanson et al. (1992) was a randomized controlled trial in 19 patients with pancreas divisum and recurrent acute pancreatitis. All patients received diagnostic ERCP, and patients who were amenable to stenting were randomized to stent or no stent. Patients were followed for a mean of approximately 30 months for the outcomes of recurrent pancreatitis, emergency room visits/hospitalizations, and clinical improvement. The quality of this study was rated "Fair." Confidence in the results of this study is limited by its small size, lack of blinding, and lack of comparison with alternatives. Quality ratings were not applied to the two retrospective single-arm studies, which are prone to confounding by the placebo effect, natural history of the disease, and a potentially large number of clinical factors.
The small randomized controlled trial by Lans, Geenen, Johanson et al. (1992, n=19) and the two retrospective single-arm studies (n=91) reported that ERCP treatment with stent or sphincterotomy decreased recurrent episodes of pancreatitis, and reduced pain as measured on visual analog scales. None of these studies met the threshold study selection criteria initially set for this systematic review. Although the body of evidence is sparse and largely uncontrolled, the observation that hospitalizations and emergency room visits were significantly reduced is consistent for both the single randomized controlled trial and the less rigorous single arm studies.
Idiopathic Acute, Recurrent Pancreatitis
A single, small, randomized controlled trial (Jacob, Geenen, Catalano et al., 2001, n=34) in patients with idiopathic acute, recurrent pancreatitis reported that ERCP plus stenting reduces episodes of recurrent acute pancreatitis as compared to diagnostic ERCP alone. However, the percent of patients with persistent pain was no less in the ERCP plus stent group as compared to the diagnostic ERCP group. Thus, this trial provides evidence that ERCP treatment reduces subsequent episodes of pancreatitis in idiopathic recurrent acute pancreatitis, similar to the results seen in patients with pancreas divisum. However, this single small, unblinded trial is insufficient to determine whether ERCP treatment reduces pain in patients who present with idiopathic acute recurrent pancreatitis.
Review of Evidence: Chronic Pancreatitis
The three studies (n=187) included in this review evaluate ERCP drainage of pancreatic pseudocysts (Table 66). There are a number of different endoscopic approaches for drainage of pseudocysts. The available studies generally report aggregate outcomes and are not adequately robust to compare outcomes among different approaches to drainage. Thus, this review will not attempt to differentiate among variations of endoscopic drainage. Only one of these studies is prospective (Barthet, Sahel, Bodiou-Bertei et al., 1995), and none provides robust information on prospective, long-term outcomes from these procedures.
One of the three studies met the threshold study selection criteria initially set for this systematic review (Froeschle, Meyer-Pannwitt, Brueckner et al., 1993). Results of this retrospective comparative study suggest that ERCP drainage results in a rate of pain relief similar to that of surgery, with equivalent or lower mortality. Two additional single-arm series that met the relaxed selection criteria suggest that regression of pseudocysts occurs in a majority of cases following ERCP drainage, in the range of 70-86 percent (Libera, Siqueira, Morais et al., 2000; Barthet, Sahel, Bodiou-Bertei et al., 1995). Pain relief after ERCP drainage was reported in the comparative study and in one case series, with approximately half of patients reporting complete pain relief following the procedure. The uncontrolled trial by Libera, Siqueira, Morais et al. (2000) also reported a significant improvement in pain scores following ERCP drainage. Using a 0-3 pain scale, the mean pain score was reduced from 2.48 pre-treatment to 0.28 post-treatment (p<0.001).
Conclusions
For treatment of acute pancreatitis, 3 randomized controlled trials (total n=554) compared early ERCP to delayed or selective ERCP. The available evidence suggests that early ERCP reduces complications in patient populations with acute pancreatitis and signs and symptoms suggesting biliary obstruction. In patients with low likelihood of biliary obstruction, delayed or selective ERCP permits many patients to avoid the procedure, and may result in lower complications. In addition, one retrospective associational study of a Veterans Administration database of patients with acute pancreatitis (n=2,075) suggests that outcomes of ERCP treatment are similar to those of surgery.
For ERCP treatment in patients with acute recurrent or chronic pancreatitis, study selection criteria were relaxed as described above in order to address this question. Although the available evidence is sparse and largely uncontrolled, it suggests that ERCP treatment reduces emergency room visits and hospitalization in patients with pancreas divisum and acute recurrent pancreatitis. Evidence on ERCP drainage of pseudocysts is also sparse and poorly controlled, but suggests that pain relief with ERCP is similar to results of surgery.
Results and Conclusions, Part IV: Abdominal Pain Of Possible Pancreaticobiliary Origin
This chapter reviews evidence on the following questions:
In patients with abdominal pain of possible pancreaticobiliary origin,
a. What is the diagnostic performance of ERCP with sphincter of Oddi manometry in identifying a pancreaticobiliary origin of pain in comparison to alternatives (e.g., biliary scintigraphy, EUS, or MRCP)? (Section 1: Diagnostic Performance of ERCP Manometry in Evaluation of Abdominal Pain of Possible Pancreaticobiliary Origin -- Comparison To Alternatives)
b. What are the outcomes of treatment using ERCP strategies compared to using surgical or medical therapy? (Section 2: Outcomes of Treatment Using ERCP for Abdominal Pain of Possible Pancreaticobiliary Origin)
Part IV, Section 1: Diagnostic Performance of ERCP Manometry In Evaluation of Abdominal Pain of Possible Pancreaticobiliary Origin -- Comparison With Alternatives
Evidence Base
Three studies comparing biliary scintigraphy with ERCP with or without manometry for the diagnosis of sphincter of Oddi dysfunction met the inclusion criteria for this chapter. There were a total of 136 patients enrolled in these studies, 54 of whom had sphincter of Oddi dysfunction. Quality assessment of these studies is available in Table 67. The study characteristics and diagnostic performance of biliary scintigraphy in these studies are summarized in Table 68.
Review of Evidence
There are notable differences in the study objectives, populations, diagnostic criteria for biliary scintigraphy, and reference standards that limit the ability to synthesize results from these studies. The earliest study (Kloiber, AuCoin, Hershfield et al., 1988) evaluated the ability of biliary scintigraphy to diagnose obstruction of the biliary tree postcholecystectomy. In this study, not all patients with obstruction had sphincter of Oddi dysfunction. Sostre, Kalloo, Spiegler et al. (1992) compared a number of different biliary scintigraphy diagnostic criteria for sphincter of Oddi dysfunction in a consecutive sample of postcholecystectomy patients, with the intent of determining the optimal criterion for diagnosing sphincter of Oddi dysfunction. The most recent study, Peng, Lai, Tsay et al. (1994), attempted to define the performance characteristics of biliary scintigraphy in a group of patients with suspected sphincter of Oddi dysfunction and a control group of asymptomatic postcholecystectomy patients. Other differences in the study populations, diagnostic criteria, and reference standards for biliary scintigraphy are summarized in Table 68.
The reported performance characteristics varied among these studies. The sensitivity of biliary scintigraphy for diagnosing sphincter of Oddi dysfunction ranged from 50-100 percent. The specificity ranged from 64-100 percent. The positive predictive value ranged from 73-100 percent and the negative predictive value ranged from 62-100 percent. Confidence intervals were not reported around the point estimates for these values in any of the studies. While it is likely that differences in study methodology and populations are related to the variability in reported outcomes, it cannot be determined which variables are associated with variability in outcomes.
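To make the performance measures above concrete, the following minimal sketch derives sensitivity, specificity, and predictive values from a 2x2 table and adds a Wilson score interval, one common way to obtain the kind of confidence interval that the studies did not report. The counts in the example are hypothetical and chosen only for illustration; they are not taken from any of the three studies. The function names and the use of the Wilson method with z=1.96 are assumptions of this sketch.

```python
from math import sqrt


def diagnostic_performance(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from 2x2 counts
    (index test vs. reference standard)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95 percent)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts for illustration only (not from the reviewed studies):
# 10 patients with the condition, 15 without.
tp, fn, fp, tn = 8, 2, 3, 12
performance = diagnostic_performance(tp, fp, fn, tn)
sens_low, sens_high = wilson_interval(tp, tp + fn)
print(performance)
print(f"sensitivity 95% CI: {sens_low:.2f} to {sens_high:.2f}")
```

With groups this small, the interval around a point estimate is wide, which is one reason why point estimates from studies that together include only 54 patients with sphincter of Oddi dysfunction are difficult to interpret.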
Conclusions
The evidence is not sufficient to permit conclusions on the diagnostic performance of biliary scintigraphy for sphincter of Oddi dysfunction. The body of evidence consists of three studies that included only 54 patients with sphincter of Oddi dysfunction; results of these studies cannot be synthesized due to differences in populations and methodology. There was substantial variability in the reported performance characteristics of biliary scintigraphy.
Part IV, Section 2: Outcomes Of Treatment Using ERCP For Abdominal Pain of Possible Pancreaticobiliary Origin
Introduction
Patients with abdominal pain showing a typical biliary or pancreatic pattern who have undergone diagnostic evaluation excluding a pancreaticobiliary anatomic or structural cause for the pain may have what is termed "sphincter of Oddi dysfunction." This diagnostic category of functional abdominal pain encompasses both sphincter of Oddi stenosis and sphincter of Oddi dyskinesia. In sphincter of Oddi stenosis, there is persistent narrowing in the region of the sphincter of Oddi with abnormal pancreaticobiliary manometry findings of elevated basal pressure and abnormality of phasic contraction patterns. In sphincter of Oddi dyskinesia, there is intermittent functional obstruction in the sphincter of Oddi, and, like sphincter of Oddi stenosis, basal sphincter of Oddi pressures may be elevated at manometry, but in sphincter of Oddi dyskinesia abnormal manometry pressures may be temporarily reversible following administration of a smooth muscle relaxant (Tzovaras and Rowlands, 1998).
Classification systems for biliary type pain have been proposed with one frequently cited system derived by Hogan and Geenen (1998). In this system, patients are classified into Types I, II, and III, depending on the number of features present. Type I biliary patients have all features present including: typical biliary type pain, elevated alanine transaminase (ALT) and aspartate transaminase (AST) on two separate occasions, dilated common bile duct on ultrasound or ERCP, and delayed biliary drainage. Type II biliary patients have biliary type pain and only one or two of the additional features required for Type I. Finally, Type III patients have biliary type pain but none of the accompanying features. The prevalence of sphincter of Oddi dysfunction is generally highest for Type I biliary patients and decreases among Type II and Type III biliary patients. Additional modifications of this classification system have been made reflecting the limited role of delayed biliary drainage as a criterion (personal communication, Elta G.).
Pancreatic type sphincter of Oddi dysfunction has been classified into three types by Sherman, Troiano, Hawes et al. (1991). In this system, Type I patients demonstrate recurrent pancreatitis and/or typical pancreatic-type pain, elevated amylase and/or lipase, dilated pancreatic duct, and prolonged drainage of the pancreatic duct. Type II pancreatic type patients have typical pancreatic-type pain and one or two of the additional features listed for Type I patients. Type III pancreatic type patients have typical pancreatic-type pain but none of the accompanying features.
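The biliary and pancreatic classification systems described above share the same logic: typical pain plus all, some, or none of three additional features. A minimal sketch of that rule follows. It is a simplified reading of the text above, not a validated clinical tool; the function name `classify_sod_type` and the feature labels are hypothetical, and the later modifications to the biliary system mentioned above are not modeled.

```python
def classify_sod_type(typical_pain, features):
    """Assign Type I, II, or III from the presence of typical pain and a
    mapping of additional feature name -> bool.

    Returns None when typical pain is absent, since the classification
    described in the text applies only to patients with typical pain.
    """
    if not features:
        raise ValueError("at least one additional feature must be assessed")
    if not typical_pain:
        return None
    present = sum(1 for is_present in features.values() if is_present)
    if present == len(features):
        return "Type I"    # all additional features present
    if present >= 1:
        return "Type II"   # one or two additional features present
    return "Type III"      # pain only, no additional features


# Hypothetical biliary-type patient: pain, dilated duct, nothing else.
biliary_features = {
    "elevated_alt_ast_on_two_occasions": False,
    "dilated_common_bile_duct": True,
    "delayed_biliary_drainage": False,
}
print(classify_sod_type(True, biliary_features))
```

The same function applies to the pancreatic system by passing elevated amylase and/or lipase, dilated pancreatic duct, and prolonged pancreatic duct drainage as the three features.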
Evidence Base
This systematic review selected studies reporting results of endoscopic treatment with sphincterotomy in patients with abdominal pain of suspected pancreaticobiliary origin (e.g., suspected sphincter of Oddi dysfunction). Studies comparing outcomes of ERCP sphincterotomy with alternative treatment strategies were included.
Seven studies met the selection criteria for this question. Quality ratings are described in Table 69 and results of these studies are detailed in Tables 70 and 71. Two of these studies were prospective randomized, controlled trials (Geenen, Hogan, Dodds et al., 1989; Toouli, Robert-Thomson, Kellow et al., 2000) and met the study selection criteria as originally defined. Because of the paucity of evidence found using the original selection criteria, criteria were relaxed to include single-arm studies that reported quantifiable pre- and post-outcome measures, or that compared outcomes among relevant clinical subgroups. Five studies were identified that met these modified selection criteria. One was a prospective single-arm study that evaluated consecutive patients treated with endoscopic sphincterotomy and used quantifiable pre- and post-outcome measures. One prospectively compared outcomes after endoscopic sphincterotomy between clinical subgroups. Three additional articles were retrospective single-arm studies in which outcomes were compared among different clinical subgroups of patients. These studies evaluated the relative success of treatment in relation to specific clinical factors.
Finally, an eighth study, a randomized controlled trial (Jamidar, Sherman, and Hawes, 1992) was only available in abstract form and has not been submitted for publication (personal communication, Sherman S, August 2001). This abstract was not included in the review of evidence.
Review of Evidence: Randomized Controlled Trials
Two double-blind randomized, controlled trials, reporting on a total of 126 patients, compared endoscopic sphincterotomy with a sham procedure (Table 70). Both of the published randomized, controlled trials were rated as "Good" by quality assessment. Strengths of these randomized, controlled trials include double blinding, the use of a sham procedure in the control group, and independent blinded assessment of outcomes. For both studies, the primary outcome was improvement in abdominal pain. Geenen, Hogan, Dodds et al. (1989) compared outcomes between groups at 1 year and Toouli, Robert-Thomson, Kellow et al. (2000) compared outcomes at 2 years. Geenen, Hogan, Dodds, et al. (1989) also reported the number of patients in each group who had persistent objective abnormalities (increased liver enzymes, dilatation of common bile duct, delayed contrast drainage) following treatment.
In the Geenen, Hogan, Dodds, et al. (1989) study, there was a significantly greater improvement in pain scores for the overall endoscopic sphincterotomy group as compared to control (65 percent vs. 30 percent with good/fair improvement, p<0.01). In Toouli, Robert-Thomson, Kellow et al. (2000), more patients in the endoscopic sphincterotomy group had improvement in pain scores than in the sham endoscopic sphincterotomy group (62 percent vs. 43 percent); however, statistical significance was not reported for the overall group comparison.
Both studies evaluated subgroups of patients with and without an elevated sphincter of Oddi pressure, defined as greater than 40 mm Hg. In patients with an elevated pressure, both studies reported a statistically significant benefit for the endoscopic sphincterotomy group. Geenen, Hogan, Dodds, et al. (1989) reported that 91 percent (10/11) of patients in the endoscopic sphincterotomy group had good or fair improvement in pain scores, compared with 25 percent (3/12) in the sham group. Similarly, Toouli, Robert-Thomson, Kellow et al. (2000) reported that 85 percent of patients in the endoscopic sphincterotomy group with elevated pressure had improvement in pain, as compared with 38 percent in the sham group (p<0.04). In patients without an elevated sphincter of Oddi pressure, both studies reported that the improvement in pain scores was not statistically significant for the endoscopic sphincterotomy group as compared to the sham group.
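As a quick arithmetic check on the subgroup counts quoted above for Geenen, Hogan, Dodds, et al. (1989), the following minimal Python sketch recomputes the percentages from the raw counts and the absolute difference between arms. Only the counts 10/11 and 3/12 come from the text; the function and variable names are illustrative, and the difference in percentage points is a derived figure, not one reported by the study.

```python
def percent(events, total):
    """Return events/total expressed as a percentage."""
    return 100.0 * events / total

# Counts quoted in the text for patients with elevated sphincter of Oddi pressure
es_improved, es_total = 10, 11      # endoscopic sphincterotomy arm
sham_improved, sham_total = 3, 12   # sham arm

es_pct = percent(es_improved, es_total)
sham_pct = percent(sham_improved, sham_total)
difference = es_pct - sham_pct  # absolute difference in percentage points

print(f"Sphincterotomy: {es_pct:.0f} percent")
print(f"Sham: {sham_pct:.0f} percent")
print(f"Absolute difference: {difference:.0f} percentage points")
```

With only 11 and 12 patients in the two arms, each additional responder shifts an arm's percentage by roughly 8 to 9 points, which is one reason the subgroup estimates should be read as imprecise.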
Geenen, Hogan, Dodds et al. (1989) reported the number of patients with objective abnormalities post treatment. At 1 year, objective abnormalities were found in 16 percent of patients in the endoscopic sphincterotomy group and 61 percent of patients in the sham group. Statistical tests were not reported for this comparison. This study also allowed crossover from sham to endoscopic sphincterotomy after one year and continued to follow patients for up to four years. After four years, the improvement in pain scores was maintained for the endoscopic sphincterotomy group. The patients who crossed over from sham to endoscopic sphincterotomy had outcomes similar to those of the initial endoscopic sphincterotomy group.
Review of Evidence: Nonrandomized Controlled Trials
Five nonrandomized studies reported outcomes of endoscopic sphincterotomy in patients with abdominal pain of suspected pancreaticobiliary origin (Table 71). Brand, Wiese, Thonke, et al. (2001) was a prospective single-arm study that reported quantifiable pre and post values for pain. This study treated 29 consecutive patients with biliary-type pain, increased liver enzymes, and no evidence of other pancreatobiliary pathology with ERCP and endoscopic sphincterotomy. At 12 weeks post-treatment, 26 of the remaining 28 patients available for follow-up were pain-free, and all 26 patients remained pain-free after a median follow-up of 19 months. Wehrmann, Wiemer, Lembcke, et al. (1996) prospectively compared the results after endoscopic sphincterotomy in 20 patients with Type II SOD and 13 patients with Type III SOD. After a median of 2.5 years follow-up, 60 percent of the Type II SOD patients and only 8 percent of the Type III SOD patients maintained symptomatic relief.
The 3 retrospective single-arm studies compare outcomes among subgroups of patients who underwent ERCP and endoscopic sphincterotomy (Botoman, Kozarek, Novell, et al., 1994; Choudhry, Ruffolo, Jamidar, et al., 1993; Thatcher, Sivak, Tedesco, et al., 1987). In particular, these studies explore the relationship between improvement in pain following endoscopic sphincterotomy, baseline sphincter of Oddi pressure, and/or the presence of a dilated common bile duct. Because of the retrospective, uncontrolled nature of these studies, they do not provide strong data on the absolute improvement seen following treatment with endoscopic sphincterotomy. However, comparison of outcomes among clinical subgroups in these studies may provide useful information regarding the relative success of this treatment in different patient groups.
Among all patients treated with endoscopic sphincterotomy, these studies report good/fair improvement in over 60 percent. The presence of baseline sphincter of Oddi pressure greater than 40 mm Hg, a dilated common bile duct, and/or delayed common bile duct emptying appears to be associated with slightly higher success rates after endoscopic sphincterotomy. However, confidence in this conclusion is limited by the small numbers of patients in the subgroup analyses and the lack of tests of statistical significance in some cases.
Conclusions
The randomized controlled trials by Geenen, Hogan, Dodds et al. (1989) and Toouli, Robert-Thomson, Kellow et al. (2000) provide strong and consistent evidence that endoscopic sphincterotomy provides effective relief of pain in patients with pancreaticobiliary pain, sphincter of Oddi dysfunction, and elevated basal sphincter of Oddi pressure on manometry (greater than 40 mm Hg). The results of the nonrandomized studies corroborate these data and suggest that patients with a dilated common bile duct and/or delayed contrast emptying may also benefit from endoscopic sphincterotomy.
There is insufficient evidence to determine whether endoscopic sphincterotomy improves outcomes in patients with normal manometry findings. For this group, the small studies included in this review do not report significant improvements in pain for the endoscopic sphincterotomy group.
ERCP Evidence Review Results and Conclusions, Part V: Patient, Procedure or Operator Determinants of ERCP Complications
This chapter reviews evidence on the following questions:
What patient, procedure, or provider factors are determinants of adverse events of ERCP?
(Section 1: Multivariable Analyses)
(Section 2: Randomized, Controlled Comparison Trials)
Part V, Section 1: Multivariable Analyses
Body of Evidence
Thirteen studies reported on multivariable logistic regression analyses of factors associated with complications of ERCP (Table 72; see also "Evidence Tables" chapter). The four largest studies each included more than 1,800 patients, and the total number of complications observed in these studies ranged from 98 to 229 (Loperfido, Angelini, Benedetti, et al., 1998; Freeman, DiSario, Nelson, et al., 2001; Freeman, Nelson, Sherman, et al., 1996; Masci, Toti, Mariani, et al., 2001). The remaining 9 studies ranged from 100 to 535 patients, and the number of complications observed ranged from 10 to 34. Seven studies reported on therapeutic ERCP, 5 studies combined therapeutic and diagnostic ERCP, and one study reported on diagnostic ERCP.
Total complications were analyzed in seven studies. The specific complications most commonly analyzed separately were pancreatitis (7 studies) and hemorrhage (4 studies). The number of cases of pancreatitis observed ranged from 17 to 131; and cases of hemorrhage ranged from 10 to 48. Other complications analyzed separately in these studies include cholangitis, septicemia, and retroperitoneal perforation, with number of cases observed ranging from 10 to 34.
This systematic review addresses the relationship of patient, procedure, and operator factors to complications. The 13 included studies assessed numerous factors suspected to be related to the likelihood of complications. The various measures used in the literature were classified into categories: 12 for patient factors, 13 for procedure factors, and 4 for operator factors. Independent variables reported to be statistically significant risk factors for complications are listed for each study along with an estimate of the magnitude of the effect when available (i.e., odds ratio and confidence interval). Independent variables that were considered in the study but not found to be significantly associated with complications are denoted by an "X" under the appropriate category for that factor.
Study Quality
The number of events observed is the primary determinant of the power of a study to detect a significant association between a factor and an outcome of interest. When multivariable analysis is performed, the number of events also constrains the number of potential relationships that can be appropriately tested. A commonly accepted benchmark is a minimum of 10 outcome events per independent variable tested. A larger number of variables relative to events can lead to unstable results, spurious findings of significance, and unreliable estimates of the magnitude of the association. Extremely wide confidence intervals are a hallmark of such "overfitted" models. Another problem is that when multiple variables are incorporated in a model, some may be highly correlated. As a result, some independently significant factors can be obscured. Concato, Feinstein, and Holford (1993) offer an overview of the methodologic deficiencies that are common in multivariable analyses published in the medical literature.
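The benchmark of 10 outcome events per independent variable can be illustrated with a minimal Python sketch. The function name is hypothetical, and the event counts are the ranges quoted in this chapter (98 to 229 complications in the four largest studies, 10 to 34 in the remaining nine); the sketch simply applies the rule of thumb and is not an analysis performed in the report.

```python
def max_variables(events, events_per_variable=10):
    """Largest number of independent variables the benchmark supports."""
    return events // events_per_variable

# Lower and upper ends of the complication counts quoted in this chapter
for events in (10, 34, 98, 229):
    print(f"{events} complications -> at most {max_variables(events)} variables")
```

Under this rule of thumb, a study with 10 to 34 complications could appropriately test only about 1 to 3 variables, which helps explain why models that test many candidate factors on so few events are described as overfitted.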
Overall, the multivariable analyses included in this systematic review demonstrated overfitting, i.e., testing an excessive number of factors relative to the number of complications observed. Consequently, this literature is exploratory in nature. Candidate variables included in the analyses are often likely to be closely related to each other (potentially leading to collinearity), resulting in potentially spurious results from multivariable analysis including all variables. Instances where multiple factors identified as highly associated with complications on univariate analysis disappear entirely from the multivariable models raise concern over the stability of the findings. Reported magnitudes of association are not reliable, significant independent variables may have been overlooked, and some significant associations may be misleading. Moreover, the existing studies do not use common, standardized definitions for the complications and factors of interest. Thus, caution should be used in drawing inferences for clinical practice from these studies.
This body of literature was overall rated as "Fair" (Table 73). The associations found in these analyses are hypothesis generating, but not predictive. The three studies with notably larger numbers of cases of complications (121-229 vs. 10-98) were designated as "Fair" quality for purposes of this review (Freeman, DiSario, Nelson, et al., 2001; Freeman, Nelson, Sherman, et al., 1996; Masci, Toti, Mariani, et al., 2001) while the remaining 10 studies were rated "Fair Minus." The results of the three "Fair" studies are slightly more robust, despite some degree of overfitting. The study by Loperfido, Angelini, Benedetti, et al. (1998) had 98 cases, but was classified as "Fair Minus" because confidence intervals were not reported and problems with missing data were noted.
This review focuses on factors that were found to be significant either in the more robust studies or in several studies. Factors that were not significant in any analysis are also noted. Rarely was a factor found to be significant in all studies in which it was analyzed, which is not surprising given the characteristics of the available studies. Extremely wide confidence intervals, which may suggest a spurious association, are also noted.
Review of Evidence: Patient Factors
All 13 studies reported on patient factors associated with complications. These various factors were classified into 12 categories: age, gender, common bile duct size/diameter, cholangitis, anatomic variation, coagulopathy, laboratory values, comorbidities, indication for ERCP procedure, previous gastrectomy, history of jaundice, and history of allergy to contrast media.
Total Complications
Seven studies reported on total complications (Table 74). Two factors were found to be significant in a study rated as "Fair" and in one additional study. These were age equal to or less than 60 years (Masci, Toti, Mariani, et al., 2001; Rabenstein, Schneider, Bulling, et al., 2000) and suspected sphincter of Oddi dysfunction (Freeman, Nelson, Sherman, et al., 1996; Tzovaras, Shukla, Kow, et al., 2000).
Jaundice of malignancy was significant in the study by Tzovaras, Shukla, Kow, et al. (2000) and elevated serum bilirubin in Neoptolemos, Shaw, and Carr-Locke (1989). Factors found to be significant in a single study rated as "Fair Minus" were: pancreas divisum, coagulopathy, pancreatic obstruction (Rabenstein, Schneider, Bulling, et al., 2000), and juxtapapillary diverticulum (Boender, Nix, de Ridder, et al., 1994). However, confidence intervals were extremely wide for pancreas divisum (1.56-36.6) and coagulopathy (1.95-48.1).
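One simple way to see why these intervals are described as extremely wide is to compare the upper confidence limit with the lower one. The sketch below is illustrative only: the ratio of limits is an assumed convenience measure, not one used in the report, and the function name is hypothetical. The interval values are those quoted in the text.

```python
def ci_ratio(lower, upper):
    """Ratio of upper to lower confidence limit; larger means less precise."""
    return upper / lower

# Confidence limits quoted in the text (Rabenstein, Schneider, Bulling, et al., 2000)
intervals = {
    "pancreas divisum": (1.56, 36.6),
    "coagulopathy": (1.95, 48.1),
}

for factor, (lower, upper) in intervals.items():
    print(f"{factor}: upper/lower = {ci_ratio(lower, upper):.1f}")
```

In both cases the upper limit is more than 20 times the lower limit, so the data are compatible with anything from a modest to a very large association.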
The following factors were analyzed, but were not found to be significant for total complications in any study: gender (6 studies); common bile duct size/diameter (4 studies); cholangitis (2 studies); previous gastrectomy (3 studies);
Pancreatitis
Seven studies reported on patient factors associated with pancreatitis (Table 75). Younger age was significant in four studies, two rated as "Fair" quality. Each of the four studies used a different age cut-off: 70 years in Loperfido, Angelini, Benedetti, et al. (1998); 60 years in Masci, Toti, Mariani, et al. (2001); 59 years in Mehta, Pavone, Barkun, et al. (1998); and 30 years vs. 70 years in Freeman, Nelson, Sherman, et al. (1996). Suspected sphincter of Oddi dysfunction was significant in two studies, both rated "Fair" (Freeman, Nelson, Sherman, et al., 1996; Freeman, DiSario, Nelson, et al., 2001). Note that the two studies by Freeman and co-workers included different patient populations.
Factors found to be significant in a single study rated "Fair" (Freeman, DiSario, Nelson, et al., 2001) were: normal bilirubin, female gender, absence of chronic pancreatitis, and history of post-ERCP pancreatitis.
Factors found to be significant in a single study rated as "Fair Minus" were: absence of a common bile duct stone at ERCP (Mehta, Pavone, Barkun, et al., 1998); and pancreas divisum, but with an extremely wide (1.91-34.79) confidence interval (Rabenstein, Schneider, Bulling, et al., 2000). Loperfido, Angelini, Benedetti, et al. (1998) found non-dilated duct to be significant, but did not report the confidence interval.
Previous gastrectomy was analyzed in two studies, but was not significant.
Hemorrhage
Four studies reported on patient factors associated with hemorrhage (Table 76). Coagulopathy was significant in a study rated as "Fair" (Freeman, Nelson, Sherman, et al., 1996); prothrombin time and hemodialysis were significant in one additional study (Nelson and Freeman, 1994). Factors found to be significant in a single study rated as "Fair" were: cholangitis (Freeman, Nelson, Sherman, et al., 1996), and obstructed papilla of Vater orifice (Masci, Toti, Mariani, et al., 2001).
Factors that were not significant in any analysis were: age (3 studies); gender (3 studies); common bile duct size/diameter (4 studies); indications for ERCP (3 studies); previous gastrectomy (2 studies); and history of jaundice (1 study).
Cholangitis
Two studies, both rated as "Fair Minus" quality, reported on patient factors associated with cholangitis (Table 77). Loperfido, Angelini, Benedetti, et al. (1998) reported that jaundice had a significant association with cholangitis. Lai, Lo, Choi, et al. (1989) reported significant associations for fever greater than 37.5 degrees Celsius within prior 72 hours; malignant obstruction; and serum AST of 70 IU or less.
The study by Loperfido, Angelini, Benedetti, et al. (1998) also included age, gender, common bile duct size and diameter, anatomic features, and previous gastrectomy in the analysis, but none were significant.
Septicemia and Retroperitoneal Perforation
Septicemia (Table 78) and retroperitoneal perforation (Table 79) were each addressed in a single study of "Fair Minus" quality.
Motte, Deviere, Dumonceau, et al. (1991) reported that prior cholangitis and elevated white blood count were significant factors for septicemia, but did not report p-values. Age, gender, anatomic variation, other comorbidities, and history of jaundice were not significant in this analysis.
Loperfido, Angelini, Benedetti, et al. (1998) reported that previous gastrectomy was a significant factor for retroperitoneal perforation, but did not report confidence intervals. Age, gender, common bile duct size/diameter, anatomic variation, and history of jaundice were not significant in this analysis.
Relationship of Total and Specific Complications
Pancreatitis and hemorrhage together comprise the majority of total complications in the three studies that report all three outcomes (Masci, Toti, Mariani, et al., 2001; Freeman, Nelson, Sherman, et al., 1996; Loperfido, Angelini, Benedetti, et al., 1998). Pancreatitis accounted for 36 percent, 55 percent, and 30 percent of total complications, respectively, in these studies; hemorrhage accounted for 25 percent, 21 percent, and 21 percent.
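Reading the percentages above as each complication's share of total complications (an interpretation of the text rather than a statement made in those words), the claim that the two complications together form a majority can be checked with a short Python sketch. The short study labels and variable names are illustrative.

```python
# Shares of total complications (percent) as quoted in the text:
# (pancreatitis, hemorrhage) for each study, in the order cited.
shares = {
    "Masci 2001": (36, 25),
    "Freeman 1996": (55, 21),
    "Loperfido 1998": (30, 21),
}

# Combined share of pancreatitis and hemorrhage for each study
combined = {study: p + h for study, (p, h) in shares.items()}

for study, total in combined.items():
    print(f"{study}: pancreatitis + hemorrhage = {total} percent")
```

The combined share exceeds 50 percent in all three studies, although only narrowly in the study by Loperfido and colleagues.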
In the study by Masci, Toti, Mariani, et al. (2001), younger age was a significant factor for both pancreatitis and total complications. There was no other overlap between risk factors for total complications and pancreatitis or hemorrhage.
In Freeman, Nelson, Sherman, et al. (1996), suspected sphincter of Oddi dysfunction was a significant factor for both pancreatitis and total complications. There was no other overlap between total complications and pancreatitis or hemorrhage. In contrast to Masci, Toti, Mariani, et al. (2001), younger age was significant only for pancreatitis, not for total complications.
Loperfido, Angelini, Benedetti, et al. (1998) found no significant relationships between patient factors and overall complications.
The inconsistencies noted here might suggest that analysis of patient factors related to specific complications may be more informative than total complications. Analysis of total complications may not be sufficiently sensitive. This suggests that large studies with adequate numbers of cases of the specific complications of interest will be more useful in identifying patient-related factors that might be used to improve clinical outcomes.
Review of Evidence: Procedure Factors
Eleven studies reported on procedure factors associated with complications. The various measures were classified into 13 categories: papillotomy/endoscopic sphincterotomy; precut endoscopic sphincterotomy; biliary drainage; failed procedure; length of endoscopic sphincterotomy; bleeding during endoscopic sphincterotomy; combination with other procedures; difficulty of cannulation; pancreatic opacification; post-procedure care; intramural injection; sphincter of Oddi manometry; and emergency procedure.
Total Complications
Six studies reported on procedure factors associated with total complications (Table 80). Precut endoscopic sphincterotomy was significant in all four studies that tested for this association; including two studies rated as "Fair" (Masci, Toti, Mariani, et al., 2001; Freeman, Nelson, Sherman, et al., 1996). Freeman, Nelson, Sherman, et al. (1996) also found two additional significant factors, combined percutaneous-endoscopic procedures and difficulty in cannulation. Masci, Toti, Mariani, et al. (2001) found that failed stone removal, another indicator of a difficult procedure, was a significant factor for total complications.
Failed biliary drainage was significant in the study by Boender, Nix, de Ridder, et al. (1994). Tzovaras, Shukla, Kow, et al. (2000) reported two significant factors: previous failed ERCP (CI=1-21.8) and need for percutaneous procedure (CI=2.3-45.8); but confidence intervals were extremely wide for both factors.
Factors not significant were: emergency procedure (4 studies); pancreatic opacification (2 studies); and bleeding during endoscopic sphincterotomy (1 study).
Pancreatitis
Seven studies reported on procedure factors associated with pancreatitis (Table 81). Precut endoscopic sphincterotomy was significant in two studies rated as "Fair" (Masci, Toti, Mariani, et al., 2001; Freeman, Nelson, Sherman, et al., 1996), as were difficulty in cannulation and multiple pancreatic contrast injections (Freeman, Nelson, Sherman, et al., 1996; Freeman, DiSario, Nelson, et al., 2001). Multiple pancreatic contrast injections was also a significant risk factor in Loperfido, Angelini, Benedetti, et al. (1998), and in Mehta, Pavone, Barkun, et al. (1998) for the subgroup of patients that did not undergo endoscopic sphincterotomy.
Masci, Toti, Mariani, et al. (2001) also reported that failed stone removal was a significant factor; and Freeman, DiSario, Nelson, et al. (2001) found that pancreatic sphincterotomy and balloon biliary sphincter dilatation were also significant factors.
Maldonado, Brady, Mamel, et al. (1999) identified performing a complete ERCP procedure in addition to sphincter of Oddi manometry as a significant risk factor for pancreatitis among patients who all underwent sphincter of Oddi manometry.
Factors not significant were: emergency procedure (3 studies); biliary drainage (1 study); and bleeding during endoscopic sphincterotomy (1 study).
Hemorrhage
Four studies reported on procedure factors associated with hemorrhage (Table 82). Bleeding during endoscopic sphincterotomy was significant in two studies, one of which was rated as "Fair" (Freeman, Nelson, Sherman, et al., 1996; Nelson and Freeman, 1994). Precut endoscopic sphincterotomy (Masci, Toti, Mariani, et al., 2001) and anticoagulation less than 3 days after procedure (Freeman, Nelson, Sherman, et al., 1996) were significant in a single study rated "Fair."
Factors not significant were: pancreatic opacification (3 studies); emergency procedure (2 studies); combined with other procedures (2 studies); biliary drainage (1 study); failed procedure (1 study); endoscopic sphincterotomy length (1 study); and difficulty of cannulation (1 study).
Cholangitis, Septicemia and Retroperitoneal Perforation
Cholangitis (Table 83), septicemia (Table 84) and retroperitoneal perforation (Table 85) were each addressed in a single study of "Fair Minus" quality.
Loperfido, Angelini, Benedetti, et al. (1998) analyzed precut endoscopic sphincterotomy, pancreatic opacification, and emergency procedure, but none of these factors was significant for cholangitis.
Motte, Deviere, Dumonceau, et al. (1991) reported that incomplete biliary drainage was a significant factor for septicemia, but did not report p-values. Combination with another procedure was not significant in this analysis.
Loperfido, Angelini, Benedetti, et al. (1998) reported that precut endoscopic sphincterotomy and intramural injection were significant factors for retroperitoneal perforation, but did not report confidence intervals. Pancreatic opacification and emergency procedure were not significant in this analysis.
Relationship of Total and Specific Complications
Pancreatitis and hemorrhage together comprise the majority of total complications in the three studies that report all three outcomes (Masci, Toti, Mariani, et al., 2001; Freeman, Nelson, Sherman, et al., 1996; Loperfido, Angelini, Benedetti, et al., 1998).
Masci, Toti, Mariani, et al. (2001) found that precut endoscopic sphincterotomy was a significant factor for total complications, pancreatitis, and hemorrhage. Failed stone removal was a significant factor for total complications and pancreatitis, but not for hemorrhage. There was no other overlap between total complications and pancreatitis or hemorrhage.
Freeman, Nelson, Sherman, et al. (1996) found that precut endoscopic sphincterotomy and difficulty in cannulation were significant factors for total complications and pancreatitis. There was no other overlap between total complications and pancreatitis or hemorrhage.
Loperfido, Angelini, Benedetti, et al. (1998) found no overlap between total complications and pancreatitis or hemorrhage.
This suggests that procedure factors may be more generalizable across total and specific complications than is the case with patient factors.
Review of Evidence: Operator Factors
Operator factors were analyzed in four studies (Freeman, DiSario, Nelson, et al., 2001; Freeman, Nelson, Sherman, et al., 1996; Loperfido, Angelini, Benedetti, et al., 1998; Rabenstein, Schneider, Bulling, et al., 2000), two of which were rated as "Fair" quality (Table 86). Case volume was analyzed in all four studies, participation of a trainee in three studies, university-affiliated center in one study, and center size in one study. Only case volume was a significant factor for complications in any of these analyses. Importantly, cut-off points for classification as a low-volume operator varied substantially across studies. Freeman, Nelson, Sherman, et al. (1996) used a cut-off of centers with 1 or fewer procedures per endoscopist per week; Loperfido, Angelini, Benedetti, et al. (1998) defined lower volume centers as those with fewer than 200 procedures per year.
Case volume was not independently significant in the primary multivariate analysis of total complications conducted by Freeman, Nelson, Sherman, et al. (1996), probably because of the close relationship with intraoperative technique. In a multivariable model that was based solely on data available prior to the procedure, lower case volume (average less than 1 case/week per endoscopist vs. more than 1 case) was independently associated with higher complications (OR 1.43, CI=1.07-1.89). This suggests that endoscopist skill in avoiding specific procedural techniques is the basis for the association between case volume and complications.
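The reported odds ratio and confidence interval for case volume can be sanity-checked on the log scale, where intervals from logistic regression are usually symmetric. The minimal Python sketch below back-calculates the implied point estimate and an approximate standard error from the limits 1.07 and 1.89. It assumes a 95 percent Wald-type interval (z = 1.96); the text does not state the confidence level, so that value and the function name are assumptions for illustration only.

```python
import math

def log_scale_summary(lower, upper, z=1.96):
    """Back-calculate the implied estimate and log-scale standard error.

    Assumes an interval symmetric on the log odds scale; z = 1.96
    corresponds to a 95 percent level, which is an assumption here.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    midpoint = math.exp((log_lower + log_upper) / 2)  # implied odds ratio
    std_error = (log_upper - log_lower) / (2 * z)     # SE of log odds ratio
    return midpoint, std_error

# Confidence limits quoted in the text for lower case volume
midpoint, std_error = log_scale_summary(1.07, 1.89)
print(f"Implied odds ratio: {midpoint:.2f}")
print(f"Approximate standard error of log odds ratio: {std_error:.3f}")
```

The implied midpoint falls close to the reported odds ratio of 1.43, which is consistent with the interval having been computed on the log scale. The upper limit is less than twice the lower limit, so this estimate is far more precise than the small-study intervals discussed earlier in the chapter.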
Lower volume of ERCP procedures was associated with hemorrhage in two studies (Freeman, Nelson, Sherman, et al., 1996; Loperfido, Angelini, Benedetti, et al., 1998) (Table 87). Rabenstein, Schneider, Bulling, et al. (2000) was the only study to find a significant association between lower case volume and pancreatitis (Table 88). The cut-off used was fewer than 40 endoscopic sphincterotomies per endoscopist per year. Loperfido, Angelini, Benedetti, et al. (1998) also explored the relationship between case volume and cholangitis or retroperitoneal perforation (Tables 89 and 90) and reported an odds ratio of 4.22 for cholangitis and no association with retroperitoneal perforation.
Conclusion
- Thirteen studies reported on multivariable logistic regression analyses of factors associated with complications of ERCP. The four largest studies each included more than 1,800 patients, and the total number of complications observed in these studies ranged from 98 to 229. Overall, the methodologic quality of the available analyses is limited by overfitting, i.e., testing an excessive number of factors relative to the number of complications observed. Consequently, this literature is exploratory in nature. Reported magnitudes of association are not reliable, significant independent variables may have been overlooked, and some significant associations may be misleading. Moreover, the existing studies do not use common, standardized definitions for the complications and factors of interest. Thus, caution should be used in drawing inferences for clinical practice from these studies.
- Patient, procedure and operator factors were identified that were found to be significantly associated with complications in several of the more robust studies. Younger age (using various cut-offs, but generally 60 years or less) was significantly associated with total complications and with pancreatitis; as was suspected sphincter of Oddi dysfunction. Precut endoscopic sphincterotomy was the procedure-related factor most commonly associated with total complications or pancreatitis; a significant association with difficulty in cannulation was also reported, but less frequently. Multiple pancreatic contrast injections was associated with pancreatitis. For hemorrhage, the clearest association was patient factors related to coagulopathy. Case volume was the only operator-related factor found to be significantly associated with complications. These studies used various cut-offs to define lower volume centers: 1 or fewer procedures per endoscopist per week; fewer than 40 endoscopic sphincterotomies per endoscopist per year; and fewer than 200 procedures per year.
Part V, Section 2: Randomized, Controlled Comparison Trials
Introduction
This section summarizes the available randomized, controlled trials that compare technical variations in performing the ERCP procedure and compare associated complication rates. Quality ratings for these studies are available in Table 91. In addition, some of these studies provide comparative information on technical success of the procedure. Based on discussion with this project's Technical Advisory Group, studies evaluating the use of pharmacologic agents or different contrast agents in preventing ERCP-induced pancreatitis were specifically excluded from this systematic review as the volume of this literature could not be incorporated within the scope of this project.
Review of Evidence
Sphincterotome versus Standard Catheter to Achieve Selective Common Bile Duct Cannulation
Two randomized controlled trials (total n=147) compared standard catheterization versus techniques using sphincterotomes to achieve higher success rates in selectively cannulating the common bile duct (Table 92). Cortas, Mehta, Abraham, et al. (1999) randomized 47 patients to standard catheter versus either a standard or wire-guided sphincterotome, and was rated a "Good" quality study. Fifteen attempts were made to cannulate the common bile duct with the randomly assigned catheter, after which patients crossed over. In the initial attempt, the sphincterotome was more successful than the standard catheter in achieving cannulation (97 percent vs. 67 percent, p=0.009). After crossovers, the techniques were equivalent (standard catheter 94 percent, sphincterotome 97 percent, p=n.s.), but successful cannulation was achieved in the sphincterotome group with fewer attempts (2.8 vs. 12.4 for the standard catheter, p<0.001) and in less time (3.1 vs. 13.5 minutes, p<0.001). Pancreatitis occurred in 5.6 percent of the standard catheter group and 10.3 percent of the sphincterotome group, but numbers are too small to assess statistical significance.
Schwacha, Allgaier, Deibert, et al. (2000) randomized 100 patients to standard catheter versus sphincterotome and was rated "Fair." If the randomly assigned technique was unsuccessful patients underwent attempts with a tapered cannula, crossing over to the other treatment arm, and then needle knife sphincterotomy. In the initial attempts, the sphincterotome was more successful than the standard catheter (84 percent vs. 62 percent, p=0.023). Eventually, cannulation was equally successful in both groups (91 percent for both). Complications were not statistically different between the two groups.
Based on limited evidence, techniques using a sphincterotome appear to have greater success in selective cannulation of the common bile duct than standard catheter, but no definite conclusion can be made regarding the effect of this variation on complications.
Variations in Electric Current Used in Sphincterotomy to Reduce Post-ERCP Complications
Three randomized clinical trials (all rated "Fair" quality) compared variations of the electric current used in performing sphincterotomy as methods to reduce post-procedure complications such as hemorrhage or pancreatitis.
Elta, Barnett, Wille, et al. (1998) randomized 170 patients to either blended or pure cut current when undergoing sphincterotomy. Blended current combines intermittent high voltage pulses with continuous low voltage current, whereas pure cut current is simply continuous low voltage current. Total complications were significantly lower in the pure cut group (5 percent vs. 14 percent, p<0.05).
Kohler, Maier, Benz, et al. (1998) randomized 100 patients to either conventional high-frequency blended current or a newly developed high-frequency system with automatically controlled cutting mode (Endocut). Mild bleeding during sphincterotomy was significantly reduced (4 percent compared to 26 percent, p=0.002), but no significant difference was observed in moderate/severe bleeding or mild pancreatitis, which both occurred very infrequently.
Siegel, Veerappan, and Tucker (1994) randomized 100 patients to receive either a bipolar or monopolar electric current device when undergoing sphincterotomy. Pancreatitis occurred in 6 patients receiving monopolar electrocautery and 1 patient receiving bipolar electrocautery (p<0.05). Other complications were very uncommon, and numbers were too small to draw conclusions about statistical significance.
Forward-Viewing Endoscope versus Side-Viewing Endoscope to Achieve Successful Cannulation and Sphincterotomy in Patients with Billroth II Gastrectomy
Kim, Lee, Lee, et al. (1997) randomized 45 patients with Billroth II gastrectomy who required ERCP and sphincterotomy to have the procedure done with either a forward-viewing (FV) endoscope or side-viewing (SV) duodenoscope. Successful cannulation occurred in 87 percent of the FV group and 68 percent of the SV group (p=n.s.). Successful sphincterotomy was not statistically different (FV 83 percent, SV 80 percent). Jejunal perforation occurred in 4 patients using the SV duodenoscope and 0 patients using the FV endoscope (p<0.05). Use of the FV endoscope may cause fewer perforations than the SV duodenoscope.
Pancreatic Stenting to Reduce Pancreatitis after Sphincterotomy
Two small randomized controlled trials examined whether placing pancreatic stents after sphincterotomy reduces the incidence of post-ERCP pancreatitis among certain patients considered to be at high risk for such a complication.
The study by Smithline, Silverman, Rogers, et al. (1993), rated "Fair" quality, randomized 98 patients using an alternate assignment scheme. The patients included those with abnormal SOD manometry, clinical suspicion of SOD, a common bile duct <=10 mm, or a requirement for pre-cut sphincterotomy. Some patients requiring a pre-cut sphincterotomy were assigned a stent outside of the randomization scheme. The results are analyzed only among those who received the intended treatment; the 5 patients with failed stent placement are analyzed separately. The no-stent group had an 18 percent rate of pancreatitis, and the stent group had a 14 percent rate (p=n.s.). If appropriately analyzed by intent-to-treat, the pancreatitis rates would be even more similar.
The study by Tarnasky, Palesch, Cunningham, et al. (1998), rated "Good" quality, randomized 80 patients to receive a stent or no stent. The selection criteria appear to be more restrictive than those of Smithline, Silverman, Rogers, et al. (1993), as only patients with confirmed abnormal sphincter of Oddi manometry and pancreatic sphincter hypertension were included. The incidence of post-ERCP pancreatitis was 2 percent in the stent group and 26 percent in the no-stent group (p=0.003). After correction for some baseline differences between study groups, the risk of post-ERCP pancreatitis was still highly associated with lack of stent placement (odds ratio 14.4, p=0.002).
An important distinction between the two studies is the selection criteria. Smithline, Silverman, Rogers, et al. (1993) included several types of patients thought to be at risk of post-ERCP pancreatitis, whereas Tarnasky, Palesch, Cunningham, et al. (1998) included only patients with both confirmed abnormal sphincter of Oddi manometry and pancreatic sphincter hypertension. About three-fourths of the patients in the Smithline, Silverman, Rogers, et al. (1993) study had abnormal sphincter of Oddi manometry, and among those, pancreatic sphincter pressure was not assessed. Thus the differing results of the two studies may not be inconsistent, even though the same intervention is assessed using identical outcome measures.
In conclusion, only one trial provides some evidence that pancreatic stent placement is efficacious in preventing post-ERCP pancreatitis, and only among patients with confirmed abnormal sphincter of Oddi manometry and concurrent pancreatic sphincter hypertension.
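As an illustration only, the pancreatitis rates reported by Tarnasky, Palesch, Cunningham, et al. (1998) can be restated as an absolute risk reduction, a number needed to treat, and a crude odds ratio. The sketch below is not part of the original analysis. It assumes the rounded percentages (26 percent without a stent, 2 percent with a stent) can be treated as simple proportions, and the function names are hypothetical. The crude odds ratio is unadjusted, so it need not match the adjusted odds ratio of 14.4 reported by the study.

```python
# Illustrative arithmetic only: these summary measures are NOT reported in the
# source review. Inputs are the rounded percentages quoted in the text.

def absolute_risk_reduction(control_rate: float, treated_rate: float) -> float:
    """Event rate without treatment minus event rate with treatment."""
    return control_rate - treated_rate


def number_needed_to_treat(control_rate: float, treated_rate: float) -> float:
    """Reciprocal of the absolute risk reduction (defined only when it is positive)."""
    arr = absolute_risk_reduction(control_rate, treated_rate)
    if arr <= 0:
        raise ValueError("NNT is undefined when treatment does not lower the event rate")
    return 1.0 / arr


def crude_odds_ratio(control_rate: float, treated_rate: float) -> float:
    """Unadjusted odds of the event without treatment relative to with treatment."""
    control_odds = control_rate / (1.0 - control_rate)
    treated_odds = treated_rate / (1.0 - treated_rate)
    return control_odds / treated_odds


# Assumed inputs: post-ERCP pancreatitis rates as quoted (rounded) in the text.
no_stent_rate = 0.26
stent_rate = 0.02

arr = absolute_risk_reduction(no_stent_rate, stent_rate)
nnt = number_needed_to_treat(no_stent_rate, stent_rate)
crude_or = crude_odds_ratio(no_stent_rate, stent_rate)

print(f"Absolute risk reduction: {arr:.2f}")
print(f"Number needed to treat: {nnt:.1f}")
print(f"Crude (unadjusted) odds ratio: {crude_or:.1f}")
```

Under these assumptions the absolute risk reduction is about 24 percentage points, which corresponds to roughly 4 patients stented per case of pancreatitis avoided in this highly selected population. Because the inputs are rounded and come from a single small trial, these figures are best read as a restatement of the reported rates and not as an independent estimate.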