NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Butler M, Urosevic S, Desai P, et al. Treatment for Bipolar Disorder in Adults: A Systematic Review [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2018 Aug. (Comparative Effectiveness Review, No. 208.)
Overview
The evidence base for treatments for bipolar disorder (BD) is sparse and scattered. While a large number of studies were identified, they mapped across a considerable number of treatments and comparators, ultimately yielding few for each actual comparison.
We found no high or moderate strength of evidence for any treatment during any phase of bipolar illness (i.e., acute mania, acute depression, or maintenance). For treatment of acute mania, low-strength evidence was found for atypical antipsychotics compared to placebo for improvements in response and possible remission rates, and improvements in manic symptoms and clinical global impressions (Table 48). There was also low-strength evidence for improved response and remission rates, as well as manic symptom improvement, for lithium versus placebo. However, most manic symptom improvements were of modest clinical significance, with values that were less than the minimally important difference (MID) but still large enough that a reasonable proportion of participants likely received a benefit. For maintenance phase treatment, only lithium achieved low-strength evidence for benefit for the long-term (1-2 years). No treatments with even low-strength evidence showed favorable outcomes for treatment of depression. Across treatment phases, the large majority of drug comparisons, including almost all comparisons using active comparators, had insufficient evidence from which to draw conclusions.
Similarly, only a few studies of psychosocial interventions reached low-strength evidence, finding no differences between particular psychosocial treatment approaches versus active comparators (e.g., another psychotherapeutic approach) for a subset of outcomes. Most comparisons had insufficient evidence to address whether the therapy of interest improves outcomes compared to either inactive (usual care) or active (another therapeutic approach) controls. However, the studies’ inclusion criteria and limitations (see section below on limitations) preclude definitive conclusions about the effects of psychosocial interventions.
We were unable to draw a conclusion for several Food and Drug Administration (FDA)-approved drugs for BD. One FDA-approved atypical antipsychotic, aripiprazole, had a limited number of studies and high risk of bias contributing to study limitations for mania treatment evidence. We noted that while a random effect model largely showed no difference between groups in response rates, manic symptom improvement, or withdrawal rates, if a fixed effect model is used, symptom improvements were seen, but at just over half the MID. Fixed effect models only allow inferences for the specific participants in the specific studies, not generalization to the larger applicable population. One FDA-approved drug, chlorpromazine, was used as a comparator in only one study and otherwise not examined. A typical (first generation) antipsychotic, chlorpromazine was approved by the FDA in 1957. The lack of chlorpromazine in the included literature reflects the treatment preference for a different typical antipsychotic, haloperidol, because of the sedative and blood pressure effects of chlorpromazine. Lurasidone, olanzapine, and quetiapine have been approved for depression in BD, based on 6 to 8 week studies, but no studies were identified with at least 3 months followup in this review.
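The contrast between fixed and random effect models described above can be illustrated with a minimal sketch. It assumes inverse-variance weighting for the fixed effect model and the DerSimonian-Laird estimator for the random effect model; the review text does not state which estimators were used, and the three effect sizes and variances below are hypothetical, not aripiprazole data.

```python
import math

def pool_fixed(effects, variances):
    """Inverse-variance fixed effect pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)

def pool_random(effects, variances):
    """DerSimonian-Laird random effect pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fixed, _ = pool_fixed(effects, variances)
    # Cochran's Q measures how much the studies disagree with each other.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total - sum(w * w for w in weights) / total
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    re_total = sum(re_weights)
    estimate = sum(w * y for w, y in zip(re_weights, effects)) / re_total
    return estimate, math.sqrt(1.0 / re_total)

def ci95(estimate, se):
    """Normal-approximation 95 percent confidence interval."""
    return estimate - 1.96 * se, estimate + 1.96 * se

# Hypothetical mean differences in a mania score (drug minus placebo;
# negative favors drug) and their variances.
effects = [-5.0, 0.5, -3.0]
variances = [1.0, 1.0, 1.0]

fixed_est, fixed_se = pool_fixed(effects, variances)
random_est, random_se = pool_random(effects, variances)
print("fixed :", fixed_est, ci95(fixed_est, fixed_se))
print("random:", random_est, ci95(random_est, random_se))
```

With these made-up inputs both models give the same point estimate, but the random effect interval is wider and crosses zero because it also accounts for disagreement between studies. That is one way a drug can show an improvement under a fixed effect model and no difference under a random effect model.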
Table 49 provides a list of all comparisons in this review for which we were unable to draw conclusions. Notably, the insufficiency of the evidence does not indicate that the examined approaches do not have therapeutic benefits, but rather that the scientific evidence is insufficient to draw any conclusions about their therapeutic effects.
Adverse events in drug studies were somewhat consistently reported for extrapyramidal symptoms and clinically significant weight gain of greater than 7 percent, but otherwise variably reported. The harms findings from the included placebo-controlled studies were consistent with information currently reported by FDA labels. (Please see Appendix Q for drug label information on FDA box warnings and serious adverse events.) While most studies reported no differences between groups, we noted participants using antipsychotics, except quetiapine, reported more extrapyramidal symptoms compared to placebo, and those using olanzapine reported more clinically significant weight gain. For mood stabilizers, participants using carbamazepine reported more severe rash and adverse events compared to placebo. In head-to-head studies, we noted a general pattern of participants receiving atypical antipsychotics reporting fewer extrapyramidal symptoms than participants receiving haloperidol. Unfortunately, psychosocial studies generally did not report attempting to collect harms or other unintentional consequences of receiving psychosocial treatments.
Although we had originally anticipated parsing study findings across several populations and subgroups of interest to address Key Question 4, the vast majority of drug treatment studies enrolled participants with bipolar I disorder (BD-I). This held even for maintenance trials, as many were extensions of trials with participants who had responded to treatment for an acute manic episode. Given the low to insufficient strength of evidence assessments arising from high study limitations and attrition for the main study research questions, any post-hoc analysis for subgroups would be by definition high risk of bias and not sufficient to draw conclusions. We were therefore unable to address how treatments may differ across different BD populations and subgroups. Likewise, we also did not locate any studies specifically testing interventions in BD patients to address drug treatment side effects for Key Question 3.
Applicability
Applicability of the review findings is challenging. The trials for drug treatments used restrictive exclusion criteria. Over three quarters of the studies for mania also excluded participants experiencing a first manic episode. Moreover, given the inclusion criteria, it is not clear if the current findings extend to populations with bipolar II disorder (BD-II), current comorbid substance use, pregnant or nursing women with BD-I, or older adults (i.e., age 65 and over). Conversely, the psychosocial trials often did not provide detailed information on the participants, and the lack of population description limits the ability to infer from the results. Such a mixed population may mask patterns of effect. With the current information, we cannot determine if, or to what extent, this contributed to the few findings of nonsignificance between groups.
Factoring in the issue of high attrition, trials with 20 to 50 percent attrition, such as were used in this review, at best provide an estimate of the effect of a treatment for participants who adhere to, tolerate, and, in some minimal sense, benefit from the treatment. However, at extremely high levels of attrition, even this interpretation is of limited value to clinicians.265 If over 50 percent of patients do not finish treatment, and thus were not followed up to the end of the trial, then the chances of the trial results being applicable to a new patient would be less than half. Applicability drops even further when we recognize the original randomized sample excludes many subpopulations and co-occurring conditions, which reduces how much the sample represents people encountered in regular clinical practice. Likewise, the maintenance trials are most applicable to people with BD-I who respond to initial treatment.
Findings in Relation to What Is Already Known
The findings of this review are consistent with other systematic reviews of treatments for BD, although, given the attention this review paid to the role of attrition, more restricted in positive findings. Compared to published Cochrane reviews, our findings were generally consistent, although somewhat more conservative. We also found benefit for olanzapine and risperidone compared to placebo for mania, and benefit for lithium compared to placebo for maintenance.266-268 Cochrane reviews have reported benefit for several additional antipsychotics compared to placebo for which we found insufficient evidence (aripiprazole, haloperidol as single drug and added to mood stabilizers, and olanzapine or risperidone plus mood stabilizers).266, 269-272 However, authors of these reports consistently noted issues with attrition and medication adherence may have impacted their results. Insufficient evidence for psychosocial interventions was consistent across all reviews.263, 273
Limitations of the Comparative Effectiveness Review
There were several limitations of the review. The search strategy relied on previous published reviews to identify relevant studies published prior to 1994. The original date was chosen to reflect the change to DSM-IV diagnostic criteria for BD and to focus review resources on abstracting relevant studies rather than searches over ground that has been otherwise well trodden. We believe we have identified the relevant literature, but the possibility of missing a publication, particularly on lithium, remains.
Several inclusion criteria may also have created limitations. We only included studies if the populations were exclusively diagnosed with BD, or if the bipolar subpopulation results were reported separately. While this is also relevant for drug treatments, it applies in particular to psychosocial treatments: studies of treatments specific to depression or mania that combined participants with bipolar and nonbipolar diagnoses in their analyses might not have been included in this review.
Excluding all outcomes except for time-to-event outcomes from studies with greater than 50 percent attrition hindered our ability to address outcomes of interest that require longer followup in studies of smaller sample sizes. However, as is noted in the section below on limitations of the evidence base, the missing data problems created by high attrition are a counterweight to this limitation. A recent overview of reviews from the International Society for Bipolar Disorders Task Force on Suicide in Bipolar Disorder found that while lithium or anticonvulsants are suggestive for preventing suicide attempts and deaths, more research is needed before the effects can be confirmed.7
Literature on harms was essentially based on identified RCTs. We required studies to be at least prospective cohort studies with comparator arms and clearly reported for BD populations. This led to a number of observational studies being excluded, including observational studies that looked at broad classes of drugs, or individual drugs across broad populations.
We also chose minimum study followup periods of 3 weeks for acute mania studies, 3 months for depression studies, and 6 months for maintenance studies. Many studies for depression treatment and other somatic treatments, such as ECT or light therapy, were excluded because their followup was too short. Given the chronic nature of BD, the clinical relevance of studies reporting the effects of treatments with shorter followup periods is questionable. For example, if a treatment response to depression is not sustained, does it matter if the initial response to one treatment was faster than another? Moreover, in order to provide evidence that a treatment reduces bipolar episode relapse rates, a study followup longer than 12 months is likely needed to capture frequency of episodes that may occur once or twice per year for some individuals with bipolar disorder.
Limitations of the Evidence Base
Even though we excluded studies with greater than 50 percent attrition (unless the outcome was time to relapse), one of the great challenges we confronted in conducting this systematic review was deciding how to interpret trial results in the face of often very high attrition rates. In the case of trials evaluating pharmaceutical treatments for acute mania, it was very common for anywhere from 10 to 70 percent of randomized patients to not complete treatment, even in 3-week trials. In principle, treatment discontinuation need not lead to trial discontinuation, i.e., dropping out of the study and subsequent missing data. A National Research Council (NRC) report on missing data in randomized control trials stressed the importance of continuing to collect data on patients for whom treatment is discontinued, be it due to lack of efficacy, adverse events, or other reasons.259 Unfortunately, while the majority of reports did not explicitly comment on whether treatment discontinuation implied trial dropout, we were generally left with this impression given the common reliance on last-observation carried forward (LOCF) techniques and usage of terms like ‘discontinued’ and ‘percent completing trial’. This means that many, if not most, trials had dropout rates (with subsequent missing data) ranging from 10 to 70 percent. Moreover, trials did not provide details about when in the trial period participants’ last observations were observed, other than generally after baseline. Given the frequency of measurements in these trials, dropout as early as the first week cannot be ruled out.
It is well known that missing outcome data can pose a serious threat to both the internal and external validity of a trial.265, 274 Some of this risk can be mitigated with appropriate analytic techniques. The appropriateness of different analytic methods depends upon the assumptions one makes, and the justifiability of these assumptions in the relevant context, about the missing-ness mechanism (the reason the data are missing and the relationship between observed and unobserved data). Ultimately every approach will require untestable assumptions. However, the aforementioned panel recommends that some analytic approaches, including LOCF, ought to be avoided as their validity depends upon categorically unreasonable assumptions. The LOCF method, while easy, requires an assumption that the health status of participants who dropped out of the trial would not have changed had future observations been recorded. When this assumption is inappropriate, use of LOCF methods can bias effect estimates. Moreover, estimates of standard errors will understate the true uncertainty surrounding effect estimates due to the added uncertainty of having to impute data, and this increases as the number of periods the value is carried forward increases. This can potentially inflate the type-I error rate.24
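To make the LOCF assumption concrete, the following minimal sketch carries a participant's last recorded score forward over every later missed visit. The function name and the weekly ratings are hypothetical and serve only as illustration.

```python
def locf(scores):
    """Fill missing visits (None) with the last observed value.

    Visits before the first observation stay None because there is
    nothing to carry forward yet.
    """
    filled = []
    last = None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)
    return filled

# Hypothetical weekly mania ratings for one participant who dropped out
# after the week 1 visit of a 3-week trial (baseline, week 1, 2, 3).
observed = [30, 26, None, None]
print(locf(observed))  # [30, 26, 26, 26]
```

The imputed week 3 value is simply the week 1 value, which encodes the assumption that nothing changed after dropout. The analysis then treats the two filled-in visits as if they had been measured, which is why standard errors computed from LOCF data understate the real uncertainty.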
Several authors have proposed guidelines for acceptable levels of attrition in RCTs. One guideline suggested that anything greater than 5 percent was cause for concern, and anything greater than 20 percent represented a serious threat to validity.275 Although somewhat arbitrary, this is not without theoretical and empirical support. One simulation study found that, while there was only limited or even no bias in estimates of odds-ratios with attrition rates as high as 60 percent if the mechanism leading to the missing outcome data was unrelated to the value of the missing data (referred to as missing at random, or MAR), estimates were ‘seriously biased’ with even low levels of loss to follow-up when the mechanism for missing data was related to the value of the data (referred to as missing not at random, or MNAR).276 Missing at random is a hard assumption to make with a BD population.
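The distinction between the two missing data mechanisms can be sketched with a small simulation. This is not a reproduction of the cited simulation study, which examined odds ratios; it is a simplified illustration that uses the mean outcome among completers, with made-up dropout probabilities, a hypothetical function name, and a fixed random seed.

```python
import random

def completer_bias(dropout_prob, n=20000, seed=0):
    """Bias of the completers-only mean relative to the full-sample mean.

    dropout_prob maps an outcome value to that participant's probability
    of dropping out before the outcome is recorded.
    """
    rng = random.Random(seed)
    outcomes = [rng.gauss(0.0, 1.0) for _ in range(n)]
    kept = [y for y in outcomes if rng.random() >= dropout_prob(y)]
    full_mean = sum(outcomes) / n
    return sum(kept) / len(kept) - full_mean

# Dropout unrelated to the outcome value, at a high 60 percent rate.
bias_unrelated = completer_bias(lambda y: 0.6)

# Dropout related to the outcome value (MNAR): participants with worse
# (higher) scores drop out more often; overall attrition is about 25 percent.
bias_related = completer_bias(lambda y: 0.4 if y > 0 else 0.1)

print(round(bias_unrelated, 3), round(bias_related, 3))
```

Under these assumptions the completers-only mean stays close to the full-sample mean despite 60 percent attrition when dropout is unrelated to the outcome, but shifts noticeably at roughly 25 percent attrition when sicker participants are more likely to leave.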
On the other hand, it has been argued that, taken in isolation, the overall amount of attrition in a trial is a poor measure of the level of threat missing data poses to the validity of a trial’s conclusions.277 This is because the risk of bias also depends upon the size of the observed treatment effect, the reasons for attrition, the degree that attrition rates and reasons vary across arms, and many other factors that might be specific to a trial and intervention under study. Ideally trial reports would include a discussion of the results of sensitivity analyses performed to assess how, under a range of reasonable assumptions, observed levels of missing data might have influenced the primary results. However, such robustness-analyses were almost universally not performed. We were thus presented with the difficult task of trying to interpret the results of trials with often large percentages of missing outcome data and little to no information on how much risk this level of missing-ness posed to the validity of the trial’s primary outcome estimates, statistical inference, and even qualitative conclusions.
We acknowledge the extreme difficulty inherent in studying and treating patients with BD (see below for future research suggestions). Still, while it is reasonable to question the wisdom of the decision made in many of the trials to discontinue patients from the trial once they stop treatment (due to lack of efficacy or adverse events), this problem is not limited to patients with BD.23, 265
As a form of compromise, we used what we considered to be an extremely lenient set of criteria for evaluating risk of bias from attrition. First, we excluded any outcomes for which over 50 percent of the data was missing. In the context of pharmaceutical treatment of acute mania, if a trial had less than 50 percent attrition at 3 weeks but greater than 50 percent attrition after this, the former outcomes were included and the latter outcomes were excluded. Any trial with over 50 percent attrition by the first outcome was excluded entirely, but we present the attrition rates in the appendix. For studies with attrition rates between 40 and 50 percent, we considered the withdrawal rates to be a valid, poolable outcome but treated other outcomes and harms as suffering from a high risk of bias. We note that this criterion did not apply to time-to-event outcomes in trials where patients were discontinued after the event was observed, e.g., patients discontinued from follow-up after suffering a mood episode in a maintenance trial studying time to mood relapse.
The other major challenge of the evidence base was the variability and uncertain accuracy of the diagnostic assessment methods during recruitment processes. Most studies used the DSM criteria current for the study period, but the methods and likely reliability of the patient ascertainment varied. Often, detailed information on diagnostic assessment and statistics reporting interrater reliability were lacking. Given the debate whether the underlying mechanisms support the bipolar types as qualitatively and categorically different or lie on a continuum of the same psychopathological dimensions, it would be important to include more standard information about lifetime history of bipolar episodes assessment. There is also great difficulty in accurately diagnosing comorbid mental health conditions that were commonly treated as exclusion criteria, which also speaks to the need for standardized diagnostic assessments and reporting of interrater reliability statistics. Additional information and rigor in diagnostic assessment would generate a greater sense of confidence about who the study participants represent.
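The attrition criteria in the preceding paragraph can be summarized as a small decision function. This is a sketch of our reading of the text, not the reviewers' actual procedure: the function name and return labels are hypothetical, the handling of exactly 40 and exactly 50 percent is an assumption, and the time-to-event exemption is simplified to a single flag.

```python
def attrition_handling(attrition_pct, time_to_event=False):
    """Classify one outcome by the attrition rate at its measurement time.

    time_to_event should be True only for time-to-event outcomes in
    trials that discontinued patients after the event was observed.
    """
    if time_to_event:
        # The attrition criterion did not apply to these outcomes.
        return "include"
    if attrition_pct > 50:
        return "exclude"
    if attrition_pct >= 40:
        # Withdrawal rates stay poolable; other outcomes and harms are
        # treated as high risk of bias.
        return "include: withdrawal poolable, other outcomes high risk of bias"
    return "include"

# Hypothetical acute mania trial: 35 percent attrition at 3 weeks,
# 55 percent at a later followup.
print(attrition_handling(35))  # include
print(attrition_handling(55))  # exclude
```

Applying the function per outcome rather than per trial mirrors the rule that earlier outcomes could be included while later outcomes from the same trial were excluded.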
Other common limitations of health and medical research were also present. Industry funding for drug treatments was the most common source of funding. Publication bias for antipsychotics, antidepressants, and psychosocial interventions for depressive disorders has been documented.278-281 Harms, particularly for drug trials, were variably and inconsistently reported in formats difficult to aggregate. Usual care was not well-described. Publications often incompletely reported study design and conduct.
Future Research
Since evidence-based medicine relies on three realms (evidence, clinical experience, and patient experience), insufficient evidence means decisions must be informed by the latter two realms. This is an unsatisfying position for both clinicians and patients. Additional research for pharmacological, psychosocial, and somatic treatment of various phases of bipolar disorders, especially maintenance and depression, is needed to provide stronger scientific evidence for clinical decisions in these instances. Since only low-strength evidence was reached for benefit or no difference between groups for any treatment, drug or psychosocial, essentially all Key Questions would benefit from further research.
Acknowledging the difficulty and unavoidable issue of withdrawal in BD treatment research, there are a few possible actions to take: (1) Examine clinical and demographic characteristics that may differentiate participants who withdraw from participants who complete, and incorporate these findings in caveats about potential conclusions of treatment effects. Increased awareness of the clinical and demographic predictors of withdrawal is likely to lead to new studies that can attempt to better address treatment for these specific subsets of the population. (2) In combination with examining predictors of withdrawal, it is imperative to better assess reasons for withdrawal of consent and more systematically report reasons outside of side effects and lack of efficacy. Currently, the reasons for withdrawal of consent are often not provided, or are unsatisfyingly vague. (3) If high attrition rates exist in a study, performing sensitivity analyses to determine how different assumptions about missing data would affect the effect size and corresponding confidence intervals would be important prior to drawing conclusions based on the existing data. For example, if minor adjustments in the assumptions about the missing data (e.g., adjustments in symptom severity of potentially missing data) would eliminate the treatment effects in a particular study, this should lead researchers to be highly skeptical of such findings. (4) Assuming some indications that attrition was random, certain statistical techniques are more adept at modeling missing data and not unduly influencing the results, such as average score/observation method or use of multilevel linear mixed modeling.
Future studies of BD treatments will require innovative ways to increase study completion rates (e.g., use of technology for followup assessments and study reminders; “smart” bottles, mobile apps, and pills for assessing study drug adherence; multiple secondary contacts for participants and all-inclusive contact information, from cell phones and email to social media; flexible scheduling outside of business hours and availability on last-minute notice). For example, more longitudinal data analysis techniques for intermittent follow-up would help, but that in turn generates the need for more effort to create data repositories that allow individual patient-level data pooling of these longitudinal studies. This also requires greater funding for research with longer study followup duration.
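The sensitivity analysis suggested in item (3) can be sketched as a simple delta adjustment, in which the imputed scores of treatment-arm dropouts are made progressively worse until the apparent benefit disappears. The function names and symptom scores below are hypothetical, and a real analysis would also recompute confidence intervals rather than only the point estimate.

```python
def mean(values):
    return sum(values) / len(values)

def effect_under_delta(treat_observed, treat_imputed, control_all, delta):
    """Treatment-minus-control mean difference (negative favors treatment)
    after adding delta to every imputed treatment-arm score."""
    shifted = [score + delta for score in treat_imputed]
    return mean(treat_observed + shifted) - mean(control_all)

def tipping_point(treat_observed, treat_imputed, control_all, deltas):
    """Smallest delta at which treatment no longer looks better, or None."""
    for delta in sorted(deltas):
        diff = effect_under_delta(treat_observed, treat_imputed, control_all, delta)
        if diff >= 0:
            return delta
    return None

# Hypothetical endpoint symptom scores (lower is better).
treat_observed = [12, 14, 16]        # treatment-arm completers
treat_imputed = [18, 20]             # values imputed for treatment-arm dropouts
control_all = [18, 20, 22, 19, 21]   # control arm

baseline_effect = effect_under_delta(treat_observed, treat_imputed, control_all, 0)
tip = tipping_point(treat_observed, treat_imputed, control_all, range(0, 21, 2))
print(baseline_effect, tip)  # -4.0 10
```

In this made-up example the dropouts would need to have been 10 points worse than imputed before the benefit vanished. If the tipping point were instead only a point or two, that would be the kind of fragile finding the text says should be viewed skeptically.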
Future research also needs to attend to subpopulation analyses. It is clinically useful to know what treatments are more effective for patients with early (prior to age 18) versus later age of BD illness onset, older adult patients versus younger adult patients, patients with BD-I versus BD-II or bipolar disorder not otherwise specified (BD NOS), patients with comorbid substance use disorders versus without this comorbidity, or patients with specific demographic characteristics. The lack of evidence for specific subpopulations of patients with BD is a direct result of prevailing inclusion and exclusion criteria. For example, the majority of BD treatment studies have focused on individuals with BD-I diagnoses. While this practice is understandable for studies focusing on the mania treatment effect, it is less clear in cases of maintenance or depression treatment.
Future research should also endeavor to enroll people with different initial episodes and maintenance stages to fully understand the spectrum of responses. Attention should be given to addressing all states of the illness throughout the treatment stream. Different clinical states could use more attention. For example, is maintenance after acute mania versus a depression episode the same? Does maintenance after an acute episode differ from patients “off the street”? We need to understand the nature of the interventions within the context of clinical practice (co-treatments).
Psychosocial therapies also need to address whether people with BD can benefit from generalized groups without manualized treatment, or if the treatments need to be specifically designed for BD. For certain psychosocial therapeutic treatments, particular bipolar states may not be as relevant. But where targets are based on theorized mechanisms that are likely to affect manic or depressive symptoms, the populations should match the mechanisms so the study can directly address the question.
Psychosocial studies were more likely than drug studies to have inclusive inclusion criteria, but still outcome results were often not reported separately by BD subtype. Failure to assess subjects based on the current clinical state (i.e., including individuals who are currently euthymic, in acute mania/hypomania, in a mixed state, or in acute depression) may have washed out any effects that interventions may have had for a subset of the sample (e.g., any improvements in depression symptoms for individuals in acute depression at the baseline).
With the possible exception of treatment of acute depressive episodes, most psychosocial interventions for people with BD are designed to be used in concert with other (generally pharmacologic) treatments, and not to stand on their own as complete treatments of the syndrome. So perhaps it is unrealistic to look too closely at “effects on symptoms” of psychosocial and behavioral interventions in isolation. Beyond simply augmenting medication effects, behavioral interventions can enhance adherence to treatment, reduce family friction, and promote hopefulness in patients and their families and friends. Consistent collection and reporting of other relevant outcomes, such as adherence to drug treatment, which can be improved through educational efforts that help patients accept their diagnoses and improve their coping skills,258 would be beneficial.
Consistent minimum outcome datasets for BD research would help, including reporting of inter-rater reliability for the measures used in the study and of harms or unintended consequences for psychosocial interventions. Consistent minimum methodological rigor is also required at the journal level.
Conclusions
No high or moderate-strength evidence was found for any intervention to effectively treat any phase of any type of BD compared to placebo or an active comparator. Low-strength evidence showed improved mania symptoms for all FDA-approved antipsychotics, except aripiprazole, when compared to placebo for adults with BD-I. Participants using antipsychotics, except quetiapine, reported experiencing more extrapyramidal symptoms compared to placebo, and those using olanzapine reported experiencing more clinically significant weight gain. Low-strength evidence also showed benefit from lithium in the short-term for acute mania and longer time to relapse in the long-term versus placebo in adults with BD-I. Evidence was insufficient for most nondrug interventions. Low-strength evidence showed no effect for CBT on bipolar symptoms compared with active comparators, or systematic/collaborative care on relapse compared with inactive comparators. We were unable to address questions on subpopulations or treatments to reduce the metabolic change side effects of first line drug treatments. Future studies of BD treatments will require innovative ways to increase study completion rates.