Studies identified
A flow chart of the studies is shown in Figure 2. In total, 11,830 papers were identified from the combination of standard electronic databases (n = 11,659), specialist Chinese databases (n = 102) and various sources of grey literature (n = 69). Of these, 5152 duplicate papers were identified and deleted (5150 from the standard electronic databases, and two from the grey literature).
The deletion of duplicate papers left 6678 individual papers for assessment. After screening titles and abstracts, 322 papers were identified as of potential relevance and full-text copies of 309 papers were obtained (with the remainder unobtainable). Of these, 96 were judged ineligible for the effectiveness review and immediately excluded (narrative overviews, systematic literature reviews or economic evaluations). After the second exclusion process, comprising more detailed reading of each full-text paper, a further 138 papers were judged not to meet the inclusion criteria of the review and were also excluded.
Key reasons for exclusion were that the paper duplicated one already included; that the participant inclusion criteria for the identified study were judged not relevant to our review; that the study did not include any of the pre-specified outcomes; or that the study design was ineligible (no comparator group).
As a result, 75 papers were identified for data extraction, from a total of 73 separate studies. A full list of included studies is provided as Appendix 5. A table of excluded studies detailing reasons for exclusion is provided as Appendix 6.
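The study flow described above can be checked with simple arithmetic. The following is a minimal sketch in Python that uses only the counts stated in the text; the variable names are illustrative and are not taken from the review.

```python
# Counts are taken from the text above; variable names are illustrative only.
identified = {
    "standard_databases": 11659,
    "chinese_databases": 102,
    "grey_literature": 69,
}
duplicates = {
    "standard_databases": 5150,
    "grey_literature": 2,
}

total_identified = sum(identified.values())
total_duplicates = sum(duplicates.values())
screened = total_identified - total_duplicates

potentially_relevant = 322
full_text_obtained = 309
unobtainable = potentially_relevant - full_text_obtained

excluded_by_publication_type = 96   # overviews, systematic reviews, economic evaluations
excluded_on_full_reading = 138      # did not meet the inclusion criteria
included_papers = (
    full_text_obtained - excluded_by_publication_type - excluded_on_full_reading
)

print(f"Identified: {total_identified}")
print(f"Duplicates removed: {total_duplicates}")
print(f"Screened on title and abstract: {screened}")
print(f"Full texts not obtainable: {unobtainable}")
print(f"Papers included for data extraction: {included_papers}")
```

Each derived figure agrees with the totals reported in the text (11,830 identified, 6678 screened and 75 papers included).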
Quality of included studies
Randomised controlled trials
Overall risk of bias
The results of the quality assessment procedure for the 64 included RCTs (reported in 66 papers) are displayed in Figure 3 and Table 4. There was variation both in terms of the quality of the studies and the quality of the reporting. In a large number of papers, there was insufficient detail provided to permit clear judgement of risk of bias in a range of key areas. Overall, 33 RCTs were classed as having low within-study risk of bias, 11 RCTs were classed as having high within-study risk of bias, and the remainder (n = 20) were classed as unclear in this respect. The high proportion of studies at unclear risk of bias was due to poor reporting and a lack of detail, particularly in the methods section. There were also a number of publications in abstract form only. As an unclear judgement was often due to poor reporting rather than specific methodological concerns, it was not judged appropriate to categorise these studies with those deemed at high risk of bias as a result of more serious methodological flaws. Our robust approach to the assessment of the overall risk of bias within individual studies is described in more detail in Chapter 2, Risk of bias in included studies and quality assessment. More detail is provided below to illustrate the range in quality in terms of each individual component of the Cochrane risk of bias tool.46
Random sequence generation
The risk of bias arising from the method of generation of the allocation sequence was low in 39 of the included RCTs.13,41,57,60,62–64,66,67,69,70,76,80–83,85–89,91,93,94,98–104,106–108,110,112–115 Methods employed included random number tables, computer-generated sequences61,63,64,81–83,88,93,94,98,99,106,108,112,114 and randomised block design.62,67,80,87,89,91,101,102,107,113,115,117 One trial was classed as high risk because women were asked to draw an envelope from a box of envelopes with the same appearance but different contents.111 Risk of bias was categorised as unclear in the remaining 24 RCTs due to insufficient information provided by the authors to permit judgement either way.42,43,58,59,65,68,71–75,77–79,84,90,92,95–97,104,105,109,116
Allocation concealment
Thirty studies employed allocation concealment methods judged to carry low risk of bias, such as the use of sequentially numbered sealed opaque envelopes containing allocation assignment.57,60,61,63,64,66,67,73,76,78,80–85,88,89,93,98,99,102,103,105–108,112,118 Thirty studies did not provide sufficient information to allow a judgement of low or high risk and were therefore classed as unclear.13,41–43,58,59,62,65,68–71,74,75,77,79,90,91,94–97,100,104,109,111,113,114,116,117 The remaining four RCTs were judged as having high risk of allocation concealment bias.72,87,92,115 For example, one study stated that patients were randomly divided into two groups by those involved in the study,72 while in the others the nature of the intervention being tested meant it was not possible to conceal allocation.87,92,115
Blinding of participants, personnel and outcome assessors
Of the included RCT studies, 32 were judged to have low risk of bias in relation to the blinding of participants and other personnel involved in the trial,41,57,59–64,67,71,72,74,79,81–85,90,93,99,100,102,103,105,106,108,110,112,113,116,117 generally through the provision of medication in identical formats for both active treatment and placebo. Sixteen studies were judged to have high risk of bias in this respect, for example due to clear differences in either the appearance, dosage rates or mode of delivery between intervention and placebo comparator, or as a result of evidence that the research staff involved were aware of allocation status.13,58,66,68,75,77,78,80,87–89,91,92,96,101,115 In some instances, however, despite lack of blinding, the nature of the intervention meant that this was not relevant; for example, in the study by McParlin and colleagues,88 blinding of participants and staff was not possible as the packages of care delivered to the intervention and control groups varied in content. However, it is important to highlight that although it might not have been possible to blind patients or clinicians, outcome assessors and analysts handling the resultant data may nevertheless have been blinded. The remaining 16 studies did not provide sufficient information to permit a judgement of low or high risk of bias, often due to imprecise, poor reporting, and were thus classed as unclear.42,43,65,69,70,73,76,94,95,97,98,104,107,109,111,114
Incomplete outcome data
Most studies (n = 50) were judged as carrying low risk of bias in relation to this component.13,41,42,57,59–61,63–65,67,70–72,74–76,78,80–85,87–89,91–94,96–99,102–108,110,112–118 Although published protocols were rarely available, all data for the primary outcomes pre-specified in the paper were reported for all randomised participants, or rates of drop-out were either sufficiently low (< 20%), or proportionately comparable between groups, so that it was not considered likely to result in a clinically relevant bias. Three studies displayed a high risk of bias in this regard, all as a result of high numbers of participant drop-outs.62,68,111 The remainder (11 studies in total) were judged as unclear due to lack of sufficient information.43,58,66,69,73,77,79,90,95,100,109
Selective outcome reporting
Six studies were judged as having high risk of bias in terms of selective outcome reporting, due to either not reporting data for pre-specified outcomes, or for reporting data in the results that were not pre-specified in either the original study protocol or methods section.87,90,94,113–115 Forty-five studies were classed as having low risk of bias, with all outcomes specified and subsequently reported.13,41,43,57,59–67,70–72,74–76,78,80,81,83–85,88,89,91–93,96–99,101–104,106–108,110,112,116,117 Risk of bias was judged as unclear for the final 13 studies.42,58,68,69,73,77,79,82,95,100,105,109,111
Other sources of bias
Twenty of the included RCT studies were judged as having low risk of bias in this area.13,41,42,61,62,64,67,70,71,74,81,83,88,96,101–103,106,110,112 However, a substantial number (n = 44) were classed as unclear, due to lack of sufficient information in the paper to permit detailed assessment of whether or not an important risk of bias existed, or due to insufficient rationale or evidence that an identified problem had introduced serious levels of bias to the study.43,57–60,65,66,68,69,72,73,75–80,82,84,85,87,89–95,97–100,104,105,107–109,111,113–117,119 For example, in one paper,76 lack of reporting of full results for the control group resulted in an unclear judgement in this area.
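The domain-level judgements reported above can be tabulated and checked for internal consistency. The sketch below is illustrative only: the counts are those stated in the text, the dictionary layout and names are assumptions, and the zero entered for high risk under other sources of bias is inferred from the fact that the low and unclear counts already account for all 64 RCTs.

```python
# Number of RCTs assessed, as stated in the text.
N_RCTS = 64

# Counts of low / high / unclear judgements per domain, as reported in the text.
# The "high" count for "Other sources of bias" is inferred (20 + 44 = 64),
# not stated explicitly.
judgements = {
    "Overall": {"low": 33, "high": 11, "unclear": 20},
    "Random sequence generation": {"low": 39, "high": 1, "unclear": 24},
    "Allocation concealment": {"low": 30, "high": 4, "unclear": 30},
    "Blinding": {"low": 32, "high": 16, "unclear": 16},
    "Incomplete outcome data": {"low": 50, "high": 3, "unclear": 11},
    "Selective outcome reporting": {"low": 45, "high": 6, "unclear": 13},
    "Other sources of bias": {"low": 20, "high": 0, "unclear": 44},
}

def domain_totals(table):
    """Return the total number of judgements recorded for each domain."""
    return {domain: sum(counts.values()) for domain, counts in table.items()}

totals = domain_totals(judgements)

for domain, counts in judgements.items():
    # Express each judgement as a percentage of the 64 RCTs.
    shares = ", ".join(
        f"{level} {100 * n / N_RCTS:.0f}%" for level, n in counts.items()
    )
    print(f"{domain} (n = {totals[domain]}): {shares}")
```

Every domain sums to 64, which matches the number of included RCTs.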
Case series studies
The nine case series or non-randomised studies were quality assessed using the component-based EPHPP tool,47 which appraises studies on the basis of six core components, rated 1–4 (where 1 is deemed to be the highest quality of study). These components are selection bias; strength of overall study design; extent to which confounders were identified and controlled for in the study; blinding of participants and/or research personnel; approach to data collection; and rate of withdrawals/drop-outs from the study. As shown in Table 5, all studies were judged as weak in terms of quality (which corresponds to a high risk of bias judgement using the standard Cochrane approach for RCTs).
Interventions and comparators
The included studies were grouped into the three broad groups of interventions outlined in Chapter 1: patient-initiated first-line interventions; clinician-prescribed second-line interventions; and clinician-prescribed third-line interventions. It should be noted that, for patient-initiated first-line interventions, the only studies identified that could be classified as lifestyle interventions were those which trialled ginger preparations and/or vitamin B6. No studies of dietary- or hypnotherapy-based interventions were identified. However, studies of a number of novel therapies not covered by our original review protocol were identified, namely the use of aromatherapy, transdermal clonidine and gabapentin. The studies comprising the evidence base for each group of interventions are detailed in Table 6. Note that all studies are two-arm RCTs unless otherwise stated.
In addition, the network plot (Figure 4) shows the range of interventions from all comparative studies included in the review. Individual interventions have been grouped where appropriate.
The size of the nodes in the network plot is proportional to the frequency of the intervention in the review, and the width of the lines indicates the frequency of the comparisons made between two interventions. These nodes and lines, however, do not represent the weight of evidence in the review as this would also be influenced by sample size and the precision of estimates, as well as other factors. The plot did not include a trial on pre-emptive treatment with doxylamine/pyridoxine combination, outpatient versus inpatient care117,127 or two four-arm trials,68,101 which would have over-reported the number of comparisons in the network plot. The arms of these four-arm trials comprised dietary instructions only, or together with either placebo, antihistamines or antihistamine/vitamin B6 combination in one trial68 and traditional acupuncture, P6 acupuncture, placebo or no acupuncture in the other.101 Ginger, vitamin B6, antihistamines, acupressure, metoclopramide, corticosteroids, doxylamine/pyridoxine combination and the serotonin antagonist ondansetron are more widely reported than other interventions, but there is also information on interventions such as acupuncture, nerve stimulation therapy and aromatherapy oils which have been considered as treatments for NVP/HG. Evidence on the effects of interventions such as Chinese herbal medicine, dextrose saline, transdermal clonidine and diazepam is very limited and in most cases is reported in single trials. As expected, placebo interventions are most widely reported as comparators, and so placebo has the largest node on the network plot (emphasised by the square node).
The most commonly reported treatment comparisons are ginger capsules versus placebo; acupressure versus placebo; ginger capsules versus vitamin B6 capsules; corticosteroids versus ‘treatment as usual’; metoclopramide versus ondansetron; and acupuncture versus nocebo (nocebo is an inert intervention that creates comparable side effects/harmful effects in a patient, as opposed to a placebo, which is an inert substance that creates either a beneficial response or no response in a patient).
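The way in which node size and line width are derived for a network plot of this kind can be sketched as follows. This is a minimal illustration, assuming each two-arm trial is recorded as a pair of intervention labels; the list of trials is a hypothetical toy example and does not reproduce the data behind Figure 4.

```python
from collections import Counter

# Hypothetical toy data: one tuple per two-arm trial. These are NOT the
# review's data; they only illustrate how frequencies are counted.
toy_trials = [
    ("ginger", "placebo"),
    ("ginger", "placebo"),
    ("ginger", "vitamin B6"),
    ("acupressure", "placebo"),
    ("metoclopramide", "ondansetron"),
]

def network_summary(trials):
    """Count how often each intervention and each pairwise comparison occurs.

    Node frequency would drive node size; edge frequency would drive line width.
    """
    node_freq = Counter()
    edge_freq = Counter()
    for arm_a, arm_b in trials:
        node_freq.update([arm_a, arm_b])
        # A frozenset makes the comparison order-independent (A vs B == B vs A).
        edge_freq[frozenset((arm_a, arm_b))] += 1
    return node_freq, edge_freq

node_freq, edge_freq = network_summary(toy_trials)

for intervention, n in node_freq.most_common():
    print(f"node {intervention}: {n} trial arm(s)")
for pair, n in edge_freq.most_common():
    print(f"edge {' vs '.join(sorted(pair))}: {n} comparison(s)")
```

As noted in the text, such frequencies describe how often interventions and comparisons appear, not the weight of evidence, because sample size and precision are not taken into account.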
Participants and symptom severity
In addition to substantial variation in terms of the range of interventions and comparators evident within the literature, it is also important to highlight the heterogeneity of symptom severity found among patient populations.
It was initially intended that as part of this review only studies that recruited women with severe NVP or HG would be included. However, assessment of symptom severity varied within and across studies, and it was not possible to easily place every participant population into categories. We therefore attempted to categorise the symptom severity of participants for each study, using the description of severity in the inclusion criteria and, if available, any severity score given at baseline. These two items of information were assessed by two independent assessors (CMP and SCR) to assign severity as mild, moderate, severe or unclear. Agreement was reached for all but one study, which was classified as unclear.
This classification was then used in each results chapter to describe symptoms and outcomes in terms of severity.
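The dual-assessor classification described above can be illustrated with a short sketch. It assumes a simplified rule in which a study on which the two assessors' ratings still differ is recorded as unclear; the review does not describe the reconciliation steps in detail, and the function name, study labels and ratings below are hypothetical.

```python
# Severity categories named in the text.
SEVERITY_LEVELS = {"mild", "moderate", "severe", "unclear"}

def reconcile(rating_a, rating_b):
    """Return the agreed severity, or 'unclear' if the two ratings differ.

    Simplifying assumption: any remaining disagreement is recorded as
    'unclear'. In practice assessors would normally discuss differences first.
    """
    for rating in (rating_a, rating_b):
        if rating not in SEVERITY_LEVELS:
            raise ValueError(f"Unknown severity rating: {rating!r}")
    return rating_a if rating_a == rating_b else "unclear"

# Hypothetical ratings from two assessors; not data from the review.
toy_ratings = {
    "study_A": ("severe", "severe"),
    "study_B": ("mild", "moderate"),
    "study_C": ("moderate", "moderate"),
}

final_severity = {
    study: reconcile(first, second)
    for study, (first, second) in toy_ratings.items()
}
print(final_severity)
```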
Outcome measures
Finally, and linked to the issues discussed above, the identified literature in this field was also characterised by the range of symptom severity scales employed from study to study to assess intervention outcomes. Out of the 73 included studies (reported in 75 papers), only 23 used validated NVP/HG assessment scales such as PUQE (10 studies), RINVR (11 studies) or the McGill Nausea Questionnaire (one study). Thirty-one studies assessed nausea and/or vomiting severity using a 10-point VAS. Twenty-one studies employed either a study-specific, non-validated author-defined assessment scale (including, for example, numbers of episodes of vomiting combined with the use of a Likert scale to assess subjective feelings of symptom severity among participants), or used the various proxy measures of symptom severity outlined in our protocol [e.g. percentage weight loss, length of hospital stay, or hospital (re-)admission episodes]. Table 7 illustrates the primary symptom severity outcome measures employed by each included study.
Additional sources of outcome data on medications
The UKTIS is currently commissioned by Public Health England to provide advice to UK health professionals on the fetal effects of therapeutic, poisoning and chemical exposures in pregnancy, and to conduct surveillance of known and emerging teratogens. The UKTIS database currently contains a record of just under 60,000 enquiries dating back to 1978, of which 320 relate to use of specific drugs in the treatment of HG (period of enquiry 18 June 1978 to 18 March 2014). Surveillance data collected by the UKTIS are reviewed periodically and published in UKTIS monographs through the National Poisons Information Service database (www.TOXBASE.org). Data collected by the UKTIS in relation to medications for NVP/HG, including specific monograph data on ginger, vitamin B6, vitamin B12, promethazine and olanzapine, are provided in Table 43, Appendix 7 for information.
Meta-analysis of included randomised controlled trials
As highlighted in the previous sections, there was wide variation across studies. Specifically, there was considerable heterogeneity between interventions within each of the categories of comparisons, and in terms of how interventions were administered/delivered. The measurement of outcomes also differed substantially between trials reporting the same comparisons, so in most cases the trials were not directly comparable. In a meta-analysis it is important not to combine outcomes that are too diverse; even if it had been possible to extract data for a meta-analysis, such an analysis would have been likely to produce misleading results due to the considerable heterogeneity between studies.46 Furthermore, many of these trials were extremely poorly reported and their conduct was often uncertain. In summary, clinical and methodological variations between studies were considerable, and the intervention effect was likely to be affected by the factors that varied across studies. Consequently, we have not conducted a meta-analysis of findings from the RCTs.
Structure of individual results chapters
The following chapters present more detailed findings from the evidence review for each individual intervention. As already indicated, given that it was not possible to meta-analyse the data from individual studies for any group of interventions and comparators, the results are summarised in narrative form. The narrative content of each chapter focuses on the findings from the included studies in terms of their reported effectiveness for addressing our primary outcomes of interest, that is, the key symptoms associated with HG/NVP. Thus, where available, effectiveness is reported in terms of the validated overall HG/NVP assessment scales (PUQE, RINVR or McGill Nausea Questionnaire). Otherwise, the effectiveness of interventions is reported in relation to their impact on the three key symptoms: nausea, vomiting and retching. Data illustrating significant results in relation to these key symptoms are detailed in the narrative text; otherwise, results are described as not significant or not clear. Data for case series studies are not included in the narrative but are available in the accompanying results tables for information. Additional secondary outcome data reported by included studies (see Table 2 for a full list) are presented in Appendix 8.
Publication Details
Copyright
Included under terms of UK Non-commercial Government License.
Publisher
NIHR Journals Library, Southampton (UK)
NLM Citation
O’Donnell A, McParlin C, Robson SC, et al. Treatments for hyperemesis gravidarum and nausea and vomiting in pregnancy: a systematic review and economic assessment. Southampton (UK): NIHR Journals Library; 2016 Oct. (Health Technology Assessment, No. 20.74.) Chapter 3, Clinical effectiveness: overview of included studies.