
National Guideline Alliance (UK). Mental Health Problems in People with Learning Disabilities: Prevention, Assessment and Management. London: National Institute for Health and Care Excellence (NICE); 2016 Sep. (NICE Guideline, No. 54.)


3 Methods used to develop this guideline

3.1. Overview

Development of most of this guideline followed The Guidelines Manual (NICE, 2012a), but some sections used the 2014 version of the manual (where this was done, it is explained in the relevant chapter below). A team of healthcare professionals, social care professionals, education professionals, lay representatives and technical experts known as the Guideline Committee (GC), with support from the NCCMH staff, undertook the development of a person-centred, evidence-based guideline. There are 7 basic steps in the process of developing a guideline:

  1. Define the scope, which lays out exactly what will be included (and excluded) in the guidance.
  2. Define review questions that cover all areas specified in the scope.
  3. Develop a review protocol for each systematic review, specifying the search strategy and method of evidence synthesis for each review question.
  4. Synthesise data retrieved, guided by the review protocols.
  5. Produce evidence profiles and summaries using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system.
  6. Consider the implications of the research findings for clinical practice and reach consensus decisions on areas where evidence is not found.
  7. Answer review questions with evidence-based recommendations for clinical practice.

The clinical practice recommendations made by the GC are therefore derived from the most up-to-date and robust evidence for the clinical and cost effectiveness of the interventions and services covered in the scope. Where evidence was not found or was inconclusive, the GC adopted both formal and informal methods to reach consensus on what should be recommended, factoring in any relevant issues. In addition, to ensure a service user and carer focus, focus groups were conducted, and the concerns of service users and carers regarding health and social care were highlighted and addressed by recommendations agreed by the whole GC.

3.2. The scope

Topics are referred by NHS England, and the letter of referral defines the remit, which sets out the main areas to be covered. The NCCMH developed a scope for the guideline based on the remit (see Appendix A). The purpose of the scope is to:

  • provide an overview of what the guideline will include and exclude
  • identify the key aspects of care that must be included
  • set the boundaries of the development work and provide a clear framework to enable work to stay within the priorities agreed by NICE and the NCCMH, and the remit from the Department of Health
  • inform the development of the review questions and search strategy
  • inform professionals and the public about expected content of the guideline
  • keep the guideline to a reasonable size to ensure that its development can be carried out within the allocated period.

An initial draft of the scope was sent to registered stakeholders who had agreed to attend a scoping workshop. The workshop was used to:

  • obtain feedback on the selected key clinical issues
  • identify which population subgroups should be specified (if any)
  • seek views on the composition of the GC
  • encourage applications for GC membership.

The draft scope was subject to consultation with registered stakeholders over a 4-week period. During the consultation period, the scope was posted on the NICE website and comments were invited from stakeholder organisations. The NCCMH and NICE reviewed the scope in light of comments received, and the revised scope was signed off by NICE.

3.3. The Guideline Committee

During the consultation phase, members of the GC were appointed by an open recruitment process. GC membership consisted of: professionals in psychiatry, clinical psychology, speech and language therapy, physiotherapy, paediatrics and general practice; academic experts in education, psychiatry and psychology; commissioning managers; and carers and representatives from service user and carer organisations. The guideline development process was supported by staff from the NCCMH, who undertook the clinical and health economic literature searches, reviewed and presented the evidence to the GC, managed the process, and contributed to drafting the guideline.

3.3.1. Guideline Committee meetings

There were 12 GC meetings, held between October 2014 and January 2016. During each day-long GC meeting, in a plenary session, review questions and clinical and economic evidence were reviewed and assessed, and recommendations formulated. At each meeting, all GC members declared any potential conflicts of interest (see Appendix B), and service user and carer concerns were routinely discussed as a standing agenda item.

3.3.2. Service users and carers

The GC included 3 carer members who contributed as full GC members to writing the review questions, providing advice on outcomes most relevant to service users and carers, helping to ensure that the evidence addressed their views and preferences, highlighting sensitive issues and terminology relevant to the guideline, and bringing service user research to the attention of the GC. Service user involvement was secured through a series of focus groups run in collaboration with the British Institute for Learning Disabilities. Input from both service users and carers was central to the development of the guideline, and they contributed to writing the guideline’s introduction and the recommendations from the service user and carer perspective.

3.3.3. Expert advisers

Expert advisers, who had specific expertise in 1 or more aspects of treatment and management relevant to the guideline, assisted the GC, commenting on specific aspects of the developing guideline and making presentations to the GC. Appendix C lists those who agreed to act as expert advisers.

3.3.4. National and international experts

National and international experts in the area under review were identified through the literature search and through the experience of the GC members. These experts were contacted to identify unpublished or soon-to-be published studies, to ensure that up-to-date evidence was included in the development of the guideline. They informed the GC about completed trials at the pre-publication stage, systematic reviews in the process of being published, studies relating to the cost effectiveness of treatment and trial data if the GC could be provided with full access to the complete trial report. Appendix E lists researchers who were contacted.

3.4. Review protocols

Review questions drafted during the scoping phase were discussed by the GC at the first few meetings and amended as necessary. The review questions were used as the starting point for developing review protocols for each systematic review (described in more detail below). Where appropriate, the review questions were refined once the evidence had been searched and, where necessary, sub-questions were generated. The final list of review questions can be found in Appendix F.

For questions about interventions, the PICO (Population, Intervention, Comparison and Outcome) framework was used to structure each question (see Table 2).

Table 2. Features of a well-formulated question on the effectiveness of an intervention – PICO.

Questions relating to case identification and assessment tools and methods do not involve an intervention designed to treat a particular condition, and therefore the PICO framework was not used. Rather, the questions were designed to pick up key issues specifically relevant to clinical utility, for example their accuracy, reliability, safety and acceptability to the service user.

Review questions related to issues of service delivery are occasionally specified in the remit from the Department of Health/Welsh Assembly Government. In these cases, appropriate review questions were developed to be clear and concise.

For each topic, addressed by 1 or more review questions, a review protocol was drafted by the technical team using a standardised template (based on the PROSPERO database of systematic reviews in health), then reviewed and agreed by the GC (all protocols are included in Appendix F).

To help facilitate the literature review, a note was made of the best study design type to answer each question. There are 4 main types of review question of relevance to NICE guidelines (though only 3 were used in this guideline); these are listed in Table 3. For each type of question, the best primary study design varies, where ‘best’ is interpreted as ‘least likely to give misleading answers to the question’. For questions about the effectiveness of interventions, where randomised controlled trials (RCTs) were not available, the review of other types of evidence was pursued only if there was reason to believe that it would help the GC to formulate a recommendation.

Table 3. Best study design to answer each type of question.

However, in all cases, a well-conducted systematic review (of the appropriate type of study) is likely to yield a better answer than a single study.

3.5. Clinical review methods

The aim of the clinical literature review was to systematically identify and synthesise relevant evidence from the literature in order to answer the specific review questions developed by the GC. Thus, clinical practice recommendations are evidence-based where possible; where evidence was not available, either formal or informal consensus methods were used to try to reach general agreement between GC members (see section 3.5.7) and the need for future research was specified.

3.5.1. The search process

3.5.1.1. Scoping searches

A broad preliminary search of the literature was undertaken in September 2014 to obtain an overview of the issues likely to be covered by the scope, and to help define key areas. The searches were restricted to clinical guidelines, Health Technology Assessment (HTA) reports, key systematic reviews and RCTs. A list of databases and websites searched can be found in Appendix H.

3.5.1.2. Systematic literature searches

After the scope was finalised, a systematic search strategy was developed to locate as much relevant evidence as possible. The balance between sensitivity (the power to identify all studies on a particular topic) and specificity (the ability to exclude irrelevant studies from the results) was carefully considered, and a decision made to utilise a broad approach to searching to maximise retrieval of evidence to all parts of the guideline. Searches were restricted to certain study designs if specified in the review protocol, and conducted in the following databases:

  • Cumulative Index to Nursing and Allied Health Literature
  • Cochrane Database of Abstracts of Reviews of Effects
  • Cochrane Database of Systematic Reviews
  • Cochrane Central Register of Controlled Trials
  • Excerpta Medica Database (Embase)
  • HTA database (technology assessments)
  • Medical Literature Analysis and Retrieval System Online (MEDLINE)/MEDLINE In-Process
  • Psychological Information Database (PsycINFO)

The search strategies were initially developed for MEDLINE before being translated for use in other databases/interfaces. Strategies were built up through a number of trial searches and discussions of the results of the searches with the review team and GC to ensure that all possible relevant search terms were covered. In order to assure comprehensive coverage, search terms for mental health and learning disabilities were kept purposely broad to help counter dissimilarities in database indexing practices and thesaurus terms, and imprecise reporting of study populations by authors in the titles and abstracts of records. The search terms for each search are set out in full in Appendix H.

3.5.1.3. Reference Management

Citations from each search were downloaded into reference management software and duplicates removed. Records were then screened against the eligibility criteria of the reviews before being appraised for methodological quality (see below). The unfiltered search results were saved and retained for future potential re-analysis to help keep the process both replicable and transparent.

3.5.1.4. Double-sifting

Titles and abstracts of identified studies were screened by 2 reviewers against the inclusion criteria specified in the protocols until good inter-rater reliability was observed (percentage agreement ≥90% or kappa statistic, K>0.60); initially, 10% of references were double-screened. Any disagreements between raters were resolved through discussion. If inter-rater agreement was good, the remaining references were screened by 1 reviewer.

Once full versions of the selected studies were acquired for assessment, full studies were usually checked independently by 2 reviewers, with any differences being resolved. For some review questions (review questions 1.1 and 1.3), a random sample of papers was checked for inclusion. Any studies that failed to meet the inclusion criteria at this stage were excluded.
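
The agreement thresholds above can be illustrated with a minimal, hypothetical sketch (this is not NCCMH code): it computes simple percentage agreement and Cohen's kappa for the include/exclude decisions of 2 reviewers on a set of double-screened records.

```python
# Illustrative sketch only: percentage agreement and Cohen's kappa for
# double-screening decisions. The decisions below are hypothetical.
from collections import Counter

def percentage_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    agree = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * agree / len(rater_a)

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement expected from each rater's marginal include/exclude rates
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical include/exclude decisions on 10 double-screened records
rater_1 = ["in", "out", "out", "in", "out", "out", "in", "out", "out", "out"]
rater_2 = ["in", "out", "out", "in", "out", "in", "in", "out", "out", "out"]

print(f"agreement = {percentage_agreement(rater_1, rater_2):.0f}%")  # 90%
print(f"kappa     = {cohens_kappa(rater_1, rater_2):.2f}")           # 0.78, above the 0.60 threshold
```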

3.5.1.5. Search filters

To aid retrieval of relevant and sound studies, filters were used to limit a number of searches to systematic reviews and RCTs. The search filters for systematic reviews and RCTs are adaptations of validated filters designed by the Health Information Research Unit (HIRU) at McMaster University.

3.5.1.6. Date and language restrictions

Systematic database searches were initially conducted in November 2014 up to the most recent searchable date. Search updates were generated on a 6-monthly basis, with the final re-runs carried out in December 2015 ahead of the guideline consultation. After this point, studies were only included if they were judged by the GC to be exceptional (for example, if the evidence was likely to change a recommendation).

Although no language restrictions were applied at the searching stage, foreign language papers were not requested or reviewed, unless they were of particular importance to a review question.

Date restrictions were not applied, except for searches for systematic reviews, which were limited to research published from 1999 onwards; this 15-year restriction was applied because older reviews were considered less likely to be useful.

3.5.1.7. Other search methods

Other search methods involved: (a) scanning the reference lists of all eligible publications (systematic reviews, stakeholder evidence and included studies) for more published reports and citations of unpublished research; (b) sending lists of studies meeting the inclusion criteria to subject experts (identified through searches and the GC) and asking them to check the lists for completeness, and to provide information of any published or unpublished research for consideration (see Appendix E); (c) checking the tables of contents of key journals for studies that might have been missed by the database and reference list searches; (d) tracking key papers in the Science Citation Index (prospectively) over time for further useful references; (e) conducting searches in ClinicalTrials.gov for unpublished trial reports; (f) contacting included study authors for unpublished or incomplete datasets. Searches conducted for existing NICE guidelines were updated where necessary. Other relevant guidelines were assessed for quality using the AGREE (Appraisal of Guidelines for Research and Evaluation Instrument) instrument (AGREE Collaboration, 2003). The evidence base underlying high-quality existing guidelines was utilised and updated as appropriate.

Full details of the search strategies and filters used for the systematic review of clinical evidence are provided in Appendix H.

3.5.1.8. Study selection and assessment of methodological quality

All primary-level studies included after the first scan of citations were acquired in full and re-evaluated for eligibility at the time they were being entered into the study information database. More specific eligibility criteria were developed for each review question and are described in the relevant clinical evidence chapters and the review protocols in Appendix F. Eligible systematic reviews and primary-level studies were critically appraised for methodological quality (risk of bias) using a checklist (see The Guidelines Manual [2012a] for templates). However, some checklists recommended in the 2014 manual update (NICE, 2014) were also used: for example, the checklist for qualitative studies, the Assessing the Methodological Quality of Systematic Reviews (AMSTAR) checklist for systematic reviews, and the Newcastle-Ottawa checklist for observational studies (Wells) for the cross-sectional and cohort studies in the epidemiological review of incidence and prevalence.

The Quality Assessment of Diagnostic Accuracy Studies – Revised (QUADAS-II) (Whiting, 2011) was used for diagnostic studies and was adapted for use with risk assessment studies as follows:

  • Index test signalling question: ‘If a threshold was used, was it pre-specified?’ This was amended to: ‘Is information available to facilitate clinical judgment?’ (that is, how scores should be translated to risk level)
  • Flow and timing signalling question: ‘Was there an appropriate interval between index test(s) and reference standard?’ This was interpreted as: ‘Was there sufficient time for events of interest to occur?’

The eligibility of studies was confirmed by the GC. A flow diagram of the search process for selection of studies for inclusion in the clinical literature review conducted for this guideline is provided in Appendix P.

For some review questions, it was necessary to prioritise the evidence with respect to the UK context (that is, external validity). To make this process explicit, the GC took into account the following factors when assessing the evidence:

  • participant factors (for example, gender, age and ethnicity)
  • provider factors (for example, model fidelity, the conditions under which the intervention was performed and the availability of experienced staff to undertake the procedure)
  • cultural factors (for example, differences in standard care and differences in the welfare system).

It was the responsibility of the GC to decide which prioritisation factors were relevant to each review question in light of the UK context.

3.5.1.9. Unpublished evidence

Stakeholders were invited to submit any relevant unpublished data using the call for evidence process set out in the 2012 edition of The Guidelines Manual. The GC used a number of criteria when deciding whether or not to accept unpublished data. First, the evidence must have been accompanied by a trial report containing sufficient detail to properly assess risk of bias. Second, the evidence must have been submitted with the understanding that data from the study and a summary of the study’s characteristics would be published in the full guideline. Therefore, in most circumstances the GC did not accept evidence submitted ‘in confidence’. However, the GC recognised that unpublished evidence submitted by investigators might later be retracted by those investigators if the inclusion of such data would jeopardise publication of their research.

3.5.2. Data extraction

3.5.2.1. Quantitative analysis

Study characteristics, aspects of methodological quality, and outcome data were extracted from all eligible studies, using Review Manager Version 5.3.5 (Cochrane Collaboration, 2014) and an Excel-based form (see Appendix J, Appendix K, Appendix L and Appendix M).

In most circumstances, for a given outcome (continuous or dichotomous), where more than 50% of the number randomised to any group were missing or incomplete, the study results were excluded from the analysis (except for the outcome ‘leaving the study early’, in which case the denominator was the number randomised). Where there were limited data for a particular review, the 50% rule was not applied; in these circumstances the evidence was downgraded (see section 3.5.4).

Where possible, outcome data from an intention-to-treat analysis (that is, a ‘once-randomised-always-analyse’ basis) were used. Where intention-to-treat analysis had not been used or there were missing data, the effect size for dichotomous outcomes was recalculated using worst-case scenarios (for example, for positive outcomes it was assumed that participants whose data were missing did not have the positive event). Where conclusions varied between scenarios (about the direction of effect, the confidence in the direction of effect or clinical importance), the evidence was downgraded (see section 3.5.4).
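
The worst-case recalculation can be shown with a minimal sketch using hypothetical arm-level numbers (the figures and helper function are illustrative only): dropouts are counted as non-responders for a positive outcome, so the number randomised becomes the denominator, and the resulting risk ratio can be compared with the available-case estimate.

```python
# Illustrative sketch only: worst-case recalculation of a dichotomous effect size.
def risk(events: int, denominator: int) -> float:
    return events / denominator

# Hypothetical trial: events among completers, completers, and number randomised
int_events, int_completers, int_randomised = 40, 60, 80
ctl_events, ctl_completers, ctl_randomised = 25, 55, 75

# Available-case analysis uses completers as the denominator
rr_available = risk(int_events, int_completers) / risk(ctl_events, ctl_completers)
# Worst-case analysis assumes missing participants did not have the positive event
rr_worst_case = risk(int_events, int_randomised) / risk(ctl_events, ctl_randomised)

print(f"available-case RR = {rr_available:.2f}")   # ~1.47
print(f"worst-case RR     = {rr_worst_case:.2f}")  # ~1.50
```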

Where some of the studies failed to report standard deviations (for a continuous outcome), and where an estimate of the variance could not be computed from other reported data or obtained from the study author, the following approach was taken.1 When the number of studies with missing standard deviations was less than one-third and the total number of studies was at least 10, the pooled standard deviation was imputed (calculated from all the other studies in the same meta-analysis that used the same version of the outcome measure). In this case, the appropriateness of the imputation was assessed by comparing the standardised mean differences (SMDs) of those trials that had reported standard deviations against the hypothetical SMDs of the same trials based on the imputed standard deviations. If they converged, the meta-analytical results were considered to be reliable.

When the conditions above could not be met, standard deviations were taken from another related systematic review (if available). In this case, the results were considered to be less reliable.
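
The imputation and convergence check can be sketched as follows; the values are hypothetical, and the degrees-of-freedom weighting used to pool the standard deviations is an assumption about the exact formula, offered only to illustrate the approach attributed to Furukawa and colleagues (2006).

```python
# Illustrative sketch only: imputing a pooled SD from other studies in the same
# meta-analysis that used the same outcome measure, then computing hypothetical SMDs.
import math

def pooled_sd(sds: list[float], ns: list[int]) -> float:
    """Pool standard deviations across studies, weighting by degrees of freedom."""
    numerator = sum((n - 1) * sd ** 2 for sd, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return math.sqrt(numerator / denominator)

def smd(mean_difference: float, sd: float) -> float:
    """Standardised mean difference for a given between-group mean difference."""
    return mean_difference / sd

# SDs and sample sizes reported by the other studies using the same measure (hypothetical)
reported_sds = [9.5, 11.2, 10.1]
reported_ns = [48, 62, 55]
sd_imputed = pooled_sd(reported_sds, reported_ns)

print(f"imputed SD = {sd_imputed:.2f}")
# Hypothetical SMD for a study that reported only a mean difference of 4.0 points
print(f"SMD with imputed SD = {smd(4.0, sd_imputed):.2f}")
```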

Also for continuous outcomes, final scores in each group were the preferred outcome for extraction. However, if final or change scores (from baseline) were not reported for each group in a study (for example, the study reported an F-value, p-value or t-value), the SMD was estimated, if possible, using a statistical calculator.
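
One standard conversion that such a calculator applies is d = t × √(1/n1 + 1/n2) for an independent-samples t-value (an F-value from a 2-group comparison can be handled via t = √F). The following sketch, with hypothetical trial values, shows the arithmetic.

```python
# Illustrative sketch only: estimating an SMD from a reported t-value and group sizes.
import math

def smd_from_t(t: float, n1: int, n2: int) -> float:
    return t * math.sqrt(1 / n1 + 1 / n2)

# Hypothetical study: t = 2.3 with 45 and 47 participants per group
print(f"SMD ≈ {smd_from_t(t=2.3, n1=45, n2=47):.2f}")  # ≈ 0.48
```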

The meta-analysis of survival data, such as time to any mood episode, was based on log hazard ratios and standard errors. Since individual participant data were not available in included studies, hazard ratios and standard errors calculated from a Cox proportional hazard model were extracted. Where necessary, standard errors were calculated from confidence intervals (CIs) or p value according to standard formulae; see the Cochrane Reviewers’ Handbook 5.1.0 (Higgins & Green, 2011). Data were summarised using the generic inverse variance method using Review Manager.
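
A minimal sketch of these two steps, with hypothetical study results: the standard error of a log hazard ratio is recovered from its reported 95% CI as (ln upper − ln lower)/(2 × 1.96), and the log hazard ratios are then pooled with the fixed-effect generic inverse variance method.

```python
# Illustrative sketch only: SE of a log hazard ratio from a 95% CI, and
# fixed-effect generic inverse variance pooling. All numbers are hypothetical.
import math

def se_log_hr_from_ci(lower: float, upper: float) -> float:
    return (math.log(upper) - math.log(lower)) / (2 * 1.96)

def pool_inverse_variance(log_hrs: list[float], ses: list[float]) -> tuple[float, float]:
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_hrs)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# Two hypothetical studies reporting HR (95% CI)
studies = [(0.70, 0.52, 0.94), (0.85, 0.60, 1.20)]
log_hrs = [math.log(hr) for hr, _, _ in studies]
ses = [se_log_hr_from_ci(lo, hi) for _, lo, hi in studies]

pooled_log_hr, pooled_se = pool_inverse_variance(log_hrs, ses)
print(f"pooled HR = {math.exp(pooled_log_hr):.2f} "
      f"(95% CI {math.exp(pooled_log_hr - 1.96 * pooled_se):.2f} "
      f"to {math.exp(pooled_log_hr + 1.96 * pooled_se):.2f})")
```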

Consultation with another reviewer or members of the GC was used to overcome difficulties with coding. Data from studies included in existing systematic reviews were extracted independently by 1 reviewer and cross-checked with the existing dataset. Where possible, 2 independent reviewers extracted data from new studies. Where double data extraction was not possible, data extracted by 1 reviewer was checked by the second reviewer. Disagreements were resolved through discussion. Where consensus could not be reached, a third reviewer or GC members resolved the disagreement. Masked assessment (that is, blind to the journal from which the article comes, the authors, the institution and the magnitude of the effect) was not used since it is unclear that doing so reduces bias (Berlin, 2001; Jadad et al., 1996).

The analyses performed for existing systematic reviews incorporated into the guideline were not amended unless the GC considered that additional important aspects needed to be taken into consideration. For example, this could include stratifying data, conducting additional analyses, or using different results from the primary studies in a given analysis. Otherwise, the analyses were not amended.

3.5.3. Evidence synthesis

The method used to synthesise evidence depended on the review question and the availability and type of evidence (see Appendix F for full details). Briefly, for questions about the psychometric properties of instruments, reliability, validity and clinical utility were synthesised narratively based on accepted criteria. For questions about test accuracy, bivariate test accuracy meta-analysis would have been conducted, but there were insufficient data to do so. For questions about the effectiveness of interventions, standard meta-analysis was used where appropriate; otherwise narrative methods were used with clinical advice from the GC. In the absence of high-quality research, formal and informal consensus processes were used (see 3.5.7).

3.5.4. Grading the quality of evidence

For questions about the effectiveness of interventions and the organisation and delivery of care, the GRADE approach2 was used to grade the quality of evidence from group comparisons for each outcome (Guyatt et al., 2011). The technical team produced GRADE evidence profiles (see below) using the GRADEpro guideline development tool, following advice set out in the GRADE handbook (Schünemann et al., 2013). All staff doing GRADE ratings were trained, and calibration exercises were used to improve reliability (Mustafa et al., 2013).

For questions about epidemiology, methodology checklists (see Appendix M) were used to assess the risk of bias at the study level, and this information was taken into account when interpreting the evidence. For both types of questions, an overall quality rating was given to each study:

  • Epidemiological studies were rated individually as recommended in the Guidelines Manual (2012a): ‘++’, ‘+’ or ‘−’ on the basis of the assessment with the checklist; the strength of this evidence was considered to be ‘strong’, ‘moderate’, or ‘weak’, respectively.
  • Diagnostic accuracy: while the QUADAS framework does not provide an overall quality index for each study, this was deemed important to assist interpretation of the data on tools to augment assessment of mental health problems. We adopted the terminology used within GRADE (high, moderate, low or very low quality evidence). For each of the first 3 domains (patient selection, index test, reference standard) we used the ‘risk of bias’ and ‘concerns about applicability’ ratings (low, unclear and high risk for each) to create a 3×3 table (see Table 4). For domain 4 (flow and timing), which has only a ‘risk of bias’ rating, the same method was used, but ‘risk of bias’ was entered on both axes. We then used the 4 total domain ratings to generate an overall quality index. For the overall quality rating we took the mode classification and upgraded or downgraded from that point; that is, if a study had 2 ratings of ‘high’, one of ‘moderate’ and one of ‘very low’, then the final quality rating would be ‘moderate’. A brief illustrative sketch of this mode-based rating is given after Table 4.
Table 4. Process for determining overall quality ratings for QUADAS-II domains 1–3 (patient selection, index test and reference standard).
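
The mode-based rating described above can be illustrated with a short sketch (not NCCMH code). The up/downgrading rule applied here, shifting 1 level towards any outlying domain rating, is an assumed reading of the text; it reproduces the worked example of 2 ‘high’, 1 ‘moderate’ and 1 ‘very low’ giving an overall rating of ‘moderate’.

```python
# Illustrative sketch only: overall quality from 4 QUADAS-II domain ratings,
# taking the mode and then adjusting up or down from that point.
from collections import Counter

LEVELS = ["very low", "low", "moderate", "high"]

def overall_quality(domain_ratings: list[str]) -> str:
    mode, _ = Counter(domain_ratings).most_common(1)[0]
    index = LEVELS.index(mode)
    # Assumed adjustment rule: move one level towards any rating below (or above) the mode
    if any(LEVELS.index(r) < index for r in domain_ratings):
        index -= 1
    elif any(LEVELS.index(r) > index for r in domain_ratings):
        index += 1
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]

# Worked example from the text: 2 'high', 1 'moderate', 1 'very low' -> 'moderate'
print(overall_quality(["high", "high", "moderate", "very low"]))
```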


3.5.4.1. Evidence profiles

A GRADE evidence profile was used to summarise both the quality of the evidence and the results of the evidence synthesis for each ‘critical’ and ‘important’ outcome (see Table 5 for completed evidence profiles). The GRADE approach is based on a sequential assessment of the quality of evidence, followed by judgment about the balance between desirable and undesirable effects, and subsequent decision about the strength of a recommendation.

Table 5. Example of a GRADE evidence profile.

Within the GRADE approach to grading the quality of evidence, the following is used as a starting point:

  • RCTs without important limitations provide high-quality evidence
  • observational studies without special strengths or important limitations provide low-quality evidence.

For each outcome, quality may be reduced depending on 5 factors: limitations, inconsistency, indirectness, imprecision and publication bias. For the purposes of the guideline, each factor was evaluated using criteria provided in Table 6.

Table 6. Factors that decrease quality of evidence.

For observational studies without any reasons for downgrading, the quality may be upgraded if there is a large effect, if all plausible confounding would reduce the demonstrated effect (or increase the effect if no effect was observed), or if there is evidence of a dose-response gradient (details would be provided under the ‘other’ column).

Each evidence profile includes a summary of findings: number of participants included in each group, an estimate of the magnitude of the effect, and the overall quality of the evidence for each outcome. Under the GRADE approach, the overall quality for each outcome is categorised into 1 of 4 groups (high, moderate, low, very low).

3.5.5. Presenting evidence to the Guideline Committee

Study characteristics tables and, where appropriate, forest plots generated with Review Manager Version 5.3 and GRADE summary of findings tables (see below) were presented to the GC.

Where meta-analysis was not appropriate and/or possible, the results reported by each primary-level study were included in the study characteristics table and presented to the GC. The range of effect estimates was included in the GRADE profile and, where appropriate, described narratively.

3.5.5.1. Summary of findings tables

Summary of findings tables generated from GRADEpro were used to summarise the evidence for each outcome and the quality of that evidence (Table 7). The tables provide anticipated comparative risks, which are especially useful when the baseline risk varies for different groups within the population.

Table 7. Example of a GRADE summary of findings table.

Many of the study outcomes of interest were extractable only as standardised mean differences (SMDs). Although it is technically possible to back-convert SMDs to the original outcome measure for interpretation, this was not felt to be a helpful approach in this case. The GC are adept at making decisions based on SMDs using the recommended interpretation of Cohen’s effect size. Additionally, there was no familiar instrument considered useful for calculating mean differences (MDs), and to do so would have introduced a risk of bias by using the results from only 1 study to calculate baseline risk. Where the GC felt that effects were of sufficient magnitude to be clinically important, this is described within the Linking Evidence to Recommendations (LETR) tables.

3.5.6. Extrapolation

When answering review questions, if there is no direct evidence from a primary dataset,3 based on the initial search for evidence, it may be appropriate to extrapolate from another dataset and use it as indirect evidence. In this situation, the following principles were used to determine when to extrapolate:

  • a primary dataset is absent, at particularly high risk of bias or judged not to be relevant to the review question under consideration, and
  • a review question is deemed by the GC to be important, such that in the absence of direct evidence, other data sources should be considered, and
  • 1 or more non-primary data sources that may inform the review question are, in the view of the GC, available.

When the decision to extrapolate was made, the following principles were used to inform the choice of the non-primary dataset:

  • the populations (usually in relation to the specified diagnosis or problem which characterises the population) under consideration share some common characteristic but differ in other ways, such as age, gender or in the nature of the disorder (for example, a common behavioural problem; acute versus chronic presentations of the same disorder), and
  • the interventions under consideration in the view of the GC have 1 or more of the following characteristics:
    • share a common mode of action (for example, the pharmacodynamics of a drug; a common psychological model of change – operant conditioning)
    • be feasible to deliver in both populations (for example, in terms of the required skills or the demands of the health care system)
    • share common side effects/harms in both populations, and
  • the context or comparator involved in the evaluation of the different datasets shares some common elements which support extrapolation, and
  • the outcomes involved in the evaluation of the different datasets share some common elements which support extrapolation (for example, improved mood or a reduction in behaviour that challenges).

When the choice of the non-primary dataset was made, the following principles were used to guide the application of extrapolation:

  • the GC should first consider the need for extrapolation through a review of the relevant primary dataset and be guided in these decisions by the principles for the use of extrapolation
  • in all areas of extrapolation datasets should be assessed against the principles for determining the choice of datasets. In general the criteria in the 4 principles set out above for determining the choice should be met
  • in deciding on the use of extrapolation, the GC will have to determine if the extrapolation can be held to be reasonable, including ensuring that:
    • the reasoning behind the decision can be justified by the clinical need for a recommendation to be made and by the absence of other, more direct evidence
    • the relevance of the potential dataset to the review question can be established
    • the reasoning and the method adopted are clearly set out in the relevant section of the guideline.

3.5.7. Method used to answer a review question in the absence of appropriately designed, high-quality research

In the absence of appropriately designed, high-quality research (including indirect evidence where it would be appropriate to use extrapolation), both formal and informal consensus processes were adopted.

3.5.7.1. Formal method of consensus

The modified nominal group technique (Bernstein et al., 1992) was chosen because of its suitability within the guideline development process. The method is concerned with deriving a group decision from a set of expert individuals and has been identified as the method most commonly used for the development of consensus in health care (Murphy et al., 1998). The nominal group technique requires participants to indicate their agreement with a set of statements about the intervention(s) of concern. These statements were developed by the NCCMH technical team, drawing on the available sources of evidence on the methods of delivery and outcomes of the interventions; these sources could be supplemented by advice from external experts in the intervention(s). Agreement with the statements was rated on a 9-point Likert scale, where 1 represented least agreement and 9 represented most agreement. In the first round, participants indicated the extent of their agreement with the statements and also provided written comments on their reasons for any disagreement and how the statement could be modified.

In round 1, members were presented with an overview of the modified nominal group technique, a short summary of the available evidence, a consensus questionnaire containing the statements and instructions on the use of the questionnaire. Members were asked to rate their agreement with the statements taking into account the available evidence and their expertise. For the purpose of determining agreement, ratings were grouped into 3 categories to calculate the percentage agreement: 1–3 (inappropriate strategy), 4–6 (uncertain), or 7–9 (appropriate strategy or adaptation).

At the subsequent GC meeting, anonymised distributions of responses to each statement were given to all members, together with members’ additional comments and a ranking of statements based on consensus percentage agreement. Those statements with 80% or greater agreement were used to inform the drafting of recommendations, where appropriate taking into account the initial comments from and subsequent discussions with the GC.

For statements where there was 60–80% agreement, a judgement was made based on the nature of the comments from the GC. If it appeared from the comments that the general principle included within the statement was agreed, but could be addressed with some minor amendments, the statement was amended to incorporate the comments and used to inform the development of recommendations. Other statements that fell within this range were re-drafted based on the comments from the first rating and re-rated as in round 1 (round 2). If agreement of 80% or above was achieved on re-rating, the statements were used to inform recommendations; those that did not reach this level were discarded.

Any distribution of ratings with less than 60% agreement in round 1 was generally regarded as no consensus and discarded, unless obvious and addressable issues were identified from the comments.
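
The thresholds described above amount to a simple triage of each statement. A minimal sketch with hypothetical round-1 ratings (this is not NCCMH code): ratings of 7–9 are counted as agreement, and the statement is then assessed against the 80% and 60% thresholds.

```python
# Illustrative sketch only: round-1 triage of a consensus statement from
# 9-point ratings, using the 80% and 60% agreement thresholds described above.
def triage_statement(ratings: list[int]) -> tuple[float, str]:
    agree = sum(1 for r in ratings if 7 <= r <= 9)  # 7-9 = appropriate strategy
    pct = 100 * agree / len(ratings)
    if pct >= 80:
        action = "use to inform recommendations"
    elif pct >= 60:
        action = "review comments; amend and/or re-rate in round 2"
    else:
        action = "discard unless comments identify an addressable issue"
    return pct, action

# Hypothetical ratings from 15 GC members for one statement
ratings = [8, 9, 7, 7, 8, 6, 9, 7, 8, 5, 7, 9, 8, 7, 4]
pct, action = triage_statement(ratings)
print(f"{pct:.0f}% agreement -> {action}")  # 80% agreement -> use to inform recommendations
```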

3.5.7.2. Informal method of consensus

The informal consensus process involved a group discussion of what is known about the issues. The views of the GC were synthesised narratively by a member of the review team and circulated after the meeting. Feedback was used to revise the text, which was then included in the appropriate evidence review chapter.

3.6. Health economics methods

The aim of the health economics work was to contribute to the guideline’s development by providing evidence on the cost effectiveness of the interventions and services examined in this guideline. This was achieved by a systematic literature review of existing economic evidence in all areas covered by the guideline.

Economic modelling was planned to be undertaken in areas with likely major resource implications, where the current extent of uncertainty over cost effectiveness was significant and economic analysis was expected to reduce this uncertainty, in accordance with The Guidelines Manual. Prioritisation of areas for economic modelling was a joint decision between the Health Economist and the GC. The rationale for prioritising review questions for economic modelling was set out in an economic plan agreed between NICE, the GC, the Health Economist and the other members of the technical team. The following economic questions were selected as key issues that were addressed by economic modelling:

  • Interventions to prevent mental health problems in people with learning disabilities
  • Interventions to reduce and manage mental health problems in people with learning disabilities
  • Organisation and delivery of care for people with learning disabilities and mental health problems or at risk for mental health problems.

In addition, literature on the health-related quality of life (HRQoL) of people covered by this guideline was systematically searched to identify studies reporting appropriate utility scores that could be utilised in a cost-utility analysis.

The identified clinical evidence on the areas prioritised for economic modelling was very sparse and did not allow for the construction of a robust and informative economic model. Therefore, no economic modelling was carried out for this guideline. Nevertheless, the GC took into consideration resource implications and anticipated cost effectiveness of interventions and services for people with learning disabilities and mental health problems or at risk for mental health problems when making recommendations.

The methods adopted in the systematic literature review of economic evidence are described in the remainder of this section.

3.6.1. Search strategy for economic evidence

3.6.1.1. Scoping searches

A broad preliminary search of the literature was undertaken in September 2014 to obtain an overview of the issues likely to be covered by the scope, and help define key areas. Searches were restricted to economic studies and HTA reports, and conducted in the following databases:

  • Embase
  • MEDLINE/MEDLINE In-Process
  • HTA database (technology assessments)
  • NHS Economic Evaluation Database (NHS EED).

Any relevant economic evidence arising from the clinical scoping searches was also made available to the health economist during the same period.

3.6.1.2. Systematic literature searches

After the scope was finalised, a systematic search strategy was developed to locate all the relevant evidence. The balance between sensitivity (the power to identify all studies on a particular topic) and specificity (the ability to exclude irrelevant studies from the results) was carefully considered, and a decision made to utilise a broad approach to searching to maximise retrieval of evidence to all parts of the guideline. Searches were restricted to economic studies and health technology assessment reports, and conducted in the following databases:

  • Embase
  • HTA database (technology assessments)
  • MEDLINE/MEDLINE In-Process
  • NHS Economic Evaluation Database
  • PsycINFO.

Any relevant economic evidence arising from the clinical searches was also made available to the health economist during the same period.

The search strategies were initially developed for MEDLINE before being translated for use in other databases/interfaces. Strategies were built up through a number of trial searches, and discussions of the results of the searches with the review team and GC to ensure that all possible relevant search terms were covered. In order to assure comprehensive coverage, search terms for the guideline topic were kept purposely broad to help counter dissimilarities in database indexing practices and thesaurus terms, and imprecise reporting of study interventions by authors in the titles and abstracts of records.

For standard mainstream bibliographic databases (Embase, MEDLINE and PsycINFO), search terms for the guideline topic were combined with a search filter for health economic studies. For searches generated in topic-specific databases (HTA, NHS Economic Evaluation Database), search terms for the guideline topic were used without a filter. This sensitive approach was aimed at minimising the risk of overlooking relevant publications, a potential weakness of more focused search strategies. The search terms are set out in full in Appendix I.

3.6.1.3. Reference Management

Citations from each search were downloaded into reference management software and duplicates removed. Records were then screened against the inclusion criteria of the reviews before being quality appraised. The unfiltered search results were saved and retained for future potential re-analysis to help keep the process both replicable and transparent.

3.6.1.4. Search filters

The search filter for health economics is an adaptation of a pre-tested strategy designed by the Centre for Reviews and Dissemination (2007). The search filter is designed to retrieve records of economic evidence (including full and partial economic evaluations) from the vast amount of literature indexed to major medical databases such as MEDLINE. The filter, which comprises a combination of controlled vocabulary and free-text retrieval methods, maximises sensitivity (or recall) to ensure that as many potentially relevant records as possible are retrieved from a search. A full description of the filter is provided in Appendix I.

3.6.1.5. Date and language restrictions

Systematic database searches were initially conducted in November 2014 up to the most recent searchable date. Search updates were generated on a 6-monthly basis, with the final re-runs carried out in December 2015. After this point, studies were included only if they were judged by the GC to be exceptional (for example, the evidence was likely to change a recommendation).

Although no language restrictions were applied at the searching stage, foreign language papers were not requested or reviewed, unless they were of particular importance to an area under review. All the searches were restricted to research published from 2000 onwards in order to obtain data relevant to current healthcare settings and costs.

3.6.1.6. Other search methods

Other search methods involved scanning the reference lists of all eligible publications (systematic reviews, stakeholder evidence and included studies from the economic and clinical reviews) to identify further studies for consideration.

Full details of the search strategies and filter used for the systematic review of health economic evidence are provided in Appendix I.

3.6.2. Inclusion criteria for economic studies

The following inclusion criteria were used to select studies identified by the economic searches for further consideration:

  1. Only studies from Organisation for Economic Co-operation and Development member countries were included, as the aim of the review was to identify economic information transferable to the UK context.
  2. Selection criteria based on types of clinical conditions and service users, as well as the interventions assessed, were identical to those used in the clinical literature review.
  3. Studies were included provided that sufficient details regarding methods and results were available to enable the methodological quality of the study to be assessed, and provided that the study’s data and results were extractable. Poster presentations of abstracts were excluded.
  4. Full economic evaluations that compared 2 or more relevant options and considered both costs and consequences as well as costing analyses that compared only costs between 2 or more interventions were included in the review. Non-comparative studies were not considered in the review.
  5. Economic studies were included if they used clinical effectiveness data from a clinical trial, a prospective or retrospective cohort study, a study with a before-and-after design, or from a literature review.

3.6.3. Applicability and quality criteria for economic studies

All economic papers eligible for inclusion were appraised for their applicability and quality using the methodology checklist for economic evaluations recommended in The Guidelines Manual (NICE, 2014). All studies that fully or partially met the applicability and quality criteria described in the methodology checklist were considered during the guideline development process. The completed methodology checklists for all economic evaluations considered in the guideline are provided in Appendix Q.

3.6.4. Presentation of economic evidence

The economic evidence considered in the guideline is provided in the respective evidence chapters, following presentation of the relevant clinical evidence. The references to included studies and the respective evidence tables with the study characteristics and results are provided in Appendix R. Characteristics and results of all economic studies considered during the guideline development process are summarised in economic evidence profiles provided in Appendix S.

3.6.5. Results of the systematic search of economic literature

The titles of all studies identified by the systematic search of the literature were screened for their relevance to the topic (that is, economic issues and information on HRQoL). References that were clearly not relevant were excluded first. The abstracts of all potentially relevant studies (124 references) were then assessed against the inclusion criteria for economic evaluations by the health economist. Full texts of the studies potentially meeting the inclusion criteria (including those for which eligibility was not clear from the abstract) were obtained. Studies that did not meet the inclusion criteria, were duplicates, were secondary publications of 1 study, or had been updated in more recent publications were subsequently excluded. An economic evaluation conducted for a previously published NICE guideline was also included in the systematic review as eligible for this guideline. All economic evaluations eligible for inclusion (5 studies) were then appraised for their applicability and quality using the methodology checklist for economic evaluations. Finally, those studies that fully or partially met the applicability and quality criteria set by NICE were considered at formulation of the guideline recommendations. A flow diagram of the search process for selection of studies for inclusion in the economic literature review conducted for this guideline is provided in Appendix P.

3.7. From evidence to recommendations

Once the clinical and health economic evidence was summarised, the GC drafted the recommendations. In making recommendations, the GC took into account the trade-off between the benefits and harms of the intervention/instrument, as well as other important factors, such as the relative value of different outcomes reported in the evidence, quality of the evidence, trade-off between net health benefits and resource use, values and experience of the GC and society, current clinical practice, the requirements to prevent discrimination and to promote equality4, and the GC’s awareness of practical issues (Eccles et al., 1998; NICE, 2012a).

Finally, to show clearly how the GC moved from the evidence to the recommendations, each chapter (or sub-section) has a section called ‘recommendations and link to evidence’. Underpinning this section is the concept of the ‘strength’ of a recommendation (Schünemann et al., 2003). This takes into account the quality of the evidence but is conceptually different. Some recommendations are ‘strong’ in that the GC believes that the vast majority of healthcare professionals and service users would choose a particular intervention if they considered the evidence in the same way that the GC has. This is generally the case if the benefits clearly outweigh the harms for most people and the intervention is likely to be cost effective. However, there is often a closer balance between benefits and harms, and some service users would not choose an intervention whereas others would. This may happen, for example, if some service users are particularly averse to some side effect and others are not. In these circumstances the recommendation is generally weaker, although it may be possible to make stronger recommendations about specific groups of service users. The strength of each recommendation is reflected in the wording of the recommendation, rather than by using ratings, labels or symbols.

Where the GC identified areas in which there are uncertainties or where robust evidence was lacking, they developed research recommendations. Those that were identified as ‘high priority’ were developed further in the NICE version of the guideline, and presented in Appendix G.

3.8. Stakeholder contributions

Professionals, service users, and companies have contributed to and commented on the guideline at key stages in its development. Stakeholders for this guideline include:

  • service user and carer stakeholders: national service user and carer organisations that represent the interests of people whose care will be covered by the guideline
  • local service user and carer organisations: but only if there is no relevant national organisation
  • professional stakeholders: national organisations that represent the healthcare professionals who provide the services described in the guideline
  • commercial stakeholders: companies that manufacture drugs or devices used in treatment of the condition covered by the guideline and whose interests may be significantly affected by the guideline
  • providers and commissioners of health services in England
  • statutory organisations: including the Department of Health, the Welsh Government, NHS Quality Improvement Scotland, the Care Quality Commission and the National Patient Safety Agency
  • research organisations: that have carried out nationally recognised research in the area.

NICE clinical guidelines are produced for the NHS in England, so a ‘national’ organisation is defined as 1 that represents England, or has a commercial interest in England.

Stakeholders have been involved in the guideline’s development at the following points:

  • commenting on the initial scope of the guideline and attending a scoping workshop held by NICE
  • commenting on the draft of the guideline.

3.9. Validation of the guideline

Registered stakeholders had an opportunity to comment on the draft guideline, which was posted on the NICE website during the consultation period. Following the consultation, all comments from stakeholders and experts (see Appendix D) were responded to, and the guideline updated as appropriate. NICE also reviewed the guideline and checked that stakeholders’ comments had been addressed.

Following the consultation period, the GC finalised the recommendations and the NGA produced the final documents. These were then submitted to NICE for a quality assurance check. Any errors were corrected by the NGA, then the guideline was formally approved by NICE and issued as guidance to the NHS in England.

Footnotes

1

Based on the approach suggested by Furukawa and colleagues (2006).

2

For further information about GRADE, see www.gradeworkinggroup.org

3

A primary data set is defined as a data set which contains evidence on the population and intervention under review.

4
