Methods

National Clinical Guideline Centre (UK)

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Clinical Guideline Centre (UK). Stroke Rehabilitation: Long Term Rehabilitation After Stroke [Internet]. London: Royal College of Physicians (UK); 2013 May 23. (NICE Clinical Guidelines, No. 162.)


4. Methods

This chapter sets out in detail the methods used to generate the recommendations that are presented in subsequent chapters. This guidance was developed in accordance with the methods outlined in the NICE Guidelines Manual 2009¹⁸⁷.

4.1. Developing the review questions and outcomes

Review questions were developed in a PICO framework (patient, intervention, comparison and outcome) for intervention reviews. This was to guide the literature searching process, appraisal, and synthesis of evidence and to facilitate the development of recommendations by the guideline development group (GDG). The questions were drafted by the NCGC technical team and refined and validated by the GDG. They were based on the key clinical areas identified in the scope (Appendix A).

A total of 22 review questions were identified.

Full literature searches, critical appraisals and evidence reviews were completed for all the specified clinical questions.

Chapter	Review questions	Outcomes
Structure and settings: stroke units	In people after stroke, does organised rehabilitation care (comprehensive, rehabilitation and mixed rehabilitation stroke units) improve outcome (mortality, dependency, requirement for institutional care and length of hospital stay)?	Death Death or dependency Death or institutional care Duration of stay in hospital or institution or both Quality of life Patient and carer satisfaction
Structure and settings: early supported discharge	In people after stroke what is the clinical and cost-effectiveness of early supported discharge versus usual care?	Barthel Index Length of hospital stay Functional Independence Measure (FIM) Caregiver strain index Falls Readmissions to hospital Hospital Anxiety and Depression Scale (HADS) Mortality Quality Of Life Nottingham Extended Activities of Daily Living
Service delivery: goal setting	Does the application of patient goal setting as part of planning stroke rehabilitation activities lead to an improvement in psychological wellbeing, functioning and activity?	Psychological wellbeing views about the quality of the goal setting process satisfaction with outcome health related quality of life physical function Activities of Daily Living (ADL)
Service delivery: intensity of rehabilitation	In people after stroke what is the clinical and cost-effectiveness of intensive rehabilitation versus standard rehabilitation?	Length of stay Functional Independence Measure (FIM) Barthel Index Quality of Life (any measure) Nottingham Activities of Daily Living Rankin Rivermead mobility index Frenchay Activities Index
Support and information: supported information provision	What is the clinical and cost-effectiveness of supported information provision versus unsupported information provision on mood and depression in people with stroke?	Impact on mood/depression: Hospital Anxiety and Depression Scale (HADS) General Health Questionnaire Visual Analogue Mood Scale Stroke Aphasic Depression Questionnaire (SAD-Q) Geriatric Depression Scale Beck Depression Inventory Self-efficacy General Self-efficacy Scale Stroke Self-efficacy Questionnaire Locus of Control Scale Extended activities of daily living (EADL) Nottingham extended ADL Frenchay Activities Index Yale mood question
Cognitive functions: visual neglect	In people after stroke what is the clinical and cost-effectiveness of cognitive rehabilitation versus usual care to improve spatial awareness and/or visual neglect?	Mini-mental state examination (MMSE), Behavioural Inattention Test (BIT), Drawing tests (for example: clock drawing), Line Bisection tests, All cancellation tests (including: line cancellation, bell cancellation), Sentence reading, Target screen examinations (lump together all cancellation tests and drawing tests), Rivermead Perceptual Assessment Battery (RPAB)
Cognitive functions: memory functions	In people after stroke what is the clinical and cost-effectiveness of memory strategies versus usual care to improve memory?	Wechsler Memory Scale, Rivermead behavioural memory assessment, Mini-mental state examination (MMSE), Addenbrooke’s Cognitive Examination-Revised, Abbreviated Mental Test Score.
Cognitive functions: attention function	In people after stroke what is the clinical and cost-effectiveness of sustained attention training versus usual care to improve attention?	Mini-mental state examination, Behavioural inattention test, drawing tests, line bisection test, cancellation tests, sentence reading, target screen examinations, Rivermead Perceptual Assessment Battery
Emotional functioning	In people after stroke what is the clinical and cost effectiveness of psychological therapies provided to the family (including the patients)?	Quality of Life (for both carer and patient)– Any QOL and depression outcomes including the following: stroke impact scale, EuroQoL, care giver burden scale, caregiver strain index, carer strain index, burden of stroke scale, Stroke and aphasia quality of life scale, ASCOT scale. Occurrence of depression/anxiety/mood in carers– Beck Depression Inventory, Beck Depression Inventory 2, Geriatric Depression Scale, neuropsychiatric inventory, Hospital Anxiety and Depression Scale (HADS), General health questionnaire, Visual Analogue Mood Scale, SADQ.
Vision: eye movement therapy	In people after stroke what is the clinical and cost-effectiveness of eye movement therapy for visual field loss versus usual care?	Reading (speed and accuracy) Eye movement tasks Scanning Letter Cancellation Test
Digestive systems: swallowing	In people after stroke what is the clinical and cost-effectiveness of interventions for swallowing versus alternative interventions?	Occurrence of aspiration pneumonia Occurrence of chest infections Reduction in hospital stay Reduction in re-admission Return to normal diet
Communication: Aphasia	In people after stroke is speech and language therapy compared to no speech and language therapy or placebo (social support and stimulation) effective in improving language/communication abilities and/or psychological wellbeing?	Functional communication (language or communication skills sufficient to permit the transmission of message via spoken, written or non-verbal modalities, or a combination of these channels) Formal measures of receptive language skills (language understanding) Formal measures of expressive language skills (language production) Overall level of severity of aphasia as measured by specialist test batteries (may include Western Aphasia Battery or Porch Index of Communicative Abilities) Psychological or social wellbeing including depression, anxiety and distress Patient satisfaction/carer and family views Compliance/drop-out
Communication: Dysarthria	In people after stroke is speech and language therapy compared to social support and stimulation effective in improving dysarthria?	Measures of functional communication Formal measures of receptive language skills (language understanding) Formal measures of expressive language skills (language production) Psychological or social wellbeing including depression, anxiety and distress Frenchay Dysarthria Assessment. Measures of articulation (range, speed, strength, and co-ordination) Perceptual measures of voice and prosody (for example, Vocal Profile Analysis) Acoustic measures (for example, fundamental frequency, pitch perturbation (jitter), amplitude perturbation (shimmer), as measured by, computerised sound or spectrography)
Communication: intensity of speech and language therapy	In people after stroke with communication difficulties what is the clinical and cost-effectiveness of intensive speech therapy versus standard speech therapy?	Any outcome reported in the papers. Examples include: Functional Assessment of Communication Skills for Adults (ASHA FACS) Boston Naming Test Western Aphasia Battery Stroke Dysphasia Index McKenna Graded Naming Test
Communication: Listener advice	What listener advice skills/information would help family members/carers improve communication in people with aphasia after stroke?	Any outcome Quality of life
Movement: strength training	In people after stroke what is the clinical and cost-effectiveness of strength training versus usual care on improving function and reducing disability?	Upper Limb MRC Scale Newton Metres Fugl-Meyer Action Research Arm Test (ARAT) Functional Independence Measure (FIM) Barthel Index Adverse events–pain or spasticity Lower Limb/Trunk Timed Up and Go Test Any timed walk Walking distance Functional Independence Measure (FIM) Barthel Index Adverse events–falls, pain or spasticity
Movement: fitness training	In people after stroke, does cardiorespiratory or resistance fitness training improve outcome (fitness, function, quality of life, and mood) and reduce disability?	Mortality rate Dependence or level of disability Physical fitness Mobility Physical function Quality of life Mood
Movement: hand and arm: orthoses upper limb	In people after stroke what is the clinical and cost-effectiveness of orthoses for prevention of loss of range of the upper limb versus usual care?	Range of movement assessed by goniometry
Movement: hand and arm: electrical stimulation	In people after stroke what is the clinical and cost-effectiveness of Electrical Stimulation for hand function versus usual care?	Any outcome reported in the paper. Upper Limb outcomes including: Action Research Arm Test (ARAT) Fugl-Meyer Assessment (FMA) 9 hole peg test grip strength.
Movement: Hand and arm: constraint induced movement therapy	In people after stroke what is the clinical and cost-effectiveness of constraint-induced therapy versus usual care on improving function and reducing disability?	Functional Independence Measure (FIM) Barthel Index Fugl-Meyer Assessment Action Research Arm Test (ARAT) Wolf Motor Function Test (WMFT) 9 hole peg test Any adverse event
Movement: Repetitive task training	In people after stroke what is the clinical and cost-effectiveness of repetitive task training versus usual care on improving function and reducing disability?	Lower limb Any timed walk, 6m, 5m, 10m walk Change in walking distance Rivermead mobility index Upper limb Arm: Fugl-Meyer Assessment, Action Research Arm Test (ARAT) Hand: Any peg hole test, Frenchay Arm Test, Motor Assessment Scale (MAS)
Movement: walking therapy: treadmill training	In people after stroke what is the clinical and cost-effectiveness of all treadmill versus usual care on improving walking? In people after stroke who can walk, what is the clinical and cost-effectiveness of treadmill plus body support versus treadmill only on improving walking?	Walking speeds (5 m/10 m/30 m) Timed walk Walking endurance Functional Independence Measure (FIM) Barthel Index Rivermead Mobility Index
Movement: walking therapy: electromechanical gait training	In people after stroke what is the clinical and cost-effectiveness of electromechanical gait training versus usual care on improving function and reducing disability?	Walking speeds (5 metres/10 metres/30 metres) Any timed walk Walking endurance Functional Independence Measure (FIM) Barthel Index Rivermead Mobility Index
Movement: walking therapy: orthoses ankle-foot	In people after stroke what is the clinical and cost-effectiveness of ankle-foot orthoses of all types to improve walking function versus usual care?	Gait speed: 6 min walk, 10 m timed walk Lower limb MAS (stairs) Timed walk Walking endurance Functional Independence Measure (FIM)/Barthel Index Rivermead Mobility Index Cadence Gait symmetry (stance time, step length) Quality of Life outcomes
Self-care	In people after stroke what is the clinical and cost-effectiveness of intensive occupational therapy focused specifically on personal activities of daily living versus usual care?	Nottingham Extended Activities of Daily Living (NEADL) Extended Activities of Daily Living (EADL) Functional Independence Measure (FIM) Barthel Index Nottingham Stroke Dressing Assessment Northwick Park Nursing Dependency Scale Rivermead Mobility Index
Long term health and social support	In people after stroke what is the clinical and cost-effectiveness of interventions to aid return to work versus usual care?	Same job same employer Same job different employer Different job same employer Different job different employer Unemployment Retired due to ill health Voluntary work Benefit claims

During the development of questions concerning employment and return to work, provision of information, delivery of psychological therapies and early supported discharge, the GDG took the following issues into consideration:

When the GDG formulated the question about aids to return to work, they acknowledged the universal consensus in the literature about the predictive factors that prevent people from returning to work after stroke. For this reason, they believed that a review of observational or cohort studies investigating this issue would not add value to the formulation of recommendations for this guideline. The GDG believed that randomised trials investigating the impact of any type of intervention that could help people return to employment (either former or new employment) were a higher priority for the purposes of this guideline. In addition, the GDG noted that the nature of vocational interventions would be very diverse and tailored to individual circumstances (type of disability, nature of employment).
During the formulation of a question related to provision of information for people after stroke and their carers, the GDG had a full discussion about the large and heterogeneous area of information provision. We were clearly unable to address all information aspects within the timeline available. The GDG agreed that people after stroke live in a rich information environment, although it is not always tailored to the patient’s needs. The GDG felt it was particularly important to look at the evidence pertaining to the provision of ‘supported’ information (information given with additional support of some kind, such as the active provision of information, the encouragement of feedback, availability of peer support or use of an interactive computer programme, as opposed to the provision of leaflets/booklets in isolation) in order to investigate its impact on mood and depression in people after stroke and potentially direct the development of recommendations in this area.
For the psychological support question, the GDG thought that this should investigate the effectiveness of psychological therapies such as family therapy, cognitive-behaviour therapy and relationship counselling provided to the family (including the person with stroke) on the quality of life of people with stroke and their carers. The group acknowledged that it was not usual to have a psychological therapy in isolation and therefore all of these therapies may also include some form of education in combination. In light of the publication of the ‘Patient experience in adult NHS services’ (NICE clinical guideline 138), the GDG agreed that this guidance could be cross-referenced where appropriate.
When formulating the question on early supported discharge, the GDG agreed to investigate the effectiveness of early supported discharge on improving specific patient and hospital related outcomes (such as mortality, quality of life, readmissions and length of stay in the hospital). The GDG did not consider that patients would have any different information needs after early supported discharge to other patients being discharged from hospital.

During the development of questions for this guideline, scoping searches for cohort studies were undertaken, and we asked the GDG whether they were aware of any large cohort studies in these areas that would justify including studies other than randomised trials. None were identified.

4.2. Searching for evidence

4.2.1. Clinical literature search

The aim of the literature review was to identify all available, relevant published evidence in relation to the key clinical questions generated by the GDG. Systematic literature searches were undertaken to identify evidence within published literature in order to answer the review questions as per The Guidelines Manual [2009]¹⁸⁷. Clinical databases were searched using relevant medical subject headings, free-text terms and study type filters where appropriate. Studies published in languages other than English were not reviewed. Where possible, searches were restricted to articles published in the English language. All searches were conducted on the core databases: MEDLINE, Embase, Cinahl and The Cochrane Library. Additional subject specific databases were used for some questions: PsycInfo for patient views. All searches were updated on 5th Oct 2012. No papers published after this date were considered.

Search strategies were checked by looking at reference lists of relevant key papers, checking search strategies in other systematic reviews and asking the GDG for known studies in a specific area. The questions, the study types applied, the databases searched and the years covered can be found in Appendix [D].

During the scoping stage, a search was conducted for guidelines and reports on the websites listed below and on the websites of organisations relevant to the topic. Searching for grey literature or unpublished literature was not undertaken. All references sent by stakeholders were considered.

Guidelines International Network database (www.g-i-n.net)
National Guideline Clearing House (www.guideline.gov/)
National Institute for Health and Clinical Excellence (NICE) (www.nice.org.uk)
National Institutes of Health Consensus Development Program (consensus.nih.gov/)
Health Information Resources, NHS Evidence (www.library.nhs.uk/)

The titles and abstracts of records retrieved by the searches were scanned for relevance to the GDG’s clinical questions. Any potentially relevant publications were obtained in full text. These were assessed against the inclusion criteria and the reference lists were scanned for any articles not previously identified. Further references were also suggested by the GDG.

4.2.2. Health economic literature search

Systematic literature searches were also undertaken to identify health economic evidence within published literature relevant to the review questions. The evidence was identified by conducting a broad search relating to the guideline population in the NHS economic evaluation database (NHS EED), the Health Economic Evaluations Database (HEED) and health technology assessment (HTA) databases with no date restrictions. Additionally, the search was run on MEDLINE and Embase, with a specific economic filter, to ensure that recent publications not yet indexed by these databases were identified. Studies published in languages other than English were not reviewed. Where possible, searches were restricted to articles published in the English language.

The search strategies for health economics are included in Appendix [D]. All searches were updated on 5th Oct 2012. No papers published after this date were considered.

4.3. Evidence of effectiveness

The Research Fellow:

Identified potentially relevant studies for each review question from the relevant search results by reviewing titles and abstracts. Twenty per cent of the sift and selection of papers was quality assured by a second reviewer to eliminate any potential for selection bias or error. Full papers were then obtained.
Reviewed full papers against pre-specified inclusion/exclusion criteria to identify studies that addressed the review question in the appropriate population and reported on outcomes of interest (review protocols are included in Appendix [D]).
Critically appraised relevant studies using the appropriate checklist as specified in The Guidelines Manual¹⁸⁷.
Extracted key information about the study’s methods and results into evidence tables (evidence tables are included in Appendix [H]).
Generated summaries of the evidence by outcome (included in the relevant chapter write-ups):
- Randomised studies: meta-analysed, where appropriate and reported in GRADE profiles (for clinical studies) – see below for details.

4.3.1. Inclusion/exclusion criteria

The inclusion/exclusion of studies was based on the review protocols. The GDG were consulted about any uncertainty regarding inclusion/exclusion of selected studies. Minimum sample size and the proportion of participants with stroke were among the inclusion/exclusion criteria used for the selection of studies in the evidence reviews. The GDG agreed that (with the exception of review questions on cognitive functions and Functional Electrical Stimulation) the sample size of 20 participants (10 in each arm) would be the minimum requirement for a study to be included. For the review questions on cognitive functions, the minimum sample size would be set at 10 participants in total due to the nature of interventions and the availability of studies in the literature. This decision on studies’ sample size cut off points was made for pragmatic reasons.

We included any study of a stroke population at least 2 weeks post stroke. We did not apply any restriction on the selection of studies with populations in long term rehabilitation.

Because the interventions investigated in the following evidence reviews (memory strategies, eye movement therapy, swallowing, constraint induced movement therapy, treadmill training, electromechanical gait training, ankle-foot orthoses, and aids to return to work) aimed ultimately to reduce disability and would be applicable to other populations (who have not experienced stroke), the GDG decided that mixed populations could be used for reviewing these questions, provided that at least 50% of the participants in a study had had a stroke. See the review protocols in Appendix E and the excluded studies for each review question (with their exclusion reasons) in Appendix M for full details.

4.3.2. Methods of combining clinical studies

Data synthesis for intervention reviews

Where possible, meta-analyses were conducted to combine the results of studies for each review question using Cochrane Review Manager (RevMan5) software. Fixed-effects (Mantel-Haenszel) techniques were used to calculate risk ratios (relative risk) for the binary outcomes. Continuous outcomes were analysed using an inverse variance method for pooling weighted mean differences; where the studies used different scales, standardised mean differences were used.
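The two fixed-effect pooling approaches named above can be sketched in a few lines. This is an illustrative sketch only, not the RevMan5 implementation: the function names and the example trial counts are hypothetical, and RevMan5 additionally reports confidence intervals and study weights that are omitted here.

```python
def mantel_haenszel_rr(studies):
    """Pooled fixed-effect risk ratio using Mantel-Haenszel weights.

    studies: iterable of (events_trt, n_trt, events_ctl, n_ctl) tuples.
    """
    numerator = 0.0
    denominator = 0.0
    for events_trt, n_trt, events_ctl, n_ctl in studies:
        total = n_trt + n_ctl
        numerator += events_trt * n_ctl / total
        denominator += events_ctl * n_trt / total
    return numerator / denominator


def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect inverse variance pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se


# Hypothetical data: events and arm size for intervention, then for control.
example_trials = [(10, 100, 20, 100), (5, 50, 10, 50)]
print(mantel_haenszel_rr(example_trials))

# Hypothetical mean differences with their standard errors.
print(inverse_variance_pool([1.0, 3.0], [1.0, 1.0]))
```

For standardised mean differences, the same inverse variance function would apply once each study's difference has been divided by its pooled standard deviation.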

Statistical heterogeneity was assessed by considering the chi-squared test for significance at p<0.1 or an I-squared inconsistency statistic of >50% to indicate significant heterogeneity. Where significant heterogeneity was present, we carried out a sensitivity analysis with particular attention paid to allocation concealment, blinding and loss to follow-up (missing data). In cases where there was inadequate allocation concealment, unclear blinding or differential missing data more than 20% in the two groups, this was examined in a sensitivity analysis. For the latter, the duration of follow-up was also taken into consideration prior to including in a sensitivity analysis. No subgroup analyses were predefined with the exception of the clinical question for constraint induced therapy for which a subgroup analysis on duration of intervention (more or less than 5 hours) was pre-specified (see Appendix E for further details).
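As a rough illustration of the I-squared threshold described above, the sketch below computes Cochran's Q and I-squared from study estimates and their standard errors. The function name and inputs are hypothetical, and the chi-squared p-value (the p<0.1 criterion) is left out because it requires a chi-squared distribution function.

```python
def heterogeneity(estimates, std_errors):
    """Return Cochran's Q, its degrees of freedom, and I-squared (in %)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared is truncated at zero when Q does not exceed its degrees of freedom.
    i_squared = 0.0 if q <= df else 100.0 * (q - df) / q
    return q, df, i_squared


# Hypothetical effect estimates from two studies with equal precision.
q, df, i_squared = heterogeneity([0.0, 4.0], [1.0, 1.0])
flag_heterogeneity = i_squared > 50.0  # threshold stated in the text
print(q, df, i_squared, flag_heterogeneity)
```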

If no sensitivity analysis was found to completely resolve statistical heterogeneity then a random effects (DerSimonian and Laird) model was employed to provide a more conservative estimate of the effect.
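A minimal sketch of the DerSimonian and Laird estimator follows, to show why the random effects estimate is more conservative: the between-study variance is added to each study's variance, which widens the pooled standard error. Names and data are hypothetical, and this is not the RevMan5 code.

```python
def dersimonian_laird(estimates, std_errors):
    """Random effects pooled estimate, its standard error, and tau-squared."""
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Method-of-moments estimate of between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    pooled_se = (1.0 / sum(re_weights)) ** 0.5
    return pooled, pooled_se, tau2


# Hypothetical heterogeneous estimates; the fixed-effect SE would be about 0.71.
print(dersimonian_laird([0.0, 4.0], [1.0, 1.0]))
```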

For continuous outcomes, the means and standard deviations were required for meta-analysis. However, in cases where standard deviations were not reported, the standard error was calculated if the p-values or 95% confidence intervals were reported, and meta-analysis was undertaken with the mean and standard error using the generic inverse variance method in Cochrane Review Manager (RevMan5) software. When the only evidence came from studies that summarised results by presenting only medians (and interquartile ranges) or only p values, this information was included in the GRADE tables without calculating the relative and absolute effect. Consequently, imprecision of effect could not be assessed when results were not presented in the studies as means and standard deviations.
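One common way to recover a standard error from a reported confidence interval or p-value is sketched below. It assumes a normal approximation and an estimate on an additive scale such as a mean difference; small studies would normally call for a t-distribution instead. The function names are hypothetical and the guideline does not state which exact formula was applied.

```python
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Standard error implied by a symmetric confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (upper - lower) / (2.0 * z)


def se_from_p(estimate, p_value):
    """Standard error implied by a two-sided p-value for the estimate."""
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return abs(estimate) / z


# Hypothetical reported values: a 95% CI of 1.0 to 5.0, and p = 0.05.
print(se_from_ci(1.0, 5.0))
print(se_from_p(1.96, 0.05))
```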

For binary outcomes, absolute event rates were also calculated using the GRADEpro software using event rate in the control arm of the pooled results.
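The arithmetic behind an absolute effect can be illustrated as follows, assuming the usual convention of applying the pooled risk ratio to the control arm event rate and expressing the difference per 1000 people. The function name and figures are hypothetical, and this is not a reproduction of GRADEpro itself.

```python
def absolute_effect_per_1000(control_events, control_total, risk_ratio):
    """Difference in events per 1000 people implied by a pooled risk ratio."""
    control_risk = control_events / control_total
    intervention_risk = control_risk * risk_ratio
    return 1000.0 * (intervention_risk - control_risk)


# Hypothetical pooled control arm: 30 events in 200 people, pooled RR of 0.8.
# A negative value means fewer events per 1000 with the intervention.
print(round(absolute_effect_per_1000(30, 200, 0.8)))
```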

The results from cross over studies were combined in a meta-analysis with those from parallel randomised trials only after corrections had been made to the standard error for the crossover trials.

4.3.3. Type of studies

Systematic reviews, double blinded, single blinded and unblinded parallel randomised controlled trials (RCTs) and cross over randomized studies were included in the evidence reviews for this guideline.

We included randomised trials, as they are considered the most robust type of study design that could produce an unbiased estimate of the intervention effects. The GDG believed that the reason why no large trials were found for this population was largely because stroke units are relatively new and prior to their formation it has not been possible to conduct large multi-centre RCTs.

We also searched for systematic reviews of cohort studies; however, none was found for any review question. The GDG decided not to include individual cohort studies. Cohort studies have been based in rehabilitation units where there are mixed population groups, and extracting stroke data from those mixed populations would be challenging. Preliminary searches did not find any large cohort studies; therefore the GDG agreed that individual cohort studies would not provide any added value to the reviews of individual interventions.

For most of the reviews, the content of interventions and the populations within the included studies were found to be very diverse, making the extraction of relevant data challenging and time consuming. In addition, the GDG had difficulties in drawing overall conclusions on the body of evidence presented and it was often not possible to make recommendations specifying what interventions should comprise. In these instances, the GDG decided that the results of each outcome should be presented separately for each study and that a meta-analysis could not be conducted. Due to the diversity of interventions, it was decided to include a summary table of included studies with their individual characteristics (population, intervention, control, outcomes) at the beginning of each evidence review.

4.3.4. Type of analysis

Estimates of effect from individual studies were based on Intention To Treat (ITT) analysis, with the exception of the outcome of experience of adverse events, for which we used Available Case Analysis (ACA). In ITT analysis, all participants included in the randomisation process are considered in the final analysis according to the intervention and control groups to which they were originally assigned. We assumed that participants in the trials lost to follow-up did not experience the outcome of interest (for categorical outcomes) and that they would not considerably change the average scores of their assigned groups (for continuous outcomes).
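For a categorical outcome, the difference between the two approaches comes down to the denominator. The sketch below uses hypothetical function names and counts and follows the assumption stated above that participants lost to follow-up did not experience the event.

```python
def itt_risk(events, randomised):
    """ITT risk: everyone randomised is in the denominator, and people lost
    to follow-up are assumed not to have had the event."""
    return events / randomised


def available_case_risk(events, followed_up):
    """ACA risk: only participants with observed outcome data are counted."""
    return events / followed_up


# Hypothetical arm: 100 randomised, 80 followed up, 10 observed events.
print(itt_risk(10, 100))
print(available_case_risk(10, 80))
```

With the same number of observed events, the ITT risk can never exceed the available case risk, since its denominator is at least as large.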

It is important to note that ITT analyses tend to bias the results towards no difference. ITT analysis is a conservative approach to analyse the data, and therefore the effect may be smaller than in reality.

However, the majority of outcomes selected for review were continuous, very few people dropped out, and most of the studies reported data on an ITT basis.

4.3.5. Appraising the quality of evidence by outcomes

The evidence for outcomes from the included RCTs was evaluated and presented using an adaptation of the ‘Grading of Recommendations Assessment, Development and Evaluation (GRADE) toolbox’ developed by the international GRADE working group (http://www.gradeworkinggroup.org/). The software (GRADEpro) developed by the GRADE working group was used to assess the quality of each outcome, taking into account individual study quality and the meta-analysis results. The summary of study characteristics and findings was presented in one table in this guideline. The “Clinical/Economic Study Characteristics” table includes details of the quality assessment, while the “Clinical/Economic Summary of Findings” table includes pooled outcome data and, where appropriate, an absolute measure of intervention effect and the summary of quality of evidence for that outcome. In this table, the columns for intervention and control show the total sample size for continuous outcomes. For binary outcomes such as number of patients with an adverse event, the event rates (n/N: number of patients with events divided by sum of number of patients) are shown with percentages. Reporting or publication bias was only taken into consideration in the quality assessment and included in the Clinical Study Characteristics table if it was apparent.

Each outcome was examined separately for the quality elements listed and defined in Table 1 and each graded using the quality levels listed in Table 2. The main criteria considered in the rating of these elements are discussed below (see section 4.3.6 Grading of Evidence). Footnotes were used to describe reasons for grading a quality element as having serious or very serious problems. The ratings for each component were summed to obtain an overall assessment for each outcome.

Table 1: Description of quality elements in GRADE for intervention studies.

Table 2: Levels of quality elements in GRADE.

Table 3: Overall quality of outcome evidence in GRADE.

The GRADE toolbox is currently designed only for randomised trials and observational studies.

4.3.6. Grading the quality of clinical evidence

After results were pooled, the overall quality of evidence for each outcome was considered. The following procedure was adopted when using GRADE:

1. A quality rating was assigned, based on the study design. RCTs start as HIGH, observational studies as LOW, and uncontrolled case series as LOW or VERY LOW.
2. The rating was then downgraded for the specified criteria: study limitations, inconsistency, indirectness, imprecision and reporting bias. These criteria are detailed below. Observational studies were upgraded if there was a large magnitude of effect, dose-response gradient, and if all plausible confounding would reduce a demonstrated effect or suggest a spurious effect when results showed no effect. Each quality element considered to have ‘serious’ or ‘very serious’ risk of bias was rated down 1 or 2 points respectively.
3. The downgraded/upgraded marks were then summed and the overall quality rating was revised. For example, all RCTs started as HIGH and the overall quality became MODERATE, LOW or VERY LOW if 1, 2 or 3 points were deducted respectively.
4. The reasons or criteria used for downgrading were specified in the footnotes.
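The points arithmetic in this procedure can be sketched as follows. This is an illustration under stated assumptions rather than the GRADEpro logic: the function and variable names are hypothetical, uncontrolled case series are left out because their starting level is not fixed, and the result is simply clamped at VERY LOW and HIGH.

```python
LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]

# Starting level by study design, as an index into LEVELS.
START = {"rct": 3, "observational": 1}


def overall_quality(design, downgrades, upgrades=0):
    """Overall GRADE rating for one outcome.

    downgrades: dict mapping a quality element to points deducted
                (1 for 'serious', 2 for 'very serious').
    upgrades:   points added (assumed here to apply to observational studies).
    """
    points = START[design] - sum(downgrades.values()) + upgrades
    points = max(0, min(points, len(LEVELS) - 1))
    return LEVELS[points]


# Hypothetical outcome from RCTs with serious study limitations.
print(overall_quality("rct", {"study limitations": 1}))
```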

The details of the criteria used for each of the main quality elements are discussed further in the following sections 4.3.7 to 4.3.10.

4.3.7. Study limitations

The main limitations for randomised controlled trials are listed in Table 4.

Table 4: Study limitations of randomised controlled trials.

Outcomes from studies which were not double blinded were downgraded on study limitations due to the higher risk of bias. The GDG expressed their concern that conducting double blinded trials in stroke rehabilitation was not practical, as the nature of the interventions delivered in stroke rehabilitation makes it impossible to blind trial participants. Nevertheless, single blinded and unblinded trials were downgraded to maintain a consistent approach to quality rating across the guideline following the application of the GRADE system, recognising that a double blinded trial would provide the least biased outcomes in a clinical setting.

4.3.8. Inconsistency

Inconsistency refers to an unexplained heterogeneity of results. When estimates of the treatment effect across studies differ widely (i.e. heterogeneity or variability in results), this suggests true differences in underlying treatment effect. When heterogeneity existed (Chi-squared p<0.1 or I-squared inconsistency statistic of >50%), but no plausible explanation could be found (for example acute or chronic stroke populations, duration of intervention, different follow-up periods), the quality of evidence was downgraded by one or two levels, depending on how much uncertainty the inconsistency contributed to the results. Due to the diversity of interventions used in the included trials for this guideline, there were cases where the GDG believed the presentation of evidence should be kept separate, and explanatory footnotes were given in GRADE tables where appropriate. In addition to the I-squared and Chi-squared values, the decision for downgrading was also dependent on factors such as whether the intervention is associated with benefit in all other outcomes or whether the uncertainty about the magnitude of benefit (or harm) of the outcome showing heterogeneity would influence the overall judgment about net benefit or harm (across all outcomes).

If inconsistency could be explained based on pre-specified subgroup analysis, the GDG took this into account and considered whether to make separate recommendations based on the identified explanatory factors, i.e. population and intervention. Where subgroup analysis gives a plausible explanation of heterogeneity, the quality of evidence would not be downgraded. The most common factor of subgroup analysis was the time since stroke event and the GDG considered the evidence of some outcomes separately for acute and chronic stroke patients.
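
As an illustration of the heterogeneity thresholds described in this section, the sketch below computes Cochran's Q and the I-squared statistic from study effect estimates and their standard errors, then applies the two cut-offs quoted in the text. It assumes inverse-variance (fixed-effect) weights, and the Chi-squared p-value is passed in rather than computed so that the example needs only the standard library. The function names are hypothetical and this is not the software used for the guideline.

```python
def cochran_q_i2(effects, std_errors):
    """Return (Q, degrees of freedom, I-squared in percent).

    Uses inverse-variance weights, which is an assumption of this sketch.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is the share of Q in excess of its degrees of freedom,
    # truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


def heterogeneity_present(i2_percent, chi2_p_value):
    """Apply the thresholds quoted in the text: p < 0.1 or I-squared > 50%."""
    return chi2_p_value < 0.1 or i2_percent > 50.0


# Two hypothetical studies with effects 0.0 and 2.0 and equal standard errors.
q, df, i2 = cochran_q_i2([0.0, 2.0], [0.5, 0.5])
print(q, df, i2)
```

Whether flagged heterogeneity leads to a downgrade of one or two levels remained a GDG judgement, so the sketch stops at the flag.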

4.3.9. Indirectness

Directness refers to the extent to which the populations, interventions, comparisons and outcome measures are similar to those defined in the inclusion criteria for the reviews. Indirectness is important when these differences are expected to contribute to a difference in effect size, or may affect the balance of harms and benefits considered for an intervention. The GDG decided that for specific questions (for example the review of the clinical and cost effectiveness of interventions to aid return to work) the review of evidence could include mixed populations with at least 50% stroke patients.

4.3.10. Imprecision

The sample size, event rates, the resulting width of confidence intervals and the minimal important difference in the outcome between the two groups were the main criteria considered.

The thresholds of important benefits or harms, or the MID (minimal important difference) for an outcome are important considerations for determining whether there is a “clinically important” difference between intervention and control groups and in assessing imprecision. For continuous outcomes, the MID is defined as “the smallest difference in score in the outcome of interest that informed patients or informed proxies perceive as important, either beneficial or harmful, and that would lead the patient or clinician to consider a change in the management”⁹⁸,¹²⁴,²³¹,²³². An effect estimate larger than the MID is considered to be “clinically important”. For dichotomous outcomes, the MID is considered in terms of changes of absolute risk.

The difference between two interventions, as observed in the studies, was compared against the MID when considering whether the findings were of “clinical importance”; this is useful to guide decisions. For example, if the effect was small (less than the MID), this finding suggests that there may not be enough difference to strongly recommend one intervention over the other based on that outcome.

We searched the literature for published studies which gave a minimal important difference point estimate for the outcomes specified in the protocol, and agreement was obtained from the GDG for their use in assessing imprecision throughout the reviews in the guideline. Table 5 presents the MID thresholds used for the specified outcomes and the source of base evidence. Where no published studies were found on MIDs for outcomes, the default GRADEpro MIDs were used. For categorical data, we checked whether the confidence interval of the effect crossed one or two ends of the range of 0.75–1.25. For quantitative outcomes two approaches were used. When only one trial was included as the evidence base for an outcome, the mean difference was converted to the standardized mean difference (SMD) and checked to see if the confidence interval crossed 0.5. However, the mean difference (95% confidence interval) was still presented in the GRADE tables. If two or more included trials reported a quantitative outcome then the default approach of multiplying 0.5 by the standard deviation (taken as the median of the standard deviations across the meta-analysed studies) was employed. When the default MIDs were used, the GDG would assess the estimate of effect with respect to the MID, and then the imprecision may be reconsidered.
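
The two default checks described above can be sketched as follows. The function names are hypothetical, and the treatment of an interval limit that falls exactly on a threshold as not crossing it is an assumption of this sketch.

```python
from statistics import median


def default_continuous_mid(standard_deviations):
    """Default MID for a meta-analysed continuous outcome: 0.5 x median SD."""
    return 0.5 * median(standard_deviations)


def ends_crossed(ci_lower, ci_upper, thresholds=(0.75, 1.25)):
    """Count how many ends of the default range lie strictly inside the CI."""
    return sum(1 for t in thresholds if ci_lower < t < ci_upper)


# Hypothetical standard deviations from three meta-analysed trials.
print(default_continuous_mid([10, 12, 20]))

# Hypothetical relative-effect confidence intervals.
print(ends_crossed(0.80, 1.10))
print(ends_crossed(0.60, 1.10))
print(ends_crossed(0.60, 1.40))
```

The count returned by `ends_crossed` corresponds to the "one or two ends" wording in the text; how that count translates into a downgrade was decided in the GRADE assessment and is not encoded here.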

Table 5

Agreed MIDs from the literature.

The confidence interval for the pooled or best estimate of effect was considered in relation to the MID, as illustrated in Figure 1. Essentially, if the confidence interval crossed the MID threshold, there was uncertainty in the effect estimate in supporting our recommendation (because the CI was consistent with two decisions) and the effect estimate was rated as imprecise.

Figure 1

Illustration of precise and imprecise outcomes based on the confidence interval of outcomes in a forest plot. Source: Figure adapted from GRADEpro software.

MID = minimal important difference determined for each outcome. The MIDs are the threshold for appreciable benefits and harms. The confidence intervals of the top three points of the diagram were considered precise because the upper and lower limits did not cross the MID. Conversely, the bottom three points of the diagram were considered imprecise because all of them crossed the MID and reduced our certainty of the results.
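
For a mean difference, the rule illustrated in Figure 1 reduces to checking whether the confidence interval contains either MID threshold. The sketch below assumes a line of no effect at zero with symmetric thresholds at -MID and +MID. The function name and the numerical values are hypothetical.

```python
def precision_label(ci_lower, ci_upper, mid):
    """Label an estimate 'imprecise' if its CI crosses -MID or +MID."""
    crosses = any(ci_lower < threshold < ci_upper
                  for threshold in (-mid, mid))
    return "imprecise" if crosses else "precise"


# Hypothetical mean differences with an MID of 5 points.
print(precision_label(6.0, 9.0, mid=5.0))    # CI entirely beyond +MID
print(precision_label(-2.0, 3.0, mid=5.0))   # CI entirely between the MIDs
print(precision_label(2.0, 8.0, mid=5.0))    # CI spans +MID
```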

4.4. Evidence of cost-effectiveness

The Guideline Development Group (GDG) is required to make decisions based on the best available evidence of both clinical and cost effectiveness. Guideline recommendations should be based on the estimated costs of the treatment options in relation to their expected health benefits (that is, their ‘cost effectiveness’), rather than on the total cost or resource impact of implementing them. Thus, if the evidence suggests that an intervention provides significant health benefits at an acceptable cost per patient treated, it should be recommended even if it would be expensive to implement across the whole population.

Evidence on cost effectiveness related to the key clinical issues being addressed in the guideline was sought. The health economist undertook:

A systematic review of the published economic literature.
New cost-effectiveness analysis in priority areas.

When no relevant published studies were found, and a new analysis was not prioritised, the GDG made a qualitative judgement about cost effectiveness by considering expected differences in resource use between comparators and relevant UK NHS unit costs alongside the results of the clinical review of effectiveness evidence. Where considered useful, this included calculation of expected cost differences and consideration of the QALY gain that would be required to justify the expected additional cost of the intervention being considered. Unit costs were based on published national sources where available. Staff costs are reported using the typical salary band of someone delivering the intervention, as identified by clinical GDG members. It should be noted, however, that in practice staff bands will vary due to the need for a skill mix across teams. Inputs to calculations should not be interpreted as recommendations about who should deliver care.
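
The "QALY gain that would be required" calculation mentioned above is a simple threshold analysis. A minimal sketch follows, assuming the £20,000 per QALY threshold given in section 4.4.3. The function name and the cost figures are hypothetical.

```python
THRESHOLD_GBP_PER_QALY = 20_000  # threshold quoted in section 4.4.3


def required_qaly_gain(cost_intervention, cost_comparator,
                       threshold=THRESHOLD_GBP_PER_QALY):
    """QALY gain per patient at which the extra cost meets the threshold."""
    incremental_cost = cost_intervention - cost_comparator
    if incremental_cost <= 0:
        return 0.0  # no additional cost to justify
    return incremental_cost / threshold


# Hypothetical per-patient costs in pounds sterling.
print(required_qaly_gain(1500, 500))
```

With these hypothetical inputs, an additional cost of £1,000 per patient would need a gain of about 0.05 QALYs per patient to reach £20,000 per QALY gained.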

4.4.1. Literature review

The health economist:

Identified potentially relevant studies for each review question from the economic search results by reviewing titles and abstracts – full papers were then obtained.
Reviewed full papers against pre-specified inclusion/exclusion criteria to identify relevant studies (see below for details).
Critically appraised relevant studies using the economic evaluations checklist as specified in The Guidelines Manual¹⁸⁷.
Extracted key information about the study’s methods and results into evidence tables (evidence tables are included in Appendix H).
Generated summaries of the evidence in NICE economic evidence profiles (included in the relevant chapter write-ups) – see below for details.

4.4.1.1. Inclusion/exclusion

Full economic evaluations (studies comparing costs and health consequences of alternative courses of action: cost–utility, cost-effectiveness, cost-benefit and cost-consequence analyses) and comparative costing studies that addressed the review question in the relevant population were considered potentially applicable as economic evidence.

Studies that only reported cost per hospital (not per patient), or only reported average cost effectiveness, without disaggregated costs and effects, were excluded. Abstracts, posters, reviews, letters/editorials, foreign language publications and unpublished studies were excluded. Studies judged to have an applicability rating of ‘not applicable’ were excluded (this included studies that took the perspective of a non-OECD country).

Remaining studies were prioritised for inclusion based on their relative applicability to the development of this guideline and the study limitations. For example, if a high quality, directly applicable UK analysis was available other less relevant studies may not have been included. Where exclusions occurred on this basis, this is noted in the relevant section.

For more details about the assessment of applicability and methodological quality see the economic evaluation checklist (The Guidelines Manual, Appendix H ¹⁸⁷) and the health economics research protocol in Appendix E.

4.4.1.2. NICE economic evidence profiles

The NICE economic evidence profile has been used to summarise cost and cost-effectiveness estimates. The economic evidence profile shows, for each economic study, an assessment of applicability and methodological quality, with footnotes indicating the reasons for the assessment. These assessments were made by the health economist using the economic evaluation checklist from The Guidelines Manual, Appendix H ¹⁸⁷. It also shows incremental costs, incremental effects (for example, QALYs) and the incremental cost-effectiveness ratio from the primary analysis, as well as information about the assessment of uncertainty in the analysis. See Table 6 for more details.

Table 6

Content of NICE economic profile.

If a non-UK study was included in the profile, the results were converted into pounds sterling using the appropriate purchasing power parity¹⁹⁴.
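
A conversion of this kind can be sketched as below, assuming each purchasing power parity is expressed as national currency units per common reference unit. The function name and the rates shown are hypothetical placeholders, not values from the cited source, and any adjustment for cost year is outside the sketch.

```python
def convert_to_gbp(cost_local, ppp_local, ppp_uk):
    """Convert a cost to pounds sterling using purchasing power parities.

    ppp_local and ppp_uk are national currency units per common
    reference unit (an assumption of this sketch).
    """
    return cost_local * ppp_uk / ppp_local


# Hypothetical rates: 1.0 local units and 0.7 pounds per reference unit.
print(convert_to_gbp(1000, ppp_local=1.0, ppp_uk=0.7))
```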

4.4.2. Undertaking new health economic analysis

As well as reviewing the published economic literature for each review question, as described above, new economic analysis was undertaken by the health economist in selected areas. Priority areas for new health economic analysis were agreed by the GDG after formation of the review questions and consideration of the available health economic evidence.

The GDG identified intensity of rehabilitation as the highest priority area for an original economic model. This issue impacts the largest group of people in the guideline as it relates to the whole population rather than a specific subset. In addition, the GDG considered that the intensity of rehabilitation provided currently varies considerably from service to service in terms of hours per day and duration of therapy, and it is generally lower than that currently recommended in the NICE quality standard for ongoing rehabilitation. Therefore recommendations in this area were considered likely to have the biggest impact on NHS resources and patient outcomes.

The following general principles were adhered to in developing the cost-effectiveness analysis:

Methods were consistent with the NICE reference case¹⁸⁵.
The GDG was consulted during the construction and interpretation of the model.
Model inputs were based on the systematic review of the clinical literature supplemented with other published data sources where possible.
When published data were not available, expert opinion was used to populate the model.
Model inputs and assumptions were reported fully and transparently.
The results were subject to sensitivity analysis and limitations were discussed.
The model was peer-reviewed by another health economist at the NCGC.

Full methods for the intensity of rehabilitation cost effectiveness analysis are described in Appendix K.

4.4.3. Cost-effectiveness criteria

NICE’s report ‘Social value judgements: principles for the development of NICE guidance’ sets out the principles that GDGs should consider when judging whether an intervention offers good value for money¹⁸⁶,¹⁸⁷.

In general, an intervention was considered to be cost effective if either of the following criteria applied (given that the estimate was considered plausible):

The intervention dominated other relevant strategies (that is, it was both less costly in terms of resource use and more clinically effective compared with all the other relevant alternative strategies), or
The intervention cost less than £20,000 per quality-adjusted life-year (QALY) gained compared with the next best strategy.

If the GDG recommended an intervention that was estimated to cost more than £20,000 per QALY gained, or did not recommend one that was estimated to cost less than £20,000 per QALY gained, the reasons for this decision are discussed explicitly in the ‘from evidence to recommendations’ section of the relevant chapter with reference to issues regarding the plausibility of the estimate or to the factors set out in the ‘Social value judgements: principles for the development of NICE guidance’¹⁸⁶.

If a study reported the cost per life year gained but not QALYs, the cost per QALY gained was estimated by multiplying the life years gained by an appropriate utility estimate, to aid interpretation. The estimated cost per QALY gained is reported in the economic evidence profile with a footnote detailing the life-years gained and the utility value used. When QALYs or life years gained are not used in the analysis, results are difficult to interpret unless one strategy dominates the others with respect to every relevant health outcome and cost.
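
The decision rule in this section can be sketched for a simple pairwise comparison. The sketch assumes two strategies only, whereas the text refers to comparison with the next best strategy among possibly several, and it assumes QALYs are approximated as life years gained multiplied by a utility value. Function names and figures are hypothetical.

```python
def cost_per_qaly_from_life_years(incremental_cost, life_years_gained,
                                  utility):
    """Approximate cost per QALY when only life years gained are reported."""
    qalys_gained = life_years_gained * utility
    return incremental_cost / qalys_gained


def is_cost_effective(incremental_cost, incremental_qalys, threshold=20_000):
    """Apply the two criteria from the text to a pairwise comparison."""
    if incremental_cost < 0 and incremental_qalys > 0:
        return True  # dominant: less costly and more effective
    if incremental_qalys <= 0:
        return False  # no health gain; not covered by either criterion
    return incremental_cost / incremental_qalys < threshold


# Hypothetical study: 10,000 pounds extra for 1.0 life year at utility 0.5.
print(cost_per_qaly_from_life_years(10_000, 1.0, 0.5))
print(is_cost_effective(1_000, 0.1))
print(is_cost_effective(3_000, 0.1))
```

The sketch returns False whenever there is no QALY gain, including for a cheaper but less effective option; the text does not set out a rule for that case, and it would need separate consideration.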

4.5. Post consultation protocol including modified Delphi methodology

During consultation, substantial stakeholder comments were received which highlighted a number of significant issues in relation to the guideline scope and recommendations developed in the guideline. Stakeholders raised concerns that the guideline was incomplete because of the number of areas in the rehabilitation patient care pathway that the guideline had not covered, and this may result in therapies and services for the stroke population being reduced or even withdrawn. The areas identified in the consultation period included:

service delivery, roles and responsibility of the multidisciplinary team/stroke rehabilitation services
holistic assessment, care planning, goal setting, ongoing review and monitoring
transfer of care/discharge planning and interface with social care
long-term health and social support for people after stroke and patient information needs

Stakeholders also considered that some topics included in the scope had not been addressed adequately, including mood disorders (depression and anxiety), physical fitness and exercise, other speech and language therapies and diplopia.

The focus of the outcomes for the interventions included in the guideline has been on function and mobility as these were considered by the Guideline Development Group (GDG) to have the biggest impact on patients’ lives. However many stakeholders considered that the patient experience and holistic approaches to care had been neglected and represented a major gap in the guidance. In light of the comments received from stakeholders, the GDG agreed that additional work should be carried out for some of these areas or reference made to other NICE guidance, in order to produce a more complete piece of guidance that would be useful to health professionals delivering rehabilitation to a stroke population. The current guidance has followed standard NICE methodology and the GDG were in agreement that for those areas where either weak or no evidence was available a robust process needed to be followed.

In consultation with NICE and the GDG the NCGC technical team conducted additional work to address the areas identified by stakeholders and not covered in the original scope. Comprehensive searches of databases with terms designed to identify evidence related to the topics outlined above were undertaken following the NICE process but restricted to retrieve other guidelines and systematic reviews only. In addition a similar scoping search was done for economic evidence relating to the same areas. The search strategy was limited to capture only economic evaluations. A first sift was undertaken to identify potentially relevant economic papers related to the topics listed above.

Reviews of the clinical and economic literature were undertaken following the usual NICE process and presented to the GDG who used this evidence as a basis to make further recommendations.

Where recommendations in other NICE guidance were relevant to the stroke population and addressed comments highlighted by stakeholders, cross reference to these was made rather than undertaking further original work.

Relevant guidelines identified from the comprehensive search were quality assessed using the AGREE II tool checklist. Those of sufficient quality were reviewed for recommendations relating to the topics identified in the stakeholder consultation.

The full protocol can be found in Appendix B.

Modified Delphi consensus methodology

As the evidence base was weak or absent for many of the areas stakeholders wished the guideline to include, a different methodology was required. This was seen as necessary since it would provide a robust process to enable the GDG to make further recommendations. Where there was a lack of published evidence the NCGC technical team used a modified Delphi method (an anonymous, multi-round, consensus-building technique) based on other available guidelines or expert opinion. This type of survey has been used successfully for generating, analysing and synthesising expert views to reach a group consensus position. The technique uses sequential questionnaires to solicit individual responses, with the potential threat of peer pressure removed⁹⁵. This is an important consideration and is a key strength of the technique. Strauss and Ziegler’s²⁴⁹ (1975) seminal work on the technique highlights its features:

Enables the effective use of a panel of experts
Data is generated through sequential questioning
Highlights consensus and divergent opinion
Anonymity is guaranteed
It handles judgemental data effectively

In NICE processes, little or no evidence for reviews is an exceptional circumstance in which formal consensus techniques (such as the Delphi method) can be adopted¹⁸⁷. The methods and process proposed were discussed with methodological advisers within NICE and the protocol was agreed and signed off by them prior to work being carried out.

Delphi statements were distilled from the content of existing national and international stroke rehabilitation guidelines. The identified guidelines were quality assured by two research fellows using the Appraisal of Guidelines for Research and Evaluation (AGREE-II) instrument as described in Appendix F. The relevant sections of the guidelines were summarised (noting whether the recommendations were based on consensus or evidence) and these summaries were used as the basis for draft statements. Statements were then discussed and revised with two external experts recruited to act as consultants in the development of the survey statements. A table with the relevant guideline sections and first draft statements can be found in Appendix F.

The Delphi panel comprised stroke rehabilitation clinicians and other professionals with significant experience in stroke rehabilitation, covering a wide range of disciplines involved in stroke care. Members of the panel were identified by means of nomination by the GDG, and these were then collated and reviewed by the chair of the GDG and the RCP Intercollegiate Stroke Working Party and, after removal of duplicates, inspected for representativeness. In the first instance 164 experts were contacted and invited to participate. The professions comprised: geriatricians, neurologists, nurses, occupational therapists, people from patient representation/organisations, physiotherapists, psychologists, research/policy makers, social workers, speech and language therapists, stroke physicians and ‘other’ health care professionals (for example orthoptists, dieticians, GPs and pharmacists).

A survey, consisting of 68 statements plus 3 demographic questions (profession, setting, and geographic area), was then circulated to the Delphi panel. Free text boxes were available for panel comments; these were then evaluated and used to revise and refine statements if necessary. This process was carried out in conjunction with the consultant experts as well as the Chair of the guideline. The results from each round were summarised and then communicated to participants. Four rounds of the survey were undertaken in total. For the majority of statements (plus demographics), a Likert scale was applied to indicate the level of agreement. Some statements employed multiple choice options. A four option Likert scale was used: strongly disagree, disagree, agree and strongly agree. The purpose of using a four point scale was to be consistent for Delphi panel members who may have been familiar with both the size of scale and terms used to support Delphi processes from previous consensus work in Stroke Care. In published literature about Delphi methodology there has been much debate about what percentage of agreement among Delphi panel members constitutes consensus (see Murphy et al’s 1998 Health Technology Assessment¹⁸¹ on this subject). While there is no universal agreement or guidelines on the level of consensus, Keeney et al. (2011)¹³⁵ suggested that researchers should decide on the consensus level before commencing the study and consider using a high level of consensus, such as 70%.

In line with Keeney et al. (2011)¹³⁵ a level of 70% or higher of participants ‘strongly agreeing’ was set for rounds 1 and 2, with this threshold for consensus being reviewed in rounds 3 and 4. In analysing the data, and in understanding the difficulty of reaching consensus in the latter rounds where iteration had featured, a decision was reached by the technical team to lower the threshold marginally to 67% ‘strongly agree’ as long as the majority of other participant responses were ‘agree’. For every item to which this approach was applied in the latter rounds, more than 90% of participants responded either ‘strongly agree’ (at least 67% of total participant response) or ‘agree’. This was a pragmatic response by the technical team and meets published criteria that consensus is achieved when 66.6% of a Delphi panel agrees. Statements that reached these levels did not feature in the next round. Statements that did not reach this level were reviewed by the technical team with the GDG chair and expert consultants and were amended based on the panel’s comments in the survey. When there were low levels of disagreement, some statements were re-included in the next round without being edited. With already low levels of disagreement it was felt that re-inclusion of these statements would encourage panel members who ‘agreed’ to shift to a ‘strongly agree’ response. This procedure of re-evaluation continued until either the consensus rate was achieved or until the Delphi panel members no longer modified their previous estimates/responses (or comments). In summary, when both the level of agreement and the type of comments no longer changed it was agreed that a further round would not achieve consensus. The comments that illustrated these differences in opinions, or comments that showed agreement but no longer changed, were then highlighted in the final Delphi report.
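
The consensus thresholds described above can be expressed as a short check. This is an illustrative reading of the rule rather than the analysis actually run by the technical team: it interprets "the majority of other participant responses" as more than half of the responses that were not ‘strongly agree’, and the function name and response counts are hypothetical.

```python
from collections import Counter


def reaches_consensus(responses, survey_round):
    """Return True if a statement meets the threshold for the given round.

    responses    -- list of Likert answers: "strongly disagree", "disagree",
                    "agree" or "strongly agree"
    survey_round -- 1 to 4
    """
    counts = Counter(responses)
    total = len(responses)
    if total == 0:
        return False
    strongly_share = counts["strongly agree"] / total
    if survey_round <= 2:
        return strongly_share >= 0.70
    # Rounds 3 and 4: lowered threshold, provided most of the remaining
    # responses are 'agree'.
    others = total - counts["strongly agree"]
    majority_agree = others == 0 or counts["agree"] > others / 2
    return strongly_share >= 0.67 and majority_agree


# Hypothetical panel of 100 responses to one statement.
answers = ["strongly agree"] * 68 + ["agree"] * 25 + ["disagree"] * 7
print(reaches_consensus(answers, survey_round=1))
print(reaches_consensus(answers, survey_round=3))
```

In this hypothetical case the statement misses the 70% threshold in round 1 but meets the lowered threshold in round 3, because 68% strongly agree and most of the remaining responses are ‘agree’.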

There is no complete agreement about when to terminate a Delphi survey, and one researcher has stated ‘if no consensus emerges, at least a crystallizing of the disparate positions usually becomes apparent’ (Gordon, 1971)⁹⁷.

Since there was an over-representation of physiotherapists in the Delphi panel, responses were inspected by profession in the analysis. There were no systematic differences in physiotherapists’ responses compared to those of other professions. Hence further details of responses per profession were not included in the report. However, in the GDG meeting in which recommendations were drafted from the Delphi statements, GDG members were informed about the Delphi panel composition and asked to consider this in their discussion of the statements.

The full report was circulated to the GDG. The consensus statements emerging from the iterative modified Delphi technique were presented to the GDG and formed the basis of discussion. The economic search results were rechecked to see if there were any economic analyses relating to areas where new recommendations had been made. Since no economic evaluations were found on the new areas of the guideline, the GDG made a qualitative judgement about the cost effectiveness of the interventions they wanted to recommend based on the Delphi statements. Economic considerations were drafted for all those new recommendations where economic implications were deemed important.

A summary of the areas that are addressed in the post consultation process and the type of evidence identified is provided in Table 7 below.

Table 7

Summary of post-consultation topics and level of evidence identified (consensus refers to those areas that will be covered by the modified Delphi).

The GDG formulated new recommendations based on the consensus statements. The full Delphi report is in Appendix F.

4.6. Developing recommendations

Over the course of the guideline development process, the GDG was presented with:

Evidence tables of the clinical and economic evidence reviewed from the literature. All evidence tables are in Appendices H and I.
Summary of clinical and economic evidence and quality (as presented in chapters 7 to 17).
Forest plots (Appendix J).
A description of the methods and results of the cost-effectiveness analysis undertaken for the guideline (Appendix K).

Recommendations were drafted on the basis of the GDG interpretation of the available evidence, taking into account the balance of benefits, harms and costs. When clinical and economic evidence was of poor quality, conflicting or absent, the GDG drafted recommendations based on their expert opinion. The considerations for making informal consensus based recommendations include the balance between potential harms and benefits, economic implications compared to the benefits, current practices, recommendations made in other relevant guidelines, patient preferences and equality issues. The informal consensus recommendations were reached through discussions in the GDG. The GDG may also consider whether the uncertainty is sufficient to justify delaying making a recommendation to await further research, taking into account the potential harm of failing to make a clear recommendation (see Appendix L).

The main considerations specific to each recommendation are outlined in the ‘Recommendations and link to evidence’ sections within each chapter.

4.6.1. Research recommendations

When areas were identified for which good evidence was lacking, the guideline development group considered making recommendations for future research. Decisions about inclusion were based on factors such as:

the importance to patients or the population
national priorities
potential impact on the NHS and future NICE guidance
ethical and technical feasibility

4.6.2. Validation process

The guidance is subject to an eight week public consultation and feedback as part of the quality assurance and peer review of the document. All comments received from registered stakeholders are responded to in turn and posted on the NICE website when the pre-publication check of the full guideline occurs. Based on comments from the stakeholders during this consultation, further areas were identified where guidance was needed in order to address the patient pathway more comprehensively. For this reason a ‘post consultation’ protocol was drawn up and agreed with NICE (see section 4.5). A second consultation was then held after this extended development period.

4.6.3. Updating the guideline

Following publication, and in accordance with the NICE guidelines manual, NICE will ask a National Collaborating Centre or the National Clinical Guideline Centre to advise NICE’s Guidance executive on whether the evidence base has progressed significantly to alter the guideline recommendations and warrant an update.

4.6.4. Disclaimer

Health care providers need to use clinical judgement, knowledge and expertise when deciding whether it is appropriate to apply guidelines. The recommendations cited here are a guide and may not be appropriate for use in all situations. The decision to adopt any of the recommendations cited here must be made by the practitioners in light of individual patient circumstances, the wishes of the patient, clinical expertise and resources.

The National Clinical Guideline Centre disclaims any responsibility for damages arising out of the use or non-use of these guidelines and the literature used in support of these guidelines.

4.6.5. Funding

The National Clinical Guideline Centre was commissioned by the National Institute for Health and Clinical Excellence to undertake the work on this guideline.

Bookshelf ID: NBK327924
