NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Cook JA, Julious SA, Sones W, et al. Practical help for specifying the target difference in sample size calculations for RCTs: the DELTA2 five-stage study, including a workshop. Southampton (UK): NIHR Journals Library; 2019 Oct. (Health Technology Assessment, No. 23.60.)
Practical help for specifying the target difference in sample size calculations for RCTs: the DELTA2 five-stage study, including a workshop.
Target differences and the analysis of interest
Randomised controlled trial design begins with clarifying the research question and then developing a design to address it. The population, intervention, control, outcome and time frame (PICOT) framework has commonly been used for this purpose.21 All relevant aspects of trial design (PICOT) should reflect the research questions of interest. The process of determining the design needs to be informed by the perspective(s) of relevant stakeholders, which are discussed in the following section (see Perspectives on the target difference of interest). A key step in the process is the selection of the primary outcome, which, given its key role in trial design and its relationship with the target difference, is considered in its own section (see The primary outcome of a randomised controlled trial). This section focuses on the need for clarity about how the design and intended analysis address the trial objectives.
The need for greater clarity in trial objectives with respect to the design and analysis of a RCT has been noted.22 This reflects greater recognition that multiple intervention (or treatment) effects may be of interest, even for the same outcome. For example, we may be interested in the typical benefit a patient receives if they are given a treatment, but also in the benefit a patient receives if they comply fully with the treatment (e.g. they take their medication as prescribed for the full treatment period, without use of additional treatments). Treatment effects can differ subtly in terms of the population of interest, the role of additional treatment or ‘rescue’ medication, and how the effect is expressed. The concept of estimands has been proposed as a way to bring such distinctions to the fore. An estimand is a more precise specification of the comparison of interest being addressed. This thinking is reflected in a recent addendum to international regulatory guidelines for clinical trials of pharmaceuticals, which proposes five main strategies.23 Of particular note is the treatment policy strategy, which is consistent with what has often been described as an intention-to-treat (ITT)-based analysis.22–25 That is, the ITT analysis addresses the difference between a policy of offering treatment with a given therapy and a policy of offering treatment with a different therapy, regardless of which treatments are actually received. Different stakeholders can have somewhat differing perspectives on the comparison of interest and, therefore, on the estimand of primary interest.22 Methods of analysis that address estimands deviating from conventional analyses are an active area of research26 (see Appendix 3, Other topics of interest, for a brief consideration of causal inference methods for dealing with non-compliance).
The target difference used in the sample size calculation should be one that at least addresses the trial’s primary objective and, therefore, the intended estimand of primary interest (with the corresponding implications for the handling of the receipt of treatment and population of interest). In some cases, it may be appropriate to ensure that the sample size is sufficient for more than one estimand, which might imply multiple target differences to address all key objectives. Different estimands may focus on different populations or subpopulations. Estimands will differ in their implications for the magnitude of missing data anticipated (see Appendix 3, Dealing with missing data for binary and continuous outcomes for how missing data can be taken into account in the sample size calculation in simple scenarios). Whatever the estimand of interest, the target difference is a key input into the sample size calculation.
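As a minimal sketch of one simple approach (an assumption here, not necessarily the method described in Appendix 3), the number of participants needed for analysis can be inflated by the anticipated proportion of missing primary-outcome data. The function name inflate_for_missing is hypothetical, and the approach assumes a complete-case analysis with missingness unrelated to the outcome.

```python
import math


def inflate_for_missing(n_per_group: int, missing_fraction: float) -> int:
    """Hypothetical helper: inflate a per-group sample size so that roughly
    n_per_group participants with primary-outcome data remain after an
    anticipated fraction of participants is lost.

    Assumes a complete-case analysis and missingness unrelated to outcome.
    """
    if not 0 <= missing_fraction < 1:
        raise ValueError("missing_fraction must be in [0, 1)")
    # Divide by (1 - m); multiplying by (1 + m) would under-inflate
    return math.ceil(n_per_group / (1 - missing_fraction))


# Example: 64 analysable participants per group needed, 20% missing expected
print(inflate_for_missing(64, 0.20))  # 80
```

Dividing by (1 − m), rather than multiplying by (1 + m), is generally preferred, because multiplying leaves slightly fewer analysable participants than intended.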
Perspectives on the target difference of interest
Governmental/charity funder
Funders vary in the degree to which they will specify the research question. The primary concern is that the study provides value for money, by addressing a key research question in a robust manner and at reasonable cost to the funder’s stakeholders. This is typically an implicit consideration when the sample size and the target difference are determined. However, a very different approach, value of information (see Appendix 4), allows such wider considerations to be formally incorporated. The sample size calculation and the target difference, if well specified, provide reassurance that the trial will provide an answer to the primary research question, at least in terms of comparing the primary outcome between interventions. The specific criteria that proposals are invited to address, and are assessed against, vary among funders and individual schemes within a funder, as does the degree to which the research question may be a priori specified by the funder.
One particular aspect that varies substantially among funding schemes and funders is the extent to which they take into account the cost and cost-effectiveness of the interventions under consideration. Some funding schemes require costs to be considered from a particular perspective; this might be that of society as a whole or of the health system alone. Other schemes focus, to a greater or lesser extent, on clinical and patient perspectives alone.
All funders expect a RCT to have a sample size justification.27 Typically, although not necessarily, this would be via a sample size calculation, most commonly based on the specification of a target difference. The specified target difference would be expected to be one that is of interest to the funder’s stakeholders; these are typically patients and health professionals and, sometimes, the likely funder of the health care (e.g. the NHS in the UK). For industry-funded trials, the considerations are different; these are outlined in the next section (see Industry, payers and regulator).
The practical implications of an overly large trial are perhaps mostly financial (the funder has paid more than necessary to get an answer to the research question, and thus less is available for other trials). However, it is also ethically important to avoid more patients than necessary receiving a possibly suboptimal treatment, to avoid unnecessary burden on additional individuals and to avoid losing the opportunity to devote scarce research funds to other worthwhile research. What is and is not sufficient, in statistical and more general terms, is often very difficult to determine, except in extreme scenarios. Conversely, a trial that is too small is at risk of missing an important effect. The funder could also later use the target difference when evaluating (formally or informally) whether or not to close a study, on the basis of the probability (or lack thereof) of it providing a useful answer, if recruitment proves substantially slower than planned partway through the recruitment period.
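To illustrate how an undersized trial risks missing an effect, the sketch below computes approximate power for a two-arm comparison of means using the normal approximation. The function power_two_means and the example values (a target difference of 0.5 SD, two-sided α = 0.05, equal allocation) are illustrative assumptions, not figures from this report.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def power_two_means(n_per_group: int, delta: float, sd: float,
                    alpha: float = 0.05) -> float:
    """Hypothetical helper: approximate power of a two-sided comparison of
    two means (normal approximation, equal allocation, common SD).
    The small probability of rejecting in the wrong direction is ignored."""
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the difference
    z_crit = _N.inv_cdf(1 - alpha / 2)
    return _N.cdf(abs(delta) / se - z_crit)


# Hypothetical target difference of 0.5 SD
for n in (16, 32, 64):
    print(n, round(power_two_means(n, delta=0.5, sd=1.0), 2))
# prints roughly: 16 0.29, 32 0.52, 64 0.81
```

Halving the per-group size from 64 to 32 drops power from about 80% to about 50%, so a true effect of the targeted size would be missed roughly half the time.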
Industry, payers and regulator
Industry-funded trials are typically (but not always) conducted as part of a regulatory submission for a new drug or medical device, or to widen the indications of an existing drug or device. Generally, an active intervention is compared with a placebo control, as this addresses the regulatory question of whether or not the intervention ‘works’. The main exception would be situations in which a new drug is intended to replace an established effective drug, in which case the established drug would be the control. An example is the evaluation of the newer oral anticoagulants, which have been compared with active comparators, such as warfarin or low-molecular-weight heparin, in the submissions for approval.
From an industry perspective, the target difference is often chosen to be one that is important to regulators and health-care commissioners. The key aspects of interest tend to be safety (including tolerability of treatment and consideration of side effects), whether or not the treatment is stopped due to a lack of effect, and the effect among those who complete treatment. This has corresponding implications for the estimand(s) of interest.22,23 Increasingly, payers (health insurance companies and governmental reimbursement agencies) are interested in comparisons with other active therapies, reflecting the need to inform treatment choices in actual clinical practice and considerations of affordability and cost-effectiveness. A new product will be more likely to be reimbursed if it offers clinical advantages over existing therapies, in terms of either efficacy or adverse effect profile, at an ‘acceptable’ cost. When an intervention is compared with an active control, the treatment effect will almost certainly be smaller, and the sample size larger, than for a placebo-controlled trial, all other things being equal. One common distinguishing feature of a definitive trial (e.g. Phase III) conducted in an industry setting, compared with one conducted in an academic setting, is that all of the evidence pertinent to planning a trial of a new drug will often be readily available within the same company. It is also likely that at least some of the individuals involved will have taken part in a related earlier-phase trial of the same drug.
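Because the required sample size for a comparison of means scales with the inverse square of the target difference, the smaller difference expected against an active control has a large effect on trial size. A minimal sketch using the standard normal-approximation formula, n = 2(z_{1-α/2} + z_{1-β})² σ² / δ² per group, follows; the function name and the two target differences (0.5 SD against placebo, 0.25 SD against an active control) are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sd: float, alpha: float = 0.05,
                power: float = 0.9) -> int:
    """Hypothetical helper: per-group sample size for a two-sided comparison
    of two means (normal approximation, equal allocation, common SD)."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)


# Hypothetical target differences, in SD units
print(n_per_group(delta=0.50, sd=1.0))  # vs placebo: 85
print(n_per_group(delta=0.25, sd=1.0))  # vs active control: 337
```

Halving the target difference roughly quadruples the number of participants required per group.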
Patient, service users, carers and the public
From the perspective of patients, service users, carers and the public,28 when a formal sample size calculation is performed, the target difference should be one that would be viewed as important by a key stakeholder group (such as health professionals, regulators, health-care funders and, preferably, patients). For those who serve as patient and public involvement (PPI) contributors on research boards that make funding recommendations and/or assess trial proposals, a specific point of interest is likely to be ensuring that the study has considered the most patient-relevant outcome (e.g. a patient-reported outcome), even if it is not the primary outcome. In some situations, the most appropriate primary outcome may be a patient-reported outcome (e.g. when comparing treatments for osteoarthritis, for which pain and function are the key measures of treatment benefit). It is highly desirable that a patient, service user and carer perspective feeds into the process of choosing the primary outcome in some way and, when possible, that the chosen target difference reflects one that would have a meaningful impact on patient health, according to the research question. Some funders now require at least some PPI in the development of trial proposals, and this perspective forms part of the assessment process.28 It is also increasingly part of the process for assessing existing evidence.29
Research ethics
Fundamental to the standard ethical justification for the conduct of a RCT, which is a scientific experiment on humans, is (1) that it will contribute to scientific understanding and (2) that the participant is aware of what the study entails and, whenever possible, provides consent to participate.30,31 Commonly, a third condition, that the participant has the potential to benefit, is also appropriate; this is particularly the case when there may be some risk to the participant. Whatever the specifics of the trial in terms of population, setting, interventions and assessments, it is important that the sample size for a study is appropriate to achieve its aim. There is a need for justification of some form for the number of participants required. As noted earlier, no more participants than ‘necessary’ should be recruited, to avoid unnecessary exposure to a suboptimal treatment and/or the practical burden of participation in a research study. Such a sample size justification may take the form of informal heuristics or, more commonly, a formal sample size calculation.
Clarifying what the study is aiming to achieve and determining an appropriate target difference and sample size are very important, as the research can have a substantial impact not only on those directly involved as participants, but also on future patients. As far as possible, it is also relevant to consider key patient subgroups or subpopulations in terms of the relevance of the findings to them. This could be taken into account when undertaking the sample size calculation (see Appendix 3).
The primary outcome of a randomised controlled trial
The role of the primary outcome
The standard approach to a RCT is for one outcome to be designated as the primary outcome.10 This is done by considering the outcomes that should be measured in the study.32 The outcome is ‘primary’ in the sense of it being more important than the others, at least in terms of the design of the trial, although preferably it is also the most important outcome to stakeholders with respect to the research question being posed. The study sample size is then determined for the primary outcome. As noted earlier, it is important to consider how the primary outcome relates to the population of interest and the intervention effects to be estimated (the estimand of interest). Choosing a primary outcome (and giving it prominence in the statistical analysis of the estimands of interest) performs a number of functions in trial design, although it is clearly a pragmatic simplification to aid the interpretation and use of RCT findings. It clarifies which outcome the study primarily uses to identify the intervention effects. The statistical precision with which this can be achieved is then calculated according to the analysis of interest. Additionally, it clarifies the initial basis on which to judge the study findings. Specification of the primary outcome in the study protocol (and similarly reporting it on a trial registry) helps reduce overinterpretation of findings, which arises from testing multiple outcomes and selectively reporting those that are statistically significant (irrespective of their clinical relevance). This multiple testing, or multiplicity,33,34 is particularly important, given the high likelihood that chance will produce spurious statistically significant findings when a large number of outcomes are analysed. Pre-specification of a primary outcome, along with the use of a statistical analysis plan and transparent reporting (e.g. making the trial protocol available), limits the scope for manipulating (intentionally or not) the findings of the study. This helps prevent post hoc shifting of the focus (e.g. in study reports) to maximise statistical significance.
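The scale of the multiplicity problem can be sketched with the probability of at least one false-positive result among k tests of true null hypotheses, each at level α: 1 − (1 − α)^k. This assumes the tests are independent, which outcomes within a trial usually are not, so it is only a rough guide; the function name is hypothetical.

```python
def prob_any_false_positive(alpha: float, k: int) -> float:
    """Hypothetical helper: chance of at least one 'significant' result among
    k independent tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 10, 20):
    print(k, round(prob_any_false_positive(0.05, k), 2))
# prints: 1 0.05, 5 0.23, 10 0.4, 20 0.64
```

With 20 unrelated outcomes each tested at the 5% level, a spurious ‘significant’ finding is more likely than not.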
Choosing the primary outcome
A variety of factors need to be considered when choosing a primary outcome. First, in principle, the primary outcome should, as noted above, be a ‘key’ outcome, such that knowledge of its result would help answer the research question. For example, in a RCT comparing eye drops that lower intraocular pressure with a placebo in patients with high eye pressure (the key treatable risk factor for glaucoma, a progressive eye disease that can lead to blindness), loss of vision is a natural choice for the primary outcome.35 However, it would clearly be important to consider other outcomes (e.g. side effects of the eye drops). Nevertheless, knowing that the eye drops reduced vision loss due to glaucoma would be a key piece of knowledge. In some circumstances, the preferable outcome will not be used because of other considerations. In this glaucoma example, a surrogate (intraocular pressure, i.e. pressure in the eye) might be used because of the time it takes for any change in vision noticeable to a patient to occur, and also because this may enable prevention, or at least a reduction in the degree, of vision loss. Indeed, intraocular pressure is sometimes the primary outcome of RCTs in this area instead of vision or visual quality of life.
The ability to measure the chosen primary outcome reliably and routinely within the context of the study also needs to be considered. Missing data are a threat to the usefulness of the analysis of any study, and RCTs are no different. The optimal mode of measurement may be impractical or even unethical. The most reliable way to measure intraocular pressure is through manometry;36 however, this requires invasive eye surgery. Subjecting participants to clinically unnecessary surgery for the purpose of a RCT would be ethical only in very strongly mitigating circumstances, particularly as an alternative, albeit less accurate, way of measuring intraocular pressure exists. Furthermore, invasive measurements may dissuade people from consenting to take part in the RCT.
The sample size calculation varies depending on the outcome and the intended analysis. In some situations, it is appropriate to ensure that the sample size is sufficient for multiple outcomes.37 The three most common outcome types are binary, continuous and survival (time-to-event) outcomes; they are briefly considered in Box 2 and in greater depth in Appendix 3. Other outcome types, such as ordinal, categorical and count outcomes, can also be used but are not considered here; they are likely to require a more complex analysis and a corresponding sample size calculation approach. Continuous outcomes (or a transformed version of them) are typically assumed to be normally distributed, or at least ‘approximately’ so, for ease and interpretability of analysis and for the sample size calculation. This assumption may be inappropriate for some outcomes, such as operation time, length of hospital stay and costs, which often have very skewed distributions. From a purely statistical perspective, a continuous outcome should not be converted to a binary outcome (e.g. converting a quality-of-life score to high/low quality of life). Such dichotomisation results in less statistical precision and therefore requires a larger sample size.40 If it is viewed as necessary to aid interpretability, the target difference (and corresponding analysis) for the continuous measure can also be represented as a dichotomy, in addition to being expressed on its continuous scale. Some authors, while acknowledging that this should not be routine, would make an exception when a dichotomy is seen as providing a substantive gain in interpretability, even at a loss of statistical precision.41 For example, the severity of depression may be measured and analysed on a latent scale, but the proportion of individuals meeting a prespecified threshold for depression or improvement might also be reported and potentially analysed.42
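The cost of dichotomisation can be illustrated with a minimal sketch, assuming a normally distributed outcome with SD 1 in both groups, a standardised target difference of 0.5, dichotomisation at the control-group median, two-sided α = 0.05 and 90% power. It uses standard normal-approximation formulas for continuous and binary outcomes (unpooled variance for the binary case); survival outcomes are not covered, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def _z_sum(alpha: float, power: float) -> float:
    # z_{1-alpha/2} + z_{power} for a two-sided test
    return _N.inv_cdf(1 - alpha / 2) + _N.inv_cdf(power)


def n_continuous(delta: float, sd: float, alpha=0.05, power=0.9) -> int:
    """Per-group n for a difference in means (normal approximation)."""
    return math.ceil(2 * _z_sum(alpha, power) ** 2 * sd ** 2 / delta ** 2)


def n_binary(p1: float, p2: float, alpha=0.05, power=0.9) -> int:
    """Per-group n for a difference in proportions (unpooled variance)."""
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(_z_sum(alpha, power) ** 2 * var / (p2 - p1) ** 2)


def n_after_dichotomising(delta: float, cut: float = 0.0, **kw) -> int:
    """Per-group n if a N(0, 1) vs N(delta, 1) outcome is instead analysed
    as the proportion of participants above `cut`."""
    p_control = 1 - _N.cdf(cut)
    p_treated = 1 - _N.cdf(cut - delta)
    return n_binary(p_control, p_treated, **kw)


print(n_continuous(0.5, 1.0))      # 85
print(n_after_dichotomising(0.5))  # 133
```

Under these assumptions, analysing the dichotomised outcome needs roughly 1.5 times as many participants as analysing the continuous score for the same underlying difference.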