Methods

Fay Crawford; Genevieve Cezard; Francesca M Chappell; Gordon D Murray; Jacqueline F Price; Aziz Sheikh; Colin R Simpson; Gerard P Stansby; Matthew J Young

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Crawford F, Cezard G, Chappell FM, et al. A systematic review and individual patient data meta-analysis of prognostic factors for foot ulceration in people with diabetes: the international research collaboration for the prediction of diabetic foot ulcerations (PODUS). Southampton (UK): NIHR Journals Library; 2015 Jul. (Health Technology Assessment, No. 19.57.)


Chapter 3 Methods

Systematic reviews and meta-analyses of IPD use ‘raw’ data obtained from the authors of individual studies instead of mean or aggregate data extracted from published reports. These complex reviews are more time-consuming and expensive than aggregate systematic reviews because obtaining study data and data dictionaries and undertaking data checking and cleaning takes more time than the extraction of data from a published report (Figure 1).²⁵

FIGURE 1

Flow diagram: stages of an individual patient-based meta-analysis. Reproduced from the original with kind permission from John Wiley & Sons.

Individual patient data systematic reviews are useful for both randomised controlled trial (RCT) and observational data. They enhance the main purpose of meta-analysis, the augmentation of statistical power, by permitting complex statistical techniques, including multivariable analyses in which interactions between interventions and patient-level characteristics can be explored.²⁶ For observational study designs, an IPD meta-analysis is the best way to pool data because it allows adjustments and a standard statistical approach to be applied across studies.

This review method also confers an advantage on the process of quality assessment because the necessary communication between the review team and those contributing the data means that potential biases arising from the conduct, rather than the report, of the study can be investigated. However, although the opportunity to discuss the manner in which the study was conducted with the author means the reviewer is not required to interpret possible biases, IPD reviews do not avoid flaws in the original studies arising from conduct or design.²⁷

Ethics and governance

The ethics of obtaining data collected from a number of sources that cross international boundaries and different legal systems was carefully considered and informed by ethics advice issued by the Medical Research Council (UK).²⁸ This study did not require separate ethics committee approval for the following reasons:

Investigators of each of the original studies obtained local ethics committee approval and written, informed patient consent before the start of each of the cohort studies included in the review.
The data from each of the studies were already in the public domain.
The project uses anonymised data from individuals recruited to the original studies who cannot be identified.

Obtaining data

The aggregate systematic review of predictive factors for foot ulceration in diabetes led by the chief investigator (FC) identified 11 cohort studies that met the eligibility criteria.⁴ During the review process, the corresponding author of each primary study was asked to clarify points about their study, as per conventional systematic review methods. All those contacted provided additional information, and there was strong encouragement for the aggregate review and enthusiasm for an IPD review to create a statistical model exploring the independent contribution of predictive factors for use in foot risk assessment procedures. A key factor in deciding to undertake the IPD meta-analysis was the total absence of industry sponsorship and the ownership of the original study data by the corresponding authors, who were prepared to contribute them if funding from a suitable source could be found to support the research.

The value of the IPD analysis lies in the production of a global data set. Anonymised data from each of the collaborators of the primary cohort studies were accepted in the way deemed most convenient to original study investigators.

Data were stored in password-protected files on a secure University of Edinburgh computer (University of Edinburgh data protection registration number Z6426984) during the conduct of the review and were only accessible to members of the Data Management Committee, membership of which can be found in the appendices (see Appendix 1).

Our published protocol²⁹ incorporated a data confidentiality agreement making clear that the data provided must not identify individual patients. It also included an assurance that the original investigators were in possession of local ethical approval for their study. A copy of this agreement can be found in the appendices (see Appendix 2).

Review Committee structure

A three-committee structure was created to manage the review:

The Data Management Committee developed the methods for the review and ensured the attainment of project milestones. They also took responsibility for reporting the progress to the National Institute for Health Research (NIHR) Health Technology Assessment (HTA) programme within the standard reporting mechanisms required by the Clinical Evaluation and Trials Board. Only these individuals had access to the data from individual cohort studies.
The research committee included a group of epidemiologists, health services researchers, clinicians and statisticians who advised the Data Management Committee about methodological and clinically relevant aspects.
An international steering committee comprising all principal investigators/corresponding authors of the included studies was strengthened with input from five additional members with expertise in diabetic medicine, foot care provision in primary and community settings, and the methodology of clinical prediction rules (CPRs) and IPD meta-analysis.

A list of members of each of these committees can be found in Appendix 1.

Identifying studies

Electronic search strategy

We searched for relevant studies using the highest methodological standards.³⁰ The electronic search strategies created during the aggregate systematic review of predictive factors for foot ulceration in diabetes were updated and rerun to January 2013.⁴ Copies of the EMBASE and MEDLINE search strategies can be found in Appendix 3.

Selection criteria

One reviewer applied the IPD review eligibility criteria to the full-text articles of the studies identified in our literature search and also all studies excluded from our aggregate systematic review to ensure that we did not miss eligible IPD. A second reviewer applied the eligibility criteria to a 10% random sample of the search yield to ensure that no relevant material was missed.

Eligibility criteria

Types of participants

The review includes only data from individuals who were free of foot ulceration at the time of study entry and who had a diagnosis of diabetes mellitus (either type 1 or type 2). When we identified studies with patients who had prevalent foot ulcers at the time of recruitment, we ascertained whether or not IPD were available for patients who were free of ulcers at time of entry. The corresponding authors of all identified cohort studies were contacted and invited to share their data.

Types of exposure variables

All elements from the patient history, symptoms, signs and diagnostic test results were considered for inclusion in the prognostic model. These were collected variously as continuous, binary and multicategorical data.

Type of outcome variable

The outcome variables were incident foot ulceration (present/absent) and time to ulceration from initial diagnosis of diabetes as well as from the time of screening.

Types of studies

We included studies that used a cohort design and did not distinguish between those that planned the analysis before or after data collection. We excluded studies using all other study designs, including case–control designs. Our previous research indicated that data collected in older studies could be difficult to obtain and that some investigators were no longer in possession of their study data (David Armstrong, Southern Arizona Limb Salvage Alliance, University of Arizona, 2012, and Lawrence Lavery, UT Southwestern Medical Center, Texas, 2012, personal communication).

Risk of bias

The assessment of methodological quality is an important component of an IPD systematic review, but there is complexity in assessing potential threats to the validity of primary studies for this research genre. No widely agreed criteria exist for assessing the risk of bias in aggregate systematic reviews of prognostic studies,³¹ and, currently, there is a complete absence of established guidelines for prognostic IPD reviews (Douglas Altman, University of Oxford; Richard Riley, Research Institute of Primary Care and Health, Keele University, 2012, personal communication). Although flaws in the recruitment of patients or the manner of data collection can influence systematic review findings, some quality domains usually assessed by systematic reviewers of published reports are irrelevant in IPD reviews (e.g. those pertinent to the analysis performed by the primary authors). We compiled a list of items relevant to our IPD review question that were judged likely to identify studies with data compromised by threats to validity. This checklist of items can be found in Appendix 4;²²,³²–⁴² it was refined during a pilot phase by two researchers working independently.

Data extraction was undertaken by two reviewers working independently, and disagreements were resolved by discussion. For quality assessment, a two-stage process was used; two reviewers worked independently using items available from the published report first of all, then supplementing this with additional details obtained from authors of the primary studies.

Plan for analysis and handling missing data

The methodology of IPD meta-analyses of observational studies is relatively undeveloped compared with that for RCTs and reviewers undertaking IPD meta-analyses of observational studies need to proceed with caution as guidance is not always available and the methodology is untested.⁴³

There were, therefore, difficult methodological issues regarding the analysis for this review, some of which were particular to IPD meta-analysis methodology, and others which were more general:

method of meta-analysis (one step vs. two step)
method of meta-analysis (random vs. fixed effects)
assessment and handling of heterogeneity
handling of missing data, where data are missing for some but not all patients in a given data set (ordinarily missing data)
handling of missing data, where data are missing for a given variable for all patients in a given data set (systematically missing data)
choice of predictors
choice of effect size
validating the model.

Method of meta-analysis: one-step versus two-step methods

The two main methods of meta-analysis are commonly known as one-step and two-step methods.⁴⁴ Both these methods have pros and cons.

The one-step method uses just one model fitted to all the studies, with a term to indicate which patient belongs to which study. The model can be sophisticated and used to explore common structures in the data sets that would otherwise be undetectable. For this reason, it is the preferred method of meta-analysis for some statisticians.⁴³ However, it does require that all the data sets be available at the same time to the meta-analysts in order to fit one model to all the data sets. This was not the case for this project. It is also a relatively new development of meta-analysis methods; although IPD meta-analyses have been used for some time, they have most often been used for RCT data, where the recommendation is to use a two-step method to avoid comparison of patient groups that were not randomised together.⁴⁵

Two large data sets were contributed to this project but access to one was constrained,⁴⁶,⁴⁷ with around 3412 patients’ data only available to the authors via a safe haven facility. The safe haven facility allowed the analyses of data to obtain an estimate of effect but not to remove or copy the data. Another data set⁴⁸,⁴⁹ with 1489 patients was not permitted to be shared by the US Institutional Review Board governing its use. However, specific analyses could be requested and estimates of effect obtained from the original study authors.

Use of the one-step method of meta-analysis would mean that neither of these large data sets could be used, although it is straightforward to include them in a two-step meta-analysis. The two-step method is also simpler and more transparent as it uses methods that have been much used and are well understood by systematic reviewers.

For the two-step method, each data set is analysed in turn by the meta-analysts, using ordinary methods of analysis such as logistic regression, and then the estimates from each analysis are combined using established meta-analysis methods. The advantage of the two-step method over a meta-analysis of published studies is that the statistician has some flexibility in the estimates they can obtain from each study. If, for example, they require all estimates to be adjusted for age, and all the data sets have the patients’ ages, it is simple to get age-adjusted estimates.
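The two steps can be illustrated with a minimal sketch. It is not the analysis code used for this review: the three data sets are simulated, the binary predictor and the scaled age variable are hypothetical, and the second step uses a simple fixed-effect inverse-variance average purely to keep the example short.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    Returns the coefficients (intercept first) and their variances.
    """
    X = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.diag(np.linalg.inv(info))

def simulate_study(rng, n, log_or):
    """Simulate one hypothetical cohort: a binary test result and scaled age."""
    test = rng.integers(0, 2, n).astype(float)
    age10 = rng.normal(0.0, 1.0, n)  # age, centred and scaled per 10 years
    linear = -2.0 + log_or * test + 0.3 * age10
    ulcer = (rng.random(n) < 1.0 / (1.0 + np.exp(-linear))).astype(float)
    return np.column_stack([test, age10]), ulcer

rng = np.random.default_rng(0)
TRUE_LOG_OR = 0.8  # assumed value used only to generate the simulated data

# Step 1: analyse each data set separately, adjusting for age.
log_ors, variances = [], []
for n in (2000, 1500, 800):
    X, y = simulate_study(rng, n, TRUE_LOG_OR)
    beta, var = fit_logistic(X, y)
    log_ors.append(beta[1])    # age-adjusted log OR for the test result
    variances.append(var[1])

# Step 2: combine the study estimates, weighting by inverse variance.
weights = 1.0 / np.array(variances)
pooled_log_or = float(np.sum(weights * np.array(log_ors)) / np.sum(weights))
pooled_se = float(np.sqrt(1.0 / np.sum(weights)))
print(f"pooled OR = {np.exp(pooled_log_or):.2f}")
```

Only the estimate and its variance leave step 1, which is why data sets that cannot be copied or shared can still contribute to step 2.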

We did consider a refinement to the one-step method that, in theory, would have enabled us to perform a one-step meta-analysis and incorporate the aggregate results from the two data sets not directly available to us.⁵⁰ However, like much of the methodology of IPD meta-analysis of observational studies, it is a new and therefore relatively untested development, and we did not pursue it for this project.

Method of meta-analysis: random versus fixed-effects meta-analysis

The data sets contributing IPD covered a range of temporal, geographical and clinical settings. It is therefore reasonable to expect some degree of heterogeneity between the studies. The data sets also varied in size from a few hundred to a few thousand patients. There has been much discussion among experts in the field about the choice between random- and fixed-effects methods of meta-analysis.⁵¹ We chose to use random-effects meta-analysis, which does not assume that the estimates from each study are estimates of the same underlying true value, but rather that the estimates belong to the same distribution. It has been argued that random-effects methods more appropriately weight the contribution of smaller versus larger studies.⁵² Moreover, as the estimates will be adjusted odds ratios (ORs) (note that the same is true for hazard ratios), the appropriate method of meta-analysis is the generic inverse variance method.⁵²
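A random-effects generic inverse variance meta-analysis of adjusted ORs can be sketched as follows. The between-study variance is estimated here with the DerSimonian-Laird method, which is an assumption made for illustration; the text does not state which estimator was used. The ORs and confidence limits are invented.

```python
import math

def random_effects_pool(estimates, std_errors):
    """Pool log-scale estimates with DerSimonian-Laird random effects.

    Returns the pooled estimate, its standard error and tau-squared.
    """
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at 0
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2

# Hypothetical adjusted ORs with 95% CIs: (OR, lower limit, upper limit).
studies = [(2.1, 1.4, 3.2), (3.0, 1.8, 5.0), (1.6, 1.1, 2.3), (2.6, 1.2, 5.6)]

# Work on the log scale; recover each standard error from the CI width.
log_ors = [math.log(o) for o, _, _ in studies]
ses = [(math.log(hi) - math.log(lo)) / (2 * 1.96) for _, lo, hi in studies]

pooled, se_pooled, tau2 = random_effects_pool(log_ors, ses)
lower = math.exp(pooled - 1.96 * se_pooled)
upper = math.exp(pooled + 1.96 * se_pooled)
print(f"pooled OR = {math.exp(pooled):.2f} ({lower:.2f} to {upper:.2f})")
```

When the estimated between-study variance is zero the weights reduce to the fixed-effect weights; as it grows, the weights become more equal across studies of different sizes.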

Assessment and handling of heterogeneity

Before undertaking any meta-analysis we assessed the extent of heterogeneity. We employed the standard methods of assessing heterogeneity, by examining forest plots of estimates and calculating I²- and τ-statistics. However, we also conducted a thorough examination of heterogeneity, by visual comparison of histograms of continuous variables and bar charts of categorical variables. We also produced summary statistics for each continuous variable (mean, standard deviation, median, 25th and 75th percentile, minimum and maximum) and proportions with confidence intervals (CIs) for each categorical variable.
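For reference, the usual definitions of these statistics are given below for k study estimates with standard errors s_i. The moment estimator shown for the between-study variance is the DerSimonian-Laird form, which is an assumption, as the text does not specify the estimator; τ is the square root of τ².

```latex
\begin{align*}
w_i &= \frac{1}{s_i^{2}}, \qquad
\hat{\theta}_F = \frac{\sum_{i=1}^{k} w_i \hat{\theta}_i}{\sum_{i=1}^{k} w_i}, \\
Q &= \sum_{i=1}^{k} w_i \left(\hat{\theta}_i - \hat{\theta}_F\right)^{2}, \\
I^{2} &= \max\!\left(0,\ \frac{Q - (k-1)}{Q}\right) \times 100\%, \\
\hat{\tau}^{2} &= \max\!\left(0,\ \frac{Q - (k-1)}{\sum_{i} w_i - \sum_{i} w_i^{2} / \sum_{i} w_i}\right).
\end{align*}
```

I² expresses the share of the total variation in the estimates that is attributable to between-study differences rather than chance, whereas τ² is on the scale of the estimates themselves (here, log ORs).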

The assessment of heterogeneity for any meta-analysis was a matter of judgement, covering both statistical and clinical aspects. Therefore, the decision on whether or not a particular variable and/or study should be included in the meta-analyses was made in discussion between methodological and clinical authors, with due consideration of any possible bias or loss of precision in the estimate as a result of inclusion or exclusion. Specifically, we did not define any particular I² percentage as representing an acceptable level of heterogeneity.

Handling of ordinarily missing data

Ordinarily missing data in epidemiological cohort studies occur when a variable is not recorded, completed or collected for a given patient. For example, a patient may not want to provide personal information, or test results may not be performed, available or readable. Analysing only complete cases leads to loss of information (exclusion of a portion of the original data) and can introduce bias. Methods to address missing data assume specific patterns of missingness and allow patients with incomplete data to be included in the analysis.

Our method of handling missing data depended on the extent of the missingness and on whether the mechanism causing the missingness was known, specifically whether the data were missing at random (MAR) or missing not at random. Under the MAR assumption, we planned to use multiple imputation by chained equations (MICE) as implemented in R [R 2.13.1, Murray Hill, NJ, USA; see (http://cran.r-project.org/)],⁵³,⁵⁴ which is a flexible and practical approach to handling missing data. To make use of all available patient data and to help predict missing values for the risk factors of interest, we applied multiple imputation to the set of variables selected for our final model of predictors in which the percentage of missing values did not exceed 15%, and we included the outcome variable in the imputation model.⁵⁵ We created m = 20 imputed data sets, in which missing values were replaced by imputed values using imputation techniques specific to each type of variable (logistic regression for binary variables and Bayesian linear regression for continuous variables). The final model estimates were calculated for each imputed data set and differed owing to the variation introduced by the imputed values. Estimates were averaged and standard errors calculated using Rubin’s rules, which take into account the variability between imputed data sets. To assess the potential bias attributable to missing data, the results of the final model after imputation were compared with those of the complete case analysis.
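The pooling step under Rubin's rules can be sketched as follows. The imputation itself is not shown, and the estimates and standard errors are invented values for one coefficient across five imputed data sets rather than the m = 20 used in the review.

```python
import math
from statistics import mean, variance

def rubins_rules(estimates, std_errors):
    """Combine one coefficient across m imputed data sets.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)                       # average of the estimates
    within = mean([se ** 2 for se in std_errors])  # mean within-imputation variance
    between = variance(estimates)                  # between-imputation variance (m - 1 divisor)
    total = within + (1.0 + 1.0 / m) * between     # total variance
    return pooled, math.sqrt(total)

# Hypothetical log ORs and standard errors from five imputed data sets.
imputed_estimates = [0.82, 0.79, 0.85, 0.80, 0.84]
imputed_ses = [0.15, 0.16, 0.15, 0.14, 0.15]

pooled_estimate, pooled_se = rubins_rules(imputed_estimates, imputed_ses)
print(f"pooled log OR = {pooled_estimate:.3f}, SE = {pooled_se:.3f}")
```

The between-imputation term is what makes the pooled standard error larger than the average within-imputation standard error, reflecting the uncertainty about the imputed values.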

Handling of systematically missing data

A systematically missing variable is a variable that has not been collected at all in a given data set. For example, not all the studies contributing IPD collected HbA1c, as it has not always been part of routine care. Therefore, if we wanted to adjust ORs of ulceration in patients with and without positive monofilament tests for HbA1c, our analysis choices were:

to use only ORs from studies that collected HbA1c data, with resulting loss of data from not using all the studies (i.e. complete case analysis)
to use all studies by treating all ORs as if they have been adjusted for HbA1c, with resulting possible bias in the summary estimate
to use multiple imputation for the systematically missing data.

Given that every study lacked at least one of the variables of interest, we had systematically missing data. The methodology of handling systematically missing data in IPD meta-analysis is still very much in development and key papers were published after the start of this project.⁵⁶ We therefore felt that it would be useful to present the results of a complete case analysis, as complete case analyses are known not to be biased, provided that the missing data are MAR.⁵⁷ However, the loss of power from not using all the data results in wide CIs and large p-values. To overcome the loss of power, we could have used either the second or third method listed above. However, the second method was not chosen because it produces possibly biased estimates. The third method was another relatively new and untested method, and statistical methodological contributions fell outside the scope of this project.

Choice of predictors

The studies contributing data to this IPD analysis collected data on hundreds of variables. It would not have been statistically rigorous or clinically relevant to meta-analyse all these variables. We therefore needed a method to select candidate variables for meta-analysis. We used the following criteria:

Variables had to have been collected in at least three studies, with < 60% missing.
Variables needed to have been coded in such a way to allow standardisation across data sets. For example, we were unable to use eye data, because in some data sets this had been defined as retinopathy and in others as requiring glasses.
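The first criterion lends itself to a simple filter, sketched below. The variable names, study labels and missingness fractions are hypothetical, and the sketch assumes that the 60% threshold applies within each study, which the text does not state explicitly. The second criterion, whether coding can be standardised across data sets, is a matter of judgement and is not encoded.

```python
def candidate_variables(missing_by_study, min_studies=3, max_missing=0.60):
    """Select variables collected in enough studies with acceptable missingness.

    missing_by_study maps each variable to {study: fraction missing};
    a study that did not collect the variable is simply absent.
    """
    selected = []
    for variable, studies in missing_by_study.items():
        usable = [s for s, fraction in studies.items() if fraction < max_missing]
        if len(usable) >= min_studies:
            selected.append(variable)
    return selected

# Hypothetical summary of missingness for four variables across studies A to D.
missingness = {
    "age": {"A": 0.00, "B": 0.02, "C": 0.10, "D": 0.01},
    "monofilament": {"A": 0.05, "B": 0.30, "C": 0.59},
    "hba1c": {"A": 0.10, "B": 0.75, "C": 0.20},  # only two studies under 60%
    "eye_disease": {"A": 0.00, "B": 0.00},       # collected in two studies only
}

selected_variables = candidate_variables(missingness)
print(selected_variables)
```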

We did not use a common method of variable selection, namely choosing variables for a multivariable model on the basis of univariate results, as we believe this to be a flawed method.⁵⁸,⁵⁹

We also had the aim of producing a model with easily collectable or readily available data, and therefore had a preference for such variables.

Choice of effect size

Initially, we had hoped to use time-to-ulceration data to perform survival analyses and so obtain hazard ratios for a meta-analysis. Unfortunately, not all the data sets had time-to-event data and we therefore decided to use a binary outcome (ulcer vs. no ulcer) and use logistic regression to obtain ORs. Neither of the two largest data sets, with a combined total of over 9000 patients, had time-to-event data. Logistic regression is considered a less statistically powerful method than survival analysis, but we judged that the increased power of survival analysis would not compensate for the loss of more than half of the data that it would entail.

Copyright © Queen’s Printer and Controller of HMSO 2015. This work was produced by Crawford et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK305627
