Internal Validity
All trials were randomized studies that appear to have used acceptable methods (IVRS/IWRS, computer-generated allocation schedule) to randomize patients to treatment groups. The two DB trials (DRIVE-FORWARD and DRIVE-AHEAD) took the necessary measures to maintain blinding and conceal treatment allocation; all study medications, including the respective placebos, were packaged and supplied in identical containers/bottles. The clinical expert consulted for this review indicated that DRV and EFV are associated with an increased incidence of gastrointestinal and neuropsychiatric AEs, respectively. This is consistent with the relatively high frequency of diarrhea reported among patients receiving DRV/r in DRIVE-FORWARD, and of dizziness and sleep disorders and disturbances reported among patients receiving EFV/FTC/TDF in DRIVE-AHEAD. Patients experiencing these characteristic gastrointestinal or neuropsychiatric side effects could have surmised that they were receiving DRV/r or EFV/FTC/TDF, respectively, which might have compromised treatment blinding. Many efficacy and safety outcomes were measured objectively in blood/plasma samples; reporting bias arising from any unblinding was therefore less likely for those outcomes. However, the possibility remains that knowledge of treatment allocation influenced patient reporting of subjective outcomes (neuropsychiatric AEs and HRQoL) as well as patients’ decisions on whether to remain in the trial, potentially biasing the primary efficacy outcome (given that patients who discontinued the study were considered to have failed to achieve the primary outcome).
In all three studies, the primary efficacy end point was the difference between treatment arms in the proportion of patients with HIV-1 RNA < 50 copies/mL. While this is the FDA-recommended efficacy outcome for treatment-naive patients, the end point of interest in switch trials is the difference in the proportion of patients with HIV-1 RNA ≥ 50 copies/mL (not success of < 50 copies/mL as per the manufacturer’s analysis).34 This is because switch trials include patients who are already virologically suppressed; the end point should therefore focus on patients who lose virologic control after switching. Even though the difference in the proportion of patients with HIV-1 RNA ≥ 50 copies/mL was measured, it was not part of the statistical testing hierarchy and it was not compared against a pre-specified NIM. The FDA-recommended NIM is four percentage points for HIV-1 RNA ≥ 50 copies/mL in switch trials.34 Therefore, the primary efficacy outcome in DRIVE-SHIFT is inconsistent with FDA recommendations for switch trials. Notably, the manufacturer indicated that the latest version of the FDA guidance for industry34 containing these updated recommendations was published after DRIVE-SHIFT began.
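To make the switch-trial criterion concrete, the following is a minimal sketch, not the manufacturer's analysis, of how a difference in the proportion of patients with HIV-1 RNA ≥ 50 copies/mL could be compared against the four-percentage-point NIM. It assumes a simple unadjusted Wald confidence interval (the trials may have used a stratum-adjusted method), and the function names and patient counts are hypothetical, chosen only for illustration.

```python
import math

def failure_difference_ci(fail_test, n_test, fail_ref, n_ref, z=1.959964):
    """Unadjusted Wald 95% CI for the difference (test minus reference)
    in the proportion of patients with HIV-1 RNA >= 50 copies/mL."""
    p_test = fail_test / n_test
    p_ref = fail_ref / n_ref
    diff = p_test - p_ref
    se = math.sqrt(p_test * (1 - p_test) / n_test + p_ref * (1 - p_ref) / n_ref)
    return diff, diff - z * se, diff + z * se

def noninferior_on_failure(ci_upper, margin=0.04):
    """Because HIV-1 RNA >= 50 copies/mL is an unfavourable outcome,
    noninferiority requires the UPPER CI bound to lie below the margin."""
    return ci_upper < margin

# Hypothetical counts (NOT trial data): 8/400 vs. 4/200 patients with
# HIV-1 RNA >= 50 copies/mL in the switch and continuation arms.
diff, lower, upper = failure_difference_ci(8, 400, 4, 200)
print(f"difference = {diff:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print("noninferior at 4-point margin:", noninferior_on_failure(upper))
```

Note that the direction of the test is reversed relative to a success end point: for HIV-1 RNA < 50 copies/mL the lower bound of the interval is compared against a negative margin, whereas here the upper bound is compared against a positive one.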
For all three trials, it is unclear whether all patients were classified appropriately according to the FDA snapshot algorithm for the outcome of HIV-1 RNA ≥ 50 copies/mL, as patients lacking virologic data were not counted as failures (i.e., they were not assumed to have HIV-1 RNA ≥ 50 copies/mL). The impact this would have had on the results is uncertain. Other secondary efficacy outcomes, as well as safety end points, were consistent with FDA guidance and are commonly measured in HIV trials. One trial (DRIVE-SHIFT) assessed an HRQoL outcome relevant for this review, but the EQ-5D-5L VAS was assessed without generating an index score, and no supporting evidence from the literature was provided for its validity or MCID among patients with HIV.
The statistical analysis plan, including missing data handling (i.e., missing data = failure and missing data = excluded), sample size and power derivation, and adjustment for multiple comparisons, was carried out appropriately and generally followed FDA guidance for HIV trials. One notable exception was the handling of missing data in DRIVE-SHIFT. After the initial database lock (dated March 27, 2018), the manufacturer identified a number of patients in the ISG arm with missing HIV-1 RNA data at key efficacy time points. According to the FDA snapshot approach, these patients would be counted as treatment failures. The manufacturer discovered that additional blood samples, originally collected for pharmacokinetic and viral resistance testing, were available and could be used to test for HIV-1 RNA (week 24, n = 3; week 48, n = 2). With the addition of sample data for these five patients, the NIM was met for the primary outcome. However, noninferiority was not met based on the data from the initial database lock.
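The sensitivity of a noninferiority conclusion to a handful of reclassified patients can be illustrated with a short sketch. The counts and the margin below are hypothetical and are not the DRIVE-SHIFT data or its pre-specified NIM; the interval is an unadjusted Wald interval, which may differ from the method used in the trial.

```python
import math

def success_diff_lower_bound(succ_test, n_test, succ_ref, n_ref, z=1.959964):
    """Lower bound of the unadjusted Wald 95% CI for the difference
    (test minus reference) in the proportion with HIV-1 RNA < 50 copies/mL."""
    p_test = succ_test / n_test
    p_ref = succ_ref / n_ref
    se = math.sqrt(p_test * (1 - p_test) / n_test + p_ref * (1 - p_ref) / n_ref)
    return (p_test - p_ref) - z * se

def noninferior_on_success(ci_lower, margin):
    """For a favourable outcome, the lower CI bound must exceed the
    (negative) noninferiority margin."""
    return ci_lower > margin

HYPOTHETICAL_MARGIN = -0.08  # assumed for illustration only

# Hypothetical scenario: under missing = failure, 364 of 400 switch-arm
# patients count as successes vs. 190 of 200 in the comparator arm.
before = success_diff_lower_bound(364, 400, 190, 200)

# Five patients with previously missing data are re-tested, found to be
# suppressed, and reclassified from failure to success.
after = success_diff_lower_bound(364 + 5, 400, 190, 200)

print(f"lower bound before: {before:.4f}",
      noninferior_on_success(before, HYPOTHETICAL_MARGIN))
print(f"lower bound after:  {after:.4f}",
      noninferior_on_success(after, HYPOTHETICAL_MARGIN))
```

In this hypothetical scenario the conclusion changes once the five patients are reclassified, which illustrates why a result that depends on post hoc recovery of missing data warrants cautious interpretation.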
Although subgroup analyses for the DB trials were pre-planned and stratified at randomization, no testing of interaction between subgroups with respect to treatment effect was reported. Additionally, it is unclear whether the margin for the overall trial should be used in the evaluation of the subgroups or whether subgroup-specific margins should have been employed. Indeed, several of the subgroups exceeded the margins, which may be expected given the lack of power within the subgroup analyses. Moreover, multiplicity of testing remains a concern within the subgroups. As a result, over-interpretation of subgroup data should be avoided.
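The multiplicity concern can be quantified with a simple calculation. The sketch below assumes independent tests, each at a two-sided alpha of 0.05, which is a simplification because subgroups within a trial are generally not independent; the numbers of subgroups shown are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false-positive finding across
    n_tests independent tests, each performed at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance level that keeps the familywise error
    rate at or below alpha (Bonferroni correction)."""
    return alpha / n_tests

# Hypothetical numbers of subgroup comparisons.
for k in (1, 5, 10):
    print(k, round(familywise_error_rate(k), 3), round(bonferroni_alpha(k), 4))
```

Under these assumptions, the chance of at least one spurious subgroup finding rises quickly with the number of comparisons, which supports treating unadjusted subgroup results as exploratory.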
The studies did not use a true intention-to-treat population as several patients were excluded after randomization. However, the numbers are low and are unlikely to affect the study results. Moreover, the DB trials, but not the switch trial, appropriately performed the primary efficacy analysis in a PP population with findings supportive of analysis using the FAS population.
The treatment groups appeared to be generally balanced with respect to baseline characteristics within studies. Exceptions were a lower proportion of patients in the DOR arm with gastrointestinal disorders in DRIVE-FORWARD, and a higher proportion of patients in the ISG arm with immune system disorders, drug hypersensitivity, neoplasms, and psychiatric disorders in DRIVE-SHIFT. Although these differences may have arisen by chance, a failure of randomization cannot be ruled out. The frequency of dropouts among treatment-naive patients ranged from 13% to 19% across trials by week 48 and from 18% to 29% by week 96. In both trials, fewer patients receiving DOR dropped out, in part because of fewer AEs. The higher incidence of dropouts in the comparator arms may bias the results in favour of DOR, as dropouts were treated as treatment failures.
In the switch study, the primary efficacy analyses, as well as a number of secondary efficacy and safety analyses, involved comparing the ISG arm at week 48 and the baseline regimen of the DSG arm at week 24. This form of differential follow-up between groups is unusual and the CDR team is uncertain of the impact this has on the results; between-treatment comparisons based on the same duration of follow-up would have more internal validity. While comparisons for efficacy end points were also reported between the treatment arms at week 24, those were not controlled for multiplicity. The FDA guidance document34 indicates virologic response at 48 weeks is the recommended time point for comparative efficacy determination among patients who are treatment-naive or who have a well-documented treatment history demonstrating no virologic failure, stating, “Twenty-four weeks of data are appropriate for drugs that have some benefit over existing options (e.g., better efficacy, tolerability, ease of administration), while 48 weeks is recommended for drugs with comparable characteristics to existing options.” However, the expert consulted for this CDR review indicated that, while 24 weeks is a reasonable follow-up period for viral breakthrough after treatment switch, a longer duration of observation may increase the number of AEs identified.
External Validity
All trials were multinational, enrolling patients from a range of countries across North America, Central and South America, Western Europe, and Asia. Approximately 20% to 25% of the screened patients did not meet the eligibility criteria, primarily due to resistance to any of the study medications (all trials) and a plasma HIV-1 RNA level of < 1,000 copies/mL at screening (treatment-naive patients). According to the clinical expert consulted for this review, it is standard of care to perform baseline resistance testing to prevent prescription of an inadequately active ARV; thus, exclusion of patients based on resistance testing does not affect the generalizability of the reviewed trials. Other notable eligibility criteria included not having serious liver or kidney impairments (i.e., not having exclusionary laboratory values), active infection, or acute hepatitis. The results may therefore not be generalizable to patients with these conditions. A small proportion of patients (< 5%) were hepatitis B and/or C virus–positive, but the clinical expert consulted by CDR indicated that hepatitis co-infection should not negatively affect the bioavailability of the ARVs or their effectiveness.
The clinical expert consulted for this review indicated that the baseline demographic and clinical characteristics in DRIVE-FORWARD and DRIVE-AHEAD were generally reflective of treatment-naive patients in a Canadian setting. However, the number of patients with a history of AIDS (9% to 15% across groups) was higher than expected for a treatment-naive population. The clinical expert consulted by CDR indicated that AIDS is associated with lower CD4 counts and higher viral loads, which may lead to a lower likelihood of virologic success. A higher percentage of patients in the switch trial had a history of AIDS compared with the treatment-naive patients, likely resulting from their history of living with HIV-1 infection for longer than newly diagnosed treatment-naive patients.
The comparators used in the treatment-naive setting, in particular EFV/FTC/TDF used in DRIVE-AHEAD, are infrequently prescribed in contemporary clinical practice according to the expert and have been largely displaced by better-tolerated first-line regimens endorsed by the DHHS,4 e.g., BIC/TAF/FTC (Biktarvy), EVG/c/TAF/FTC (Genvoya), and DTG/ABC/3TC (Triumeq). EFV and DRV/r are known to cause neuropsychiatric and gastrointestinal adverse effects, respectively, which should be considered when assessing the generalizability of the safety data.