Generalizing the intention-to-treat effect of an active control from historical placebo-controlled trials: A case study of the efficacy of daily oral TDF/FTC in the HPTN 084 study Qijia He1, Fei Gao2, Oliver Dukes3, Sinead Delany-Moretlwe4, Bo Zhang*,2 1Department of Statistics, University of Washington 2Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center 3Department of Applied Mathematics, Computer Science and Statistics, Ghent University 4Wits Reproductive Health and HIV Institute, University of the Witwatersrand, Johannesburg, South Africa. Abstract In many clinical settings, an active-controlled trial design (e.g., a non-inferiority or superiority design) is often used to compare an experimental medicine to an active control (e.g., an FDA- approved, standard therapy). One prominent example is a recent phase 3 efficacy trial, HIV Prevention Trials Network Study 084 (HPTN 084), comparing long-acting cabotegravir, a new HIV pre-exposure prophylaxis (PrEP) agent, to the FDA-approved daily oral tenofovir disoproxil fumarate plus emtricitabine (TDF/FTC) in a population of heterosexual women in 7 African countries. One key complication of interpreting study results in an active-controlled trial like HPTN 084 is that the placebo arm is not present and the efficacy of the active control (and hence the experimental drug) compared to the placebo can only be inferred by leveraging other data sources. In this article, we study statistical inference for the intention-to-treat (ITT) effect of the active control using relevant historical placebo-controlled trials data under the potential outcomes (PO) framework. We highlight the role of adherence and unmeasured confounding, discuss in detail identification assumptions and two modes of inference (point versus partial identification), propose estimators under identification assumptions permitting point identification, and lay out sensitivity analyses needed to relax identification assumptions. 
We applied our framework to estimate the intention-to-treat effect of daily oral TDF/FTC versus placebo in HPTN 084 using data from an earlier Phase 3, placebo-controlled trial of daily oral TDF/FTC (Partners PrEP).

Keywords
Active-controlled trial; Compliance; Generalizability; HIV prevention; Intention-to-treat effect; Post-randomization event

*Correspondence to Bo Zhang, Assistant Professor of Biostatistics, Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, Washington, 98109. bzhang3@fredhutch.org.

HHS Public Access Author manuscript. J Am Stat Assoc. Author manuscript; available in PMC 2025 January 02. Published in final edited form as: J Am Stat Assoc. 2024; 119(548): 2478–2492. doi:10.1080/01621459.2024.2360643.

1 Introduction

1.1 HIV Prevention Trials Network Study 084: A landmark clinical trial in HIV prevention

The HIV Prevention Trials Network Study 084 (HPTN 084) is a phase 3, double-blind, randomized trial comparing long-acting cabotegravir (CAB-LA), an intramuscular injectable, long-acting form of pre-exposure prophylaxis (PrEP) for HIV prevention, to daily oral tenofovir disoproxil fumarate plus emtricitabine (TDF/FTC) among HIV-uninfected, heterosexual women (Delany-Moretlwe et al., 2022). The study was conducted in 7 countries of sub-Saharan Africa: Botswana, Eswatini, Kenya, Malawi, South Africa, Uganda, and Zimbabwe.
Daily oral TDF/FTC (sold under the brand name Truvada™), a World Health Organization (WHO) recommended PrEP for HIV prevention, has been introduced in these countries. However, despite increasing availability of and access to oral PrEP in the region, women have faced considerable barriers to daily pill-taking, including social stigma, judgement and violence (Delany-Moretlwe et al., 2022). These barriers partly explain why global HIV prevention efforts have stalled, with nearly 1.5 million new HIV infections in 2021, or 4,000 every day, a statistic nearly the same as in 2020. High-risk populations, especially those facing barriers to adhering to the daily oral PrEP, are in urgent need of a long-acting prevention modality like injectable CAB-LA. HPTN 084 reported an HIV incidence of 0.20 per 100 person-years in the CAB-LA arm compared to 1.86 per 100 person-years in the daily TDF/FTC arm (hazard ratio, 0.12; 95% CI, 0.05 to 0.31), demonstrating, unequivocally, the superiority of CAB-LA compared to the daily oral TDF/FTC (see Figure S4 in Web Appendix E). Not long after this landmark trial, WHO recommended that “long-acting injectable cabotegravir (CAB-LA) be offered as an additional HIV prevention option for people at substantial risk of HIV infection” (World Health Organization, 2022).

1.2 Active-controlled trial; intention-to-treat effect; sources of heterogeneity and bias

An important aspect of HPTN 084 is its active-controlled trial design. Active-controlled trials are commonly used in clinical settings to evaluate the safety and effectiveness of an experimental medication compared to a standard therapy (referred to as an active control and abbreviated as AC) when it is unethical to randomize patients to placebo and deprive them of the available standard therapies (Ellenberg and Temple, 2000). Two popular choices of an active-controlled trial design are a superiority design and a non-inferiority (NI) design.
In an active-controlled trial design, the placebo arm is not present, so it is not straightforward to estimate the intention-to-treat (ITT) effect of the active control compared to the placebo in the trial population (Fleming et al., 2011). There are two motivations for understanding the ITT effect of an active control compared to the placebo in an active-controlled trial. First, in the design stage of a non-inferiority trial, a key design factor is to select the so-called NI margin, defined as an acceptable loss of efficacy comparing the experimental therapy with the AC in the NI trial population. The current standard practice is to set the NI margin to a fraction of the assumed ITT effect of the AC; hence, a better understanding of AC’s ITT effect facilitates selecting a rigorous and scientifically justifiable NI margin (Rothmann et al., 2003; James Hung et al., 2003; Fleming et al., 2011). Second, in a post-hoc analysis of the active-controlled trial data, the ITT effect of AC versus placebo can be used to establish the ITT effect of the experimental drug versus placebo. The ITT effect of an experimental drug plays a key role in designing future trials to evaluate other experimental drugs, where the current experimental drug may serve as a comparator. In addition, it provides evidence to quantify the experimental drug’s public health impact and facilitates comparison of the experimental drug to other therapeutics. Lastly, the ITT effect of the experimental drug helps evaluate how much society should be willing to pay for the improved efficacy of the experimental drug compared to the active control. For instance, Neilan et al.
(2022) evaluated the cost-effectiveness of CAB-LA using the Cost-Effectiveness of Preventing AIDS Complications model and a key model parameter in this analysis is the ITT effect of CAB-LA versus placebo. What’s more, additional HIV prevention modalities, like an HIV vaccine (Fauci, 2017) and monoclonal antibodies (Miner et al., 2021), are currently under development. The placebo-controlled intention-to-treat effect of CAB-LA serves as an important benchmark to these new interventions. There are at least three sources of heterogeneity that complicate generalizing an AC’s ITT effect from any historical, randomized, placebo-controlled trial to the planned active- controlled trial. First, the actual treatment effect of the AC could be heterogeneous (treatment effect heterogeneity). Second, within the same study, different participants could have different probabilities of adhering to the assigned treatment (within-trial compliance heterogeneity); for example, in the field of HIV prevention, it was reported that age was correlated with adherence to the prescribed PrEP dose (Grant et al., 2014). Moreover, the same AC could be implemented differently across trials and even the same participants could respond differently to distinct implementations (between-trial compliance heterogeneity). Third, trials could target different populations, and therefore, key demographic and health information could differ among trial populations (target population heterogeneity). An interplay among treatment effect heterogeneity, within- and between-trial compliance heterogeneity, and target population heterogeneity may lead to generalization bias (Stuart et al., 2011) of the ITT effect. In fact, ITT estimates of the same intervention often differ across historical trials (see, e.g., Table S3 in Web Appendix E). In an editorial discussing discrepancies among these findings, Cohen and Baden (2012) concluded: Why the results differ across the various studies reported to date is unclear. 
However, important considerations include the populations studied; the likely routes of HIV transmission (vaginal vs. anal mucosa)…and most important, medication adherence by study participants.

Cohen and Baden’s (2012) comments echo three of the aforementioned sources of heterogeneity.

1.3 Current FDA guidelines; existing approaches and literature; our contributions

Current FDA guidelines for designing an NI trial recommend two strategies for estimating the efficacy of an AC in the planned NI trial from historical evidence (Food and Drug Administration, 2016). First, one may choose a historical placebo-controlled trial of the AC and assume that its ITT effect would remain unchanged in the target NI trial. This assumption is known as the “constancy assumption” (Fleming et al., 2011) and could be implausible considering the various sources of heterogeneity previously discussed. Alternatively, one may employ a meta-analytical approach and derive an average estimate based on summary statistics of multiple historical trial results and a random-effects model. The meta-analytical approach acknowledges the variability of ITT estimates across historical trials and incorporates uncertainty quantification using random effects; however, the method is still largely ad hoc and is not underpinned by clear identification assumptions. Either way, the FDA guidelines recommend acknowledging the unreliability of generalization and using a “discounted” estimate as a means of protection against the generalization bias.
Some authors acknowledge the important role of observed covariates in generalizing the intention-to-treat effect, and have proposed covariate adjustment methods under a “conditional constancy” assumption, that is, the intention-to-treat effect within the same strata of study participants (defined by their observed covariates) is constant across trials (Zhang, 2009). Zhang et al. (2014) developed a sensitivity analysis method that tackles residual inconstancy due to unmeasured confounding. The conditional constancy assumption improves upon the constancy assumption and addresses the target population heterogeneity; however, even the conditional constancy assumption is hard to justify because of the across- trial compliance heterogeneity arising from different implementation strategies. Another unsolved issue concerns unmeasured confounders: What is the precise role of unmeasured confounders in preventing generalization of the ITT effect? Under the conditional constancy assumption, recent developments in the generalization and transportation methods for causal inference could be directly leveraged to generalize the ITT effect from a historical trial to the planned active-controlled trial (Stuart et al., 2011; Dahabreh et al., 2019); see, e.g., Degtiar and Rose (2021) for a recent review. Pearl (2011, Section 6, Equation 24) discussed identification of the causal effect in the presence of a post-randomization surrogate endpoint under a sequential ignorability assumption (Joffe and Greene, 2009). Rudolph and van der Laan (2017) proposed targeted maximum likelihood estimators (TMLEs) to transport the intention-to-treat effect across populations under a version of the conditional constancy assumption. They consider a different setting where covariates, treatment assignment and treatment received are observed in both reference and trial populations, but there is only follow up data in the reference population. 
Further, we develop a distinct instrumental variable-based identification strategy that leads to different estimators of the ITT effect. More recently, Dahabreh et al. (2022) discussed in detail unidentifiability of the ITT effect when there are unmeasured common causes of trial participation and treatment, and interpretation of the covariate-standardized ITT estimates (under the conditional constancy assumption) as estimating the effects of joint interventions that scale-up the trial and assign the treatment. In the absence of patient-level data, many authors have proposed meta-analysis-based approaches to estimating causal effects accounting for noncompliance. For instance, Zhou et al. (2019) proposed a Bayesian hierarchical modeling approach to estimating complier average treatment effects and Zhou et al. (2022) proposed a closely related, frequentist approach that targets the same estimand.

In this article, we propose historical-data-driven estimators for AC’s ITT effect in a target trial population using relevant historical trials and under different identification assumptions. Our developed framework helps translate, assess, and quantify what FDA guidelines refer to as non-statistically-based uncertainties (Food and Drug Administration, 2016, Page 20) and places many “essential considerations” raised in Fleming et al. (2011) in the context of formal causal identification assumptions. We assess the finite-sample performance of proposed estimators in simulation studies and apply the proposed estimators to estimating the ITT effect of daily oral TDF/FTC against placebo in the HPTN 084 study using data from this trial and an earlier, historical placebo-controlled trial of daily oral TDF/FTC (Baeten et al., 2012).
2 Notation and framework

2.1 Potential outcomes

We consider the potential outcomes framework (Angrist et al., 1996) to formalize a placebo-controlled trial with noncompliance involving an active control (AC) and a placebo (P). Let $Z_i \in \{0,1\}$ denote a binary treatment assignment (0 for placebo and 1 for AC), and $D_i(Z_i = z_i) \in \{0,1\}$ the potential treatment received had participant $i$ been assigned the treatment $Z_i = z_i$. Each study participant has a pre-specified probability of receiving either treatment (AC or P). A study participant with $(D_i(1), D_i(0)) = (1,0)$ complies with the treatment assignment and is referred to as a complier. A participant with $(D_i(1), D_i(0)) = (1,1)$ is referred to as an always-taker, $(D_i(1), D_i(0)) = (0,0)$ a never-taker, and $(D_i(1), D_i(0)) = (0,1)$ a defier (Angrist et al., 1996). We have assumed the Stable Unit Treatment Value Assumption (SUTVA) in the definition of $D_i(Z_i = z_i)$ so that a study participant’s treatment received depends only on the person’s own treatment assignment (Rubin, 1980; Angrist et al., 1996). Each participant is also associated with potential outcomes $Y_i(d_i, z_i)$, $d_i \in \{0,1\}$, $z_i \in \{0,1\}$, where we again assume the SUTVA in this definition. Under the exclusion restriction assumption, we further have $Y_i(d_i, z_i) = Y_i(d_i)$, that is, the treatment assignment affects the outcome only via the actual treatment received. Next, we assume $Z$ is randomly assigned and is “relevant” in the sense that $E[D(Z = 1) - D(Z = 0)] \neq 0$. The SUTVA, exclusion restriction, relevance, and random assignment will be referred to as “core IV assumptions” in this article. Additional assumptions like “monotonicity”, $D_i(Z_i = 1) \geq D_i(Z_i = 0)$, and “one-sided noncompliance”, $D_i(Z_i = 0) = 0$, help further simplify potential outcomes; however, we do not a priori make these additional assumptions, though we will consider these as important special cases.
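The compliance types and the relevance, monotonicity, and one-sided noncompliance conditions defined above can be illustrated with a minimal sketch. The toy population and the helper names (`compliance_type`, `relevance`) are hypothetical and are not taken from any trial; in real data only one of $D_i(1)$ and $D_i(0)$ is observed per participant, so this is purely a bookkeeping illustration.

```python
from collections import Counter

# Map each pair (D(1), D(0)) of potential treatments received to its
# compliance type (Angrist et al., 1996).
COMPLIANCE_TYPES = {
    (1, 0): "complier",
    (1, 1): "always-taker",
    (0, 0): "never-taker",
    (0, 1): "defier",
}


def compliance_type(d1, d0):
    """Return the compliance type for potential treatments (D(1), D(0))."""
    return COMPLIANCE_TYPES[(d1, d0)]


def relevance(pairs):
    """Sample analogue of E[D(Z=1) - D(Z=0)] over (D(1), D(0)) pairs."""
    return sum(d1 - d0 for d1, d0 in pairs) / len(pairs)


# Hypothetical toy population: 6 compliers, 3 never-takers, 1 always-taker.
toy = [(1, 0)] * 6 + [(0, 0)] * 3 + [(1, 1)] * 1

counts = Counter(compliance_type(d1, d0) for d1, d0 in toy)
effect_on_uptake = relevance(toy)                  # nonzero means Z is relevant
monotone = all(d1 >= d0 for d1, d0 in toy)         # no defiers present
one_sided = all(d0 == 0 for _, d0 in toy)          # fails: one always-taker
```

In this toy population monotonicity holds because there are no defiers, while one-sided noncompliance fails because the always-taker would receive the AC even under assignment to placebo.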
We use the indicator $S$ to denote the trial membership of a participant: $S = t$ if a participant is in the target active-controlled trial; $S = h$ if a participant is in a generic historical placebo-controlled trial. In the later development, we will also explore scenarios where data from two historical trials may be leveraged; in this case, we will use $h_1$ and $h_2$ to distinguish distinct historical trials. Regardless of trial membership, each participant is associated with a vector of baseline covariates $X$. We will use $P_t$ to denote the joint distribution of $X$ in the target trial and $P_h$ that in the historical trial $h$. We use $E_{X \in P_t}[\cdot]$ and $E_{X \in P_h}[\cdot]$ to denote taking expectation over $P_t$ and $P_h$. Throughout, we will make a positivity assumption that $f(S = h \mid X = x) > 0$ for all $x$ in the support of $X$ in the planned NI trial.

2.2 Estimands

The conditional ITT effect of AC versus P in the historical trial is defined as $\mathrm{ITT}(X; S = h) = E[Y(Z = 1) - Y(Z = 0) \mid X, S = h]$. Averaging $\mathrm{ITT}(X; S = h)$ over the distribution of observed covariates $X \in P_h$ then yields the average intention-to-treat effect in the historical trial $S = h$:

$$
\mathrm{ITT}(S = h) = E_{X \in P_h}\left\{ E[Y(Z = 1) - Y(Z = 0) \mid X, S = h] \right\}, \tag{1}
$$

which can be estimated without bias in the historical placebo-controlled trial by virtue of randomization. In parallel, we use $\mathrm{ITT}(X; S = t) = E[Y(Z = 1) - Y(Z = 0) \mid X, S = t]$ to denote the conditional ITT effect of AC versus P in a hypothetical placebo-controlled trial in the target trial population. The intention-to-treat effect of AC in the target trial population is then obtained by averaging $\mathrm{ITT}(X; S = t)$ over the target AC trial population as follows:

$$
\mathrm{ITT}(S = t) = E_{X \in P_t}\left\{ E[Y(Z = 1) - Y(Z = 0) \mid X, S = t] \right\}. \tag{2}
$$

As discussed in Section 1.2, the NI margin and the ITT effect of the experimental drug can be immediately determined once $\mathrm{ITT}(S = t)$ is determined. The causal parameter $\mathrm{ITT}(S = t)$ is of primary scientific interest and hence our target parameter.

2.3 The constancy assumption

The constancy assumption in the NI trial literature (Fleming et al., 2011) states the following relationship between the ITT effect of AC in a target NI trial and that in a chosen historical trial:

Assumption 1 (Constancy). Let $\mathrm{ITT}(S = t)$ and $\mathrm{ITT}(S = h)$ be defined as in (2) and (1), respectively. The constancy assumption is said to hold if $\mathrm{ITT}(S = t) = \mathrm{ITT}(S = h)$.

Another version of the constancy assumption, referred to as the conditional constancy assumption (Zhang, 2009), states the following:

Assumption 2 (Conditional constancy). Let $X$ denote a vector of observed covariates collected in the planned NI trial. Let $\mathrm{ITT}(X; S = t)$ and $\mathrm{ITT}(X; S = h)$ denote conditional intention-to-treat effects in the planned NI trial and the chosen historical trial, respectively. Then the conditional constancy assumption is said to hold if $\mathrm{ITT}(X; S = t) = \mathrm{ITT}(X; S = h)$.

It is transparent from definitions (1) and (2) that when trials enroll study participants from different populations, that is, when $P_t \neq P_h$, then Assumption 1 could fail even when Assumption 2 holds. This has been discussed in great detail in the context of generalizability and transportability by many authors; see, e.g., Dahabreh et al. (2019). Under Assumption 2, the ITT effect in the target active-controlled trial is identified from the observed data of the historical trial $S = h$ and may be estimated using outcome regression, inverse-probability-weighting, or a doubly-robust combination of both; see, e.g., Dahabreh et al. (2019, Section 5). Although weaker than Assumption 1, Assumption 2 is still a strong assumption; it is,
after all, a statement about the intention-to-treat effect, not the actual treatment effect. Even when interventions have consistent treatment effects for similar study participants across different trials, interventions may be implemented differently, induce different compliance even among similar study participants, and lead to different ITT effects. This is particularly true in HIV prevention studies with daily oral PrEP (Cohen and Baden, 2012).

3 Identification assumptions; a road map for estimation and sensitivity analysis

In this section, we replace the constancy assumption with a set of assumptions regarding effect homogeneity and generalizability. These assumptions are not necessarily weaker; however, they are transparent, problem-specific, and more amenable to being assessed and critiqued. They also motivate the estimation procedures and associated sensitivity analyses.

3.1 No-interaction/homogeneity-type assumption

Intuitively, a statement about the intention-to-treat effect implicitly entails a statement about the compliance structure and the actual treatment effect. Unlike $Z$, the actual treatment received $D$ is a post-randomization event and not randomized. Some version of a homogeneity or no-interaction assumption is therefore necessary to link the ITT effect to the average treatment effect (Swanson et al., 2018, Section 5). Below, we adopt one version from Wang and Tchetgen Tchetgen (2018).

Assumption 3 (No-interaction). Let $U$ denote unmeasured covariates that confound $D$’s effect on $Y$. The no-interaction assumption holds if there is no additive $U$-$D$ interaction in $E[Y(D) \mid X, U]$:

$$
E[Y(D = 1) \mid X, U] - E[Y(D = 0) \mid X, U] = E[Y(D = 1) \mid X] - E[Y(D = 0) \mid X], \tag{Assumption 3a}
$$

or no additive $U$-$Z$ interaction in $E[D(Z) \mid X, U]$:

$$
E[D(Z = 1) \mid X, U] - E[D(Z = 0) \mid X, U] = E[D(Z = 1) \mid X] - E[D(Z = 0) \mid X]. \tag{Assumption 3b}
$$

Assumption 3 holds if either Assumption 3a or Assumption 3b holds.
Assumption 3a holds if there are no additional modifiers of $D$’s effect on $Y$ beyond those captured by $X$. Assumption 3a does not hold, for instance, if some genetic factor is suspected to modify $D$’s effect on $Y$. Fleming et al. (2011) describe an example where the effect of epidermal growth factor receptor-inhibiting drugs in colorectal cancer patients depends strongly on whether tumors express the wild type or the mutated version of the KRAS gene. In this example, the KRAS gene $U$ modifies the effect of drug $D$ on colorectal cancer $Y$, and Assumption 3a fails in an analysis not accounting for it.

Assumption 3 also holds if Assumption 3b holds, that is, when the unmeasured modifier of $D$’s effect on $Y$ does not interact with the treatment assignment $Z$ in predicting the treatment received. In the colorectal cancer example, Assumption 3 would still hold if the KRAS gene does not interact with a colorectal cancer patient’s treatment assignment in predicting whether or not the patient adheres to the prescribed treatment conditional on $X$ (possibly including some easier-to-measure aspects of the tumor). This appears to be more reasonable, at least in some applications.

Assumption 3 is a generic assumption that could be applied to either the target AC trial $S = t$ or a historical trial $S = h$.
Assumption 3, when applied to the hypothetical placebo-controlled trial in the planned active-controlled trial population, implies the following decomposition:

$$
\mathrm{ITT}(X; S = t) = \underbrace{E[Y(D = 1) - Y(D = 0) \mid X, S = t]}_{\mathrm{CATE}(X;\, S = t)} \times \underbrace{E[D(Z = 1) - D(Z = 0) \mid X, S = t]}_{\mathrm{CC}(X;\, S = t)}, \tag{3}
$$

where the term $\mathrm{CATE}(X; S = t)$ describes the average treatment effect of AC versus P conditional on a study participant’s covariates, and the conditional compliance term $\mathrm{CC}(X; S = t)$ describes the effect of treatment assignment on treatment received conditional on a study participant’s covariates, both in the planned active-controlled trial.

3.2 Conditional average treatment effect; mean generalizability

To link the active-controlled trial to historical trials, we make the following mean generalizability (also known as mean exchangeability) assumption (Stuart et al., 2011; Dahabreh et al., 2019):

Assumption 4 (Mean generalizability/exchangeability).

$$
\mathrm{CATE}(X; S = t) := E[Y(D = 1) - Y(D = 0) \mid X, S = t] = E[Y(D = 1) - Y(D = 0) \mid X, S = h] =: \mathrm{CATE}(X; S = h).
$$

Assumption 4 essentially says that study participants with the same observed covariates $X$ would experience the same average treatment effect of $D$ on $Y$ in the hypothetical trial and the selected historical trial $S = h$. The major difference between Assumption 4 and Assumption 2 is that Assumption 4 is a statement about the actual treatment effect rather than the intention-to-treat effect. Assumption 4 is in some sense the minimal assumption needed to extend inference from a historical trial to a target population (Dahabreh et al., 2019, Section 3). Assumption 4 can still be violated if there exist multiple versions of an active control or placebo across trials (i.e., Rubin’s SUTVA is violated); for instance, this could happen if the active control therapy employed in a historical trial is different from that in the planned active-controlled trial due to difference in dosage or ancillary therapies (Fleming et al., 2011; Food and Drug Administration, 2016).
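The distinction between Assumption 4 and Assumption 2 can be made concrete with a short worked example. The numbers below are purely hypothetical and are not taken from any trial; the example assumes that the no-interaction decomposition of the conditional ITT effect into a CATE term times a conditional compliance term applies in both the historical and the target trial. For a fixed covariate stratum $x$, suppose the CATE is identical across trials but the AC is implemented differently, so that compliance differs:

```latex
\begin{align*}
\mathrm{CATE}(x; S = t) &= \mathrm{CATE}(x; S = h) = -0.05, \\
\mathrm{CC}(x; S = h) &= 0.8, \qquad \mathrm{CC}(x; S = t) = 0.4, \\
\mathrm{ITT}(x; S = h) &= -0.05 \times 0.8 = -0.04, \\
\mathrm{ITT}(x; S = t) &= -0.05 \times 0.4 = -0.02.
\end{align*}
```

In this illustration Assumption 4 holds while Assumption 2 fails: the conditional ITT effect in the target trial is half of that in the historical trial solely because of between-trial compliance heterogeneity.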
For Assumption 4 to hold, researchers should select a historical trial with an active control as similar as possible to that in the planned active-controlled trial (e.g., both investigating the same medication at the same dose with near-identical ancillary therapies). It is important to note that Assumption 4 only requires the AC therapy itself be identical between trials, not the methods of implementation or dissemination. A sensitivity analysis that models $\mathrm{CATE}(X; S = t)$ as a fraction of $\mathrm{CATE}(X; S = h)$ should be considered when Assumption 4 is suspected not to hold.

Identification of the conditional average treatment effect from a historical placebo-controlled trial, i.e., $\mathrm{CATE}(X; S = h)$, has been discussed extensively in the literature. Two identification strategies are available. First, point identification could be achieved by further imposing Assumption 3 on the selected historical trial. Alternatively, $\mathrm{CATE}(X; S = h)$ is partially identified under different sets of identification assumptions, including minimal, core IV assumptions. A partial identification interval bounds the range of possible values of $\mathrm{CATE}(X; h)$ that are consistent with the observed data. Unlike a confidence interval, a partial identification interval would not shrink to a point even when the sample size goes to infinity, as the true parameter may take a range of values and cannot be point identified; see, e.g., Swanson et al. (2018) for a recent review.

An assumption related to Assumption 4 is given in Rudolph and van der Laan (2017), also in the context of transporting ITT effects under non-compliance: $E[Y \mid D, Z, X, S = t] = E[Y \mid D, Z, X, S = h]$. This is arguably more difficult to interpret than Assumption 4, since it concerns the generalizability of associations rather than causal effects.
In our analysis, we explicitly assume that it is the CATE that generalizes, which we identify by leveraging randomization as an instrument.

3.3 Conditional compliance

The conditional compliance term is a difference between $\mathrm{CC}_{AC}(X; S = t) := E[D(Z = 1) \mid X, S = t]$ and $\mathrm{CC}_{P}(X; S = t) := E[D(Z = 0) \mid X, S = t]$. It then suffices to identify each term separately. The former term equals $E[D \mid X, S = t, Z = 1]$ and is identified based on the compliance data from the active-controlled trial by virtue of randomization. The latter term $\mathrm{CC}_{P}(X; S = t)$ is not identified from the active-controlled trial, but may be estimated using relevant historical trial data under the following placebo-arm compliance generalizability assumption:

Assumption 5 (Placebo-arm compliance generalizability/exchangeability).

$$
E[D(Z = 0) \mid X, S = t] = E[D(Z = 0) \mid X, S = h].
$$

Note that $E[D(Z = 0) \mid X, S = h] = E[D \mid X, S = h, Z = 0]$ is directly estimable from historical trial data by randomization. Alternatively, researchers may do a sensitivity analysis for $\mathrm{CC}_{P}(X)$ by varying it from 0 to a sensible value. If the active control therapy is not available in the AC trial population, then one would reasonably set $\mathrm{CC}_{P}(X) = 0$. Researchers may also vary $\mathrm{CC}_{P}(X)$ in a sensitivity interval centered around $E[D \mid X, S = h, Z = 0]$. Either way, instead of outputting a point estimate of $\mathrm{CC}_{P}(X)$ and $\mathrm{CC}(X)$, one may output a plausible range.

In applications where the intention-to-treat effect of AC is an important design factor, e.g., when selecting the NI margin in the design phase of the NI trial, researchers do not have compliance data from the NI trial and need to identify the entire conditional compliance term using data from a historical trial under the following mean compliance generalizability assumption:

Assumption 6 (Mean compliance generalizability/exchangeability).
$$
E[D(Z = 1) - D(Z = 0) \mid X, S = t] = E[D(Z = 1) - D(Z = 0) \mid X, S = h].
$$

By randomization of $Z$, the quantity $E[D(Z = 1) - D(Z = 0) \mid X, S = h]$ equals $E[D \mid X, S = h, Z = 1] - E[D \mid X, S = h, Z = 0]$ and is directly estimable from historical trial data. For Assumption 6 to hold (or approximately hold), the selected historical trial should have a near-identical implementation strategy of the AC as in the active-controlled trial. Researchers are also advised to relax Assumption 6 in a sensitivity analysis that varies the conditional compliance term in a sensitivity interval around $E[D(Z = 1) - D(Z = 0) \mid X, S = h]$.

3.4 Summary of identification strategies and non-statistically-based uncertainties

Figure 1 summarizes four aspects we have discussed so far: (i) identification assumptions, including those necessary to identify causal quantities in a trial with non-compliance and those necessary to generalize inference across trials, (ii) quantities involved in the estimation procedure, (iii) different modes of identification, including point or partial identification, and identification using historical trials alone or historical data plus partial AC trial data, and (iv) sensitivity analyses relaxing core assumptions. Together, they help quantify what FDA guidelines refer to as “non-statistically-based uncertainties” (Food and Drug Administration, 2016, Page 20). We next discuss statistically-based uncertainties, that is, those associated with sampling variability, by formally proposing estimators for the target parameter.

4 Estimation and inference

We consider two scenarios for estimation and inference, each corresponding to one major scientific objective of estimating the ITT effect of the AC versus placebo.
We first consider the design stage where researchers have access to data from the historical trial $S = h_1$, $\mathcal{D}_{h_1} = \{(X_i, Z_i, D_i, Y_i, S_i = h_1) : i = 1, \dots, N_1\}$, data from a second historical trial $S = h_2$, $\mathcal{D}_{h_2} = \{(X_i, Z_i, D_i, S_i = h_2) : i = N_1 + 1, \dots, N_1 + N_2\}$, and baseline covariates data from the target AC trial, $\mathcal{D}_t = \{(X_i, S_i = t) : i = N_1 + N_2 + 1, \dots, N_1 + N_2 + N\}$. Researchers will attempt to leverage the historical data in $S = h_1$ to estimate $\mathrm{CATE}(X; S = t)$ and data in $S = h_2$ to estimate $\mathrm{CC}(X; S = t)$. In this scenario, we write $\mathcal{D} = \mathcal{D}_{h_1} \cup \mathcal{D}_{h_2} \cup \mathcal{D}_t$ and let $|\mathcal{D}|$ denote its cardinality. We next consider a more stylistic case where the interest lies in estimating the ITT effect of the AC, and hence of the experimental therapy, versus placebo in a post hoc analysis after seeing the compliance data from the target AC trial. In this case, researchers would have access to data $\mathcal{D}_{h_1}$ as mentioned previously plus data $\mathcal{D}_t = \{(X_i, Z_i, D_i, S_i = t) : i = N_1 + 1, \dots, N_1 + N\}$ from the target AC trial. In this second scenario, we write $\mathcal{D} = \mathcal{D}_{h_1} \cup \mathcal{D}_t$.

4.1 Estimation and inference in the design stage

We first consider estimating $\mathrm{ITT}(S = t)$ in the design stage and derive a regression-based, historical-data-driven estimator under Assumption 3 (applied to both $S = t$ and $S = h_1$), Assumption 4 and Assumption 6. Under Assumption 3, the conditional average treatment effect in the trial $S = h_1$, i.e., $\mathrm{CATE}(X; h_1)$, is identified as follows:

$$
E[Y(D = 1) - Y(D = 0) \mid X, S = h_1] = \frac{E[Y \mid X, S = h_1, Z = 1] - E[Y \mid X, S = h_1, Z = 0]}{E[D \mid X, S = h_1, Z = 1] - E[D \mid X, S = h_1, Z = 0]} = \frac{\delta_Y(X; h_1)}{\delta_D(X; h_1)}, \tag{4}
$$

where

$$
\delta_Y(X; s) = E[Y \mid X, S = s, Z = 1] - E[Y \mid X, S = s, Z = 0], \qquad \delta_D(X; s) = E[D \mid X, S = s, Z = 1] - E[D \mid X, S = s, Z = 0].
$$

The expression (4) is sometimes known as the conditional Wald estimand.
It also identifies the conditional complier average treatment effect, if we further assume monotonicity in the historical trial (Angrist et al., 1996). Suppose that we obtain $\hat\delta_Y(X; h_1)$ and $\hat\delta_D(X; h_1)$ by fitting correctly specified parametric models for $E[Y \mid X, S=h_1, Z=z]$ and $E[D \mid X, S=h_1, Z=z]$ and that these models are indexed by finite-dimensional parameters which are estimated, for instance, via maximum likelihood. Then a regression-based estimator $\widehat{CATE}(X; h_1)$ is obtained as $\widehat{CATE}(X; h_1) = \hat\delta_Y(X; h_1)/\hat\delta_D(X; h_1)$. As discussed by Wang and Tchetgen Tchetgen (2018), a limitation of this approach with a binary outcome is that one may obtain estimates of $CATE(X; h_1)$ outside of the $[-1, 1]$ interval. A regression-based estimator of conditional compliance in the historical trial $h_2$ can be analogously obtained as $\widehat{CC}(X; h_2) = \hat\delta_D(X; h_2)$, where $\hat\delta_D(X; h_2)$ denotes an estimator for the unknown $\delta_D(X; h_2)$ obtained from $\mathcal{D}_{h_2}$ via parametric regression modelling of $E[D \mid X, S=h_2, Z=z]$. By averaging $\widehat{CATE}(X; h_1)$ and $\widehat{CC}(X; h_2)$ over $X \in \mathcal{D}_t$, we obtain the following regression-based estimator of $ITT(S = t)$:

$$\widehat{ITT}_{full, reg} = \frac{1}{|\mathcal{D}_t|} \sum_{i=1}^{|\mathcal{D}|} 1\{S_i = t\} \times \widehat{CATE}(X_i; h_1) \times \widehat{CC}(X_i; h_2) = \frac{1}{|\mathcal{D}_t|} \sum_{i=1}^{|\mathcal{D}|} 1\{S_i = t\} \times \frac{\hat\delta_Y(X_i; h_1)}{\hat\delta_D(X_i; h_1)} \hat\delta_D(X_i; h_2). \quad (5)$$

By standard M-estimation theory (Stefanski and Boos, 2002), the estimator $\widehat{ITT}_{full, reg}$ is a consistent and asymptotically normal estimator for the target parameter $ITT(S = t)$ under the modeling assumptions previously discussed. To obtain a confidence interval, one may use an empirical sandwich variance estimator or the non-parametric bootstrap (Cheng and Huang, 2010).

The regression-based estimator $\widehat{ITT}_{full, reg}$ is expected to perform well if parametric models are correctly specified.
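To make the averaging step in (5) concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the three fitted contrasts are already available as callables; the names `itt_plug_in`, `delta_y_h1`, `delta_d_h1`, `delta_d_h2` and all numerical values are hypothetical and chosen only for illustration.

```python
from statistics import mean
from typing import Callable, Sequence

Covariates = Sequence[float]


def itt_plug_in(
    target_covariates: Sequence[Covariates],
    delta_y_h1: Callable[[Covariates], float],
    delta_d_h1: Callable[[Covariates], float],
    delta_d_h2: Callable[[Covariates], float],
) -> float:
    """Average CATE-hat(x; h1) * CC-hat(x; h2) over target-trial covariates."""
    contributions = []
    for x in target_covariates:
        cate = delta_y_h1(x) / delta_d_h1(x)  # conditional Wald ratio, as in (4)
        cc = delta_d_h2(x)                    # conditional compliance from h2
        contributions.append(cate * cc)
    return mean(contributions)


# Hypothetical fitted contrasts (illustrative numbers, not study estimates).
def delta_y_h1(x: Covariates) -> float:
    return -0.04 - 0.01 * x[0]


def delta_d_h1(x: Covariates) -> float:
    return 0.80


def delta_d_h2(x: Covariates) -> float:
    return 0.50 + 0.10 * x[0]


target = [(0.0,), (1.0,)]  # two hypothetical target-trial participants
estimate = itt_plug_in(target, delta_y_h1, delta_d_h1, delta_d_h2)
print(round(estimate, 5))
```

In practice the three callables would come from regression models fitted on the two historical datasets, and a confidence interval would be obtained by re-running the whole procedure within a bootstrap loop.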
Below, we describe an estimator derived from semiparametric efficiency theory (Bickel et al., 1993), which allows for more flexible estimation of nuisance functions using modern statistical learning approaches, whilst still facilitating parametric-rate inference on the target parameter. It is developed from the same general theory as recent developments in de-biased machine learning (Chernozhukov et al., 2017) and targeted learning (van der Laan and Rose, 2011). Recall that the target parameter $ITT(S = t)$ can be expressed as the functional

$$\psi = E\left[\frac{\delta_Y(X; h_1)}{\delta_D(X; h_1)} \delta_D(X; h_2) \,\Big|\, S = t\right].$$

Theorem 1 gives our main result on semiparametric inference.

Theorem 1. Under a non-parametric model $\mathcal{M}$ that places no restrictions on the observed data distribution, the efficient influence function (EIF) for $\psi$ is equal to

$$\begin{aligned} EIF_\psi = {} & \frac{1}{\kappa} \frac{(2Z - 1) 1\{S = h_1\}}{f(Z \mid X, S = h_1)} \frac{f(S = t \mid X)}{f(S = h_1 \mid X)} \frac{\delta_D(X; h_2)}{\delta_D(X; h_1)} \left\{ Y - \mu_{Y,0}(X; h_1) - \big(D - \mu_{D,0}(X; h_1)\big) \frac{\delta_Y(X; h_1)}{\delta_D(X; h_1)} \right\} \\ & + \frac{1}{\kappa} \frac{(2Z - 1) 1\{S = h_2\}}{f(Z \mid X, S = h_2)} \frac{f(S = t \mid X)}{f(S = h_2 \mid X)} \frac{\delta_Y(X; h_1)}{\delta_D(X; h_1)} \left\{ D - \mu_{D,0}(X; h_2) - \delta_D(X; h_2) Z \right\} \\ & + \frac{1}{\kappa} 1\{S = t\} \left\{ \frac{\delta_Y(X; h_1)}{\delta_D(X; h_1)} \delta_D(X; h_2) - \psi \right\}, \end{aligned}$$

where $\mu_{Y,z}(X; s) = E[Y \mid X, S = s, Z = z]$, $\mu_{D,z}(X; s) = E[D \mid X, S = s, Z = z]$ and $\kappa = f(S = t)$. The semiparametric efficiency bound under $\mathcal{M}$ is $E[EIF_\psi^2]$.

Although $\mathcal{M}$ is a non-parametric model, the treatment assignment probabilities $f(Z = z \mid X, S)$ are known by design in our setting and in particular do not typically depend on $X$. Nevertheless, it follows, e.g. from Hahn (1998), that knowledge of $f(Z = z \mid X, S)$ in this case should not change the bound. In contrast, one may be able to leverage information on $f(S = s \mid X)$ to gain precision, although we do not pursue this since such knowledge may not generally be available.

To construct an estimator of $\psi$ based on $EIF_\psi$, one must estimate $\delta_Y(X; h_1)$, $\delta_D(X; h_1)$ and $\delta_D(X; h_2)$, plus the additional nuisance functions $\mu_{Y,z}(X; s)$, $\mu_{D,z}(X; s)$ and $f(S = s \mid X)$. Although $f(Z \mid X, S = h_1)$ is known, one typically uses the estimated version of it.
One strategy would be to develop a multiply robust approach similar to that in Wang and Tchetgen Tchetgen (2018), based on parametric working models for the nuisance functions. Here, we describe an alternative approach, which allows for off-the-shelf methods to learn these quantities. These could include classical non-parametric estimators (e.g. kernel smoothers, sieves) or potentially more flexible statistical learning approaches (random forests, kernel ridge regression, Lasso, ensemble methods). After obtaining estimates $\hat\delta_Y(X; h_1)$, $\hat\delta_D(X; h_1)$, $\hat\delta_D(X; h_2)$, $\hat f(Z \mid X, S = s)$, $\hat f(S = s \mid X)$, $\hat\mu_{Y,0}(X; h_1)$ and $\hat\mu_{D,0}(X; h_2)$, one can then estimate $\psi$ as

$$\begin{aligned} \widehat{ITT}_{EIF} = {} & \frac{1}{|\mathcal{D}_t|} \sum_{i=1}^{|\mathcal{D}|} \frac{(2Z_i - 1) 1\{S_i = h_1\}}{\hat f(Z_i \mid X_i, S_i = h_1)} \frac{\hat f(S_i = t \mid X_i)}{\hat f(S_i = h_1 \mid X_i)} \frac{\hat\delta_D(X_i; h_2)}{\hat\delta_D(X_i; h_1)} \left\{ Y_i - \hat\mu_{Y,0}(X_i; h_1) - \big(D_i - \hat\mu_{D,0}(X_i; h_1)\big) \frac{\hat\delta_Y(X_i; h_1)}{\hat\delta_D(X_i; h_1)} \right\} \\ & + \frac{1}{|\mathcal{D}_t|} \sum_{i=1}^{|\mathcal{D}|} \frac{(2Z_i - 1) 1\{S_i = h_2\}}{\hat f(Z_i \mid X_i, S_i = h_2)} \frac{\hat f(S_i = t \mid X_i)}{\hat f(S_i = h_2 \mid X_i)} \frac{\hat\delta_Y(X_i; h_1)}{\hat\delta_D(X_i; h_1)} \left\{ D_i - \hat\mu_{D,0}(X_i; h_2) - \hat\delta_D(X_i; h_2) Z_i \right\} \\ & + \frac{1}{|\mathcal{D}_t|} \sum_{i=1}^{|\mathcal{D}|} 1\{S_i = t\} \frac{\hat\delta_Y(X_i; h_1)}{\hat\delta_D(X_i; h_1)} \hat\delta_D(X_i; h_2). \end{aligned}$$

Under regularity conditions, if each of the nuisance estimators converges to the truth with mean squared error rate shrinking faster than $n^{-1/4}$ and certain Donsker conditions on the nuisance functions hold (Van der Vaart, 2000), then $\widehat{ITT}_{EIF}$ is $n^{1/2}$-consistent and asymptotically normal. Furthermore, supposing that $E[D \mid X, S = h_1, Z]$ is consistently estimated, the estimator is asymptotically unbiased (although not necessarily $n^{1/2}$-consistent) so long as one of the following restrictions holds: (1) $E[Y \mid X, S = h_1, Z]$ and $E[D \mid X, S = h_2, Z]$ are consistently estimated; (2) $E[Y \mid X, S = h_1, Z]$ and $f(S \mid X)$ are consistently estimated; and (3) $E[D \mid X, S = h_2, Z]$ and $f(S \mid X)$ are consistently estimated.
See Section 4.5 of Wang and Tchetgen Tchetgen (2018) for further discussion about the robustness properties. Additional robustness may be attained by using doubly robust estimators for certain nuisance functions and/or adopting the parametrizations in Wang and Tchetgen Tchetgen (2018). An estimator of the asymptotic variance can be obtained using a sandwich estimator. As discussed in Chernozhukov et al. (2017), if very flexible learning methods are used, sample-splitting (estimating the nuisance functions on a training split, and $\psi$ on a test split) or cross-fitting are recommended to alleviate the Donsker conditions.

4.2 Estimation and inference in the post hoc analysis

The previous results straightforwardly extend to the setting where one wishes to evaluate the ITT effect of the AC versus placebo with data available from the target AC trial. In that case, conditional compliance $E[D(Z=1) \mid X, S=t]$ can be identified under randomization as $\mu_{D,1}(X; t) := E[D \mid X, S=t, Z=1]$. However, $E[D(Z=0) \mid X, S=t]$ cannot be identified as straightforwardly, because there is no placebo arm in the AC trial. We will proceed here, as in our case study, by treating $E[D(Z=0) \mid X, S=t]$ as a sensitivity parameter $\mu^*_{D,0}(X; t)$, such that $\delta^*_D(X; t) := \mu_{D,1}(X; t) - \mu^*_{D,0}(X; t)$ and $\hat\delta^*_D(X; t) := \hat\mu_{D,1}(X; t) - \mu^*_{D,0}(X; t)$. In that case, the identification functional is now

$$\psi = E\left[\frac{\delta_Y(X; h_1)}{\delta_D(X; h_1)} \delta^*_D(X; t) \,\Big|\, S = t\right].$$

Results on estimation follow closely along the lines described in the previous subsection. Indeed, the regression-based estimators equal

$$\widehat{ITT}_{full, reg} = \frac{1}{|\mathcal{D}_t|} \sum_{i=1}^{|\mathcal{D}|} 1\{S_i = t\} \times \frac{\hat\delta_Y(X_i; h_1)}{\hat\delta_D(X_i; h_1)} \hat\delta^*_D(X_i; t),$$
whereas the estimators based on the efficient influence function simplify to

$$\begin{aligned} \widehat{ITT}_{EIF} = {} & \frac{1}{|\mathcal{D}_t|} \sum_{i=1}^{|\mathcal{D}|} \frac{(2Z_i - 1) 1\{S_i = h_1\}}{\hat f(Z_i \mid X_i, S_i = h_1)} \frac{\hat f(S_i = t \mid X_i)}{\hat f(S_i = h_1 \mid X_i)} \frac{\hat\delta^*_D(X_i; t)}{\hat\delta_D(X_i; h_1)} \left\{ Y_i - \hat\mu_{Y,0}(X_i; h_1) - \big(D_i - \hat\mu_{D,0}(X_i; h_1)\big) \frac{\hat\delta_Y(X_i; h_1)}{\hat\delta_D(X_i; h_1)} \right\} \\ & + \frac{1}{|\mathcal{D}_t|} \sum_{i=1}^{|\mathcal{D}|} \frac{1\{Z_i = 1\} 1\{S_i = t\}}{\hat f(Z_i \mid X_i, S_i = t)} \frac{\hat\delta_Y(X_i; h_1)}{\hat\delta_D(X_i; h_1)} \left\{ D_i - \hat\mu_{D,1}(X_i; t) \right\} \\ & + \frac{1}{|\mathcal{D}_t|} \sum_{i=1}^{|\mathcal{D}|} 1\{S_i = t\} \frac{\hat\delta_Y(X_i; h_1)}{\hat\delta_D(X_i; h_1)} \hat\delta^*_D(X_i; t). \end{aligned}$$

One can then estimate the ITT effect of the experimental therapy versus placebo in the target AC trial population by adding one of the estimates described above to the estimated ITT comparison of the experimental therapy versus active control. Although the above developments treat $\mu^*_{D,0}(X_i; t)$ as fixed, in practice, one may wish to vary it over a plausible range of values.

4.3 Extensions

Point identification of $CATE(X; h_1)$ requires imposing Assumption 3 or other homogeneity-type assumptions on the historical trial $S = h_1$ (Swanson et al., 2018, Section 5.2). Alternatively, one may proceed by constructing partial identification intervals $[L(X), U(X)]$ such that $CATE(X; h_1) \in [L(X), U(X)]$ almost surely. Depending on the assumptions one is willing to make about the treatment assignment and treatment received, different partial identification bounds can be formulated (Swanson et al., 2018). We review some estimation strategies for partial identification bounds in Web Appendix B for completeness. We will construct partial identification bounds that are motivated by our case study in Section 6. Web Appendix C also discusses some variants of the regression-based estimator and a sensitivity analysis assessing Assumption 3.

5 Simulation study

5.1 Goal and structure

We consider data generating processes that have all three sources of heterogeneity: treatment effect heterogeneity, within- and across-trial compliance heterogeneity, and target population heterogeneity.
We generate two historical datasets $\mathcal{D}_{h_1}$ and $\mathcal{D}_{h_2}$, and a hypothetical placebo-controlled trial dataset $\mathcal{D}_{target}$ according to the following data generating process:

Sample sizes: $N_1 = N_2 = N = 1000$, $2000$, and $5000$.

Observed covariates and overlap: We consider the following two data generating processes for $X$, one mimicking the case study (Scenario X1) and the other following a standard multivariate normal distribution (Scenario X2).

Scenario X1: We sample with replacement HPTN 084 participants’ observed covariates to form $\mathcal{D}_{target}$. We then sample Partners PrEP participants’ observed covariates to form $\mathcal{D}_{h_1}$ and $\mathcal{D}_{h_2}$ and control the amount of overlap between these two historical datasets and $\mathcal{D}_{target}$ using the following biased sampling strategy. For each study participant in Partners PrEP, we estimate a “probability of trial participation,” defined as the probability of selection into the HPTN 084 study over the Partners PrEP study based on a participant’s baseline characteristics (Cole and Stuart, 2010; Stuart et al., 2011). This “probability of trial participation” is a version of Rosenbaum and Rubin’s (1983) propensity score and captures the covariate balance between the target and historical datasets. By over- and under-sampling participants in Partners PrEP with large estimated “probability of participation,” we then control the amount of overlap between datasets. Specifically, the historical dataset $\mathcal{D}_{h_j}$ was formed by sampling $N_{j, high}$ and $N_{j, low}$ participants with high (above 0.5) and low (below 0.5) probability of participation, $j = 1, 2$.
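The biased sampling step can be sketched in a few lines of Python. This is a hedged illustration only: the function name `biased_sample`, the choice to sample with replacement within each group, and the handling of a probability exactly equal to 0.5 are assumptions made here, not details stated in the text.

```python
import random
from typing import List, Sequence


def biased_sample(
    participation_probs: Sequence[float],
    n_high: int,
    n_low: int,
    threshold: float = 0.5,
    seed: int = 0,
) -> List[int]:
    """Return participant indices: n_high drawn from those with estimated
    probability above the threshold, n_low from the remaining participants.

    Assumption: draws are made with replacement within each group.
    """
    rng = random.Random(seed)
    high = [i for i, p in enumerate(participation_probs) if p > threshold]
    low = [i for i, p in enumerate(participation_probs) if p <= threshold]
    return rng.choices(high, k=n_high) + rng.choices(low, k=n_low)


# Hypothetical estimated participation probabilities for 500 participants.
probs = [i / 500 for i in range(500)]
# Example split with 10% high-probability and 90% low-probability draws.
sample_idx = biased_sample(probs, n_high=100, n_low=900)
print(len(sample_idx))
```

Raising `n_high` relative to `n_low` pulls the historical sample toward participants who resemble the target trial, which is how the amount of overlap is controlled.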
We consider three overlap levels: (i) Poor overlap: $N_{1, high} = 0.1N_1$, $N_{1, low} = 0.9N_1$, $N_{2, high} = 0.15N_2$, $N_{2, low} = 0.85N_2$; (ii) Limited overlap: $N_{1, high} = 0.19N_1$, $N_{1, low} = 0.81N_1$, $N_{2, high} = 0.19N_2$, $N_{2, low} = 0.81N_2$; (iii) Sufficient overlap: $N_{1, high} = 0.4N_1$, $N_{1, low} = 0.6N_1$, $N_{2, high} = 0.5N_2$, $N_{2, low} = 0.5N_2$. To illustrate, Figure 2 plots the overlap between $\mathcal{D}_{target}$ and $\mathcal{D}_{h_1}$ in poor overlap, limited overlap, and sufficient overlap scenarios.

Scenario X2: We generate a 10-dimensional $X \sim \text{Multivariate Normal}(\mu, 0.5 \cdot I_d)$, where $I_d$ is an identity matrix, $\mu = (c, c, c, 0, \dots, 0)^T$ in $\mathcal{D}_{h_2}$, $\mu = (1.2c, 1.2c, 1.2c, 0, \dots, 0)^T$ in $\mathcal{D}_{h_1}$, and $\mu = (0.8c, 0.8c, 0.8c, 0, \dots, 0)^T$ in $\mathcal{D}_{target}$, and $c \in \{0, 0.25, 0.50\}$. Parameter $c$ controls the amount of overlap in this scenario.

Treatment assignment: $Z$ is Bernoulli(0.5) in $\mathcal{D}_{h_1}$, $\mathcal{D}_{h_2}$ and $\mathcal{D}_{target}$.

Treatment received: $D$ is Bernoulli with

$$P\{D(Z) = 1 \mid X\} = \text{expit}\{Z(2.5 - 0.1X_1 + 0.3X_2 - 0.4X_5) + (1 - Z)(-0.3X_1 - 0.4X_5 - 1.5)\}$$

in $\mathcal{D}_{h_2}$ and $\mathcal{D}_{target}$, and

$$P\{D(Z) = 1 \mid X\} = \text{expit}\{Z(2 + 0.2X_1 - 0.2X_5) + (1 - Z)(-0.2X_5 - 1)\}$$

in $\mathcal{D}_{h_1}$, where $\text{expit}(x) = \exp(x)/\{1 + \exp(x)\}$ is the inverse of the logit function.

According to the above data generating process, the covariate distribution $X$ is distinct among $\mathcal{D}_{h_1}$, $\mathcal{D}_{h_2}$, and $\mathcal{D}_{target}$ (that is, target population heterogeneity exists) in Scenario X1 and in Scenario X2 when $c \neq 0$. The effect of $Z$ on $D$ in $\mathcal{D}_{h_1}$ is different from that in $\mathcal{D}_{h_2}$ and $\mathcal{D}_{target}$ (that is, across-trial compliance heterogeneity exists). Moreover, $X_1$, $X_2$ modify the effect of $Z$ on $D$ in $\mathcal{D}_{h_2}$ and $\mathcal{D}_{target}$ (that is, within-trial compliance heterogeneity exists) so that the marginal compliance rate is different between $\mathcal{D}_{h_2}$ and $\mathcal{D}_{target}$.

Outcome: We consider two sets of data generating processes in $\mathcal{D}_{h_1}$ and $\mathcal{D}_{h_2}$: a linear data generating process (Scenario Y1):
$$P\{Y(Z) = 1 \mid X\} = \begin{cases} \text{expit}\{Z(2.6 - 0.6X_1 - 0.8X_2 + 0.4X_3) + (1 - Z)(1.6 - 0.7X_1 - 0.7X_2 + 0.4X_3 - 0.2X_5)\} & \text{in } \mathcal{D}_{h_1}, \\ \text{expit}\{1.4 - X_1 - 0.6X_2 + 0.4X_3 - 0.6X_5 + 3.5Z\} & \text{in } \mathcal{D}_{h_2}, \end{cases}$$

and a nonlinear data generating process (Scenario Y2):

$$P\{Y(Z) = 1 \mid X\} = \begin{cases} \text{expit}\{Z(2.6 - 0.6X_1 - 0.8X_2 + 0.4\,X_3) + (1 - Z)(1.6 - 0.7X_1 - 0.7X_2^3 + 0.4X_3 - 0.2X_5)\} & \text{in } \mathcal{D}_{h_1}, \\ \text{expit}\{1.4 - X_1 - 0.6\,X_2 + 0.4X_3 - 0.6X_5 + 3.5Z\} & \text{in } \mathcal{D}_{h_2}. \end{cases}$$

In the hypothetical trial dataset $\mathcal{D}_{target}$, we generated the potential outcome $P\{Y(Z = 0) = 1 \mid X\} = 0$ and hence $P\{Y(Z = 1) = 1 \mid X\}$ equals the conditional intention-to-treat effect, which is a product of the conditional average treatment effect in $\mathcal{D}_{h_1}$ and the conditional compliance in $\mathcal{D}_{h_2}$. The data-generating process also ensures that $CATE(X; S = h_1)$ is bounded between −1 and 1.

We considered 6 estimators of the ITT effect: (i) a difference-in-means estimator $\widehat{ITT}_{hypo}$ based on the unobservable outcome data in $\mathcal{D}_{target}$, (ii) and (iii) two covariate-adjusted estimators that (incorrectly) assume the conditional constancy assumption between $\mathcal{D}_{target}$ and $\mathcal{D}_{h_1}$ ($\widehat{ITT}_{const,1}$) and between $\mathcal{D}_{target}$ and $\mathcal{D}_{h_2}$ ($\widehat{ITT}_{const,2}$), (iv) a historical-data-driven, regression-based estimator $\widehat{ITT}_{reg,par}$, (v) a historical-data-driven, EIF-based estimator $\widehat{ITT}_{EIF,par}$ with all nuisance parameters estimated via parametric regression models, and (vi) a historical-data-driven, EIF-based estimator $\widehat{ITT}_{EIF,gam}$ with all nuisance parameters estimated via generalized additive models (Hastie, 2017). In each setting, we repeat the simulation 1000 times.

5.2 Results

Figure S1 in Web Appendix D compares the sampling distributions of the 6 estimators under consideration when the sample sizes are $N_1 = N_2 = N = 2000$, observed covariates are generated according to Scenario X1, and the outcomes are generated according to Scenario Y1. The ground truth intention-to-treat effects are superimposed using red dashed lines.
The three historical-data-driven estimators $\widehat{ITT}_{reg,par}$, $\widehat{ITT}_{EIF,par}$, and $\widehat{ITT}_{EIF,gam}$ all closely resemble the ground truth ITTs, though they have larger variances compared to that of the unobtainable, gold-standard estimator $\widehat{ITT}_{hypo}$. As the overlap between historical and target datasets improves, the variance of each historical-data-driven estimator starts to shrink and the sampling distribution becomes more concentrated around the ground truth.

Table 1 summarizes the percentage of bias and coverage of 95% confidence intervals for different sample sizes and overlap levels. We encountered simulated datasets where the estimator $\widehat{ITT}_{EIF,gam}$ became unstable due to the small weights in the denominator, especially when the covariate overlap is poor. We reported in the table captions the number of times this phenomenon occurred out of 1000 Monte Carlo replications. In these cases, we applied hard thresholding and let the estimator be $\phi(\widehat{ITT}_{EIF,gam})$, where the function $\phi$ satisfies $\phi(x) = 1$ for all $x \geq 1$, $\phi(x) = -1$ for all $x \leq -1$, and $\phi(x) = x$ otherwise. The percentage of bias of $\widehat{ITT}_{EIF,gam}$ reported in Table 1 is based on the truncated version. Similar to the impression delivered by Figure S1, in all cases considered in this simulation study, the three historical-data-driven estimators had small to negligible biases. On the other hand, the two estimators based on the incorrect conditional constancy assumption ($\widehat{ITT}_{const,1}$ and $\widehat{ITT}_{const,2}$) were heavily biased and their confidence intervals’ coverage was nowhere close to the nominal level. The bias that persists after adjusting for the observed covariates difference is often observed in empirical studies and referred to as “residual confounding” by Zhang et al. (2014). Our simulation exhibits concrete settings where such residual confounding could emerge.
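The hard-thresholding map $\phi$ applied to unstable estimates amounts to clipping to $[-1, 1]$. A minimal Python sketch follows; the function name `hard_threshold` is a label introduced here for illustration.

```python
def hard_threshold(x: float, lower: float = -1.0, upper: float = 1.0) -> float:
    """Clip x to [lower, upper]: phi(x) = 1 if x >= 1, -1 if x <= -1, else x."""
    return max(lower, min(upper, x))


# An unstable estimate outside [-1, 1] is pulled back to the boundary,
# while an estimate already inside the interval is left unchanged.
print(hard_threshold(1.7), hard_threshold(-3.0), hard_threshold(-0.04))
```

The bounds reflect that a risk difference for a binary outcome cannot lie outside $[-1, 1]$.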
The 95% confidence intervals for all but $\widehat{ITT}_{EIF,gam}$ were based on nonparametric bootstrap. We found that the bootstrapped 95% CIs of $\widehat{ITT}_{reg,par}$ and $\widehat{ITT}_{EIF,par}$ approximately attained their nominal level when sample sizes are as large as 2000 in each dataset. The bootstrapped CIs of $\widehat{ITT}_{EIF,gam}$ were found to be highly conservative; on the other hand, the 95% CIs obtained based on asymptotic normality and estimated asymptotic variances tended to undercover when the overlap was poor and the sample size was small, but began to achieve nominal coverage level when the overlap was sufficient and sample size was as large as 5000.

In Web Appendix D.2, we report additional simulation results when observed covariates were generated according to Scenario X2 and outcomes were generated according to Scenario Y1 and Scenario Y2. In the additional nonlinear data generating process Scenario Y2, $\widehat{ITT}_{EIF,gam}$ continued to have negligible bias and good coverage, while $\widehat{ITT}_{reg,par}$ became biased once the parametric models became misspecified.

6 Case study: Efficacy of daily TDF/FTC in HIV-1 prevention

6.1 Historical placebo-controlled trials of daily oral TDF/FTC

Our goal is to estimate the ITT effect of daily oral TDF/FTC versus placebo and then the ITT effect of CAB-LA in the HPTN 084 trial population. We consider an integrated analysis of the patient-level data from HPTN 084 and a historical placebo-controlled trial of daily oral TDF/FTC using our proposed framework and methods. There are 3 large-scale, multicenter, randomized trials that evaluated daily oral TDF/FTC: Partners PrEP (Baeten et al., 2012), FEM-PrEP (Van Damme et al., 2012), and VOICE (Marrazzo et al., 2015). The FEM-PrEP and VOICE studies were conducted in the heterosexual women population in multiple African countries, while the Partners PrEP study enrolled HIV-uninfected heterosexual men and women who had a partner living with HIV (i.e., HIV-1-serodiscordant heterosexual couples) from Kenya and Uganda.
These three historical studies recorded quite different annualized HIV incidence in the daily oral TDF/FTC arm. The Partners PrEP study reported an incidence of 0.95 per 100 person-years in the TDF/FTC arm among heterosexual women (Baeten et al., 2012, Figure 3). On the other hand, both the FEM-PrEP study (Van Damme et al., 2012, Table 2) and the VOICE study (Marrazzo et al., 2015, Table 3) reported an incidence of 4.7 per 100 person-years in the TDF/FTC arm. The annualized HIV incidence was reported to be 1.86 per 100 person-years in the TDF/FTC arm of the HPTN 084 study. In this integrated analysis, we chose the Partners PrEP study as the historical placebo-controlled trial because the gap in the annualized HIV incidence rate in the TDF/FTC arm between the Partners PrEP study and the HPTN 084 study, while still substantial, was considerably smaller than the corresponding gap for the other two studies.

6.2 Overlap between the HPTN 084 and Partners PrEP trial populations; target population

We first examine the overlap between the trial population of the HPTN 084 study and that of the Partners PrEP study. Because the HPTN 084 study enrolled only female participants, we focused on the female participants in the Partners PrEP study. Moreover, as the Partners PrEP study enrolled heterosexual women who had a partner living with HIV, the study would not reveal the ITT effect or the conditional average treatment effect of daily oral TDF/FTC for heterosexual women whose partners did not live with HIV, if a partner’s HIV status is an important modifier of the compliance pattern or the treatment effect.
Therefore, instead of making inference for the entire HPTN 084 population, we only focused on about one third of the HPTN 084 participants, namely those whose partners were either living with HIV or had an unknown HIV status, as our target population.

The second and third columns in Table 2 summarize the baseline characteristics of study participants in the target population of HPTN 084 and female participants in the Partners PrEP study. Compared to those in Partners PrEP, participants in the HPTN 084 target population were younger (mean age 26.3 versus 33.5) and received more education (46.3% versus 6.3% completing secondary school). HPTN 084 participants also had a higher unemployment rate (75.8% versus 31.8%), higher positivity rates of baseline diagnoses of gonorrhea (6.0% versus 1.2%), chlamydia (16.3% versus 1.1%) and trichomonas (7.8% versus 6.8%), and a lower positivity rate of syphilis (2.6% versus 5.8%). To help better summarize and visualize the covariate overlap between the two populations, Figure S5 in Web Appendix E exhibits the distributions of the estimated “probability of participation” in the target HPTN 084 population and among female participants in the Partners PrEP study (Cole and Stuart, 2010; Stuart et al., 2011). The plot suggests that there is overlap between the two populations across the spectrum of the “probability of participation,” although the overlap is limited so covariate adjustment is warranted. The first column of Table 2 further exhibits the covariate distribution of the entire HPTN 084 study. We found that participants in the target population were similar to the entire HPTN 084 trial population in age, education, unemployment rate and baseline sexually transmitted infections; nevertheless, because partners’ HIV status could be an important risk factor, we still restricted our analysis to the n = 1,139 participants whose partners lived with HIV or had an unknown status.
In addition to baseline characteristics of trial participants, self-reported adherence to daily pill-taking also differed between HPTN 084 and Partners PrEP. Consuming at least 80% of prescribed pills is typically considered “adhering to the drug” in the HIV prevention literature (Murnane et al., 2015). Adopting this definition, 80.5% of daily TDF/FTC recipients adhered to the prescription in the Partners PrEP study, and this number was 52.4% in the HPTN 084 study. In the HPTN 084 study, measurements of plasma tenofovir concentrations from a prespecified random cohort of 405 study participants were obtained; 812 out of 1,939 samples (41.9%) had tenofovir concentrations consistent with daily use (≥ 40 ng/mL).

The right panel of Figure 3 plots the cumulative incidence curves in the CAB-LA and TDF/FTC arms in the target population, while the left panel plots the cumulative incidence curves in the TDF/FTC and placebo arms among heterosexual women in the Partners PrEP study. In view of the difference in patient composition and adherence pattern, both the constancy and conditional constancy assumptions are likely to be violated. Below, we seek to estimate the ITT effect of daily oral TDF/FTC against placebo for the target population based on evidence from the Partners PrEP study using the framework developed in the article.

6.3 Estimating the ITT effect of daily oral TDF/FTC in the target population: two approaches

We estimated the intention-to-treat effect of daily oral TDF/FTC against placebo in reducing HIV-1 incidence in the target population under both point and partial identification frameworks.
First, we assume Assumption 3 holds for the Partners PrEP study and the hypothetical placebo-controlled trial of daily oral TDF/FTC in the target population with the observed covariates including age, employment status, education, and four comorbidities including gonorrhea, chlamydia, trichomonas, and syphilis. Under this assumption and Assumption 4, we estimated the average treatment effect of daily oral TDF/FTC against placebo conditional on observed baseline covariates based on the Partners PrEP data. We then estimated the conditional compliance using the observed self-reported adherence data in the TDF/FTC arm in the HPTN 084 study and by treating the probability that a placebo recipient received daily oral TDF/FTC (i.e., $CC_P(X; S = \text{HPTN 084})$) as a sensitivity parameter. In this way, if we assume no cross-over in the hypothetical placebo-controlled trial (i.e., $CC_P(X; S = \text{HPTN 084}) = 0$), then the intention-to-treat effect of daily oral TDF/FTC against placebo in the target population was estimated to be −3.9 HIV infections per 100 person-years (95% CI: −9.7 to −0.7). In a sensitivity analysis, we further allowed some minor degree of cross-over by setting $CC_P(X; S = \text{HPTN 084}) = 5\%$ and $10\%$, and the ITT effect of TDF/FTC was estimated to be −3.5 per 100 person-years (95% CI: −8.7 to −0.6) and −3.1 per 100 person-years (95% CI: −7.8 to −0.5), respectively. We then conducted inference using the EIF-based estimator proposed in Section 4.2. The ITT effect of TDF/FTC was estimated to be −3.1 per 100 person-years (95% CI: −5.3 to −0.9) assuming no cross-over. If we set $CC_P(X; S = \text{HPTN 084}) = 5\%$ and $10\%$, the ITT effect of TDF/FTC was estimated to be −2.5 per 100 person-years (95% CI: −4.5 to −0.5) and −1.8 per 100 person-years (95% CI: −3.6 to 0.0), respectively. In this integrated analysis, we found that the EIF-based estimators were more efficient compared to the regression-based estimators.
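The way the cross-over sensitivity parameter enters the regression-based estimator of Section 4.2 can be sketched as follows. This is an illustration under stated assumptions: the function name `itt_post_hoc`, the per-participant inputs, and every number below are hypothetical and are not the study's fitted values; the sensitivity parameter is taken to be constant in the covariates, as in the analysis above.

```python
from statistics import mean
from typing import Sequence


def itt_post_hoc(
    cate_hat: Sequence[float],
    mu_d1_hat: Sequence[float],
    mu_d0_star: float,
) -> float:
    """Plug-in ITT: average of CATE-hat_i * (mu-hat_D1_i - mu*_D0) over
    target-trial participants, with mu*_D0 a fixed sensitivity parameter."""
    return mean(c * (m - mu_d0_star) for c, m in zip(cate_hat, mu_d1_hat))


# Hypothetical fitted values for three target-trial participants.
cate_hat = [-0.08, -0.06, -0.10]   # estimated CATE from the historical trial
mu_d1_hat = [0.50, 0.60, 0.45]     # estimated adherence in the active-control arm

# Sweep the assumed cross-over probability in the counterfactual placebo arm.
results = {star: itt_post_hoc(cate_hat, mu_d1_hat, star) for star in (0.0, 0.05, 0.10)}
for star, value in results.items():
    print(f"cross-over = {star:.2f}: ITT = {value:.4f}")
```

Because a larger assumed cross-over shrinks the compliance contrast, the estimated ITT effect moves toward zero as the sensitivity parameter grows, which matches the direction of the estimates reported above.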
Because the target population and the Partners PrEP population were not well overlapped in baseline comorbidities including gonorrhea and chlamydia, we further considered an analysis restricted to participants testing negative for gonorrhea, chlamydia and trichomonas at baseline. For this target population, the ITT effect of TDF/FTC against placebo was estimated to be −3.3 per 100 person-years (95% CI: −9.0 to −0.4) assuming no cross-over. According to the EIF-based estimator, the ITT effect of TDF/FTC was estimated to be −2.2 per 100 person-years (95% CI: −4.5 to 0.0) assuming no cross-over.

Finally, we considered relaxing Assumption 3 in the Partners PrEP study and only partially identifying the conditional average treatment effect of daily TDF/FTC versus placebo. To this end, we considered the following strategy. Within each stratum defined by observed covariates $X$, the conditional average treatment effect can be decomposed into a weighted sum of the treatment effect among the subgroup of compliers and that among the non-compliers, weighted by their relative proportions in the stratum. It then suffices to determine the average treatment effect among the compliers, non-compliers, and their proportions, all within the strata of observed covariates. Conditional on the observed covariates, the complier average effect is identified by the ratio estimator (4). On the other hand, the treatment effect of daily oral TDF/FTC among non-compliers has the following natural bounds: the maximum treatment effect was to reduce all HIV incidence in the placebo arm of the Partners PrEP study and the minimum effect was 0. In this way, we estimated bounds for the conditional average treatment effect and used these bounds to form the final ITT effect estimates against placebo in the target population.
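The decomposition behind these bounds can be illustrated with a short Python sketch. It is a simplified reading of the strategy described above, not the authors' code: the function name `cate_bounds`, the use of a stratum-level placebo incidence as the largest possible reduction, and all numbers are assumptions made for illustration.

```python
from typing import Tuple


def cate_bounds(
    complier_share: float,
    complier_effect: float,
    placebo_incidence: float,
) -> Tuple[float, float]:
    """Bounds on the stratum-level CATE when the non-complier effect is only
    known to lie between -placebo_incidence (all infections averted) and 0."""
    complier_part = complier_share * complier_effect
    noncomplier_share = 1.0 - complier_share
    lower = complier_part + noncomplier_share * (-placebo_incidence)
    upper = complier_part + noncomplier_share * 0.0
    return lower, upper


# Hypothetical stratum: 80% compliers, complier effect of -0.03 on the risk
# scale, and a placebo-arm incidence of 0.02.
lower, upper = cate_bounds(complier_share=0.8, complier_effect=-0.03, placebo_incidence=0.02)
print(round(lower, 4), round(upper, 4))
```

The interval narrows as the complier share approaches one, since less of the stratum is left to the unidentified non-complier effect.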
Assuming no cross-over, the interval estimates of the ITT effect were [−3.6, −2.5] per 100 person-years (95% CI: −8.0 to −0.4).

6.4 Estimating the absolute efficacy of CAB-LA in the target population

Our analysis also immediately implies that the HIV incidence was 6.5 (95% CI: 3.1 to 12.4) per 100 person-years in the counterfactual placebo arm (primary analysis under the point identification assumptions and based on the regression-based estimator) in the target population. This estimate became 5.5 (95% CI: 1.5 to 11.0) per 100 person-years if we further restrict the target population to those who tested negative for gonorrhea, chlamydia and trichomonas at baseline. These estimates of placebo arm HIV incidence agreed reasonably well with those reported in the FEM-PrEP study (5.0 per 100 person-years) and the VOICE study (4.6 per 100 person-years). On the other hand, under a naïve adoption of the constancy assumption, one would conclude an HIV incidence of 3.7 per 100 person-years in the target population, which appears to substantially underestimate the HIV incidence among young women in sub-Saharan Africa. Our result also implies an absolute efficacy of CAB-LA as large as −6.1 per 100 person-years (95% CI: −11.9 to −2.6) in the target population and −5.3 per 100 person-years (95% CI: −10.5 to −1.3) if the target population was further restricted to those who tested negative for gonorrhea, chlamydia and trichomonas at baseline. Putting together the estimates for HIV incidence in the placebo arm and the estimates of the efficacy of CAB-LA, we estimated that CAB-LA eliminated about 95% of HIV acquisitions in the target population.

7 Discussion

In this article, we systematically study the problem of generalizing the intention-to-treat effect of an active control versus placebo from historical placebo-controlled trials to an active-controlled trial.
Our key insight is that generalization critically depends on post-randomization events such as adherence to the prescribed treatment in clinical trials. Our framework helps translate what FDA refers to as non-statistically-based uncertainties (Food and Drug Administration, 2016, Page 20) into concrete causal identification assumptions, highlights multiple sources of heterogeneity, including heterogeneity in participant composition, compliance, and treatment effect, and emphasizes the role of a post-randomization event when generalizing and transporting causal conclusions.

Our work adds to the existing HIV prevention literature on inferring the intention-to-treat effect of an active control based on a “counterfactual placebo” incidence estimate, which may be constructed by leveraging data from a concurrent registrational cohort that receives access to available standard of care for HIV prevention (US National Library of Medicine, 2021), placebo arm data of historical trials in a similar population (Donnell et al., 2022), HIV recency testing data collected at screening (Gao et al., 2021) and the adherence-efficacy relationship (Glidden et al., 2020, 2021). The statistical problem of generalizing the ITT effect of an active control versus placebo from relevant historical trials has become more relevant as active-controlled trials have become increasingly prevalent.

There are several ways to further this line of research. One important future direction is to generalize the framework to more complicated settings where post-randomization events like adherence to the intervention are time-varying, and the endpoint of interest is a time-to-event endpoint. Second, compliance or adherence to the intervention in an instrumental variable framework is a particular instance of a post-randomization event.
It is also of interest to further extend the framework to a more generic post-randomization event and allow a direct effect from the treatment to the endpoint of interest. Lastly, in many practical circumstances, researchers may not have the luxury of working with patient-level data across multiple phase 3 clinical trials. Study-level adherence from historical trials has been used in a meta-regression analysis to infer oral PrEP effectiveness (Hanscom et al., 2019). Other meta-analysis-based approaches are also available; see, e.g., related discussion in Section 1.3. It is of interest to link the patient-level analysis proposed in this article to the meta-analysis-based framework and articulate what identification and modeling assumptions are needed to facilitate using only summary data from relevant historical trials.

Two statistical challenges are particularly relevant in generalizing efficacy estimates from historical data. First, researchers need to always pay close attention to the overlapping covariate space between the planned active-controlled trial and historical trials and, in our opinion, should always focus on the well-overlapped covariate space to avoid over-extrapolation with limited data. Traditional methods like multivariate matched sampling (Rubin, 1979) can be generalized to the context of across-trials comparisons; see, e.g., Zhang (2023). Examining a scalar summary statistic such as Stuart et al.’s (2011) “probability of participation” is also useful. If the target trial enrolls a heterogeneous population of participants, then it is conceivable that multiple historical trials targeting different constituent parts of the target population may be needed. Second, in some cases, it is conceivable that trials may not maintain a similar list of important covariates or may collect different versions of the same covariates.
This is less of a concern if the studies were conducted via the same clinical trials network (e.g., the HIV Prevention Trials Network and the HIV Vaccine Trials Network) but could lead to many practical challenges and prevent researchers from pursuing covariate adjustment in other cases. Classical measurement error methods or methods that leverage proxy variables could be useful.

Supplementary Material

Refer to Web version on PubMed Central for supplementary material.

Acknowledgement

We appreciate all the constructive feedback from the editor, associate editor, and two anonymous reviewers. We are grateful to the study participants, study staff and investigators on HPTN 084 and Partners PrEP who provided the data for this analysis. We acknowledge the funders and sponsors of the trials. We are grateful to the HPTN Manuscript Review Committee for helpful feedback. This work was supported by the U.S. National Institutes of Health grants R01AI177078 and UM1AI068617 (Fei Gao) and by the VIDD Faculty Initiative Award at the Fred Hutchinson Cancer Center (Fei Gao and Bo Zhang). Oliver Dukes received support from the Research Foundation Flanders (1222522N).

References

Angrist JD, Imbens GW, and Rubin DB (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434):444–455.
Baeten JM, Donnell D, Ndase P, Mugo NR, Campbell JD, Wangisi J, Tappero JW, Bukusi EA, Cohen CR, Katabira E, et al. (2012). Antiretroviral prophylaxis for HIV prevention in heterosexual men and women. New England Journal of Medicine, 367(5):399–410. [PubMed: 22784037]
Bickel PJ, Klaassen CA, Ritov Y, and Wellner JA (1993). Efficient and adaptive estimation for semiparametric models, volume 4. Springer.
Cheng G and Huang JZ (2010). Bootstrap consistency for general semiparametric M-estimation. The Annals of Statistics, 38(5):2884–2915.
Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, and Newey W (2017). Double/debiased/Neyman machine learning of treatment effects. American Economic Review, 107(5):261–265.
Cohen MS and Baden LR (2012). Preexposure prophylaxis for HIV—where do we go from here? New England Journal of Medicine, 367(5):459–461. [PubMed: 22784041]
Cole SR and Stuart EA (2010). Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 trial. American Journal of Epidemiology, 172(1):107–115. [PubMed: 20547574]
Dahabreh IJ, Robertson SE, and Hernán MA (2022). Generalizing and transporting inferences about the effects of treatment assignment subject to non-adherence. arXiv preprint arXiv:2211.04876.
Dahabreh IJ, Robertson SE, Tchetgen EJ, Stuart EA, and Hernán MA (2019). Generalizing causal inferences from individuals in randomized trials to all trial-eligible individuals. Biometrics, 75(2):685–694. [PubMed: 30488513]
Degtiar I and Rose S (2021). A review of generalizability and transportability. arXiv preprint arXiv:2102.11904.
Delany-Moretlwe S, Hughes JP, Bock P, Ouma SG, Hunidzarira P, Kalonji D, Kayange N, Makhema J, Mandima P, Mathew C, et al. (2022). Cabotegravir for the prevention of HIV-1 in women: results from HPTN 084, a phase 3, randomised clinical trial. The Lancet, 399(10337):1779–1789.
Donnell D, Gao F, Hughes J, and Hanscom B (2022). Counterfactual estimation of CAB-LA efficacy against placebo using external trials. volume 86, Virtual.
Ellenberg SS and Temple R (2000). Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 2: practical issues and specific cases. Annals of Internal Medicine, 133(6):464–470. [PubMed: 10975965]
Fauci AS (2017). An HIV vaccine is essential for ending the HIV/AIDS pandemic. JAMA, 318(16):1535–1536. [PubMed: 29052689]
Fleming TR, Odem-Davis K, Rothmann MD, and Li Shen Y (2011). Some essential considerations in the design and conduct of non-inferiority trials. Clinical Trials, 8(4):432–439. [PubMed: 21835862]
Food and Drug Administration (2016). Non-inferiority clinical trials to establish effectiveness: Guidance for industry.
Gao F, Glidden DV, Hughes JP, and Donnell DJ (2021). Sample size calculation for active-arm trial with counterfactual incidence based on recency assay. Statistical Communications in Infectious Diseases, 13(1).
Glidden DV, Das M, Dunn DT, Ebrahimi R, Zhao Y, Stirrup OT, Baeten JM, and Anderson PL (2021). Using the adherence-efficacy relationship of emtricitabine and tenofovir disoproxil fumarate to calculate background HIV incidence: a secondary analysis of a randomized, controlled trial. Journal of the International AIDS Society, 24(5):e25744. [PubMed: 34021709]
Glidden DV, Stirrup OT, and Dunn DT (2020). A Bayesian averted infection framework for PrEP trials with low numbers of HIV infections: application to the results of the DISCOVER trial. The Lancet HIV, 7(11):e791–e796. [PubMed: 33128906]
Grant RM, Anderson PL, McMahan V, Liu A, Amico KR, Mehrotra M, Hosek S, Mosquera C, Casapia M, Montoya O, et al. (2014). Uptake of pre-exposure prophylaxis, sexual practices, and HIV incidence in men and transgender women who have sex with men: a cohort study. The Lancet Infectious Diseases, 14(9):820–829. [PubMed: 25065857]
Hahn J (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, pages 315–331.
Hanscom B, Hughes JP, Williamson BD, and Donnell D (2019). Adaptive non-inferiority margins under observable non-constancy. Statistical Methods in Medical Research, 28(10–11):3318–3332. [PubMed: 30293490]
Hastie TJ (2017). Generalized additive models. In Statistical models in S, pages 249–307. Routledge.
James Hung H, Wang S-J, Tsong Y, Lawrence J, and O’Neill RT (2003). Some fundamental issues with non-inferiority testing in active controlled trials. Statistics in Medicine, 22(2):213–225. [PubMed: 12520558]
Joffe MM and Greene T (2009). Related causal frameworks for surrogate outcomes. Biometrics, 65(2):530–538. [PubMed: 18759836]
Marrazzo JM, Ramjee G, Richardson BA, Gomez K, Mgodi N, Nair G, Palanee T, Nakabiito C, Van Der Straten A, Noguchi L, et al. (2015). Tenofovir-based preexposure prophylaxis for HIV infection among African women. New England Journal of Medicine, 372(6):509–518. [PubMed: 25651245]
Miner MD, Corey L, and Montefiori D (2021). Broadly neutralizing monoclonal antibodies for HIV prevention. Journal of the International AIDS Society, 24:e25829. [PubMed: 34806308]
Murnane PM, Brown ER, Donnell D, Coley RY, Mugo N, Mujugira A, Celum C, Baeten JM, Team PPS, Mujugira A, et al. (2015). Estimating efficacy in a randomized trial with product nonadherence: application of multiple methods to a trial of preexposure prophylaxis for HIV prevention. American Journal of Epidemiology, 182(10):848–856. [PubMed: 26487343]
Neilan AM, Landovitz RJ, Le MH, Grinsztejn B, Freedberg KA, McCauley M, Wattananimitgul N, Cohen MS, Ciaranello AL, Clement ME, et al. (2022). Cost-effectiveness of long-acting injectable HIV preexposure prophylaxis in the United States: a cost-effectiveness analysis. Annals of Internal Medicine, 175(4):479–489. [PubMed: 35099992]
Pearl J (2011). Transportability across studies: A formal approach.
Rosenbaum PR and Rubin DB (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55.
Rothmann M, Li N, Chen G, Chi GY, Temple R, and Tsou H-H (2003). Design and analysis of non-inferiority mortality trials in oncology. Statistics in Medicine, 22(2):239–264. [PubMed: 12520560]
Rubin D (1980). Discussion of “Randomization analysis of experimental data in the Fisher randomization test” by D. Basu. Journal of the American Statistical Association, 75:591–593.
Rubin DB (1979). Using multivariate matched sampling and regression adjustment to control bias in observational studies. Journal of the American Statistical Association, 74(366a):318–328.
Rudolph KE and van der Laan MJ (2017). Robust estimation of encouragement design intervention effects transported across sites. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(5):1509–1525. [PubMed: 29375249]
Stefanski LA and Boos DD (2002). The calculus of M-estimation. The American Statistician, 56(1):29–38.
Stuart EA, Cole SR, Bradshaw CP, and Leaf PJ (2011). The use of propensity scores to assess the generalizability of results from randomized trials. Journal of the Royal Statistical Society: Series A (Statistics in Society), 174(2):369–386.
Swanson SA, Hernán MA, Miller M, Robins JM, and Richardson TS (2018). Partial identification of the average treatment effect using instrumental variables: review of methods for binary instruments, treatments, and outcomes. Journal of the American Statistical Association, 113(522):933–947. [PubMed: 31537952]
US National Library of Medicine (2021). A Combination Efficacy Study in Africa of Two DNA-MVA-Env Protein or DNA-Env Protein HIV-1 Vaccine Regimens With PrEP (PrEPVacc).
Van Damme L, Corneli A, Ahmed K, Agot K, Lombaard J, Kapiga S, Malahleha M, Owino F, Manongi R, Onyango J, et al. (2012). Preexposure prophylaxis for HIV infection among African women. New England Journal of Medicine, 367(5):411–422. [PubMed: 22784040]
van der Laan MJ and Rose S (2011). Targeted learning: causal inference for observational and experimental data, volume 10. Springer.
Van der Vaart AW (2000). Asymptotic statistics, volume 3. Cambridge University Press.
Wang L and Tchetgen Tchetgen E (2018). Bounded, efficient and multiply robust estimation of average treatment effects using instrumental variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(3):531–550. [PubMed: 30034269]
World Health Organization (2022). Guidelines on long-acting injectable cabotegravir for HIV prevention. World Health Organization.
Zhang B (2023). Efficient algorithms for building representative matched pairs with enhanced generalizability. Biometrics (in press).
Zhang Z (2009). Covariate-adjusted putative placebo analysis in active-controlled clinical trials. Statistics in Biopharmaceutical Research, 1(3):279–290.
Zhang Z, Nie L, Soon G, and Zhang B (2014). Sensitivity analysis in non-inferiority trials with residual inconstancy after covariate adjustment. Journal of the Royal Statistical Society: Series C (Applied Statistics), 63(4):515–538.
Zhou J, Hodges JS, Suri MFK, and Chu H (2019). A Bayesian hierarchical model estimating CACE in meta-analysis of randomized clinical trials with noncompliance. Biometrics, 75(3):978–987. [PubMed: 30690716]
Zhou T, Zhou J, Hodges JS, Lin L, Chen Y, Cole SR, and Chu H (2022). Estimating the complier average causal effect in a meta-analysis of randomized clinical trials with binary outcomes accounting for noncompliance: A generalized linear latent and mixed model approach. American Journal of Epidemiology, 191(1):220–229. [PubMed: 34564720]

Figure 1: A schematic flow chart summarizing different identification assumptions, quantities involved in the estimation, mode of identification, and associated sensitivity analyses examining core assumptions.
Figure 2: The probability of trial participation based on a participant’s baseline characteristics in poor, limited, and sufficient overlap scenarios.

Figure 3: Left panel: Kaplan-Meier estimates of incident HIV acquisition in the Partners PrEP study. Right panel: Kaplan-Meier estimates of incident HIV acquisition among HPTN 084 participants whose partners were either living with HIV or had an unknown HIV status (target population).

Table 1: Simulation results of 6 estimators corresponding to Scenario X1 and Scenario Y1. The percentage of bias and the coverage of 95% confidence intervals are reported. Confidence intervals of ITT_{EIF, gam} were estimated based on asymptotic normality and the efficient influence function. Confidence intervals of ITT_{hypo} were based on two-sample tests. Confidence intervals of the other estimators were obtained via bootstrap. Out of 1,000 simulations, ITT_{EIF, gam} fell outside [−1, 1] twice in the limited overlap setting and once in the sufficient overlap setting when n = 1,000.
| Overlap | Sample size | ITT_{hypo}: % Bias | ITT_{hypo}: 95% CI coverage | ITT_{const, 1}: % Bias | ITT_{const, 1}: 95% CI coverage | ITT_{const, 2}: % Bias | ITT_{const, 2}: 95% CI coverage | ITT_{reg, par}: % Bias | ITT_{reg, par}: 95% CI coverage | ITT_{EIF, par}: % Bias | ITT_{EIF, par}: 95% CI coverage | ITT_{EIF, gam}: % Bias | ITT_{EIF, gam}: 95% CI coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Poor | 1000 | 0.0 | 96.0% | −16.5 | 93.2% | 73.2 | 27.0% | 2.5 | 96.0% | 2.5 | 96.2% | 2.7 | 84.0% |
| Poor | 2000 | −0.1 | 97.4% | −17.8 | 87.8% | 72.1 | 5.0% | 0.8 | 95.6% | −0.7 | 92.8% | −0.5 | 85.8% |
| Poor | 5000 | −0.1 | 96.4% | −17.0 | 81.0% | 70.4 | 0.0% | 2.2 | 96.0% | 2.3 | 94.9% | 2.3 | 88.1% |
| Limited | 1000 | −0.6 | 96.2% | −16.1 | 91.2% | 72.4 | 19.2% | 3.2 | 93.4% | 1.6 | 94.4% | 2.8 | 89.6% |
| Limited | 2000 | 0.4 | 97.2% | −17.5 | 85.6% | 70.6 | 3.8% | 1.3 | 95.6% | 0.8 | 94.6% | 0.8 | 88.6% |
| Limited | 5000 | 0.3 | 95.3% | −17.0 | 73.2% | 72.4 | 0.0% | 2.0 | 94.7% | 2.3 | 95.1% | 2.3 | 90.4% |
| Sufficient | 1000 | 0.1 | 96.2% | −15.9 | 88.2% | 72.2 | 5.6% | 3.9 | 94.6% | 3.4 | 94.6% | 4.2 | 92.8% |
| Sufficient | 2000 | −0.7 | 96.4% | −16.6 | 83.0% | 73.2 | 0.0% | 2.7 | 94.0% | 2.7 | 93.8% | 3.0 | 90.8% |
| Sufficient | 5000 | 0.1 | 95.8% | −17.1 | 62.2% | 71.9 | 0.0% | 1.8 | 94.7% | 2.0 | 95.2% | 2.0 | 93.6% |

Table 2: Baseline characteristics of all participants in HPTN 084, HPTN 084 participants whose partners lived with HIV, and female participants in the Partners PrEP study. Mean (SD) are reported for continuous variables. Counts (%) are reported for categorical variables.
| Characteristic | HPTN 084: all participants (N=3224) | HPTN 084: participants whose partner lived with HIV or had an unknown status (N=1139) | Partners PrEP: female participants whose partners lived with HIV (N=1184) |
|---|---|---|---|
| Study arm: CAB-LA | 1614 (50.1%) | 559 (49.1%) | 0 (0%) |
| Study arm: TDF/FTC | 1610 (49.9%) | 580 (50.9%) | 565 (47.7%) |
| Study arm: Placebo | 0 (0%) | 0 (0%) | 619 (52.3%) |
| Age | 26.0 (5.78) | 26.3 (6.03) | 33.5 (7.55) |
| Gonorrhea: Neg | 2977 (92.3%) | 1059 (93.0%) | 1068 (90.2%) |
| Gonorrhea: Pos | 210 (6.5%) | 68 (6.0%) | 14 (1.2%) |
| Gonorrhea: Missing | 37 (1.1%) | 12 (1.1%) | 102 (8.6%) |
| Chlamydia: Neg | 2583 (80.1%) | 941 (82.6%) | 1068 (90.2%) |
| Chlamydia: Pos | 604 (18.7%) | 186 (16.3%) | 13 (1.1%) |
| Chlamydia: Missing | 37 (1.1%) | 12 (1.1%) | 103 (8.7%) |
| Trichomonas: Neg | 2859 (88.7%) | 1021 (89.6%) | 1057 (89.3%) |
| Trichomonas: Pos | 270 (8.4%) | 89 (7.8%) | 80 (6.8%) |
| Trichomonas: Missing | 95 (2.9%) | 29 (2.5%) | 47 (4.0%) |
| Syphilis: Neg | 3116 (96.7%) | 1107 (97.2%) | 1101 (93.0%) |
| Syphilis: Pos | 103 (3.2%) | 30 (2.6%) | 69 (5.8%) |
| Syphilis: Missing | 5 (0.2%) | 2 (0.2%) | 14 (1.2%) |
| Employment: Employed | 878 (27.2%) | 276 (24.2%) | 807 (68.2%) |
| Employment: Not employed | 2346 (72.8%) | 863 (75.8%) | 377 (31.8%) |
| Education: Complete secondary school | 1528 (47.4%) | 527 (46.3%) | 75 (6.3%) |
| Education: Not complete secondary school | 1346 (41.7%) | 506 (44.4%) | 271 (22.9%) |
| Education: Not complete primary school | 350 (10.9%) | 106 (9.3%) | 838 (70.8%) |
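As a rough illustration of how the covariate imbalance summarized in Table 2 might be quantified, the sketch below computes standardized mean differences (SMDs) between the HPTN 084 target population (N=1139) and the female Partners PrEP participants (N=1184) for age and employment. This is not part of the authors' analysis. The function names `smd_continuous` and `smd_binary` are hypothetical, and the choice of denominator (the square root of the average of the two group variances) is one common convention assumed here. Only the summary statistics printed in Table 2 are used.

```python
import math


def smd_continuous(mean_a, sd_a, mean_b, sd_b):
    """SMD for a continuous covariate (hypothetical helper).

    Assumes the denominator is the square root of the average
    of the two group variances.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


def smd_binary(count_a, n_a, count_b, n_b):
    """SMD for a binary covariate (hypothetical helper).

    Uses p(1 - p) as the variance within each group.
    """
    p_a = count_a / n_a
    p_b = count_b / n_b
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2.0)
    return (p_a - p_b) / pooled_sd


# Summary statistics copied from Table 2:
# group a = HPTN 084 target population, group b = Partners PrEP (female).
smd_age = smd_continuous(26.3, 6.03, 33.5, 7.55)
smd_employed = smd_binary(276, 1139, 807, 1184)

print(f"SMD, age: {smd_age:.2f}")
print(f"SMD, employed: {smd_employed:.2f}")
```

Both SMDs are roughly one standard deviation in magnitude, which is consistent with the paper's emphasis on checking covariate overlap before generalizing across trials. A summary-level calculation like this cannot replace the participant-level overlap assessment described in the text.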