**Copula modeling for realistic virtual patient covariate simulation**

Laura B. Zwep (1), Tingjie Guo (1), Jacqueline J. Meulman (2,3), J.G. Coen van Hasselt (1)

(1) Leiden Academic Centre for Drug Research, Leiden University, The Netherlands, (2) LUXs data science, The Netherlands, (3) Mathematical Institute, Leiden University, The Netherlands

**Objectives:** Simulation of virtual patients is a commonly applied procedure in pharmacometrics for optimizing dosing regimens and clinical trial simulation. For such simulations, sets of patient-associated covariates are sampled and used as input for pharmacometric models. To simulate realistic sets of patient-associated covariates, the correlation structure between such covariates must be considered. In particular when several different covariates are considered, the underlying dependency structure between such covariates can become increasingly complex.

A simple yet commonly applied approach relies on sampling from the marginal distributions (MDs) for each covariate, ignoring such correlations, which may result in unrealistic patients being simulated. Recently, a conditional distribution (CD) approach based on imputation was proposed, which showed good simulation performance in comparison to other simulation approaches, although a limitation of the CD approach is that it requires access to individual-level data.^{1} However, access to such clinical data is often not available due to limitations in data sharing.

The dependency structure of multiple covariates is captured by their joint distribution. Copulas are multivariate distribution functions with uniform marginals, which model this dependency separately from the marginal distributions. They are estimated on covariates transformed to the uniform scale and can be used to characterize the dependence structure between covariates,^{2} including in higher-dimensional distributions.^{3,4}
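For reference, the decomposition behind this idea (Sklar's theorem) can be written as below. The symbols are notation introduced here for illustration and do not appear in the abstract: $F$ and $f$ are the joint distribution and density of $d$ covariates, $F_j$ and $f_j$ the marginal distribution and density of covariate $j$, and $C$ and $c$ the copula and its density.

```latex
F(x_1, \dots, x_d) = C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr),
\qquad
f(x_1, \dots, x_d) = c\bigl(F_1(x_1), \dots, F_d(x_d)\bigr) \prod_{j=1}^{d} f_j(x_j)
```

Sampling independently from the MDs corresponds to the special case $c \equiv 1$, which is why it discards all dependence between covariates.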

In this study we evaluate and demonstrate the use of copula models for characterizing covariate dependency structures as a tool to support virtual patient simulation in pharmacometrics, without the need to share underlying patient data. Our specific aims were to (1) compare the performance of copula models to the original dataset and alternative simulation techniques; (2) demonstrate the impact of using copula models versus alternative strategies when applied to PK simulations; (3) explore the utility of copula models for time-varying covariates; and (4) show how correlation structures in large covariate datasets can be characterized.

**Methods:**

**Copula model fitting**

Copula models were implemented for the different analyses as follows. For each covariate, the MD was estimated using kernel density estimation.^{5} Vine copulas with different distributions were fitted using the *rvinecopulib* library,^{6} and the copula with the lowest AIC was chosen. The resulting joint density distribution was used to simulate covariate sets of virtual patients.
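The fit-then-simulate workflow can be illustrated with a deliberately simplified stand-in. The sketch below is not the authors' implementation: it swaps the vine copula for a bivariate Gaussian copula, replaces kernel density estimation with empirical quantiles, and uses synthetic body weight and creatinine values invented for illustration. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def pseudo_observations(values):
    """Map observations to the open interval (0, 1) using scaled ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_gaussian_copula(x, y):
    """Estimate the copula correlation from normal scores of the ranks."""
    zx = [STD_NORMAL.inv_cdf(u) for u in pseudo_observations(x)]
    zy = [STD_NORMAL.inv_cdf(u) for u in pseudo_observations(y)]
    return pearson(zx, zy)

def empirical_quantile(sorted_values, u):
    """Inverse empirical marginal: interpolate between order statistics."""
    pos = u * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def simulate(x, y, rho, n, rng):
    """Draw n virtual covariate pairs: copula sample, then marginal transform."""
    sx, sy = sorted(x), sorted(y)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
        pairs.append((empirical_quantile(sx, u1), empirical_quantile(sy, u2)))
    return pairs

# Synthetic "observed" data (assumption: invented values, not the pediatric dataset).
rng = random.Random(1)
wt = [math.exp(rng.gauss(2.5, 0.5)) for _ in range(500)]
scr = [0.02 * w * math.exp(rng.gauss(0, 0.3)) for w in wt]

rho = fit_gaussian_copula(wt, scr)
virtual = simulate(wt, scr, rho, 2000, rng)
sim_wt = [v[0] for v in virtual]
sim_scr = [v[1] for v in virtual]
```

Only `rho` and the two marginal quantile functions are needed to generate new patients, which is the property that allows a fitted model to be shared instead of the individual-level data.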

**Evaluation and comparison of simulation performance**

Using covariates from a pediatric dataset,^{7} we compared the performance of copula simulations with two alternative approaches: MDs and CD. For each method, we evaluated simulation performance for a small three-covariate model and a full twelve-covariate model as the difference between the covariances of the covariates in the original data and those in the simulated data.
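The abstract does not give the exact formula for this metric. One plausible reading, assumed here, is the percent difference between each pairwise covariance in the simulated and original data, with negative values indicating underprediction. The function names and toy numbers below are hypothetical.

```python
def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def relative_covariance_error(original, simulated):
    """Percent difference in covariance for every covariate pair.

    Both arguments map covariate names to lists of values.
    Negative results mean the simulation underpredicts the covariance.
    """
    names = sorted(original)
    errors = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            c_obs = covariance(original[a], original[b])
            c_sim = covariance(simulated[a], simulated[b])
            errors[(a, b)] = 100.0 * (c_sim - c_obs) / c_obs
    return errors

# Toy data (assumption: invented numbers, for illustration only).
original = {
    "age": [1, 2, 4, 6, 9],
    "wt": [9, 12, 16, 21, 28],
    "scr": [0.3, 0.35, 0.4, 0.5, 0.6],
}
mean_wt = sum(original["wt"]) / len(original["wt"])
simulated = dict(original)
# Shrink body weight towards its mean so its covariances are halved.
simulated["wt"] = [mean_wt + 0.5 * (w - mean_wt) for w in original["wt"]]

errors = relative_covariance_error(original, simulated)
```

Averaging the values in `errors` would give a single summary figure per method, comparable in spirit to the percentages reported in the Results.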

**PK simulation application**

From the same pediatric dataset,^{7} we used body weight (WT) and serum creatinine level (SCR) from the three-covariate copula model. We simulated virtual patients using this copula model and using MD simulations, and predicted their PK profiles with a population PK model for vancomycin.^{8} The resulting sets of PK profiles were then compared using the area under the curve (AUC).
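How the AUC was computed is not stated. A common choice, assumed here, is the linear trapezoidal rule applied to each simulated concentration-time profile. The sketch below checks it against a mono-exponential profile whose parameters are hypothetical and unrelated to the vancomycin model.

```python
import math

def auc_trapezoid(times, concentrations):
    """Area under a concentration-time profile by the linear trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        total += 0.5 * dt * (concentrations[i] + concentrations[i - 1])
    return total

# Hypothetical profile: C(t) = c0 * exp(-k * t), sampled every 0.5 h for 24 h.
c0, k = 20.0, 0.2
times = [0.5 * i for i in range(49)]
conc = [c0 * math.exp(-k * t) for t in times]

auc = auc_trapezoid(times, conc)
# Closed-form AUC over 0-24 h for the same profile, used as a check.
auc_exact = c0 / k * (1.0 - math.exp(-k * times[-1]))
```

With one AUC per virtual patient, its correlation with WT or SCR can then be compared between simulation methods.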

**Time-varying covariates**

Using a dataset of six time-varying covariates during pregnancy,^{9} including albumin concentration, bilirubin concentration, lymphocytes, neutrophils, platelets and SCR, we demonstrated how copulas for time-varying covariates can be developed. We fitted a random-effects polynomial regression model to the temporal data. We then fitted a copula model to the set of individual parameter estimates and evaluated its performance.

**Covariate distributions in large data**

A copula model was fitted to a large dataset of 30 patient-associated covariates, primarily clinical laboratory measurements, from >50,000 ICU patients.^{10} We demonstrated how copulas can be used to characterize the underlying dependency structure of these covariates.

**Results:**

**Evaluation and comparison of simulation performance**

For the small three-covariate model based on age, WT and SCR, the difference in performance between simulation methods was modest. Copula simulations underpredicted the covariances by 5.3-14%, whereas CD simulations underpredicted them by 0.2-8.8%, thus slightly outperforming copula models. In contrast, for the twelve-covariate simulations, the copula simulation underpredicted the covariances by 12.3% on average, whereas the CD showed a large average underestimation of 80.9%. By definition, the MD method has a covariance of 0 and an underestimation of 100%.

**PK simulation application**

Covariate sets simulated for SCR and WT for pediatric patients were used to predict PK profiles and compute the corresponding AUCs. The original dataset showed correlations between the AUC and the covariates WT and SCR. For example, the original correlation between the AUC and WT (r=-0.67) was lost in MD simulations (r=0.0), whereas copula models preserved this dependence (r=-0.62).

**Time-varying covariates**

For the time-varying covariates in pregnancy, the covariances of the simulated individual polynomial parameters were on average close to those of the estimated polynomial parameters, with a 17.7% underestimation. The simulated individual parameters were used to generate time-varying covariate values. Polynomial regression coefficients were simulated in a realistic domain, while simulating from an MD led to more extreme polynomial curves, with a five times higher error on the standard deviation of the AUC. This shows how the spread of covariate values can be inflated when parameters are simulated independently.
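The inflation effect can be reproduced with a minimal sketch under strong simplifying assumptions: a linear rather than higher-order polynomial, synthetic intercepts and slopes invented for illustration, and simple pairing of parameters as a stand-in for copula simulation. None of the numbers come from the pregnancy dataset.

```python
import random
import statistics

rng = random.Random(7)

# Synthetic individual parameters (hypothetical values): intercept b0 and
# slope b1 of a linear covariate trajectory, negatively correlated.
n = 1000
b0 = [rng.gauss(40.0, 4.0) for _ in range(n)]
b1 = [-0.2 - 0.05 * (v - 40.0) + rng.gauss(0.0, 0.02) for v in b0]

def value_at(t, intercept, slope):
    """Covariate value at time t for one individual's linear trajectory."""
    return intercept + slope * t

# Joint simulation stand-in: each (b0, b1) pair is kept together.
joint = [value_at(40.0, i, s) for i, s in zip(b0, b1)]

# Marginal simulation: shuffle slopes so they are independent of intercepts.
shuffled_b1 = b1[:]
rng.shuffle(shuffled_b1)
independent = [value_at(40.0, i, s) for i, s in zip(b0, shuffled_b1)]

sd_joint = statistics.stdev(joint)
sd_independent = statistics.stdev(independent)
```

Because the negative correlation between intercept and slope is lost after shuffling, `sd_independent` is markedly larger than `sd_joint`, mirroring the more extreme curves described above.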

**Covariate distributions in large data**

Copula estimation and simulation were feasible on a large dataset, showing how copulas can be useful for simulations with extensive pharmacometric models. The higher dimensionality increased the underestimation of the covariances to 17.4%, compared with the lower-dimensional twelve- and three-covariate datasets. Some covariates showed interesting dependency structures, which can be evaluated to inform covariate selection decisions.

**Conclusions:** Copula models represent an attractive approach to fit multivariate covariate distributions, which can be readily implemented for pharmacometric simulations. Copula models clearly outperform commonly used marginal distribution approaches. Importantly, copula models have the distinct advantage that access to original individual-level datasets is not required when applied for virtual patient simulation, in contrast to resampling-based strategies. To this end, copula models can address hurdles in sharing clinical data by enabling open-access virtual patient simulation models for distinct (special) patient populations, which can be readily shared with the community and support pharmacometric clinical trial and treatment optimization simulations.

**References:**

[1] Smania, G. & Jonsson, E. N. Conditional distribution modeling as an alternative method for covariates simulation: comparison with joint multivariate normal and bootstrap techniques. *CPT Pharmacometrics Syst. Pharmacol.* psp4.12613 (2021). doi:10.1002/psp4.12613

[2] Sklar, A. Random variables, joint distribution functions, and copulas. *Kybernetika* 9, 449–460 (1973).

[3] Nagler, T. & Czado, C. Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas. *J. Multivar. Anal.* 151, 69–89 (2016).

[4] Czado, C. *Analyzing Dependent Data with Vine Copulas.* Lect. Notes Stat. 222 (Springer International Publishing, Cham, 2019).

[5] Nagler, T. & Vatter, T. kde1d: Univariate Kernel Density Estimation. (2020). at <https://cran.r-project.org/package=kde1d>

[6] Nagler, T. & Vatter, T. rvinecopulib: High Performance Algorithms for Vine Copula Modeling. (2021). at <https://cran.r-project.org/package=rvinecopulib>

[7] Cock, R. F. W. De *et al.* Simultaneous pharmacokinetic modeling of gentamicin, tobramycin and vancomycin clearance from neonates to adults: Towards a semi-physiological function for maturation in glomerular filtration. *Pharm. Res.* 31, 2643–2654 (2014).

[8] Grimsley, C. & Thomson, A. H. Pharmacokinetics and dose requirements of vancomycin in neonates. *Arch. Dis. Child. - Fetal Neonatal Ed.* 81, F221–F227 (1999).

[9] Patel, J. P. *et al.* Population Pharmacokinetics of Enoxaparin During the Antenatal Period. *Circulation* 128, 1462–1469 (2013).

[10] Johnson, A. *et al.* MIMIC-IV. (2021). doi:10.13026/s6n6-xd98