**The SAEM algorithm for non-linear mixed models with left-censored data and differential systems: application to the joint modeling of HIV viral load and CD4 dynamics under treatment**

Samson, Adeline (1), Marc Lavielle (2) and France Mentré (1)

(1) INSERM U738, University Paris 7, Hospital Bichat, Paris, France; (2) University Statistics Department, Orsay, France

**Introduction**

Maximum likelihood estimation in non-linear mixed effects models cannot be performed directly since the likelihood has no close form. Most algorithms implemented in available software are based on a linearisation of the model. Several authors showed that these algorithms may produce inconsistent estimates and an increased type I error for the Likelihood Ratio Test. To avoid models linearisation, Kuhn and Lavielle [1] proposed the SAEM (Stochastic Approximation Expectation Maximisation) algorithm, an exact maximum likelihood estimation method with good statistical properties, which is implemented in the Monolix software [2]. With this method, the likelihood is evaluated by importance sampling and the Fisher information matrix by stochastic approximation. This version of the algorithm does not handle left-censored data yet and does not allow regression function defined by differential equations. Our objective was to extend the SAEM algorithm to these two cases and to apply it to real data of HIV dynamics for which both problems occur.

**Estimation with left-censored data**

*a) Extension of the SAEM algorithm*

Left censored data are due to the limit of quantification (LOQ) of measurement techniques. We extended the SAEM algorithm to handle these left-censored data in non-linear mixed effect models by including the simulation of the left-censored data with a right-truncated Gaussian distribution [3]. We proved the convergence of this exact maximum likelihood estimation method under general conditions.

*b) Evaluation by simulation*

We simulated datasets using the bi-exponential model for HIV dynamics proposed by Ding and Wu [4], which has 4 parameters. We simulated 1000 datasets with two designs (40 or 200 subjects and 7 samples per patient). In addition to the extended SAEM algorithm, we considered two classical approaches which consisted in either omitting the data points below the LOQ or imputing LOQ/2 to the first left-censored data and omitting the following. The bias and RMSE obtained by the extended SAEM algorithm were highly satisfactory for all parameters. They almost reach the benchmark obtained before censoring the datasets, even though for the last observation time, 72% of the observations are below the LOQ. We showed that the bias and RMSE obtained by the extended SAEM algorithm were significantly reduced compared to the two classical approaches. We then evaluated a treatment effect of the first decay rate of the viral load decrease considering two equal size groups. The type I errors of the Wald test and the LRT provided by the extended SAEM algorithm are close to the expected value of 5%.

*c) Application to TRIANON-ANRS 082 clinical trial*

We applied this extended SAEM algorithm to the analysis of the viral load decrease of the TRIANON clinical trial in HIV treatment using the same model than in the simulation. Two different treatments were randomly administered to respectively 71 and 74 patients that were followed during 16 weeks. A significant difference in the viral load decay between the two treatment groups was found using the extended SAEM algorithm. This study on data of a clinical trial exemplified the necessity to handle carefully the left-censored data, as the traditional approach to handle them failed to detect statistical difference between the two treatment groups.

**Non-linear mixed models with ordinary or stochastic differential equations**

*a) Extension of the SAEM algorithm*

When the regression function of the mixed model is the solution of a differential system with no analytical solution, we combined the SAEM algorithm with a numerical integration method and then applied it to the approximate statistical model which regression function is the numerical approximate solution of the differential equation. We proved the convergence of this algorithm on the approximate model and quantified the error induced by the numerical approximation. We showed on a simulated example the ability of this SAEM algorithm [5]. We also extended this method to the case of stochastic differential equations, approximating the diffusion process by a Euler-Maruyama method. A tuned SAEM algorithm, supplying both theoretical and computational convergence properties was developed. The error on the estimation induced by the Euler-Maruyama scheme was quantified.

*b) Application to the joint modeling of the HIV viral dynamics of the COPHAR II-ANRS 102 clinical trial*

We analyzed simultaneously the data of the HIV viral load decrease and the CD4 increase from the COPHAR II clinical trial. Thirty-two patients were followed during 48 weeks after initiation of an antiviral treatment containing the protease inhibitor lopinavir and two nucleoside analogs. The bi-exponential model used for the previous analysis of viral load decrease is a simplified solution of a differential equation model proposed by Perelson et al. [6] describing the global HIV dynamics with both the viral load decrease and the CD4 increase under treatment. Ding and Wu [4] showed that the bi-exponential function is a simplified solution of this differential system under the assumption that the non-infected CD4 concentration is constant. However this assumption is not valid after several weeks under treatment, making it unacceptable. The differential system has thus no analytical solution. The estimation of the parameters of such a mixed model is a great statistical and computational challenge, never carried through until now in all its complexity (two response variables, a complex model with 5 differential equations, left censored data for the viral load). We used a tuned version of the SAEM algorithm, including the two extensions proposed previously to analyse the dynamics of HIV viral load and CD4 of the COPHAR II trial. The ten physiological parameters of the differential system and their inter-patient variabilities were adequately estimated from this data with this algorithm.

**Conclusion**

The SAEM algorithm with the two extensions proposed in this work allowing to handle left-censored data and models defined by differential equations, is a highly useful method to obtain satisfying estimates from longitudinal data analyzed by non-linear mixed effects models. We showed the numerical ability of this method to estimate parameters from a complex dynamic model.

**References**

[1] Kuhn and Lavielle. Maximum likelihood estimation in nonlinear mixed effects model, Computational Statistics and Data Analysis, 49 (2005), 1020-1038.

[2] Monolix software, Lavielle. http://www.math.u-psud.fr/~lavielle /monolix.

[3] Samson, Lavielle and Mentré. Extension of the SAEM algorithm to left-censored data in non-linear mixed -effects model: application to HIV dynamics model, Computational Statistics and Data Analysis (2006)

[4] Ding and Wu. Assessing antiviral potency of anti-HIV therapies in vivo by comparing viral decay rates in viral dynamics models. Biostatistics, 2 (2001), 1-18.

[5] Samson, Panhard, Lavielle and Mentré. Generalisation of the SAEM algorithm to nonlinear mixed effects model defined by differential equations: application to HIV viral dynamic models, PAGE 14 (2005) Abstr 716

[6] Perelson, Neumann, Markowitz, Leonard and Ho. HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science, 271 (1996),1582-1586.