Modelling pharmacogenetic data in population studies during drug development
Adrien Tessier (1,2,3), Julie Bertrand (4), Marylore Chenel (3) and Emmanuelle Comets (1,2,5)
(1) INSERM, IAME, UMR 1137, F-75018 Paris, France. (2) Université Paris Diderot, IAME, UMR 1137, Sorbonne Paris Cité, F-75018 Paris, France. (3) Division of Clinical Pharmacokinetics and Pharmacometrics, Institut de Recherches Internationales Servier, Suresnes, France. (4) Genetics Institute, University College London, London, UK. (5) INSERM CIC 1414, Université Rennes 1, Rennes, France.
Pharmacogenetics (PG) studies the proportion of interindividual variability (IIV) in drug response explained by genetic variation, investigating the link between genotype and pharmacokinetic (PK)/pharmacodynamic (PD) phenotypes. In the hope of personalising therapy, genetic data are now collected in many clinical trials using large arrays. For instance, the pharmaceutical company Servier developed a microarray that informs on polymorphisms in metabolic enzymes and transporters. In early PK studies, genetic variation is often tested for association with phenotypes estimated using noncompartmental analysis (NCA), although nonlinear mixed effects models (NLMEM) are increasingly used in clinical trials. Currently, there is no consensus on methods to study the PG of drugs in clinical development. We investigated the methodology of PG analysis in early-phase PK studies, to propose approaches that enhance the detection of genetic effects.
- To compare the ability of different PK phenotypes to detect genetic effects.
- To assess the performance of different association tests.
- To improve PG analysis in small samples with large genetic arrays, through combined analysis of phase I and II data or using a PK phenotype enrichment approach.
We performed this study through simulations.
A case study from Servier concerning a drug in phase I development was the setting for a series of simulations. The PK of this drug exhibited nonlinear bioavailability and a double absorption process. In the clinical studies, 176 single nucleotide polymorphisms (SNPs) were genotyped using the microarray developed by Servier.
We used the genetic array and the PK model developed for this drug to simulate genotypes and PK profiles under the null hypothesis (H0, no genetic effect) and an alternative hypothesis (H1). Under H1, six SNPs were randomly drawn to affect the logarithm of clearance (CL) through an additive linear model. Each SNP explained a different proportion of CL IIV (between 1 and 12%, 30% in total).
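The additive genetic model under H1 can be illustrated with a minimal sketch. It is not the authors' simulation code: it assumes independent SNPs in Hardy-Weinberg equilibrium, keeps the total variance of log CL fixed, and uses hypothetical allele frequencies and per-SNP proportions (chosen only so that they lie between 1 and 12% and sum to 30%). Under these assumptions, a SNP with minor allele frequency p that explains a fraction R of the variance omega^2 has a per-allele effect beta = sqrt(R * omega^2 / (2p(1-p))).

```python
import math
import random


def additive_effect_size(r2, omega2_total, maf):
    """Per-allele effect on log(CL) so that the SNP explains a fraction r2
    of the total variance omega2_total (assumes Hardy-Weinberg equilibrium)."""
    genotype_variance = 2.0 * maf * (1.0 - maf)  # variance of the 0/1/2 allele count
    return math.sqrt(r2 * omega2_total / genotype_variance)


def simulate_genotype(maf, rng):
    """Draw the number of minor alleles (0, 1 or 2) for one subject."""
    return int(rng.random() < maf) + int(rng.random() < maf)


def simulate_log_cl(n_subjects, mafs, r2s, omega2_total, log_cl_pop, seed=0):
    """Simulate genotypes and individual log(CL) under an additive linear model.

    The residual (non-genetic) variance is what remains of omega2_total once
    the SNP contributions are removed, so the total variance is unchanged.
    """
    rng = random.Random(seed)
    betas = [additive_effect_size(r2, omega2_total, maf)
             for r2, maf in zip(r2s, mafs)]
    residual_sd = math.sqrt(omega2_total * (1.0 - sum(r2s)))
    subjects = []
    for _ in range(n_subjects):
        genotypes = [simulate_genotype(maf, rng) for maf in mafs]
        # Centre each allele count so the population mean of log(CL) is preserved.
        genetic = sum(b * (g - 2.0 * maf)
                      for b, g, maf in zip(betas, genotypes, mafs))
        log_cl = log_cl_pop + genetic + rng.gauss(0.0, residual_sd)
        subjects.append((genotypes, log_cl))
    return betas, subjects


# Hypothetical inputs, for illustration only.
R2S = [0.01, 0.02, 0.03, 0.05, 0.07, 0.12]   # sums to 0.30
MAFS = [0.30] * 6                            # assumed minor allele frequencies
betas, subjects = simulate_log_cl(78, MAFS, R2S, omega2_total=0.09,
                                  log_cl_pop=math.log(10.0))
```

In the actual study the genotypes come from the 176-SNP array, so correlation between SNPs would need to be accounted for; the independence assumption here only keeps the arithmetic transparent.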
First, we simulated two phase I studies: one inspired by the real-case example, with extensive sampling (16 observations per subject) of 78 subjects, and an “asymptotic” version with the same rich design and 384 subjects. The PK phenotypes were two observed concentrations (C24h and C192h), the area under the curve (AUC) estimated by NCA, and the Empirical Bayes Estimates (EBEs) of CL obtained using Monolix. The four association tests applied to the four PK phenotypes were a stepwise procedure and three penalised regressions: ridge regression, Lasso and HyperLasso. The 16 combinations of four PK phenotypes and four association methods were compared on the two phase I studies in terms of the probability of detecting genetic effects, computed as the percentage of data sets simulated under H1 in which one to six of the six causal variants were selected.
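The detection probability described above reduces to a simple count over simulated data sets. The sketch below is one possible way to compute it, assuming each association method returns the set of SNPs it selected for each data set; the function name and the SNP labels are hypothetical.

```python
def detection_probability(selected_per_dataset, causal_snps, min_hits=1):
    """Percentage of simulated data sets in which at least `min_hits`
    of the causal SNPs appear among the selected SNPs."""
    causal = set(causal_snps)
    n_detected = sum(
        1 for selected in selected_per_dataset
        if len(causal & set(selected)) >= min_hits
    )
    return 100.0 * n_detected / len(selected_per_dataset)


# Hypothetical selections from four simulated data sets.
causal = {"snp1", "snp2", "snp3"}
selections = [{"snp1", "snp9"}, {"snp2", "snp3"}, set(), {"snp7"}]
at_least_one = detection_probability(selections, causal, min_hits=1)
at_least_two = detection_probability(selections, causal, min_hits=2)
```

Varying `min_hits` from one to six gives the full detection profile for a given phenotype and association method.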
Second, we moved on to the next phase of clinical development, exploring realistic ways of increasing the amount of PK information by combining phase I data with data collected in sparse phase II studies. To investigate the influence of the design and of the amount of information, we simulated three phase II studies with sparse sampling (1 to 3 observations per subject, 306 subjects), optimising the three-sample design. We focussed on the EBEs of CL as the phenotype and considered two association tests: a stepwise procedure and Lasso. The probability of detection was compared across designs and related to the estimated shrinkage.
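As a reminder of what shrinkage measures, the sketch below computes eta-shrinkage from the individual random-effect estimates (the EBEs expressed as deviations from the population value) and the estimated population standard deviation omega. Two conventions are in common use, one based on standard deviations and one based on variances; which one was used in this work is not stated here, so both are shown, and the function name is hypothetical.

```python
import statistics


def eta_shrinkage(eta_ebes, omega, variance_based=False):
    """Shrinkage of empirical Bayes estimates of a random effect.

    eta_ebes: individual estimates of the random effect (one per subject)
    omega:    estimated population standard deviation of that random effect
    SD-based:       1 - SD(eta) / omega
    Variance-based: 1 - Var(eta) / omega**2
    """
    if variance_based:
        return 1.0 - statistics.pvariance(eta_ebes) / omega ** 2
    return 1.0 - statistics.pstdev(eta_ebes) / omega
```

Values close to 0 indicate informative individual data; values close to 1 indicate that the EBEs have collapsed towards the population mean, which would be expected to mask genetic effects on CL.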
Finally, we investigated a new approach for PK phenotype enrichment to increase the amount of information in sparse designs. We used imputations randomly drawn from the conditional distribution of CL using Monolix. We compared a linear mixed model on the imputations, which handles the correlation between imputations of the same subject, with a linear model on the CL EBEs. The alternative hypothesis was simulated as one genetic marker affecting CL in two phase I studies of 78 subjects with rich or sparse sampling. In both cases, we estimated the probability of detecting the genetic variant.
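The two models being compared can be written out as follows. This is only a plausible formulation: the log scale, the additive coding of the genotype g_i (0, 1 or 2 copies of the variant allele), and the subject-level random intercept b_i are assumptions made for illustration, not details given above.

```latex
% Linear model on the EBEs (one value per subject i)
\log \widehat{CL}_i = \mu + \beta\, g_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

% Linear mixed model on K imputations per subject (k = 1, \dots, K)
\log CL_i^{(k)} = \mu + \beta\, g_i + b_i + \varepsilon_{ik},
\qquad b_i \sim \mathcal{N}(0, \tau^2),
\quad \varepsilon_{ik} \sim \mathcal{N}(0, \sigma^2)
```

In both cases the genetic effect is tested through the coefficient beta; in the mixed model the random intercept b_i absorbs the correlation between imputations drawn for the same subject.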
Interestingly, in the presence of nonlinearity and/or variability in bioavailability, the model-based phenotype gave a higher probability of detecting the SNPs than the other phenotypes. When PK was simulated without nonlinearity and variability in bioavailability, the tests based on AUC and CL had similar power. Neither the stepwise procedure nor any of the penalised regressions showed a much higher power than the others, but ridge regression had the highest probability of detecting SNPs, at the cost of a higher number of false positives. This result held regardless of the number of subjects. In this realistic phase I setting with a limited number of subjects, the probability of detecting genetic effects was low regardless of the method. As expected, it increased with the number of subjects for all methods.
Compared to phase I data alone, additional phase II data, even with sparse sampling, markedly increased the detection probability owing to the larger sample size, showing that rich PK information is only required in a subset of subjects. A direct relationship was observed between the design of the phase II study, the shrinkage in the individual CL estimates and the probability of detection. Optimising the phase II design reduced the shrinkage and gave the highest probability of detecting the genetic variants, but this gain was small compared to that obtained by increasing the sample size.
Imputations allowed a better description of the uncertainty of the estimated parameters and reduced the shrinkage compared to using only the EBEs. With this approach, the probability of detection improved marginally (by less than 10%).
The present work focussed on PG studies performed during the clinical development of a drug. It shows that, in contrast to current practice in most early-phase studies, modelling approaches should generally be preferred to estimate PK phenotypes, in particular when the PK is complex and involves nonlinearity. Our results also reinforce the importance of the sample size in PG studies, and show that phase I trials are underpowered to detect even strong genetic effects and/or genetic effects due to rare alleles. To improve their detection, we propose acting on two aspects: increasing the sample size by combining data from phase I and II studies, and enriching the PK phenotype. Phase II data are needed to confirm the impact of genetic variants on drug response, and design optimisation improves the power of the studies. We also show that a new imputation-based approach provides a slight gain, of the same order as that of design optimisation.
To conclude, we recommend the combined analysis of phase I and II data for the exploration of genetic associations, with prospective optimisation of the phase II study design. Increasing the sample size is the main driver of the power of genetic association analyses.
We are grateful to Marc Lavielle for help implementing the imputations in the conditional distribution of the parameters.
 Motulsky AG. Drugs and genes. Ann. Intern. Med. 1969;70:1269–72.
 Tessier A, Bertrand J, Chenel M, Comets E. Comparison of nonlinear mixed effects models and non-compartmental approaches in detecting pharmacogenetic covariates. AAPS J. 2015;
 Lavielle M, Mesa H, Chatel K. The MONOLIX software. 2010. (http://www.lixoft.eu/)
 Cule E, Vineis P, De Iorio M. Significance testing in ridge regression for genetic data. BMC Bioinformatics. 2011;12:372.
 Tibshirani R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. 1996;58:267–88.
 Hoggart CJ, Whittaker JC, De Iorio M, Balding DJ. Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet. 2008;4:e1000130.
 Bazzoli C, Retout S, Mentré F. Design evaluation and optimisation in multiple response nonlinear mixed effect models: PFIM 3.0. Comput. Methods Programs Biomed. 2010;98:55–65.
 Savic RM, Karlsson MO. Importance of shrinkage in empirical bayes estimates for diagnostics: problems and solutions. AAPS J. 2009;11:558–69.