# 2002

Paris, France

**Dealing With Missing Data Through Random Effects Models**

Lewis Sheiner, MD.

University of California San Francisco, USA

A standard clinical trial assigns treatments A, observes outcomes Y, and, generically, compares the goodness of fit of p_{Y}(Y|A) with that of p_{Y}(Y) to determine whether assignment is causal for differences in outcomes. Covariates X may be used with p_{Y}(Y|A,X) to sharpen conclusions, but they are not required. In clinical trials of chronic conditions, departures from protocol are almost inevitable. Missingness, wherein one or more scheduled observations in a longitudinal series are not made, is a common form of departure (here Y is a time-indexed vector with elements Y_{t}, for t in the set of scheduled observation times). Dropout at time T, wherein all Y_{t} with t > T (denoted Y_{miss}) are missing, is a particularly simple and illustrative form of missingness. With dropout, the observed outcome is (Y_{obs},T), not Y, and the required model is therefore p_{Y,T}(Y_{obs},T). The standard approach to dropout is to impute Y_{miss} using LOCF (last observation carried forward) and then analyze the now "complete" data using the analysis procedure originally proposed. Whereas this approach may sometimes suffice for a (conservative) confirmatory analysis, it does not generally lead to unbiased conclusions because the LOCF prediction of Y_{miss} is rarely unbiased.
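To make the LOCF bias concrete, here is a minimal sketch on hypothetical toy data that are not from the abstract. It assumes every subject declines by exactly one unit per visit with no noise. The helper name `locf`, the dropout indices, and all numeric values are illustrative assumptions.

```python
import numpy as np

def locf(y):
    """Carry the last observed (non-NaN) value forward along each row."""
    filled = np.asarray(y, dtype=float).copy()
    for j in range(1, filled.shape[1]):
        gap = np.isnan(filled[:, j])
        filled[gap, j] = filled[gap, j - 1]
    return filled

# Hypothetical complete data: 4 subjects, 5 visits, identical linear decline.
times = np.arange(5)
n_subjects = 4
y_full = np.tile(10.0 - times, (n_subjects, 1))

# Assumed dropout pattern: index of the last observed visit per subject (T).
last_visit = np.array([1, 2, 4, 4])
y_obs = y_full.copy()
for i, T in enumerate(last_visit):
    y_obs[i, T + 1:] = np.nan  # Y_miss: everything after dropout

y_locf = locf(y_obs)

true_final_mean = y_full[:, -1].mean()  # what would have been observed
locf_final_mean = y_locf[:, -1].mean()  # what LOCF imputes
print(true_final_mean, locf_final_mean)
```

Because the trend persists after dropout while LOCF freezes each subject at the last observed value, the imputed final-visit mean sits above the true one in this toy setting.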

More generally, the data distribution p_{Y,T}(Y,T|A) can be factored as p_{T}(T|Y,A)p_{Y}(Y|A). The first factor, the model for missingness, is potentially causal, as T now depends on Y, and it cannot necessarily be ignored (as it is in a complete case analysis). If an X exists such that p_{T}(T|A,Y,X)=p_{T}(T|A,X), then Y_{miss} is ignorable, in the sense that p_{T}(T|A,X) may be ignored, with only some slight loss of efficiency if it shares parameters with p_{Y}(Y|A,X). The complete cases can then be validly analyzed using p_{Y}(Y|A,X) instead of p_{Y}(Y|A).
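The step from the covariate condition to ignorability can be sketched as follows. This is the standard argument written in the abstract's notation, and it assumes that Y_{miss} can be integrated out of the joint density.

```latex
\begin{align*}
p_{Y,T}(Y_{obs}, T \mid A, X)
  &= \int p_{T}(T \mid A, Y, X)\, p_{Y}(Y_{obs}, Y_{miss} \mid A, X)\, dY_{miss} \\
  &= p_{T}(T \mid A, X) \int p_{Y}(Y_{obs}, Y_{miss} \mid A, X)\, dY_{miss} \\
  &= p_{T}(T \mid A, X)\, p_{Y}(Y_{obs} \mid A, X).
\end{align*}
```

The second line uses p_{T}(T|A,Y,X)=p_{T}(T|A,X), which lets the dropout factor move outside the integral. Inference about the outcome model can then proceed from p_{Y}(Y_{obs}|A,X) alone.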

Failing a covariate that renders the missingness ignorable, a heuristic alternative is available: if Y_{obs} can be used to model E(Y_{miss}|Y_{obs}), then imputing Y_{miss} = E(Y_{miss}|Y_{obs}) leaves only the residuals, Y_{miss} - E(Y_{miss}|Y_{obs}), as missing data, and these may be almost ignorable (i.e., E(Y_{miss}|Y_{obs}) may be almost unbiased for Y_{miss}). LOCF rarely accomplishes this, as it recognizes no data trends that might be expected to persist after dropout. Random effects models for the time-evolution of Y can accomplish this: such models assert that p_{T}(T|Y,A)p_{Y}(Y|A) = p_{T}(T|b,A)p_{Y}(Y|b,A), where the b are random effects. If so, Y_{miss} no longer enters the dropout model p_{T}(T|b,A), and if Y_{obs} supplies unbiased information about b, the missingness is ignorable. Further, such models usually also assert conditional independence of the elements of Y given b, whence p_{Y}(Y|b,A) equals the product of the terms p_{Y}(Y_{t}|b,A) over all observation times t, and the likelihood p_{Y}(Y_{obs}|b,A) is immediate.
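Under these assertions, the observed-data likelihood can be written out as a sketch. The population density of the random effects, written here as p_{b}(b|A), is an assumption introduced for illustration, since the abstract does not name it.

```latex
\begin{align*}
p_{Y}(Y_{obs} \mid b, A) &= \prod_{t \le T} p_{Y}(Y_{t} \mid b, A), \\
p_{Y,T}(Y_{obs}, T \mid A) &= \int p_{T}(T \mid b, A)
  \Bigl[\, \prod_{t \le T} p_{Y}(Y_{t} \mid b, A) \Bigr]\, p_{b}(b \mid A)\, db .
\end{align*}
```

The terms for t > T integrate to one under conditional independence, so Y_{miss} drops out of the likelihood, and dropout is linked to the outcomes only through b.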

For a random-effects model-based approach to non-ignorable missingness to be credible, the assumption must hold that, given the particular choice of p_{Y}(Y_{t}|b,A), Y_{obs} supplies unbiased information about b (which it rarely does when p_{Y}(Y_{t}|b,A,t>T) is taken to be LOCF). This in turn requires that the model be scientific (i.e., based on prior knowledge), as the current data (absent Y_{miss}) cannot provide evidence for or against it.