Dealing With Missing Data Through Random Effects Models
Lewis Sheiner, MD.
University of California San Francisco, USA
A standard clinical trial assigns treatments A, observes outcomes Y, and, generically, uses the disparity between the goodness of fit of pY(Y|A) and that of pY(Y) to determine whether assignment is causal for differences in outcomes. Covariates X may be used with pY(Y|A,X) to sharpen conclusions, but they are not required. In clinical trials of chronic conditions, departures from protocol are almost inevitable. Missingness, wherein one or more scheduled observations in a longitudinal series are not made, is a common form of departure. (Here Y is a time-indexed vector with elements Yt, where t ranges over the set of scheduled observation times.) Dropout at time T, wherein all Yt with t > T (collectively denoted Ymiss) are missing, is a particularly simple and illustrative form of missingness. With dropout, the observed outcome is (Yobs,T), not Y, and the required model is therefore pY,T(Yobs,T). The standard approach to dropout is to impute Ymiss using LOCF (last observation carried forward) and to analyze the now "complete" data using the analysis procedure originally proposed. Whereas this approach may sometimes suffice for a (conservative) confirmatory analysis, it does not generally lead to unbiased conclusions because the LOCF prediction of Ymiss is rarely unbiased.
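The bias of LOCF can be illustrated with a small simulation. The sketch below is not taken from any trial: the linear trajectories, the random subject-specific slopes, the noise level, and the purely random dropout times are all assumptions chosen for illustration, and the names locf and simulate are hypothetical helpers. Even with dropout that is unrelated to outcome, carrying the last value forward ignores the trend and underestimates the mean outcome at the final visit.

```python
import numpy as np

def locf(y):
    """Carry the last observed value forward over NaN entries, row by row."""
    filled = np.asarray(y, dtype=float).copy()
    for j in range(1, filled.shape[1]):
        gap = np.isnan(filled[:, j])
        filled[gap, j] = filled[gap, j - 1]
    return filled

def simulate(n=2000, seed=0):
    """Assumed data-generating model: Y_t = slope * t + noise, random dropout."""
    rng = np.random.default_rng(seed)
    times = np.arange(5.0)                       # scheduled visits t = 0..4
    slope = rng.normal(1.0, 0.3, size=n)         # subject-specific trend
    noise = rng.normal(0.0, 0.2, size=(n, times.size))
    y_full = slope[:, None] * times + noise      # outcomes had nobody dropped out
    T = rng.integers(1, times.size, size=n)      # last observed visit, 1..4
    y_obs = y_full.copy()
    y_obs[times[None, :] > T[:, None]] = np.nan  # Ymiss: every Y_t with t > T
    return times, y_full, y_obs

times, y_full, y_obs = simulate()
y_locf = locf(y_obs)

# Compare the final-visit mean after LOCF with the mean of the full data.
bias = y_locf[:, -1].mean() - y_full[:, -1].mean()
print(f"LOCF bias in the final-visit mean: {bias:.2f}")
```

Under these assumptions the bias is negative, because every subject who drops out is frozen at an earlier and therefore lower point on a rising trajectory.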
More generally, the data distribution pY,T(Y,T|A) can be factored as pT(T|Y,A)pY(Y|A). The first factor, the model for missingness, is potentially causal, as T now depends on Y, and so cannot necessarily be ignored, as it is in a complete case analysis. If an X exists such that pT(T|A,Y,X)=pT(T|A,X), then Ymiss is ignorable, in the sense that pT(T|A,X) may be ignored (with only some slight loss of efficiency if it shares parameters with pY(Y|A,X)), and the complete cases can be validly analyzed using pY(Y|A,X) instead of pY(Y|A).
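The step from the ignorability condition to the observed-data likelihood can be written out as follows. This is a sketch of the standard argument rather than a derivation given in the abstract; it assumes Ymiss is continuous (replace the integral with a sum otherwise), and the second line uses the condition pT(T|A,Y,X)=pT(T|A,X).

```latex
\begin{align*}
p_{Y,T}(Y_{\mathrm{obs}}, T \mid A, X)
  &= \int p_T(T \mid A, Y, X)\,
          p_Y(Y_{\mathrm{obs}}, Y_{\mathrm{miss}} \mid A, X)\, dY_{\mathrm{miss}} \\
  &= p_T(T \mid A, X) \int
          p_Y(Y_{\mathrm{obs}}, Y_{\mathrm{miss}} \mid A, X)\, dY_{\mathrm{miss}} \\
  &= p_T(T \mid A, X)\, p_Y(Y_{\mathrm{obs}} \mid A, X).
\end{align*}
```

The first factor no longer involves Y, so inference about the parameters of pY can proceed from pY(Yobs|A,X) alone.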
Failing a covariate that renders the missingness ignorable, a heuristic alternative is available: if Yobs can be used to model E(Ymiss|Yobs), then imputing Ymiss = E(Ymiss|Yobs) leaves only the residuals, Ymiss - E(Ymiss|Yobs), as missing data, and these may be almost ignorable (i.e., E(Ymiss|Yobs) may be almost unbiased for Ymiss). LOCF rarely accomplishes this, as it recognizes no data trends that might be expected to persist after dropout. Random effects models for the time-evolution of Y can accomplish it: such models assert that pT(T|Y,A)pY(Y|A) = pT(T|b,A)pY(Y|b,A), where the b are random effects. If so, Ymiss no longer affects the model for T, which depends on Y only through b, and if Yobs supplies unbiased information about b, the missingness is ignorable. Further, such models usually also assert conditional independence of the elements of Y given b, whence pY(Y|b,A) equals the product of the terms pY(Yt|b,A) over all observation times t, and the likelihood pY(Yobs|b,A) is immediate.
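The idea can be sketched numerically. The example below is an illustration under stated assumptions, not the author's method: each subject follows Yt = b*t + noise with a random slope b, dropout depends on Y only through b (subjects with b > 1 are observed only through t = 2), and a per-subject least-squares slope through the origin stands in for a full mixed-effects fit. Because that slope estimate is unbiased for b given b, extrapolating it plays the role of E(Ymiss|Yobs).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
times = np.arange(5.0)                           # scheduled visits t = 0..4

# Assumed model: Y_t = b * t + noise, with subject-specific random slope b.
b = rng.normal(1.0, 0.3, size=n)
y_full = b[:, None] * times + rng.normal(0.0, 0.2, size=(n, times.size))

# Dropout depends on Y only through b: steep-slope subjects leave after t = 2.
T = np.where(b > 1.0, 2, 4)
observed = times[None, :] <= T[:, None]

# Per-subject least-squares slope through the origin, using Yobs only.
w = observed * times[None, :]                    # t where observed, 0 elsewhere
b_hat = (w * y_full).sum(axis=1) / (w * times).sum(axis=1)

# Model-based imputation: replace Ymiss with the extrapolated trend b_hat * t.
y_model = np.where(observed, y_full, b_hat[:, None] * times[None, :])

# LOCF imputation for comparison: repeat the value observed at time T.
last = y_full[np.arange(n), T]
y_locf = np.where(observed, y_full, last[:, None])

truth = y_full[:, -1].mean()
bias_model = y_model[:, -1].mean() - truth
bias_locf = y_locf[:, -1].mean() - truth
print(f"trend-based bias: {bias_model:.3f}, LOCF bias: {bias_locf:.3f}")
```

In this setup the trend-based imputation recovers the final-visit mean closely while LOCF does not, but only because the assumed linear form is in fact the form that generated the data.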
For a random-effects model-based approach to non-ignorable missingness to be credible, one assumption must hold: that, given the particular choice of pY(Yt|b,A), Yobs supplies unbiased information about b. (LOCF, viewed as a choice of pY(Yt|b,A) for t>T, rarely satisfies it.) This in turn requires that the choice be scientific (i.e., based on prior knowledge), as the current data (absent Ymiss) cannot provide evidence for or against it.