2002
Paris, France

Dealing With Missing Data Through Random Effects Models

Lewis Sheiner, MD.

University of California San Francisco, USA

A standard clinical trial assigns treatments A, observes outcomes Y, and, generically, uses the difference in goodness of fit between p_Y(Y|A) and p_Y(Y) to determine whether assignment is causal for differences in outcomes. Covariates X may be used, via p_Y(Y|A,X), to sharpen conclusions, but they are not required. In clinical trials of chronic conditions, departures from protocol are almost inevitable. Missingness, wherein one or more scheduled observations in a longitudinal series are not made, is a common form of departure. (Here Y is a time-indexed vector with elements Y_t, where t ranges over the set of scheduled observation times.) Dropout at time T, wherein all Y_t with t > T (denoted Y_miss) are missing, is a particularly simple and illustrative form of missingness. With dropout, the observed outcome is (Y_obs,T), not Y, and the required model is therefore p_Y,T(Y_obs,T). The standard approach to dropout is to impute Y_miss using LOCF (last observation carried forward) and then analyze the now "complete" data using the analysis procedure originally proposed. Whereas this approach may sometimes suffice for a (conservative) confirmatory analysis, it does not generally lead to unbiased conclusions because the LOCF prediction of Y_miss is rarely unbiased.
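The bias of LOCF under a persisting trend can be illustrated with a minimal sketch. Everything numeric here is a hypothetical illustration, not data from the abstract: a single subject whose outcome follows the assumed noise-free line Y_t = 10 - 2t at visits t = 0..5, with dropout after T = 2. The helper name `locf` is likewise introduced only for this example.

```python
def locf(series):
    """Carry the last observed value forward over missing (None) entries.

    Entries before the first observation stay None, since there is
    nothing to carry forward yet.
    """
    filled = []
    last = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical subject: noise-free linear decline, dropout after visit T = 2.
times = [0, 1, 2, 3, 4, 5]
true_y = [10.0 - 2.0 * t for t in times]
dropout_time = 2

# Y_obs: values at t <= T are seen, values at t > T are missing.
observed = [y if t <= dropout_time else None for t, y in zip(times, true_y)]

# LOCF imputation of Y_miss and its error relative to the true trajectory.
imputed = locf(observed)
bias = [imp - y for imp, y in zip(imputed, true_y)]

print(imputed)
print(bias)
```

In this assumed setting the imputation error is zero up to dropout and then grows linearly with time since dropout, because LOCF freezes the outcome at Y_T while the underlying trend continues.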

More generally, the data distribution p_Y,T(Y,T|A) can be factored as p_T(T|Y,A)p_Y(Y|A). The first factor, the model for missingness, is potentially causal, as T now depends on Y, and so it cannot necessarily be ignored (as it is in a complete case analysis). If an X exists such that p_T(T|A,Y,X) = p_T(T|A,X), then Y_miss is ignorable (in the sense that p_T(T|A,X) may be ignored, with only some slight loss of efficiency if it shares parameters with p_Y(Y|A,X)), and the complete cases can be validly analyzed using p_Y(Y|A,X) instead of p_Y(Y|A).
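Why the ignorability condition lets the dropout factor be set aside can be seen by writing out the observed-data likelihood. The following is a sketch under the stated condition p_T(T|A,Y,X) = p_T(T|A,X); the integration over Y_miss is a step added here for illustration and is not spelled out in the abstract.

```latex
\begin{align*}
p_{Y,T}(Y_{\mathrm{obs}}, T \mid A, X)
  &= \int p_T(T \mid A, Y_{\mathrm{obs}}, Y_{\mathrm{miss}}, X)\,
          p_Y(Y_{\mathrm{obs}}, Y_{\mathrm{miss}} \mid A, X)\, dY_{\mathrm{miss}} \\
  &= p_T(T \mid A, X) \int p_Y(Y_{\mathrm{obs}}, Y_{\mathrm{miss}} \mid A, X)\, dY_{\mathrm{miss}} \\
  &= p_T(T \mid A, X)\, p_Y(Y_{\mathrm{obs}} \mid A, X).
\end{align*}
```

Viewed as a function of the parameters of p_Y, the likelihood is then proportional to p_Y(Y_obs|A,X), so omitting the dropout factor costs at most the information it carries about any shared parameters.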

Failing a covariate that renders the missingness ignorable, a heuristic alternative is available: if Y_obs can be used to model E(Y_miss|Y_obs), then imputing Y_miss = E(Y_miss|Y_obs) leaves only the residuals, Y_miss - E(Y_miss|Y_obs), as missing data, and these may be almost ignorable (i.e., E(Y_miss|Y_obs) may be almost unbiased for Y_miss). LOCF rarely accomplishes this, as it recognizes no data trends that might be expected to persist after dropout. Random effects models for the time-evolution of Y can accomplish this: such models assert that p_T(T|Y,A)p_Y(Y|A) = p_T(T|b,A)p_Y(Y|b,A), where the b are random effects. If so, Y_miss no longer affects p_T(T|A), and if Y_obs supplies unbiased information about b, the missingness is ignorable. Further, such models usually also assert conditional independence of the elements of Y given b, whence p_Y(Y|b,A) equals the product of the terms p_Y(Y_t|b,A) over all observation times t, and the likelihood p_Y(Y_obs|b,A) is immediate.
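The resulting observed-data likelihood for one subject can be sketched as follows. The population density of the random effects, written p_b(b), is notation assumed here (the abstract does not name it), and b is taken to be independent of the assigned treatment A, as randomization would suggest.

```latex
\begin{align*}
p_{Y,T}(Y_{\mathrm{obs}}, T \mid A)
  &= \int p_T(T \mid b, A)\, p_Y(Y_{\mathrm{obs}} \mid b, A)\, p_b(b)\, db \\
  &= \int p_T(T \mid b, A)
     \Bigl[\, \prod_{t \le T} p_Y(Y_t \mid b, A) \Bigr]\, p_b(b)\, db .
\end{align*}
```

Y_miss does not appear: given b, dropout does not depend on Y, so the missing elements integrate out of p_Y(Y|b,A) and only the terms for t <= T remain in the product.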

For a random-effects model-based approach to non-ignorable missingness to be credible, the assumption must hold that, given the particular choice of p_Y(Y_t|b,A), Y_obs supplies unbiased information about b (an assumption that the choice p_Y(Y_t|b,A,t>T) = LOCF rarely satisfies). This in turn requires that the choice of model be scientific (i.e., based on prior knowledge), as the current data (absent Y_miss) cannot provide evidence for or against it.
