
We represent a community with a shared interest in data analysis using the population approach.


2001
   Basel, Switzerland

Dealing With Missing Data in Longitudinal Studies Through Modeling

Lewis B Sheiner

UCSF, San Francisco

A standard clinical trial assigns treatments A, observes outcomes Y, and, generically, uses the disparity between the goodness of fit of p(Y|A) and that of p(Y) to determine whether treatment assignment causes differences in outcomes. Covariates X may be used, via p(Y|A,X), to sharpen conclusions, but they are not required. In clinical trials of chronic conditions, departures from protocol are almost inevitable. Missingness, wherein one or more scheduled observations in a longitudinal series are not made, is a common form of departure. Dropout, wherein all Y_s with s ≥ T (denoted Ymis) are missing, is a particularly simple and illustrative form of missingness. With dropout, the observed outcome is (Yobs,T), not Y, and the required model is therefore p(Yobs,T). The standard approach to dropout is to impute Ymis using LOCF (last observation carried forward) and to analyze the now "complete" data with the analysis procedure originally proposed. Although this approach may sometimes suffice for a (conservative) confirmatory analysis, it does not generally lead to unbiased conclusions, for at least two reasons. The first is technical: imputed data are not real, and so should not make results appear more precise than they really are. For heuristic reasons, I will not consider this objection further, and will assume that single imputation is valid if the imputation itself is valid (unbiased for the missing value). The second reason is that the LOCF prediction is rarely unbiased.
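A minimal sketch of LOCF, assuming the data are held as a subjects × visits array with NaN marking missed visits. The numbers are hypothetical: one subject whose true response rises by 2 units per visit drops out at visit T = 3, and LOCF freezes the imputed values at the last observed level, so the imputation is biased whenever a trend would have continued after dropout.

```python
import numpy as np

def locf(y: np.ndarray) -> np.ndarray:
    """Last observation carried forward along each row (subject).

    y: 2-D array, rows = subjects, columns = scheduled visits,
    with np.nan marking missing observations.
    """
    out = y.astype(float).copy()
    for i in range(out.shape[0]):
        for j in range(1, out.shape[1]):
            if np.isnan(out[i, j]):
                out[i, j] = out[i, j - 1]  # copy the previous (possibly imputed) value
    return out

# Hypothetical example: true response improves by 2 units per visit.
times = np.arange(5)
true_y = 10.0 + 2.0 * times          # [10, 12, 14, 16, 18]
observed = true_y.copy()
observed[3:] = np.nan                # subject drops out at visit T = 3

imputed = locf(observed[np.newaxis, :])[0]
print(imputed)                       # [10. 12. 14. 14. 14.]
print(imputed[-1] - true_y[-1])      # -4.0: LOCF underestimates the final value
```

The direction of the bias depends on the trend: here the response is improving, so LOCF understates the outcome; with a worsening response it would overstate it.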

More generally, the data distribution p(Y,T|A) can be factored as p(T|Y,A)p(Y|A). The first factor, the model for missingness, is potentially causal, since T now depends on Y, so it cannot necessarily be ignored, as it is in a complete-case analysis. If an X exists such that p(T|A,Y,X) = p(T|A,X), then Ymis is ignorable (in the sense that p(T|A,X) may be ignored with only some slight loss of efficiency if it shares parameters with p(Y|A,X)), and the complete cases can be validly analyzed using p(Y|A,X) instead of p(Y|A). If X and Yobs can be used to model E(Ymis|Yobs,X), then imputing Ymis = E(Ymis|Yobs,X) leaves only the residuals, Ymis - E(Ymis|Yobs,X), as missing data, and these may be almost ignorable (i.e., E(Ymis|Yobs,X) may be almost unbiased for Ymis). LOCF rarely accomplishes this, because it ignores any trend in the data that might be expected to persist after dropout.
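A small simulation sketch contrasting LOCF with imputing Ymis by an estimate of E(Ymis|Yobs,X). Everything here is a hypothetical assumption for illustration: each subject follows a linear trend plus noise, a binary covariate x is the only driver of dropout (so p(T|A,Y,X) = p(T|A,X) holds by construction), and subjects with x = 1 miss visits 3 to 5. Because x affects only dropout, the conditional mean is estimated from Yobs alone, using each subject's own least-squares line; that per-subject fit is a simple stand-in chosen for brevity, not the author's method, and a mixed-effects model would be the more usual choice.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed for reproducibility

n_subjects, n_visits = 500, 6
t = np.arange(n_visits, dtype=float)

# Hypothetical data-generating model: subject-specific linear trends plus noise.
intercept = rng.normal(10.0, 1.0, size=n_subjects)
slope = rng.normal(2.0, 0.5, size=n_subjects)
noise = rng.normal(0.0, 0.5, size=(n_subjects, n_visits))
y_full = intercept[:, None] + slope[:, None] * t + noise

# Hypothetical covariate x: dropout depends on x only, not on Y.
x = rng.integers(0, 2, size=n_subjects)
dropout_visit = np.where(x == 1, 3, n_visits)  # x = 1 subjects miss visits 3..5
missing = t[None, :] >= dropout_visit[:, None]
y_obs = np.where(missing, np.nan, y_full)

def impute_locf(row):
    # Valid for dropout (monotone missingness): fill every gap with the last observed value.
    obs = row[~np.isnan(row)]
    return np.where(np.isnan(row), obs[-1], row)

def impute_trend(row):
    # Estimate E(Ymis | Yobs) from the subject's own least-squares line,
    # evaluated at the missing visit times.
    ok = ~np.isnan(row)
    b1, b0 = np.polyfit(t[ok], row[ok], deg=1)  # returns [slope, intercept]
    return np.where(ok, row, b0 + b1 * t)

err_locf, err_trend = [], []
for i in np.flatnonzero(x == 1):
    m = missing[i]
    err_locf.append(impute_locf(y_obs[i])[m] - y_full[i, m])
    err_trend.append(impute_trend(y_obs[i])[m] - y_full[i, m])

bias_locf = np.mean(np.concatenate(err_locf))
bias_trend = np.mean(np.concatenate(err_trend))
print(f"mean imputation error, LOCF:  {bias_locf:+.2f}")
print(f"mean imputation error, trend: {bias_trend:+.2f}")
```

Under these assumptions the LOCF error should be close to -4 (an average slope of 2 times an average gap of 2 visits after the last observation), while the trend-based error should be near 0: once the trend is modeled, only the residuals remain missing.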

For a model-based approach to non-ignorable missingness to be credible, the model for Ymis|Yobs,X must be credible (which LOCF rarely is). This in turn requires that the model be scientific (i.e., based on prior knowledge), because the current data, lacking Ymis, cannot provide evidence for or against it.


