Statistical Analysis of Nonignorable Missing Data
Samidha Shetty (Dept. of Statistics, Penn State)
02/14/2022
Abstract: Missing data is common in data sets in every field of science. In the past
few decades, there has been interest in understanding the underlying pattern of missingness,
formally known as the missingness mechanism. There are three types of missingness
mechanisms: Missing Completely at Random (MCAR), Missing at Random (MAR) and Missing
Not at Random (MNAR). These can also be classified into two main categories: Ignorable
(MCAR and MAR) and Nonignorable (MNAR). Most existing likelihood- or imputation-based
methods assume the MAR condition, which is the better-studied case. We discuss
the MNAR condition, which is less well studied: it is the hardest to handle but
also the most likely to occur.
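To make the three mechanisms concrete, the following is a minimal simulation sketch, not taken from the talk. The data-generating model, the propensity functions, and the name simulate_complete_case_means are all hypothetical choices made for illustration. It compares the full-data mean of the outcome with the mean computed from observed outcomes only.

```python
import numpy as np


def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def simulate_complete_case_means(n=200_000, seed=0):
    """Return the full-data mean of y and the complete-case mean per mechanism.

    Everything here is an illustrative assumption: a single covariate x,
    a linear outcome model, and hand-picked propensities.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)            # covariate, always observed
    y = 1.0 + x + rng.normal(size=n)  # outcome, true mean 1.0

    # Hypothetical probabilities P(R = 1 | x, y) that y is observed.
    propensity = {
        "MCAR": np.full(n, 0.6),  # depends on neither x nor y
        "MAR": expit(0.5 + x),    # depends only on the observed x
        "MNAR": expit(y),         # depends on the possibly missing y
    }

    means = {"full": y.mean()}
    for name, p in propensity.items():
        observed = rng.random(n) < p  # R = 1 where y is observed
        means[name] = y[observed].mean()
    return means


if __name__ == "__main__":
    for name, value in simulate_complete_case_means().items():
        print(f"{name:>5}: {value:.3f}")
```

In this sketch the complete-case mean is close to the full-data mean only under MCAR. Under MAR the discrepancy can be corrected using the observed covariate alone, whereas under MNAR any correction has to account for how missingness depends on the outcome itself.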
In this project, we propose a robust estimator of a parameter or a summary quantity
of the model parameters in settings where the outcome is subject to nonignorable missingness.
We completely avoid modeling the regression relation, while allowing the propensity
to be modeled by a semiparametric logistic relation where the dependence on covariates
is unspecified. We discover a surprising phenomenon: both the estimation of the
parameter in the propensity model and the functional estimation can be carried
out without assessing how the missingness depends on the covariates. This allows us to
propose a general class of estimators for both model parameter estimation and estimation
of summary quantities of the model parameters, including the outcome mean. These estimators
are robust to misspecification of the dependence on covariates. The robustness of
the estimators is nonstandard; it is established rigorously through theoretical
derivations and supported by simulations and a data application.
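
For concreteness, a semiparametric logistic propensity of the kind described above is often written in the following form. The notation is assumed here for illustration and may differ from the speaker's: R indicates that the outcome Y is observed, X denotes the covariates, h is an unspecified function, and \beta is a scalar parameter.

```latex
\Pr(R = 1 \mid X = x, Y = y)
  = \frac{\exp\{h(x) + \beta y\}}{1 + \exp\{h(x) + \beta y\}}
```

Under this form, \beta = 0 means the propensity depends on x only, which is the MAR case given X; any \beta \neq 0 makes the missingness nonignorable. Leaving h unspecified is what "the dependence on covariates is unspecified" refers to.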