Samidha Shetty (Dept. of Statistics, Penn State)

02/14/2022

Abstract: Missing data are common in data sets across every field of science. Over the past few decades, there has been interest in understanding the underlying pattern of missingness, formally known as the missingness mechanism. There are three types of missingness mechanisms: Missing Completely at Random (MCAR), where missingness is unrelated to the data; Missing at Random (MAR), where it depends only on observed values; and Missing Not at Random (MNAR), where it depends on the unobserved values themselves. These fall into two broader categories: ignorable (MCAR and MAR) and nonignorable (MNAR). Most likelihood-based and imputation-based methods assume MAR, the better-studied condition. We focus on the MNAR condition, which is less well studied. It is the hardest to handle but also the most likely to occur in practice.
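To make the ignorable/nonignorable distinction concrete, here is a small illustrative simulation. It is not taken from the talk. The data-generating model (a linear outcome `y = 1 + 2x + noise`, so the true mean is 1.0) and the logistic response probabilities are assumptions chosen for illustration. It compares the complete-case mean with regression imputation fitted on the complete cases. Under MAR, using the observed covariate `x` removes the bias. Under MNAR, where response depends on `y` itself, the same correction does not.

```python
import numpy as np


def simulate(mechanism, n=200_000, seed=0):
    """Illustrative only: compare mean estimators under an assumed missingness mechanism."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                  # fully observed covariate
    y = 1.0 + 2.0 * x + rng.normal(size=n)  # outcome; true mean E(Y) = 1.0

    # Assumed logistic response models: r = True means y is observed.
    if mechanism == "MCAR":
        logit = np.full(n, 0.5)             # constant, unrelated to x or y
    elif mechanism == "MAR":
        logit = 0.5 + 1.0 * x               # depends only on the observed x
    elif mechanism == "MNAR":
        logit = 0.5 + 1.0 * y               # depends on y itself
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    r = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

    # Regression imputation: fit y ~ x on complete cases, fill in missing y.
    coef = np.polyfit(x[r], y[r], 1)        # returns [slope, intercept]
    y_filled = np.where(r, y, np.polyval(coef, x))

    return {
        "full": float(y.mean()),             # oracle: uses all y
        "complete_case": float(y[r].mean()),
        "reg_imputation": float(y_filled.mean()),
    }


for mech in ("MCAR", "MAR", "MNAR"):
    print(mech, simulate(mech))
```

In this setup, the complete-case mean should be close to the truth only under MCAR. Regression imputation should also recover it under MAR but remain biased under MNAR. Conditioning on observed covariates cannot account for selection that depends on the missing outcome.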

In this project, we propose a robust estimator of a parameter, or of a summary quantity of the model parameters, in settings where the outcome is subject to nonignorable missingness. We avoid modeling the regression of the outcome on the covariates entirely. At the same time, we allow the propensity to follow a semiparametric logistic model in which the dependence on covariates is left unspecified. We uncover a surprising phenomenon: both the parameter in the propensity model and the functional of interest can be estimated without assessing how missingness depends on the covariates. This allows us to propose a general class of estimators for both model parameters and summary quantities of those parameters, including the outcome mean. These estimators are robust to misspecification of the dependence on covariates. Their robustness is nonstandard. It is established rigorously through theoretical derivations and is supported by simulations and a data application.
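For orientation, a semiparametric logistic propensity of the kind described above is often written as follows. This form is an assumption for illustration and is not necessarily the exact specification used in this work. Here $R = 1$ indicates that the outcome $Y$ is observed, $X$ denotes the covariates, $g(\cdot)$ is an unspecified function, and $h(\cdot\,; \beta)$ is known up to the parameter $\beta$ (for example, $h(Y; \beta) = \beta Y$):

```latex
\operatorname{logit} \Pr(R = 1 \mid X, Y) = g(X) + h(Y; \beta),
\qquad \mu = E(Y).
```

In this notation, leaving $g$ unspecified corresponds to not modeling how missingness depends on the covariates. The outcome mean $\mu$ is one example of a summary quantity of interest.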