Christian Stratton Ph.D. Defense in Statistics (Dept. of Mathematical Sciences, MSU)

03/25/2022

Abstract: Assessment of similarity in species composition or abundance across sampled locations is a common goal in ecological monitoring programs. Existing ordination techniques provide a framework for clustering sample locations based on species composition by projecting high-dimensional community data onto a low-dimensional, latent ecological gradient representing species composition. However, these techniques require specification of the number of distinct ecological communities present in the latent space, which can be difficult to determine prior to analysis. Additionally, many existing techniques rely on algorithmic projection and clustering methods that do not appropriately account for uncertainty in the ordination. We develop a hierarchical ordination model capable of simultaneous clustering and ordination that allows for estimation of the number of clusters present in the latent ecological gradient. This model draws latent coordinates for each sample location from a Dirichlet process mixture model, affording researchers probabilistic statements about the number of clusters present in the latent ecological gradient. Additionally, the model is extended to accommodate hierarchical sampling designs, providing ordination results that are aligned with primary sampling units. This model is applied to an empirical data set describing presence-absence records of plant species in Craters of the Moon National Monument and Preserve (CRMO) in Idaho, USA. Application of the model to the CRMO data provided evidence of four ecological regions in the latent space, corresponding to various features of the ecological gradient in CRMO, including elevation and proximity to volcanic features. Development of the Dirichlet process ordination model provides ecologists and wildlife managers with data-driven inferences about the number of distinct ecological communities present across monitored locations. This information can be leveraged to develop more cost-effective monitoring strategies, supporting reliable decision-making for wildlife and conservation management.

In this project, we propose a robust estimator of a parameter, or of a summary quantity of the model parameters, in the setting where the outcome is subject to nonignorable missingness. We completely avoid modeling the regression relation, while allowing the propensity to be modeled by a semiparametric logistic relation in which the dependence on covariates is unspecified. We discover a surprising phenomenon: estimation of the parameter in the propensity model, as well as the functional estimation, can be carried out without assessing how the missingness depends on the covariates. This allows us to propose a general class of estimators for both model parameter estimation and estimation of summary quantities of the model parameters, including the outcome mean. These estimators are robust to misspecification of the dependence on covariates. The robustness of the estimators is nonstandard; it is established rigorously through theoretical derivations and supported by simulations and a data application.