Paul Harmon's Ph.D. Dissertation Defense in Statistics (Dept. of Mathematical Sciences, MSU)

08/18/2022

Abstract: 

Working with high-dimensional data poses several statistical challenges. This dissertation explores a suite of tools and methods for dimension reduction, using latent-variable models, techniques for mapping high-dimensional data, clustering, and methods for multivariate responses across a variety of use cases. First, a method for classifying institutions of higher education based on Structural Equation Models is presented and compared with the current standard for institutional classification, the Carnegie Classification. Second, a new method is presented for identifying influential points in high-dimensional mapping tools. It measures how much the shape of the resulting ordination changes when a point is included or excluded, similar in style to the influence diagnostic Cook’s Distance for regression. Finally, a method for feature selection is presented for monothetic clustering, a specific type of divisive hierarchical clustering. Using sparsity to reduce the impact of noise features allows monothetic clustering to make better splits on a single feature at a time, leading to better, more interpretable cluster results.