Priscilla Bacino's Ph.D. Dissertation Defense (Dept. of Mathematical Sciences, MSU)

11/15/2023:

Abstract: This dissertation explores multiple testing scenarios commonly encountered in the biological sciences and elsewhere, where a large number of null hypotheses need to be tested but typically only a small fraction of them are actually false. In such situations, it is crucial not only to control the overall Type I error rate, such as the family-wise error rate, but also to maintain enough statistical power to detect true signals. Traditional methods for multiple testing adjustment in this context are often overly conservative, resulting in very few true detections, if any. Methods such as the Bonferroni procedure are conservative because they guard against the case in which all hypotheses are true nulls, which in practice cannot be ruled out beforehand. Enhancing statistical power therefore requires additional assumptions, domain-specific knowledge, or data-driven information. This work is motivated by applications in untargeted metabolomics, such as the analysis of metabolite concentrations or peak intensities across groups of subjects, where metabolites share information that can typically be represented by their correlation coefficients. We propose generating a hierarchical structure from correlation-based dissimilarities to reflect this interdependence. The hierarchical structure allows us to organize the hypotheses and identify the relationships among them. Consequently, not all combinations of true and false null hypotheses are valid within the hierarchical structure. These logical constraints can be leveraged when adjusting the evidence obtained from the hypothesis tests, enhancing the statistical power of multiple testing methods. Furthermore, visualizing the hierarchical structure can aid in understanding the dependency relationships among the hypotheses, facilitating the interpretation of results.
In this dissertation, we present correlation-based hierarchical methods for controlling the family-wise error rate when dealing with correlated outcome variables. Within this framework, we propose methodologies to increase power when comparing means of outcome variables across different groups or treatments and conducting associated follow-up tests. These methodologies have been implemented in an R package to facilitate their practical application.
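
As a rough illustration of two ingredients named in the abstract, the sketch below builds a hierarchy from correlation-based dissimilarities and computes Bonferroni-adjusted p-values as the conservative baseline. It is not the dissertation's method and does not reflect the interface of the R package: the dissimilarity 1 - |r|, the average-linkage rule, the toy data, and all function names are assumptions made for this example, and the logical constraints that the proposed methods exploit are not implemented here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def dissimilarity(x, y):
    """Assumed correlation-based dissimilarity: 1 - |r|."""
    return 1.0 - abs(pearson(x, y))


def build_hierarchy(variables):
    """Average-linkage agglomerative clustering (an assumed linkage choice).

    variables: dict mapping a variable name to its list of measurements.
    Returns the merges as (cluster_a, cluster_b, height), in merge order.
    """
    d = {
        frozenset((a, b)): dissimilarity(variables[a], variables[b])
        for a, b in combinations(variables, 2)
    }

    def linkage(c1, c2):
        # Mean pairwise dissimilarity between the members of two clusters.
        total = sum(d[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in variables]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: min(1, m * p) for m hypotheses."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


# Toy data (hypothetical): m1 and m2 are strongly correlated, m3 is not.
toy = {
    "m1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "m2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "m3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
merges = build_hierarchy(toy)
adjusted = bonferroni([0.01, 0.04, 0.5])
```

In this toy example the two strongly correlated variables are merged first, and the third joins them at a much greater height. A hierarchy-aware procedure would use that tree to organize the hypotheses, whereas the Bonferroni adjustment treats all of them identically.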