Talk by Christian Stratton (PhD Proposal in Statistics)

5/21/2020  9:00am  Online

 

Abstract:  

In this talk, we discuss several projects we have worked on over the past two years: (1) the development of a web application to teach statistical power; (2) the development of a model that simultaneously clusters and ordinates ecological community data; and (3) the development of an R package to fit computationally efficient multi-scale occupancy models, together with an extension of this model to longitudinal data. Below, we provide a brief overview of each project.
Statistical power is an important topic in most undergraduate- and graduate-level mathematical statistics courses, but it is often difficult to understand conceptually. Visualizing the power curve, the relevant sampling distributions, and how they interact can help students conceptualize power, but creating such visuals can be difficult and time-consuming. Interactive web applications let students visualize power dynamically, and many such applications already exist. In this talk, we describe a web application we created for undergraduate- and graduate-level mathematical statistics courses that lets users visualize the complex relationships underlying power for several test statistics and population distributions.
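One way to see the relationships such an application visualizes is to compute a power curve directly. The minimal sketch below assumes the simplest case, a one-sided one-sample z-test with known standard deviation; the function names are hypothetical and are not taken from the application. Power is the probability that the test statistic exceeds the critical value when the true mean is mu1, which for this test is 1 − Φ(z_{1−α} − (mu1 − mu0)·√n / σ).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a one-sided (upper-tail) one-sample z-test of H0: mu = mu0
    versus H1: mu > mu0, assuming normal data with known sigma."""
    # Reject H0 when the standardized sample mean exceeds z_crit.
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # Under the alternative, the test statistic's distribution is shifted
    # right by the standardized effect size (mu1 - mu0) * sqrt(n) / sigma.
    shift = (mu1 - mu0) * n ** 0.5 / sigma
    return 1 - STD_NORMAL.cdf(z_crit - shift)


def power_curve(mu0, sigma, n, alternatives, alpha=0.05):
    """Return (mu1, power) pairs, one per alternative mean."""
    return [(mu1, z_test_power(mu0, mu1, sigma, n, alpha)) for mu1 in alternatives]


if __name__ == "__main__":
    # Power curve for n = 25, sigma = 1, alpha = 0.05.
    for mu1, power in power_curve(0.0, 1.0, 25, [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]):
        print(f"mu1 = {mu1:.1f}  power = {power:.3f}")
```

At mu1 = mu0 the power equals the significance level alpha, and it grows with both the effect size and the sample size n, because the sampling distribution under the alternative moves further past the critical value.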
In the face of a changing global climate, there is a growing need to assess how ecological communities change over time. Historically, this assessment has been made using a statistical toolset known as distance-based ordination, which projects high-dimensional community data into a low-dimensional space, allowing one to assess the similarity between communities across time and space. However, this ordination technique requires a number of subjective choices, potentially leading different researchers to different conclusions about the same data. Furthermore, this approach provides neither a likelihood with which to assess the quality of the ordination nor any measure of uncertainty in it. In the past 10 years, model-based ordination has advanced considerably. These techniques use latent variable models to project high-dimensional community data into a lower-dimensional space. In this talk, we describe a proposed adaptation of these models that places an infinite mixture model on the latent space, allowing the model to learn the number of clusters and to quantify the uncertainty in the number of clusters present.
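The abstract does not say which infinite mixture is used. A common choice is a Dirichlet process mixture, under which the prior on how sampling units are grouped into clusters is the Chinese restaurant process (CRP). The minimal sketch below, assuming that choice, samples cluster labels from a CRP with concentration parameter alpha and compares the simulated number of occupied clusters with its prior expectation, the sum over i = 1..n of alpha / (alpha + i − 1). The helper names are hypothetical. It only illustrates why the number of clusters need not be fixed in advance; it is not the proposed model, which would also tie the latent ordination coordinates to the clusters and update the assignments using the data.

```python
import random


def crp_assignments(n, alpha, rng):
    """Sample cluster labels for n sampling units from a Chinese restaurant
    process with concentration parameter alpha (hypothetical helper)."""
    labels = []
    counts = []  # counts[k] = number of units already assigned to cluster k
    for i in range(n):
        # Unit i joins existing cluster k with probability counts[k] / (i + alpha)
        # or opens a new cluster with probability alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def expected_clusters(n, alpha):
    """Prior expected number of occupied clusters under the CRP:
    sum over i = 1..n of alpha / (alpha + i - 1)."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    rng = random.Random(2020)
    n, alpha = 50, 1.0
    draws = [len(set(crp_assignments(n, alpha, rng))) for _ in range(2000)]
    print("simulated mean number of clusters:", sum(draws) / len(draws))
    print("prior expectation:", expected_clusters(n, alpha))
    print("range of cluster counts drawn:", min(draws), "to", max(draws))
```

The spread of cluster counts across draws is the prior uncertainty in the number of clusters; in a fitted model, the posterior distribution of this count is what provides the uncertainty estimate described above.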
Environmental DNA (eDNA) sampling is a promising tool for detecting rare and cryptic taxa, such as aquatic pathogens, parasites, and invasive species. Environmental DNA sampling workflows commonly rely on multi-stage hierarchical sampling designs that induce complicated dependencies within the data. This complex dependence structure can be intuitively modeled with Bayesian multi-scale occupancy models. However, current software for such models is computationally demanding, which impedes its use. In this talk, we describe an R package we created that implements a data augmentation strategy to fit fully Bayesian, computationally efficient multi-scale occupancy models. Additionally, we describe a supplemental web application that lets users conduct power analyses for multi-scale occupancy models and serves as a graphical user interface to the R package. Finally, we conclude with a proposed adaptation of the existing multi-scale occupancy modeling framework that accounts for longitudinal survey designs.
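The abstract does not spell out the model, so the sketch below assumes the three-level structure commonly used for eDNA multi-scale occupancy models: a site is occupied with probability psi; given occupancy, each sample contains eDNA with probability theta; and given that a sample contains eDNA, each PCR replicate detects it with probability p. Parameters are held constant (no covariates), the function names are hypothetical rather than the R package's API, and the sketch does not implement the package's data augmentation fitting strategy. Under these assumptions, the probability of at least one detection at a site with J samples and K replicates per sample is psi·[1 − (1 − theta·(1 − (1 − p)^K))^J], a quantity that underlies design questions such as how many samples or replicates to collect.

```python
import random


def site_detection_prob(psi, theta, p, n_samples, n_reps):
    """P(at least one positive PCR replicate at a site) under a three-level
    model with constant psi (site), theta (sample), and p (replicate)."""
    # A single sample yields >= 1 positive replicate only if it contains eDNA
    # (prob. theta) and at least one of its n_reps replicates detects it.
    p_sample_pos = theta * (1 - (1 - p) ** n_reps)
    # The site yields >= 1 detection only if it is occupied (prob. psi) and
    # at least one of its n_samples samples is positive.
    return psi * (1 - (1 - p_sample_pos) ** n_samples)


def simulate_site(psi, theta, p, n_samples, n_reps, rng):
    """Simulate 0/1 replicate-level detections y[j][k] for one site."""
    occupied = rng.random() < psi                     # site level
    y = []
    for _ in range(n_samples):
        has_dna = occupied and rng.random() < theta   # sample level
        y.append([int(has_dna and rng.random() < p)   # replicate level
                  for _ in range(n_reps)])
    return y


if __name__ == "__main__":
    psi, theta, p = 0.6, 0.5, 0.4
    # How detection probability grows with the number of samples per site
    # (2 PCR replicates each): the kind of trade-off a design analysis explores.
    for n_samples in range(1, 6):
        print(n_samples, round(site_detection_prob(psi, theta, p, n_samples, 2), 3))
    # Monte Carlo check of the closed form for 3 samples x 2 replicates.
    rng = random.Random(1)
    sites = [simulate_site(psi, theta, p, 3, 2, rng) for _ in range(20000)]
    detected = sum(any(any(rep) for rep in site) for site in sites) / len(sites)
    print(detected, site_detection_prob(psi, theta, p, 3, 2))
```

Note that detection probability is capped at psi no matter how many samples or replicates are taken, since an unoccupied site can never yield a detection in this model.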