Bayesian Techniques for Ecological Data Structures
Talk by Christian Stratton (PhD Proposal in Statistics)
5/21/2020 9:00am Online
Abstract:
In this talk, we discuss several projects we have worked on over the past two years:
(1) the development of a web application to teach statistical power; (2) the development
of a model capable of simultaneously clustering and ordinating ecological community data;
and (3) the development of an R package to fit computationally efficient multi-scale
occupancy models, together with an extension of this model to longitudinal data. Below,
we provide a brief overview of each project.
Statistical power is an important topic taught in most undergraduate-level and graduate-level
mathematical statistics courses, but students often find it difficult to grasp conceptually.
Visualizing the power curve, the relevant sampling distributions, and how they interact can
help students conceptualize power, but creating such visuals can be difficult and
time-consuming. Interactive web applications let students visualize power dynamically, and
many such applications already exist. In this talk, we describe a web application we created,
suitable for undergraduate-level and graduate-level mathematical statistics courses, that
allows users to visualize the complex relationships underlying power for a variety of test
statistics and population distributions.
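As a minimal sketch of the quantity such an application visualizes, assuming the simplest case of a one-sided, one-sample z-test with known population standard deviation (the abstract does not list which statistics the application supports), power can be computed analytically from the sampling distribution of the test statistic under the alternative and then checked by simulation. The function names here are hypothetical and are not taken from the authors' application.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Analytic power of an upper-tailed one-sample z-test.

    H0: mu = mu0 versus H1: mu > mu0, with known population SD sigma.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)        # rejection cutoff on the z scale
    shift = (mu1 - mu0) * math.sqrt(n) / sigma    # mean of the z statistic when mu = mu1
    return 1 - STD_NORMAL.cdf(z_crit - shift)     # P(Z > z_crit) under the alternative


def simulated_power(mu0, mu1, sigma, n, alpha=0.05, reps=20000, seed=1):
    """Monte Carlo estimate: draw samples with true mean mu1 and count rejections of H0."""
    rng = random.Random(seed)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu1, sigma) for _ in range(n)) / n
        z = (xbar - mu0) / (sigma / math.sqrt(n))
        rejections += z > z_crit                  # True counts as 1
    return rejections / reps


# Trace a power curve: power as a function of the true mean mu1 (n = 20, sigma = 1).
for mu1 in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(mu1, round(z_test_power(0, mu1, 1, 20), 3))
```

Evaluating `z_test_power` over a grid of true means traces out the power curve. At `mu1 = mu0` it equals `alpha`, which is a useful sanity check on any power visualization.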
In the face of a changing global climate, there is a greater need to assess how ecological
communities change over time. Historically, this assessment has been made using a statistical
tool set known as distance-based ordination, which projects high-dimensional community data
into a low-dimensional space, allowing one to assess the similarity between communities
across time and space. However, this ordination technique requires a number of subjective
choices, potentially leading different researchers to different conclusions about the same
data. Furthermore, it provides neither a likelihood with which to assess the quality of the
ordination nor any measure of uncertainty in that ordination. Over the past 10 years,
advances have been made in model-based ordination, which uses latent variable models to
project high-dimensional community data into a lower-dimensional space. In this talk, we
describe a proposed adaptation of these models that imposes an infinite mixture model in the
latent space, thereby allowing the model to learn the number of clusters and to estimate the
uncertainty in that number.
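The abstract does not specify the response distribution, link function, or mixture prior. As a minimal sketch, assuming presence/absence data with a logit link and a Dirichlet process prior on cluster membership (written here through its Chinese restaurant process representation, one standard infinite mixture), the forward simulation below shows the generative structure such a model implies. Sites receive cluster labels without a fixed number of clusters, latent site positions scatter around cluster centers, and each species responds to those positions through its own intercept and loadings. All names and hyperparameter values are hypothetical, and fitting the model (inferring labels, positions, and loadings from the observed matrix) requires posterior sampling that is not shown.

```python
import math
import random


def crp_assignments(n, alpha, rng):
    """Chinese restaurant process: sequential cluster labels with no fixed K."""
    labels, counts = [], []
    for i in range(n):
        # Existing cluster k is chosen w.p. counts[k] / (i + alpha);
        # a brand-new cluster is opened w.p. alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def simulate_community(n_sites, n_species, alpha=1.0, dim=2, seed=0):
    """Forward-simulate presence/absence data from a clustered latent variable model."""
    rng = random.Random(seed)
    labels = crp_assignments(n_sites, alpha, rng)
    n_clusters = max(labels) + 1
    # Cluster centers in the latent (ordination) space.
    centers = [[rng.gauss(0, 2) for _ in range(dim)] for _ in range(n_clusters)]
    # Each site's latent position is scattered around its cluster's center.
    z = [[rng.gauss(c, 0.5) for c in centers[k]] for k in labels]
    # Species-specific intercepts and loadings on the latent axes.
    beta0 = [rng.gauss(0, 1) for _ in range(n_species)]
    lam = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_species)]
    y = []
    for zi in z:
        row = []
        for b, lj in zip(beta0, lam):
            eta = b + sum(l * zz for l, zz in zip(lj, zi))  # linear predictor
            prob = 1 / (1 + math.exp(-eta))                  # inverse logit
            row.append(1 if rng.random() < prob else 0)
        y.append(row)
    return labels, z, y


labels, z, y = simulate_community(n_sites=50, n_species=30, alpha=1.0)
print("clusters drawn:", max(labels) + 1)
```

The concentration parameter `alpha` controls how readily new clusters open; in a fitted model, the posterior over labels is what yields an estimate of the number of clusters and its uncertainty.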
Environmental DNA (eDNA) sampling is a promising tool for the detection of rare and
cryptic taxa, such as aquatic pathogens, parasites, and invasive species. Environmental
DNA sampling workflows commonly rely on multi-stage hierarchical sampling designs
that induce complicated dependencies within the data. This complex dependence structure
can be intuitively modeled with Bayesian multi-scale occupancy models. However, current
software for such models is computationally demanding, impeding their use. In this
talk, we describe an R package we created that implements a data augmentation strategy
to fit fully Bayesian, computationally efficient multi-scale occupancy models. Additionally,
we describe a supplemental web application that allows users to conduct power analyses
for multi-scale occupancy models and serves as a graphical user interface
to the R package. Finally, we conclude with a proposed adaptation to the existing
multi-scale occupancy modeling framework that accounts for longitudinal survey designs.
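The abstract describes a multi-stage hierarchical design without fixing its levels. As a minimal sketch, assuming the common three-level eDNA layout (sites, samples within sites, and PCR replicates within samples) with occupancy probability `psi`, sample-level availability `theta`, and replicate-level detection `p`, the forward simulation below shows how the levels induce dependence: a replicate can test positive only if its sample contains DNA, and a sample can contain DNA only if its site is occupied. The names and parameters are hypothetical and are not the R package's interface, and the sketch does not implement the package's data augmentation fitting strategy. Marginalizing over the hierarchy gives the probability of at least one detection at an occupied site, 1 - (1 - theta(1 - (1 - p)^K))^J for J samples and K replicates per sample, which the second function computes.

```python
import random


def simulate_msocc(n_sites, n_samples, n_reps, psi, theta, p, seed=0):
    """Forward-simulate a three-level (site / sample / replicate) occupancy design.

    z[i]       ~ Bernoulli(psi)            site i is occupied
    a[i][j]    ~ Bernoulli(z[i] * theta)   target DNA present in sample j of site i
    y[i][j][k] ~ Bernoulli(a[i][j] * p)    replicate k of sample j tests positive
    """
    rng = random.Random(seed)
    z, a, y = [], [], []
    for _ in range(n_sites):
        zi = int(rng.random() < psi)
        # A sample can contain DNA only if the site is occupied.
        ai = [int(zi and rng.random() < theta) for _ in range(n_samples)]
        # A replicate can be positive only if its sample contains DNA.
        yi = [[int(aij and rng.random() < p) for _ in range(n_reps)] for aij in ai]
        z.append(zi)
        a.append(ai)
        y.append(yi)
    return z, a, y


def prob_detect_given_occupied(theta, p, n_samples, n_reps):
    """P(at least one positive replicate at a site | the site is occupied)."""
    per_sample = theta * (1 - (1 - p) ** n_reps)   # sample has DNA and >= 1 positive replicate
    return 1 - (1 - per_sample) ** n_samples        # >= 1 such sample among n_samples


# Compare designs: more samples per site versus more replicates per sample.
for n_samples, n_reps in [(3, 2), (6, 2), (3, 6)]:
    print(n_samples, n_reps, round(prob_detect_given_occupied(0.6, 0.5, n_samples, n_reps), 3))
```

Comparing designs this way (for example, more samples versus more replicates per sample) suggests why power analysis matters for these designs, although this is not a claim about how the authors' application computes power.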