Meeting Date and Location

We will meet on October 11th, 2019, from 8:30 am to 4:30 pm at 2327 University Way, Suite 2.

Please register at the following link so that we can plan meals (breakfast and lunch will be provided).

Registration Form

Meeting Schedule

8:30 Breakfast and introductions
9:15 Kara Johnson, "Introduction to Opinion Diffusion and Fitting Models using a Genetic Algorithm"
9:45 Jordan Schupbach, "Using TDA for Image Analysis"
10:15 Coffee and tea break
10:30 Kenny Flagg, "Practical Application of Spatial Point Process Models: The SPDE Approach"
11:00 Ricky Jones, "Lessons learned from the Geo for Good Summit"
11:30 Lunch and roundtable discussion: "Moving beyond p < 0.05 and statistical significance"
1:15 Gordon Bower, "Physically plausible PDFs for intervals between geyser eruptions"
1:45 Paul Harmon, "Sparse Clustering: Using Regularization in Unsupervised Situations"
2:15 Steve Walsh, "Opportunities for Particle Swarm Optimization in Statistics"
2:45 Coffee and tea break
3:00 Leslie Gains-Germain, "Experiences working in environmental consulting"
3:45 Kevin Ferris, "Data Analysis in Major League Baseball"
4:30 Executive committee meeting

Presentation Abstracts

Paul Harmon, "Sparse Clustering: Using Regularization in Unsupervised Situations"

In regression problems, L1-norm regularization (the Lasso) is a common tool for feature selection. By shrinking the coefficients of less important features all the way to zero, the Lasso produces a sparse set of predictors. However, L1 penalization is not limited to supervised regression problems; it can also be used in many unsupervised methods, including clustering and dimension reduction. This talk gives an overview of a method for sparse k-means and hierarchical clustering that imposes an L1 penalty on the dissimilarity matrix (for hierarchical clustering) or on the variables used (for k-means). I will also describe how sparsity might be imposed on two other unsupervised tools, t-distributed Stochastic Neighbor Embedding (a dimension-reduction method) and monothetic clustering (a divisive clustering algorithm), and the challenges that arise for each.
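
To make the sparsity mechanism concrete, here is a minimal Python sketch (not the presenter's code) of the soft-thresholding operator, which is how an L1 penalty drives small values exactly to zero. The helper `sparse_feature_weights` and the per-feature between-cluster sums of squares below are hypothetical illustrations; in one common formulation of sparse k-means, the threshold would be tuned so that the weights satisfy an L1 bound, a search this sketch omits.

```python
import numpy as np

def soft_threshold(x, delta):
    """Soft-thresholding S(x, delta) = sign(x) * max(|x| - delta, 0).

    This is the proximal operator of the L1 penalty: any entry whose
    magnitude is at most delta becomes exactly zero.
    """
    return np.sign(x) * np.maximum(np.abs(x) - delta, 0.0)

def sparse_feature_weights(bcss, delta):
    """Hypothetical helper: map per-feature between-cluster sums of squares
    to sparse feature weights with unit L2 norm, for a fixed delta."""
    w = soft_threshold(bcss, delta)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

# Made-up per-feature scores: features 2 and 4 carry little cluster signal.
bcss = np.array([5.0, 0.4, 3.0, 0.1])
w = sparse_feature_weights(bcss, delta=1.0)
print(w)  # features 2 and 4 receive a weight of exactly 0
```

Features that receive zero weight drop out of the distance computation entirely, which is what makes the resulting clustering sparse.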

Jordan Schupbach, "Using TDA for Image Analysis"

Topological Data Analysis (TDA) is a growing field that can analyze a variety of nonstandard data types. In this talk, I will describe the use of TDA for image classification and some interesting statistical questions that arise. Specifically, a topological descriptor called a persistence diagram can be represented as a point process whose intensity measure can be used as a datum in a functional regression model. This talk focuses on estimating these intensity measures under a given sampling plan and on the inferences we can draw from them.

Kenny Flagg, "Practical Application of Spatial Point Process Models: The SPDE Approach"

Spatial point process models are a natural way to model and map things that occur discretely in two-dimensional space, such as munitions debris observed through metal detector surveys or animals observed through distance sampling. However, for decades, fitting these models was computationally infeasible in all but the simplest examples. I will provide an overview of recent computational developments, including the stochastic partial differential equation (SPDE) approach, that open the door to routine implementation of spatial point process models.

Ricky Jones, "Lessons learned from the Geo for Good Summit"

I will discuss new tools for obtaining useful information on topics ranging from Amazon deforestation rates to population trends in Zimbabwe. Let's create a world in which data is not a limiting factor in making decisions, and the only thing standing in our way is our ability to ask the right questions.

Gordon Bower, "Physically plausible PDFs for intervals between geyser eruptions"

Many of Yellowstone National Park's geysers exhibit a right-skewed distribution of the intervals between successive eruptions. While it is possible to model these intervals with a translated gamma or Weibull model, an alternative is to derive a family of distributions based on a simple physical model for the recharge of the plumbing system feeding the geyser and a simple hazard function describing how likely an eruption is to occur given the state of the plumbing system.

The resulting family of distributions fits real-world data significantly better than normal, gamma, or Weibull models, and has additional explanatory power that an ad hoc model does not: when a geyser becomes less frequent, it typically also becomes more erratic and more right-skewed. Our new family of distributions mimics this behavior as the rate-of-recharge parameter is varied.

We suggest that many real-world problems might best be approached through this 'semi-physical modeling', using basic facts about the dynamics of the process being modeled to inform our choice of which family of distributions to fit.

Kara Johnson, "Introduction to Opinion Diffusion and Fitting Models using a Genetic Algorithm"

Opinion diffusion is the process by which opinions spread through and change within a social network. These processes can be modeled with the DeGroot model, in which each subject updates their opinion as a weighted average of their own opinion and the opinions of those in their social group. Variations of the DeGroot model that include bounded confidence and decay are also explored. Because of the large number of parameters and other restrictions unique to these models, good methods for fitting them to data do not currently exist. I will present the development of genetic algorithms with operators specifically adapted to the unique problems and restrictions of opinion diffusion models.
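
The basic DeGroot update described above can be sketched in a few lines of Python. This is an illustration of the standard model only, not the presenter's code: the three-person weight matrix and starting opinions are hypothetical, and the bounded-confidence and decay variants and the genetic-algorithm fitting are not shown.

```python
import numpy as np

def degroot(W, x0, steps):
    """Iterate the DeGroot update x(t+1) = W @ x(t).

    W[i, j] >= 0 is the weight person i places on person j's opinion
    (W[i, i] is the weight on their own opinion). Each row sums to 1,
    so every updated opinion is a weighted average of current opinions.
    """
    W = np.asarray(W, dtype=float)
    if np.any(W < 0) or not np.allclose(W.sum(axis=1), 1.0):
        raise ValueError("W must be non-negative with rows summing to 1")
    x = np.asarray(x0, dtype=float)
    history = [x]
    for _ in range(steps):
        x = W @ x
        history.append(x)
    return np.array(history)  # shape (steps + 1, n_people)

# Hypothetical three-person network; opinions on a 0-1 scale.
W = [[0.6, 0.4, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.5, 0.5]]
traj = degroot(W, [0.0, 0.5, 1.0], steps=50)
print(traj[-1])  # all three opinions approach a common consensus value
```

In this connected network, where everyone keeps some weight on their own opinion, opinions converge to a consensus. Fitting the model means recovering the entries of W from observed opinion trajectories, which is where the number of parameters grows quickly.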

Steve Walsh, "Opportunities for Particle Swarm Optimization in Statistics"

Particle swarm optimization (PSO) is a meta-heuristic optimization algorithm well suited to optimizing high-dimensional, non-convex functions. Strengths of PSO include minimal assumptions about the behavior of the objective function, resistance to becoming trapped in local optima, and simplicity, since the core logic of the algorithm relies on two simple update equations. These strengths make PSO an attractive candidate for statistical applications that involve high-dimensional searches over multimodal objectives. In this presentation, we will introduce the basic PSO algorithm, illustrate its application to the 2-D Rastrigin function, and show proofs of concept for PSO in non-linear model parameter estimation, parameter estimation for mixture distributions, and optimal design of experiments.
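
The two update equations mentioned above can be illustrated with a minimal global-best PSO applied to the 2-D Rastrigin function. This Python sketch is not the presenter's implementation; the swarm size, iteration count, and coefficients (inertia `w`, cognitive `c1`, social `c2`) are commonly used values assumed here, and positions are simply clipped to the usual search box [-5.12, 5.12].

```python
import numpy as np

def rastrigin(x):
    """Rastrigin function; global minimum 0 at the origin. x has shape (..., d)."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.shape[-1] + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x), axis=-1)

def pso(f, dim, n_particles=40, iters=300, bounds=(-5.12, 5.12),
        w=0.72, c1=1.49, c2=1.49, seed=0):
    """Minimal global-best PSO sketch; parameter defaults are assumptions."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                               # particle velocities
    pbest, pbest_val = x.copy(), f(x)                  # personal bests
    g = pbest[np.argmin(pbest_val)].copy()             # swarm's global best
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Update 1: inertia plus random pulls toward personal and global bests.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        # Update 2: move each particle, keeping it inside the search box.
        x = np.clip(x + v, lo, hi)
        val = f(x)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, pbest_val.min()

best_x, best_val = pso(rastrigin, dim=2)
print(best_x, best_val)
```

Because only objective values are compared, the objective needs no gradients or convexity, which is the "minimal assumptions" strength noted in the abstract; the many local minima of the Rastrigin function make it a standard stress test for this kind of search.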

Leslie Gains-Germain, "Experiences working in environmental consulting"

Since graduating from MSU in 2015 with a master's degree in statistics, I have worked on a variety of environmental consulting projects as an employee of Neptune and Company, Inc. Our clients include state and federal governments as well as private companies. In this talk, I will give an overview of projects I have worked on and discuss the decision-making process for environmental problems.

Kevin Ferris, "Data Analysis in Major League Baseball"

A short presentation that was given to a group of professional baseball coaches, followed by a Q&A.