Talk by Dr. Christopher R. Barbour (Data Scientist, Atrium Insights)

3/12/2020  4:10-5:00pm  Wilson Hall 1-143

Abstract:  

The first half of this talk will focus on job-search strategies and give an overview of the skills that are sought after in the data science industry. Many important skills and tools are difficult to pick up (or even learn about) during graduate studies, and a little insight into them can help you stand out as a candidate. The second half will discuss internal R&D underway on the Atrium data science team, which focuses on improving data readiness by quantifying data quality.

A large portion of the enterprise data collected today has gaps in the quality of the information being captured. This lack of quality in data collection and governance limits the impact that statistical and machine learning models can have on improving business processes. A methodological toolkit that quantifies the amount of ‘data contamination’ can guide specific improvements and recommendations for data collection, storage, and governance. It can also improve the relevance of the insights and predictions obtained from models built on such data. This part of the talk will begin with previous methodology that attempts to address this issue, along with example data that demonstrate different aspects of data contamination. A simulation study and real-world data will then be used to demonstrate common issues and the proposed methodology, and to illustrate improvements that can be made in predictive modeling, primarily by estimating less biased relationships between predictors and model outcomes. Challenges and future directions will also be discussed.