Efficient Algorithms for Fundamental Problems in Data Science
Talk by Peng Zhang (Georgia Institute of Technology)
01/26/2021 3:00pm WebEx Meeting
Abstract: One task of data science is to analyze massive data, using tools such as
linear equations, linear programs, and optimization. This task can be simplified if
better data has been collected, for example, from carefully planned experiments. In
this talk, I will discuss my work on the design of fast algorithms for both problems: collecting better data and analyzing it.
In the first part of the talk, I will present an efficient algorithm that improves
the design of randomized controlled trials (RCTs). RCTs are widely used to test the
effectiveness of new drugs and interventions. In an RCT, we randomly assign subjects
to different treatment groups to balance covariates — characteristics of the subjects
we know before conducting the experiment. Randomness allows us to make valid statistical
inferences and reduce the impact of unobserved biases; balancing covariates improves
the precision of estimating treatment effects if covariates are predictive of treatment
outcomes. Our algorithm guarantees both randomness and covariate balance simultaneously.
In the second part of the talk, I will discuss my work on designing and understanding
the limits of fast algorithms for solving linear equations and linear programs with
additional structures that arise commonly in practice, such as geometric structures,
spectral properties, and non-negativity of variables and coefficients.