CSE Colloquium: Efficient Algorithms for Fundamental Problems in Data Science

ZOOM INFORMATION: Join from PC, Mac, Linux, iOS or Android: https://psu.zoom.us/j/95646587141?pwd=dDFWeFdKemxLdW5XUjVVOG5naUpQQT09 Password: 464564 

or iPhone one-tap (US Toll): +13126266799,95646587141# or +16468769923,95646587141# 

or Telephone: Dial +1 312 626 6799, +1 646 876 9923, +1 301 715 8592, +1 346 248 7799, +1 669 900 6833, or +1 253 215 8782 (all US Toll)
Meeting ID: 956 4658 7141
Password: 464564
International numbers available: https://psu.zoom.us/u/acU10LW9sZ

ABSTRACT: One task of data science is to analyze massive data using tools such as linear equations and linear programs. This task becomes simpler when better data have been collected, for example from carefully planned experiments. In this talk, I will discuss my research on the design of fast algorithms for fundamental problems in both tasks: collecting better data and analyzing it.

In the first part of the talk, I will present an efficient algorithm that improves the design of randomized controlled trials (RCTs). RCTs are widely used to test the effectiveness of drugs and interventions. In an RCT, we randomly partition experimental subjects into a treatment group and a control group that are balanced in covariates — features of subjects that we know before running an experiment. 

Randomness reduces the biases caused by unknown features. Balancing covariates improves the estimation of treatment effects if covariates are correlated with treatment outcomes. We obtain random partitions with a nearly optimal tradeoff between the gain we have if covariates are correlated with treatment outcomes and the loss we suffer if covariates are not. We guarantee that the estimates of treatment effects are tightly concentrated around the truth. 

In the second part of the talk, I will survey my research on designing and understanding the limits of fast algorithms for solving structured linear equations and linear programs that arise commonly in optimization, scientific computing, and data science. I will discuss linear equations in a slight generalization of Laplacians, linear equations in 3D truss stiffness matrices, and linear programs with non-negative variables and coefficients. 

BIOGRAPHY: Peng Zhang is a Postdoctoral Associate in Computer Science at Yale University, under the supervision of Daniel Spielman. She obtained her Ph.D. from Georgia Tech, advised by Richard Peng. Her research lies broadly in the design of efficient algorithms. She has worked on structured linear equations and linear programs, and on discrepancy theory and its application to the design of randomized controlled trials. Peng’s work received the best student paper award at FOCS 2017 and the Georgia Tech College of Computing Dissertation Award in 2019.

 


Media Contact: Chunhao Wang

 
 

