CSE Colloquium: Stream Processing Systems for Emerging Trends

Zoom Information 

Join from PC, Mac, Linux, iOS or Android: https://psu.zoom.us/j/99286975522?pwd=d2FQdUNEbEplRmppWkUrd2crQkN4QT09 Password: 066132 

or iPhone one-tap (US Toll): +16468769923,99286975522# or +13017158592,99286975522# 

or Telephone: Dial: +1 646 876 9923 (US Toll) +1 301 715 8592 (US Toll) +1 312 626 6799 (US Toll) +1 669 900 6833 (US Toll) +1 253 215 8782 (US Toll) +1 346 248 7799 (US Toll) Meeting ID: 992 8697 5522 Password: 066132 International numbers available: https://psu.zoom.us/u/auXVyVxZh 

ABSTRACT: Stream processing was proposed and popularized as a “technology like Hadoop but can give you results faster”, which lets users query a continuous data stream and get results within a very short time of receiving the data. For that reason, stream processing technology has become a critical building block of many applications, such as making business decisions from marketing streams, identifying spam campaigns from social network streams, predicting tornados and storms from radar streams, and analyzing genomes in different labs and countries to track the sources of a potential epidemic. However, state-of-the-art solutions have predominantly centered on stateless stream processing, leaving another urgent trend, stateful stream processing, much less explored. A driving need is that future stream applications must store and update state along with their processing, and must process live data streams from massive, geo-distributed data sets in a timely fashion. Unfortunately, existing systems are mainly designed for low-latency intra-datacenter settings. They do not scale well when running stream applications with large distributed state across geo-distributed datacenters, and they suffer from a significant centralized bottleneck and high latency. 

In this talk, I will present a next-generation, geo-distributed, scalable stateful stream processing system. (1) At the architecture layer, I will introduce a decentralized “many masters/many workers” architecture that dramatically improves the scalability of stream processing systems. (2) At the operator layer, I will present an in-memory data structure for storing state that minimizes memory overhead. (3) At the mechanism layer, I will present a fragment-based parallel recovery mechanism that recovers large distributed state by leveraging distributed hash table (DHT) based overlay networks and erasure codes. (4) Finally, I will outline a future research agenda for developing scalable stream processing systems for emerging trends. 

BIOGRAPHY: Dr. Liting Hu is an Assistant Professor of Computer Science in the School of Computing and Information Sciences at Florida International University (FIU). She received her Ph.D. in Computer Science from Georgia Institute of Technology in 2016 under the supervision of Dr. Karsten Schwan. Her research interests span distributed systems, cloud and edge computing, and system virtualization, with a focus on building scalable stream processing systems. She directs the Experimental and Virtualized Systems (ELVES) Research Lab, where she conducts experimental computer systems research. Examples include stream processing systems (with Spark Streaming, Storm, Flink), container as a service (with Docker and Kubernetes), identifying threats (e.g., fake news, rumors, social bots) in online social networks, and resource management in large-scale data centers (with Xen and KVM). She has served on numerous IEEE/ACM program committees and reviewed for more than a dozen journals. She interned at VMware, IBM Research, Microsoft Research Asia, and Intel Labs at CMU. Her research has been funded by the NSF, the Department of Homeland Security, and Cyber Florida. She received an NSF SPX Award in 2019 and an NSF CAREER Award in 2020. 

 


Media Contact: Jack Sampson

 
 
