Colloquium: DNN inference optimization across the system stack

Abstract: Recent breakthroughs in deep learning have made Deep Neural Network (DNN) models a key component of many AI applications, ranging from speech recognition and translation to face recognition and object/human detection and tracking. These DNN models are very resource demanding in terms of computation cycles, memory footprint, and power and energy consumption, and are mostly trained and deployed in the cloud/datacenters. However, there is a growing demand to push the deployment of these AI applications from the cloud to a wide variety of edge and IoT devices that are closer to the data and information generation sources, for reasons such as better user experience (latency- and throughput-sensitive applications), data privacy and security, and limited or intermittent network bandwidth. Compared to datacenters, these edge devices are very resource constrained and may not even be able to host these computationally expensive DNN models. Great efforts have been made to optimize the serving/inference of these DNN models to enable their deployment on edge devices and to reduce resource consumption and cost in datacenters.

We will talk about several research and product efforts at Microsoft on optimizing the DNN inference pipeline that touch upon hardware accelerators, compilers, model architectures, application requirements, and system dynamics. We will discuss how these efforts optimize different layers of the DNN system stack. Moreover, we will show the importance of looking at the DNN system stack holistically in order to achieve better tradeoffs between model performance and resource constraints.


Speaker's Biography: Dr. Di Wang is currently a researcher on the Ambient Intelligence Team at Microsoft AI Perception and Mixed Reality.

His research interests span the areas of computer systems, computer architecture, applied machine learning, VLSI design, energy-efficient systems design, and sustainable computing. Specifically, he has applied his expertise in these topics to datacenters, IoT, storage systems, fault-tolerant systems, EDA tools, and recommendation systems. Wang has authored over 30 publications in top conferences and journals and has received 4 best paper awards and 1 best paper nomination. His work has also been featured in CACM news and was chosen as the IEEE Sustainable Computing Register's Pick of the Month.

Wang received his Ph.D. in Computer Science and Engineering from Penn State University in 2014, his M.S. in Computer Systems Engineering from the Technical University of Denmark (DTU) in 2008, and his B.E. in Computer Science and Technology from Zhejiang University in 2005.



Media Contact: Anand Sivasubramaniam



The School of Electrical Engineering and Computer Science was created in the spring of 2015 to give undergraduate and graduate students greater access to courses offered by both departments and to exciting collaborative research across fields.

We offer B.S. degrees in electrical engineering, computer science, computer engineering and data science and graduate degrees (master's degrees and Ph.D.'s) in electrical engineering and computer science and engineering. EECS focuses on the convergence of technologies and disciplines to meet today’s industrial demands.

School of Electrical Engineering and Computer Science

The Pennsylvania State University

207 Electrical Engineering West

University Park, PA 16802


Department of Computer Science and Engineering


Department of Electrical Engineering