W375 Westgate Building
10:00 AM
ABSTRACT
As AI models become increasingly powerful and integrated into society, their "black box" nature poses significant risks and breeds distrust. In this talk, I will introduce principled methods to scientifically understand AI systems and improve their trustworthiness across the full spectrum, from training processes to model mechanisms to data features. First, I will discuss techniques that trace model behavior back to specific training updates, enabling training data assessment, model auditing, and credit assignment. Next, I will address the need to reveal the internal mechanisms of Large Language Models (LLMs), particularly how they represent knowledge and maintain safety during post-training. Finally, I will introduce a structure-aware framework for examining how data features drive AI decisions, allowing non-AI experts to interpret and effectively use AI in healthcare and science applications. The presentation concludes with a look toward the future: controlling advanced agentic AI systems and creating frameworks in which AI empowers humans.
BIO
Shichang Zhang is a postdoctoral fellow at the Digital Data Design Institute at Harvard University. He earned his Ph.D. in Computer Science from the University of California, Los Angeles (UCLA), an M.S. in Statistics from Stanford University, and a B.A. in Statistics from the University of California, Berkeley (UCB). His research focuses on developing principled methods to understand and improve the trustworthiness of AI systems, with applications in high-stakes domains such as science and healthcare. His work has been published in leading venues, including NeurIPS, ICML, ICLR, ACL, WWW, and ISR, and was highlighted in a Nature News Feature for its educational impact. He delivered a comprehensive tutorial on Explainable AI at NeurIPS 2025 and has industry experience at Amazon Web Services (AWS) and Snap Research. He is a recipient of the J.P. Morgan AI Ph.D. Fellowship, the Amazon Ph.D. Fellowship, and the NeurIPS Outstanding Paper Award, and has received multiple Outstanding Reviewer Awards (ICML 2022; KDD 2023, 2025).