CSE Colloquium: Pan-genomic advances for fighting reference bias

Abstract:
Sequencing data analysis often begins with aligning reads to a reference genome, where the reference takes the form of a linear string of bases. But linearity leads to reference bias, a tendency to miss or misreport alignments containing non-reference alleles, which can confound downstream statistical and biological results. This is a major concern in human genomics; we don't want to live in a world where diagnostics and therapeutics are differentially effective depending on whether and where our genetic variants happen to match the reference.

Fortunately, computer science and bioinformatics are meeting the moment. We can now index and align sequencing reads to references that include many population variants. I will present some of the major insights that have shaped this journey, from the early days of efficient genome indexing -- especially the Burrows-Wheeler Transform -- continuing through recent methods for indexing graph-shaped references and references that include many genomes. I will emphasize recent results that show how to optimize simple and complex pan-genome representations for effective avoidance of reference bias. Finally, I will outline promising methods for addressing the bias, including new ideas for how to measure bias, new proposals in compressed indexing, and new workflows that integrate genotype imputation to reduce reference bias.

Much of this work is collaborative with Travis Gagie, Christina Boucher, Alan Kuhnle and others.

Bio:
Ben Langmead is an Associate Professor of Computer Science at Johns Hopkins University. He earned a Ph.D. in Computer Science from the University of Maryland in 2012. His group seeks to make high-throughput biological datasets easy for biomedical researchers to use. The group studies and applies ideas from sequence alignment, text indexing, statistics and parallel programming. He has released several high-impact software tools (e.g. Bowtie, Bowtie 2) that address common genomics research questions. His paper describing Bowtie won the Genome Biology award for outstanding paper in 2009. He has also released scalable software tools that use the MapReduce parallel programming model and commercial cloud computing services to analyze large collections of sequencing data. Ben's lab also collaborates with biostatisticians and biologists to create resources that allow biological researchers to easily query the huge amount of sequencing data available in public archives. He is the recipient of a Sloan Research Fellowship (2014), a National Science Foundation CAREER award (2014) and the Benjamin Franklin award for contributions to open access (2016).

 


Media Contact: Timothy Zhu

 
 

