W375 Westgate Building
10:00 AM
The rapid growth of genomic data is transforming biology, but it is also creating major computational challenges. Modern sequencing projects now generate thousands of genomes for a single species, requiring indexing and search methods that scale beyond the assumptions of a single reference genome. A central task in this setting is identifying maximal exact matches (MEMs) between sequencing reads and large genome collections; these MEMs serve as seeds for read alignment. In this talk, I present algorithms and systems for scalable pangenomic indexing based on the r-index, a compressed data structure whose size grows with the number of runs r in the Burrows-Wheeler transform of the collection, and therefore with the repetitiveness of the dataset rather than its total length. I describe how prefix-free parsing (PFP) enables efficient construction of these indexes for large collections of highly repetitive genomes. Building on this idea, we developed MONI, a pangenomic aligner that uses r-index–based MEM queries to align sequencing reads against hundreds of genomes. Experiments on large human genome collections show that this approach enables indexing and alignment at previously infeasible scales while maintaining competitive accuracy. Finally, I discuss several directions for future work, including recursive prefix-free parsing for even larger genome collections, graph-aware indexing methods for pangenomes, and GPU-accelerated algorithms for large-scale genomic search.
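To give a rough feel for why the r-index suits repetitive collections: its size is proportional to r, the number of maximal runs of equal characters in the Burrows-Wheeler transform (BWT) of the text. The sketch below is a naive illustration under stated assumptions, not the PFP-based construction or MONI itself. It builds the BWT by sorting all rotations (only workable for toy inputs), assumes `$` as an end marker, and compares r for a hypothetical "collection" of 50 identical copies of a made-up 10-base sequence against an unrelated random string of the same length. Real genome collections contain variants, which add runs, but r still stays far below the text length when the genomes are highly similar.

```python
import random

def bwt(text: str) -> str:
    """Naive BWT: sort every rotation of text + '$' and keep the last column.

    O(n^2 log n) time and O(n^2) memory; only suitable for tiny toy inputs.
    """
    s = text + "$"  # '$' is an assumed sentinel that sorts before A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def count_runs(s: str) -> int:
    """Count maximal runs of identical characters (this is r when s is a BWT)."""
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)

# Hypothetical toy "collection": 50 identical copies of a made-up 10-base sequence.
repetitive = "ACGTACGGTC" * 50
# Non-repetitive baseline of the same length, drawn with a fixed seed.
rng = random.Random(0)
unrelated = "".join(rng.choice("ACGT") for _ in range(len(repetitive)))

r_rep = count_runs(bwt(repetitive))
r_rand = count_runs(bwt(unrelated))
print(len(repetitive), r_rep, r_rand)
```

Both strings have length 500, but r for the repetitive one is bounded by a small multiple of the 10-base period, while the random string yields hundreds of runs. An index whose size scales with r rather than with the total length therefore benefits directly from the redundancy across genomes.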
Additional Information:
Dr. Christina Boucher is a Professor in the Department of Computer and Information Science and Engineering at the University of Florida. Her research focuses on the design of algorithms and compressed data structures for large-scale biological sequence analysis, enabling efficient search and analysis of massive genomic datasets. She has authored over 170 publications in bioinformatics, including many on succinct data structures, sequence alignment, and pangenomic analysis. Dr. Boucher has delivered keynote addresses at major international venues including WABI 2025, HiCOMB 2022, IGGSY 2022, SPIRE 2021, RECOMB-SEQ 2016, and the ECCB Workshop on Pan-Genomics. She is the recipient of the ESA 2016 Best Paper Award and has led the development of widely used bioinformatics tools such as MONI, MEGARes, AMRPlusPlus, METAMarc, Kohdista, Vari, and VariMerge. Her research program is highly interdisciplinary, bringing together collaborators in microbiology, veterinary medicine, epidemiology, public health, and clinical sciences. Her work is supported by the National Institutes of Health, the National Science Foundation, and the U.S. Department of Agriculture. Dr. Boucher has served as Program Committee Chair for several international conferences, including WABI 2022, SPIRE 2020, RECOMB-SEQ 2019, and ACM-BCB 2018. She has been a Standing Member of the NIH Biodata Management and Analysis (BDMA) Study Section since 2021 and is a member of AAAS and ACM and a Senior Member of IEEE.