De novo Generation for Molecular Structure Elucidation from Mass Spectrometry

Recent advances in generative AI are enabling new approaches to scientific discovery in chemistry and biology. In this talk, we present DiffMS and FRIGID, two generative AI frameworks for de novo molecular structure elucidation from tandem mass spectrometry (MS/MS). DiffMS introduces a formula-constrained graph diffusion model that generates molecular structures directly from experimental spectra, using transformer-based spectral encoding and large-scale pretraining on fingerprint–structure pairs. Building on this foundation, FRIGID develops a scalable diffusion language model trained on hundreds of millions of molecular structures. It also introduces inference-time scaling through cycle-consistent refinement with forward fragmentation models such as ICEBERG, which enables targeted correction of spectrum-inconsistent molecular fragments. Together, these works demonstrate how diffusion models, large-scale pretraining, and inference-time reasoning can advance generative AI for scientific discovery and molecular identification.

Speaker bio

Runzhong Wang is a postdoc working with Prof. Connor Coley at MIT. Before that, he received his B.S. and Ph.D. from the Department of Computer Science and Engineering at Shanghai Jiao Tong University. His research lies at the intersection of machine learning, optimization, and computational metabolomics. He has published more than 30 papers on machine learning and AI for science.

Series: MSR New England Generative Modeling & Sampling Seminar