De novo Generation for Molecular Structure Elucidation from Mass Spectrometry
- Runzhong Wang | MIT
- Microsoft Research New England Generative Modeling & Sampling Seminar
Recent advances in generative AI are enabling new approaches to scientific discovery in chemistry and biology. In this talk, we present DiffMS and FRIGID, two generative AI frameworks for de novo molecular structure elucidation from tandem mass spectrometry (MS/MS). DiffMS introduces a formula-constrained graph diffusion model that generates molecular structures directly from experimental spectra using transformer-based spectral encoding and large-scale pretraining on fingerprint–structure pairs. Building on this foundation, FRIGID develops a scalable diffusion language model trained on hundreds of millions of molecular structures and introduces inference-time scaling through cycle-consistent refinement with forward fragmentation models such as ICEBERG, enabling targeted correction of spectrum-inconsistent molecular fragments. Together, these works demonstrate how diffusion models, large-scale pretraining, and inference-time reasoning can advance generative AI for scientific discovery and molecular identification.
Speaker bio
Runzhong Wang is a postdoctoral researcher working with Prof. Connor Coley at MIT. Before that, he received his B.S. and Ph.D. from the Department of Computer Science and Engineering, Shanghai Jiao Tong University. His research is at the intersection of machine learning, optimization, and computational metabolomics. He has published more than 30 papers on machine learning and AI for Science topics.
Series: MSR New England Generative Modeling & Sampling Seminar
Constrained Generative AI for Materials Inverse Design
- Mouyang Cheng
Designing Dynamic Measure Transport for Sampling
- Aimee Maurais
Physics and information theory of generative diffusion
- Luca Ambrogioni
Matching features, not tokens: Energy-based fine-tuning of language models
- Mujin Kwun
- Carles Domingo-Enrich
Generative Models for Molecular Dynamics Across Timescales
- Michael Plainer
- Winfried Ripken
- Gregor Lied
Q-learning with Flow-Matching Policies
- Qiyang (Colin) Li
A non-Markovian approach to diffusion-based sampling
- Lorenz Richter
Blind denoising diffusion models and the blessings of dimensionality
- Aram-Alexandre Pooladian
Meta Flow Maps
- Peter Potaptchik