Life is governed by biological sequences and molecules, i.e., DNA, RNA, and protein sequences, which follow the de facto ‘natural’ language of biology. Understanding how these biomolecules behave and interact with each other could help the millions of people who still die of diseases such as cancer. However, understanding biomolecules such as protein sequences effectively is not easy, because labeled data (e.g., structural information) is limited and costly to collect. Progress on understanding these sequences is therefore vital and urgent for biology, healthcare, and medicine.
In this project, the goal is to learn meaningful representations of biomolecules (proteins, molecules). Specifically, we aim to design bio-inspired pretraining techniques and to apply them to empower, or even enable, impactful downstream applications.
People
Liang He
Senior Researcher
Fusong Ju
Researcher
Tie-Yan Liu
Distinguished Scientist, Microsoft Research AI for Science
Tao Qin
Partner Research Manager
Yingce Xia
Principal Researcher