Bio Embedding

Life is governed by biological sequences and molecules, namely DNA, RNA, and proteins, which together form the de facto ‘natural’ language of biology. Understanding how these biomolecules behave and interact with each other could help the millions of people who still die of diseases such as cancer. However, understanding biomolecules such as protein sequences effectively is not easy, because labeled data (e.g., structural information) is limited and costly to collect. Understanding these sequences is therefore vital and urgent for biology, healthcare, and medicine.

In this project, the goal is to learn meaningful representations of biomolecules (proteins and molecules). Specifically, we aim to design bio-inspired pretraining techniques and to apply them to empower (or even enable) impactful downstream applications.

People

Liang He

Senior Researcher

Fusong Ju

Researcher

Tie-Yan Liu

Distinguished Scientist, Microsoft Research AI for Science

Tao Qin

Senior Principal Research Manager

Bin Shao

Senior Principal Research Manager

Yingce Xia

Principal Researcher