{"id":788771,"date":"2021-10-26T19:56:39","date_gmt":"2021-10-27T02:56:39","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=788771"},"modified":"2023-01-24T12:07:01","modified_gmt":"2023-01-24T20:07:01","slug":"bio-embedding","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/bio-embedding\/","title":{"rendered":"Bio Embedding"},"content":{"rendered":"
\n\t
\n\t\t
\n\t\t\t\t\t<\/div>\n\t\t\n\t\t
\n\t\t\t\n\t\t\t
\n\t\t\t\t\n\t\t\t\t
\n\t\t\t\t\t\n\t\t\t\t\t
\n\t\t\t\t\t\t
\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n

Bio Embedding<\/h1>\n\n\n\n

<\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n

Life is governed by biological sequences and molecules, i.e., DNA, RNA, and protein sequences, which follow the de facto<\/em> \u2018natural\u2019 language of biology. Understanding how these biomolecules behave and interact with each other could help the millions of people who still die of diseases such as cancer. However, understanding a biomolecule such as a protein sequence is not easy, because labeled data (e.g., structural information) is limited and costly to collect. Therefore, learning to understand these sequences is vital and urgent for biology, healthcare, and medicine.<\/p>\n\n\n\n

The goal of this project is to learn meaningful representations of biomolecules (proteins and molecules). Specifically, we aim to design bio-inspired pretraining techniques and to empower (or even enable) impactful downstream applications by applying them.<\/p>\n\n\n\n\n\n

  • Liang He, Shizhuo Zhang, Lijun Wu, Huanhuan Xia, Fusong Ju, He Zhang, Siyuan Liu, Yingce Xia, Jianwei Zhu, Pan Deng, Bin Shao, Tao Qin, Tie-Yan Liu, Pre-training Co-evolutionary Protein Representation via A Pairwise Masked Language Model, arXiv preprint arXiv:2110.15527<\/em>, 2021.<\/li>
  • Jinhua Zhu, Yingce Xia, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu, Dual-view Molecule Pre-training, arXiv preprint arXiv:2106.10234<\/em>, 2021.<\/li><\/ul>\n\n\n","protected":false},"excerpt":{"rendered":"

    Life is governed by biological sequences and molecules, i.e., DNA, RNA, and protein sequences, which follow the de facto \u2018natural\u2019 language of biology. Understanding how these biomolecules behave and interact with each other could help the millions of people who still die of diseases such as cancer. However, understanding a biomolecule such as a protein sequence is not easy, because labeled data (e.g., structural information) is limited and costly to collect. Therefore, learning to understand these sequences is vital and urgent for biology, healthcare, and medicine.<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-788771","msr-project","type-msr-project","status-publish","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Liang He","user_id":38505,"people_section":"Section name 0","alias":"lihe"},{"type":"user_nicename","display_name":"Fusong Ju","user_id":40738,"people_section":"Section name 0","alias":"fusongju"},{"type":"user_nicename","display_name":"Tie-Yan Liu","user_id":34431,"people_section":"Section name 0","alias":"tyliu"},{"type":"user_nicename","display_name":"Tao Qin","user_id":33871,"people_section":"Section name 0","alias":"taoqin"},{"type":"user_nicename","display_name":"Yingce Xia","user_id":37784,"people_section":"Section name 
0","alias":"yinxia"}],"msr_research_lab":[199560,851467],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/788771"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/788771\/revisions"}],"predecessor-version":[{"id":914397,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/788771\/revisions\/914397"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=788771"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=788771"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=788771"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=788771"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=788771"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}