{"id":922338,"date":"2023-02-27T17:11:05","date_gmt":"2023-02-28T01:11:05","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=922338"},"modified":"2023-02-27T18:03:13","modified_gmt":"2023-02-28T02:03:13","slug":"smart-a-generalized-pretraining-framework-for-control-tasks","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/smart-a-generalized-pretraining-framework-for-control-tasks\/","title":{"rendered":"SMART \u2013 A Generalized Pretraining Framework for Control Tasks"},"content":{"rendered":"\n

We are announcing SMART, a generalized pretraining framework for a wide variety of control tasks.

\"The<\/figure>\n\n\n\n
Paper | SMART code

Self-supervised pretraining of large neural networks (BERT, GPT, MoCo, and CLIP) has been shown to be successful in a wide range of language and vision problems. These works demonstrate that a single pretrained model can be easily finetuned to perform many downstream tasks, resulting in a simple, effective, and data-efficient paradigm. When it comes to control tasks, however, it is not yet clear whether the successes of pretraining approaches can be easily replicated. So we ask: can we enable a similar pretraining paradigm for efficient decision-making across various control tasks?

In “SMART: Self-supervised Multi-task pretrAining with contRol Transformers”, to be published at ICLR 2023 (as a notable-top-25% paper), we study how to pretrain a versatile, generalizable, and resilient model for a wide variety of control tasks. We demonstrate that SMART significantly improves learning efficiency and facilitates rapid transfer to novel tasks under different learning scenarios, including imitation learning (IL) and reinforcement learning (RL). Thanks to the proposed control-centric objective, SMART is resilient to distribution shift between pretraining and finetuning, and even works well with low-quality datasets that were collected randomly.
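To make the pretrain-then-finetune paradigm concrete, the sketch below shows roughly what self-supervised pretraining of a sequence model on trajectory data can look like. It is only an illustration under simplifying assumptions: `ControlTransformer` and `pretrain_step` are hypothetical names, not the released SMART API, and the loss shown is plain next-observation prediction rather than the full control-centric objective described in the paper.

```python
# Minimal sketch of self-supervised pretraining on (observation, action) trajectories.
# All class and function names are hypothetical, not the released SMART implementation.
import torch
import torch.nn as nn


class ControlTransformer(nn.Module):
    """Toy stand-in for a transformer that encodes (observation, action) sequences."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.embed = nn.Linear(obs_dim + act_dim, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # One possible self-supervised head: predict the next observation.
        self.predict_next_obs = nn.Linear(hidden, obs_dim)

    def forward(self, obs_seq, act_seq):
        # obs_seq: (batch, time, obs_dim), act_seq: (batch, time, act_dim)
        x = self.embed(torch.cat([obs_seq, act_seq], dim=-1))
        return self.encoder(x)


def pretrain_step(model, optimizer, obs_seq, act_seq):
    """One self-supervised step on a batch of trajectories (illustrative only)."""
    h = model(obs_seq[:, :-1], act_seq[:, :-1])
    pred = model.predict_next_obs(h)
    loss = nn.functional.mse_loss(pred, obs_seq[:, 1:])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Purely illustrative usage with random trajectories:
# model = ControlTransformer(obs_dim=8, act_dim=2)
# opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# obs, act = torch.randn(4, 16, 8), torch.randn(4, 16, 2)
# pretrain_step(model, opt, obs, act)
```

After pretraining, the same backbone would be finetuned on a downstream task, for example via imitation learning on expert demonstrations or as a feature encoder inside an RL agent.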

We now discuss the challenges and introduce our key design concepts and technical details.
