{"id":788960,"date":"2021-10-27T00:02:50","date_gmt":"2021-10-27T07:02:50","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=788960"},"modified":"2021-11-25T19:01:57","modified_gmt":"2021-11-26T03:01:57","slug":"rl-theory","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/rl-theory\/","title":{"rendered":"RL Theory"},"content":{"rendered":"
\n\t
\n\t\t
\n\t\t\t\t\t<\/div>\n\t\t\n\t\t
\n\t\t\t\n\t\t\t
\n\t\t\t\t\n\t\t\t\t
\n\t\t\t\t\t\n\t\t\t\t\t
\n\t\t\t\t\t\t
\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n

RL Theory
Finding the optimal policy of a given control problem is a fundamental task in artificial intelligence. The reinforcement learning problem faces the following challenges: (1) there is no pre-given i.i.d. data; (2) there are no direct supervision labels as in supervised learning; (3) the optimization is hard. To design efficient algorithms for the control problem, we need a theoretical view that clarifies the problem structure. Specifically, we focus on how to collect and how to use data. For collecting data, we develop efficient exploration algorithms that are (1) sample-efficient in theory, (2) computationally efficient, and (3) compatible with deep reinforcement learning. For using data, we study the theoretical behavior of algorithms in the non-i.i.d. (Markov) setting, analyze the policy optimization process with optimal control as a tool, and propose new policy optimization algorithms that are more stable and efficient.
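As a concrete illustration of the policy-optimization side, here is a minimal sketch of REINFORCE, a standard policy-gradient method, on an invented three-armed bandit. The bandit, step size, and baseline are assumptions chosen for the example; this is not a method specific to this project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented 3-armed bandit with unknown mean rewards (an assumption for the demo).
true_means = np.array([0.2, 0.5, 0.8])

theta = np.zeros(3)  # logits of a softmax policy over the arms

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

alpha = 0.1      # step size (assumed)
baseline = 0.0   # running average reward; reduces gradient variance

for t in range(5000):
    pi = softmax(theta)
    a = rng.choice(3, p=pi)
    r = rng.normal(true_means[a], 0.1)

    # REINFORCE: for a softmax policy, grad log pi(a) = one_hot(a) - pi.
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta += alpha * (r - baseline) * grad_log_pi
    baseline += 0.01 * (r - baseline)

print("learned policy:", softmax(theta))  # mass should concentrate on arm 2
```

Even in this toy case the three challenges above are visible: the data distribution depends on the current policy, the only feedback is the sampled reward, and the objective is optimized through a stochastic, non-convex update.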
Publications
• Yue Wang, Wei Chen, Yuting Liu, Zhi-Ming Ma, and Tie-Yan Liu. Finite sample analysis of the GTD policy evaluation algorithms in Markov setting. NeurIPS, 2017.
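The paper above analyzes gradient TD (GTD) policy evaluation when samples come from a single Markov trajectory rather than from i.i.d. draws. As a reminder of the algorithm family being analyzed, below is a minimal sketch of GTD2 (update rules from Sutton et al., 2009) on an invented random-walk chain with one-hot features; the chain, rewards, and step sizes are assumptions for illustration, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented 5-state chain with uniform transitions; with one-hot features the
# GTD2 fixed point coincides with the true value function of this policy.
n_states, gamma = 5, 0.9
P = np.full((n_states, n_states), 1.0 / n_states)
R = rng.normal(size=n_states)          # state-dependent reward (assumed)

def phi(s):
    f = np.zeros(n_states)
    f[s] = 1.0
    return f

theta = np.zeros(n_states)  # value-function weights
w = np.zeros(n_states)      # auxiliary weights for the gradient correction
alpha, beta = 0.05, 0.1     # step sizes (assumed)

s = 0
for t in range(50_000):     # one long Markov trajectory, not i.i.d. samples
    s_next = rng.choice(n_states, p=P[s])
    f, f_next = phi(s), phi(s_next)
    delta = R[s] + gamma * (theta @ f_next) - theta @ f   # TD error
    # GTD2: theta descends the projected Bellman error; w tracks E[delta | s].
    theta += alpha * (f - gamma * f_next) * (f @ w)
    w += beta * (delta - f @ w) * f
    s = s_next

v_true = np.linalg.solve(np.eye(n_states) - gamma * P, R)
print("GTD2 estimate:", np.round(theta, 2))
print("true values  :", np.round(v_true, 2))
```

Note that the loop consumes one long trajectory, which is exactly the Markov (non-i.i.d.) sampling regime whose finite-sample behavior the paper studies.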
• Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, and Tie-Yan Liu. Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit. arXiv preprint arXiv:2106.15128, 2021.
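Regularized OFU builds on the optimism-in-the-face-of-uncertainty (OFU) principle for contextual bandits. The sketch below shows only the classical linear baseline (a LinUCB-style ridge estimate plus an exploration bonus), not the paper's regularized estimator for the non-linear case; the dimensions, confidence width, and noise level are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented linear contextual bandit: reward = <theta_star, x> + noise.
d, n_arms, T = 5, 4, 3000
theta_star = rng.normal(size=d)

lam, width = 1.0, 1.0        # ridge regularizer and confidence width (assumed)
A = lam * np.eye(d)          # regularized Gram matrix of played contexts
b = np.zeros(d)

regret = 0.0
for t in range(T):
    X = rng.normal(size=(n_arms, d))   # fresh contexts each round
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b              # ridge regression estimate
    # OFU: play the arm maximizing mean estimate + uncertainty bonus.
    bonus = np.sqrt(np.einsum('ad,dc,ac->a', X, A_inv, X))
    a = int(np.argmax(X @ theta_hat + width * bonus))
    r = X[a] @ theta_star + 0.1 * rng.normal()
    A += np.outer(X[a], X[a])
    b += r * X[a]
    regret += (X @ theta_star).max() - X[a] @ theta_star

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```

The bonus shrinks along directions of the context space the learner has already explored, and in the linear case this optimism mechanism yields sublinear regret.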
