{"id":548745,"date":"2018-11-07T13:48:03","date_gmt":"2018-11-07T21:48:03","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=548745"},"modified":"2021-09-16T12:24:18","modified_gmt":"2021-09-16T19:24:18","slug":"trust-region-evolution-strategies","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/trust-region-evolution-strategies\/","title":{"rendered":"Trust Region Evolution Strategies"},"content":{"rendered":"

Evolution Strategies (ES), a class of black-box optimization algorithms, has recently been demonstrated to be a viable alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. ES achieves fairly good performance on challenging reinforcement learning problems and is easier to scale in a distributed setting. However, standard ES algorithms perform one gradient update per data sample, which is not very ef\ufb01cient. In this paper, with the purpose of more ef\ufb01cient using of sampled data, we propose a novel iterative procedure that optimizes a surrogate objective function, enabling to reuse data sample for multiple epochs of updates. We prove monotonic improvement guarantee for such procedure. By making several approximations to the theoretically-justi\ufb01ed procedure, we further develop a practical algorithm called Trust Region Evolution Strategies (TRES). Our experiments demonstrate the effectiveness of TRES on a range of popular MuJoCo locomotion tasks in the OpenAI Gym, achieving better performance than ES algorithm.<\/p>\n","protected":false},"excerpt":{"rendered":"

Evolution Strategies (ES), a class of black-box optimization algorithms, has recently been demonstrated to be a viable alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. ES achieves fairly good performance on challenging reinforcement learning problems and is easier to scale in a distributed setting. However, standard ES algorithms perform one gradient […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13556],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-548745","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2019-2-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2018\/11\/trust-region-evolution-strategies.pdf","id":"548025","title":"trust-region-evolution-strategies","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Guoqing Liu","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Li Zhao","user_id":36152,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Li Zhao"},{"type":"user_nicename","value":"Feidiao Yang","user_id":31795,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Feidiao Yang"},{"type":"user_nicename","value":"Jiang Bian","user_id":38481,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jiang Bian"},{"type":"user_nicename","value":"Tao Qin","user_id":33871,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Tao Qin"},{"type":"text","value":"Nenghai Yu","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Tie-Yan Liu","user_id":34431,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Tie-Yan Liu"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[708421],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/548745"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/548745\/revisions"}],"predecessor-version":[{"id":548748,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/548745\/revisions\/548748"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=548745"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=548745"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=548745"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=548745"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=548745"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=548745"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=548745"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=548745"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=548745"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=548745"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=548745"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=548745"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=548745"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=548745"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=548745"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}