{"id":425190,"date":"2017-08-12T08:24:32","date_gmt":"2017-08-12T15:24:32","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=425190"},"modified":"2019-09-02T07:18:46","modified_gmt":"2019-09-02T14:18:46","slug":"contextual-decision-processes-low-bellman-rank-pac-learnable","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/contextual-decision-processes-low-bellman-rank-pac-learnable\/","title":{"rendered":"Contextual Decision Processes with low Bellman rank are PAC-Learnable"},"content":{"rendered":"\n\n\n<p class=\"wp-block-paragraph\">This paper studies systematic exploration for reinforcement learning (RL) with rich observations and function approximation. We introduce contextual decision processes (CDPs), that unify most prior RL settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in CDPs and is naturally small for many well-studied RL models. Our second contribution is a new RL algorithm that does systematic exploration to learn near-optimal behavior in CDPs with low Bellman rank. The algorithm requires a number of samples that is polynomial in all relevant parameters but independent of the number of unique contexts. Our approach uses Bellman error minimization with optimistic exploration and provides new insights into efficient exploration for RL with function approximation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This paper studies systematic exploration for reinforcement learning (RL) with rich observations and function approximation. We introduce contextual decision processes (CDPs), that unify most prior RL settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in CDPs and is naturally small for many well-studied [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Nan Jiang","user_id":"36614"},{"type":"user_nicename","value":"Akshay Krishnamurthy","user_id":"30913"},{"type":"user_nicename","value":"Alekh Agarwal","user_id":"30928"},{"type":"user_nicename","value":"John Langford","user_id":"32204"},{"type":"user_nicename","value":"Robert Schapire","user_id":"33549"}],"msr_publishername":"PMLR","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"1704\u20131713","msr_page_range_start":"1704","msr_page_range_end":"1713","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"International Conference on Machine Learning","msr_doi":"","msr_arxiv_id":"1610.09512","msr_mag_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_release_tracker_id":"","msr_highlight_type":"","msr_date_display_format":"","msr_main_download_label":"","msr_external_link_label":"","msr_doi_label":"","msr_published_date":"2017-08-12","msr_startdate":"","msr_presentation_date":"","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"https:\/\/arxiv.org\/abs\/1610.09512","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_year":2017,"msr_month":8,"msr_day":12,"msr_microsoftintellectualproperty":true,"msr_pub_id":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":false,"title":"https:\/\/arxiv.org\/abs\/1610.09512v2","label_id":243109,"label":0}],"msr_related_uploader":[],"msr_original_fields_of_study":[],"msr_s2_paper_id":"","msr_s2_pdf_url":"","msr_citation_count_updated":"","msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[{"provider":"arxiv","id":"1610.09512"}],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13556],"msr-publication-type":[193716],"msr-publisher":[],"msr-publication-cta":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-425190","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_publishername":"PMLR","msr_edition":"","msr_affiliation":"","msr_published_date":"2017-08-12","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"1704\u20131713","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"https:\/\/arxiv.org\/abs\/1610.09512","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/abs\/1610.09512v2","label_id":"243109","label":0}],"msr_related_uploader":[],"msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"1610.09512","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":0,"url":"https:\/\/arxiv.org\/abs\/1610.09512"}],"msr-author-ordering":[{"type":"user_nicename","value":"Nan Jiang","user_id":36614,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Nan Jiang"},{"type":"user_nicename","value":"Akshay Krishnamurthy","user_id":30913,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Akshay Krishnamurthy"},{"type":"user_nicename","value":"Alekh Agarwal","user_id":30928,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Alekh Agarwal"},{"type":"user_nicename","value":"John Langford","user_id":32204,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=John Langford"},{"type":"user_nicename","value":"Robert Schapire","user_id":33549,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Robert Schapire"}],"msr_impact_theme":[],"msr_research_lab":[199571],"msr_event":[],"msr_group":[144902,395930],"msr_project":[568491],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":568491,"post_title":"Real World Reinforcement Learning","post_name":"real-world-reinforcement-learning","post_type":"msr-project","post_date":"2019-05-03 10:02:09","post_modified":"2024-01-16 11:11:48","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/real-world-reinforcement-learning\/","post_excerpt":"The mission of Real World Reinforcement Learning (Real-World RL) team is to develop learning methods, from foundations to real world applications, to empower people and organizations to make better decisions. The research enables the next generation of machine learning using interactive reinforcement-based approaches to solve real-world problems.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/568491"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/425190","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/425190\/revisions"}],"predecessor-version":[{"id":425193,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/425190\/revisions\/425193"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=425190"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=425190"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=425190"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=425190"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=425190"},{"taxonomy":"msr-publication-cta","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-cta?post=425190"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=425190"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=425190"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=425190"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=425190"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=425190"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=425190"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=425190"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=425190"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}