{"id":1018506,"date":"2024-03-27T08:38:56","date_gmt":"2024-03-27T15:38:56","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=1018506"},"modified":"2024-03-28T05:47:23","modified_gmt":"2024-03-28T12:47:23","slug":"the-successor-representation-in-human-reinforcement-learning","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/the-successor-representation-in-human-reinforcement-learning\/","title":{"rendered":"The successor representation in human reinforcement learning"},"content":{"rendered":"

Theories of reward learning in neuroscience have focused on two families of algorithms thought to capture deliberative versus habitual choice. \u2018Model-based\u2019 algorithms compute the value of candidate actions from scratch, whereas \u2018model-free\u2019 algorithms make choice more efficient but less flexible by storing pre-computed action values. We examine an intermediate algorithmic family, the successor representation, which balances flexibility and efficiency by storing partially computed action values: predictions about future events. These pre-computation strategies differ in how they update their choices following changes in a task. The successor representation\u2019s reliance on stored predictions about future states predicts a unique signature of insensitivity to changes in the task\u2019s sequence of events, but flexible adjustment following changes to rewards. We provide evidence for such differential sensitivity in two behavioural studies with humans. These results suggest that the successor representation is a computational substrate for semi-flexible choice in humans, introducing a subtler, more cognitive notion of habit.<\/p>\n","protected":false},"excerpt":{"rendered":"

Theories of reward learning in neuroscience have focused on two families of algorithms thought to capture deliberative versus habitual choice. \u2018Model-based\u2019 algorithms compute the value of candidate actions from scratch, whereas \u2018model-free\u2019 algorithms make choice more efficient but less flexible by storing pre-computed action values. We examine an intermediate algorithmic family, the successor representation, which […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13556],"msr-publication-type":[193715],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1018506","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2017-9-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"Nature Human Behaviour","msr_volume":"1","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":0,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/www.momen-nejad.org\/_files\/ugd\/a6d7e4_e502ef7b88b84d79bbde6febd5f9b208.pdf","label_id":"243132","label":0},{"type":"doi","viewUrl":"false","id":"false","title":"10.1038\/s41562-017-0180-8","label_id":"243106","label":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"user_nicename","value":"Ida Momennejad","user_id":39832,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ida Momennejad"},{"type":"text","value":"E. M. Russek","user_id":0,"rest_url":false},{"type":"text","value":"J. H. Cheong","user_id":0,"rest_url":false},{"type":"text","value":"M. M. Botvinick","user_id":0,"rest_url":false},{"type":"text","value":"N. D. Daw","user_id":0,"rest_url":false},{"type":"text","value":"S. J. Gershman","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"download":[],"msr_publication_type":"article","related_content":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1018506"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1018506\/revisions"}],"predecessor-version":[{"id":1018524,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1018506\/revisions\/1018524"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1018506"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=1018506"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=1018506"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1018506"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=1018506"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=1018506"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=1018506"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=1018506"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=1018506"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1018506"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1018506"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=1018506"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=1018506"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=1018506"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1018506"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1018506"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}