{"id":821692,"date":"2022-02-23T12:02:31","date_gmt":"2022-02-23T20:02:31","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=821692"},"modified":"2022-02-23T12:02:31","modified_gmt":"2022-02-23T20:02:31","slug":"verbal-focus-of-attention-system-for-learning-from-observation","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/verbal-focus-of-attention-system-for-learning-from-observation\/","title":{"rendered":"Verbal Focus-of-Attention System for Learning-from-Observation"},"content":{"rendered":"

The learning-from-observation (LfO) framework aims to map human demonstrations onto a robot to reduce programming effort. To this end, an LfO system encodes a human demonstration into a series of execution units for a robot, referred to as task models. Although previous research has proposed successful task-model encoders, there has been little discussion of how to guide a task-model encoder in a scene with spatiotemporal noise, such as cluttered objects or unrelated human body movements. Inspired by the way verbal instructions guide an observer's visual attention, we propose a verbal focus-of-attention (FoA) system (i.e., spatiotemporal filters) to guide a task-model encoder. For object manipulation, the system first recognizes the name of a target object and its attributes from verbal instructions. This information serves as a where-to-look FoA filter that confines attention to the areas in which the target object appears in the demonstration. The system then detects the timings of grasp and release within the filtered areas. These timings serve as a when-to-look FoA filter that confines attention to the period of object manipulation. Finally, a task-model encoder recognizes the task models by employing the FoA filters. We demonstrate the robustness of the verbal FoA in attenuating spatiotemporal noise by comparing it with an existing action localization network. The contributions of this study are: (1) a verbal FoA for LfO, (2) an algorithm that computes FoA filters from verbal input, and (3) a demonstration of the effectiveness of the verbal FoA in localizing an action, compared against a state-of-the-art vision system.
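The abstract walks through a two-stage filtering pipeline, and the sketch below makes that data flow concrete. It is a minimal, hypothetical Python illustration, not the paper's implementation: the Detection and Frame types, the open/closed-hand heuristic for grasp and release, and the load_demonstration and encode_task_models names are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Box = Tuple[int, int, int, int]  # x, y, w, h in pixels


@dataclass
class Detection:
    label: str            # e.g., "cup"
    attributes: Set[str]  # e.g., {"blue"}
    box: Box


@dataclass
class Frame:
    detections: List[Detection]  # per-frame object detections
    hand_open: bool              # crude hand state of the demonstrator


def where_to_look(frames: List[Frame], name: str, attrs: Set[str]) -> List[Optional[Box]]:
    """Where-to-look FoA filter: per-frame region of the verbally named target object."""
    regions: List[Optional[Box]] = []
    for frame in frames:
        match = next((d for d in frame.detections
                      if d.label == name and attrs <= d.attributes), None)
        regions.append(match.box if match else None)
    return regions


def when_to_look(frames: List[Frame],
                 regions: List[Optional[Box]]) -> Tuple[Optional[int], Optional[int]]:
    """When-to-look FoA filter: grasp/release frame indices inside the filtered regions.

    A grasp is approximated as the hand closing while the target is visible,
    and a release as the subsequent hand reopening; the paper's detector is
    more involved, so treat this purely as an illustration.
    """
    grasp = release = None
    for i in range(1, len(frames)):
        if regions[i] is None:
            continue  # ignore frames where the target object is not visible
        prev_open, cur_open = frames[i - 1].hand_open, frames[i].hand_open
        if grasp is None and prev_open and not cur_open:
            grasp = i
        elif grasp is not None and not prev_open and cur_open:
            release = i
            break
    return grasp, release


# Example: only the filtered spatiotemporal slice reaches the task-model encoder.
# frames = load_demonstration("pick_up_the_blue_cup.mp4")      # hypothetical loader
# regions = where_to_look(frames, name="cup", attrs={"blue"})
# grasp, release = when_to_look(frames, regions)
# if grasp is not None and release is not None:
#     encode_task_models(frames[grasp:release + 1], regions[grasp:release + 1])
```

In this reading, the verbal instruction fixes the spatial filter first, and the temporal filter is derived only from events inside that filtered region, so clutter and unrelated body motion outside it never reach the task-model encoder.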

DOI: 10.1109/ICRA48506.2021.9562102
IEEE Xplore: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9562102
arXiv: https://arxiv.org/ftp/arxiv/papers/2007/2007.08705.pdf

Related project: Interactive Learning-from-Observation (https://www.microsoft.com/en-us/research/project/interactive-learning-from-observation/), which aims to develop an interactive LfO system in the service-robot domain so that senior citizens who would prefer to remain at home can be assisted by service robots.