{"id":506147,"date":"2018-09-18T05:24:34","date_gmt":"2018-09-18T12:24:34","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=506147"},"modified":"2019-07-22T23:14:01","modified_gmt":"2019-07-23T06:14:01","slug":"learning-attentional-recurrent-neural-network-for-visual-tracking","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/learning-attentional-recurrent-neural-network-for-visual-tracking\/","title":{"rendered":"Learning Attentional Recurrent Neural Network for Visual Tracking"},"content":{"rendered":"
Existing visual tracking methods face several challenges: (a) the size and number of targets change over time, (b) targets are occluded in some frames, and (c) targets that cross paths are easily mis-identified. Long short-term memory (LSTM) excels at modeling long-term dependencies and is therefore well suited to tracking. We propose a novel online Attentional Recurrent Neural Network (ARNN) model for visual tracking, whose core component is a two-layer bidirectional LSTM that scans along the x and y axes. Several bidirectional LSTMs can be cascaded or connected in parallel to exploit multi-scale target features and yield more precise object locations. Each bidirectional LSTM compares the convolutional features extracted by a Convolutional Neural Network (CNN) inside two bounding boxes from two frames to check whether the target in the current frame matches the one in previous frames. An attention mechanism is adopted to better express the patch-level features of the tracked targets: inter-attention and intra-attention models are proposed to imitate the temporal and spatial tracking mechanisms of the primate visual cortex, where inter-attention learns to overcome occlusion and intra-attention marks important regions to trace the target more accurately. The bidirectional LSTM and the attention mechanism are trained jointly, and their combination further improves tracking accuracy. Experiments demonstrate the effectiveness of the proposed online ARNN, which achieves competitive results compared with state-of-the-art tracking methods.
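To make the two central ideas in the abstract concrete, here is a minimal PyTorch-style sketch of (1) bidirectional LSTMs scanning a CNN feature map along the x and y axes and (2) a soft intra-attention pooling over spatial positions. All layer sizes, tensor shapes, class names, and the exact attention form are illustrative assumptions, not the paper's configuration; the actual ARNN (including inter-attention and joint training) is more involved.

```python
# Sketch under assumed shapes: a (B, C, H, W) CNN feature map cropped
# from a bounding box is scanned row-wise and column-wise by BiLSTMs,
# then pooled into one descriptor by a learned spatial attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AxisBiLSTM(nn.Module):
    """Bidirectional LSTMs along the x axis (rows) and y axis (columns)."""
    def __init__(self, in_channels: int, hidden: int):
        super().__init__()
        self.row_lstm = nn.LSTM(in_channels, hidden, batch_first=True, bidirectional=True)
        self.col_lstm = nn.LSTM(in_channels, hidden, batch_first=True, bidirectional=True)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        b, c, h, w = fmap.shape
        # Treat each row as a length-W sequence of C-dim features.
        rows = fmap.permute(0, 2, 3, 1).reshape(b * h, w, c)
        row_out, _ = self.row_lstm(rows)                  # (B*H, W, 2*hidden)
        # Treat each column as a length-H sequence of C-dim features.
        cols = fmap.permute(0, 3, 2, 1).reshape(b * w, h, c)
        col_out, _ = self.col_lstm(cols)                  # (B*W, H, 2*hidden)
        row_out = row_out.reshape(b, h, w, -1)
        col_out = col_out.reshape(b, w, h, -1).permute(0, 2, 1, 3)
        # Concatenate both scan directions at every spatial position.
        return torch.cat([row_out, col_out], dim=-1)      # (B, H, W, 4*hidden)

class IntraAttention(nn.Module):
    """Illustrative intra-attention: soft weights over spatial positions
    that highlight important regions of the encoded patch."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, h, w, d = feats.shape
        flat = feats.reshape(b, h * w, d)
        weights = F.softmax(self.score(flat), dim=1)      # (B, H*W, 1)
        return (weights * flat).sum(dim=1)                # (B, D) descriptor
```

A hypothetical matching step, in the spirit of the abstract's "check whether the target in the current frame is the one in previous frames": pool a descriptor from each bounding box and compare them.

```python
encoder, attend = AxisBiLSTM(256, 64), IntraAttention(4 * 64)
prev_desc = attend(encoder(torch.randn(1, 256, 7, 7)))   # previous-frame box features
curr_desc = attend(encoder(torch.randn(1, 256, 7, 7)))   # candidate-box features
same_target = F.cosine_similarity(prev_desc, curr_desc)  # high score -> same target
```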
Journal: IEEE Transactions on Multimedia, vol. 21, issue 4, pp. 930-942, April 2019
Authors: Qiurui Wang, Chun Yuan, Jingdong Wang, Wenjun Zeng
Publication URL: https://ieeexplore.ieee.org/document/8457310/
Research areas: Artificial Intelligence; Computer Vision
Copyright: © IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=506147"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=506147"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=506147"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=506147"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=506147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}