{"id":775786,"date":"2021-06-30T17:02:44","date_gmt":"2021-07-01T00:02:44","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=775786"},"modified":"2021-10-19T08:06:12","modified_gmt":"2021-10-19T15:06:12","slug":"visual-recognition-beyond-appearances-and-its-robotic-applications","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/visual-recognition-beyond-appearances-and-its-robotic-applications\/","title":{"rendered":"Visual Recognition beyond Appearances, and its Robotic Applications"},"content":{"rendered":"

The goal of Computer Vision, as coined by Marr, is to develop algorithms that answer "What are Where at When" from visual appearance. The speaker, among others, recognizes the importance of studying the underlying entities and relations beyond visual appearance, following an Active Perception paradigm. This talk presents the speaker's efforts over the last decade, spanning 1) reasoning beyond appearance for visual question answering and image/video captioning tasks, and their evaluation; 2) temporal and self-supervised knowledge distillation with incremental knowledge transfer; and 3) the roles these play in a robotic visual learning framework, demonstrated through a robotic indoor object search task. The talk also features ongoing projects at the Active Perception Group (APG) in the ASU School of Computing, Informatics, and Decision Systems Engineering (CIDSE), funded by NSF (RI, NRI, and CPS), DARPA KAIROS, and Arizona IAM, that address emerging national challenges in the autonomous driving and AI security domains.

Watch the talk: https://youtu.be/RRxbNcgvPG4
View slides

List of major papers covered in the talk:

V&L model robustness
ECCV 2020: VQA-LOL: Visual Question Answering under the Lens of Logic
ACL 2021: SMURF: SeMantic and linguistic UndeRstanding Fusion for Caption Evaluation via Typicality Analysis
EMNLP 2020: MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering
EMNLP 2020: Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning

Robotic object search
CVPR 2021: Hierarchical and Partially Observable Goal-driven Policy Learning with Goals Relational Graph
ICRA 2021/RA-L: Efficient Robotic Object Search via HIEM: Hierarchical Policy Learning with Intrinsic-Extrinsic Modeling

Other teasers:

AI security/GAN attribution
ICLR 2021: Decentralized Attribution of Generative Models
AAAI 2021: Attribute-Guided Adversarial Training for Robustness to Natural Perturbations
