{"id":489125,"date":"2018-06-05T09:23:14","date_gmt":"2018-06-05T16:23:14","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=489125"},"modified":"2018-06-05T09:23:14","modified_gmt":"2018-06-05T16:23:14","slug":"netizen-style-commenting-fashion-photos-autonomous-diverse-cognitive","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/netizen-style-commenting-fashion-photos-autonomous-diverse-cognitive\/","title":{"rendered":"Netizen-style commenting for fashion photos: autonomous, diverse, and cognitive"},"content":{"rendered":"
Advances in deep neural networks have brought great progress in image captioning. However, current work remains deficient in several ways: it generates \u201cvanilla\u201d sentences that describe only the surface appearance of objects in a photo (e.g., color, type) and typically fails to produce captions with engaging information about context or intent, as a human would.<\/p>\n
Recently, Professor Winston Hsu of National Taiwan University collaborated with researchers at Microsoft Research Asia (MSRA) to address this challenge in social media commenting on user-contributed fashion photos.<\/p>\n