{"id":585388,"date":"2019-05-09T10:01:00","date_gmt":"2019-05-09T17:01:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=585388"},"modified":"2019-09-23T07:19:51","modified_gmt":"2019-09-23T14:19:51","slug":"generative-neural-visual-artist-geneva","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/generative-neural-visual-artist-geneva\/","title":{"rendered":"Generative Neural Visual Artist (GeNeVA)"},"content":{"rendered":"

\"The<\/p>\n

The Generative Neural Visual Artist (GeNeVA) task

The GeNeVA task involves a Teller giving a sequence of linguistic instructions to a Drawer, with the ultimate goal of generating an image.

The Teller gauges progress through visual feedback from the generated image. This is a challenging task because the Drawer needs to map complex linguistic instructions to realistic objects on a canvas, preserving not only object properties but also the relationships between objects (e.g., relative location). The Drawer also needs to modify the existing drawing in a manner consistent with previous images and instructions, which requires remembering those instructions. All of this demands understanding the relationships between objects in the scene and how those relationships are expressed in the image, in a way that stays consistent with every instruction given.

An example instruction sequence for the GeNeVA task is shown in the accompanying figure.
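To make the setup concrete, the sketch below (in PyTorch, not the authors' implementation) shows the iterative Drawer loop: at every turn the Drawer conditions on the instructions received so far and on the previous canvas, then emits an updated image. All module names, layer sizes, the vocabulary size, and the toy 16x16 canvas are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Drawer(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=128, hid_dim=256, img_ch=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Encodes a single instruction (a sequence of token ids).
        self.instr_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Tracks the dialogue state across turns.
        self.dialog_rnn = nn.GRUCell(hid_dim, hid_dim)
        # Encodes the previous canvas into a feature vector.
        self.img_enc = nn.Sequential(
            nn.Conv2d(img_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, hid_dim, 4, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
        )
        # Decodes dialogue state + canvas features into a new toy 16x16 canvas.
        self.decode = nn.Sequential(nn.Linear(2 * hid_dim, img_ch * 16 * 16), nn.Tanh())
        self.img_ch = img_ch

    def step(self, prev_img, instr_tokens, dialog_state):
        _, instr = self.instr_rnn(self.embed(instr_tokens))      # instr: (1, B, hid_dim)
        dialog_state = self.dialog_rnn(instr.squeeze(0), dialog_state)
        img_feat = self.img_enc(prev_img).flatten(1)              # (B, hid_dim)
        new_img = self.decode(torch.cat([dialog_state, img_feat], dim=1))
        return new_img.view(-1, self.img_ch, 16, 16), dialog_state

# One toy dialogue of three instructions (random token ids), batch size 1.
drawer = Drawer()
canvas, state = torch.zeros(1, 3, 16, 16), torch.zeros(1, 256)
for instr in [torch.randint(0, 1000, (1, 7)) for _ in range(3)]:
    canvas, state = drawer.step(canvas, instr, state)
print(canvas.shape)  # torch.Size([1, 3, 16, 16])
```

In the actual GeNeVA-GAN model the decoder is a conditional GAN generator trained adversarially, rather than the simple linear decoder used in this sketch; the turn-by-turn loop structure is the point being illustrated here.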

We introduce the GeNeVA task and a model for it, GeNeVA-GAN, in our paper Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction.



GeNeVA – Examples

Images generated by the GeNeVA-GAN model described in Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction on the CoDraw (top row) and i-CLEVR (bottom row) datasets are shown with the provided instructions below:

[Figure: example generations on CoDraw and i-CLEVR with their instructions]



GeNeVA – Datasets

To re-create the CoDraw and i-CLEVR datasets used for the GeNeVA task, download the data files and then run the dataset generation code, both available at the following links:

- Download the GeNeVA Data Files
- GeNeVA Datasets - Generation Code
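As a quick sanity check after running the generation code, a few lines of h5py can list what ended up in the generated files. The file name and the assumption that the output is HDF5 are placeholders here, not documented facts; point this at whatever the generation scripts actually write.

```python
import h5py

# "codraw_train.h5" is a hypothetical output file name; the key layout is
# whatever the generation scripts produce, so we simply enumerate it.
with h5py.File("codraw_train.h5", "r") as f:
    def show(name, obj):
        # Print every dataset with its shape and dtype.
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)
```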



GeNeVA – Pre-trained Models

The pre-trained object detector and localizer models that we used for computing the evaluation metrics can be downloaded from the following links:

- CoDraw Object Detector and Localizer
- i-CLEVR Object Detector and Localizer
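For orientation, here is a simplified sketch of how such detector-based metrics are typically computed: the pre-trained detector predicts the set of objects present in each generated image, and that set is scored against the ground-truth scene with precision, recall, and F1. This is not the released evaluation code, and the paper's metrics additionally assess relative object positions using the localizer.

```python
from collections import Counter

def detection_f1(predicted_labels, gold_labels):
    """Multiset precision / recall / F1 between detected and ground-truth object labels."""
    pred, gold = Counter(predicted_labels), Counter(gold_labels)
    overlap = sum((pred & gold).values())            # objects detected with the correct label
    precision = overlap / max(sum(pred.values()), 1)
    recall = overlap / max(sum(gold.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-8)
    return precision, recall, f1

# Example: two of three detected objects match the ground-truth scene.
print(detection_f1(["tree", "sun", "boy"], ["tree", "sun", "girl"]))
```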



GeNeVA – Training and Evaluation Code

Code to train and evaluate the GeNeVA-GAN model for the GeNeVA task can be obtained from the following link:

- GeNeVA - Training and Evaluation Code



Reference

If you use the GeNeVA task, code, or datasets as part of any published research, please cite the following paper:

Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, and Graham W. Taylor. "Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction". Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2019.

```bibtex
@InProceedings{El-Nouby_2019_ICCV,
    author    = {El{-}Nouby, Alaaeldin and
                 Sharma, Shikhar and
                 Schulz, Hannes and
                 Hjelm, Devon and
                 El Asri, Layla and
                 Ebrahimi Kahou, Samira and
                 Bengio, Yoshua and
                 Taylor, Graham W.},
    title     = {Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction},
    booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
    month     = {Oct},
    year      = {2019}
}
```
\n","protected":false},"excerpt":{"rendered":"

The Generative Neural Visual Artist (GeNeVA) task The GeNeVA task involves a Teller giving a sequence of linguistic instructions to a Drawer for the ultimate goal of image generation. The Teller is able to gauge progress through visual feedback of the generated image. This is a challenging task because the Drawer needs to learn how […]<\/p>\n","protected":false},"featured_media":587743,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"research-area":[13556,13562,13554],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-585388","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-research-area-computer-vision","msr-research-area-human-computer-interaction","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[585919,556554],"related-downloads":[586126,610173],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Shikhar Sharma","user_id":36557,"people_section":"Section name 1","alias":"shsh"},{"type":"user_nicename","display_name":"Hannes Schulz","user_id":37188,"people_section":"Section name 1","alias":"haschulz"},{"type":"guest","display_name":"Alaaeldin El-Nouby","user_id":585409,"people_section":"Section name 1","alias":""},{"type":"guest","display_name":"Layla El Asri","user_id":604137,"people_section":"Section name 1","alias":""},{"type":"guest","display_name":"Samira Ebrahimi Kahou","user_id":585412,"people_section":"Section name 1","alias":""},{"type":"guest","display_name":"Yoshua Bengio","user_id":585415,"people_section":"Section name 1","alias":""},{"type":"guest","display_name":"Graham Taylor","user_id":585418,"people_section":"Section name 
1","alias":""}],"msr_research_lab":[437514],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/585388"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":60,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/585388\/revisions"}],"predecessor-version":[{"id":610155,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/585388\/revisions\/610155"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/587743"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=585388"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=585388"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=585388"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=585388"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=585388"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}