FigureQA: an annotated figure dataset for visual reasoning

November 17, 2017

Almost every scientific publication is accompanied by data visualizations in the form of graphs and charts. Figures are an intuitive aid for understanding the content of documents, so naturally, it is useful to leverage this visual information for machine reading comprehension.
To enable research in this domain we built FigureQA, a new dataset composed of figure images – like bar graphs, line plots, and pie charts – and question-and-answer pairs about them. We introduce this dataset and the task of answering questions about figures to test the limits of existing visual reasoning algorithms, as well as to encourage the development of new models that can understand qualities and relations that are intuitive for humans.

Figure 1: A sample line plot with some questions and answers taken from FigureQA

Motivation

FigureQA and the task it introduces align with the research domains of visual question answering and reasoning. Visual question answering (VQA) involves studying an image and a question to produce an answer; this requires a joint understanding of images and language to achieve visual comprehension. Our research and dataset focus on relational reasoning, which aims at discovering relationships between abstract properties of elements in an image.

A number of visual question answering and reasoning datasets have been published before FigureQA. Some datasets, like the one introduced with the VQA challenge [1,3], contain photographs or artificial scenes mimicking the real world. These images are accompanied by open-ended, yes-no, or multiple-choice questions collected from humans. The high variety in these datasets makes them useful for general visual question answering, but not suited for sophisticated reasoning. Complex scenes require a large amount of common sense knowledge about the world.
Acquiring such common sense knowledge does not seem possible given only a VQA dataset.

For models to focus more on reasoning, it is necessary to train on datasets with more restricted and specialized properties. A number of published datasets, including NLVR [10] and CLEVR [5], achieve this by generating the sample images with a program rather than annotating natural images. These datasets consist of images with simple geometric elements set in a basic scene, as well as questions that compare these elements based on properties like size, count, and spatial relationships. While such datasets are much better for testing the reasoning abilities of models, they are composed of toy scenes that do not capture the complexity of the real world.

We also note that FigureQA is not the first figure dataset to be published. Existing figure datasets consist of images scraped from research papers with human-annotated plot types, bounding boxes, or values [8]; similar synthetically generated datasets also exist [2]. These datasets are suitable for the tasks of plot classification, extracting plot values, and optical character recognition (OCR), but not for visual reasoning, due to their lack of questions and answers.

FigureQA was created to develop deep neural network models that are capable of visual reasoning and can also be applied to a domain of real-world data: graphical figures.
Dataset description

Our dataset is generated entirely by a program and has 180,000 images with over two million questions. Answers are provided for all questions except those in the test set, which is reserved for evaluation.

We selected five types of figures commonly found in analytical reports and documents for our dataset, namely vertical and horizontal bar graphs, line and dotted-line plots, and pie charts. The figures were generated randomly using pre-defined distributions and constraints to ensure a large amount of variety, while making sure they appeared realistic.

Figure 2: Examples of each figure type

Fifteen types of questions concerning the figures were chosen for the dataset, addressing quantitative relational properties of the plot elements. These questions are generated from pre-defined templates and all have yes-or-no answers. The answers are computed by evaluating the questions on the source data used to synthesize each figure.

The questions in FigureQA ask about one-versus-all characteristics of the figures, like which plot elements are or contain the maximum, minimum, or median values. We also generate questions that compare two plot elements, for example whether one pie slice is larger than another, or whether two curves intersect.
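To illustrate the idea, here is a minimal sketch of template-based question generation: yes/no questions are filled from templates and answered by evaluating them on the source data behind the figure, then answers are balanced per template. The template wording, function names, and element labels below are assumptions for illustration, not the actual FigureQA generator.

```python
import random

def generate_questions(series, rng=None):
    """Generate (question, answer) pairs from a figure's source data.

    series: dict mapping a plot element's label (e.g. a bar's color name)
    to its numeric value. Hypothetical sketch, not the real generator.
    """
    rng = rng or random.Random(0)
    questions = []
    # One-versus-all template: "Is X the maximum?", answered on the data.
    max_name = max(series, key=series.get)
    for name in series:
        answer = "yes" if name == max_name else "no"
        questions.append((f"Is {name} the maximum?", answer))
    # Pairwise comparison template: "Is X greater than Y?"
    x, y = rng.sample(sorted(series), 2)
    questions.append((f"Is {x} greater than {y}?",
                      "yes" if series[x] > series[y] else "no"))
    return questions

def balance(questions, rng=None):
    """Downsample the majority answer so yes/no counts are equal."""
    rng = rng or random.Random(0)
    yes = [qa for qa in questions if qa[1] == "yes"]
    no = [qa for qa in questions if qa[1] == "no"]
    n = min(len(yes), len(no))
    return rng.sample(yes, n) + rng.sample(no, n)

qas = balance(generate_questions({"Royal Blue": 42.0, "Dark Cyan": 17.5, "Tomato": 29.3}))
```

Evaluating each template against the numeric source data is what makes large-scale generation possible: no human annotation is needed, and the answer distribution can be rebalanced exactly.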
As an extra challenge for line plots, we also formulate questions regarding the smoothness of curves and the area contained under them.

Table 1: All question types, where X and Y are plot elements.

One problem with existing visual question answering datasets is that they may have a skewed distribution of answers for a specific question or visual feature. It has been observed that models will identify this bias and exploit it, even to the point where they ignore the image completely [3]. We took great care to de-bias our dataset by balancing the number of yes and no answers per question type, using many color combinations, and generating our source data randomly to ensure a high degree of visual variation.

In addition to images, questions, and answers, for each figure we provide its source data and the bounding boxes for all the elements it contains. This allows the dataset to be extended to other types of questions and answers, and provides visual features that are useful for developing models.

Figure 3: Dotted line plot with bounding boxes visualized

Experimental results

We developed FigureQA to be a challenging visual reasoning dataset, and this has been confirmed by the performance of our baseline models. We tested some common and advanced neural network models on our dataset, as well as the ability of people to answer the questions posed.

Table 2: Overall baseline performance on the test set

Establishing a baseline for human performance on FigureQA was essential to benchmark our models. We tested our editorial team at the Montreal lab on a portion of the FigureQA test set and observed that humans performed exceptionally well on most figure and question types, though not perfectly, reaching an overall accuracy of 91.21%. Line plots in general, median questions, and questions concerning roughness and smoothness were the most challenging.

The first models we evaluated on our dataset were conventional deep neural network architectures, without specialized neural units incorporated into them. We used a long short-term memory (LSTM) network [4] to establish a baseline on question text only. We also combined the encoding from our text-only baseline with the representation produced by a convolutional neural network (CNN) [6] on the figure image, as input to a multilayer perceptron (MLP) baseline. Since it is common in computer vision to leverage visual features from pre-trained CNNs, we also provide a CNN baseline that uses VGG-16 features [9] instead of raw pixels as input. Our results indicate that these models do not perform much better than a coin flip.

We also tested a recent advanced neural architecture called a relation network [7] on FigureQA. This model contains modules that are specialized for reasoning about the relationships between elements of an input image. Relation networks have achieved exceptional performance on tasks like visual reasoning on the CLEVR dataset, which makes this model suitable for evaluation purposes. The relation network did better than our simpler baselines, with 61.54% accuracy overall, but still performed significantly worse than our human baseline.

Figure 4: Relation network architecture

Impact

The thirty-percent or greater gap in overall accuracy between our human benchmark and the neural models for visual reasoning we tested indicates that FigureQA is a challenging dataset for this task. Our results show that the dataset is well suited for developing more powerful visual question answering and reasoning models, creating opportunities for research in this field.

FigureQA also points to another potential application of AI and the ways it can impact our work and lives.
One could imagine how a tool that assesses graphs and charts could automate decision-making processes or empower analysts.

We look forward to the research advances, models, and applications that come from our FigureQA dataset.