FigureQA Dataset

Highlights

100,000
Figure images in the training set

1,327,368
Question-answer pairs in the training set

100
Unique colors and possible names for figure plot elements

15
Question types for quantitative attributes

Details

Dataset Split   # Images   # Questions   Has Answers & Annotations?   Color Scheme
Train           100,000    1,327,368     Yes                          Scheme 1
Validation 1    20,000     265,106       Yes                          Scheme 1
Validation 2    20,000     265,798       Yes                          Scheme 2
Test 1          20,000     265,024       No                           Scheme 1
Test 2          20,000     265,402       No                           Scheme 2
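As a quick sanity check on the split sizes above, the following minimal sketch computes the average number of questions per image for each split. The figures are copied from the table; the names `SPLITS` and `questions_per_image` are illustrative and not part of any FigureQA tooling.

```python
# Split sizes copied from the table above: (number of images, number of questions).
# SPLITS and questions_per_image are hypothetical names used only for this sketch.
SPLITS = {
    "Train": (100_000, 1_327_368),
    "Validation 1": (20_000, 265_106),
    "Validation 2": (20_000, 265_798),
    "Test 1": (20_000, 265_024),
    "Test 2": (20_000, 265_402),
}


def questions_per_image(split: str) -> float:
    """Return the average number of questions per figure image in a split."""
    images, questions = SPLITS[split]
    return questions / images


if __name__ == "__main__":
    for name in SPLITS:
        print(f"{name}: {questions_per_image(name):.2f} questions per image")
```

Every split works out to a little over 13 questions per image.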

Unique Features

Additionally, the following features make FigureQA a distinct visual question-answering (VQA) and reasoning dataset:

  • It is entirely synthetically generated. Any number of samples can be generated in a configurable and extensible manner.
  • Each figure image is accompanied by the source data used to create it. This data can be used as input features or a learning target, and can be used to formulate questions and answers.
  • Rich bounding box annotations for all plot elements are extracted automatically and included with each generated figure image.

Figure Color Schemes

To color and identify plot elements, 100 colors were selected from the X11 named color set. Colors were chosen to have a large color distance from white, the background color, and some color names were modified to enhance readability.

To evaluate models on unseen combinations of figure type and color, we provide validation and test sets in two color schemes built from two disjoint color sets that alternate between figure types. In the training color scheme (Scheme 1), each figure type is colored with one of the two sets; in the alternate scheme (Scheme 2), the same figure type is colored with the other set. This ensures that all colors are seen during training, and the approach is consistent with the one used in the CLEVR dataset.

For example:

Scheme 1

  • Vertical bar graphs, line charts, and pie charts are colored using 50 unique colors in set A, including crimson, seafoam, and royal blue.
  • Horizontal bar graphs and dot line charts are colored using 50 unique colors in set B, including light coral, sienna, and web purple.

Scheme 2

  • Vertical bar graphs, line charts, and pie charts are colored using 50 unique colors in set B, including light coral, sienna, and web purple.
  • Horizontal bar graphs and dot line charts are colored using 50 unique colors in set A, including crimson, seafoam, and royal blue.
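The alternating assignment described above can be sketched as a small lookup. This is a minimal illustration under stated assumptions, not the dataset's actual generation code: the identifiers (`GROUP_1`, `GROUP_2`, `color_set_for`) and the figure-type strings are hypothetical, and each color list holds only the three example colors named in the text rather than the full 50.

```python
# Hypothetical figure-type identifiers; the real dataset may name them differently.
GROUP_1 = ("vertical_bar", "line", "pie")
GROUP_2 = ("horizontal_bar", "dot_line")

# Only the example colors named in the text; each real set has 50 colors.
COLOR_SETS = {
    "A": ["crimson", "seafoam", "royal blue"],
    "B": ["light coral", "sienna", "web purple"],
}


def color_set_for(figure_type: str, scheme: int) -> str:
    """Return the color set label ("A" or "B") for a figure type under a scheme."""
    if scheme not in (1, 2):
        raise ValueError(f"unknown color scheme: {scheme}")
    if figure_type in GROUP_1:
        in_group_1 = True
    elif figure_type in GROUP_2:
        in_group_1 = False
    else:
        raise ValueError(f"unknown figure type: {figure_type}")
    # Scheme 1 pairs group 1 with set A; Scheme 2 swaps the pairing.
    return "A" if in_group_1 == (scheme == 1) else "B"


if __name__ == "__main__":
    for scheme in (1, 2):
        for figure_type in GROUP_1 + GROUP_2:
            label = color_set_for(figure_type, scheme)
            print(f"Scheme {scheme}, {figure_type}: set {label} {COLOR_SETS[label]}")
```

Because the two schemes simply swap the sets between the two groups of figure types, a model trained on Scheme 1 has seen every color, but never on the figure types it is paired with in Scheme 2.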

People

Samira Ebrahimi Kahou

Postdoctoral Researcher

McGill University, Mila

Vincent Michalski

Research Intern

MILA

Adam Atkinson

Software Developer

Akos Kadar

Research Intern

Yoshua Bengio

Founder and Scientific Director

Mila – Quebec AI Institute

Mahmoud Adada

Principal Engineering Manager

Rahul Mehrotra

Senior Program Manager