GLGE: A New General Language Generation Evaluation Benchmark

  • Dayiheng Liu ,
  • Yeyun Gong ,
  • Weizhen Qi ,
  • Hang Zhang ,
  • Jian Jiao ,
  • Wei Chen ,
  • Jie Fu ,
  • Linjun Shou ,
  • Pengcheng Wang ,
  • Jiusheng Chen ,
  • Daxin Jiang (姜大昕) ,
  • Jiancheng Lv ,
  • Ruofei Zhang ,
  • Winnie Wu ,
  • Ming Zhou ,
  • Nan Duan

ACL-IJCNLP 2021

Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress in pretraining and transfer learning in Natural Language Processing (NLP). These benchmarks mostly focus on a range of Natural Language Understanding (NLU) tasks, without considering Natural Language Generation (NLG) models. In this paper, we present the General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks. For each task, we design three subtasks in terms of task difficulty (GLGE-Easy, GLGE-Medium, and GLGE-Hard), yielding 24 subtasks for comprehensively comparing model performance. To encourage research on pretraining and transfer learning for NLG models, we make GLGE publicly available and build a leaderboard with strong baselines including MASS, BART, and ProphetNet (the source code and dataset will be publicly available at this https URL).
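The subtask count follows directly from the benchmark's structure: eight base generation tasks, each split into three difficulty tiers, gives 24 subtasks. A minimal Python sketch of that grid is below; the task identifiers are placeholders rather than the benchmark's official dataset names, which are listed in the GLGE repository.

```python
# Sketch of GLGE's 8-task x 3-difficulty grid (24 subtasks in total).
# Task names here are hypothetical placeholders, not official GLGE identifiers.
from itertools import product

TASKS = [f"task_{i}" for i in range(1, 9)]        # 8 base language generation tasks
DIFFICULTIES = ["easy", "medium", "hard"]         # GLGE-Easy / GLGE-Medium / GLGE-Hard

subtasks = [f"{task}-{level}" for task, level in product(TASKS, DIFFICULTIES)]
assert len(subtasks) == 24                        # 8 tasks x 3 tiers = 24 subtasks
```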

Publication Downloads

GLGE

May 13, 2021

The General Language Generation Evaluation (GLGE) benchmark is a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks.