Document AI: Benchmarks, Models and Applications (Presentation@ICDAR 2021)
Document AI, or Document Intelligence, is a relatively new research area concerned with techniques for automatically reading, understanding, and analyzing business documents. Understanding business documents is very challenging because of the diversity of layouts and formats, the poor quality of scanned document images, and the complexity of template structures. In this presentation, I will introduce Document AI from three perspectives: benchmarks, models, and applications. Starting in 2019, we released two benchmark datasets: TableBank, for table detection and recognition, and DocBank, for page object detection in documents. We will soon release two new benchmark datasets: ReadingBank, for the reading order detection task, and XFUN, for the multilingual form understanding task, which contains forms in 8 languages. Furthermore, I will introduce LayoutLM, a multimodal pre-training framework for Document AI, together with its successor LayoutLMv2 and its multilingual version LayoutXLM, which have been widely adopted in first- and third-party applications. Finally, I will demonstrate how to apply the LayoutLM/LayoutXLM model family to a wide range of Document AI applications, including table detection, page object detection, reading order detection, form/receipt/invoice understanding, complex document understanding, document image classification, and document VQA, achieving state-of-the-art performance across these benchmarks.