{"id":773401,"date":"2021-09-09T23:06:39","date_gmt":"2021-09-10T06:06:39","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=773401"},"modified":"2021-09-09T23:06:39","modified_gmt":"2021-09-10T06:06:39","slug":"document-ai-benchmarks-models-and-applications-presentationicdar-2021","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/document-ai-benchmarks-models-and-applications-presentationicdar-2021\/","title":{"rendered":"Document AI: Benchmarks, Models and Applications (Presentation@ICDAR 2021)"},"content":{"rendered":"

Document AI, or Document Intelligence, is a relatively new research topic that refers to techniques for automatically reading, understanding, and analyzing business documents. Understanding business documents is a very challenging task due to the diversity of layouts and formats, the poor quality of scanned document images, and the complexity of template structures. In this presentation, I will introduce Document AI from three perspectives: benchmarks, models, and applications. Since 2019, we have released two benchmark datasets, TableBank and DocBank, which are used for table detection and recognition and for page object detection in documents. We are also releasing two new benchmark datasets: ReadingBank for the reading order detection task, and XFUN for the multi-lingual form understanding task, which contains forms in 8 languages. Furthermore, I will introduce the multi-modal pre-training framework LayoutLM for Document AI, together with the latest LayoutLMv2 and the multi-lingual version LayoutXLM, which have been widely adopted by 1st<\/sup> and 3rd<\/sup> party applications. Finally, I will demonstrate how to apply the LayoutLM\/LayoutXLM model family to a wide range of Document AI applications, including table detection, page object detection, reading order detection, form\/receipt\/invoice understanding, complex document understanding, document image classification, and document VQA, while achieving state-of-the-art performance on the corresponding benchmarks.<\/p>\n","protected":false},"excerpt":{"rendered":"

Document AI, or Document Intelligence, is a relatively new research topic that refers to techniques for automatically reading, understanding, and analyzing business documents. Understanding business documents is a very challenging task due to the diversity of layouts and formats, poor quality of scanned document images as well as the complexity of template structures.\u00a0 In this […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13556,13545],"msr-publication-type":[193722],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-773401","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2021-9","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"DIL workshop in ICDAR 
2021","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":"","msr_related_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/09\/Document_AI-Lei-Cui.pdf","id":"773404","title":"document_ai-lei-cui","label_id":"243118","label":0}],"msr_attachments":[{"id":773404,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2021\/09\/Document_AI-Lei-Cui.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"Lei Cui","user_id":32631,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Lei Cui"}],"msr_impact_theme":[],"msr_research_lab":[199560],"msr_event":[],"msr_group":[144735],"msr_project":[640743],"publication":[],"video":[],"download":[],"msr_publication_type":"manual","related_content":{"projects":[{"ID":640743,"post_title":"Document AI (Intelligent Document Processing)","post_name":"document-ai","post_type":"msr-project","post_date":"2021-08-10 04:05:12","post_modified":"2024-08-11 21:01:24","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/document-ai\/","post_excerpt":"Document AI (opens in new tab), or Document Intelligence, is a new research topic that refers to techniques for automatically reading, understanding, and analyzing business documents. Understanding business documents is an incredibly challenging task due to the diversity of layouts and formats, inferior quality of scanned document images as well as the complexity of template structures. 
Starting in 2019, we released two benchmark datasets TableBank (opens in new tab) and DocBank (opens in new tab),…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/640743"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/773401"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/773401\/revisions"}],"predecessor-version":[{"id":773410,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/773401\/revisions\/773410"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=773401"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=773401"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=773401"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=773401"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=773401"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=773401"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=773401"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/rese
arch\/wp-json\/wp\/v2\/msr-platform?post=773401"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=773401"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=773401"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=773401"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=773401"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=773401"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=773401"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=773401"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=773401"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}