{"id":733513,"date":"2021-04-20T21:08:04","date_gmt":"2021-04-21T04:08:04","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=733513"},"modified":"2022-07-12T23:43:28","modified_gmt":"2022-07-13T06:43:28","slug":"ocr-and-document-understanding","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/ocr-and-document-understanding\/","title":{"rendered":"OCR and Document Understanding"},"content":{"rendered":"

We have been developing state-of-the-art (SOTA) technologies and industry-leading product solutions for the following scenarios: (1) Universal OCR to detect and recognize any text in an image\/PDF; (2) Universal math OCR to detect and recognize any math expression in an image\/PDF; (3) Universal table understanding to detect, recognize, and understand any table in an image\/PDF; (4) Universal layout analysis to detect page objects such as text blocks, lists, tables, math equations, and figures in any image\/PDF, identify their relationships, and determine the reading order of body text; (5) Universal information extraction to extract entities, key\/value pairs, item lists, and other intended information from any image\/PDF document; (6) Synthetic data generation for the above scenarios to reduce cost, improve accuracy, and increase the speed of innovation.<\/p>\n

Related links:<\/p>\n

\n