{"id":171092,"date":"2013-02-09T02:53:21","date_gmt":"2013-02-09T02:53:21","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/project\/structured-data-search\/"},"modified":"2019-08-19T18:23:22","modified_gmt":"2019-08-20T01:23:22","slug":"structured-data-search","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/structured-data-search\/","title":{"rendered":"Web Data Extraction and Search"},"content":{"rendered":"
The goal of this project is to extract structured data from the web (such as HTML tables, lists, and spreadsheets) and make it accessible\/searchable on\u00a0Bing and Office 365.<\/p>\n
Some of the technical challenges:<\/p>\n
Our web data research has had a tremendous impact on several Microsoft products and services over the years:<\/p>\n
Past interns: Mohamed Yakout, Chi Wang, Meihui Zhang, Mohan Yang<\/p>\n","protected":false},"excerpt":{"rendered":"
The goal of this project is to extract structured data from the web (such as HTML tables, lists, and spreadsheets) and make it accessible\/searchable on\u00a0Bing and Office 365. Some of the technical challenges: Table classification and understanding: The vast majority of HTML tables are used for formatting\/layout purposes; they do not contain any useful content. […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13563,13555],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-171092","msr-project","type-msr-project","status-publish","hentry","msr-research-area-data-platform-analytics","msr-research-area-search-information-retrieval","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2013-02-09","related-publications":[162425,164287,167035,357899],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Kaushik Chakrabarti","user_id":32503,"people_section":"Group 1","alias":"kaushik"},{"type":"user_nicename","display_name":"Surajit Chaudhuri","user_id":33764,"people_section":"Group 
1","alias":"surajitc"}],"msr_research_lab":[199565],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171092"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171092\/revisions"}],"predecessor-version":[{"id":392360,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171092\/revisions\/392360"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=171092"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=171092"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=171092"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=171092"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=171092"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}