{"id":382940,"date":"2017-04-18T00:00:59","date_gmt":"2017-04-18T07:00:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=382940"},"modified":"2022-01-04T07:43:08","modified_gmt":"2022-01-04T15:43:08","slug":"combining-algorithms-humans-large-scale-data-integration","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/combining-algorithms-humans-large-scale-data-integration\/","title":{"rendered":"Combining Algorithms and Humans for Large-Scale Data Integration"},"content":{"rendered":"
Modern enterprises collect data from their operations and the web, and strongly depend on the collected data to make important decisions. To analyze the collected data, enterprises need to first perform data integration, i.e., combine the data from the multiple sources to create a unified set. Data integration involves some tasks that are still very hard for computer algorithms, such as tasks involving images, video, natural language, or understanding data semantics. Since humans may be more accurate at such tasks, crowdsourcing has been proposed and applied by large companies and research organizations in recent years. In crowdsourcing, humans complement computer algorithms by completing small tasks, like classifying a forum comment as offensive or ironic. Crowdsourcing drastically improves the accuracy of the outcome compared to using only computer algorithms; however, it does not scale, due to the large amount of time (and monetary compensation) that humans require. In this talk, I will discuss how to make crowdsourcing scalable for data integration.<\/p>\n","protected":false},"excerpt":{"rendered":"
Modern enterprises collect data from their operations and the web, and strongly depend on the collected data to make important decisions. To analyze the collected data, enterprises need to first perform data integration, i.e., combine the data from the multiple sources to create a unified set. Data integration involves some tasks that are still very […]<\/p>\n","protected":false},"featured_media":382967,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13563],"msr-video-type":[242718,206954],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-382940","msr-video","type-msr-video","status-publish","has-post-thumbnail","hentry","msr-research-area-data-platform-analytics","msr-video-type-aifactory","msr-video-type-microsoft-research-talks","msr-locale-en_us"],"msr_download_urls":"","msr_external_url":"https:\/\/youtu.be\/XHc7aLBJWdU","msr_secondary_video_url":"","msr_video_file":"","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/382940"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-video"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/382940\/revisions"}],"predecessor-version":[{"id":470571,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/382940\/revisions\/470571"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/382967"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=382940"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\
/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=382940"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=382940"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=382940"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=382940"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=382940"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=382940"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}