{"id":372359,"date":"2017-03-20T11:14:55","date_gmt":"2017-03-20T18:14:55","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=372359"},"modified":"2023-11-16T17:01:02","modified_gmt":"2023-11-17T01:01:02","slug":"synthesizing-mapping-relationships-using-table-corpus","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/synthesizing-mapping-relationships-using-table-corpus\/","title":{"rendered":"Auto-Map: Synthesizing Mapping Relationships Using Table Corpus"},"content":{"rendered":"

Mapping relationships, such as (country, country-code) or (company, stock-ticker), are valuable data assets for an array of applications in data cleaning and data integration like auto-correction and auto-join. However, today there are no good repositories of mapping tables that can enable these intelligent applications. Given a corpus of tables such as web tables or spreadsheet tables, we observe that values of these mappings often exist in pairs of columns in same tables. Motivated by their broad applicability, we study the problem of synthesizing mapping relationships using a large table corpus. Our synthesis process leverages compatibility of tables based on co-occurrence statistics, as well as constraints such as functional dependency. Experiment results using web tables and enterprise spreadsheets suggest that the proposed approach can produce high quality mapping relationships.<\/p>\n

Our benchmark data has been made available on GitHub https:\/\/github.com\/Yeye-He\/Mapping-Synthesis (opens in new tab)<\/span><\/a> to facilitate future research.<\/p>\n","protected":false},"excerpt":{"rendered":"

Mapping relationships, such as (country, country-code) or (company, stock-ticker), are valuable data assets for an array of applications in data cleaning and data integration like auto-correction and auto-join. However, today there are no good repositories of mapping tables that can enable these intelligent applications. Given a corpus of tables such as web tables or spreadsheet […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13563],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-372359","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-data-platform-analytics","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2017-5-14","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"373274","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/03\/mapping-synthesis.pdf","id":"373274","title":"mapping-synthesis","label_id":"243109","label":0},{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/03\/benchmark-web-1.zip","id":"373307","title":"benchmark-web","label_id":"243109","label":0},{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/03\/mapping-synthesis-full.pdf","id":"373304","title":"mapping-synthesis-full","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[{"id":373307,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/03\/benchmark-web-1.zip"},{"id":373304,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/03\/mapping-synthesis-full.pdf"},{"id":373274,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/03\/mapping-synthesis.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"Yue Wang","user_id":36966,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Yue Wang"},{"type":"user_nicename","value":"Yeye He","user_id":34992,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Yeye He"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[957177],"msr_project":[967218,169514,169513],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":967218,"post_title":"Self-service Data Preparation","post_name":"self-service-data-preparation","post_type":"msr-project","post_date":"2023-11-08 14:36:00","post_modified":"2023-11-18 10:15:39","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/self-service-data-preparation\/","post_excerpt":"It is often cited that data scientists spend a significant portion of their time (up to 80%), cleaning and preparing data. For less-technical users, who may be less proficient in writing code (e.g., in Excel, Power-BI and Tableau), the tasks of preparing and cleaning data are not just time-consuming, but also technically challenging. In the \"Self-service Data Preparation\" project, our goal is to develop technologies that can automate common data-preparation tasks, in the context of…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/967218"}]}},{"ID":169514,"post_title":"Data Exploration","post_name":"data-exploration","post_type":"msr-project","post_date":"2004-06-08 15:56:40","post_modified":"2017-06-06 10:57:58","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/data-exploration\/","post_excerpt":"This is a project area rather than a specific project. This project area focuses on novel ways to query, browse, extract, explore, mine and manage various kinds of data residing within the enterprise and on the web: structured data in relational databases, tabular data embedded in web pages, enterprise documents and spreadsheets as well as unstructured data in query logs, text documents and social media. Our research is relevant to both enterprise and consumer scenarios…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/169514"}]}},{"ID":169513,"post_title":"Data Cleaning","post_name":"data-cleaning","post_type":"msr-project","post_date":"2002-07-01 16:21:12","post_modified":"2017-06-06 10:55:49","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/data-cleaning\/","post_excerpt":"Poor data quality is a well-known problem in data warehouses that arises for a variety of reasons such as data entry errors and differences in data representation among data sources. For example, one source may use abbreviated state names while another source may use fully expanded state names. However, high quality data is essential for accurate data analysis. Data cleaning is the process of detecting and correcting errors and inconsistencies in data. Goal Typical data…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/169513"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/372359"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":4,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/372359\/revisions"}],"predecessor-version":[{"id":756925,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/372359\/revisions\/756925"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=372359"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=372359"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=372359"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=372359"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=372359"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=372359"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=372359"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=372359"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=372359"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=372359"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=372359"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=372359"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=372359"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=372359"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=372359"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=372359"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}