{"id":698740,"date":"2020-10-16T12:46:17","date_gmt":"2020-10-16T19:46:17","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=698740"},"modified":"2023-11-16T16:57:27","modified_gmt":"2023-11-17T00:57:27","slug":"auto-transform-learning-to-transform-by-patterns","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/auto-transform-learning-to-transform-by-patterns\/","title":{"rendered":"Auto-Transform: Learning-to-Transform by Patterns"},"content":{"rendered":"
Data Transformation is a long-standing problem in data management. Recent work adopts a “transform-by-example” (TBE) paradigm to infer transformation programs based on user-provided input\/output examples, which greatly improves usability, and brought such features into mainstream software like Microsoft Excel, Power BI, and Trifacta.<\/p>\n
While TBE is great progress, the need for users to provide paired input\/output examples still poses limits on its applicability. In this work, we study an alternative that transforms data based on input\/output data patterns<\/i> only (without paired examples). We term this new paradigm transform-by-patterns<\/i> (TBP). Specifically, we demonstrate that there is a rich class of transformations in TBP that can be “learned” from large collections of paired table columns. We show the proposed method can harvest such transformations across diverse domains and corpora (e.g., in different languages such as English, Chinese, Spanish, etc.). TBP transformations so obtained can be used in scenarios such as suggesting data-repairs in tables, or automating transformations in ETL pipelines. Extensive experiments on real data suggest that TBP outperforms existing methods on tasks such as data repairs, and is a promising direction for future research.<\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"
Data Transformation is a long-standing problem in data management. Recent work adopts a “transform-by-example” (TBE) paradigm to infer transformation programs based on user-provided input\/output examples, which greatly improves usability, and brought such features into mainstream software like Microsoft Excel, Power BI, and Trifacta. While TBE is great progress, the need for users to provide paired […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13563],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-698740","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-data-platform-analytics","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2020-8-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2020\/10\/Auto-Transform.pdf","id":"698749","title":"auto-transform","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[{"id":698749,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2020\/10\/Auto-Transform.pdf"},{"id":698743,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2020\/10\/p2368-he.pdf"}],"msr-author-ordering":[{"type":"text","value":"Zhongjun Jin","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Yeye He","user_id":34992,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Yeye He"},{"type":"user_nicename","value":"Surajit Chaudhuri","user_id":33764,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Surajit Chaudhuri"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[957177],"msr_project":[967218],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":967218,"post_title":"Self-service Data Preparation","post_name":"self-service-data-preparation","post_type":"msr-project","post_date":"2023-11-08 14:36:00","post_modified":"2023-11-18 10:15:39","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/self-service-data-preparation\/","post_excerpt":"It is often cited that data scientists spend a significant portion of their time (up to 80%), cleaning and preparing data. For less-technical users, who may be less proficient in writing code (e.g., in Excel, Power-BI and Tableau), the tasks of preparing and cleaning data are not just time-consuming, but also technically challenging. In the \"Self-service Data Preparation\" project, our goal is to develop technologies that can automate common data-preparation tasks, in the context of…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/967218"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/698740"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/698740\/revisions"}],"predecessor-version":[{"id":710746,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/698740\/revisions\/710746"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=698740"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=698740"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=698740"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=698740"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=698740"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=698740"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=698740"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=698740"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=698740"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=698740"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=698740"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=698740"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=698740"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=698740"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=698740"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=698740"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}