Data Wrangling using Programming by Examples
Invited talk at ECOOP.
Sumit Gulwani founded the PROSE research and engineering team at Microsoft, which develops programming-by-example (PBE) APIs and ships them through multiple Microsoft products. PBE is a new frontier in AI in which the computer programs itself: the user provides input-output examples, and the computer synthesizes the intended script. This matters because 99% of computer users do not know how to program. Even for programmers, PBE can provide a 10-100x productivity increase in many task domains.
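The examples-in, script-out loop can be illustrated with a toy synthesizer. This is a minimal sketch under stated assumptions: it uses a tiny hypothetical DSL in which a program splits a string on a delimiter, picks one token, and optionally changes its case, and it simply enumerates every program in that DSL until one agrees with all the user's examples. It is plain brute-force enumeration for illustration, not PROSE's language, API, or search algorithm.

```python
from itertools import product
from typing import Optional

# Toy DSL (a hypothetical assumption, not PROSE's language): a program is a
# (delimiter, token_index, case) triple meaning "split the input on the
# delimiter, take the token at that index, then apply the case transform".
DELIMITERS = [" ", ",", "-", "@", ".", "/"]
INDICES = [0, 1, 2, -1]
CASES = {
    "keep": lambda s: s,
    "upper": str.upper,
    "lower": str.lower,
    "title": str.title,
}

Program = tuple[str, int, str]


def run(program: Program, text: str) -> Optional[str]:
    """Execute one program on one input; None means it does not apply."""
    delim, index, case = program
    tokens = text.split(delim)
    if len(tokens) < 2 or not -len(tokens) <= index < len(tokens):
        return None  # delimiter absent or index out of range
    return CASES[case](tokens[index].strip())


def synthesize(examples: list[tuple[str, str]]) -> Optional[Program]:
    """Enumerate the DSL and return the first program matching every example."""
    for program in product(DELIMITERS, INDICES, CASES):
        if all(run(program, inp) == out for inp, out in examples):
            return program
    return None


examples = [("Sumit Gulwani", "GULWANI"), ("Ada Lovelace", "LOVELACE")]
program = synthesize(examples)
print(program)                      # → (' ', 1, 'upper')
print(run(program, "Alan Turing"))  # → TURING
```

Even in this tiny DSL, examples can be ambiguous: with only the first example, both token 1 and token -1 fit. The sketch just returns whichever comes first in enumeration order, whereas a practical PBE system needs a ranking step to choose the program the user most likely intended.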
A killer application of PBE is data cleaning and preparation, since data scientists often spend up to 80% of their time wrangling data into a form suitable for training models or drawing insights. In this video, Sumit illustrates how a data cleaning task that took Python programmers an average of 30 minutes can be performed in 30 seconds by non-programmers using the PBE paradigm. In particular, PBE can help ingest a file into tabular format, split a column to extract its constituent sub-fields, derive new columns, and suggest form entries.
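The column-splitting step can be sketched in the same spirit, assuming the user supplies one row together with the fields it should be split into. The hypothetical functions `learn_separators` and `apply_separators` below learn the literal text between those example fields and replay it on the other rows; a row in which the learned separators do not occur comes back as `None`, the kind of row for which the user might need to give another example. This is a deliberately simple illustration, not PROSE's implementation.

```python
from typing import Optional


def learn_separators(row: str, fields: list[str]) -> Optional[list[str]]:
    """Infer the literal separators between consecutive example fields.

    Assumes (for illustration) that the example fields cover the whole row, in
    order, and that the same separators recur in the other rows of the column.
    """
    if not fields or not row.startswith(fields[0]):
        return None
    separators = []
    pos = len(fields[0])
    for field in fields[1:]:
        nxt = row.find(field, pos)
        if nxt <= pos:  # field missing (-1) or directly adjacent (no separator)
            return None
        separators.append(row[pos:nxt])
        pos = nxt + len(field)
    return separators if pos == len(row) else None


def apply_separators(row: str, separators: list[str]) -> Optional[list[str]]:
    """Split a row on the learned separators, in order; None if one is missing."""
    fields = []
    rest = row
    for sep in separators:
        head, found, rest = rest.partition(sep)
        if not found:
            return None
        fields.append(head)
    fields.append(rest)
    return fields


column = ["Seattle, WA 98101", "San Francisco, CA 94103", "Boston MA 02108"]
seps = learn_separators(column[0], ["Seattle", "WA", "98101"])
print(seps)  # → [', ', ' ']
for row in column:
    print(row, "->", apply_separators(row, seps))
```

Because the first learned separator is `", "` rather than a bare space, the second row splits into `['San Francisco', 'CA', '94103']` with the two-word city kept intact, while the comma-less third row yields `None`.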