Data Wrangling using Programming by Examples

Sumit Gulwani founded the PROSE research and engineering team at Microsoft, which develops programming-by-example (PBE) APIs and ships them through multiple Microsoft products. PBE is a new frontier in AI in which the computer programs itself: the user provides input-output examples, and the computer synthesizes the intended script. This is significant because 99% of computer users do not know how to program. Even for programmers, PBE can provide a 10-100x productivity increase in many task domains.

A killer application of PBE is data cleaning and preparation, since data scientists often spend up to 80% of their time wrangling data into a form suitable for learning models or drawing insights. In this video, Sumit illustrates how a data cleaning task that took Python programmers an average of 30 minutes to finish can be performed in 30 seconds by non-programmers using the PBE paradigm. In particular, PBE can help ingest a file into tabular format, split a column to extract constituent sub-fields, derive new columns, and suggest form entries.
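
To make the idea of synthesizing a script from input-output examples concrete, here is a minimal toy sketch in Python. It is not the PROSE API or its algorithm; it is a brute-force search over a tiny, made-up program space (split on a delimiter, pick a field, optionally change case), and the names `DELIMITERS`, `INDICES`, `CASINGS`, `run`, and `synthesize` are all hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical, deliberately tiny program space: each "program" is a
# (delimiter, index, casing) triple. Real PBE systems search far richer spaces.
DELIMITERS = [",", " ", "-", "@", ".", "/"]
INDICES = [0, 1, 2, -1]
CASINGS = {"identity": lambda s: s, "lower": str.lower, "upper": str.upper}


def run(program, text):
    """Apply a (delimiter, index, casing) program to one input string."""
    delimiter, index, casing = program
    parts = text.split(delimiter)
    if not -len(parts) <= index < len(parts):
        return None  # the program does not apply to this input
    return CASINGS[casing](parts[index].strip())


def synthesize(examples):
    """Return the first program consistent with every (input, output) pair."""
    for program in product(DELIMITERS, INDICES, CASINGS):
        if all(run(program, inp) == out for inp, out in examples):
            return program
    return None  # no program in this space explains the examples


# The user supplies examples instead of writing the extraction code.
examples = [("Gulwani, Sumit", "Sumit"), ("Lovelace, Ada", "Ada")]
program = synthesize(examples)
print(program)
print(run(program, "Turing, Alan"))
```

In this sketch, the two examples are enough to pick out "split on a comma and take the second field", which then generalizes to a row the user never labeled. This mirrors, in miniature, the column-splitting scenario described above.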

Date:
Speakers:
Sumit Gulwani
Affiliation:
Microsoft