Auto-Join: Joining Tables by Leveraging Transformations

International Conference on Very Large Databases (VLDB)

Traditional equi-join relies solely on string equality comparisons to perform joins. However, in scenarios such as ad-hoc data analysis in spreadsheets, users increasingly need to join tables whose join-columns are from the same semantic domain but use different textual representations, for which transformations are needed before equi-join can be performed. We develop an Auto-Join system that can automatically search over a rich space of operators to compose a transformation program, whose execution makes input tables equi-join-able. Our evaluation using real test cases collected from both public web tables and proprietary enterprise tables shows that the proposed system can perform the desired transformation joins at interactive speed and with high quality.
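To make the idea of a transformation join concrete, the following is a toy sketch in Python, not the released Auto-Join library or its API. It assumes a deliberately tiny operator space (split on a delimiter, select and reorder fields, concatenate with a space) and picks the candidate program under which the most left-hand keys equal some right-hand key. All names (`candidate_programs`, `apply_program`, `find_program`, `transform_join`) and the sample rows are hypothetical and exist only for illustration; the system described above searches a much richer operator space.

```python
# Illustrative toy only: hypothetical helper names, not the Auto-Join library's API.
from itertools import permutations

def candidate_programs(max_fields=3):
    """Enumerate toy programs: (delimiter, ordered tuple of field indices)."""
    for delim in [", ", ",", " ", "-"]:
        for n in range(1, max_fields + 1):
            for order in permutations(range(max_fields), n):
                yield delim, order

def apply_program(program, value):
    """Split `value`, pick fields in the given order, join them with a space.

    Returns None when the program refers to a field the value does not have.
    """
    delim, order = program
    parts = value.split(delim)
    if max(order) >= len(parts):
        return None
    return " ".join(parts[i].strip() for i in order)

def find_program(left_keys, right_keys):
    """Brute-force search: keep the first program with the most exact matches."""
    right = set(right_keys)
    best, best_hits = None, 0
    for program in candidate_programs():
        hits = sum(1 for k in left_keys if apply_program(program, k) in right)
        if hits > best_hits:
            best, best_hits = program, hits
    return best, best_hits

def transform_join(left_rows, right_rows, program):
    """Equi-join (key, payload) rows after transforming the left key."""
    index = {}
    for key, payload in right_rows:
        index.setdefault(key, []).append(payload)
    joined = []
    for key, payload in left_rows:
        transformed = apply_program(program, key)
        for right_payload in index.get(transformed, []):
            joined.append((key, transformed, payload, right_payload))
    return joined

# Made-up sample rows: same names, different textual representations.
left_rows = [("Obama, Barack", 1), ("Bush, George", 2), ("Clinton, Bill", 3)]
right_rows = [("Barack Obama", "a"), ("George Bush", "b"), ("Bill Clinton", "c")]

program, hits = find_program([k for k, _ in left_rows], [k for k, _ in right_rows])
result = transform_join(left_rows, right_rows, program)
```

On these sample rows a plain equi-join on the raw keys would return nothing, while the discovered program (split on `", "`, then swap the two fields) makes every row joinable. Exhaustive enumeration is used here only because the toy space is small; it is not a claim about how the actual system searches.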

Our benchmark dataset has been released at https://github.com/Yeye-He/Auto-Join.

Our Auto-Join library has also been released at https://github.com/Yeye-He/Auto-Join under a “Microsoft Research License” to facilitate future research.