Surajit Chaudhuri

Technical Fellow, Data Platforms and Analytics

Projects

I started the AutoAdmin project in 1996, soon after joining MSR. The goal of this project is to make databases self-tuning and self-administering by exploiting knowledge of the workload. Vivek Narasayya was my primary collaborator in the early years, and other colleagues subsequently joined us in this effort. Our primary focus was on automated physical database design as well as automated statistics management in relational systems. The Index Tuning Wizard releases in Microsoft SQL Server 7.0 and SQL Server 2000 are based on the technology that we developed as part of this project; they were the first workload-driven commercial physical design tools for relational systems to recommend indexes and indexes plus materialized views, respectively. The scope of the automated physical design technology has since been expanded and made available in the Database Tuning Advisor feature of SQL Server 2005 and subsequent releases. The AutoAdmin project page has a detailed description of the project and its publications.

I initiated the Data Cleaning project in 2000 with the goal of developing tools and server infrastructure to support data preparation, an essential step before effective data analysis. Venkatesh Ganti was our leading researcher on this project in the early years. Our work led to the fuzzy matching and fuzzy de-duplication transforms in the SQL Server Integration Services component of SQL Server 2005 and subsequent releases.

Text documents and structured relational data are both sources of our information. Understanding the synergy between these two sources has been a longstanding interest of mine. I started looking at this problem in the mid-nineties (SIGMOD 1995), when we studied the problem of a “join” between relational tables and text repositories. Later, we investigated the problem of keyword search over structured databases (IEEE ICDE 2002) and the problem of auto-ranking of answers to database queries (CIDR 2003, VLDB 2004, CIDR 2005). Ideas from this project have been incorporated in Bing.

I have worked on optimization of complex SQL queries, e.g., optimization of queries with group-by (VLDB 1994), user-defined predicates (VLDB 1996), exploiting factorization for index union/intersection plans (SIGMOD 2003), and data mining predicates (IEEE ICDE 2002). One direction I explored is revisiting the fundamental assumptions in query optimization (SIGMOD 2005, SIGMOD 2009).

Awards and Honors

  • 2024 Elected to the National Academy of Engineering, USA
  • 2023 VLDB Best Paper Award (with Peng Li, Yeye He, Cong Yan, and Yue Wang)
  • 2012 IEEE ICDE Influential Paper Award
  • 2011 ACM SIGMOD Edgar F. Codd Innovations Award
  • 2008 VLDB Best Paper Award (with Nico Bruno)
  • 2007 VLDB 10-Year Best Paper Award (with Vivek Narasayya)
  • 2005 ACM Fellow
  • 2004 ACM SIGMOD Contributions Award
  • 2000 IEEE ICDE Best Paper Award (with Vivek Narasayya)

Selected Professional Activities

  • Member, ACM A.M. Turing Award Committee, 2021 – Present
  • 2016 International Conference on Very Large Databases (VLDB): Program Co-Chair
  • 2010 ACM Symposium on Cloud Computing (SOCC): Program Co-Chair
  • 2006 ACM Conference on Management of Data (SIGMOD): Program Chair
  • 1999 ACM Conference on Knowledge Discovery and Data Mining (KDD): Program Co-Chair
  • 2011 IEEE Data Engineering Conference: Industrial Track Co-Chair
  • 2003 ACM SIGMOD Conference: Industrial Track Chair
  • 2001 ACM Conference on Knowledge Discovery and Data Mining: Industrial Track Co-Chair
  • 1999 ACM SIGMOD Conference: Industrial Track Co-Chair
  • 1998 IEEE Conference on Data Engineering (ICDE): Industrial Track Chair
  • 2002 IEEE Conference on Data Engineering (ICDE): Chair, OLAP and Data Warehousing Track
  • 2008 VLDB 10-year award committee, Chair
  • 2002 VLDB 10-year award committee, Member
  • ACM Transactions on Database Systems (TODS): Associate Editor, 2001-2007
  • IEEE Transactions on Knowledge and Data Engineering (TKDE): Associate Editor, 2001-2005
  • IEEE Data Engineering Bulletin: Associate Editor, 1998-1999

Invited Talks, Tutorials, and Surveys

  • Multi-Tenant Cloud Data Services: State-of-the-Art, Challenges and Opportunities, Tutorial, ACM SIGMOD 2022, Philadelphia.
  • Approximate Query Processing: No Silver Bullet, Keynote Talk, ACM SIGMOD 2017, Chicago.
  • Information at your Fingertips: Only a dream for enterprises? Keynote Talk, IEEE ICDE Conference 2015, Seoul.
  • How Different is Big Data? Keynote Talk, IEEE ICDE Conference 2012, Washington, DC.
  • What next?: a half-dozen data management research goals for big data and the cloud, Keynote Talk, ACM PODS 2012, Scottsdale.
  • Experiences with Problem #9, Invited Talk, SIGMOD 2011, Athens.
  • A Programming Framework for Data Cleaning, Distinguished Lecture, University of British Columbia, 2009.
  • An Overview of Business Intelligence Technology, CACM 2011. (with Umeshwar Dayal, Vivek Narasayya)
  • Self-Tuning Database Systems: A Decade of Progress. VLDB 2007. (with Vivek Narasayya)
  • Foundations of automated database tuning, Tutorial presented at ACM SIGMOD 2005 and VLDB 2006. (with Gerhard Weikum)
  • Self-Managing Technology in Database Management Systems, Tutorial presented at VLDB 2004. (with Benoît Dageville, Guy M. Lohman)
  • Databases and IR: Perspectives of a SQL Guy, NSF Information and Data Management PI Workshop, Seattle, 2003
  • An Overview of Data Warehousing and OLAP Technology. SIGMOD Record, March 1997. Tutorials presented at the 1996 VLDB, 1997 SIGMOD, 1998 EDBT, and 1998 IEEE ICDE conferences. (with Umeshwar Dayal)
  • An Overview of Query Optimization in Relational Systems. Proceedings of 1998 ACM PODS. Invited Tutorial at ACM PODS Conference, 1998.

Technology Transfer (in collaboration with project members)

  • SQL Server Index Tuning Wizard and Database Tuning Advisor (AutoAdmin project)
  • Fuzzy Lookup and Fuzzy Grouping Transforms in SQL Server Integration Services (Data Cleaning and Data Transformation project)
  • Query Services and Catalog Data Quality for Bing Shopping (Data Cleaning and Data Transformation project)
  • Transform Data by Example in Power Query (Data Cleaning and Data Transformation project)
  • Resource Management for Serverless Azure SQL DB (Flexible Resource Allocation project)