
Publications
Publications by Year
-
TUNA: Tuning Unstable and Noisy Cloud Applications
Johannes Freischuetz, Konstantinos Kanellis, Brian Kroth, Shivaram Venkataraman
EuroSys 2025 | March 2025
-
MLOS in Action: Bridging the Gap Between Experimentation and Auto-Tuning in the Cloud
Brian Kroth, Sergiy Matusevych, Rana Alotaibi, Yiwen Zhu, Anja Gruenheid, Yuanyuan Tian
Proc. VLDB Endow. | October 2024, Vol 17: pp. 4269-4272
LST-Bench: Benchmarking Log-Structured Tables in the Cloud
Jesús Camacho-Rodríguez, Ashvin Agrawal, Anja Gruenheid, Ashit Gosalia, Cristian Petculescu, Josep Aguilar-Saborit, Avrilia Floratou, Carlo Curino, Raghu Ramakrishnan
ACM SIGMOD | June 2024
Sibyl: Forecasting Time-Evolving Query Workloads
Hanxian Huang, Tarique Siddiqui, Rana Alotaibi, Carlo Curino, Jyoti Leeka, Alekh Jindal, Jishen Zhao, Jesús Camacho-Rodríguez, Yuanyuan Tian
SIGMOD | June 2024
Vertically Autoscaling Monolithic Applications with CaaSPER: Scalable Container-as-a-Service Performance Enhanced Resizing Algorithm for the Cloud
Anna Pavlenko, Joyce Cahoon, Yiwen Zhu, Brian Kroth, Michael Nelson, Andrew Carter, David Liao, Travis Wright, Jesús Camacho-Rodríguez, Karla Saur
SIGMOD | June 2024
to appear
Lorentz: Learned SKU Recommendation Using Profile Data
Nick Glaze, Tria McNeely, Yiwen Zhu, Matthew Gleeson, Helen Serr, Rajeev Bhopi, Subru Krishnan
SIGMOD 2024 | May 2024, Vol 2: pp. 149
VASIM: Vertical Autoscaling Simulator Toolkit
Anna Pavlenko, Karla Saur, Yiwen Zhu, Brian Kroth, Joyce Cahoon, Jesús Camacho-Rodríguez
IEEE International Conference on Data Engineering (ICDE 2024) | May 2024
Intelligent Pooling: Proactive Resource Provisioning in Large-scale Cloud Service
Deepak Ravikumar, Alex Yeo, Yiwen Zhu, Aditya Lakra, Harsha Nagulapalli, Santhosh Ravindran, Steve Suh, Niharika Dutta, Andrew Fogarty, Yoonjae Park, Sumeet Khushalani, Arijit Tarafdar, Kunal Parekh, Subru Krishnan
Proc. VLDB Endow. | February 2024, Vol 17: pp. 1618-1627
-
Performance Roulette: How Cloud Weather Affects ML-Based System Optimization
Johannes Freischuetz, Konstantinos Kanellis, Brian Kroth, Shivaram Venkataraman
ML for Systems Workshop at NeurIPS 2023 | December 2023
GEqO: ML-Accelerated Semantic Equivalence Detection
Brandon Haynes, Rana Alotaibi, Anna Pavlenko, Jyoti Leeka, Alekh Jindal, Yuanyuan Tian
Proceedings of the ACM on Management of Data | December 2023
Optimizing Data Pipelines for Machine Learning in Feature Stores
Rui Liu, Kwanghyun Park, Fotis Psallidas, Xiaoyong Zhu, Jinghui Mo, Rathijit Sen, Matteo Interlandi, Konstantinos Karanasos, Yuanyuan Tian, Jesús Camacho-Rodríguez
Proc. VLDB Endow. | August 2023, Vol 16: pp. 4230-4239
PolySem: Efficient Polyglot Analytics on Semantic Data
Xinyu Liu, Venkatesh Emani, Avrilia Floratou, Joyce Cahoon, Philip Seamark, Carlo Curino
Poly’23: Polystore systems for heterogeneous data in multiple databases with privacy and security assurances | August 2023
PyFroid: Scaling Data Analysis on a Commodity Workstation
Venkatesh Emani, Avrilia Floratou, Carlo Curino
EDBT 2024 | August 2023
OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs
Fotis Psallidas, Ashvin Agrawal, Chandru Sugunan, Khaled Ibrahim, Konstantinos Karanasos, Jesús Camacho-Rodríguez, Avrilia Floratou, Carlo Curino, Raghu Ramakrishnan
Proc. VLDB Endow. | August 2023, Vol 16: pp. 3662-3675
Demonstration of Geyser: Provenance Extraction and Applications over Data Science Scripts
Fotis Psallidas, Megan Eileen Leszczynski, Mohammad Hossein Namaki, Avrilia Floratou, Ashvin Agrawal, Konstantinos Karanasos, Subru Krishnan, Pavle Subotić, Markus Weimer, Yinghui Wu, Yiwen Zhu
ACM SIGMOD | July 2023
Exploiting Structure in Regular Expression Queries
Ling Zhang, Shaleen Deep, Avrilia Floratou, Anja Gruenheid, Jignesh M. Patel, Yiwen Zhu
ACM SIGMOD | July 2023
A Deep Dive into Common Open Formats for Analytical DBMSs
Chunwei Liu, Anna Pavlenko, Matteo Interlandi, Brandon Haynes
Proc. VLDB Endow. | June 2023, Vol 16: pp. 3044-3056
Best Paper Runner-Up
Query Processing on Gaming Consoles
Wei Cui, Qianxi Zhang, Jesús Camacho-Rodríguez, Spyros Blanas, Brandon Haynes, Yinan Li, Peng Cheng, Ravishankar Ramamurthy, Rathijit Sen, Matteo Interlandi
June 2023
Towards Building Autonomous Data Services on Azure
Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, Subru Krishnan, Ankita Agarwal, Rana Alotaibi, Jesús Camacho-Rodríguez, Bibin A Chundatt, Andrew Chung, Niharika Dutta, Andrew Fogarty, Anja Gruenheid, Brandon Haynes, Matteo Interlandi, Minu Iyer, Nick Jurgens, Sumeet Khushalani, Brian Kroth, Manoj Kumar, Jyoti Leeka, Sergiy Matusevych, Minni Mittal, Andreas C. Müller, Kartheek Muthyala, Harsha Nagulapalli, Yoonjae Park, Hiren Patel, Anna Pavlenko, Olga Poppe, Santhosh Ravindran, Karla Saur, Rathijit Sen, Steve Suh, Arijit Tarafdar, Kunal Waghray, Demin Wang, Carlo Curino, Raghu Ramakrishnan
ACM SIGMOD | June 2023
Runtime Variation in Big Data Analytics
Yiwen Zhu, Rathijit Sen, Robert Horton, John Mark Agosta
ACM SIGMOD | May 2023
Schema Matching using Pre-Trained Language Models
Yunjia Zhang, Avrilia Floratou, Joyce Cahoon, Subru Krishnan, Andreas C. Müller, Dalitso Banda, Fotis Psallidas, Jignesh M. Patel
ICDE | January 2023
The Fine-Grained Complexity of CFL Reachability
Paraschos Koutris, Shaleen Deep
Principles of Programming Languages (POPL 2023) | January 2023
The Tensor Data Platform: Towards an AI-centric Database System
Apurva Gandhi, Yuki Asada, Victor Fu, Advitya Gemawat, Lihao Zhang, Rathijit Sen, Carlo Curino, Jesús Camacho-Rodríguez, Matteo Interlandi
CIDR | January 2023
-
DIAMETRICS: Benchmarking Query Engines at Scale
Shaleen Deep, Anja Gruenheid, Kruthi Nagaraj, Hiro Naito, Jeffrey Naughton, Stratis Viglas
Communications of The ACM | December 2022, Vol 65(12): pp. 105-112
Research Highlight
Comprehensive and Efficient Workload Summarization
Shaleen Deep, Anja Gruenheid, Paraschos Koutris, Stratis Viglas, Jeffrey Naughton
Datenbank-Spektrum | November 2022
Research Highlight
Diversity and Inclusion Activities in Database Conferences: A 2021 Report.
Sihem Amer-Yahia, Yael Amsterdamer, Sourav S. Bhowmick, Angela Bonifati, Philippe Bonnet, Renata Borovica-Gajic, Barbara Catania, Tania Cerquitelli, Silvia Chiusano, Panos K. Chrysanthis, Carlo Curino, Jérôme Darmont, Amr El Abbadi, Avrilia Floratou, Juliana Freire, Alekh Jindal, Vana Kalogeraki, Georgia Koutrika, Arun Kumar, Sujaya Maiyya, Alexandra Meliou, Madhulika Mohanty, Felix Naumann, Nele Sina Noack, Liat Peterfreund, Fatma Özcan, Wenny Rahayu, Wang-Chiew Tan, Yuanyuan Tian, Pinar Tözün, Genoveva Vargas-Solar, Neeraja J. Yadwadkar, Meihui Zhang
SIGMOD Record | November 2022
Share the Tensor Tea: How Databases can Leverage the Machine Learning Ecosystem
Yuki Asada, Victor Fu, Apurva Gandhi, Advitya Gemawat, Lihao Zhang, Dong He, Vivek Gupta, Ehi Nosakhare, Dalitso Banda, Rathijit Sen, Matteo Interlandi
VLDB | September 2022, Vol 15(12): pp. 3598-3601
Best demo award
Containerized Execution of UDFs: An Experimental Evaluation
Karla Saur, Tara Mirmira, Konstantinos Karanasos, Jesús Camacho-Rodríguez
VLDB 2022 | September 2022
Pipemizer: An Optimizer for Analytics Data Pipelines
Sunny Gakhar, Joyce Cahoon, Wangchao Le, Xiangnan Li, Kaushik Ravichandran, Hiren Patel, Marc Friedman, Brandon Haynes, Shi Qiao, Alekh Jindal, Jyoti Leeka
PVLDB | September 2022
Query Processing on Tensor Computation Runtimes
Dong He, Supun Nakandala, Dalitso Banda, Rathijit Sen, Karla Saur, Kwanghyun Park, Carlo Curino, Jesús Camacho-Rodríguez, Konstantinos Karanasos, Matteo Interlandi
VLDB 2022 | September 2022
Doppler: Automated SKU Recommendation in Migrating SQL Workloads to the Cloud
Joyce Cahoon, Wenjing Wang, Yiwen Zhu, Katherine Lin, Sean Liu, Raymond Truong, Neetu Singh, Chengcheng Wan, Alexandra M Ciortea, Sreraman Narasimhan, Subru Krishnan
VLDB 2022 | August 2022
Deploying a Steered Query Optimizer in Production at Microsoft
Wangda Zhang, Matteo Interlandi, Paul Mineiro, Shi Qiao, Nasim Ghazanfari, Karlen Lie, Marc Friedman, Rafah Hosn, Hiren Patel, Alekh Jindal
2022 International Conference on Management of Data | July 2022
VIP Hashing — Adapting to Skew in Popularity of Data on the Fly
Aarati Kakaraparthy, Jignesh M. Patel, Brian Kroth, Kwanghyun Park
VLDB 2022 | June 2022
Data Science Through the Looking Glass
Fotis Psallidas, Yiwen Zhu, Bojan Karlaš, Jordan Henkel, Matteo Interlandi, Subru Krishnan, Brian Kroth, Venkatesh Emani, Wentao Wu, Ce Zhang, Markus Weimer, Avrilia Floratou, Carlo Curino, Konstantinos Karanasos
SIGMOD Record | June 2022, Vol 51(2): pp. 30-37
End-to-end Optimization of Machine Learning Prediction Queries
Kwanghyun Park, Karla Saur, Dalitso Banda, Rathijit Sen, Matteo Interlandi, Konstantinos Karanasos
SIGMOD | June 2022
LlamaTune: Sample-Efficient DBMS Configuration Tuning
Konstantinos Kanellis, Cong Ding, Brian Kroth, Andreas C. Müller, Carlo Curino, Shivaram Venkataraman
VLDB 2022 | March 2022
NyxCache: Flexible and Efficient Multi-tenant Persistent Memory Caching
Kan Wu, Kaiwei Tu, Yuvraj Patel, Rathijit Sen, Kwanghyun Park, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
20th USENIX Conference on File and Storage Technologies | February 2022
-
Phoebe: A Learning-based Checkpoint Optimizer
Yiwen Zhu, Matteo Interlandi, Abhishek Roy, Krishnadhan Das, Hiren Patel, Malay Bag, Hitesh Sharma, Alekh Jindal
VLDB 2021 | August 2021
Steering Query Optimizers: A Practical Take on Big Data Workloads
Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, Alekh Jindal
2021 International Conference on Management of Data | June 2021
Honorable mention for the Industry Track
HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries
Rana Alotaibi, Bogdan Cautis, Alin Deutsch, Ioana Manolescu
2021 International Conference on Management of Data | June 2021
KEA: Tuning an Exabyte-Scale Data Infrastructure
Yiwen Zhu, Subru Krishnan, Konstantinos Karanasos, Isha Tarte, Conor Power, Abhishek Modi, Manoj Kumar, Deli Zhang, Kartheek Muthyala, Nick Jurgens, Sarvesh Sakalanaga, Sudhir Darbha, Minu Iyer, Ankita Agarwal, Carlo Curino
SIGMOD | June 2021
Microlearner: A fine-grained Learning Optimizer for Big Data Workloads at Microsoft
Alekh Jindal, Shi Qiao, Rathijit Sen, Hiren Patel
2021 International Conference on Data Engineering | April 2021
Property Graph Schema Optimization for Domain-Specific Knowledge Graphs
Rana Alotaibi, Chuan Lei, Abdul Quamar, Vasilis Efthymiou, Fatma Ozcan
ICDE | April 2021
FPGA for Aggregate Processing: The Good, The Bad, and The Ugly
Zubeyr F. Eryilmaz, Aarati Kakaraparthy, Jignesh M. Patel, Rathijit Sen, Kwanghyun Park
International Conference on Data Engineering (ICDE) | April 2021
Production Experiences from Computation Reuse at Microsoft
Alekh Jindal, Shi Qiao, Hiren Patel, Abhishek Roy, Jyoti Leeka, Brandon Haynes
2021 Extending Database Technology | March 2021
Hardware Acceleration for DBMS Machine Learning Scoring: Is It Worth the Overheads?
Zahra Azad, Rathijit Sen, Kwanghyun Park, Ajay Joshi
International Symposium on Performance Analysis of Systems and Software (ISPASS) | March 2021
The Storage Hierarchy is Not a Hierarchy: Optimizing Caching on Modern Storage Devices with Orthus
Kan Wu, Zhihan Guo, Guanzhou Hu, Kaiwei Tu, Ramnatthan Alagappan, Rathijit Sen, Kwanghyun Park, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
19th USENIX Conference on File and Storage Technologies | February 2021
Magpie: Python at Speed and Scale using Cloud Backends
Alekh Jindal, Venkatesh Emani, Maureen Daum, Olga Poppe, Brandon Haynes, Anna Pavlenko, Ayushi Gupta, Karthik Ramachandra, Carlo Curino, Andreas C. Müller, Wentao Wu, Hiren Patel
Conference on Innovative Data Systems Research (CIDR 2021) | February 2021
-
Applied Research Lessons from CloudViews Project
Alekh Jindal
SIGMOD Record | December 2020, Vol 49(3): pp. 37-42
A Tensor Compiler for Unified Machine Learning Prediction Serving
Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, Matteo Interlandi
Symposium on Operating Systems Design and Implementation (OSDI) | November 2020
Unearthing inter-job dependencies for better cluster scheduling
Andrew Chung, Subru Krishnan, Konstantinos Karanasos, Carlo Curino, Gregory R. Ganger
Symposium on Operating Systems Design and Implementation (OSDI) | November 2020
Vamsa: Automated Provenance Tracking in Data Science Scripts
Mohammad Hossein Namaki, Avrilia Floratou, Fotis Psallidas, Subru Krishnan, Ashvin Agrawal, Yiwen Zhu, Markus Weimer, Yinghui Wu
KDD | August 2020
AutoToken: predicting peak parallelism for big data analytics at Microsoft
Rathijit Sen, Alekh Jindal, Hiren Patel, Shi Qiao
Very Large Data Bases | July 2020
ESTOCADA: Towards Scalable Polystore Systems
Rana Alotaibi, Bogdan Cautis, Alin Deutsch, M. Latrache, Ioana Manolescu, Y. Yang
VLDB | July 2020
Towards Plan-aware Resource Allocation in Serverless Query Processing
Malay Bag, Alekh Jindal, Hiren Patel
USENIX conference on Hot Topics in Cloud Computing | July 2020
Lessons learned from the early performance evaluation of Intel Optane DC persistent memory in DBMS
Yinjun Wu, Kwanghyun Park, Rathijit Sen, Brian Kroth, Jae Young Do
2020 Data Management on New Hardware | June 2020
Automated Tuning of Query Degree of Parallelism via Machine Learning
Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi
2020 International Conference on Management of Data | June 2020
A Comparative Exploration of ML Techniques for Tuning Query Degree of Parallelism.
Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi
May 2020
Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings
Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, Wangchao Le
SIGMOD 2020: International Conference on Management of Data | February 2020
Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML
Ashvin Agrawal, Rony Chatterjee, Carlo Curino, Avrilia Floratou, Neha Godwal, Matteo Interlandi, Alekh Jindal, Konstantinos Karanasos, Subru Krishnan, Brian Kroth, Jyoti Leeka, Kwanghyun Park, Hiren Patel, Olga Poppe, Fotis Psallidas, Raghu Ramakrishnan, Abhishek Roy, Karla Saur, Rathijit Sen, Markus Weimer, Travis Wright, Yiwen Zhu
Conference on Innovative Data Systems Research (CIDR) | January 2020
Extending Relational Query Processing with ML Inference
Konstantinos Karanasos, Matteo Interlandi, Doris Xin, Fotis Psallidas, Rathijit Sen, Kwanghyun Park, Ivan Popivanov, Supun Nakandala, Subru Krishnan, Markus Weimer, Yuan Yu, Raghu Ramakrishnan, Carlo Curino
Conference on Innovative Data Systems Research (CIDR) | January 2020
-
Optimizing databases by learning hidden parameters of solid state drives
Aarati Kakaraparthy, Jignesh M. Patel, Kwanghyun Park, Brian Kroth
Proceedings of VLDB | December 2019
Big Data Processing at Microsoft: Hyper Scale, Massive Complexity, and Minimal Cost
Hiren Patel, Alekh Jindal, Clemens Szyperski
Symposium on Cloud Computing | November 2019
Peregrine: Workload Optimization for Cloud Query Engines
Alekh Jindal, Hiren Patel, Abhishek Roy, Shi Qiao, Zhicheng Yin, Rathijit Sen, Subru Krishnan
Symposium on Cloud Computing | November 2019
Neptune: Scheduling Suspendable Tasks for Unified Stream/Batch Applications
Panagiotis Garefalakis, Konstantinos Karanasos, Peter Pietzuch
Symposium on Cloud Computing (SoCC) | November 2019
BlackMagic: Automatic Inlining of Scalar UDFs into SQL Queries with Froid
Karthik Ramachandra, Kwanghyun Park
Proceedings of VLDB | August 2019, Vol 12(12): pp. 1810-1813
Griffon: Reasoning about Job Anomalies with Unlabeled Data in Cloud-based Platforms
Liqun Shao, Yiwen Zhu, Siqi Liu, Abhiram Eswaran, Kristin Lieber, Janhavi Mahajan, Minsoo Thigpen, Sudhir Darbha, Subru Krishnan, Soundar Srinivasan, Carlo Curino, Konstantinos Karanasos
Symposium on Cloud Computing (SoCC) | August 2019
SparkCruise: handsfree computation reuse in Spark
Abhishek Roy, Alekh Jindal, Hiren Patel, Ashit Gosalia, Subru Krishnan, Carlo Curino
Very Large Data Bases | July 2019
Exploiting Intel Optane SSD for Microsoft SQL Server
Kan Wu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Rathijit Sen, Kwanghyun Park
DaMoN 2019 | July 2019
Peering through the Dark: An Owl’s View of Inter-job Dependencies and Jobs’ Impact in Shared Clusters
Andrew Chung, Carlo Curino, Subru Krishnan, Konstantinos Karanasos, Panagiotis Garefalakis, Gregory R. Ganger
International Conference on Management of Data (SIGMOD) | June 2019
Towards Scalable Hybrid Stores: Constraint-Based Rewriting to the Rescue
Rana Alotaibi, Damian Bursztyn, Alin Deutsch, Ioana Manolescu, Stamatis Zampetakis
SIGMOD | June 2019
Query and Resource Optimizations: A Case for Breaking the Wall in Big Data Systems
Alekh Jindal, Lalitha Viswanathan, Konstantinos Karanasos
MSR-TR-2019-44 | June 2019
Published by Microsoft
Constant Time Recovery in Azure SQL Database
Panagiotis Antonopoulos, Peter Byrne, Wayne Chen, Cristian Diaconu, Raghavendra Thallam Kodandaramaih, Hanuma Kodavalla, Prashanth Purnananda, Adrian-Leonard Radu, Chaitanya Sreenivas Ravella, Girish Mittur Venkataramanappa
June 2019
Hydra: a federated resource manager for data-center scale analytics
Carlo Curino, Subru Krishnan, Konstantinos Karanasos, Sriram Rao, Giovanni M. Fumarola, Botong Huang, Kishore Chaliparambil, Arun Suresh, Young Chen, Solom Heddaya, Roni Burd, Sarvesh Sakalanaga, Chris Douglas, Bill Ramsey, Raghu Ramakrishnan
Symposium on Networked Systems Design and Implementation (NSDI) | February 2019
-
SOCK: Serverless-Optimized Containers.
Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Caraza-Harter, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
;login: | December 2018, Vol 43(3)
Towards a learning optimizer for shared clouds
Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao
Very Large Data Bases | October 2018
PRETZEL: opening the black box of machine learning prediction serving systems
Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Marco Domenico Santambrogio, Matteo Interlandi, Markus Weimer
Operating Systems Design and Implementation | October 2018
Netco: Cache and I/O Management for Analytics over Disaggregated Stores
Virajith Jalaparti, Chris Douglas, Mainak Ghosh, Ashvin Agrawal, Avrilia Floratou, Srikanth Kandula, Ishai Menache, Joseph (Seffi) Naor, Sriram Rao
ACM Symposium on Cloud Computing (SOCC) | October 2018
Best Paper Award
Dhalion in action: automatic management of streaming applications
Avrilia Floratou, Ashvin Agrawal
Proceedings of the VLDB Endowment | August 2018, Vol 11(12)
SOCK: rapid task provisioning with serverless-optimized containers
Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Harter, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
USENIX Annual Technical Conference | July 2018
Challenges and Opportunities in Transportation Data
Kristin Tufte, Kushal Datta, Alekh Jindal, David Maier, Robert L. Bertini
June 2018
Computation Reuse in Analytics Job Service at Microsoft
Alekh Jindal, Shi Qiao, Hiren Patel, Zhicheng Yin, Jieming Di, Malay Bag, Marc Friedman, Yifung Lin, Konstantinos Karanasos, Sriram Rao
2018 International Conference on Management of Data | May 2018
Survivability of Cloud Databases – Factors and Prediction
Jose Picado, Willis Lang, Edward C. Thayer
International Conference on Management of Data | May 2018
Columnar Storage Formats
Encyclopedia of Big Data Technologies | Published by Springer | 2018
Query and Resource Optimization: Bridging the Gap
Lalitha Viswanathan, Alekh Jindal, Konstantinos Karanasos
International Conference on Data Engineering (ICDE) | April 2018
Selecting subexpressions to materialize at datacenter scale
Alekh Jindal, Konstantinos Karanasos, Sriram Rao, Hiren Patel
Very Large Data Bases (VLDB) | February 2018
Batch-Expansion Training: An Efficient Optimization Framework
Michał Dereziński, Dhruv Mahajan, S. Sathiya Keerthi, S. V. N. Vishwanathan, Markus Weimer
International Conference on Artificial Intelligence and Statistics | February 2018
Advancements in YARN Resource Manager
Konstantinos Karanasos, Arun Suresh, Chris Douglas
Encyclopedia of Big Data Technologies | February 2018
Robust Data Partitioning.
Alekh Jindal, Anil Shanbhag, Yi Lu
February 2018
Characterizing Resource Sensitivity of Database Workloads
Rathijit Sen, Karthik Ramachandra
High-Performance Computer Architecture | January 2018
Medea: Scheduling of Long Running Applications in Shared Production Clusters
Panagiotis Garefalakis, Konstantinos Karanasos, Peter Pietzuch, Arun Suresh, Sriram Rao
European Conference on Computer Systems (EuroSys) | January 2018
-
Towards High-Performance Prediction Serving Systems
Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Matteo Interlandi, Markus Weimer
31st Conference on Neural Information Processing Systems | December 2017
Froid: Optimization of Imperative Programs in a Relational Database
Karthik Ramachandra, Kwanghyun Park, K. Venkatesh Emani, Alan Halverson, Cesar Galindo-Legaria, Conor Cunningham
Proceedings of VLDB | December 2017, Vol 11(4)
Apache REEF: Retainable Evaluator Execution Framework
Byung-Gon Chun, Tyson Condie, Yingda Chen, Brian Cho, Andrew Chung, Carlo Curino, Chris Douglas, Beomyeol Jeon, Joo Seong Jeong, Gyewon Lee, Yunseong Lee, Tony Majestro, Dahlia Malkhi, Sergiy Matusevych, Brandon Myers, Mariia Mykhailova, Shravan Narayanamurthy, Joseph Noor, Raghu Ramakrishnan, Sriram Rao, Russell Sears, Beysim Sezgin, Taegeon Um, Julia Wang, Youngseok Yang, Matteo Interlandi
ACM Transactions on Computer Systems | October 2017, Vol 35(2): pp. 5
No data left behind: real-time insights from a complex data ecosystem
Manos Karpathiotakis, Avrilia Floratou, Fatma Ozcan, Anastasia Ailamaki
SOCC | September 2017
A robust partitioning scheme for ad-hoc query workloads
Anil Shanbhag, Alekh Jindal, Samuel Madden, Jorge Quiane, Aaron J. Elmore
Symposium on Cloud Computing | September 2017
Self-Regulating Streaming Systems: Challenges and Opportunities
Avrilia Floratou, Ashvin Agrawal
BIRTE | August 2017
Dhalion: Self-Regulating Stream Processing in Heron
Avrilia Floratou, Ashvin Agrawal, Bill Graham, Sriram Rao, Karthik Ramasamy
Proceedings of the VLDB Endowment | August 2017
Energy-Proportional Computing: A New Definition
Rathijit Sen, David A. Wood
IEEE Computer | July 2017, Vol 50(8): pp. 26-33
Frequency governors for cloud database OLTP workloads
Rathijit Sen, Alan Halverson
International Symposium on Low Power Electronics and Design | July 2017
Pipsqueak: Lean Lambdas with Large Libraries
Edward Oakes, Leon Yang, Kevin Houck, Tyler Harter, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
International Conference on Distributed Computing Systems Workshops | July 2017
ROSA: R Optimizations with Static Analysis
Rathijit Sen, Jianqiao Zhu, Jignesh M. Patel, Somesh Jha
RIOT 2017 | July 2017
Twitter Heron: Towards Extensible Streaming Engines
Maosong Fu, Ashvin Agrawal, Avrilia Floratou, Bill Graham, Andrew Jorgensen, Mark Li, Neng Lu, Karthik Ramasamy, Sriram Rao, Cong Wang
ICDE | May 2017
Pareto Governors for Energy-Optimal Computing
Rathijit Sen, David A. Wood
ACM Transactions on Architecture and Code Optimization | April 2017, Vol 14(1): pp. 6
Resource bricolage and resource selection for parallel database systems
Jiexing Li, Jeffrey F. Naughton, Rimma V. Nehme
Very Large Data Bases | January 2017
INGESTBASE: A Declarative Data Ingestion System.
Alekh Jindal, Jorge-Arnulfo Quiané-Ruiz, Samuel Madden
MSR-TR-2017-62 | January 2017
Published by Microsoft
AdaptDB: adaptive partitioning for distributed joins
Yi Lu, Anil Shanbhag, Alekh Jindal, Samuel Madden
Very Large Data Bases | January 2017
-
Morpheus: Towards Automated SLOs for Enterprise Clusters
Sangeetha Abdu Jyothi, Carlo Curino, Ishai Menache, Shravan Matthur Narayanamurthy, Alexey Tumanov, Jonathan Yaniv, Ruslan Mavlyutov, Íñigo Goiri, Subru Krishnan, Janardhan (Jana) Kulkarni, Sriram Rao
2016 International Symposium on Operating Systems Design and Implementation (OSDI) | November 2016
PerfOrator: Eloquent Performance Models for Resource Optimization
Dharmesh Kakadia, Kaushik Rajan, Carlo Curino, Subru Krishnan
ACM Symposium on Cloud Computing 2016 (SoCC’16) | October 2016
Amoeba: a shape changing storage system for big data
Anil Shanbhag, Alekh Jindal, Yi Lu, Samuel Madden
Very Large Data Bases | August 2016
GraphFrames: an integrated API for mixing graph and relational queries
Ankur Dave, Alekh Jindal, Li Erran Li, Reynold Xin, Joseph Gonzalez, Matei Zaharia
GRADES | June 2016
Efficient Queue Management for Cluster Scheduling
Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Sriram Rao, Milan Vojnovic
European Conference on Computer Systems (EuroSys) | April 2016
Do the Hard Stuff First: Scheduling Dependent Computations in Data Analytics Clusters
Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan (Jana) Kulkarni
MSR-TR-2016-19 | February 2016
Published by Microsoft
-
Towards Geo-Distributed Machine Learning.
Ignacio Cano, Dhruv Mahajan, Giovanni Matteo Fumarola, Arvind Krishnamurthy, Markus Weimer, Carlo Curino
IEEE Data(base) Engineering Bulletin | December 2015, Vol 40: pp. 41-59
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Konstantinos Karanasos, Sriram Rao, Carlo Curino, Chris Douglas, Kishore Chaliparambil, Giovanni Matteo Fumarola, Solom Heddaya, Raghu Ramakrishnan, Sarvesh Sakalanaga
USENIX Annual Technical Conference (USENIX ATC’2015) | July 2015
Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications
Bikas Saha, Hitesh Shah, Siddharth Seth, Gopal Vijayaraghavan, Arun Murthy, Carlo Curino
SIGMOD ’15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data | May 2015
WANalytics: Geo-Distributed Analytics for a Data Intensive World
Ashish Vulimiri, Carlo Curino, Philip Brighten Godfrey, Thomas Jungblut, Konstantinos Karanasos, Jitendra Padhye, George Varghese
International Conference on Management of Data (SIGMOD) | May 2015
Global Analytics in the Face of Bandwidth and Regulatory Constraints
Ashish Vulimiri, Carlo Curino, P. Brighten Godfrey, Thomas Jungblut, Jitu Padhye, George Varghese
12th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’15) | May 2015
ISBN: 978-1-931971-218
Blind men and an elephant coalescing open-source, academic, and industrial perspectives on BigData
Chris Douglas, Carlo Curino
International Conference on Data Engineering | April 2015
WANalytics: Analytics for a Geo-Distributed Data-Intensive World
Ashish Vulimiri, Carlo Curino, Brighten Godfrey, Konstantinos Karanasos, George Varghese
Conference on Innovative Data Systems Research (CIDR) | January 2015
-
Multi-resource Packing for Cluster Schedulers
Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, Aditya Akella
ACM SIGCOMM | August 2014
Indexing HDFS data in PDW: splitting the data from the index
Vinitha Reddy Gankidi, Nikhil Teletia, Jignesh M. Patel, Alan Halverson, David J. DeWitt
Very Large Data Bases | July 2014
Partial results in database systems
Willis Lang, Rimma V. Nehme, Eric Robinson, Jeffrey F. Naughton
International Conference on Management of Data | June 2014
Dynamically Optimizing Queries over Large Scale Data Platforms
Konstantinos Karanasos, Andrey Balmin, Marcel Kutsch, Fatma Ozcan, Vuk Ercegovac, Chunyang Xia, Jesse Jackson
International Conference on Management of Data (SIGMOD) | June 2014
Towards Multi-Tenant Performance SLOs
Willis Lang, Srinath Shankar, Jignesh M. Patel, Ajay Kalhan
IEEE Transactions on Knowledge and Data Engineering | May 2014, Vol 26(6): pp. 1447-1463
Distributed and Scalable PCA in the Cloud
Arun Kumar, Vijay Narayanan, Nikos Karampatziakis, Paul Mineiro, Markus Weimer
MSR-TR-2014-165 | January 2014
Published by Microsoft
Apache Reef Research Paper
Elastic Distributed Bayesian Collaborative Filtering
Alex Beutel, Markus Weimer, Tom Minka, Yordan Zaykov, Vijay Narayanan
January 2014
-
Delta: Scalable Data Dissemination under Capacity Constraints
Konstantinos Karanasos, Asterios Katsifodimos, Ioana Manolescu
Very Large Data Bases (PVLDB) | December 2013
Reservation-based Scheduling: If You’re Late Don’t Blame Us!
Carlo Curino, Djellel E Difallah, Chris Douglas, Subru Krishnan, Raghu Ramakrishnan, Sriram Rao
MSR-TR-2013-108 | October 2013
Published in SoCC 2014
Split query processing in polybase
David J. DeWitt, Rimma Nehme, Srinath Shankar, Josep Aguilar-Saborit, Artin Avanes, Miro Flasza, Jim Gramling, Alan Halverson
2013 ACM SIGMOD International Conference on Management of Data | June 2013
Fact Checking and Analyzing the Web
Francois Goasdoue, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu, Stamatis Zampetakis
International Conference on Management of Data (SIGMOD) | June 2013
Growing Triples on Trees: an XML-RDF Hybrid Model for Annotated Documents
Francois Goasdoue, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu, Stamatis Zampetakis
Very Large Data Bases Journal (VLDB J.) | June 2013, Vol 22: pp. 589-613
Fast peak-to-peak behavior with SSD buffer pool
Jaeyoung Do, Donghui Zhang, J. M. Patel, D. J. DeWitt
International Conference on Data Engineering | April 2013
Towards Resource-Elastic Machine Learning
Dhruv Mahajan, Sundararajan Sellamanickam, Markus Weimer, Keerthi Selvaraj
January 2013
-
Towards energy-efficient database cluster design
Willis Lang, Stavros Harizopoulos, Jignesh M. Patel, Mehul A. Shah, Dimitris Tsirogiannis
2012 Very Large Data Bases | August 2012, Vol 5(11): pp. 1684-1695
Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques
Jiexing Li, Arnd Christian König, Vivek Narasayya, Surajit Chaudhuri
38th International Conference on Very Large Databases | August 2012
Declarative Systems for Large-Scale Machine Learning.
Vinayak R. Borkar, Yingyi Bu, Michael J. Carey, Joshua Rosen, Neoklis Polyzotis, Tyson Condie, Markus Weimer, Raghu Ramakrishnan
IEEE Data(base) Engineering Bulletin | July 2012, Vol 35: pp. 24-32
Query optimization in Microsoft SQL Server PDW
Srinath Shankar, Rimma Nehme, Josep Aguilar-Saborit, Andrew Chung, Mostafa Elhemali, Eric Robinson, Mahadevan Sankara Subramanian, David DeWitt, César Galindo-Legaria, Alan Halverson
International Conference on Management of Data | May 2012
WWW 2012 Tutorial: New Templates for Scalable Data Analysis
Amr Ahmed, Alexander J. Smola, Markus Weimer
25th International World Wide Web Conference | April 2012
GSLPI: A Cost-Based Query Progress Indicator
Jiexing Li, Rimma V. Nehme, Jeffrey Naughton
International Conference on Data Engineering | March 2012
Toward Progress Indicators on Steroids for Big Data Systems.
Jiexing Li, Rimma V. Nehme, Jeffrey F. Naughton
Conference on Innovative Data Systems Research | January 2012
-
Rethinking Query Processing for Energy Efficiency: Slowing Down to Win the Race.
Nicolas Bruno, Surajit Chaudhuri, Arnd Christian König, Vivek Narasayya, Ravi Ramamurthy, Manoj Syamala
IEEE Data(base) Engineering Bulletin | November 2011, Vol 34: pp. 12-19
View Selection in Semantic Web Databases
Francois Goasdoue, Konstantinos Karanasos, Julien Leblay, Ioana Manolescu
Very Large Data Bases (PVLDB) | October 2011, Vol 5: pp. 97-108
Automated partitioning design in parallel database systems
Rimma Nehme, Nicolas Bruno
International Conference on Management of Data | June 2011
Turbocharging DBMS buffer pool using SSDs
Jaeyoung Do, Donghui Zhang, Jignesh M. Patel, David J. DeWitt, Jeffrey F. Naughton, Alan Halverson
International Conference on Management of Data | June 2011
-
The Mimicking Octopus: Towards a one-size-fits-all Database Architecture
Alekh Jindal
36th International Conference on VLDB | September 2010
Energy management for MapReduce clusters
Willis Lang, Jignesh M. Patel
Very Large Data Bases | August 2010
On energy management, load balancing and replication
Willis Lang, Jignesh M. Patel, Jeffrey F. Naughton
International Conference on Management of Data | June 2010
Wimpy node clusters: what about non-wimpy workloads?
Willis Lang, Jignesh M. Patel, Srinath Shankar
Data Management on New Hardware | June 2010
-
Towards Eco-friendly Database Management Systems
Willis Lang, Jignesh M. Patel
Conference on Innovative Data Systems Research | December 2009
Publications by Research Area
-
TUNA: Tuning Unstable and Noisy Cloud Applications
Johannes Freischuetz, Konstantinos Kanellis, Brian Kroth, Shivaram Venkataraman
Eurosys 2025 | March 2025
MLOS in Action: Bridging the Gap Between Experimentation and Auto-Tuning in the Cloud
Brian Kroth, Sergiy Matusevych, Rana Alotaibi, Yiwen Zhu, Anja Gruenheid, Yuanyuan Tian
Proc. VLDB Endow. | October 2024, Vol 17: pp. 4269-4272
Performance Roulette: How Cloud Weather Affects ML-Based System Optimization
Johannes Freischuetz, Konstantinos Kanellis, Brian Kroth, Shivaram Venkataraman
ML for Systems Workshop at NeurIPS 2023 | December 2023
NyxCache: Flexible and Efficient Multi-tenant Persistent Memory Caching
Kan Wu, Kaiwei Tu, Yuvraj Patel, Rathijit Sen, Kwanghyun Park, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
20th USENIX Conference on File and Storage Technologies | February 2022
The Storage Hierarchy is Not a Hierarchy: Optimizing Caching on Modern Storage Devices with Orthus
Kan Wu, Zhihan Guo, Guanzhou Hu, Kaiwei Tu, Ramnatthan Alagappan, Rathijit Sen, Kwanghyun Park, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
19th USENIX Conference on File and Storage Technologies | February 2021
Unearthing inter-job dependencies for better cluster scheduling
Andrew Chung, Subru Krishnan, Konstantinos Karanasos, Carlo Curino, Gregory R. Ganger
Symposium on Operating Systems Design and Implementation (OSDI) | November 2020
Lessons learned from the early performance evaluation of Intel Optane DC persistent memory in DBMS
Yinjun Wu, Kwanghyun Park, Rathijit Sen, Brian Kroth, Jae Young Do
2020 Data Management on New Hardware | June 2020
Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings
Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, Wangchao Le
SIGMOD 2020: International Conference on Management of Data | February 2020
Neptune: Scheduling Suspendable Tasks for Unified Stream/Batch Applications
Panagiotis Garefalakis, Konstantinos Karanasos, Peter Pietzuch
Symposium on Cloud Computing (SoCC) | November 2019
Exploiting Intel Optane SSD for Microsoft SQL Server
Kan Wu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Rathijit Sen, Kwanghyun Park
DaMoN 2019 | July 2019
Hydra: a federated resource manager for data-center scale analytics
Carlo Curino, Subru Krishnan, Konstantinos Karanasos, Sriram Rao, Giovanni M. Fumarola, Botong Huang, Kishore Chaliparambil, Arun Suresh, Young Chen, Solom Heddaya, Roni Burd, Sarvesh Sakalanaga, Chris Douglas, Bill Ramsey, Raghu Ramakrishnan
Symposium on Networked Systems Design and Implementation (NSDI) | February 2019
SOCK: Serverless-Optimized Containers.
Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Caraza-Harter, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
;login: | December 2018, Vol 43(3)
PRETZEL: opening the black box of machine learning prediction serving systems
Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Marco Domenico Santambrogio, Matteo Interlandi, Markus Weimer
Operating Systems Design and Implementation | October 2018
Netco: Cache and I/O Management for Analytics over Disaggregated Stores
Virajith Jalaparti, Chris Douglas, Mainak Ghosh, Ashvin Agrawal, Avrilia Floratou, Srikanth Kandula, Ishai Menache, Joseph (Seffi) Naor, Sriram Rao
ACM Symposium on Cloud Computing (SOCC) | October 2018
Best Paper Award
SOCK: rapid task provisioning with serverless-optimized containers
Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Harter, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
USENIX Annual Technical Conference | July 2018
Survivability of Cloud Databases – Factors and Prediction
Jose Picado, Willis Lang, Edward C. Thayer
International Conference on Management of Data | May 2018
Computation Reuse in Analytics Job Service at Microsoft
Alekh Jindal, Shi Qiao, Hiren Patel, Zhicheng Yin, Jieming Di, Malay Bag, Marc Friedman, Yifung Lin, Konstantinos Karanasos, Sriram Rao
2018 International Conference on Management of Data | May 2018
Selecting subexpressions to materialize at datacenter scale
Alekh Jindal, Konstantinos Karanasos, Sriram Rao, Hiren Patel
Very Large Data Bases (VLDB) | February 2018
Advancements in YARN Resource Manager
Konstantinos Karanasos, Arun Suresh, Chris Douglas
Encyclopedia of Big Data Technologies | February 2018
Characterizing Resource Sensitivity of Database Workloads
Rathijit Sen, Karthik Ramachandra
High-Performance Computer Architecture | January 2018
Medea: Scheduling of Long Running Applications in Shared Production Clusters
Panagiotis Garefalakis, Konstantinos Karanasos, Peter Pietzuch, Arun Suresh, Sriram Rao
European Conference on Computer Systems (EuroSys) | January 2018
Towards High-Performance Prediction Serving Systems
Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Matteo Interlandi, Markus Weimer
31st Conference on Neural Information Processing Systems | December 2017
Apache REEF: Retainable Evaluator Execution Framework
Byung-Gon Chun, Tyson Condie, Yingda Chen, Brian Cho, Andrew Chung, Carlo Curino, Chris Douglas, Beomyeol Jeon, Joo Seong Jeong, Gyewon Lee, Yunseong Lee, Tony Majestro, Dahlia Malkhi, Sergiy Matusevych, Brandon Myers, Mariia Mykhailova, Shravan Narayanamurthy, Joseph Noor, Raghu Ramakrishnan, Sriram Rao, Russell Sears, Beysim Sezgin, Taegeon Um, Julia Wang, Youngseok Yang, Matteo Interlandi
ACM Transactions on Computer Systems | October 2017, Vol 35(2): pp. 5
Energy-Proportional Computing: A New Definition
Rathijit Sen, David A. Wood
IEEE Computer | July 2017, Vol 50(8): pp. 26-33
Frequency governors for cloud database OLTP workloads
Rathijit Sen, Alan Halverson
International Symposium on Low Power Electronics and Design | July 2017
Pareto Governors for Energy-Optimal Computing
Rathijit Sen, David A. Wood
ACM Transactions on Architecture and Code Optimization | April 2017, Vol 14(1): pp. 6
Resource bricolage and resource selection for parallel database systems
Jiexing Li, Jeffrey F. Naughton, Rimma V. Nehme
Very Large Data Bases | January 2017
Morpheus: Towards Automated SLOs for Enterprise Clusters
Sangeetha Abdu Jyothi, Carlo Curino, Ishai Menache, Shravan Matthur Narayanamurthy, Alexey Tumanov, Jonathan Yaniv, Ruslan Mavlyutov, Íñigo Goiri, Subru Krishnan, Janardhan (Jana) Kulkarni, Sriram Rao
2016 International Symposium on Operating Systems Design and Implementation (OSDI) | November 2016
PerfOrator: Eloquent Performance Models for Resource Optimization
Dharmesh Kakadia, Kaushik Rajan, Carlo Curino, Subru Krishnan
ACM Symposium on Cloud Computing 2016 (SoCC’16) | October 2016
Efficient Queue Management for Cluster Scheduling
Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Sriram Rao, Milan Vojnovic
European Conference on Computer Systems (EuroSys) | April 2016
Do the Hard Stuff First: Scheduling Dependent Computations in Data Analytics Clusters
Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan (Jana) Kulkarni
MSR-TR-2016-19 | February 2016
Published by Microsoft
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Konstantinos Karanasos, Sriram Rao, Carlo Curino, Chris Douglas, Kishore Chaliparambil, Giovanni Matteo Fumarola, Solom Heddaya, Raghu Ramakrishnan, Sarvesh Sakalanaga
USENIX Annual Technical Conference (USENIX ATC’2015) | July 2015
Blind men and an elephant coalescing open-source, academic, and industrial perspectives on BigData
Chris Douglas, Carlo Curino
International Conference on Data Engineering | April 2015
WANalytics: Analytics for a Geo-Distributed Data-Intensive World
Ashish Vulimiri, Carlo Curino, Brighten Godfrey, Konstantinos Karanasos, George Varghese
Conference on Innovative Data Systems Research (CIDR) | January 2015
Multi-resource Packing for Cluster Schedulers
Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, Aditya Akella
ACM SIGCOMM | August 2014
Indexing HDFS data in PDW: splitting the data from the index
Vinitha Reddy Gankidi, Nikhil Teletia, Jignesh M. Patel, Alan Halverson, David J. DeWitt
Very Large Data Bases | July 2014
Partial results in database systems
Willis Lang, Rimma V. Nehme, Eric Robinson, Jeffrey F. Naughton
International Conference on Management of Data | June 2014
Distributed and Scalable PCA in the Cloud
Arun Kumar, Vijay Narayanan, Nikos Karampatziakis, Paul Mineiro, Markus Weimer
MSR-TR-2014-165 | January 2014
Published by Microsoft
Apache Reef Research Paper
Reservation-based Scheduling: If You’re Late Don’t Blame Us!
Carlo Curino, Djellel E Difallah, Chris Douglas, Subru Krishnan, Raghu Ramakrishnan, Sriram Rao
MSR-TR-2013-108 | October 2013
Published in SoCC 2014
Split query processing in polybase
David J. DeWitt, Rimma Nehme, Srinath Shankar, Josep Aguilar-Saborit, Artin Avanes, Miro Flasza, Jim Gramling, Alan Halverson
2013 ACM SIGMOD International Conference on Management of Data | June 2013
Towards energy-efficient database cluster design
Willis Lang, Stavros Harizopoulos, Jignesh M. Patel, Mehul A. Shah, Dimitris Tsirogiannis
2012 Very Large Data Bases | August 2012, Vol 5(11): pp. 1684-1695
Toward Progress Indicators on Steroids for Big Data Systems.
Jiexing Li, Rimma V. Nehme, Jeffrey F. Naughton
Conference on Innovative Data Systems Research | January 2012
Rethinking Query Processing for Energy Efficiency: Slowing Down to Win the Race.
Nicolas Bruno, Surajit Chaudhuri, Arnd Christian König, Vivek Narasayya, Ravi Ramamurthy, Manoj Syamala
IEEE Data(base) Engineering Bulletin | November 2011, Vol 34: pp. 12-19
The Mimicking Octopus: Towards a one-size-fits-all Database Architecture
Alekh Jindal
36th International Conference on VLDB | September 2010
Wimpy node clusters: what about non-wimpy workloads?
Willis Lang, Jignesh M. Patel, Srinath Shankar
Data Management on New Hardware | June 2010
Towards Eco-friendly Database Management Systems
Willis Lang, Jignesh M. Patel
Conference on Innovative Data Systems Research | December 2009
-
MLOS in Action: Bridging the Gap Between Experimentation and Auto-Tuning in the Cloud
Brian Kroth, Sergiy Matusevych, Rana Alotaibi, Yiwen Zhu, Anja Gruenheid, Yuanyuan Tian
Proc. VLDB Endow. | October 2024, Vol 17: pp. 4269-4272
Lorentz: Learned SKU Recommendation Using Profile Data
Nick Glaze, Tria McNeely, Yiwen Zhu, Matthew Gleeson, Helen Serr, Rajeev Bhopi, Subru Krishnan
SIGMOD 2024 | May 2024, Vol 2: pp. 149
Intelligent Pooling: Proactive Resource Provisioning in Large-scale Cloud Service
Deepak Ravikumar, Alex Yeo, Yiwen Zhu, Aditya Lakra, Harsha Nagulapalli, Santhosh Ravindran, Steve Suh, Niharika Dutta, Andrew Fogarty, Yoonjae Park, Sumeet Khushalani, Arijit Tarafdar, Kunal Parekh, Subru Krishnan
Proc. VLDB Endow. | February 2024, Vol 17: pp. 1618-1627
Performance Roulette: How Cloud Weather Affects ML-Based System Optimization
Johannes Freischuetz, Konstantinos Kanellis, Brian Kroth, Shivaram Venkataraman
ML for Systems Workshop at NeurIPS 2023 | December 2023
Towards Building Autonomous Data Services on Azure
Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, Subru Krishnan, Ankita Agarwal, Rana Alotaibi, Jesús Camacho-Rodríguez, Bibin A Chundatt, Andrew Chung, Niharika Dutta, Andrew Fogarty, Anja Gruenheid, Brandon Haynes, Matteo Interlandi, Minu Iyer, Nick Jurgens, Sumeet Khushalani, Brian Kroth, Manoj Kumar, Jyoti Leeka, Sergiy Matusevych, Minni Mittal, Andreas C. Müller, Kartheek Muthyala, Harsha Nagulapalli, Yoonjae Park, Hiren Patel, Anna Pavlenko, Olga Poppe, Santhosh Ravindran, Karla Saur, Rathijit Sen, Steve Suh, Arijit Tarafdar, Kunal Waghray, Demin Wang, Carlo Curino, Raghu Ramakrishnan
ACM SIGMOD | June 2023
Runtime Variation in Big Data Analytics
Yiwen Zhu, Rathijit Sen, Robert Horton, John Mark Agosta
ACM SIGMOD | May 2023
The Tensor Data Platform: Towards an AI-centric Database System
Apurva Gandhi, Yuki Asada, Victor Fu, Advitya Gemawat, Lihao Zhang, Rathijit Sen, Carlo Curino, Jesús Camacho-Rodríguez, Matteo Interlandi
CIDR | January 2023
Share the Tensor Tea: How Databases can Leverage the Machine Learning Ecosystem
Yuki Asada, Victor Fu, Apurva Gandhi, Advitya Gemawat, Lihao Zhang, Dong He, Vivek Gupta, Ehi Nosakhare, Dalitso Banda, Rathijit Sen, Matteo Interlandi
VLDB | September 2022, Vol 15(12): pp. 3598-3601
Best demo award
Query Processing on Tensor Computation Runtimes
Dong He, Supun Nakandala, Dalitso Banda, Rathijit Sen, Karla Saur, Kwanghyun Park, Carlo Curino, Jesús Camacho-Rodríguez, Konstantinos Karanasos, Matteo Interlandi
VLDB 2022 | September 2022
Deploying a Steered Query Optimizer in Production at Microsoft
Wangda Zhang, Matteo Interlandi, Paul Mineiro, Shi Qiao, Nasim Ghazanfari, Karlen Lie, Marc Friedman, Rafah Hosn, Hiren Patel, Alekh Jindal
2022 International Conference on Management of Data | July 2022
Data Science Through the Looking Glass
Fotis Psallidas, Yiwen Zhu, Bojan Karlaš, Jordan Henkel, Matteo Interlandi, Subru Krishnan, Brian Kroth, Venkatesh Emani, Wentao Wu, Ce Zhang, Markus Weimer, Avrilia Floratou, Carlo Curino, Konstantinos Karanasos
SIGMOD Record | June 2022, Vol 51(2): pp. 30-37
End-to-end Optimization of Machine Learning Prediction Queries
Kwanghyun Park, Karla Saur, Dalitso Banda, Rathijit Sen, Matteo Interlandi, Konstantinos Karanasos
SIGMOD | June 2022
A Tensor Compiler for Unified Machine Learning Prediction Serving
Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, Matteo Interlandi
Symposium on Operating Systems Design and Implementation (OSDI) | November 2020
AutoToken: predicting peak parallelism for big data analytics at Microsoft
Rathijit Sen, Alekh Jindal, Hiren Patel, Shi Qiao
Very Large Data Bases | July 2020
Automated Tuning of Query Degree of Parallelism via Machine Learning
Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi
2020 International Conference on Management of Data | June 2020
A Comparative Exploration of ML Techniques for Tuning Query Degree of Parallelism.
Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi
May 2020
Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML
Ashvin Agrawal, Rony Chatterjee, Carlo Curino, Avrilia Floratou, Neha Godwal, Matteo Interlandi, Alekh Jindal, Konstantinos Karanasos, Subru Krishnan, Brian Kroth, Jyoti Leeka, Kwanghyun Park, Hiren Patel, Olga Poppe, Fotis Psallidas, Raghu Ramakrishnan, Abhishek Roy, Karla Saur, Rathijit Sen, Markus Weimer, Travis Wright, Yiwen Zhu
Conference on Innovative Data Systems Research (CIDR) | January 2020
Extending Relational Query Processing with ML Inference
Konstantinos Karanasos, Matteo Interlandi, Doris Xin, Fotis Psallidas, Rathijit Sen, Kwanghyun Park, Ivan Popivanov, Supun Nakandala, Subru Krishnan, Markus Weimer, Yuan Yu, Raghu Ramakrishnan, Carlo Curino
Conference on Innovative Data Systems Research (CIDR) | January 2020
Peregrine: Workload Optimization for Cloud Query Engines
Alekh Jindal, Hiren Patel, Abhishek Roy, Shi Qiao, Zhicheng Yin, Rathijit Sen, Subru Krishnan
Symposium on Cloud Computing | November 2019
Griffon: Reasoning about Job Anomalies with Unlabeled Data in Cloud-based Platforms
Liqun Shao, Yiwen Zhu, Siqi Liu, Abhiram Eswaran, Kristin Lieber, Janhavi Mahajan, Minsoo Thigpen, Sudhir Darbha, Subru Krishnan, Soundar Srinivasan, Carlo Curino, Konstantinos Karanasos
Symposium on Cloud Computing (SoCC) | August 2019
Towards a learning optimizer for shared clouds
Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao
Very Large Data Bases | October 2018
Query and Resource Optimization: Bridging the Gap
Lalitha Viswanathan, Alekh Jindal, Konstantinos Karanasos
International Conference on Data Engineering (ICDE) | April 2018
Elastic Distributed Bayesian Collaborative Filtering
Alex Beutel, Markus Weimer, Tom Minka, Yordan Zaykov, Vijay Narayanan
January 2014
Towards Resource-Elastic Machine Learning
Dhruv Mahajan, Sundararajan Sellamanickam, Markus Weimer, Keerthi Selvaraj
January 2013
-
MLOS in Action: Bridging the Gap Between Experimentation and Auto-Tuning in the Cloud
Brian Kroth, Sergiy Matusevych, Rana Alotaibi, Yiwen Zhu, Anja Gruenheid, Yuanyuan Tian
Proc. VLDB Endow. | October 2024, Vol 17: pp. 4269-4272
VASIM: Vertical Autoscaling Simulator Toolkit
Anna Pavlenko, Karla Saur, Yiwen Zhu, Brian Kroth, Joyce Cahoon, Jesús Camacho-Rodríguez
IEEE International Conference on Data Engineering (ICDE 2024) | May 2024
PolySem: Efficient Polyglot Analytics on Semantic Data
Xinyu Liu, Venkatesh Emani, Avrilia Floratou, Joyce Cahoon, Philip Seamark, Carlo Curino
Poly’23: Polystore systems for heterogeneous data in multiple databases with privacy and security assurances | August 2023
PyFroid: Scaling Data Analysis on a Commodity Workstation
Venkatesh Emani, Avrilia Floratou, Carlo Curino
EDBT 2024 | August 2023
Data Science Through the Looking Glass
Fotis Psallidas, Yiwen Zhu, Bojan Karlaš, Jordan Henkel, Matteo Interlandi, Subru Krishnan, Brian Kroth, Venkatesh Emani, Wentao Wu, Ce Zhang, Markus Weimer, Avrilia Floratou, Carlo Curino, Konstantinos Karanasos
SIGMOD Record | June 2022, Vol 51(2): pp. 30-37
Magpie: Python at Speed and Scale using Cloud Backends
Alekh Jindal, Venkatesh Emani, Maureen Daum, Olga Poppe, Brandon Haynes, Anna Pavlenko, Ayushi Gupta, Karthik Ramachandra, Carlo Curino, Andreas C. Müller, Wentao Wu, Hiren Patel
Conference on Innovative Data Systems Research (CIDR 2021) | February 2021
ROSA: R Optimizations with Static Analysis
Rathijit Sen, Jianqiao Zhu, Jignesh M. Patel, Somesh Jha
RIOT 2017 | July 2017
-
LST-Bench: Benchmarking Log-Structured Tables in the Cloud
Jesús Camacho-Rodríguez, Ashvin Agrawal, Anja Gruenheid, Ashit Gosalia, Cristian Petculescu, Josep Aguilar-Saborit, Avrilia Floratou, Carlo Curino, Raghu Ramakrishnan
ACM SIGMOD | June 2024
Sibyl: Forecasting Time-Evolving Query Workloads
Hanxian Huang, Tarique Siddiqui, Rana Alotaibi, Carlo Curino, Jyoti Leeka, Alekh Jindal, Jishen Zhao, Jesús Camacho-Rodríguez, Yuanyuan Tian
SIGMOD | June 2024
Vertically Autoscaling Monolithic Applications with CaaSPER: Scalable Container-as-a-Service Performance Enhanced Resizing Algorithm for the Cloud
Anna Pavlenko, Joyce Cahoon, Yiwen Zhu, Brian Kroth, Michael Nelson, Andrew Carter, David Liao, Travis Wright, Jesús Camacho-Rodríguez, Karla Saur
SIGMOD | June 2024
to appear
Lorentz: Learned SKU Recommendation Using Profile Data
Nick Glaze, Tria McNeely, Yiwen Zhu, Matthew Gleeson, Helen Serr, Rajeev Bhopi, Subru Krishnan
SIGMOD 2024 | May 2024, Vol 2: pp. 149
VASIM: Vertical Autoscaling Simulator Toolkit
Anna Pavlenko, Karla Saur, Yiwen Zhu, Brian Kroth, Joyce Cahoon, Jesús Camacho-Rodríguez
IEEE International Conference on Data Engineering (ICDE 2024) | May 2024
Intelligent Pooling: Proactive Resource Provisioning in Large-scale Cloud Service
Deepak Ravikumar, Alex Yeo, Yiwen Zhu, Aditya Lakra, Harsha Nagulapalli, Santhosh Ravindran, Steve Suh, Niharika Dutta, Andrew Fogarty, Yoonjae Park, Sumeet Khushalani, Arijit Tarafdar, Kunal Parekh, Subru Krishnan
Proc. VLDB Endow. | February 2024, Vol 17: pp. 1618-1627
GEqO: ML-Accelerated Semantic Equivalence Detection
Brandon Haynes, Rana Alotaibi, Anna Pavlenko, Jyoti Leeka, Alekh Jindal, Yuanyuan Tian
Proceedings of the ACM on Management of Data | December 2023
Optimizing Data Pipelines for Machine Learning in Feature Stores
Rui Liu, Kwanghyun Park, Fotis Psallidas, Xiaoyong Zhu, Jinghui Mo, Rathijit Sen, Matteo Interlandi, Konstantinos Karanasos, Yuanyuan Tian, Jesús Camacho-Rodríguez
Proc. VLDB Endow. | August 2023, Vol 16: pp. 4230-4239
PolySem: Efficient Polyglot Analytics on Semantic Data
Xinyu Liu, Venkatesh Emani, Avrilia Floratou, Joyce Cahoon, Philip Seamark, Carlo Curino
Poly’23: Polystore systems for heterogeneous data in multiple databases with privacy and security assurances | August 2023
PyFroid: Scaling Data Analysis on a Commodity Workstation
Venkatesh Emani, Avrilia Floratou, Carlo Curino
EDBT 2024 | August 2023
OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs
Fotis Psallidas, Ashvin Agrawal, Chandru Sugunan, Khaled Ibrahim, Konstantinos Karanasos, Jesús Camacho-Rodríguez, Avrilia Floratou, Carlo Curino, Raghu Ramakrishnan
Proc. VLDB Endow. | August 2023, Vol 16: pp. 3662-3675
Demonstration of Geyser: Provenance Extraction and Applications over Data Science Scripts
Fotis Psallidas, Megan Eileen Leszczynski, Mohammad Hossein Namaki, Avrilia Floratou, Ashvin Agrawal, Konstantinos Karanasos, Subru Krishnan, Pavle Subotić, Markus Weimer, Yinghui Wu, Yiwen Zhu
ACM SIGMOD | July 2023
Exploiting Structure in Regular Expression Queries
Ling Zhang, Shaleen Deep, Avrilia Floratou, Anja Gruenheid, Jignesh M. Patel, Yiwen Zhu
ACM SIGMOD | July 2023
A Deep Dive into Common Open Formats for Analytical DBMSs
Chunwei Liu, Anna Pavlenko, Matteo Interlandi, Brandon Haynes
Proc. VLDB Endow. | June 2023, Vol 16: pp. 3044-3056
Best Paper Runner-Up
Query Processing on Gaming Consoles
Wei Cui, Qianxi Zhang, Jesús Camacho-Rodríguez, Spyros Blanas, Brandon Haynes, Yinan Li, Peng Cheng, Ravishankar Ramamurthy, Rathijit Sen, Matteo Interlandi
June 2023
Towards Building Autonomous Data Services on Azure
Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, Subru Krishnan, Ankita Agarwal, Rana Alotaibi, Jesús Camacho-Rodríguez, Bibin A Chundatt, Andrew Chung, Niharika Dutta, Andrew Fogarty, Anja Gruenheid, Brandon Haynes, Matteo Interlandi, Minu Iyer, Nick Jurgens, Sumeet Khushalani, Brian Kroth, Manoj Kumar, Jyoti Leeka, Sergiy Matusevych, Minni Mittal, Andreas C. Müller, Kartheek Muthyala, Harsha Nagulapalli, Yoonjae Park, Hiren Patel, Anna Pavlenko, Olga Poppe, Santhosh Ravindran, Karla Saur, Rathijit Sen, Steve Suh, Arijit Tarafdar, Kunal Waghray, Demin Wang, Carlo Curino, Raghu Ramakrishnan
ACM SIGMOD | June 2023
Runtime Variation in Big Data Analytics
Yiwen Zhu, Rathijit Sen, Robert Horton, John Mark Agosta
ACM SIGMOD | May 2023
Schema Matching using Pre-Trained Language Models
Yunjia Zhang, Avrilia Floratou, Joyce Cahoon, Subru Krishnan, Andreas C. Müller, Dalitso Banda, Fotis Psallidas, Jignesh M. Patel
ICDE | January 2023
The Fine-Grained Complexity of CFL Reachability
Paraschos Koutris, Shaleen Deep
Principles of Programming Languages (POPL 2023) | January 2023
The Tensor Data Platform: Towards an AI-centric Database System
Apurva Gandhi, Yuki Asada, Victor Fu, Advitya Gemawat, Lihao Zhang, Rathijit Sen, Carlo Curino, Jesús Camacho-Rodríguez, Matteo Interlandi
CIDR | January 2023
DIAMETRICS: Benchmarking Query Engines at Scale
Shaleen Deep, Anja Gruenheid, Kruthi Nagaraj, Hiro Naito, Jeffrey Naughton, Stratis Viglas
Communications of The ACM | December 2022, Vol 65(12): pp. 105-112
Research Highlight
Comprehensive and Efficient Workload Summarization
Shaleen Deep, Anja Gruenheid, Paraschos Koutris, Stratis Viglas, Jeffrey Naughton
Datenbank-Spektrum | November 2022
Research Highlight
Diversity and Inclusion Activities in Database Conferences: A 2021 Report.
Sihem Amer-Yahia, Yael Amsterdamer, Sourav S. Bhowmick, Angela Bonifati, Philippe Bonnet, Renata Borovica-Gajic, Barbara Catania, Tania Cerquitelli, Silvia Chiusano, Panos K. Chrysanthis, Carlo Curino, Jérôme Darmont, Amr El Abbadi, Avrilia Floratou, Juliana Freire, Alekh Jindal, Vana Kalogeraki, Georgia Koutrika, Arun Kumar, Sujaya Maiyya, Alexandra Meliou, Madhulika Mohanty, Felix Naumann, Nele Sina Noack, Liat Peterfreund, Fatma Özcan, Wenny Rahayu, Wang-Chiew Tan, Yuanyuan Tian, Pinar Tözün, Genoveva Vargas-Solar, Neeraja J. Yadwadkar, Meihui Zhang
SIGMOD Record | November 2022
Share the Tensor Tea: How Databases can Leverage the Machine Learning Ecosystem
Yuki Asada, Victor Fu, Apurva Gandhi, Advitya Gemawat, Lihao Zhang, Dong He, Vivek Gupta, Ehi Nosakhare, Dalitso Banda, Rathijit Sen, Matteo Interlandi
VLDB | September 2022, Vol 15(12): pp. 3598-3601
Best demo award
Query Processing on Tensor Computation Runtimes
Dong He, Supun Nakandala, Dalitso Banda, Rathijit Sen, Karla Saur, Kwanghyun Park, Carlo Curino, Jesús Camacho-Rodríguez, Konstantinos Karanasos, Matteo Interlandi
VLDB 2022 | September 2022
Pipemizer: An Optimizer for Analytics Data Pipelines
Sunny Gakhar, Joyce Cahoon, Wangchao Le, Xiangnan Li, Kaushik Ravichandran, Hiren Patel, Marc Friedman, Brandon Haynes, Shi Qiao, Alekh Jindal, Jyoti Leeka
PVLDB | September 2022
Containerized Execution of UDFs: An Experimental Evaluation
Karla Saur, Tara Mirmira, Konstantinos Karanasos, Jesús Camacho-Rodríguez
VLDB 2022 | September 2022
Doppler: Automated SKU Recommendation in Migrating SQL Workloads to the Cloud
Joyce Cahoon, Wenjing Wang, Yiwen Zhu, Katherine Lin, Sean Liu, Raymond Truong, Neetu Singh, Chengcheng Wan, Alexandra M Ciortea, Sreraman Narasimhan, Subru Krishnan
VLDB 2022 | August 2022
Deploying a Steered Query Optimizer in Production at Microsoft
Wangda Zhang, Matteo Interlandi, Paul Mineiro, Shi Qiao, Nasim Ghazanfari, Karlen Lie, Marc Friedman, Rafah Hosn, Hiren Patel, Alekh Jindal
2022 International Conference on Management of Data | July 2022
VIP Hashing — Adapting to Skew in Popularity of Data on the Fly
Aarati Kakaraparthy, Jignesh M. Patel, Brian Kroth, Kwanghyun Park
VLDB 2022 | June 2022
Data Science Through the Looking Glass
Fotis Psallidas, Yiwen Zhu, Bojan Karlaš, Jordan Henkel, Matteo Interlandi, Subru Krishnan, Brian Kroth, Venkatesh Emani, Wentao Wu, Ce Zhang, Markus Weimer, Avrilia Floratou, Carlo Curino, Konstantinos Karanasos
SIGMOD Record | June 2022, Vol 51(2): pp. 30-37
End-to-end Optimization of Machine Learning Prediction Queries
Kwanghyun Park, Karla Saur, Dalitso Banda, Rathijit Sen, Matteo Interlandi, Konstantinos Karanasos
SIGMOD | June 2022
LlamaTune: Sample-Efficient DBMS Configuration Tuning
Konstantinos Kanellis, Cong Ding, Brian Kroth, Andreas C. Müller, Carlo Curino, Shivaram Venkataraman
VLDB 2022 | March 2022
Phoebe: A Learning-based Checkpoint Optimizer
Yiwen Zhu, Matteo Interlandi, Abhishek Roy, Krishnadhan Das, Hiren Patel, Malay Bag, Hitesh Sharma, Alekh Jindal
VLDB 2021 | August 2021
Steering Query Optimizers: A Practical Take on Big Data Workloads
Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, Alekh Jindal
2021 International Conference on Management of Data | June 2021
Honorable mention for the Industry Track
HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries
Rana Alotaibi, Bogdan Cautis, Alin Deutsch, Ioana Manolescu
2021 International Conference on Management of Data | June 2021
KEA: Tuning an Exabyte-Scale Data Infrastructure
Yiwen Zhu, Subru Krishnan, Konstantinos Karanasos, Isha Tarte, Conor Power, Abhishek Modi, Manoj Kumar, Deli Zhang, Kartheek Muthyala, Nick Jurgens, Sarvesh Sakalanaga, Sudhir Darbha, Minu Iyer, Ankita Agarwal, Carlo Curino
SIGMOD | June 2021
Microlearner: A fine-grained Learning Optimizer for Big Data Workloads at Microsoft
Alekh Jindal, Shi Qiao, Rathijit Sen, Hiren Patel
2021 International Conference on Data Engineering | April 2021
Property Graph Schema Optimization for Domain-Specific Knowledge Graphs
Rana Alotaibi, Chuan Lei, Abdul Quamar, Vasilis Efthymiou, Fatma Ozcan
ICDE | April 2021
FPGA for Aggregate Processing: The Good, The Bad, and The Ugly
Zubeyr F. Eryilmaz, Aarati Kakaraparthy, Jignesh M. Patel, Rathijit Sen, Kwanghyun Park
International Conference on Data Engineering (ICDE) | April 2021
Production Experiences from Computation Reuse at Microsoft
Alekh Jindal, Shi Qiao, Hiren Patel, Abhishek Roy, Jyoti Leeka, Brandon Haynes
2021 Extending Database Technology | March 2021
Hardware Acceleration for DBMS Machine Learning Scoring: Is It Worth the Overheads?
Zahra Azad, Rathijit Sen, Kwanghyun Park, Ajay Joshi
International Symposium on Performance Analysis of Systems and Software (ISPASS) | March 2021
Magpie: Python at Speed and Scale using Cloud Backends
Alekh Jindal, Venkatesh Emani, Maureen Daum, Olga Poppe, Brandon Haynes, Anna Pavlenko, Ayushi Gupta, Karthik Ramachandra, Carlo Curino, Andreas C. Müller, Wentao Wu, Hiren Patel
Conference on Innovative Data Systems Research (CIDR 2021) | February 2021
Applied Research Lessons from CloudViews Project
Alekh Jindal
SIGMOD Record | December 2020, Vol 49(3): pp. 37-42
Unearthing inter-job dependencies for better cluster scheduling
Andrew Chung, Subru Krishnan, Konstantinos Karanasos, Carlo Curino, Gregory R. Ganger
Symposium on Operating Systems Design and Implementation (OSDI) | November 2020
Vamsa: Automated Provenance Tracking in Data Science Scripts
Mohammad Hossein Namaki, Avrilia Floratou, Fotis Psallidas, Subru Krishnan, Ashvin Agrawal, Yiwen Zhu, Markus Weimer, Yinghui Wu
KDD | August 2020
ESTOCADA: Towards Scalable Polystore Systems
Rana Alotaibi, Bogdan Cautis, Alin Deutsch, M. Latrache, Ioana Manolescu, Y. Yang
VLDB | July 2020
AutoToken: predicting peak parallelism for big data analytics at Microsoft
Rathijit Sen, Alekh Jindal, Hiren Patel, Shi Qiao
Very Large Data Bases | July 2020
Towards Plan-aware Resource Allocation in Serverless Query Processing
Malay Bag, Alekh Jindal, Hiren Patel
USENIX conference on Hot Topics in Cloud Computing | July 2020
Lessons learned from the early performance evaluation of Intel Optane DC persistent memory in DBMS
Yinjun Wu, Kwanghyun Park, Rathijit Sen, Brian Kroth, Jae Young Do
2020 Data Management on New Hardware | June 2020
Automated Tuning of Query Degree of Parallelism via Machine Learning
Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi
2020 International Conference on Management of Data | June 2020
A Comparative Exploration of ML Techniques for Tuning Query Degree of Parallelism.
Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi
May 2020
Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings
Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, Wangchao Le
SIGMOD 2020: International Conference on Management of Data | February 2020
Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML
Ashvin Agrawal, Rony Chatterjee, Carlo Curino, Avrilia Floratou, Neha Godwal, Matteo Interlandi, Alekh Jindal, Konstantinos Karanasos, Subru Krishnan, Brian Kroth, Jyoti Leeka, Kwanghyun Park, Hiren Patel, Olga Poppe, Fotis Psallidas, Raghu Ramakrishnan, Abhishek Roy, Karla Saur, Rathijit Sen, Markus Weimer, Travis Wright, Yiwen Zhu
Conference on Innovative Data Systems Research (CIDR) | January 2020
Extending Relational Query Processing with ML Inference
Konstantinos Karanasos, Matteo Interlandi, Doris Xin, Fotis Psallidas, Rathijit Sen, Kwanghyun Park, Ivan Popivanov, Supun Nakandala, Subru Krishnan, Markus Weimer, Yuan Yu, Raghu Ramakrishnan, Carlo Curino
Conference on Innovative Data Systems Research (CIDR) | January 2020
Optimizing databases by learning hidden parameters of solid state drives
Aarati Kakaraparthy, Jignesh M. Patel, Kwanghyun Park, Brian Kroth
Proceedings of VLDB | December 2019
Big Data Processing at Microsoft: Hyper Scale, Massive Complexity, and Minimal Cost
Hiren Patel, Alekh Jindal, Clemens Szyperski
Symposium on Cloud Computing | November 2019
Peregrine: Workload Optimization for Cloud Query Engines
Alekh Jindal, Hiren Patel, Abhishek Roy, Shi Qiao, Zhicheng Yin, Rathijit Sen, Subru Krishnan
Symposium on Cloud Computing | November 2019
BlackMagic: Automatic Inlining of Scalar UDFs into SQL Queries with Froid
Karthik Ramachandra, Kwanghyun Park
Proceedings of VLDB | August 2019, Vol 12(12): pp. 1810-1813
Griffon: Reasoning about Job Anomalies with Unlabeled Data in Cloud-based Platforms
Liqun Shao, Yiwen Zhu, Siqi Liu, Abhiram Eswaran, Kristin Lieber, Janhavi Mahajan, Minsoo Thigpen, Sudhir Darbha, Subru Krishnan, Soundar Srinivasan, Carlo Curino, Konstantinos Karanasos
Symposium on Cloud Computing (SoCC) | August 2019
SparkCruise: handsfree computation reuse in Spark
Abhishek Roy, Alekh Jindal, Hiren Patel, Ashit Gosalia, Subru Krishnan, Carlo Curino
Very Large Data Bases | July 2019
Exploiting Intel Optane SSD for Microsoft SQL Server
Kan Wu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Rathijit Sen, Kwanghyun Park
DaMoN 2019 | July 2019
Towards Scalable Hybrid Stores: Constraint-Based Rewriting to the Rescue
Rana Alotaibi, Damian Bursztyn, Alin Deutsch, Ioana Manolescu, Stamatis Zampetakis
SIGMOD | June 2019
Peering through the Dark: An Owl’s View of Inter-job Dependencies and Jobs’ Impact in Shared Clusters
Andrew Chung, Carlo Curino, Subru Krishnan, Konstantinos Karanasos, Panagiotis Garefalakis, Gregory R. Ganger
International Conference on Management of Data (SIGMOD) | June 2019
Query and Resource Optimizations: A Case for Breaking the Wall in Big Data Systems
Alekh Jindal, Lalitha Viswanathan, Konstantinos Karanasos
MSR-TR-2019-44 | June 2019
Published by Microsoft
Constant Time Recovery in Azure SQL Database
Panagiotis Antonopoulos, Peter Byrne, Wayne Chen, Cristian Diaconu, Raghavendra Thallam Kodandaramaih, Hanuma Kodavalla, Prashanth Purnananda, Adrian-Leonard Radu, Chaitanya Sreenivas Ravella, Girish Mittur Venkataramanappa
June 2019
Hydra: a federated resource manager for data-center scale analytics
Carlo Curino, Subru Krishnan, Konstantinos Karanasos, Sriram Rao, Giovanni M. Fumarola, Botong Huang, Kishore Chaliparambil, Arun Suresh, Young Chen, Solom Heddaya, Roni Burd, Sarvesh Sakalanaga, Chris Douglas, Bill Ramsey, Raghu Ramakrishnan
Symposium on Networked Systems Design and Implementation (NSDI) | February 2019
Towards a learning optimizer for shared clouds
Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao
Very Large Data Bases | October 2018
Dhalion in action: automatic management of streaming applications
Avrilia Floratou, Ashvin Agrawal
Proceedings of the VLDB Endowment | August 2018, Vol 11(12)
Challenges and Opportunities in Transportation Data
Kristin Tufte, Kushal Datta, Alekh Jindal, David Maier, Robert L. Bertini
June 2018
Computation Reuse in Analytics Job Service at Microsoft
Alekh Jindal, Shi Qiao, Hiren Patel, Zhicheng Yin, Jieming Di, Malay Bag, Marc Friedman, Yifung Lin, Konstantinos Karanasos, Sriram Rao
2018 International Conference on Management of Data | May 2018
Columnar Storage Formats
Encyclopedia of Big Data Technologies | Published by Springer | 2018
Query and Resource Optimization: Bridging the Gap
Lalitha Viswanathan, Alekh Jindal, Konstantinos Karanasos
International Conference on Data Engineering (ICDE) | April 2018
Selecting subexpressions to materialize at datacenter scale
Alekh Jindal, Konstantinos Karanasos, Sriram Rao, Hiren Patel
Very Large Data Bases (VLDB) | February 2018
Robust Data Partitioning
Alekh Jindal, Anil Shanbhag, Yi Lu
February 2018
Characterizing Resource Sensitivity of Database Workloads
Rathijit Sen, Karthik Ramachandra
High-Performance Computer Architecture | January 2018
Froid: Optimization of Imperative Programs in a Relational Database
Karthik Ramachandra, Kwanghyun Park, K. Venkatesh Emani, Alan Halverson, Cesar Galindo-Legaria, Conor Cunningham
Proceedings of VLDB | December 2017, Vol 11(4)
No data left behind: real-time insights from a complex data ecosystem
Manos Karpathiotakis, Avrilia Floratou, Fatma Ozcan, Anastasia Ailamaki
SOCC | September 2017
A robust partitioning scheme for ad-hoc query workloads
Anil Shanbhag, Alekh Jindal, Samuel Madden, Jorge Quiane, Aaron J. Elmore
Symposium on Cloud Computing | September 2017
Self-Regulating Streaming Systems: Challenges and Opportunities
Avrilia Floratou, Ashvin Agrawal
BIRTE | August 2017
Dhalion: Self-Regulating Stream Processing in Heron
Avrilia Floratou, Ashvin Agrawal, Bill Graham, Sriram Rao, Karthik Ramasamy
Proceedings of the VLDB Endowment | August 2017
Frequency governors for cloud database OLTP workloads
Rathijit Sen, Alan Halverson
International Symposium on Low Power Electronics and Design | July 2017
Pipsqueak: Lean Lambdas with Large Libraries
Edward Oakes, Leon Yang, Kevin Houck, Tyler Harter, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
International Conference on Distributed Computing Systems Workshops | July 2017
ROSA: R Optimizations with Static Analysis
Rathijit Sen, Jianqiao Zhu, Jignesh M. Patel, Somesh Jha
RIOT 2017 | July 2017
Twitter Heron: Towards Extensible Streaming Engines
Maosong Fu, Ashvin Agrawal, Avrilia Floratou, Bill Graham, Andrew Jorgensen, Mark Li, Neng Lu, Karthik Ramasamy, Sriram Rao, Cong Wang
ICDE | May 2017
Pareto Governors for Energy-Optimal Computing
Rathijit Sen, David A. Wood
ACM Transactions on Architecture and Code Optimization | April 2017, Vol 14(1): pp. 6
INGESTBASE: A Declarative Data Ingestion System
Alekh Jindal, Jorge-Arnulfo Quiané-Ruiz, Samuel Madden
MSR-TR-2017-62 | January 2017
Published by Microsoft
AdaptDB: adaptive partitioning for distributed joins
Yi Lu, Anil Shanbhag, Alekh Jindal, Samuel Madden
Very Large Data Bases | January 2017
Amoeba: a shape changing storage system for big data
Anil Shanbhag, Alekh Jindal, Yi Lu, Samuel Madden
Very Large Data Bases | August 2016
GraphFrames: an integrated API for mixing graph and relational queries
Ankur Dave, Alekh Jindal, Li Erran Li, Reynold Xin, Joseph Gonzalez, Matei Zaharia
GRADES | June 2016
WANalytics: Geo-Distributed Analytics for a Data Intensive World
Ashish Vulimiri, Carlo Curino, Philip Brighten Godfrey, Thomas Jungblut, Konstantinos Karanasos, Jitendra Padhye, George Varghese
International Conference on Management of Data (SIGMOD) | May 2015
Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications
Bikas Saha, Hitesh Shah, Siddharth Seth, Gopal Vijayaraghavan, Arun Murthy, Carlo Curino
SIGMOD ’15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data | May 2015
Global Analytics in the Face of Bandwidth and Regulatory Constraints
Ashish Vulimiri, Carlo Curino, P. Brighten Godfrey, Thomas Jungblut, Jitu Padhye, George Varghese
12th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’15) | May 2015
ISBN: 978-1-931971-218
Dynamically Optimizing Queries over Large Scale Data Platforms
Konstantinos Karanasos, Andrey Balmin, Marcel Kutsch, Fatma Ozcan, Vuk Ercegovac, Chunyang Xia, Jesse Jackson
International Conference on Management of Data (SIGMOD) | June 2014
Delta: Scalable Data Dissemination under Capacity Constraints
Konstantinos Karanasos, Asterios Katsifodimos, Ioana Manolescu
Very Large Data Bases (PVLDB) | December 2013
Reservation-based Scheduling: If You’re Late Don’t Blame Us!
Carlo Curino, Djellel E Difallah, Chris Douglas, Subru Krishnan, Raghu Ramakrishnan, Sriram Rao
MSR-TR-2013-108 | October 2013
Published in SoCC 2014
Growing Triples on Trees: an XML-RDF Hybrid Model for Annotated Documents
Francois Goasdoue, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu, Stamatis Zampetakis
Very Large Data Bases Journal (VLDB J.) | June 2013, Vol 22: pp. 589-613
Fact Checking and Analyzing the Web
Francois Goasdoue, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu, Stamatis Zampetakis
International Conference on Management of Data (SIGMOD) | June 2013
Fast peak-to-peak behavior with SSD buffer pool
Jaeyoung Do, Donghui Zhang, J. M. Patel, D. J. DeWitt
International Conference on Data Engineering | April 2013
Towards energy-efficient database cluster design
Willis Lang, Stavros Harizopoulos, Jignesh M. Patel, Mehul A. Shah, Dimitris Tsirogiannis
2012 Very Large Data Bases | August 2012, Vol 5(11): pp. 1684-1695
Query optimization in Microsoft SQL Server PDW
Srinath Shankar, Rimma Nehme, Josep Aguilar-Saborit, Andrew Chung, Mostafa Elhemali, Eric Robinson, Mahadevan Sankara Subramanian, David DeWitt, César Galindo-Legaria, Alan Halverson
International Conference on Management of Data | May 2012
GSLPI: A Cost-Based Query Progress Indicator
Jiexing Li, Rimma V. Nehme, Jeffrey Naughton
International Conference on Data Engineering | March 2012
View Selection in Semantic Web Databases
Francois Goasdoue, Konstantinos Karanasos, Julien Leblay, Ioana Manolescu
Very Large Data Bases (PVLDB) | October 2011, Vol 5: pp. 97-108
On energy management, load balancing and replication
Willis Lang, Jignesh M. Patel, Jeffrey F. Naughton
International Conference on Management of Data | June 2010
-
OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs
Fotis Psallidas, Ashvin Agrawal, Chandru Sugunan, Khaled Ibrahim, Konstantinos Karanasos, Jesus Camacho-Rodriguez, A. Floratou, C. Curino, R. Ramakrishnan
Proc. VLDB Endow. | August 2023, Vol 16: pp. 3662-3675
A Tensor Compiler for Unified Machine Learning Prediction Serving
Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, Matteo Interlandi
Symposium on Operating Systems Design and Implementation (OSDI) | November 2020
Netco: Cache and I/O Management for Analytics over Disaggregated Stores
Virajith Jalaparti, Chris Douglas, Mainak Ghosh, Ashvin Agrawal, Avrilia Floratou, Srikanth Kandula, Ishai Menache, Joseph (Seffi) Naor, Sriram Rao
ACM Symposium on Cloud Computing (SOCC) | October 2018
Best Paper Award
Batch-Expansion Training: An Efficient Optimization Framework
Michał Dereziński, Dhruv Mahajan, S. Sathiya Keerthi, S. V. N. Vishwanathan, Markus Weimer
International Conference on Artificial Intelligence and Statistics | February 2018
Towards Geo-Distributed Machine Learning
Ignacio Cano, Dhruv Mahajan, Giovanni Matteo Fumarola, Arvind Krishnamurthy, Markus Weimer, Carlo Curino
IEEE Data(base) Engineering Bulletin | December 2015, Vol 40: pp. 41-59
Partial results in database systems
Willis Lang, Rimma V. Nehme, Eric Robinson, Jeffrey F. Naughton
International Conference on Management of Data | June 2014
Towards Multi-Tenant Performance SLOs
Willis Lang, Srinath Shankar, Jignesh M. Patel, Ajay Kalhan
IEEE Transactions on Knowledge and Data Engineering | May 2014, Vol 26(6): pp. 1447-1463
Declarative Systems for Large-Scale Machine Learning
Vinayak R. Borkar, Yingyi Bu, Michael J. Carey, Joshua Rosen, Neoklis Polyzotis, Tyson Condie, Markus Weimer, Raghu Ramakrishnan
IEEE Data(base) Engineering Bulletin | July 2012, Vol 35: pp. 24-32
WWW 2012 Tutorial: New Templates for Scalable Data Analysis
Amr Ahmed, Alexander J. Smola, Markus Weimer
25th International World Wide Web Conference | April 2012
Turbocharging DBMS buffer pool using SSDs
Jaeyoung Do, Donghui Zhang, Jignesh M. Patel, David J. DeWitt, Jeffrey F. Naughton, Alan Halverson
International Conference on Management of Data | June 2011
Automated partitioning design in parallel database systems
Rimma Nehme, Nicolas Bruno
International Conference on Management of Data | June 2011
Energy management for MapReduce clusters
Willis Lang, Jignesh M. Patel
Very Large Data Bases | August 2010
-
OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs
Fotis Psallidas, Ashvin Agrawal, Chandru Sugunan, Khaled Ibrahim, Konstantinos Karanasos, Jesus Camacho-Rodriguez, A. Floratou, C. Curino, R. Ramakrishnan
Proc. VLDB Endow. | August 2023, Vol 16: pp. 3662-3675
Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques
Jiexing Li, Arnd Christian König, Vivek Narasayya, Surajit Chaudhuri
38th International Conference on Very Large Databases | August 2012
-
OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs
Fotis Psallidas, Ashvin Agrawal, Chandru Sugunan, Khaled Ibrahim, Konstantinos Karanasos, Jesus Camacho-Rodriguez, A. Floratou, C. Curino, R. Ramakrishnan
Proc. VLDB Endow. | August 2023, Vol 16: pp. 3662-3675
-
FPGA for Aggregate Processing: The Good, The Bad, and The Ugly
Zubeyr F. Eryilmaz, Aarati Kakaraparthy, Jignesh M. Patel, Rathijit Sen, Kwanghyun Park
International Conference on Data Engineering (ICDE) | April 2021
Hardware Acceleration for DBMS Machine Learning Scoring: Is It Worth the Overheads?
Zahra Azad, Rathijit Sen, Kwanghyun Park, Ajay Joshi
International Symposium on Performance Analysis of Systems and Software (ISPASS) | March 2021
A Tensor Compiler for Unified Machine Learning Prediction Serving
Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, Matteo Interlandi
Symposium on Operating Systems Design and Implementation (OSDI) | November 2020
Optimizing databases by learning hidden parameters of solid state drives
Aarati Kakaraparthy, Jignesh M. Patel, Kwanghyun Park, Brian Kroth
Proceedings of VLDB | December 2019
Publications by Type
-
TUNA: Tuning Unstable and Noisy Cloud Applications
Johannes Freischuetz, Konstantinos Kanellis, Brian Kroth, Shivaram Venkataraman
Eurosys 2025 | March 2025
LST-Bench: Benchmarking Log-Structured Tables in the Cloud
Jesús Camacho-Rodríguez, Ashvin Agrawal, Anja Gruenheid, Ashit Gosalia, Cristian Petculescu, Josep Aguilar-Saborit, Avrilia Floratou, Carlo Curino, Raghu Ramakrishnan
ACM SIGMOD | June 2024
Sibyl: Forecasting Time-Evolving Query Workloads
Hanxian Huang, Tarique Siddiqui, Rana Alotaibi, Carlo Curino, Jyoti Leeka, Alekh Jindal, Jishen Zhao, Jesús Camacho-Rodríguez, Yuanyuan Tian
SIGMOD | June 2024
Vertically Autoscaling Monolithic Applications with CaaSPER: Scalable Container-as-a-Service Performance Enhanced Resizing Algorithm for the Cloud
Anna Pavlenko, Joyce Cahoon, Yiwen Zhu, Brian Kroth, Michael Nelson, Andrew Carter, David Liao, Travis Wright, Jesús Camacho-Rodríguez, Karla Saur
SIGMOD | June 2024
to appear
VASIM: Vertical Autoscaling Simulator Toolkit
Anna Pavlenko, Karla Saur, Yiwen Zhu, Brian Kroth, Joyce Cahoon, Jesús Camacho-Rodríguez
IEEE International Conference on Data Engineering (ICDE 2024) | May 2024
Performance Roulette: How Cloud Weather Affects ML-Based System Optimization
Johannes Freischuetz, Konstantinos Kanellis, Brian Kroth, Shivaram Venkataraman
ML for Systems Workshop at NeurIPS 2023 | December 2023
GEqO: ML-Accelerated Semantic Equivalence Detection
Brandon Haynes, Rana Alotaibi, Anna Pavlenko, Jyoti Leeka, Alekh Jindal, Yuanyuan Tian
Proceedings of the ACM on Management of Data | December 2023
PolySem: Efficient Polyglot Analytics on Semantic Data
Xinyu Liu, Venkatesh Emani, Avrilia Floratou, Joyce Cahoon, Philip Seamark, Carlo Curino
Poly’23: Polystore systems for heterogeneous data in multiple databases with privacy and security assurances | August 2023
PyFroid: Scaling Data Analysis on a Commodity Workstation
Venkatesh Emani, Avrilia Floratou, Carlo Curino
EDBT 2024 | August 2023
Demonstration of Geyser: Provenance Extraction and Applications over Data Science Scripts
Fotis Psallidas, Megan Eileen Leszczynski, Mohammad Hossein Namaki, Avrilia Floratou, Ashvin Agrawal, Konstantinos Karanasos, Subru Krishnan, Pavle Subotić, Markus Weimer, Yinghui Wu, Yiwen Zhu
ACM SIGMOD | July 2023
Exploiting Structure in Regular Expression Queries
Ling Zhang, Shaleen Deep, Avrilia Floratou, Anja Gruenheid, Jignesh M. Patel, Yiwen Zhu
ACM SIGMOD | July 2023
Query Processing on Gaming Consoles
Wei Cui, Qianxi Zhang, Jesús Camacho-Rodríguez, Spyros Blanas, Brandon Haynes, Yinan Li, Peng Cheng, Ravishankar Ramamurthy, Rathijit Sen, Matteo Interlandi
June 2023
Towards Building Autonomous Data Services on Azure
Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, Subru Krishnan, Ankita Agarwal, Rana Alotaibi, Jesús Camacho-Rodríguez, Bibin A Chundatt, Andrew Chung, Niharika Dutta, Andrew Fogarty, Anja Gruenheid, Brandon Haynes, Matteo Interlandi, Minu Iyer, Nick Jurgens, Sumeet Khushalani, Brian Kroth, Manoj Kumar, Jyoti Leeka, Sergiy Matusevych, Minni Mittal, Andreas C. Müller, Kartheek Muthyala, Harsha Nagulapalli, Yoonjae Park, Hiren Patel, Anna Pavlenko, Olga Poppe, Santhosh Ravindran, Karla Saur, Rathijit Sen, Steve Suh, Arijit Tarafdar, Kunal Waghray, Demin Wang, Carlo Curino, Raghu Ramakrishnan
ACM SIGMOD | June 2023
Runtime Variation in Big Data Analytics
Yiwen Zhu, Rathijit Sen, Robert Horton, John Mark Agosta
ACM SIGMOD | May 2023
Schema Matching using Pre-Trained Language Models
Yunjia Zhang, Avrilia Floratou, Joyce Cahoon, Subru Krishnan, Andreas C. Müller, Dalitso Banda, Fotis Psallidas, Jignesh M. Patel
ICDE | January 2023
The Fine-Grained Complexity of CFL Reachability
Paraschos Koutris, Shaleen Deep
Principles of Programming Languages (POPL 2023) | January 2023
The Tensor Data Platform: Towards an AI-centric Database System
Apurva Gandhi, Yuki Asada, Victor Fu, Advitya Gemawat, Lihao Zhang, Rathijit Sen, Carlo Curino, Jesús Camacho-Rodríguez, Matteo Interlandi
CIDR | January 2023
Query Processing on Tensor Computation Runtimes
Dong He, Supun Nakandala, Dalitso Banda, Rathijit Sen, Karla Saur, Kwanghyun Park, Carlo Curino, Jesús Camacho-Rodríguez, Konstantinos Karanasos, Matteo Interlandi
VLDB 2022 | September 2022
Pipemizer: An Optimizer for Analytics Data Pipelines
Sunny Gakhar, Joyce Cahoon, Wangchao Le, Xiangnan Li, Kaushik Ravichandran, Hiren Patel, Marc Friedman, Brandon Haynes, Shi Qiao, Alekh Jindal, Jyoti Leeka
PVLDB | September 2022
Containerized Execution of UDFs: An Experimental Evaluation
Karla Saur, Tara Mirmira, Konstantinos Karanasos, Jesús Camacho-Rodríguez
VLDB 2022 | September 2022
Doppler: Automated SKU Recommendation in Migrating SQL Workloads to the Cloud
Joyce Cahoon, Wenjing Wang, Yiwen Zhu, Katherine Lin, Sean Liu, Raymond Truong, Neetu Singh, Chengcheng Wan, Alexandra M Ciortea, Sreraman Narasimhan, Subru Krishnan
VLDB 2022 | August 2022
Deploying a Steered Query Optimizer in Production at Microsoft
Wangda Zhang, Matteo Interlandi, Paul Mineiro, Shi Qiao, Nasim Ghazanfari, Karlen Lie, Marc Friedman, Rafah Hosn, Hiren Patel, Alekh Jindal
2022 International Conference on Management of Data | July 2022
VIP Hashing — Adapting to Skew in Popularity of Data on the Fly
Aarati Kakaraparthy, Jignesh M. Patel, Brian Kroth, Kwanghyun Park
VLDB 2022 | June 2022
End-to-end Optimization of Machine Learning Prediction Queries
Kwanghyun Park, Karla Saur, Dalitso Banda, Rathijit Sen, Matteo Interlandi, Konstantinos Karanasos
SIGMOD | June 2022
LlamaTune: Sample-Efficient DBMS Configuration Tuning
Konstantinos Kanellis, Cong Ding, Brian Kroth, Andreas C. Müller, Carlo Curino, Shivaram Venkataraman
VLDB 2022 | March 2022
NyxCache: Flexible and Efficient Multi-tenant Persistent Memory Caching
Kan Wu, Kaiwei Tu, Yuvraj Patel, Rathijit Sen, Kwanghyun Park, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
20th USENIX Conference on File and Storage Technologies | February 2022
Phoebe: A Learning-based Checkpoint Optimizer
Yiwen Zhu, Matteo Interlandi, Abhishek Roy, Krishnadhan Das, Hiren Patel, Malay Bag, Hitesh Sharma, Alekh Jindal
VLDB 2021 | August 2021
Steering Query Optimizers: A Practical Take on Big Data Workloads
Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, Alekh Jindal
2021 International Conference on Management of Data | June 2021
Honorable mention for the Industry Track
HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries
Rana Alotaibi, Bogdan Cautis, Alin Deutsch, Ioana Manolescu
2021 International Conference on Management of Data | June 2021
HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries
Rana Alotaibi, Bogdan Cautis, Alin Deutsch, Ioana Manolescu
SIGMOD | June 2021
KEA: Tuning an Exabyte-Scale Data Infrastructure
Yiwen Zhu, Subru Krishnan, Konstantinos Karanasos, Isha Tarte, Conor Power, Abhishek Modi, Manoj Kumar, Deli Zhang, Kartheek Muthyala, Nick Jurgens, Sarvesh Sakalanaga, Sudhir Darbha, Minu Iyer, Ankita Agarwal, Carlo Curino
SIGMOD | June 2021
Microlearner: A fine-grained Learning Optimizer for Big Data Workloads at Microsoft
Alekh Jindal, Shi Qiao, Rathijit Sen, Hiren Patel
2021 International Conference on Data Engineering | April 2021
Property Graph Schema Optimization for Domain-Specific Knowledge Graphs
Rana Alotaibi, Chuan Lei, Abdul Quamar, Vasilis Efthymiou, Fatma Ozcan
ICDE | April 2021
FPGA for Aggregate Processing: The Good, The Bad, and The Ugly
Zubeyr F. Eryilmaz, Aarati Kakaraparthy, Jignesh M. Patel, Rathijit Sen, Kwanghyun Park
International Conference on Data Engineering (ICDE) | April 2021
Production Experiences from Computation Reuse at Microsoft
Alekh Jindal, Shi Qiao, Hiren Patel, Abhishek Roy, Jyoti Leeka, Brandon Haynes
2021 Extending Database Technology | March 2021
Hardware Acceleration for DBMS Machine Learning Scoring: Is It Worth the Overheads?
Zahra Azad, Rathijit Sen, Kwanghyun Park, Ajay Joshi
International Symposium on Performance Analysis of Systems and Software (ISPASS) | March 2021
The Storage Hierarchy is Not a Hierarchy: Optimizing Caching on Modern Storage Devices with Orthus
Kan Wu, Zhihan Guo, Guanzhou Hu, Kaiwei Tu, Ramnatthan Alagappan, Rathijit Sen, Kwanghyun Park, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
19th USENIX Conference on File and Storage Technologies | February 2021
Magpie: Python at Speed and Scale using Cloud Backends
Alekh Jindal, Venkatesh Emani, Maureen Daum, Olga Poppe, Brandon Haynes, Anna Pavlenko, Ayushi Gupta, Karthik Ramachandra, Carlo Curino, Andreas C. Müller, Wentao Wu, Hiren Patel
Conference on Innovative Data Systems Research (CIDR 2021) | February 2021
Unearthing inter-job dependencies for better cluster scheduling
Andrew Chung, Subru Krishnan, Konstantinos Karanasos, Carlo Curino, Gregory R. Ganger
Symposium on Operating Systems Design and Implementation (OSDI) | November 2020
A Tensor Compiler for Unified Machine Learning Prediction Serving
Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, Matteo Interlandi
Symposium on Operating Systems Design and Implementation (OSDI) | November 2020
Vamsa: Automated Provenance Tracking in Data Science Scripts
Mohammad Hossein Namaki, Avrilia Floratou, Fotis Psallidas, Subru Krishnan, Ashvin Agrawal, Yiwen Zhu, Markus Weimer, Yinghui Wu
KDD | August 2020
ESTOCADA: Towards Scalable Polystore Systems
Rana Alotaibi, Bogdan Cautis, Alin Deutsch, M. Latrache, Ioana Manolescu, Y. Yang
VLDB | July 2020
AutoToken: predicting peak parallelism for big data analytics at Microsoft
Rathijit Sen, Alekh Jindal, Hiren Patel, Shi Qiao
Very Large Data Bases | July 2020
Towards Plan-aware Resource Allocation in Serverless Query Processing
Malay Bag, Alekh Jindal, Hiren Patel
USENIX conference on Hot Topics in Cloud Computing | July 2020
Lessons learned from the early performance evaluation of Intel Optane DC persistent memory in DBMS
Yinjun Wu, Kwanghyun Park, Rathijit Sen, Brian Kroth, Jae Young Do
2020 Data Management on New Hardware | June 2020
Automated Tuning of Query Degree of Parallelism via Machine Learning
Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi
2020 International Conference on Management of Data | June 2020
Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings
Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, Wangchao Le
SIGMOD 2020: International Conference on Management of Data | February 2020
Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML
Ashvin Agrawal, Rony Chatterjee, Carlo Curino, Avrilia Floratou, Neha Godwal, Matteo Interlandi, Alekh Jindal, Konstantinos Karanasos, Subru Krishnan, Brian Kroth, Jyoti Leeka, Kwanghyun Park, Hiren Patel, Olga Poppe, Fotis Psallidas, Raghu Ramakrishnan, Abhishek Roy, Karla Saur, Rathijit Sen, Markus Weimer, Travis Wright, Yiwen Zhu
Conference on Innovative Data Systems Research (CIDR) | January 2020
Extending Relational Query Processing with ML Inference
Konstantinos Karanasos, Matteo Interlandi, Doris Xin, Fotis Psallidas, Rathijit Sen, Kwanghyun Park, Ivan Popivanov, Supun Nakandala, Subru Krishnan, Markus Weimer, Yuan Yu, Raghu Ramakrishnan, Carlo Curino
Conference on Innovative Data Systems Research (CIDR) | January 2020
Optimizing databases by learning hidden parameters of solid state drives
Aarati Kakaraparthy, Jignesh M. Patel, Kwanghyun Park, Brian Kroth
Proceedings of VLDB | December 2019
Big Data Processing at Microsoft: Hyper Scale, Massive Complexity, and Minimal Cost
Hiren Patel, Alekh Jindal, Clemens Szyperski
Symposium on Cloud Computing | November 2019
Peregrine: Workload Optimization for Cloud Query Engines
Alekh Jindal, Hiren Patel, Abhishek Roy, Shi Qiao, Zhicheng Yin, Rathijit Sen, Subru Krishnan
Symposium on Cloud Computing | November 2019
Neptune: Scheduling Suspendable Tasks for Unified Stream/Batch Applications
Panagiotis Garefalakis, Konstantinos Karanasos, Peter Pietzuch
Symposium on Cloud Computing (SoCC) | November 2019
Griffon: Reasoning about Job Anomalies with Unlabeled Data in Cloud-based Platforms
Liqun Shao, Yiwen Zhu, Siqi Liu, Abhiram Eswaran, Kristin Lieber, Janhavi Mahajan, Minsoo Thigpen, Sudhir Darbha, Subru Krishnan, Soundar Srinivasan, Carlo Curino, Konstantinos Karanasos
Symposium on Cloud Computing (SoCC) | August 2019
SparkCruise: handsfree computation reuse in Spark
Abhishek Roy, Alekh Jindal, Hiren Patel, Ashit Gosalia, Subru Krishnan, Carlo Curino
Very Large Data Bases | July 2019
Exploiting Intel Optane SSD for Microsoft SQL Server
Kan Wu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Rathijit Sen, Kwanghyun Park
DaMoN 2019 | July 2019
Towards Scalable Hybrid Stores: Constraint-Based Rewriting to the Rescue
Rana Alotaibi, Damian Bursztyn, Alin Deutsch, Ioana Manolescu, Stamatis Zampetakis
SIGMOD | June 2019
Peering through the Dark: An Owl’s View of Inter-job Dependencies and Jobs’ Impact in Shared Clusters
Andrew Chung, Carlo Curino, Subru Krishnan, Konstantinos Karanasos, Panagiotis Garefalakis, Gregory R. Ganger
International Conference on Management of Data (SIGMOD) | June 2019
Hydra: a federated resource manager for data-center scale analytics
Carlo Curino, Subru Krishnan, Konstantinos Karanasos, Sriram Rao, Giovanni M. Fumarola, Botong Huang, Kishore Chaliparambil, Arun Suresh, Young Chen, Solom Heddaya, Roni Burd, Sarvesh Sakalanaga, Chris Douglas, Bill Ramsey, Raghu Ramakrishnan
Symposium on Networked Systems Design and Implementation (NSDI) | February 2019
Towards a learning optimizer for shared clouds
Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao
Very Large Data Bases | October 2018
PRETZEL: opening the black box of machine learning prediction serving systems
Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Marco Domenico Santambrogio, Matteo Interlandi, Markus Weimer
Operating Systems Design and Implementation | October 2018
Netco: Cache and I/O Management for Analytics over Disaggregated Stores
Virajith Jalaparti, Chris Douglas, Mainak Ghosh, Ashvin Agrawal, Avrilia Floratou, Srikanth Kandula, Ishai Menache, Joseph (Seffi) Naor, Sriram Rao
ACM Symposium on Cloud Computing (SOCC) | October 2018
Best Paper Award
SOCK: rapid task provisioning with serverless-optimized containers
Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Harter, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
USENIX Annual Technical Conference | July 2018
Survivability of Cloud Databases – Factors and Prediction
Jose Picado, Willis Lang, Edward C. Thayer
International Conference on Management of Data | May 2018
Computation Reuse in Analytics Job Service at Microsoft
Alekh Jindal, Shi Qiao, Hiren Patel, Zhicheng Yin, Jieming Di, Malay Bag, Marc Friedman, Yifung Lin, Konstantinos Karanasos, Sriram Rao
2018 International Conference on Management of Data | May 2018
Query and Resource Optimization: Bridging the Gap
Lalitha Viswanathan, Alekh Jindal, Konstantinos Karanasos
International Conference on Data Engineering (ICDE) | April 2018
Selecting subexpressions to materialize at datacenter scale
Alekh Jindal, Konstantinos Karanasos, Sriram Rao, Hiren Patel
Very Large Data Bases (VLDB) | February 2018
Batch-Expansion Training: An Efficient Optimization Framework
Michał Dereziński, Dhruv Mahajan, S. Sathiya Keerthi, S. V. N. Vishwanathan, Markus Weimer
International Conference on Artificial Intelligence and Statistics | February 2018
Characterizing Resource Sensitivity of Database Workloads
Rathijit Sen, Karthik Ramachandra
High-Performance Computer Architecture | January 2018
Medea: Scheduling of Long Running Applications in Shared Production Clusters
Panagiotis Garefalakis, Konstantinos Karanasos, Peter Pietzuch, Arun Suresh, Sriram Rao
European Conference on Computer Systems (EuroSys) | January 2018
Towards High-Performance Prediction Serving Systems
Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Matteo Interlandi, Markus Weimer
31st Conference on Neural Information Processing Systems | December 2017
No data left behind: real-time insights from a complex data ecosystem
Manos Karpathiotakis, Avrilia Floratou, Fatma Ozcan, Anastasia Ailamaki
SOCC | September 2017
A robust partitioning scheme for ad-hoc query workloads
Anil Shanbhag, Alekh Jindal, Samuel Madden, Jorge Quiane, Aaron J. Elmore
Symposium on Cloud Computing | September 2017
Self-Regulating Streaming Systems: Challenges and Opportunities
Avrilia Floratou, Ashvin Agrawal
BIRTE | August 2017
Dhalion: Self-Regulating Stream Processing in Heron
Avrilia Floratou, Ashvin Agrawal, Bill Graham, Sriram Rao, Karthik Ramasamy
Proceedings of the VLDB Endowment | August 2017
Frequency governors for cloud database OLTP workloads
Rathijit Sen, Alan Halverson
International Symposium on Low Power Electronics and Design | July 2017
Pipsqueak: Lean Lambdas with Large Libraries
Edward Oakes, Leon Yang, Kevin Houck, Tyler Harter, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
International Conference on Distributed Computing Systems Workshops | July 2017
ROSA: R Optimizations with Static Analysis
Rathijit Sen, Jianqiao Zhu, Jignesh M. Patel, Somesh Jha
RIOT 2017 | July 2017
Twitter Heron: Towards Extensible Streaming Engines
Maosong Fu, Ashvin Agrawal, Avrilia Floratou, Bill Graham, Andrew Jorgensen, Mark Li, Neng Lu, Karthik Ramasamy, Sriram Rao, Cong Wang
ICDE | May 2017
Resource bricolage and resource selection for parallel database systems
Jiexing Li, Jeffrey F. Naughton, Rimma V. Nehme
Very Large Data Bases | January 2017
AdaptDB: adaptive partitioning for distributed joins
Yi Lu, Anil Shanbhag, Alekh Jindal, Samuel Madden
Very Large Data Bases | January 2017
Morpheus: Towards Automated SLOs for Enterprise Clusters
Sangeetha Abdu Jyothi, Carlo Curino, Ishai Menache, Shravan Matthur Narayanamurthy, Alexey Tumanov, Jonathan Yaniv, Ruslan Mavlyutov, Íñigo Goiri, Subru Krishnan, Janardhan (Jana) Kulkarni, Sriram Rao
2016 International Symposium on Operating Systems Design and Implementation (OSDI) | November 2016
PerfOrator: Eloquent Performance Models for Resource Optimization
Dharmesh Kakadia, Kaushik Rajan, Carlo Curino, Subru Krishnan
ACM Symposium on Cloud Computing 2016 (SoCC’16) | October 2016
Amoeba: a shape changing storage system for big data
Anil Shanbhag, Alekh Jindal, Yi Lu, Samuel Madden
Very Large Data Bases | August 2016
GraphFrames: an integrated API for mixing graph and relational queries
Ankur Dave, Alekh Jindal, Li Erran Li, Reynold Xin, Joseph Gonzalez, Matei Zaharia
GRADES | June 2016
Efficient Queue Management for Cluster Scheduling
Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Sriram Rao, Milan Vojnovic
European Conference on Computer Systems (EuroSys) | April 2016
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Konstantinos Karanasos, Sriram Rao, Carlo Curino, Chris Douglas, Kishore Chaliparambil, Giovanni Matteo Fumarola, Solom Heddaya, Raghu Ramakrishnan, Sarvesh Sakalanaga
USENIX Annual Technical Conference (USENIX ATC’2015) | July 2015
WANalytics: Geo-Distributed Analytics for a Data Intensive World
Ashish Vulimiri, Carlo Curino, Philip Brighten Godfrey, Thomas Jungblut, Konstantinos Karanasos, Jitendra Padhye, George Varghese
International Conference on Management of Data (SIGMOD) | May 2015
Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications
Bikas Saha, Hitesh Shah, Siddharth Seth, Gopal Vijayaraghavan, Arun Murthy, Carlo Curino
SIGMOD ’15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data | May 2015
Global Analytics in the Face of Bandwidth and Regulatory Constraints
Ashish Vulimiri, Carlo Curino, P. Brighten Godfrey, Thomas Jungblut, Jitu Padhye, George Varghese
12th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’15) | May 2015
ISBN: 978-1-931971-218
Blind men and an elephant: coalescing open-source, academic, and industrial perspectives on BigData
Chris Douglas, Carlo Curino
International Conference on Data Engineering | April 2015
WANalytics: Analytics for a Geo-Distributed Data-Intensive World
Ashish Vulimiri, Carlo Curino, Brighten Godfrey, Konstantinos Karanasos, George Varghese
Conference on Innovative Data Systems Research (CIDR) | January 2015
Multi-resource Packing for Cluster Schedulers
Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, Aditya Akella
ACM SIGCOMM | August 2014
Indexing HDFS data in PDW: splitting the data from the index
Vinitha Reddy Gankidi, Nikhil Teletia, Jignesh M. Patel, Alan Halverson, David J. DeWitt
Very Large Data Bases | July 2014
Partial results in database systems
Willis Lang, Rimma V. Nehme, Eric Robinson, Jeffrey F. Naughton
International Conference on Management of Data | June 2014
Dynamically Optimizing Queries over Large Scale Data Platforms
Konstantinos Karanasos, Andrey Balmin, Marcel Kutsch, Fatma Ozcan, Vuk Ercegovac, Chunyang Xia, Jesse Jackson
International Conference on Management of Data (SIGMOD) | June 2014
Elastic Distributed Bayesian Collaborative Filtering
Alex Beutel, Markus Weimer, Tom Minka, Yordan Zaykov, Vijay Narayanan
January 2014
Delta: Scalable Data Dissemination under Capacity Constraints
Konstantinos Karanasos, Asterios Katsifodimos, Ioana Manolescu
Very Large Data Bases (PVLDB) | December 2013
Split query processing in PolyBase
David J. DeWitt, Rimma Nehme, Srinath Shankar, Josep Aguilar-Saborit, Artin Avanes, Miro Flasza, Jim Gramling, Alan Halverson
2013 ACM SIGMOD International Conference on Management of Data | June 2013
Fact Checking and Analyzing the Web
Francois Goasdoue, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu, Stamatis Zampetakis
International Conference on Management of Data (SIGMOD) | June 2013
Fast peak-to-peak behavior with SSD buffer pool
Jaeyoung Do, Donghui Zhang, J. M. Patel, D. J. DeWitt
International Conference on Data Engineering | April 2013
Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques
Jiexing Li, Arnd Christian König, Vivek Narasayya, Surajit Chaudhuri
38th International Conference on Very Large Databases | August 2012
Query optimization in Microsoft SQL Server PDW
Srinath Shankar, Rimma Nehme, Josep Aguilar-Saborit, Andrew Chung, Mostafa Elhemali, Eric Robinson, Mahadevan Sankara Subramanian, David DeWitt, César Galindo-Legaria, Alan Halverson
International Conference on Management of Data | May 2012
WWW 2012 Tutorial: New Templates for Scalable Data Analysis
Amr Ahmed, Alexander J. Smola, Markus Weimer
25th International World Wide Web Conference | April 2012
GSLPI: A Cost-Based Query Progress Indicator
Jiexing Li, Rimma V. Nehme, Jeffrey Naughton
International Conference on Data Engineering | March 2012
Toward Progress Indicators on Steroids for Big Data Systems.
Jiexing Li, Rimma V. Nehme, Jeffrey F. Naughton
Conference on Innovative Data Systems Research | January 2012
Turbocharging DBMS buffer pool using SSDs
Jaeyoung Do, Donghui Zhang, Jignesh M. Patel, David J. DeWitt, Jeffrey F. Naughton, Alan Halverson
International Conference on Management of Data | June 2011
Automated partitioning design in parallel database systems
Rimma Nehme, Nicolas Bruno
International Conference on Management of Data | June 2011
The Mimicking Octopus: Towards a one-size-fits-all Database Architecture
Alekh Jindal
36th International Conference on VLDB | September 2010
Energy management for MapReduce clusters
Willis Lang, Jignesh M. Patel
Very Large Data Bases | August 2010
On energy management, load balancing and replication
Willis Lang, Jignesh M. Patel, Jeffrey F. Naughton
International Conference on Management of Data | June 2010
Wimpy node clusters: what about non-wimpy workloads?
Willis Lang, Jignesh M. Patel, Srinath Shankar
Data Management on New Hardware | June 2010
Towards Eco-friendly Database Management Systems
Willis Lang, Jignesh M. Patel
Conference on Innovative Data Systems Research | December 2009
-
MLOS in Action: Bridging the Gap Between Experimentation and Auto-Tuning in the Cloud
Brian Kroth, Sergiy Matusevych, Rana Alotaibi, Yiwen Zhu, Anja Gruenheid, Yuanyuan Tian
Proc. VLDB Endow. | October 2024, Vol 17: pp. 4269-4272
Lorentz: Learned SKU Recommendation Using Profile Data
Nick Glaze, Tria McNeely, Yiwen Zhu, Matthew Gleeson, Helen Serr, Rajeev Bhopi, Subru Krishnan
SIGMOD 2024 | May 2024, Vol 2: pp. 149
Intelligent Pooling: Proactive Resource Provisioning in Large-scale Cloud Service
Deepak Ravikumar, Alex Yeo, Yiwen Zhu, Aditya Lakra, Harsha Nagulapalli, Santhosh Ravindran, Steve Suh, Niharika Dutta, Andrew Fogarty, Yoonjae Park, Sumeet Khushalani, Arijit Tarafdar, Kunal Parekh, Subru Krishnan
Proc. VLDB Endow. | February 2024, Vol 17: pp. 1618-1627
Optimizing Data Pipelines for Machine Learning in Feature Stores
Rui Liu, Kwanghyun Park, Fotis Psallidas, Xiaoyong Zhu, Jinghui Mo, Rathijit Sen, Matteo Interlandi, Konstantinos Karanasos, Yuanyuan Tian, Jesús Camacho-Rodríguez
Proc. VLDB Endow. | August 2023, Vol 16: pp. 4230-4239
OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs
Fotis Psallidas, Ashvin Agrawal, Chandru Sugunan, Khaled Ibrahim, Konstantinos Karanasos, Jesus Camacho-Rodriguez, A. Floratou, C. Curino, R. Ramakrishnan
Proc. VLDB Endow. | August 2023, Vol 16: pp. 3662-3675
A Deep Dive into Common Open Formats for Analytical DBMSs
Chunwei Liu, Anna Pavlenko, Matteo Interlandi, Brandon Haynes
Proc. VLDB Endow. | June 2023, Vol 16: pp. 3044-3056
Best Paper Runner-Up
DIAMETRICS: Benchmarking Query Engines at Scale
Shaleen Deep, Anja Gruenheid, Kruthi Nagaraj, Hiro Naito, Jeffrey Naughton, Stratis Viglas
Communications of The ACM | December 2022, Vol 65(12): pp. 105-112
Research Highlight
Comprehensive and Efficient Workload Summarization
Shaleen Deep, Anja Gruenheid, Paraschos Koutris, Stratis Viglas, Jeffrey Naughton
Datenbank-Spektrum | November 2022
Research Highlight
Diversity and Inclusion Activities in Database Conferences: A 2021 Report.
Sihem Amer-Yahia, Yael Amsterdamer, Sourav S. Bhowmick, Angela Bonifati, Philippe Bonnet, Renata Borovica-Gajic, Barbara Catania, Tania Cerquitelli, Silvia Chiusano, Panos K. Chrysanthis, Carlo Curino, Jérôme Darmont, Amr El Abbadi, Avrilia Floratou, Juliana Freire, Alekh Jindal, Vana Kalogeraki, Georgia Koutrika, Arun Kumar, Sujaya Maiyya, Alexandra Meliou, Madhulika Mohanty, Felix Naumann, Nele Sina Noack, Liat Peterfreund, Fatma Özcan, Wenny Rahayu, Wang-Chiew Tan, Yuanyuan Tian, Pinar Tözün, Genoveva Vargas-Solar, Neeraja J. Yadwadkar, Meihui Zhang
SIGMOD Record | November 2022
Share the Tensor Tea: How Databases can Leverage the Machine Learning Ecosystem
Yuki Asada, Victor Fu, Apurva Gandhi, Advitya Gemawat, Lihao Zhang, Dong He, Vivek Gupta, Ehi Nosakhare, Dalitso Banda, Rathijit Sen, Matteo Interlandi
VLDB | September 2022, Vol 15(12): pp. 3598-3601
Best demo award
Data Science Through the Looking Glass
Fotis Psallidas, Yiwen Zhu, Bojan Karlaš, Jordan Henkel, Matteo Interlandi, Subru Krishnan, Brian Kroth, Venkatesh Emani, Wentao Wu, Ce Zhang, Markus Weimer, Avrilia Floratou, Carlo Curino, Konstantinos Karanasos
SIGMOD Record | June 2022, Vol 51(2): pp. 30-37
Applied Research Lessons from CloudViews Project
Alekh Jindal
SIGMOD Record | December 2020, Vol 49(3): pp. 37-42
BlackMagic: Automatic Inlining of Scalar UDFs into SQL Queries with Froid
Karthik Ramachandra, Kwanghyun Park
Proceedings of VLDB | August 2019, Vol 12(12): pp. 1810-1813
SOCK: Serverless-Optimized Containers.
Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Caraza-Harter, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
;login: | December 2018, Vol 43(3)
Dhalion in action: automatic management of streaming applications
Avrilia Floratou, Ashvin Agrawal
Proceedings of the VLDB Endowment | August 2018, Vol 11(12)
Advancements in YARN Resource Manager
Konstantinos Karanasos, Arun Suresh, Chris Douglas
Encyclopedia of Big Data Technologies | February 2018
Froid: Optimization of Imperative Programs in a Relational Database
Karthik Ramachandra, Kwanghyun Park, K. Venkatesh Emani, Alan Halverson, Cesar Galindo-Legaria, Conor Cunningham
Proceedings of VLDB | December 2017, Vol 11(4)
Apache REEF: Retainable Evaluator Execution Framework
Byung-Gon Chun, Tyson Condie, Yingda Chen, Brian Cho, Andrew Chung, Carlo Curino, Chris Douglas, Beomyeol Jeon, Joo Seong Jeong, Gyewon Lee, Yunseong Lee, Tony Majestro, Dahlia Malkhi, Sergiy Matusevych, Brandon Myers, Mariia Mykhailova, Shravan Narayanamurthy, Joseph Noor, Raghu Ramakrishnan, Sriram Rao, Russell Sears, Beysim Sezgin, Taegeon Um, Julia Wang, Youngseok Yang, Matteo Interlandi
ACM Transactions on Computer Systems | October 2017, Vol 35(2): pp. 5
Energy-Proportional Computing: A New Definition
Rathijit Sen, David A. Wood
IEEE Computer | July 2017, Vol 50(8): pp. 26-33
Pareto Governors for Energy-Optimal Computing
Rathijit Sen, David A. Wood
ACM Transactions on Architecture and Code Optimization | April 2017, Vol 14(1): pp. 6
Towards Geo-Distributed Machine Learning.
Ignacio Cano, Dhruv Mahajan, Giovanni Matteo Fumarola, Arvind Krishnamurthy, Markus Weimer, Carlo Curino
IEEE Data(base) Engineering Bulletin | December 2015, Vol 40: pp. 41-59
Towards Multi-Tenant Performance SLOs
Willis Lang, Srinath Shankar, Jignesh M. Patel, Ajay Kalhan
IEEE Transactions on Knowledge and Data Engineering | May 2014, Vol 26(6): pp. 1447-1463
Growing Triples on Trees: an XML-RDF Hybrid Model for Annotated Documents
Francois Goasdoue, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu, Stamatis Zampetakis
Very Large Data Bases Journal (VLDB J.) | June 2013, Vol 22: pp. 589-613
Towards energy-efficient database cluster design
Willis Lang, Stavros Harizopoulos, Jignesh M. Patel, Mehul A. Shah, Dimitris Tsirogiannis
2012 Very Large Data Bases | August 2012, Vol 5(11): pp. 1684-1695
Declarative Systems for Large-Scale Machine Learning.
Vinayak R. Borkar, Yingyi Bu, Michael J. Carey, Joshua Rosen, Neoklis Polyzotis, Tyson Condie, Markus Weimer, Raghu Ramakrishnan
IEEE Data(base) Engineering Bulletin | July 2012, Vol 35: pp. 24-32
Rethinking Query Processing for Energy Efficiency: Slowing Down to Win the Race.
Nicolas Bruno, Surajit Chaudhuri, Arnd Christian König, Vivek Narasayya, Ravi Ramamurthy, Manoj Syamala
IEEE Data(base) Engineering Bulletin | November 2011, Vol 34: pp. 12-19
View Selection in Semantic Web Databases
Francois Goasdoue, Konstantinos Karanasos, Julien Leblay, Ioana Manolescu
Very Large Data Bases (PVLDB) | October 2011, Vol 5: pp. 97-108
-
A Comparative Exploration of ML Techniques for Tuning Query Degree of Parallelism.
Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi
May 2020
Constant Time Recovery in Azure SQL Database
Panagiotis Antonopoulos, Peter Byrne, Wayne Chen, Cristian Diaconu, Raghavendra Thallam Kodandaramaih, Hanuma Kodavalla, Prashanth Purnananda, Adrian-Leonard Radu, Chaitanya Sreenivas Ravella, Girish Mittur Venkataramanappa
June 2019
-
Query and Resource Optimizations: A Case for Breaking the Wall in Big Data Systems
Alekh Jindal, Lalitha Viswanathan, Konstantinos Karanasos
MSR-TR-2019-44 | June 2019
Published by Microsoft
INGESTBASE: A Declarative Data Ingestion System.
Alekh Jindal, Jorge-Arnulfo Quiané-Ruiz, Samuel Madden
MSR-TR-2017-62 | January 2017
Published by Microsoft
Do the Hard Stuff First: Scheduling Dependent Computations in Data Analytics Clusters
Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan (Jana) Kulkarni
MSR-TR-2016-19 | February 2016
Published by Microsoft
Distributed and Scalable PCA in the Cloud
Arun Kumar, Vijay Narayanan, Nikos Karampatziakis, Paul Mineiro, Markus Weimer
MSR-TR-2014-165 | January 2014
Published by Microsoft
Apache Reef Research Paper
Reservation-based Scheduling: If You’re Late Don’t Blame Us!
Carlo Curino, Djellel E Difallah, Chris Douglas, Subru Krishnan, Raghu Ramakrishnan, Sriram Rao
MSR-TR-2013-108 | October 2013
Published in SoCC 2014
-
Columnar Storage Formats
Encyclopedia of Big Data Technologies | Published by Springer | 2018
-
Challenges and Opportunities in Transportation Data
Kristin Tufte, Kushal Datta, Alekh Jindal, David Maier, Robert L. Bertini
June 2018
Robust Data Partitioning.
Alekh Jindal, Anil Shanbhag, Yi Lu
February 2018
Towards Resource-Elastic Machine Learning
Dhruv Mahajan, Sundararajan Sellamanickam, Markus Weimer, Keerthi Selvaraj
January 2013