{"id":1042161,"date":"2024-06-10T09:00:00","date_gmt":"2024-06-10T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/lst-bench-a-new-benchmark-tool-for-open-table-formats-in-the-data-lake\/"},"modified":"2024-06-05T12:57:11","modified_gmt":"2024-06-05T19:57:11","slug":"lst-bench-a-new-benchmark-tool-for-open-table-formats-in-the-data-lake","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/lst-bench-a-new-benchmark-tool-for-open-table-formats-in-the-data-lake\/","title":{"rendered":"LST-Bench: A new benchmark tool for open table formats in the data lake"},"content":{"rendered":"\n<p class=\"has-text-align-center\"><em><em><strong>This paper was presented at the <\/strong><\/em><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/2024.sigmod.org\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong><em>ACM SIGMOD\/Principles of Database Systems Conference<\/em><\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><em><strong> (SIGMOD\/PODS 2024), the premier forum on large-scale data management and databases.<\/strong><\/em><\/em><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1401\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1.png\" alt=\"SIGMOD PODS 2024 logo to the left of the first page of \"LST-Bench: Benchmarking Log-Structured Tables in the Cloud\"\" class=\"wp-image-1042893\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1.png 1401w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-1024x576.png 
1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-1280x720.png 1280w\" sizes=\"(max-width: 1401px) 100vw, 1401px\" \/><\/figure>\n\n\n\n<p>As organizations grapple with ever-expanding datasets, the adoption of data lakes has become a vital strategy for scalable and cost-effective data management. The success of these systems largely depends on the file formats used to store the data. Traditional formats, while efficient in data compression and organization, falter with frequent updates. 
Advanced table formats like Delta Lake, Apache Iceberg, and Apache Hudi offer promising solutions with easier data modifications and historical tracking, yet their efficacy depends on how well they handle continuous updates, which calls for thorough evaluation.<\/p>\n\n\n\n<p>Our paper, \u201c<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/lst-bench-benchmarking-log-structured-tables-in-the-cloud\/\" target=\"_blank\" rel=\"noreferrer noopener\">LST-Bench: Benchmarking Log-Structured Tables in the Cloud<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,\u201d presented at SIGMOD 2024, introduces a tool designed to evaluate the performance of different table formats in the cloud. LST-Bench builds on the well-established\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.tpc.org\/tpcds\/\" target=\"_blank\" rel=\"noreferrer noopener\">TPC-DS<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0benchmark\u2014which measures how efficiently systems handle large datasets and complex queries\u2014and includes features specifically designed for table formats, simplifying the process of testing them under real-world conditions. It also runs tests automatically and collects telemetry from both the computational engine and various cloud services, enabling accurate performance evaluation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"flexible-and-adaptive-testing\">Flexible and adaptive testing<\/h2>\n\n\n\n<p>Designed for flexibility, LST-Bench adapts to a broad range of scenarios, as illustrated in Figure 1. The framework was developed by incorporating insights from engineers, facilitating the integration of existing workloads like TPC-DS, while promoting reusability. For example, each test session establishes a new connection to the data-processing engine and runs a sequence of tasks, each of which is a series of SQL statements. 
This setup permits developers to run multiple tasks either sequentially within a single session or concurrently across various sessions, reflecting real-world application patterns.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2250\" height=\"230\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure1.png\" alt=\"A diagram showing workload components in LST-Bench and their relationships.\" class=\"wp-image-1042341\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure1.png 2250w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure1-300x31.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure1-1024x105.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure1-768x79.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure1-1536x157.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure1-2048x209.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure1-240x25.png 240w\" sizes=\"(max-width: 2250px) 100vw, 2250px\" \/><figcaption class=\"wp-element-caption\">Figure 1. Workload components in LST-Bench and their relationships. A task is a sequence of SQL statements, while a session is a sequence of tasks that represents a logical unit of work or a user session. A phase is a group of concurrent sessions that must be completed before the next phase can start. 
Lastly, a workload is a sequence of phases.<\/figcaption><\/figure>\n\n\n\n<p>The TPC-DS workload comprises the following foundational tasks:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Load task<\/strong>:<strong> <\/strong>Loads data into tables for experimentation.<\/li>\n\n\n\n<li><strong>Single User task<\/strong>: Executes complex queries to test the engine&#8217;s upper performance limit.<\/li>\n\n\n\n<li><strong>Data Maintenance task<\/strong>:<strong> <\/strong>Handles data insertions and deletions.<\/li>\n<\/ul>\n\n\n\n<p>LST-Bench introduces the following tasks specific to table formats:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Optimize task<\/strong>: Compacts the data files within a table.<\/li>\n\n\n\n<li><strong>Time Travel task<\/strong>: Enables querying data as it appeared at a specified point in the past.<\/li>\n\n\n\n<li><strong>Parameterized Custom task<\/strong>: Allows for the integration of user-defined code to create dynamic workflows.<\/li>\n<\/ul>\n\n\n\n<p>These features enable LST-Bench to evaluate aspects of table formats that are not covered by TPC-DS, providing deeper insights into their performance, as shown in Figure 2.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"1035\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure2-Edited-23may24.png\" alt=\"A diagram illustrating various LST-Bench tasks combined to create workloads that provide insights into table formats. 
The workloads assess the handling of frequent data modifications over time, optimizing tables for multiple modifications of varying sizes, managing simultaneous reading and writing sessions, querying data across different time points, and evaluating the impact of batch size variations on read query performance.\" class=\"wp-image-1042338\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure2-Edited-23may24.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure2-Edited-23may24-300x222.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure2-Edited-23may24-1024x757.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure2-Edited-23may24-768x568.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure2-Edited-23may24-80x60.png 80w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure2-Edited-23may24-240x177.png 240w\" sizes=\"(max-width: 1400px) 100vw, 1400px\" \/><figcaption class=\"wp-element-caption\">Figure 2. LST-Bench expands on TPC-DS by introducing a flexible workload representation and incorporating extensions that help users gain insights into table formats previously overlooked by the original benchmark.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"a-degradation-rate-metric-to-measure-stability\">A degradation rate metric to measure stability<\/h2>\n\n\n\n<p>In addition to these workload extensions, LST-Bench introduces new metrics to evaluate table formats both comprehensively and fairly. It retains the traditional metric categories like performance, storage, and compute efficiency, and it adds a new stability metric called <em>degradation rate<\/em>. 
This new metric specifically addresses the impact of accumulating small files in the data lake\u2014a common issue arising from frequent, small updates\u2014providing an assessment of the system\u2019s efficiency over time.<\/p>\n\n\n\n<p>The degradation rate is calculated by dividing a workload into different phases. The degradation rate \\(S_{DR}\\) is defined as follows:<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(S_{DR}={1\\over n}\\sum\\limits_{i=1}^n\\dfrac{M_{i} - M_{i-1}}{M_{i-1}}\\)<\/p>\n\n\n\n<p>Here, \\(M_i\\)&nbsp;represents the performance or efficiency metric value of the \\(i^{th}\\)&nbsp;iteration of a workload phase, and \\(n\\) reflects the total number of iterations of that phase. Intuitively, \\(S_{DR}\\)&nbsp;is the average rate at which a metric grows or shrinks from one iteration to the next, reflecting cumulative effects of changes in the underlying system\u2019s state. This rate provides insight into how quickly a system degrades over time. A stable system demonstrates a low \\(S_{DR}\\), indicating minimal degradation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"lst-bench-implementation\">LST-Bench implementation<\/h2>\n\n\n\n<p>LST-Bench features a Java-based client application that runs SQL workloads on various engines, enabling users to define task, session, and phase libraries to reuse different workload components. This allows them to reference these libraries in their workload definitions, add new task templates, or create entirely new task libraries to model specific scenarios.<\/p>\n\n\n\n<p>LST-Bench also includes a processing module that consolidates experimental results and calculates metrics to provide insights into table formats and engines. It uses both internal telemetry from LST-Bench and external telemetry from cloud services, such as resource utilization, storage API calls, and network I\/O volume. 
The metrics processor offers multiple visualization options, including notebooks and a web app, to help users analyze performance data effectively.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"500\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure3.png\" alt=\"An illustration depicting the components and execution model of the LST-Bench tool. The Client Application establishes connections with engines via dedicated drivers, while the Metrics Processor gathers telemetry from the Client Application, engines, and other cloud services. This data is aggregated and visualized using either a notebook or web application. \" class=\"wp-image-1042347\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure3.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure3-300x107.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure3-1024x366.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure3-768x274.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/LST-Bench_figure3-240x86.png 240w\" sizes=\"(max-width: 1400px) 100vw, 1400px\" \/><figcaption class=\"wp-element-caption\">Figure 3. 
The LST-Bench tool components and execution model.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"implications-and-looking-ahead\">Implications and looking ahead<\/h2>\n\n\n\n<p>LST-Bench integrates seamlessly into the testing workflows of the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/learn.microsoft.com\/fabric\/data-warehouse\/\" target=\"_blank\" rel=\"noreferrer noopener\">Microsoft Fabric<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> warehouse, allowing that team to rigorously assess engine performance, evaluate releases, and identify any issues. This leads to a more reliable and optimized user experience on the Microsoft Fabric data analytics platform. Additionally, LST-Bench holds promise as a foundational tool for various Microsoft initiatives. It\u2019s currently instrumental in research projects focused on improving data organization for table formats, with the goal of increasing the performance of customer workloads on Microsoft Fabric. 
LST-Bench is also being used to evaluate the performance of table formats converted using <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/xtable.apache.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Apache XTable (Incubating)<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, an open-source tool designed to prevent data silos within data lakes.<\/p>\n\n\n\n<p>LST-Bench is <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/lst-bench\/\">open source<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, and we welcome contributors to help expand this tool, making it highly effective for organizations to thoroughly evaluate their table formats.<\/p>\n\n\n\n
<p><strong>Acknowledgements<\/strong><\/p>\n\n\n\n<p>We would like to thank <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jcahoon\/\">Joyce Cahoon<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yiwzh\/\">Yiwen Zhu<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> for their valuable discussions on the stability metric, and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.linkedin.com\/in\/josemedranojimenez\/\" target=\"_blank\" rel=\"noreferrer noopener\">Jose Medrano<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.linkedin.com\/in\/emma-rose-wirshing-aa790b105\/\" target=\"_blank\" rel=\"noreferrer noopener\">Emma Rose Wirshing<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> for their feedback on LST-Bench and their work on 
integrating it with the Microsoft Fabric Warehouse.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>LST-Bench is a new open-source benchmark designed to evaluate table formats in cloud environments. It extends existing benchmarks to better reflect real-world usage & performance of data lakes and easily integrates with commonly used analytical engines.<\/p>\n","protected":false},"author":37583,"featured_media":1042893,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"categories":[1],"tags":[],"research-area":[13563],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1042161","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-data-platform-analytics","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[684024],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Jes\u00fas Camacho Rodr\u00edguez","user_id":40693,"display_name":"Jes\u00fas Camacho Rodr\u00edguez","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jesusca\/\" aria-label=\"Visit the profile page for Jes\u00fas Camacho Rodr\u00edguez\">Jes\u00fas Camacho Rodr\u00edguez<\/a>","is_active":false,"last_first":"Camacho Rodr\u00edguez, Jes\u00fas","people_section":0,"alias":"jesusca"},{"type":"user_nicename","value":"Ashvin Agrawal","user_id":40006,"display_name":"Ashvin 
Agrawal","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/asagr\/\" aria-label=\"Visit the profile page for Ashvin Agrawal\">Ashvin Agrawal<\/a>","is_active":false,"last_first":"Agrawal, Ashvin","people_section":0,"alias":"asagr"},{"type":"user_nicename","value":"Anja Gruenheid","user_id":40696,"display_name":"Anja Gruenheid","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/agruenheid\/\" aria-label=\"Visit the profile page for Anja Gruenheid\">Anja Gruenheid<\/a>","is_active":false,"last_first":"Gruenheid, Anja","people_section":0,"alias":"agruenheid"},{"type":"guest","value":"ashit-gosalia","user_id":"1042176","display_name":"Ashit Gosalia","author_link":"<a href=\"https:\/\/www.linkedin.com\/in\/ashit-gosalia\/\" aria-label=\"Visit the profile page for Ashit Gosalia\">Ashit Gosalia<\/a>","is_active":true,"last_first":"Gosalia, Ashit","people_section":0,"alias":"ashit-gosalia"},{"type":"guest","value":"cristian-petculescu","user_id":"1042179","display_name":"Cristian Petculescu","author_link":"<a href=\"https:\/\/www.linkedin.com\/in\/petcu40\/\" aria-label=\"Visit the profile page for Cristian Petculescu\">Cristian Petculescu<\/a>","is_active":true,"last_first":"Petculescu, Cristian","people_section":0,"alias":"cristian-petculescu"},{"type":"guest","value":"josep-aguilar-saborit","user_id":"1042182","display_name":"Josep Aguilar-Saborit","author_link":"<a href=\"https:\/\/www.linkedin.com\/in\/josep-aguilar-saborit-77a16041\/\" aria-label=\"Visit the profile page for Josep Aguilar-Saborit\">Josep Aguilar-Saborit<\/a>","is_active":true,"last_first":"Aguilar-Saborit, Josep","people_section":0,"alias":"josep-aguilar-saborit"},{"type":"user_nicename","value":"Avrilia Floratou","user_id":36080,"display_name":"Avrilia Floratou","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/avflor\/\" aria-label=\"Visit the profile page for Avrilia Floratou\">Avrilia 
Floratou<\/a>","is_active":false,"last_first":"Floratou, Avrilia","people_section":0,"alias":"avflor"},{"type":"user_nicename","value":"Carlo Curino","user_id":31352,"display_name":"Carlo Curino","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ccurino\/\" aria-label=\"Visit the profile page for Carlo Curino\">Carlo Curino<\/a>","is_active":false,"last_first":"Curino, Carlo","people_section":0,"alias":"ccurino"},{"type":"user_nicename","value":"Raghu Ramakrishnan","user_id":40051,"display_name":"Raghu Ramakrishnan","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/raghu\/\" aria-label=\"Visit the profile page for Raghu Ramakrishnan\">Raghu Ramakrishnan<\/a>","is_active":false,"last_first":"Ramakrishnan, Raghu","people_section":0,"alias":"raghu"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-960x540.png\" class=\"img-object-cover\" alt=\"SIGMOD PODS 2024 logo to the left of the first page of &quot;LST-Bench: Benchmarking Log-Structured Tables in the Cloud&quot;\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-1066x600.png 1066w, 
https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/06\/NEW_SIGMOD2024-BlogHeroFeature-1400x788-1.png 1401w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"June 10, 2024","formattedExcerpt":"LST-Bench is a new open-source benchmark designed to evaluate table formats in cloud environments. It extends existing benchmarks to better reflect real-world usage &amp; performance of data lakes and easily integrates with commonly used analytical 
engines.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1042161"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/37583"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1042161"}],"version-history":[{"count":69,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1042161\/revisions"}],"predecessor-version":[{"id":1043865,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1042161\/revisions\/1043865"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1042893"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1042161"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1042161"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1042161"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1042161"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1042161"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1042161"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1042161
"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1042161"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1042161"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1042161"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1042161"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}