{"id":1004586,"date":"2024-03-07T09:00:00","date_gmt":"2024-03-07T17:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1004586"},"modified":"2024-03-05T14:17:27","modified_gmt":"2024-03-05T22:17:27","slug":"improving-llm-understanding-of-structured-data-and-exploring-advanced-prompting-methods","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/improving-llm-understanding-of-structured-data-and-exploring-advanced-prompting-methods\/","title":{"rendered":"Improving LLM understanding of structured data and exploring advanced prompting methods"},"content":{"rendered":"\n<p class=\"has-text-align-center\"><em><strong>This research paper was presented at the <\/strong><\/em><strong><em><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.wsdm-conference.org\/2024\/\" target=\"_blank\" rel=\"noreferrer noopener\">17th ACM International Conference on Web Search and Data Mining<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/em><\/strong><em><strong>&nbsp;(WSDM 2024), the premier conference on web-inspired research on search and data mining.<\/strong><\/em><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1.png\" alt=\"WSDM logo in white to the left of the first page of the \"Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study\" publication\" class=\"wp-image-1004610\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-1280x720.png 1280w\" sizes=\"(max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<p>In today\u2019s data-driven landscape, tables are indispensable for organizing and presenting information, particularly text. They streamline repetitive content, enhance data manageability, enable easier data analysis, and improve machine processing capabilities. Meanwhile, large language models (LLMs) are advancing in their ability to tackle challenges associated with natural language, but the degree to which they understand tables included in their prompts remains an open question. Our research aims to explore this question and improve how LLMs use and work with table-based data.<\/p>\n\n\n\n<p>Our paper, \u201c<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/table-meets-llm-can-large-language-models-understand-structured-table-data-a-benchmark-and-empirical-study\/\">Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,\u201d presented at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.wsdm-conference.org\/2024\/\" target=\"_blank\" rel=\"noreferrer noopener\">WSDM 2024<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, investigates what kinds of prompts most effectively enable LLMs to understand tables; how much LLMs inherently detect structured data; and how LLMs\u2019 existing knowledge can be harnessed to improve this understanding. We also analyze the complex trade-off among multiple combinations of input designs and overall performance.<\/p>\n\n\n\n<p>To address these questions, we propose a new benchmark called Structural Understanding Capabilities (SUC), shown in Figure 1 (a), which focuses on specific tasks to assess LLMs\u2019 ability to understand structured data in tables and compare different types of prompts. We conducted a series of experiments using different prompt designs. Our findings, detailed in the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/table-meets-llm-can-large-language-models-understand-structured-table-data-a-benchmark-and-empirical-study\/\">paper<\/a>, evaluate how each design enhances LLMs\u2019 ability to work with tables.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"The image (a) is a flowchart with three main columns that illustrate the stages, capabilities, and tasks associated with a process benchmarked by SUC (Semantic Understanding Capability), and their application in input designs. Here is the detailed alt text for the image: Flowchart illustrates the detailed design of the Semantic Understanding Capability Benchmark. The leftmost column is labeled \" href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"5041\" height=\"1530\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure-1.png\" alt=\"The image (a) is a flowchart with three main columns that illustrate the stages, capabilities, and tasks associated with a process benchmarked by SUC (Semantic Understanding Capability), and their application in input designs. Here is the detailed alt text for the image: Flowchart illustrates the detailed design of the Semantic Understanding Capability Benchmark. The leftmost column is labeled 'Stages' with two main stages: 'Partition & Parsing' in blue and 'Search & Retrieval' in pink. Each stage is associated with 'Capabilities' in the middle column. 'Partition & Parsing' includes 'Structural Description Detection', 'Format Understanding', and 'Hierarchy Detection'. 'Search & Retrieval' includes 'Grounding\/Locating' and 'Operation Reasoning'. These capabilities correspond to 'Tasks' in the third column. For 'Partition & Parsing', tasks are 'Table Partition', 'Table Size Detection', and 'Hierarchy Detection'. For 'Search & Retrieval', tasks are 'Cell Lookup & Reverse Lookup' and 'Column & Row Retrieval'.  \n\n \n\nTo the right of these columns is image (b) labeled 'Input Designs' connected to 'Partition Mark', 'Serialization', 'Role Prompting', 'Order Permutation', and 'Format Explanation'. These are further linked to types of 'Markup Languages' represented in green boxes: 'HTML', 'XML', 'Markdown', and more indicated by ellipses. Image (b) covers the input designs for the SUC evaluation. \" class=\"wp-image-1004616\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure-1.png 5041w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure-1-300x91.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure-1-1024x311.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure-1-768x233.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure-1-1536x466.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure-1-2048x622.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure-1-240x73.png 240w\" sizes=\"(max-width: 5041px) 100vw, 5041px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 1. The SUC benchmark and prompt designs for evaluation.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"insights-and-findings-using-the-suc-benchmark\">Insights and findings using the SUC benchmark<\/h2>\n\n\n\n<p>Based on humans&#8217; perception of tables, we developed tasks to evaluate how LLMs understand them. We conducted evaluations on GPT-3.5 and GPT-4 and discovered that the results depended on certain input factors, such as table format, content order, and partition marks. The findings, detailed in Tables 1 and 2, reveal some notable and unexpected findings:<\/p>\n\n\n\n<ul>\n<li>Delimiter-separated formats (e.g., CSV, TSV), underperformed compared with HTML by 6.76 percent.<\/li>\n\n\n\n<li>Using HTML and few-shot learning consistently improved performance. The effectiveness of other approaches, such as format explanation, role prompting, order change, and partition marks, varied depending on task difficulty and the required capacity.<\/li>\n\n\n\n<li>Despite the simplicity of the benchmark tasks, the highest overall accuracy across seven tasks is only 65.43 percent. This underscores the need for LLMs to have better awareness of table structures and highlights areas for further improvement in table serialization.<\/li>\n<\/ul>\n\n\n\n<p>Our exploration suggests that:<\/p>\n\n\n\n<ul>\n<li>LLMs have a basic understanding of table structures but are far from perfect, even in straightforward tasks like detecting the number of columns and rows.<\/li>\n\n\n\n<li>Choosing the right combination of input designs can significantly enhance LLMs\u2019 understanding of structured data.<\/li>\n<\/ul>\n\n\n\n<p>Our findings revealed significant performance gaps in downstream tasks, attributed to the different combinations of serialization functions and input options. These gaps remained even with GPT-4, underscoring the effectiveness of our benchmark approach.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"This is a table regarding the comparison table displaying the accuracy (Acc) of GPT-4 versus previous models in different tasks. Tasks include Table Partition, Cell Lookup, Reverse Lookup, Column Retrieval, Row Retrieval, Size Detection, and Merged Cell Detection. The data formats compared are NL + Sep, Markdown, JSON, XML, and HTML. GPT-4 shows improved accuracy across nearly all tasks and formats compared to its predecessors, with notable high accuracy in the HTML format for Table Partition and Merged Cell Detection tasks.\" \" href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1430\" height=\"229\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table1.png\" alt=\"This is a table regarding the comparison table displaying the accuracy (Acc) of GPT-4 versus previous models in different tasks. Tasks include Table Partition, Cell Lookup, Reverse Lookup, Column Retrieval, Row Retrieval, Size Detection, and Merged Cell Detection. The data formats compared are NL + Sep, Markdown, JSON, XML, and HTML. GPT-4 shows improved accuracy across nearly all tasks and formats compared to its predecessors, with notable high accuracy in the HTML format for Table Partition and Merged Cell Detection tasks.\" \" class=\"wp-image-1004628\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table1.png 1430w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table1-300x48.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table1-1024x164.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table1-768x123.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table1-240x38.png 240w\" sizes=\"(max-width: 1430px) 100vw, 1430px\" \/><\/a><figcaption class=\"wp-element-caption\">Table 1. SUC benchmark evaluations on table formats.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"This table presents the comparison of accuracy (Acc) and changes in accuracy (\u0394) for different input designs using GPT-4 on various tasks. The tasks include Table Partition, Cell Lookup, Reverse Lookup, Column Retrieval, Row Retrieval, Size Detection, and Merged Cell Detection. The input designs tested are Markup Language HTML with and without various components such as format explanation, partition mark, role prompting, and change order, as well as without 1-shot learning. The last row shows the performance of GPT-4 with Language HTML. The table displays positive and negative changes in percentages with respective tasks, highlighting the impact of each input design modification on the model\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1430\" height=\"273\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table2.png\" alt=\"This table presents the comparison of accuracy (Acc) and changes in accuracy (\u0394) for different input designs using GPT-4 on various tasks. The tasks include Table Partition, Cell Lookup, Reverse Lookup, Column Retrieval, Row Retrieval, Size Detection, and Merged Cell Detection. The input designs tested are Markup Language HTML with and without various components such as format explanation, partition mark, role prompting, and change order, as well as without 1-shot learning. The last row shows the performance of GPT-4 with Language HTML. The table displays positive and negative changes in percentages with respective tasks, highlighting the impact of each input design modification on the model's accuracy.\" \" class=\"wp-image-1004631\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table2.png 1430w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table2-300x57.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table2-1024x195.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table2-768x147.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table2-240x46.png 240w\" sizes=\"(max-width: 1430px) 100vw, 1430px\" \/><\/a><figcaption class=\"wp-element-caption\">Table 2. Ablation study of input designs using the SUC benchmark.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"improved-performance-with-self-augmented-prompting\">Improved performance with self-augmented prompting<\/h2>\n\n\n\n<p>Based on these benchmark evaluations, we investigated how LLMs\u2019 existing knowledge could be used to enhance their understanding of structured data. To do this, we introduced self-augmentation, a model-agnostic technique that improves structural prompting\u2014enabling LLMs to identify key values and ranges by tapping into their own internal knowledge. This technique simplifies and optimizes how LLMs utilize their existing knowledge base to improve their understanding of structured content, allowing them to generate intermediate structural insights. This process is shown in Figure 2, with the results detailed in Table 3.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"The image depicts a diagram showing the Self-augmented Prompting workflow that involves an initial table, an intermediate output, and a final output. Here is the detailed alt text for the image: On the left, there\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure_2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"2500\" height=\"800\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure_2.png\" alt=\"The image depicts a diagram showing the Self-augmented Prompting workflow that involves an initial table, an intermediate output, and a final output. Here is the detailed alt text for the image: On the left, there's a table with the title 'Antoine Salamin' and columns labeled 'Year', 'Team', 'Driver', 'Races', and 'Pos'. Two rows are visible with the years 1983 and 1989, team name starting with 'Swit...', driver name starting with 'Antoine...', and positions '29th' and '7th' highlighted in the last visible row. Below the table is a box labeled 'Table & Other info' and an arrow pointing right labeled '1st request' with the text 'Identify critical values and ranges of the table'. \n\n \n\nIn the center, a green box with rounded corners titled 'Intermediate Output' contains text summarizing the table's content, mentioning Antoine Salamin's results from 1983 to 1989, the number of races, podiums, and points range. There's an arrow looping back to the first box with 'LLM' written above it, indicating a feedback loop for further processing. \n\n \n\nOn the right, a blue box with rounded corners titled 'Final Output' contains a narrative description saying 'In 1989, Antoine Salamin drove a Porsche 962C for the Swiss Team Salamin, powered by a Porsche turbo Flat-6 engine. He competed in two races, achieving one podium and 17 points, finishing 7th overall.' An arrow labeled '2nd request' points from the '1st request' to the 'Intermediate Output' and another from there to the 'Final Output', indicating the sequence of processing requests. \" class=\"wp-image-1004622\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure_2.png 2500w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure_2-300x96.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure_2-1024x328.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure_2-768x246.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure_2-1536x492.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure_2-2048x655.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Figure_2-240x77.png 240w\" sizes=\"(max-width: 2500px) 100vw, 2500px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 2. Self-augmented prompting.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"This table is comparing the accuracy (Acc) and BLEU scores for different types of input choices on various question-answering datasets: TabFact, HybridQA, SQA, Feverous, and ToTTo. The types include 1-shot and self-explanation approaches (SA) with various modifications such as without table size, partition mark, format explanation, role prompting, critical values and ranges identification, and structural information description. Each row shows the impact of these modifications on the model\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1430\" height=\"360\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table3.png\" alt=\"This table is comparing the accuracy (Acc) and BLEU scores for different types of input choices on various question-answering datasets: TabFact, HybridQA, SQA, Feverous, and ToTTo. The types include 1-shot and self-explanation approaches (SA) with various modifications such as without table size, partition mark, format explanation, role prompting, critical values and ranges identification, and structural information description. Each row shows the impact of these modifications on the model's performance, with accuracy percentages for the datasets and BLEU-1 to BLEU-4 scores for the ToTTo dataset. \" class=\"wp-image-1004634\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table3.png 1430w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table3-300x76.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table3-1024x258.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table3-768x193.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM_Table3-240x60.png 240w\" sizes=\"(max-width: 1430px) 100vw, 1430px\" \/><\/a><figcaption class=\"wp-element-caption\">Table 3. Evaluation of downstream tasks. \u201cSA\u201d refers to self-augmented prompting.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"looking-forward\">Looking forward<\/h2>\n\n\n\n<p>Our study sets a key benchmark in expanding the capabilities of LLMs to better understand structured table data, moving beyond conventional natural language processing tasks. We suggest future research should prioritize the integration of structural information to improve performance with various structured data types. Additionally, we propose exploring LLMs\u2019 ability to use external tools or agents for improved handling of structured data, opening new avenues for application.<\/p>\n<span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span>","protected":false},"excerpt":{"rendered":"<p>Structural Understanding Capabilities is a new benchmark for evaluating and improving LLM comprehension of structured table data. This advance can help LLMs process and analyze data more effectively, broadening their applicability in real-world tasks.<\/p>\n","protected":false},"author":42183,"featured_media":1004610,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13563,13545],"msr-region":[],"msr-event-type":[],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199560],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[714577],"related-projects":[558663],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Mengyu Zhou","user_id":37131,"display_name":"Mengyu Zhou","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mezho\/\" aria-label=\"Visit the profile page for Mengyu Zhou\">Mengyu Zhou<\/a>","is_active":false,"last_first":"Zhou, Mengyu","people_section":0,"alias":"mezho"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-960x540.png\" class=\"img-object-cover\" alt=\"WSDM logo in white to the left of the first page of the &quot;Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study&quot; publication\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/WSDM-BlogHeroFeature-1400x788-1.png 1400w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mezho\/\" title=\"Go to researcher profile for Mengyu Zhou\" aria-label=\"Go to researcher profile for Mengyu Zhou\" data-bi-type=\"byline author\" data-bi-cN=\"Mengyu Zhou\">Mengyu Zhou<\/a>","formattedDate":"March 7, 2024","formattedExcerpt":"Structural Understanding Capabilities is a new benchmark for evaluating and improving LLM comprehension of structured table data. This advance can help LLMs process and analyze data more effectively, broadening their applicability in real-world tasks.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1004586"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/42183"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1004586"}],"version-history":[{"count":50,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1004586\/revisions"}],"predecessor-version":[{"id":1012074,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1004586\/revisions\/1012074"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1004610"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1004586"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1004586"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1004586"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1004586"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1004586"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1004586"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1004586"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1004586"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1004586"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1004586"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}