Large Language Models for Tabular Data: Progresses and Future Directions

SIGIR'2024

Tables contain a significant portion of the world’s structured information. The ability to efficiently and accurately understand, process, reason about, analyze, and generate tabular data is critical to building Artificial General Intelligence (AGI) systems.

However, despite their prevalence and importance, tables present unique challenges due to their structured nature and the diverse semantics embedded within them. Textual content, numerical values, visual formats, and even formulas in tables carry rich semantic information that is often underutilized because it is difficult to interpret and integrate accurately.

Fortunately, the advent of Large Language Models (LLMs) has opened new frontiers in natural language processing (NLP) and machine learning (ML), with remarkable success in understanding and generating both text and code. Applying these models to tabular data holds the promise of significant breakthroughs in how we process and leverage structured information.

This tutorial therefore provides a comprehensive study of the advances, challenges, and opportunities in leveraging cutting-edge LLMs for tabular data. By introducing methods for prompting or training LLMs for table interpretation, processing, reasoning, analytics, and generation, we aim to equip researchers and practitioners with the knowledge and tools needed to unlock the full potential of LLMs for tabular data in their domains.