{"id":979374,"date":"2023-10-27T09:00:00","date_gmt":"2023-10-27T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=979374"},"modified":"2023-10-26T12:45:35","modified_gmt":"2023-10-26T19:45:35","slug":"data-formulator-a-concept-driven-ai-powered-approach-to-data-visualization","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/data-formulator-a-concept-driven-ai-powered-approach-to-data-visualization\/","title":{"rendered":"Data Formulator: A concept-driven, AI-powered approach to data visualization"},"content":{"rendered":"\n

This research paper was presented at the <\/em><\/strong>IEEE Visualization Conference<\/strong><\/em><\/span><\/a> (VIS 2023), the premier forum for advances in visualization and visual analytics.<\/em><\/strong><\/p>\n\n\n\n

\"The<\/figure>\n\n\n\n

Effective data visualization plays a crucial role in data analysis. It enables data analysts and others to explore complex datasets, comprehend patterns, and convey meaningful insights to various stakeholders. Today, there are numerous tools for creating visual representations of data. However, these tools work only with tidy data<\/em>, meaning that data points must already be organized according to the specific categories required by the tool\u2019s visualization format. This poses a significant challenge for data analysts, who must use additional tools to transform raw data into a compatible format before loading it into one of these visualization tools.<\/p>\n\n\n\n

For instance, consider a dataset displaying 2020 temperatures in Seattle and Atlanta. If an analyst aims to create a scatter plot comparing the temperatures of these two US cities on the x\/y-axes, data transformation is essential. The visualization tool mandates separate columns for Seattle and Atlanta temperatures to map to the scatter plot’s axes. Consequently, the analyst must pivot the input table to generate these columns. Moreover, if the analyst intends to compare which city experiences warmer days or create a smoothed line chart illustrating Seattle’s 7-day moving average temperature, further computations on the transformed data are necessary. Fields like “Warmer” and “Seattle 7-day Moving Avg” need to be calculated to facilitate the visualization, as depicted in Figure 1. This intricate process highlights the complexity and expertise currently needed to prepare raw data for effective visualization.<\/p>\n\n\n\n

\"A<\/a>
Figure 1. A data analyst wants to compare 2020 temperatures in Seattle and Atlanta using visualizations like scatter plots and histograms. However, the original dataset lacks necessary columns (“Seattle Temp,” “Atlanta Temp,” “Warmer,” and “Seattle Temp Moving Average”) for these visualizations. Data transformation is needed to include these fields.<\/figcaption><\/figure>\n\n\n\n
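The transformation described above can be sketched in plain Python. This is only an illustration: the long-format input layout (Date, City, Temperature), the sample temperatures, and the helper names are assumptions made for this example, not taken from the paper or from Data Formulator. The moving average is assumed to be a trailing window that averages fewer days at the start of the series.

```python
from collections import defaultdict

# Made-up sample values for illustration: (date, Seattle temp, Atlanta temp).
SAMPLE = [
    ("2020-01-01", 50, 45),
    ("2020-01-02", 52, 52),
    ("2020-01-03", 48, 55),
    ("2020-01-04", 46, 50),
    ("2020-01-05", 44, 43),
    ("2020-01-06", 47, 48),
    ("2020-01-07", 49, 53),
    ("2020-01-08", 51, 49),
]

# Assumed raw layout: one row per (Date, City) pair, i.e. long format.
raw_rows = [
    {"Date": date, "City": city, "Temperature": temp}
    for date, seattle, atlanta in SAMPLE
    for city, temp in (("Seattle", seattle), ("Atlanta", atlanta))
]


def pivot_by_city(rows):
    """Pivot long rows into one row per date with a '<City> Temp' column per city."""
    by_date = defaultdict(dict)
    for row in rows:
        by_date[row["Date"]][f"{row['City']} Temp"] = row["Temperature"]
    return [{"Date": date, **temps} for date, temps in sorted(by_date.items())]


def add_warmer(rows):
    """Derive the 'Warmer' field by comparing the two city columns."""
    for row in rows:
        seattle, atlanta = row["Seattle Temp"], row["Atlanta Temp"]
        if seattle > atlanta:
            row["Warmer"] = "Seattle"
        elif atlanta > seattle:
            row["Warmer"] = "Atlanta"
        else:
            row["Warmer"] = "Same"
    return rows


def add_trailing_average(rows, source, target, window=7):
    """Add a trailing moving average of `source`; early rows average fewer days."""
    values = [row[source] for row in rows]
    for i, row in enumerate(rows):
        recent = values[max(0, i - window + 1): i + 1]
        row[target] = sum(recent) / len(recent)
    return rows


table = pivot_by_city(raw_rows)
add_warmer(table)
add_trailing_average(table, "Seattle Temp", "Seattle 7-day Moving Avg")

for row in table:
    print(row)
```

Each row of `table` now carries the four fields named in Figure 1, so "Seattle Temp" and "Atlanta Temp" can be mapped directly to the axes of a scatter plot. Even this small case takes three separate steps, which is the kind of preparation work the rest of this post aims to reduce.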

This hurdle is particularly daunting because it necessitates a certain level of programming expertise or familiarity with additional data processing tools. It highlights the complexities of data visualization and underscores the need for an easier and more seamless process for data analysts, enabling them to create impactful visualizations regardless of their technical background.<\/p>\n\n\n\n

Against the backdrop of rapid advancements in large language models (LLMs) and programming-by-example techniques, researchers have made significant strides in breaking down these barriers. In this context, we share our paper, \u201cData Formulator: AI-powered Concept-driven Visualization Authoring<\/u><\/span><\/a>,\u201d presented at VIS 2023<\/span><\/a> and winner of the Best Paper Honorable Mention<\/span><\/a> award. Data Formulator is an AI-powered visualization authoring tool developed through a collaboration between researchers studying AI and those studying human-computer interaction (HCI). The result is a new visualization paradigm that separates high-level visualization intents from low-level data transformation steps. The process begins with data analysts articulating their visualization ideas as data concepts<\/em>. These concepts refer to specific data categories, or fields<\/em>, that analysts want to visualize, even though they are not present in the raw input data. This way, analysts effectively convey their visualization intent to the AI agent, which, in turn, assists them in implementing their visualization.<\/p>\n\n\n\n

\n\t