{"id":979374,"date":"2023-10-27T09:00:00","date_gmt":"2023-10-27T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=979374"},"modified":"2023-10-26T12:45:35","modified_gmt":"2023-10-26T19:45:35","slug":"data-formulator-a-concept-driven-ai-powered-approach-to-data-visualization","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/data-formulator-a-concept-driven-ai-powered-approach-to-data-visualization\/","title":{"rendered":"Data Formulator: A concept-driven, AI-powered approach to data visualization"},"content":{"rendered":"\n<p class=\"has-text-align-center\"><strong><em>This research paper was presented at the <\/em><\/strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/ieeevis.org\/year\/2023\/welcome\" target=\"_blank\" rel=\"noreferrer noopener\"><em><strong>IEEE Visualization Conference<\/strong><\/em><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><strong><em> (VIS 2023), the premier forum for advances in visualization and visual analytics.<\/em><\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1.jpg\" alt=\"The VIS2023 logo to the left of the first page of an accepted research paper\" class=\"wp-image-979416\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1.jpg 1400w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-768x432.jpg 768w, 
https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-343x193.jpg 343w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<p>Effective data visualization plays a crucial role in data analysis. It enables data analysts and others to explore complex datasets, comprehend patterns, and convey meaningful insights to various stakeholders. Today, there are numerous tools for creating visual representations of data. However, these tools only work with <em>tidy data<\/em>, meaning that data points must be organized according to the specific categories required by the tool\u2019s visualization format. This poses significant challenges for data analysts, requiring the use of additional tools to transform raw data into a compatible format before it is entered into one of these visualization tools.<\/p>\n\n\n\n<p>For instance, consider a dataset displaying 2020 temperatures in Seattle and Atlanta. If an analyst aims to create a scatter plot comparing the temperatures of these two US cities on the x\/y-axes, data transformation is essential. The visualization tool mandates separate columns for Seattle and Atlanta temperatures to map to the scatter plot&#8217;s axes. 
Consequently, the analyst must pivot the input table to generate these columns. Moreover, if the analyst intends to compare which city experiences warmer days or create a smoothed line chart illustrating Seattle&#8217;s 7-day moving average temperature, further computations on the transformed data are necessary. Fields like &#8220;Warmer&#8221; and &#8220;Seattle 7-day Moving Avg&#8221; need to be calculated to facilitate the visualization, as depicted in Figure 1. This intricate process highlights the complexity and expertise currently needed to prepare raw data for effective visualization.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"A figure with upper left showing an input data table with three columns Date, City and Temperature showing temperatures of Seattle and Atlanta from 2020-01-01 to 2020-12-31. On its right side show three visualizations that the user wants to create: (1) a scatter plot to compare their temperatures, (2) a histogram to show number days each city is warmer, and (3) a line chart shows Seattle moving average temperature; and the user cannot create these visualizations because the input table is not in the right format. At the bottom of the figure, it shows a data table that the analyst needs to transform from the input table in order to create desired visualizations. This table contains six columns: Date, Seattle Temp, Atlanta Temp, Warmer, Difference and Seattle Temp Moving Average. There is an emoji of \u201cconfusion\u201d to express that the data transformation process can be challenging. 
\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-1-example.png\"><img loading=\"lazy\" decoding=\"async\" width=\"3456\" height=\"1690\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-1-example.png\" alt=\"A figure with upper left showing an input data table with three columns Date, City and Temperature showing temperatures of Seattle and Atlanta from 2020-01-01 to 2020-12-31. On its right side show three visualizations that the user wants to create: (1) a scatter plot to compare their temperatures, (2) a histogram to show number days each city is warmer, and (3) a line chart shows Seattle moving average temperature; and the user cannot create these visualizations because the input table is not in the right format. At the bottom of the figure, it shows a data table that the analyst needs to transform from the input table in order to create desired visualizations. This table contains six columns: Date, Seattle Temp, Atlanta Temp, Warmer, Difference and Seattle Temp Moving Average. There is an emoji of \u201cconfusion\u201d to express that the data transformation process can be challenging. 
\" class=\"wp-image-979386\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-1-example.png 3456w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-1-example-300x147.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-1-example-1024x501.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-1-example-768x376.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-1-example-1536x751.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-1-example-2048x1001.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-1-example-240x117.png 240w\" sizes=\"auto, (max-width: 3456px) 100vw, 3456px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 1. A data analyst wants to compare 2020 temperatures in Seattle and Atlanta using visualizations like scatter plots and histograms. However, the original dataset lacks the columns these visualizations require (&#8220;Seattle Temp,&#8221; &#8220;Atlanta Temp,&#8221; &#8220;Warmer,&#8221; and &#8220;Seattle Temp Moving Average&#8221;). Data transformation is needed to add these fields.<\/figcaption><\/figure>\n\n\n\n<p>This hurdle is particularly daunting because it requires programming expertise or familiarity with additional data processing tools. It underscores the need for an easier, more seamless process that lets data analysts create impactful visualizations regardless of their technical background.<\/p>\n\n\n\n<p>Against the backdrop of rapid advancements in large language models (LLMs) and programming-by-example techniques, researchers have made significant strides in breaking down these barriers. 
In this context, we share our paper, \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/2309.10094\"><u>Data Formulator: AI-powered Concept-driven Visualization Authoring<\/u><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,\u201d presented at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/ieeevis.org\/year\/2023\/welcome\">VIS 2023<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and winner of the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/ieeevis.org\/year\/2023\/info\/awards\/best-paper-awards\">Best Paper Honorable Mention<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> award. Data Formulator is an AI-powered visualization authoring tool developed through a collaboration between researchers studying AI and those studying human-computer interaction (HCI). The result is a new visualization paradigm that separates high-level visualization intents from low-level data transformation steps. The process begins with data analysts articulating their visualization ideas as <em>data concepts<\/em>. These concepts refer to specific data categories, or <em>fields<\/em>, that analysts want to visualize, even though they are not present in the raw input data. 
This way, they effectively convey their visualization intent to the AI agent, which, in turn, assists them in implementing their visualization.<\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<ul class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<li class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/data-formulator-ai-powered-concept-driven-visualization-authoring\/\" target=\"_self\" class=\"annotations__link font-weight-semibold text-decoration-none\" data-bi-type=\"annotated-link\" aria-label=\"Data Formulator: AI-powered Concept-driven Visualization Authoring\" data-bi-aN=\"citation\" data-bi-cN=\"Data Formulator: AI-powered Concept-driven Visualization Authoring\">\n\t\t\t\tData Formulator: AI-powered Concept-driven Visualization Authoring&nbsp;<span class=\"glyph-append glyph-append-chevron-right glyph-append-xsmall\"><\/span>\n\t\t\t<\/a>\n\t\t\t\t\t<\/li>\n\t<\/ul>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"defining-data-concepts-and-creating-visualizations\">Defining data concepts and creating visualizations<\/h2>\n\n\n\n<p>The way Data Formulator operates is straightforward. The analyst defines the specific data concepts they plan to visualize, either through natural language queries or by providing categories or example entries for the concept. Once these concepts are defined, they are linked to an appropriate visual representation, as illustrated in Figure 2.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"A figure shows the user interface of Data Formulator and steps for an analyst to interact with the interface. At the right side shows the concept shelf, there is an annotation that reads \u201c1. 
Concept Shelf: create and derive new concepts needed for visualization\u201d. To its left is the Chart Builder panel, with an annotation \u201c2. Chart Builder: encode data concepts to visual channels\u201d. The bottom left side is a table view that shows the input data, the annotation reads \u201c3. Data View: inspect the original and derive tables\u201d. The top left is the visualization panel that shows visualizations generated by Data Formulator, the annotation reads \u201c4. Visualization View: explore generated visualizations.\u201d \" href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-2-overview.png\"><img loading=\"lazy\" decoding=\"async\" width=\"3664\" height=\"1916\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-2-overview.png\" alt=\"A figure shows the user interface of Data Formulator and steps for an analyst to interact with the interface. At the right side shows the concept shelf, there is an annotation that reads \u201c1. Concept Shelf: create and derive new concepts needed for visualization\u201d. To its left is the Chart Builder panel, with an annotation \u201c2. Chart Builder: encode data concepts to visual channels\u201d. The bottom left side is a table view that shows the input data, the annotation reads \u201c3. Data View: inspect the original and derive tables\u201d. The top left is the visualization panel that shows visualizations generated by Data Formulator, the annotation reads \u201c4. 
Visualization View: explore generated visualizations.\u201d \" class=\"wp-image-979392\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-2-overview.png 3664w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-2-overview-300x157.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-2-overview-1024x535.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-2-overview-768x402.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-2-overview-1536x803.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-2-overview-2048x1071.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-2-overview-240x126.png 240w\" sizes=\"auto, (max-width: 3664px) 100vw, 3664px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 2. The Data Formulator user interface. Data Formulator has four panels: (1) the Concept Shelf, for defining new data concepts to be visualized, (2) the Chart Builder, for specifying the visualization type, (3) the Table View, for analysts to inspect data automatically generated by Data Formulator, and (4) the Visualization Panel, for presenting final visualizations.<\/figcaption><\/figure>\n\n\n\n<p>If the analyst defines concepts through examples, Data Formulator engages a program synthesizer, which generates a specialized data reshaping program, transforming the provided data to bring out the required data fields. Conversely, when an analyst introduces a new concept using natural language queries, Data Formulator calls on LLMs to generate code, which facilitates the creation of a new data category based on the provided description. 
In both cases, Data Formulator compiles the transformed data into a structured table and creates corresponding visualizations.<\/p>\n\n\n\n<p>We recognize that analyst specifications can be ambiguous, so we designed Data Formulator to generate multiple visualization options to help analysts identify what they want. The tool also provides analysts with the AI-generated transformation program and the transformed data for inspection. This transparency helps analysts refine their intent for future iterations.<\/p>\n\n\n\n<p>Continuing our Seattle\/Atlanta temperatures example, the following two figures show how analysts can use Data Formulator to create visualizations without reformatting raw data using an external tool. Instead, the analyst provides example entries in the form of temperature values to create the new data concepts &#8220;Seattle Temp&#8221; and &#8220;Atlanta Temp,&#8221; as shown in Figure 3. The analyst then uses a natural language query to create the new concept &#8220;Warmer&#8221; and instructs Data Formulator to format the data so that it can be visualized, as shown in Figure 4.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"The figure shows the workflow of the analyst to create new data concepts \u201cAtlanta Temp\u201d and \u201cSeattle Temp\u201d using examples. The left figure shows that the user opens a panel in Data Formulator\u2019s concept shelf, typed the concept name \u201cAtlanta Temp\u201d, and provide example temperature values \u201c45, 47, 56, 41\u201d to define the concept. Then, the user drags Atlanta Temp concept to y-axis in the Chart Builder (the Seattle Temp concept is already placed in the x-axis box). The analyst then completes an example table with two columns Atlanta Temp, Seattle Temp with two rows (row 1 contains two values 45, 51, row contains values 47, 45) to demonstrate the relation between these two concepts. 
Finally, the analyst clicks \u201cFormulate\u201d button and Data Formulator returns the transformed data (with columns \u201c#\u201d, \u201cSeattle Temp\u201d, \u201cAtlanta Temp\u201d, \u201cDate\u201d) and a scatter plot that visualizes the data with Seattle Temp on x axis, Atlanta Temp on y axis. \" href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-3-by-example.png\"><img loading=\"lazy\" decoding=\"async\" width=\"3656\" height=\"1338\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-3-by-example.png\" alt=\"The figure shows the workflow of the analyst to create new data concepts \u201cAtlanta Temp\u201d and \u201cSeattle Temp\u201d using examples. The left figure shows that the user opens a panel in Data Formulator\u2019s concept shelf, typed the concept name \u201cAtlanta Temp\u201d, and provide example temperature values \u201c45, 47, 56, 41\u201d to define the concept. Then, the user drags Atlanta Temp concept to y-axis in the Chart Builder (the Seattle Temp concept is already placed in the x-axis box). The analyst then completes an example table with two columns Atlanta Temp, Seattle Temp with two rows (row 1 contains two values 45, 51, row contains values 47, 45) to demonstrate the relation between these two concepts. Finally, the analyst clicks \u201cFormulate\u201d button and Data Formulator returns the transformed data (with columns \u201c#\u201d, \u201cSeattle Temp\u201d, \u201cAtlanta Temp\u201d, \u201cDate\u201d) and a scatter plot that visualizes the data with Seattle Temp on x axis, Atlanta Temp on y axis. 
\" class=\"wp-image-979398\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-3-by-example.png 3656w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-3-by-example-300x110.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-3-by-example-1024x375.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-3-by-example-768x281.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-3-by-example-1536x562.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-3-by-example-2048x750.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-3-by-example-240x88.png 240w\" sizes=\"auto, (max-width: 3656px) 100vw, 3656px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 3. The analyst creates new data concepts \u201cAtlanta Temp\u201d, \u201cSeattle Temp\u201d using examples. The AI agent solves a programming-by-example problem to create the new concepts for visualization.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"The figure shows the workflow of the analyst to create new data concepts \u201cWarmer\u201d using natural language query. The left figure shows that the user opens a panel in Data Formulator\u2019s concept shelf. The user selected \u201cderived from\u201d two concepts \u201cSeattle Temp\u201d and \u201cAtlanta Temp\u201d and typed the concept name \u201cWarmer\u201d. The user also provides a natural language query \u201cWhich is the warmer city, or the same\u201d to describe the concept. 
After clicking a \u201cforge\u201d icon, in the second box shows the concept with the instantiated concept which contains an example table: the example table has 5 rows and header \u201cSeattle Temp, Atlanta Temp, Warmer\u201d, and the rows show \u201c51, 45, Seattle\u201d, \u201c38, 58, Atlanta\u201d, \u201c44, 65, Atlanta\u201d, \u201c42, 60, Atlanta\u201d, \u201c35, 62, Atlanta\u201d. The user then clicks the inspect button, and Data Formulator opens a panel that shows the code that achieve the transformation. Finally, the analyst clicks \u201csave\u201d button after inspecting the code to confirm the code is correct. \" href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-4-by-natural-language-updated.png\"><img loading=\"lazy\" decoding=\"async\" width=\"2980\" height=\"1042\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-4-by-natural-language-updated.png\" alt=\"The figure shows the workflow of the analyst to create new data concepts \u201cWarmer\u201d using natural language query. The left figure shows that the user opens a panel in Data Formulator\u2019s concept shelf. The user selected \u201cderived from\u201d two concepts \u201cSeattle Temp\u201d and \u201cAtlanta Temp\u201d and typed the concept name \u201cWarmer\u201d. The user also provides a natural language query \u201cWhich is the warmer city, or the same\u201d to describe the concept. After clicking a \u201cforge\u201d icon, in the second box shows the concept with the instantiated concept which contains an example table: the example table has 5 rows and header \u201cSeattle Temp, Atlanta Temp, Warmer\u201d, and the rows show \u201c51, 45, Seattle\u201d, \u201c38, 58, Atlanta\u201d, \u201c44, 65, Atlanta\u201d, \u201c42, 60, Atlanta\u201d, \u201c35, 62, Atlanta\u201d. The user then clicks the inspect button, and Data Formulator opens a panel that shows the code that achieve the transformation. 
Finally, the analyst clicks \u201csave\u201d button after inspecting the code to confirm the code is correct. \" class=\"wp-image-979401\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-4-by-natural-language-updated.png 2980w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-4-by-natural-language-updated-300x105.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-4-by-natural-language-updated-1024x358.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-4-by-natural-language-updated-768x269.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-4-by-natural-language-updated-1536x537.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-4-by-natural-language-updated-2048x716.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VIS-figure-4-by-natural-language-updated-240x84.png 240w\" sizes=\"auto, (max-width: 2980px) 100vw, 2980px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 4. The analyst creates a new data concept \u201cWarmer\u201d using natural language description. Data Formulator calls LLMs to generate a transformation program to derive the new concept.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"looking-ahead-analyst-ai-collaboration-in-data-analysis\">Looking ahead: Analyst-AI collaboration in data analysis<\/h2>\n\n\n\n<p>AI-powered data analysis tools have the potential to significantly streamline the entire data analysis process by consolidating various tasks into a single tool. Beyond just visualization, this concept-driven technique can be applied to data cleaning, data integration, visual data exploration, and visual storytelling. 
Our vision is for an AI system to take high-level instructions from the user and automatically recommend the necessary steps across the entire data analysis pipeline, enabling collaboration between the user and the AI agent to achieve their data visualization goals.<\/p>\n\n\n\n<p>Inevitably, data analysts will need to tackle more complex tasks beyond the scope mentioned here. For this reason, it\u2019s crucial to consider how to design AI-powered tools that clearly convey to the analyst when results are uncertain, ambiguous, or incorrect. This ensures that the analyst can trust the tool and collaborate effectively with AI to accomplish their objectives.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Visualization is vital for understanding complex data, but existing tools require \u201ctidy data,\u201d adding extra steps. Learn how Data Formulator transforms concepts into visuals, promoting collaboration between analysts and AI agents.<\/p>\n","protected":false},"author":42183,"featured_media":979416,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13554],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-979374","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-human-computer-interaction","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups
":[],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Chenglong Wang","user_id":41251,"display_name":"Chenglong Wang","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/chenwang\/\" aria-label=\"Visit the profile page for Chenglong Wang\">Chenglong Wang<\/a>","is_active":false,"last_first":"Wang, Chenglong","people_section":0,"alias":"chenwang"},{"type":"guest","value":"john-thompson","user_id":"979884","display_name":"John Thompson","author_link":"<a href=\"https:\/\/jrthomp.com\/\" aria-label=\"Visit the profile page for John Thompson\">John Thompson<\/a>","is_active":true,"last_first":"Thompson, John","people_section":0,"alias":"john-thompson"},{"type":"user_nicename","value":"Steven Drucker","user_id":33564,"display_name":"Steven Drucker","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sdrucker\/\" aria-label=\"Visit the profile page for Steven Drucker\">Steven Drucker<\/a>","is_active":false,"last_first":"Drucker, Steven","people_section":0,"alias":"sdrucker"},{"type":"user_nicename","value":"Jianfeng Gao","user_id":32246,"display_name":"Jianfeng Gao","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jfgao\/\" aria-label=\"Visit the profile page for Jianfeng Gao\">Jianfeng Gao<\/a>","is_active":false,"last_first":"Gao, Jianfeng","people_section":0,"alias":"jfgao"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-960x540.jpg\" class=\"img-object-cover\" alt=\"The VIS2023 logo to the left of the first page of an accepted research paper\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-960x540.jpg 960w, 
https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-343x193.jpg 343w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/10\/VICS-BlogHeroFeature-1400x788-1.jpg 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"October 27, 2023","formattedExcerpt":"Visualization is vital for understanding complex data, but existing tools require \u201ctidy data,\u201d adding extra steps. 
Learn how Data Formulator transforms concepts into visuals, promoting collaboration between analysts and AI agents.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/979374","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/42183"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=979374"}],"version-history":[{"count":13,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/979374\/revisions"}],"predecessor-version":[{"id":979902,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/979374\/revisions\/979902"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/979416"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=979374"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=979374"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=979374"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=979374"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=979374"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=979374"},{"taxonomy"
:"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=979374"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=979374"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=979374"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=979374"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=979374"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}