{"id":1133598,"date":"2025-03-10T09:00:00","date_gmt":"2025-03-10T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1133598"},"modified":"2025-06-25T09:35:24","modified_gmt":"2025-06-25T16:35:24","slug":"semantic-telemetry-understanding-how-users-interact-with-ai-systems","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/semantic-telemetry-understanding-how-users-interact-with-ai-systems\/","title":{"rendered":"Semantic Telemetry: Understanding how users interact with AI systems"},"content":{"rendered":"\n
\"Semantic<\/figure>\n\n\n\n

AI tools are proving useful across a range of applications, from helping to drive the new era of business transformation to helping artists craft songs. But which applications are providing the most value to users? We\u2019ll dig into that question in a series of blog posts that introduce the Semantic Telemetry<\/a> project at Microsoft Research. In this initial post, we will introduce a new data science approach that we will use to analyze topics and task complexity of Copilot in Bing usage.<\/p>\n\n\n\n

Human-AI interactions can be iterative and complex, so understanding user behavior well enough to build and support increasingly high-value use cases requires a new data science approach. Imagine the following chat:<\/p>\n\n\n\n

\"Example<\/figure>\n\n\n\n

Here we see that chats can be complex and span multiple topics, such as event planning, team building, and logistics. Generative AI has ushered in a two-fold paradigm shift. First, large language models (LLMs) give us a new thing to measure: how people interact with AI systems. Second, they give us a new way to measure those interactions: the capability to understand and make inferences about them at scale. The Semantic Telemetry project has created new measures to classify human-AI interactions and understand user behavior, contributing to efforts to develop new approaches for measuring generative AI<\/span><\/a> across various use cases.<\/p>\n\n\n\n

Semantic Telemetry rethinks traditional telemetry, in which data is collected to understand systems, for the analysis of chat-based AI. We employ a data science methodology that uses a large language model (LLM) to generate meaningful categorical labels, enabling us to gain insights into chat log data.<\/p>\n\n\n\n

\"Flow
Figure 1: Prompting an LLM to classify a conversation based on an LLM-generated label taxonomy<\/figcaption><\/figure>\n\n\n\n

This process begins with developing a set of classifications and definitions. We create these classifications by instructing an LLM to generate a short summary of each conversation, and then iteratively prompting the LLM to generate, update, and review classification labels on batches of those summaries. This process is outlined in the paper TnT-LLM: Text Mining at Scale with Large Language Models<\/a>. We then prompt an LLM with the resulting labels and definitions to classify new unstructured (and unlabeled) chat log data.<\/p>\n\n\n\n
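The three stages described above (summarize, iteratively refine labels over batches, then classify new chats) can be illustrated with a minimal Python sketch. This is not the project's implementation or the TnT-LLM code. The function names, prompt wording, and the `fake_llm` stand-in are all hypothetical, and the generate, update, and review steps are collapsed into a single update prompt for brevity. A real system would replace `fake_llm` with a call to an actual model.

```python
from typing import Callable, Iterator, List

# Hypothetical interface: any function that maps a prompt to a text reply.
LLM = Callable[[str], str]

KEYWORDS = ("travel", "cooking", "coding")  # toy vocabulary for the stub only


def fake_llm(prompt: str) -> str:
    """Toy stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("Summarize"):
        # Pretend the first line of the chat is its summary.
        return prompt.split("\n", 1)[1].split("\n")[0]
    if prompt.startswith("Update"):
        # "Propose" every keyword seen in the current labels or summaries.
        return ", ".join(k for k in KEYWORDS if k in prompt.lower())
    # Classification prompt: pick the first keyword found in the conversation.
    chat = prompt.split("Conversation:\n", 1)[1].lower()
    return next((k for k in KEYWORDS if k in chat), "unknown")


def summarize(llm: LLM, conversation: str) -> str:
    """Stage 1: reduce a conversation to a short summary."""
    return llm("Summarize this conversation in one sentence:\n" + conversation).strip()


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_taxonomy(llm: LLM, summaries: List[str], batch_size: int = 2) -> List[str]:
    """Stage 2: revise the label list once per batch of summaries."""
    labels: List[str] = []
    for batch in batched(summaries, batch_size):
        prompt = (
            "Update the label list so it covers every summary.\n"
            "Current labels: " + ", ".join(labels) + "\n"
            "Summaries:\n" + "\n".join(batch)
        )
        reply = llm(prompt)
        labels = [part.strip().lower() for part in reply.split(",") if part.strip()]
    return labels


def classify(llm: LLM, labels: List[str], conversation: str) -> str:
    """Stage 3: assign one label from the taxonomy to a new conversation."""
    prompt = (
        "Classify the conversation using exactly one of these labels: "
        + ", ".join(labels) + "\nConversation:\n" + conversation
    )
    reply = llm(prompt).strip().lower()
    # Guard against replies that fall outside the agreed taxonomy.
    return reply if reply in labels else "other"
```

Passing the model in as a plain callable keeps the pipeline independent of any particular LLM provider, and the fallback in `classify` ensures that every output belongs to a known category even when the model replies with something outside the taxonomy.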

\n