{"id":1144422,"date":"2025-07-23T09:00:00","date_gmt":"2025-07-23T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1144422"},"modified":"2025-07-31T08:57:03","modified_gmt":"2025-07-31T15:57:03","slug":"technical-approach-for-classifying-human-ai-interactions-at-scale","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/technical-approach-for-classifying-human-ai-interactions-at-scale\/","title":{"rendered":"Technical approach for classifying human-AI interactions at scale"},"content":{"rendered":"\n

As large language models (LLMs) become foundational to modern AI systems, the ability to run them at scale\u2014efficiently, reliably, and in near real-time\u2014is no longer a nice-to-have. It\u2019s essential. The Semantic Telemetry<\/a> project tackles this challenge by applying LLM-based classifiers to hundreds of millions of sampled, anonymized Bing Chat conversations each week. These classifiers extract signals like user expertise, primary topic, and satisfaction, enabling deeper insight into human-AI interactions and driving continuous system improvement.<\/p>\n\n\n\n
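To make the idea of an LLM-based classifier concrete, here is a minimal sketch of one classification pass: a prompt template that asks the model for structured labels, and a parser that validates the reply. The prompt wording, label taxonomy, and function names are illustrative assumptions, not the Semantic Telemetry project's actual prompts or schema.

```python
import json

# Hypothetical prompt for a single classifier pass. The real Semantic
# Telemetry prompts and label taxonomies are not public.
CLASSIFIER_PROMPT = (
    "Classify the following anonymized conversation.\n"
    "Return JSON with keys: expertise (novice|intermediate|expert), "
    "topic (short string), satisfaction (satisfied|unsatisfied|unclear).\n\n"
    "Conversation:\n{conversation}"
)

def parse_classification(raw_response: str) -> dict:
    """Parse the model's JSON reply, falling back to 'unclear' labels.

    LLM output is not guaranteed to be valid JSON, so a defensive parse
    keeps one malformed reply from failing a whole batch.
    """
    expected_keys = ("expertise", "topic", "satisfaction")
    try:
        labels = json.loads(raw_response)
    except json.JSONDecodeError:
        return {key: "unclear" for key in expected_keys}
    # Keep only the expected keys; fill any the model omitted.
    return {key: labels.get(key, "unclear") for key in expected_keys}
```

At scale, the defensive parse matters as much as the prompt: across hundreds of millions of conversations, even a small rate of malformed model replies would otherwise surface as pipeline failures.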

But building a pipeline that can handle this volume isn\u2019t just about plugging into an API. It requires a high-throughput, high-performance architecture that can orchestrate distributed processing, manage token and prompt complexity, and gracefully handle the unpredictability of remote LLM endpoints.<\/p>\n\n\n\n
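One common way to handle the unpredictability of remote LLM endpoints is retrying transient failures with exponential backoff and jitter. The sketch below is a generic pattern under assumed names (`call_with_backoff`, `send_request`), not the pipeline's actual retry policy.

```python
import random
import time

def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a flaky remote call with exponential backoff and jitter.

    `send_request` is any zero-argument callable that raises on transient
    failure (timeouts, throttling). Illustrative sketch only.
    """
    for attempt in range(max_retries):
        try:
            return send_request()
        except Exception:
            if attempt == max_retries - 1:
                raise  # Exhausted retries; surface the failure upstream.
            # Doubling the delay each attempt, plus random jitter,
            # spreads out retry storms against a throttled endpoint.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

In practice a pipeline at this volume would also distinguish retryable errors (429s, timeouts) from permanent ones (malformed requests), but the backoff skeleton is the same.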

In this latest post in our series on Semantic Telemetry, we\u2019ll walk through the engineering behind that system\u2014how we designed for scale from the start, the trade-offs we made, and the lessons we learned along the way. From batching strategies to token optimization and orchestration, we\u2019ll share what it takes to build a real-time LLM classification pipeline.<\/p>\n\n\n\n
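As a taste of the batching discussion, one simple strategy is to pack conversations greedily into batches under a token budget, so each request to the model stays within context and throughput limits. The function name and greedy policy here are an illustrative assumption; the post's actual batching strategy may differ.

```python
def batch_by_budget(items, token_counts, max_tokens_per_batch):
    """Greedily pack items into batches under a token budget.

    `items` and `token_counts` are parallel sequences; a batch is flushed
    once adding the next item would exceed the budget. An oversized single
    item still gets its own batch rather than being dropped.
    """
    batches, current, used = [], [], 0
    for item, tokens in zip(items, token_counts):
        if current and used + tokens > max_tokens_per_batch:
            batches.append(current)
            current, used = [], 0
        current.append(item)
        used += tokens
    if current:
        batches.append(current)
    return batches
```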

For additional project background, see Semantic Telemetry: Understanding how users interact with AI systems<\/a> and Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project<\/a>.<\/p>\n\n\n\n


System architecture highlights<\/h2>\n\n\n\n

The Semantic Telemetry pipeline (opens in new tab)<\/span><\/a> is a highly scalable, highly configurable data transformation pipeline. While it follows a familiar ETL structure, several architectural innovations make it uniquely suited for high-throughput LLM integration:<\/p>\n\n\n\n