{"id":1150284,"date":"2025-10-22T14:31:38","date_gmt":"2025-10-22T21:31:38","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=1150284"},"modified":"2025-10-22T14:31:41","modified_gmt":"2025-10-22T21:31:41","slug":"kernel%e2%80%91level-innovation-and-hardware%e2%80%91aware-modeling","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/kernel%e2%80%91level-innovation-and-hardware%e2%80%91aware-modeling\/","title":{"rendered":"Kernel\u2011level innovation and hardware\u2011aware modeling\u00a0"},"content":{"rendered":"
\n\t
\n\t\t
\n\t\t\t\t\t<\/div>\n\t\t\n\t\t
\n\t\t\t\n\t\t\t
\n\t\t\t\t\n\t\t\t\t
\n\t\t\t\t\t\n\t\t\t\t\t
\n\t\t\t\t\t\t
\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t<\/span>\n\t\t\t\t\t\t\t\t\tEfficient AI team\t\t\t\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n

Kernel\u2011level innovation and hardware\u2011aware modeling<\/h1>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n

Our team drives kernel-level innovation to improve the efficiency of large-scale AI workloads. We are rethinking core attention mechanisms and computational pathways to improve performance, memory efficiency, and scalability.<\/p>\n\n\n\n

By redesigning execution flows, introducing advanced quantization strategies, and leveraging emerging hardware capabilities, we aim to eliminate bottlenecks in both compute and communication layers. This project focuses on achieving end-to-end acceleration without compromising accuracy or reliability, enabling models to handle longer contexts and higher throughput at significantly lower cost. Through tight algorithm\u2013hardware co-design and deep integration with production systems, we are building the foundation for next-generation AI infrastructure that is faster, leaner, and more sustainable.<\/p>\n\n\n\n

<\/div>\n\n\n\n
<\/figure>\n\n\n\n
<\/div>\n\n\n\n

Together, these advancements deliver measurable gains in tokens per second, cost per generated token, and energy efficiency while preserving output quality.<\/p>\n\n\n\n

<\/div>\n\n\n","protected":false},"excerpt":{"rendered":"

We design and optimize GPU kernels and model\u2011execution strategies to maximize throughput and minimize latency for real\u2011world LLM workloads. Interactive enterprise scenarios often run at low batch sizes, interleave very long contexts, and have strict latency targets, which exposes different bottlenecks than training does.<\/p>\n

Our work includes attention\u2011kernel optimization for both prefill and decode, sampling and logit\u2011processing improvements, and auto\u2011tuning at the PTX level to balance occupancy, register usage, and memory traffic. We also explore dynamic kernel selection at runtime, choosing kernels based on batch size, context length, and hardware topology to maintain peak efficiency without manual retuning. <\/p>\n

Together, these advancements deliver measurable gains in tokens per second and cost per generated token while preserving output quality. <\/p>\n","protected":false},"featured_media":1045266,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":true,"_classifai_error":"","footnotes":""},"research-area":[13556],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1150284","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[841759,1041231,1041954,1041966,1129848],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Srikant Bharadwaj","user_id":41644,"people_section":"Related people","alias":"srbharadwaj"},{"type":"user_nicename","display_name":"Mirian Hipolito Garcia","user_id":40483,"people_section":"Related people","alias":"mirianh"},{"type":"user_nicename","display_name":"Daniel Eduardo Madrigal Diaz","user_id":40480,"people_section":"Related people","alias":"danielmad"},{"type":"user_nicename","display_name":"Victor Ruehle","user_id":41027,"people_section":"Related people","alias":"virueh"},{"type":"user_nicename","display_name":"Renee St. 
Amant","user_id":43080,"people_section":"Related people","alias":"reneestamant"}],"msr_research_lab":[],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1150284","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":21,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1150284\/revisions"}],"predecessor-version":[{"id":1155057,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1150284\/revisions\/1155057"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1045266"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1150284"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1150284"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1150284"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1150284"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1150284"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}