{"id":421017,"date":"2017-08-22T12:01:42","date_gmt":"2017-08-22T19:01:42","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=421017"},"modified":"2018-08-16T16:48:33","modified_gmt":"2018-08-16T23:48:33","slug":"microsoft-unveils-project-brainwave","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-unveils-project-brainwave\/","title":{"rendered":"Microsoft unveils Project Brainwave for real-time AI"},"content":{"rendered":"

\"HotBy Doug Burger (opens in new tab)<\/span><\/a>, Distinguished Engineer, Microsoft<\/em><\/p>\n

Today at Hot Chips 2017, our cross-Microsoft team unveiled a new deep learning acceleration platform, codenamed Project Brainwave. I'm delighted to share more details in this post, since Project Brainwave achieves a major leap forward in both performance and flexibility for cloud-based serving of deep learning models. We designed the system for real-time AI, which means the system processes requests as fast as it receives them, with ultra-low latency. Real-time AI is becoming increasingly important as cloud infrastructures process live data streams, whether they be search queries, videos, sensor streams, or interactions with users.

The Project Brainwave system is built with three main layers:

1. A high-performance, distributed system architecture;
2. A hardware DNN engine synthesized onto FPGAs; and
3. A compiler and runtime for low-friction deployment of trained models.

First, Project Brainwave leverages the massive FPGA infrastructure that Microsoft has been deploying over the past few years. By attaching high-performance FPGAs directly to our datacenter network, we can serve DNNs as hardware microservices, where a DNN can be mapped to a pool of remote FPGAs and called by a server with no software in the loop. This system architecture both reduces latency, since the CPU does not need to process incoming requests, and allows very high throughput, with the FPGA processing requests as fast as the network can stream them.
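To make the hardware-microservice idea concrete, here is a minimal sketch of what calling a network-attached DNN might look like from a client. The pool address, port, and wire format are invented for illustration; this is not the actual Brainwave protocol.

```python
import socket
import struct

import numpy as np

# Hypothetical address of an FPGA pool serving one DNN; placeholder only.
FPGA_POOL_ADDR = ("10.0.0.42", 9000)

def query_dnn(features: np.ndarray) -> np.ndarray:
    """Send one inference request straight to a network-attached FPGA pool
    and block until the result streams back, with no serving software in
    between on the host."""
    payload = features.astype(np.float32).tobytes()
    with socket.create_connection(FPGA_POOL_ADDR) as conn:
        # Length-prefixed request, then read a length-prefixed response.
        conn.sendall(struct.pack("<I", len(payload)) + payload)
        (size,) = struct.unpack("<I", conn.recv(4))
        buf = b""
        while len(buf) < size:
            buf += conn.recv(size - len(buf))
    return np.frombuffer(buf, dtype=np.float32)
```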

Second, Project Brainwave uses a powerful "soft" DNN processing unit (or DPU), synthesized onto commercially available FPGAs. A number of companies, both large firms and a slew of startups, are building hardened DPUs. Although some of these chips have high peak performance, they must choose their operators and data types at design time, which limits their flexibility. Project Brainwave takes a different approach, providing a design that scales across a range of data types, with the desired data type being a synthesis-time decision. The design combines both the ASIC digital signal processing blocks on the FPGAs and the synthesizable logic to provide a greater and more optimized number of functional units. This approach exploits the FPGA's flexibility in two ways. First, we have defined highly customized, narrow-precision data types that increase performance without real losses in model accuracy. Second, we can incorporate research innovations into the hardware platform quickly (typically a few weeks), which is essential in this fast-moving space. As a result, we achieve performance comparable to, or greater than, many of these hard-coded DPU chips, and we are delivering the promised performance today.
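To illustrate what a synthesis-time, narrow-precision data type involves, the sketch below rounds fp32 values onto a small float grid with configurable exponent and mantissa widths. The bit widths and rounding scheme are assumptions chosen for illustration; the exact ms-fp8 layout is not specified in this post.

```python
import numpy as np

def quantize_narrow_float(x: np.ndarray, exp_bits: int = 5, man_bits: int = 2) -> np.ndarray:
    """Round fp32 values to a (1 + exp_bits + man_bits)-bit float grid.
    The widths stand in for a synthesis-time choice; not the real ms-fp8."""
    bias = 2 ** (exp_bits - 1) - 1
    sign, mag = np.sign(x), np.abs(x)
    # Per-value exponent, clamped to the narrow format's normal range.
    exp = np.clip(np.floor(np.log2(np.where(mag > 0, mag, 1.0))), 1 - bias, bias)
    step = 2.0 ** (exp - man_bits)            # spacing of representable values
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** bias
    return sign * np.minimum(np.round(mag / step) * step, max_val)

x = np.random.randn(8).astype(np.float32)
print(x)
print(quantize_narrow_float(x))
```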

    \"Intel

    At Hot Chips, Project Brainwave was demonstrated using Intel\u2019s new 14 nm Stratix 10 FPGA.<\/p><\/div>\n

Third, Project Brainwave incorporates a software stack designed to support the wide range of popular deep learning frameworks. We already support Microsoft Cognitive Toolkit and Google's TensorFlow, and plan to support many others. We have defined a graph-based intermediate representation, to which we convert models trained in the popular frameworks, and then compile down to our high-performance infrastructure.
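As a toy picture of what a graph-based intermediate representation looks like, the sketch below builds IR nodes for one LSTM-style gate. The node names and the lstm_gate helper are invented for this example; they are not Brainwave's actual IR.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                               # e.g. "matmul", "sigmoid", "concat"
    inputs: list = field(default_factory=list)

def lstm_gate(x: Node, h: Node, weight_name: str) -> Node:
    """One LSTM-style gate expressed as IR nodes: sigmoid(W @ [x; h])."""
    concat = Node("concat", [x, h])
    matmul = Node("matmul", [Node(f"weight:{weight_name}"), concat])
    return Node("sigmoid", [matmul])

# A front end would walk a trained TensorFlow or Cognitive Toolkit graph
# and emit nodes like these for the compiler to schedule onto the DPU.
gate = lstm_gate(Node("input:x"), Node("input:h"), "W_i")
```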

We architected this system to show high actual performance across a wide range of complex models, with batch-free execution. Companies and researchers building DNN accelerators often show performance demos using convolutional neural networks (CNNs). Since CNNs are so compute intensive, it is comparatively simple to achieve high performance numbers. Those results are often not representative of performance on more complex models from other domains, such as LSTMs or GRUs for natural language processing. Another technique that DNN processors often use to boost performance is running deep neural networks with high degrees of batching. While this technique is effective for throughput-based architectures, as well as off-line scenarios such as training, it is less effective for real-time AI. With large batches, the first query in a batch must wait for all of the many queries in the batch to complete. Our system, designed for real-time AI, can handle complex, memory-intensive models such as LSTMs, without using batching to juice throughput.
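A back-of-the-envelope calculation shows why batching hurts the latency of individual requests; every number below is an illustrative assumption, not a measured Brainwave figure.

```python
batch_size = 64
arrival_gap_ms = 0.5     # assumed time between successive incoming requests
batch_compute_ms = 4.0   # assumed: a full batch amortizes weight fetches,
                         # so it runs far faster than 64 sequential requests
single_compute_ms = 1.0  # assumed time to run one request immediately

# Batched: the first request waits for the batch to fill, then for the run.
batched_first_request_ms = (batch_size - 1) * arrival_gap_ms + batch_compute_ms
print(f"batched worst-case latency: {batched_first_request_ms} ms")  # 35.5 ms

# Batch-free: every request starts executing as soon as it arrives.
print(f"batch-free latency: {single_compute_ms} ms")                 # 1.0 ms
```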

At Hot Chips, Eric Chung and Jeremy Fowers demonstrated the Project Brainwave system ported to Intel's new 14 nm Stratix 10 FPGA. You can view the PowerPoint deck they presented at the event here (PDF file).

Even on early Stratix 10 silicon, the ported Project Brainwave system ran a large GRU model, five times larger than ResNet-50, with no batching, and achieved record-setting performance. The demo used Microsoft's custom 8-bit floating point format ("ms-fp8"), which does not suffer accuracy losses (on average) across a range of models. We showed Stratix 10 sustaining 39.5 teraflops on this large GRU, running each request in under one millisecond. At that level of performance, the Brainwave architecture sustains execution of over 130,000 compute operations per cycle, driven by one macro-instruction being issued every 10 cycles. Running on Stratix 10, Project Brainwave thus achieves unprecedented levels of demonstrated real-time AI performance on extremely challenging models. As we tune the system over the next few quarters, we expect significant further performance improvements.
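As a sanity check on those figures, dividing the sustained throughput by the stated operations per cycle implies the clock rate the design must sustain; the clock is inferred here, not given in the post.

```python
sustained_ops_per_sec = 39.5e12   # 39.5 teraflops, from the demo
ops_per_cycle = 130_000           # stated compute operations per cycle

implied_clock_hz = sustained_ops_per_sec / ops_per_cycle
print(f"implied clock: {implied_clock_hz / 1e6:.0f} MHz")    # ~304 MHz

# One macro-instruction issued every 10 cycles keeps roughly 1.3 million
# operations in flight per instruction.
print(f"ops per macro-instruction: {ops_per_cycle * 10:,}")  # 1,300,000
```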

We are working to bring this powerful, real-time AI system to users in Azure, so that our customers can benefit from Project Brainwave directly, complementing the indirect access through our services such as Bing. In the near future, we'll detail when our Azure customers will be able to run their most complex deep learning models at record-setting performance. With the Project Brainwave system incorporated at scale and available to our customers, Microsoft Azure will have industry-leading capabilities for real-time AI.

