{"id":421017,"date":"2017-08-22T12:01:42","date_gmt":"2017-08-22T19:01:42","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=421017"},"modified":"2018-08-16T16:48:33","modified_gmt":"2018-08-16T23:48:33","slug":"microsoft-unveils-project-brainwave","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-unveils-project-brainwave\/","title":{"rendered":"Microsoft unveils Project Brainwave for real-time AI"},"content":{"rendered":"
By Doug Burger, Distinguished Engineer, Microsoft

Today at Hot Chips 2017, our cross-Microsoft team unveiled a new deep learning acceleration platform, codenamed Project Brainwave. I'm delighted to share more details in this post, since Project Brainwave achieves a major leap forward in both performance and flexibility for cloud-based serving of deep learning models. We designed the system for real-time AI, which means the system processes requests as fast as it receives them, with ultra-low latency. Real-time AI is becoming increasingly important as cloud infrastructures process live data streams, whether they be search queries, videos, sensor streams, or interactions with users.

The Project Brainwave system is built with three main layers:

First, Project Brainwave leverages the massive FPGA infrastructure that Microsoft has been deploying over the past few years. By attaching high-performance FPGAs directly to our datacenter network, we can serve DNNs as hardware microservices, where a DNN can be mapped to a pool of remote FPGAs and called by a server with no software in the loop. This system architecture both reduces latency, since the CPU does not need to process incoming requests, and allows very high throughput, with the FPGA processing requests as fast as the network can stream them.

Second, Project Brainwave uses a powerful "soft" DNN processing unit (or DPU), synthesized onto commercially available FPGAs. A number of companies, from large incumbents to a slew of startups, are building hardened DPUs. Although some of these chips have high peak performance, they must choose their operators and data types at design time, which limits their flexibility. Project Brainwave takes a different approach, providing a design that scales across a range of data types, with the desired data type being a synthesis-time decision. The design combines the ASIC digital signal processing blocks on the FPGAs with synthesizable logic to provide a larger, more optimized set of functional units. This approach exploits the FPGA's flexibility in two ways. First, we have defined highly customized, narrow-precision data types that increase performance without real losses in model accuracy. Second, we can incorporate research innovations into the hardware platform quickly (typically a few weeks), which is essential in this fast-moving space. As a result, we achieve performance comparable to, or greater than, many of these hard-coded DPU chips, and we are delivering that performance today.
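To make the narrow-precision idea concrete, here is a minimal sketch of rounding float32 weights to a low-bit floating-point format. The exponent and mantissa widths, and the format itself, are illustrative assumptions for this post, not the actual Brainwave data types; the point is simply that aggressive narrowing of the representation can keep reconstruction error small for typical weight distributions.

```python
import numpy as np

def quantize_low_precision_float(x, exp_bits=4, man_bits=3):
    """Round float32 values to a hypothetical narrow floating-point format
    with `exp_bits` exponent bits and `man_bits` mantissa bits (plus sign).
    Illustrative only; not the Brainwave data type."""
    x = np.asarray(x, dtype=np.float32)
    sign = np.sign(x)
    mag = np.abs(x)

    # Exponent range for a biased exponent with exp_bits bits.
    bias = 2 ** (exp_bits - 1) - 1
    min_exp, max_exp = 1 - bias, bias

    # Decompose into exponent (clamped to the representable range).
    exp = np.floor(np.log2(np.where(mag > 0, mag, 1.0)))
    exp = np.clip(exp, min_exp, max_exp)

    # Round the mantissa to man_bits fractional bits at that exponent.
    scale = 2.0 ** (exp - man_bits)
    quantized = sign * np.round(mag / scale) * scale

    # Clamp to the largest representable magnitude.
    max_val = (2 - 2.0 ** (-man_bits)) * 2.0 ** max_exp
    return np.clip(quantized, -max_val, max_val)

# Example: quantization error on a typical weight distribution.
weights = np.random.normal(0, 0.05, size=10_000).astype(np.float32)
w_q = quantize_low_precision_float(weights)
rel_err = np.abs(weights - w_q).mean() / np.abs(weights).mean()
print(f"mean relative error: {rel_err:.4f}")
```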
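The hardware-microservice serving path described earlier can also be sketched from the caller's side: a server frames a request and sends it straight to a remote FPGA endpoint over the network, rather than running an inference stack locally. The endpoint address, wire format, and response size below are hypothetical placeholders, used only to illustrate the shape of such a call, not the actual Brainwave protocol.

```python
import socket
import struct
import numpy as np

# Hypothetical endpoint of a remote FPGA pool exposed as a hardware
# microservice; address, port, and framing are illustrative assumptions.
FPGA_POOL_ADDR = ("10.0.0.42", 9000)

def infer_remote(features: np.ndarray) -> np.ndarray:
    """Send one request directly to a remote FPGA endpoint and read the
    result back; the caller's CPU only frames the message, while the DNN
    itself runs on the remote accelerator."""
    payload = features.astype(np.float32).tobytes()
    with socket.create_connection(FPGA_POOL_ADDR) as conn:
        # Length-prefixed request: 4-byte element count, then raw float32s.
        conn.sendall(struct.pack("<I", features.size) + payload)
        # Read a fixed-size response (assumed here: 1000 float32 scores).
        resp = b""
        while len(resp) < 4000:
            chunk = conn.recv(4000 - len(resp))
            if not chunk:
                raise ConnectionError("FPGA endpoint closed the connection")
            resp += chunk
    return np.frombuffer(resp, dtype=np.float32)

# Example call (requires a live endpoint at FPGA_POOL_ADDR):
# scores = infer_remote(np.random.rand(3 * 224 * 224).astype(np.float32))
```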