{"id":492296,"date":"2018-06-22T11:28:35","date_gmt":"2018-06-22T18:28:35","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=492296"},"modified":"2018-10-16T22:22:30","modified_gmt":"2018-10-17T05:22:30","slug":"a-configurable-cloud-scale-dnn-processor-for-real-time-ai","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/a-configurable-cloud-scale-dnn-processor-for-real-time-ai\/","title":{"rendered":"A Configurable Cloud-Scale DNN Processor for Real-Time AI"},"content":{"rendered":"
Interactive AI-powered services require low-latency evaluation of deep neural network (DNN) models, also known as \u201creal-time AI\u201d. The growing demand for computationally expensive, state-of-the-art DNNs, coupled with diminishing performance gains of general-purpose architectures, has fueled an explosion of specialized Neural Processing Units (NPUs). NPUs for interactive services should satisfy two requirements: (1) execution of DNN models with low latency, high throughput, and high efficiency, and (2) flexibility to accommodate evolving state-of-the-art models (e.g., RNNs, CNNs, MLPs) without costly silicon updates. This paper describes the NPU architecture for Project Brainwave, a production-scale system for real-time AI. The Brainwave NPU achieves more than an order of magnitude improvement in latency and throughput over state-of-the-art GPUs on large RNNs at a batch size of 1. The NPU attains this performance using a single-threaded SIMD ISA paired with a distributed microarchitecture capable of dispatching over 7M operations from a single instruction. The spatially distributed microarchitecture, scaled up to 96,000 multiply-accumulate units, is supported by hierarchical instruction decoders and schedulers coupled with thousands of independently addressable high-bandwidth on-chip memories, and can transparently exploit many levels of fine-grain SIMD parallelism. When targeting an FPGA, microarchitectural parameters such as native datapaths and numerical precision can be \u201csynthesis specialized\u201d to models at compile time, enabling high FPGA performance competitive with hardened NPUs. When running on an Intel Stratix 10 280 FPGA, the Brainwave NPU achieves performance ranging from ten to over thirty-five teraflops, with no batching, on large, memory-intensive RNNs.<\/p>\n","protected":false},"excerpt":{"rendered":"
Interactive AI-powered services require low-latency evaluation of deep neural network (DNN) models\u2014aka \u201crealtime AI\u201d. The growing demand for computationally expensive, state-of-the-art DNNs, coupled with diminishing performance gains of general-purpose architectures, has fueled an explosion of specialized Neural Processing Units (NPUs). NPUs for interactive services should satisfy two requirements: (1) execution of DNN models with low […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Jeremy Fowers","user_id":"32249"},{"type":"user_nicename","value":"Kalin Ovtcharov","user_id":"36134"},{"type":"user_nicename","value":"Michael Papamichael","user_id":"33191"},{"type":"user_nicename","value":"Todd Massengill","user_id":"34236"},{"type":"user_nicename","value":"Ming Liu","user_id":"37056"},{"type":"user_nicename","value":"Daniel Lo","user_id":"31646"},{"type":"user_nicename","value":"Shlomi Alkalay","user_id":"37479"},{"type":"user_nicename","value":"Michael Haselman","user_id":"37482"},{"type":"user_nicename","value":"Logan Adams","user_id":"37503"},{"type":"user_nicename","value":"Mahdi Ghandi","user_id":"37506"},{"type":"user_nicename","value":"Stephen Heil","user_id":"33607"},{"type":"user_nicename","value":"Prerak Patel","user_id":"37512"},{"type":"user_nicename","value":"Adam Sapek","user_id":"37491"},{"type":"user_nicename","value":"Gabriel Weisz","user_id":"37500"},{"type":"user_nicename","value":"Lisa Woods","user_id":"32701"},{"type":"user_nicename","value":"Sitaram Lanka","user_id":"37485"},{"type":"user_nicename","value":"Steve Reinhardt","user_id":"37488"},{"type":"user_nicename","value":"Adrian Caulfield","user_id":"30808"},{"type":"user_nicename","value":"Eric 
Chung","user_id":"31746"},{"type":"user_nicename","value":"Doug Burger","user_id":"31582"}],"msr_publishername":"ACM","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"Proceedings of the 45th International Symposium on Computer Architecture, 2018","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"Proceedings of the 45th International Symposium on Computer Architecture, 2018","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2018-06-05","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13552],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-492296","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-hardware-devices","msr-locale-en_us"],"msr_publishername":"
ACM","msr_edition":"Proceedings of the 45th International Symposium on Computer Architecture, 2018","msr_affiliation":"","msr_published_date":"2018-06-05","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"492299","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"ISCA18-Brainwave-CameraReady","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2018\/06\/ISCA18-Brainwave-CameraReady.pdf","id":492299,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"user_nicename","value":"Jeremy Fowers","user_id":32249,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jeremy Fowers"},{"type":"user_nicename","value":"Kalin Ovtcharov","user_id":36134,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Kalin Ovtcharov"},{"type":"user_nicename","value":"Michael Papamichael","user_id":33191,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Michael Papamichael"},{"type":"user_nicename","value":"Todd 
Massengill","user_id":34236,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Todd Massengill"},{"type":"user_nicename","value":"Ming Liu","user_id":37056,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ming Liu"},{"type":"user_nicename","value":"Daniel Lo","user_id":31646,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Daniel Lo"},{"type":"user_nicename","value":"Shlomi Alkalay","user_id":37479,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Shlomi Alkalay"},{"type":"user_nicename","value":"Michael Haselman","user_id":37482,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Michael Haselman"},{"type":"user_nicename","value":"Logan Adams","user_id":37503,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Logan Adams"},{"type":"user_nicename","value":"Mahdi Ghandi","user_id":37506,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Mahdi Ghandi"},{"type":"user_nicename","value":"Stephen Heil","user_id":33607,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Stephen Heil"},{"type":"user_nicename","value":"Prerak Patel","user_id":37512,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Prerak Patel"},{"type":"user_nicename","value":"Adam Sapek","user_id":37491,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Adam Sapek"},{"type":"user_nicename","value":"Gabriel 
Weisz","user_id":37500,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Gabriel Weisz"},{"type":"user_nicename","value":"Lisa Woods","user_id":32701,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Lisa Woods"},{"type":"user_nicename","value":"Sitaram Lanka","user_id":37485,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Sitaram Lanka"},{"type":"user_nicename","value":"Steve Reinhardt","user_id":37488,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Steve Reinhardt"},{"type":"user_nicename","value":"Adrian Caulfield","user_id":30808,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Adrian Caulfield"},{"type":"user_nicename","value":"Eric Chung","user_id":31746,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Eric Chung"},{"type":"user_nicename","value":"Doug Burger","user_id":31582,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Doug Burger"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[486102,171431],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":486102,"post_title":"Project Brainwave","post_name":"project-brainwave","post_type":"msr-project","post_date":"2018-08-14 09:49:27","post_modified":"2023-07-10 07:52:57","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/project-brainwave\/","post_excerpt":"Project Brainwave is a deep learning platform for real-time AI inference in the cloud and on the edge, transforming computing by augmenting CPUs with an interconnected and configurable compute 
layer composed of programmable silicon.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/486102"}]}},{"ID":171431,"post_title":"Project Catapult","post_name":"project-catapult","post_type":"msr-project","post_date":"2015-02-02 08:20:51","post_modified":"2021-12-06 21:07:49","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/project-catapult\/","post_excerpt":"Project Catapult is the code name for a Microsoft Research (MSR) enterprise-level initiative that is transforming cloud computing by augmenting CPUs with an interconnected and configurable compute layer composed of programmable silicon.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171431"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/492296","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/492296\/revisions"}],"predecessor-version":[{"id":492305,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/492296\/revisions\/492305"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=492296"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=492296"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=492296"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\
/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=492296"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=492296"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=492296"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=492296"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=492296"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=492296"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=492296"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=492296"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=492296"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=492296"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}