{"id":5675,"date":"2017-10-27T07:00:45","date_gmt":"2017-10-27T14:00:45","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/translation\/?p=5675"},"modified":"2017-10-27T07:00:45","modified_gmt":"2017-10-27T14:00:45","slug":"bringing-ai-translation-to-edge-devices-with-microsoft-translator","status":"publish","type":"post","link":"https://www.microsoft.com\/en-us\/translator/blog\/2017\/10\/27\/bringing-ai-translation-to-edge-devices-with-microsoft-translator\/","title":{"rendered":"Bringing AI translation to edge devices with Microsoft Translator"},"content":{"rendered":"
In November 2016, Microsoft brought the benefit of AI-powered machine translation, aka Neural Machine Translation (NMT), to developers and end users alike. Last week, Microsoft brought NMT capability to the edge of the cloud<\/a> by leveraging the NPU, an AI-dedicated processor integrated into the Mate 10<\/a>, Huawei\u2019s latest flagship phone. The new chip makes AI-powered translations available on the device even in the absence of internet access, enabling the system to produce translations whose quality is on par with the online system.<\/p>\n To achieve this breakthrough, researchers and engineers from Microsoft and Huawei collaborated in adapting neural translation to this new computing environment.<\/p>\n The most advanced NMT systems currently in production (i.e., used at scale in the cloud by businesses and apps) are using a neural network architecture combining multiple layers of LSTM networks<\/a>, an attention algorithm, and a translation (decoder) layer.<\/p>\n The animation below explains, in a simplified way, how this multi-layer neural network functions. For more details, please refer to the \u201cwhat is machine translation page<\/a>\u201d on the Microsoft Translator site.<\/p>\n <\/p>\n <\/a><\/p>\n In this cloud NMT implementation, these middle LSTM layers consume a large part of the computing power. To be able to run full NMT on a mobile device, it was necessary to find a mechanism that could reduce these computational costs while preserving, as much as possible, the translation quality.<\/p>\n This is where Huawei\u2019s Neural Processing Unit (NPU) comes into play. \u00a0Microsoft researchers and engineers took advantage of the NPU, which is specifically engineered to excel at low-latency AI computations, to offload operations that would have been unacceptably slow to process on the main CPU.<\/p>\n <\/p>\n The implementation now available on the Microsoft Translator app for the Huawei Mate 10 optimizes translation by offloading the most compute-intensive tasks to the NPU.<\/p>\n Specifically, this implementation replaces these middle LSTM network layers by a deep feed-forward neural network<\/a>. Deep feed-forward neural networks are powerful but require very large amounts of computation due to the high connectivity among neurons.<\/p>\n Neural networks rely primarily on matrix multiplications, an operation that is not complex from a mathematical standpoint but very expensive when performed at the scale required for such a deep neural network. The Huawei NPU excels in performing these matrix multiplications in a massively parallel fashion. It is also quite efficient from a power utilization standpoint, an important quality on battery powered devices.<\/p>\n At each layer of this feed-forward network, the NPU computes both the raw neuron output and the subsequent ReLu activation function<\/a> efficiently and with very low latency. By leveraging the ample high-speed memory on the NPU, it performs these computations in parallel without having to pay the cost for data transfer (i.e., slowing down performance) between the CPU and the NPU.<\/p>\n Once the final layer of this deep feed-forward network is computed, the system has a rich representation of the source language sentence. This representation is then fed through a left-to-right LSTM \u201cdecoder\u201d to produce each target language word, with the same attention algorithm used in the online version of the NMT.<\/p>\nImplementation<\/h2>\n