{"id":971391,"date":"2023-10-06T09:00:00","date_gmt":"2023-10-06T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/efficient-and-hardware-friendly-neural-architecture-search-with-spaceevo\/"},"modified":"2023-10-04T07:47:37","modified_gmt":"2023-10-04T14:47:37","slug":"efficient-and-hardware-friendly-neural-architecture-search-with-spaceevo","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/efficient-and-hardware-friendly-neural-architecture-search-with-spaceevo\/","title":{"rendered":"Efficient and hardware-friendly neural architecture search with SpaceEvo"},"content":{"rendered":"\n<p class=\"has-text-align-center\"><strong><em>This research paper was presented at the <\/em><\/strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/iccv2023.thecvf.com\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong><em>2023 IEEE\/CVF International Conference on Computer Vision<\/em><\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><strong><em> (ICCV), a premier academic conference for computer vision.<\/em><\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1.png\" alt=\"ICCV 2023: SpaceEvo\" class=\"wp-image-972249\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-1024x576.png 1024w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-1280x720.png 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<p>In the field of deep learning, where breakthroughs like the models <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/pypi.org\/project\/microsoftvision\/\" target=\"_blank\" rel=\"noreferrer noopener\">ResNet<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/AzureML-BERT\" target=\"_blank\" rel=\"noreferrer noopener\">BERT<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> have achieved remarkable success, a key challenge remains: developing efficient deep neural network (DNN) models that both excel in performance and minimize latency across diverse devices. 
To address this, researchers have introduced hardware-aware neural architecture search (NAS) to automate efficient model design for various hardware configurations. This approach involves a predefined search space, a search algorithm, accuracy estimation, and hardware-specific cost prediction models.<\/p>\n\n\n\n<p>However, optimizing the search space itself has often been overlooked. Current efforts rely mainly on MobileNets-based search spaces designed to minimize latency on mobile CPUs. These manually designed spaces may not align with the requirements of other hardware, limiting their suitability for a diverse range of devices.<\/p>\n\n\n\n<p>In the paper, \u201c<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/spaceevo-hardware-friendly-search-space-design-for-efficient-int8-inference\/\">SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,&#8221; presented at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/iccv2023.thecvf.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">ICCV 2023<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, we introduce SpaceEvo, a method that automatically creates specialized search spaces optimized for efficient INT8 inference on specific hardware platforms. Because the design process is automatic, each resulting search space is tailored for hardware-specific, quantization-friendly NAS. 
<\/p>\n\n\n\n<p>Notably, SpaceEvo&#8217;s lightweight design makes it ideal for practical applications, requiring only 25 GPU hours to create a hardware-specific solution and making it a cost-effective choice for hardware-aware NAS. This specialized search space, with hardware-preferred operators and configurations, enables the exploration of larger, more efficient models with low INT8 latency. Figure 1 demonstrates that our search space consistently outperforms existing alternatives in INT8 model quality. Conducting neural architecture searches within this hardware-friendly space yields models that set new INT8 accuracy benchmarks.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"Figure1: The image displays 4 sub-figures, each illustrating model accuracy error distribution when sampling models within INT8 quantized latency at 10 ms on a VNNI CPU, 15 ms on a VNNI CPU, 10 ms on a Pixel 4 CPU, and 20ms on a Pixel CPU for various Search Spaces. Each sub-figure contains 4 \u2013 5 curves, representing model accuracy error distributions from our search space, ProxylessNAS search space, MobileNetv3 search space, ResNet search space, and AttentiveNAS search space.  Our search space consistently delivers superior INT8 model populations, outperforming state-of-the-art alternatives under varying hardware and latency constraints. 
\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"4397\" height=\"1033\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure1.png\" alt=\"Figure1: The image displays 4 sub-figures, each illustrating model accuracy error distribution when sampling models within INT8 quantized latency at 10 ms on a VNNI CPU, 15 ms on a VNNI CPU, 10 ms on a Pixel 4 CPU, and 20ms on a Pixel CPU for various Search Spaces. Each sub-figure contains 4 \u2013 5 curves, representing model accuracy error distributions from our search space, ProxylessNAS search space, MobileNetv3 search space, ResNet search space, and AttentiveNAS search space.  Our search space consistently delivers superior INT8 model populations, outperforming state-of-the-art alternatives under varying hardware and latency constraints. \" class=\"wp-image-971418\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure1.png 4397w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure1-300x70.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure1-1024x241.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure1-768x180.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure1-1536x361.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure1-2048x481.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure1-240x56.png 240w\" sizes=\"auto, (max-width: 4397px) 100vw, 4397px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 1. Error distribution of INT8 quantized models across various NAS search spaces. 
Our search space consistently outperforms state-of-the-art alternatives in INT8 model quality.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"on-device-quantization-latency-analysis\">On-device quantization latency analysis<\/h2>\n\n\n\n<p>We began by analyzing the factors that determine INT8 quantized latency and their implications for search space design. We conducted our study on two widely used devices: an Intel CPU with VNNI instructions, running onnxruntime, and a Pixel 4 phone CPU, running TFLite 2.7.<\/p>\n\n\n\n<p>Our study revealed two critical findings:<\/p>\n\n\n\n<ol class=\"wp-block-list\" type=\"1\">\n<li>Both the operator type and its configuration, such as channel width, significantly affect INT8 latency, as illustrated in Figure 2. For instance, operators like Squeeze-and-Excitation and Hardswish, which improve accuracy at little added latency, can lead to slower INT8 inference on Intel CPUs. This slowdown arises primarily from the added cost of data transformation between INT32 and INT8, which outweighs the latency reduction achieved through INT8 computation.<\/li>\n\n\n\n<li>Quantization efficiency varies across devices, and the operator types that one device prefers can conflict with those preferred by another.<\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"Figure2: The image showcases a table (left) and a figure (right). The table on the left, labeled &quot;operator, intel CPU, pixel 4&quot;, highlights the INT8 latency speedup in comparison to Float32 latency of various operators on two hardware types. The rows are categorized as: Conv, DWConv, SE, hardswish, and swish. The figure on the right depicts the INT8 quantized speedup of Conv1x1 across different channel numbers. It includes two curves, each signifying speedups under diverse channel numbers on Intel CPU and Pixel 4 hardware.  
\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure2.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure2.png\" alt=\"Figure2: The image showcases a table (left) and a figure (right). The table on the left, labeled \"operator, intel CPU, pixel 4\", highlights the INT8 latency speedup in comparison to Float32 latency of various operators on two hardware types. The rows are categorized as: Conv, DWConv, SE, hardswish, and swish. The figure on the right depicts the INT8 quantized speedup of Conv1x1 across different channel numbers. It includes two curves, each signifying speedups under diverse channel numbers on Intel CPU and Pixel 4 hardware.  \" class=\"wp-image-971421\" style=\"width:626px;height:260px\" width=\"626\" height=\"260\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure2.png 2295w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure2-300x125.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure2-1024x425.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure2-768x319.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure2-1536x638.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure2-2048x850.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure2-240x100.png 240w\" sizes=\"auto, (max-width: 626px) 100vw, 626px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 2. Left: Selecting different operator types results in notably distinct quantized speed improvements. 
Right: Conv1x1 quantized speedup across various channel numbers.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"finding-diverse-efficient-quantized-models-with-spaceevo\">Finding diverse, efficient quantized models with SpaceEvo<\/h2>\n\n\n\n<p>Unlike traditional architecture search, which aims to find the best single model, our objective is to uncover a diverse population of billions of accurate and INT8 latency-friendly architectures within the search space.<\/p>\n\n\n\n<p>Drawing inspiration from neural architecture search, we introduced an evolutionary search algorithm to explore this quantization-friendly model population in SpaceEvo. Our approach incorporated three key techniques:<\/p>\n\n\n\n<ol class=\"wp-block-list\" type=\"1\">\n<li>The Q-T score, a metric that measures the quantization-friendliness of a candidate search space based on the INT8 accuracy-latency tradeoff of its top-tier subnets.<\/li>\n\n\n\n<li>Redesigned search algorithms that explore a collection of model populations (i.e., the search space) within the vast hyperspace, as illustrated in Figure 3. To make this possible, we divide the search space into a sequence of &#8220;elastic stages,&#8221; which allows traditional evolution methods like aging evolution to explore it effectively.<\/li>\n\n\n\n<li>A block-wise search space quantization scheme that reduces the training cost of finding the search space with the maximum Q-T score.<\/li>\n<\/ol>\n\n\n\n<p>After discovering the search space, we employed a two-stage NAS process, starting with training a quantized-for-all supernet over the search space. This ensured that all candidate models could achieve comparable quantized accuracy without individual fine-tuning or quantization. 
We utilized evolutionary search and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/nn-Meter\" target=\"_blank\" rel=\"noreferrer noopener\">nn-Meter<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> for INT8 latency prediction to identify the best quantized models under various INT8 latency constraints. Figure 3 shows the overall design process.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"Figure3: The image depicts a flowchart that outlines the complete SpaceEvo process and its application for NAS. Starting with a large hyperspace, an evolution search algorithm explores a candidate search space. A quality estimator then assesses its quality score based on INT8 latency and accuracy. This score is used as a reward for the algorithm, guiding further exploration until a suitable search space is found. A quantized-for-all supernet is then trained over this space, enabling hardware-aware NAS for deploying models within various INT8 latency constraints. \" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure3.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure3.png\" alt=\"Figure3: The image depicts a flowchart that outlines the complete SpaceEvo process and its application for NAS. Starting with a large hyperspace, an evolution search algorithm explores a candidate search space. A quality estimator then assesses its quality score based on INT8 latency and accuracy. This score is used as a reward for the algorithm, guiding further exploration until a suitable search space is found. A quantized-for-all supernet is then trained over this space, enabling hardware-aware NAS for deploying models within various INT8 latency constraints. 
\" class=\"wp-image-971424\" style=\"width:496px;height:256px\" width=\"496\" height=\"256\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure3.png 1315w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure3-300x155.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure3-1024x529.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure3-768x397.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/figure3-240x124.png 240w\" sizes=\"auto, (max-width: 496px) 100vw, 496px\" \/><\/a><figcaption class=\"wp-element-caption\">Figure 3: The complete SpaceEvo process and application for NAS<\/figcaption><\/figure>\n\n\n\n<p>Extensive experiments on two real-world edge devices and ImageNet demonstrated that our automatically designed search spaces significantly surpass manually designed search spaces. Table 1 showcases our discovered models, SEQnet, setting new benchmarks for INT8 quantized accuracy-latency tradeoffs.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td class=\"has-text-align-center\" data-align=\"center\" colspan=\"6\"><strong>(a) Results on the Intel VNNI CPU with onnxruntime<\/strong><\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\" rowspan=\"2\">Model<\/td><td class=\"has-text-align-center\" data-align=\"center\">Top-1 Acc %<\/td><td class=\"has-text-align-center\" data-align=\"center\" colspan=\"2\">Latency<\/td><td class=\"has-text-align-center\" data-align=\"center\">Top-1 Acc %<\/td><td class=\"has-text-align-center\" data-align=\"center\" rowspan=\"2\">FLOPs<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">INT8<\/td><td class=\"has-text-align-center\" data-align=\"center\">INT8<\/td><td class=\"has-text-align-center\" data-align=\"center\">Speedup<\/td><td class=\"has-text-align-center\" 
data-align=\"center\">FP32<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">MobileNetV3Small<\/td><td class=\"has-text-align-center\" data-align=\"center\">66.3<\/td><td class=\"has-text-align-center\" data-align=\"center\">4.4 ms<\/td><td class=\"has-text-align-center\" data-align=\"center\">1.1x<\/td><td class=\"has-text-align-center\" data-align=\"center\">67.4<\/td><td class=\"has-text-align-center\" data-align=\"center\">56M<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>SEQnet@cpu-A0<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>74.7<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>4.4 ms<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>2.0x<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>74.8<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\">163M<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">MobileNetV3Large<\/td><td class=\"has-text-align-center\" data-align=\"center\">74.5<\/td><td class=\"has-text-align-center\" data-align=\"center\">10.3 ms<\/td><td class=\"has-text-align-center\" data-align=\"center\">1.5x<\/td><td class=\"has-text-align-center\" data-align=\"center\">75.2<\/td><td class=\"has-text-align-center\" data-align=\"center\">219M<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>SEQnet@cpu-A1<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>77.4<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>8.8 ms<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>2.4x<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>77.5<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\">358M<\/td><\/tr><tr><td class=\"has-text-align-center\" 
data-align=\"center\">FBNetV3-A<\/td><td class=\"has-text-align-center\" data-align=\"center\">78.2<\/td><td class=\"has-text-align-center\" data-align=\"center\">27.7 ms<\/td><td class=\"has-text-align-center\" data-align=\"center\">1.3x<\/td><td class=\"has-text-align-center\" data-align=\"center\">79.1<\/td><td class=\"has-text-align-center\" data-align=\"center\">357M<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>SEQnet@cpu-A4<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>80.0<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>24.4 ms<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>2.4x<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>80.1<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\">1267M<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\" colspan=\"6\"><strong>(b) Results on the Google Pixel 4 with TFLite<\/strong><\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">MobileNetV3Small<\/td><td class=\"has-text-align-center\" data-align=\"center\">66.3<\/td><td class=\"has-text-align-center\" data-align=\"center\">6.4 ms<\/td><td class=\"has-text-align-center\" data-align=\"center\">1.3x<\/td><td class=\"has-text-align-center\" data-align=\"center\">67.4<\/td><td class=\"has-text-align-center\" data-align=\"center\">56M<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>SEQnet@pixel4-A0<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>73.6<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>5.9 ms<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>2.1x<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>73.7<\/strong><\/td><td class=\"has-text-align-center\" 
data-align=\"center\">107M<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">MobileNetV3Large<\/td><td class=\"has-text-align-center\" data-align=\"center\">74.5<\/td><td class=\"has-text-align-center\" data-align=\"center\">15.7 ms<\/td><td class=\"has-text-align-center\" data-align=\"center\">1.5x<\/td><td class=\"has-text-align-center\" data-align=\"center\">75.2<\/td><td class=\"has-text-align-center\" data-align=\"center\">219M<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\">EfficientNet-B0<\/td><td class=\"has-text-align-center\" data-align=\"center\">76.7<\/td><td class=\"has-text-align-center\" data-align=\"center\">36.4 ms<\/td><td class=\"has-text-align-center\" data-align=\"center\">1.7x<\/td><td class=\"has-text-align-center\" data-align=\"center\">77.3<\/td><td class=\"has-text-align-center\" data-align=\"center\">390M<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>SEQnet@pixel4-A1<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>77.6<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>14.7 ms<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>2.2x<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>77.7<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\">274M<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\"><center>Table 1. Our automated search spaces outperformed manual ones in ImageNet results on two devices. 
Speedup: INT8 latency compared with FP32 inference.<\/center><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"potential-for-sustainable-and-efficient-computing\">Potential for sustainable and efficient computing<\/h2>\n\n\n\n<p>SpaceEvo is the first attempt to address the hardware-friendly search space optimization challenge in NAS, paving the way for designing effective low-latency DNN models for diverse real-world edge devices. Looking ahead, the same approach could be applied to other crucial deployment metrics, such as energy and memory consumption, enhancing the sustainability of edge computing solutions.<\/p>\n\n\n\n<p>We are exploring adapting these methods to support diverse model architectures like transformers, further expanding SpaceEvo&#8217;s role in deep learning model design and efficient deployment.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A persistent challenge in deep learning is optimizing neural network models for diverse hardware configurations, balancing performance and low latency. 
Learn how SpaceEvo automates hardware-aware neural architecture search to fine-tune DNN models for swift execution on diverse devices.<\/p>\n","protected":false},"author":42183,"featured_media":972249,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13547],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-971391","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-systems-and-networking","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199560],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[881388,920469],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Li Lyna Zhang","user_id":38121,"display_name":"Li Lyna Zhang","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/lzhani\/\" aria-label=\"Visit the profile page for Li Lyna Zhang\">Li Lyna Zhang<\/a>","is_active":false,"last_first":"Zhang, Li Lyna","people_section":0,"alias":"lzhani"},{"type":"user_nicename","value":"Jiahang Xu","user_id":41569,"display_name":"Jiahang Xu","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jiahangxu\/\" aria-label=\"Visit the profile page for Jiahang Xu\">Jiahang Xu<\/a>","is_active":false,"last_first":"Xu, Jiahang","people_section":0,"alias":"jiahangxu"},{"type":"user_nicename","value":"Yuqing 
Yang","user_id":40654,"display_name":"Yuqing Yang","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yuqyang\/\" aria-label=\"Visit the profile page for Yuqing Yang\">Yuqing Yang<\/a>","is_active":false,"last_first":"Yang, Yuqing","people_section":0,"alias":"yuqyang"},{"type":"user_nicename","value":"Ting Cao","user_id":37446,"display_name":"Ting Cao","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ticao\/\" aria-label=\"Visit the profile page for Ting Cao\">Ting Cao<\/a>","is_active":false,"last_first":"Cao, Ting","people_section":0,"alias":"ticao"},{"type":"user_nicename","value":"Mao Yang","user_id":32798,"display_name":"Mao Yang","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/maoyang\/\" aria-label=\"Visit the profile page for Mao Yang\">Mao Yang<\/a>","is_active":false,"last_first":"Yang, Mao","people_section":0,"alias":"maoyang"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-960x540.png\" class=\"img-object-cover\" alt=\"ICCV 2023: SpaceEvo\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-1066x600.png 1066w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/ICCV-SpaceEVO-2023-BlogHeroFeature-1400x788-1.png 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"October 6, 2023","formattedExcerpt":"A persistent challenge in deep learning is optimizing neural network models for diverse hardware configurations, balancing performance and low latency. 
Learn how SpaceEvo automates hardware-aware neural architecture search to fine-tune DNN models for swift execution on diverse devices.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/971391","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/42183"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=971391"}],"version-history":[{"count":26,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/971391\/revisions"}],"predecessor-version":[{"id":972321,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/971391\/revisions\/972321"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/972249"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=971391"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=971391"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=971391"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=971391"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=971391"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=9
71391"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=971391"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=971391"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=971391"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=971391"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=971391"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}