{"id":971391,"date":"2023-10-06T09:00:00","date_gmt":"2023-10-06T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/efficient-and-hardware-friendly-neural-architecture-search-with-spaceevo\/"},"modified":"2023-10-04T07:47:37","modified_gmt":"2023-10-04T14:47:37","slug":"efficient-and-hardware-friendly-neural-architecture-search-with-spaceevo","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/efficient-and-hardware-friendly-neural-architecture-search-with-spaceevo\/","title":{"rendered":"Efficient and hardware-friendly neural architecture search with SpaceEvo"},"content":{"rendered":"\n

This research paper was presented at the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), a premier academic conference for computer vision.

\"ICCV<\/figure>\n\n\n\n

In the field of deep learning, where breakthroughs like ResNet and BERT have achieved remarkable success, a key challenge remains: developing efficient deep neural network (DNN) models that both excel in performance and minimize latency across diverse devices. To address this, researchers have introduced hardware-aware neural architecture search (NAS) to automate efficient model design for different hardware configurations. This approach involves a predefined search space, a search algorithm, accuracy estimation, and hardware-specific cost prediction models.
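Concretely, a hardware-aware NAS loop rejects candidates that exceed a device latency budget and ranks the rest by estimated accuracy. The sketch below is a hypothetical illustration of that loop only: the search space, `sample_architecture`, `estimate_accuracy`, and `predict_latency` are all placeholder assumptions, not the paper's implementation.

```python
import random

# Minimal, hypothetical sketch of a hardware-aware NAS loop.
# None of these helpers come from the paper; they stand in for a real
# search space, accuracy estimator, and per-device latency predictor.

SEARCH_SPACE = {
    "depth": [2, 3, 4],          # blocks per stage
    "width": [16, 24, 32, 48],   # channel choices
    "kernel": [3, 5, 7],         # convolution kernel sizes
}

def sample_architecture(space):
    """Draw one candidate architecture from the search space."""
    return {k: random.choice(v) for k, v in space.items()}

def estimate_accuracy(arch):
    """Placeholder: in practice, a trained supernet or predictor scores arch."""
    return random.random()

def predict_latency(arch):
    """Placeholder: in practice, a hardware cost model predicts device latency."""
    return arch["depth"] * arch["width"] * arch["kernel"] * 0.01  # fake ms

def search(space, latency_budget_ms, n_trials=1000):
    best, best_acc = None, -1.0
    for _ in range(n_trials):
        arch = sample_architecture(space)
        if predict_latency(arch) > latency_budget_ms:
            continue  # reject candidates over the device budget
        acc = estimate_accuracy(arch)
        if acc > best_acc:
            best, best_acc = arch, acc
    return best, best_acc

if __name__ == "__main__":
    print(search(SEARCH_SPACE, latency_budget_ms=5.0))
```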

However, optimizing the search space itself has often been overlooked. Current efforts rely mainly on MobileNets-based search spaces designed to minimize latency on mobile CPUs. These manually designed spaces may not align with the requirements of other hardware, limiting their suitability for a diverse range of devices.

In our paper, "SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference," presented at ICCV 2023, we introduce SpaceEvo, a novel method that automatically creates search spaces optimized for efficient INT8 inference on specific hardware platforms. What sets SpaceEvo apart is that this design process runs without manual effort, producing a search space tailored for hardware-specific, quantization-friendly NAS.


Notably, SpaceEvo's lightweight design makes it practical: creating a hardware-specific search space requires only 25 GPU hours, making it a cost-effective choice for hardware-aware NAS. The resulting search space, built from hardware-preferred operators and configurations, enables the exploration of larger, more efficient models with low INT8 latency. Figure 1 shows that our search space consistently outperforms existing alternatives in INT8 model quality, and conducting neural architecture searches within this hardware-friendly space yields models that set new INT8 accuracy benchmarks.

\"Figure1:<\/a>
Figure 1. Error distribution of INT8 quantized models across various NAS search spaces. Our search space consistently outperforms state-of-the-art alternatives in INT8 model quality.<\/figcaption><\/figure>\n\n\n\n

On-device quantization latency analysis

We began our investigation by studying the factors that determine INT8 quantized latency and their implications for search space design. We conducted our study on two widely used devices: an Intel CPU with VNNI instructions running onnxruntime, and a Pixel 4 phone CPU running TFLite 2.7.

Our study revealed two critical findings:

1. Both the choice of operator type and configurations such as channel width significantly affect INT8 latency, as illustrated in Figure 2. For instance, operators like Squeeze-and-Excitation and Hardswish, while enhancing accuracy at minimal FP32 latency cost, can slow down INT8 inference on Intel CPUs. This slowdown primarily arises from the added cost of data transformation between INT32 and INT8, which outweighs the latency reduction achieved through INT8 computation. (A measurement sketch follows Figure 2 below.)

2. Quantization efficiency varies across devices, and the operator types each device prefers can even be contradictory.
    \"Figure2:<\/a>
    Figure 2. Left: Selecting different operator types results in notably distinct quantized speed improvements. Right: Conv1x1 speed enhancements across various channel numbers.<\/figcaption><\/figure>\n\n\n\n
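To ground these findings, here is a minimal sketch of the kind of FP32-vs-INT8 latency comparison behind Figure 2, using onnxruntime on a CPU. It is illustrative only: dynamic quantization is used for brevity (the paper's setting is static INT8 inference), operator coverage depends on your onnxruntime version, and the tiny blocks and tensor shapes are arbitrary assumptions.

```python
import time
import numpy as np
import torch
import torch.nn as nn
from onnxruntime import InferenceSession
from onnxruntime.quantization import QuantType, quantize_dynamic

def make_block(act: nn.Module) -> nn.Module:
    # A tiny conv block; the activation choice is what we compare.
    return nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), act).eval()

def export_onnx(model: nn.Module, path: str) -> None:
    dummy = torch.randn(1, 32, 56, 56)
    torch.onnx.export(model, dummy, path, input_names=["x"])

def bench_ms(path: str, runs: int = 100) -> float:
    sess = InferenceSession(path, providers=["CPUExecutionProvider"])
    x = np.random.randn(1, 32, 56, 56).astype(np.float32)
    for _ in range(10):                     # warmup
        sess.run(None, {"x": x})
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {"x": x})
    return (time.perf_counter() - start) / runs * 1e3

for name, act in [("relu", nn.ReLU()), ("hardswish", nn.Hardswish())]:
    fp32_path, int8_path = f"{name}_fp32.onnx", f"{name}_int8.onnx"
    export_onnx(make_block(act), fp32_path)
    quantize_dynamic(fp32_path, int8_path, weight_type=QuantType.QInt8)
    print(f"{name}: fp32 {bench_ms(fp32_path):.2f} ms, "
          f"int8 {bench_ms(int8_path):.2f} ms")
```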

Finding diverse, efficient quantized models with SpaceEvo

Unlike traditional architecture search, which aims to find the single best model, our objective is to uncover a diverse population of billions of accurate, INT8 latency-friendly architectures within the search space.

Drawing inspiration from neural architecture search, we introduced an evolutionary search algorithm to explore this quantization-friendly model population in SpaceEvo. Our approach incorporates three key techniques:

1. The Q-T score, a metric that measures the quantization-friendliness of a candidate search space based on the INT8 accuracy-latency tradeoff of its top-tier subnets.

2. Redesigned search algorithms that explore a collection of model populations (i.e., the search space) within the vast hyperspace, as illustrated in Figure 3. This is achieved through the "elastic stage," which divides the search space into a sequence of elastic stages, allowing traditional evolutionary methods such as aging evolution to explore effectively. (A minimal sketch of this space-level evolution follows the list.)

3. A block-wise search space quantization scheme that reduces the training cost of finding a search space with the maximum Q-T score.
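As a rough illustration of technique 2, the sketch below evolves whole search spaces (lists of elastic stages) with aging evolution. Everything here is a stand-in assumption: the operator and width choices are arbitrary, and `q_t_score` returns a random number where the real Q-T metric would train and benchmark top subnets.

```python
import copy
import random

# Hypothetical sketch of evolving a *search space* rather than a single
# model. Each elastic stage holds candidate operator/width choices.

OP_CHOICES = ["mbconv3", "mbconv5", "fused-mbconv", "conv3x3"]
WIDTH_CHOICES = [16, 24, 32, 48, 64]

def random_stage():
    return {"ops": random.sample(OP_CHOICES, 2),
            "widths": random.sample(WIDTH_CHOICES, 2)}

def random_space(n_stages=5):
    return [random_stage() for _ in range(n_stages)]

def mutate(space):
    child = copy.deepcopy(space)
    stage = random.choice(child)          # mutate one elastic stage
    stage["ops"] = random.sample(OP_CHOICES, 2)
    return child

def q_t_score(space):
    """Stub: the real Q-T metric scores the INT8 accuracy-latency of a
    space's top-tier subnets; here it is random for illustration."""
    return random.random()

def aging_evolution(pop_size=20, cycles=200):
    population = [(s, q_t_score(s)) for s in
                  (random_space() for _ in range(pop_size))]
    history = list(population)
    for _ in range(cycles):
        parent, _ = max(random.sample(population, 5), key=lambda e: e[1])
        entry = (mutate(parent), q_t_score(mutate(parent)))
        population.append(entry)
        population.pop(0)                 # "aging": drop the oldest member
        history.append(entry)
    return max(history, key=lambda e: e[1])

best_space, score = aging_evolution()
print(score, best_space)
```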

After discovering the search space, we employed a two-stage NAS process to train a quantized-for-all supernet over it, ensuring that all candidate models achieve comparable quantized accuracy without individual fine-tuning or quantization. We then used evolutionary search, with nn-Meter for INT8 latency prediction, to identify the best quantized models under various INT8 latency constraints. Figure 3 shows the overall design process, and a subnet-search sketch appears after the figure.

      \"Figure3:<\/a>
      Figure 3: The complete SpaceEvo process and application for NAS<\/figcaption><\/figure>\n\n\n\n
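To make the second stage concrete, here is a minimal, hedged sketch of an evolutionary subnet search under an INT8 latency budget. `supernet_accuracy` and `int8_latency_ms` are stubs: in the paper, accuracy comes from the quantized-for-all supernet and latency from nn-Meter. The nn-Meter call shown in the comment follows the project's published interface, but treat the exact predictor name and signature as assumptions.

```python
import random

# Hedged sketch of the second stage: evolutionary subnet search under an
# INT8 latency budget. In practice, an nn-Meter-style predictor call looks
# roughly like:
#   predictor = nn_meter.load_latency_predictor("cortexA76cpu_tflite21")
#   latency_ms = predictor.predict(model, model_type="onnx")
# (treat that predictor name and signature as assumptions).

SPACE = [{"ops": ["mbconv3", "mbconv5"], "widths": [24, 32, 48]}
         for _ in range(5)]                # toy 5-stage search space

def sample_subnet(space):
    return [(random.choice(s["ops"]), random.choice(s["widths"]))
            for s in space]

def supernet_accuracy(subnet):
    """Stub: look up quantized accuracy via the shared supernet weights."""
    return random.random()

def int8_latency_ms(subnet):
    """Stub: a learned INT8 latency predictor would go here."""
    return sum(width for _, width in subnet) * 0.1

def evolve(budget_ms, pop=50, cycles=500):
    population = [sample_subnet(SPACE) for _ in range(pop)]
    best, best_acc = None, -1.0
    for _ in range(cycles):
        child = list(random.choice(population))
        i = random.randrange(len(child))   # mutate one stage's choice
        child[i] = (random.choice(SPACE[i]["ops"]),
                    random.choice(SPACE[i]["widths"]))
        if int8_latency_ms(child) <= budget_ms:  # enforce latency budget
            acc = supernet_accuracy(child)
            if acc > best_acc:
                best, best_acc = child, acc
            population.append(child)
            population.pop(0)
    return best, best_acc

print(evolve(budget_ms=15.0))
```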

Extensive experiments on two real-world edge devices and ImageNet demonstrate that our automatically designed search spaces significantly surpass manually designed ones. Table 1 showcases our discovered models, SEQnet, which set new benchmarks for INT8 quantized accuracy-latency tradeoffs.

(a) Results on the Intel VNNI CPU with onnxruntime

| Model | Top-1 Acc % (INT8) | INT8 Latency | Speedup | Top-1 Acc % (FP32) | FLOPs |
|---|---|---|---|---|---|
| MobileNetV3Small | 66.3 | 4.4 ms | 1.1x | 67.4 | 56M |
| SEQnet@cpu-A0 | 74.7 | 4.4 ms | 2.0x | 74.8 | 163M |
| MobileNetV3Large | 74.5 | 10.3 ms | 1.5x | 75.2 | 219M |
| SEQnet@cpu-A1 | 77.4 | 8.8 ms | 2.4x | 77.5 | 358M |
| FBNetV3-A | 78.2 | 27.7 ms | 1.3x | 79.1 | 357M |
| SEQnet@cpu-A4 | 80.0 | 24.4 ms | 2.4x | 80.1 | 1267M |

(b) Results on the Google Pixel 4 with TFLite

| Model | Top-1 Acc % (INT8) | INT8 Latency | Speedup | Top-1 Acc % (FP32) | FLOPs |
|---|---|---|---|---|---|
| MobileNetV3Small | 66.3 | 6.4 ms | 1.3x | 67.4 | 56M |
| SEQnet@pixel4-A0 | 73.6 | 5.9 ms | 2.1x | 73.7 | 107M |
| MobileNetV3Large | 74.5 | 15.7 ms | 1.5x | 75.2 | 219M |
| EfficientNet-B0 | 76.7 | 36.4 ms | 1.7x | 77.3 | 390M |
| SEQnet@pixel4-A1 | 77.6 | 14.7 ms | 2.2x | 77.7 | 274M |

Table 1. Our automated search spaces outperform manually designed ones on ImageNet across two devices. Speedup: INT8 latency compared with FP32 inference.

Potential for sustainable and efficient computing

SpaceEvo is the first attempt to address the challenge of hardware-friendly search space optimization in NAS, paving the way for effective low-latency DNN models on diverse real-world edge devices. Looking ahead, its implications reach beyond these initial results: the approach can extend to other crucial deployment metrics, such as energy and memory consumption, enhancing the sustainability of edge computing solutions.

We are exploring adapting these methods to support diverse model architectures like transformers, further expanding their role in evolving deep learning model design and efficient deployment.
