{"id":941124,"date":"2023-05-18T10:00:00","date_gmt":"2023-05-18T17:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=941124"},"modified":"2023-06-27T08:56:12","modified_gmt":"2023-06-27T15:56:12","slug":"react-a-synergistic-cloud-edge-fusion-architecture","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/react-a-synergistic-cloud-edge-fusion-architecture\/","title":{"rendered":"REACT \u2014 A synergistic cloud-edge fusion architecture"},"content":{"rendered":"\n

This research paper was accepted by the eighth ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI), a premier venue on IoT. The paper describes a framework that leverages cloud resources to execute large, high-accuracy deep neural network (DNN) models in order to improve the accuracy of models running on edge devices.

\"iotdi<\/figure>\n\n\n\n

Leveraging the cloud and edge concurrently

The internet is evolving towards an edge-computing architecture to support latency-sensitive DNN workloads in the emerging Internet of Things and mobile computing application domains. However, unlike the cloud, the edge has limited computing resources and cannot run large, high-accuracy DNN models. As a result, past work has focused on offloading some of the computation to the cloud to get around this limitation. However, this comes at the cost of increased latency.

For example, in edge video analytics use cases such as road traffic monitoring, drone surveillance, and driver-assist technology, one can transmit occasional frames to the cloud to perform object detection, a task ideally suited to models hosted on powerful GPUs. The edge, meanwhile, handles interpolating the intermediate frames through object tracking, a comparatively inexpensive computation performed using general-purpose CPUs, a low-powered edge GPU, or other edge accelerators (e.g., the Intel Movidius Neural Compute Stick). However, for most real-time applications, processing data in the cloud alone is infeasible due to strict latency constraints.


In our research paper, REACT: Streaming Video Analytics On The Edge With Asynchronous Cloud Support, we propose and demonstrate a novel architecture that leverages both the edge and the cloud concurrently to perform redundant computations at both ends. This helps retain the low latency of the edge while boosting accuracy with the power of the cloud. Our key technical contribution is in fusing the cloud inputs, which are received asynchronously, into the stream of computation at the edge, thereby improving the quality of detection without sacrificing latency.

Fusing edge and cloud detections
Figure 1(a): Orange and green boxes indicate detections from the edge and the cloud, respectively. Tracking performance degrades with every frame, indicated by the fading shades of blue.

Figure 1(b): REACT uses asynchronous cloud detections to correct the box labels and detect more objects.

We illustrate our fusion approach in REACT for object detection in videos. Figure 1 shows the result of object detection using a lightweight edge model. The edge model suffers from both missed objects (e.g., the cars in Frame 1 are not detected) and misclassified objects (e.g., the van on the right of the frame is misclassified as a car).

To address the challenges of limited edge computation capacity and the drop in accuracy from using edge models, we follow a two-pronged approach. First, since the sequence of video frames is spatiotemporally correlated, it suffices to run edge object detection only once every few frames. As illustrated in Figure 1(a), edge detection runs on every fifth frame. For the intermediate frames, we employ the comparatively lightweight operation of object tracking. Second, to improve the accuracy of inference, select frames are asynchronously transmitted to the cloud for inference. Depending on the network delay and the availability of cloud resources, cloud detections reach the edge device only after a few frames. The newer cloud detections, covering previously undetected objects, are then merged into the current frame. To do this, we feed the cloud detection, which was made on an old frame, into another instance of the object tracker to "fast forward" it to the current time. The newly detected objects can then be merged into the current frame, so long as the scene does not change abruptly. Figure 1(b) shows a visual result of our approach on a dashcam video dataset. A sketch of this control loop follows.
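The loop below is a minimal sketch of this two-pronged design, not the paper's implementation: the detector, cloud-submission, and tracker hooks (detect_edge, submit_to_cloud, poll_cloud, tracker.track, tracker.fast_forward) are hypothetical stand-ins, and the periods are the illustrative values from the text.

```python
EDGE_PERIOD = 5     # run the edge detector once every 5 frames
CLOUD_PERIOD = 30   # ship a frame to the cloud once every 30 frames

def process_stream(frames, detect_edge, submit_to_cloud, poll_cloud,
                   tracker, fuse):
    current = []                        # current list of tracked objects
    for i, frame in enumerate(frames):
        if i % EDGE_PERIOD == 0:
            # Fresh (cheap) edge detections are fused into the current list.
            current = fuse(current, detect_edge(frame), source="edge")
        else:
            # Intermediate frames pay only for lightweight tracking.
            current = tracker.track(frame, current)

        if i % CLOUD_PERIOD == 0:
            submit_to_cloud(i, frame)   # asynchronous; never blocks the loop

        arrived = poll_cloud()          # None until a cloud result lands
        if arrived is not None:
            old_i, cloud_dets = arrived
            # "Fast forward" detections made on an old frame to the present
            # by replaying the tracker over the intervening frames.
            cloud_dets = tracker.fast_forward(cloud_dets, old_i, i)
            current = fuse(current, cloud_dets, source="cloud")

        yield frame, current
```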

Here is a more detailed description of how REACT combines the edge and cloud detections. Each detection contains objects represented by a ⟨class_label, bounding_box, confidence_score⟩ tuple. Whenever we receive a new detection (from either the edge or the cloud), we purge from the current list the objects that were previously obtained from that same source. Then we form a zero matrix of size (c, n), where c and n are the numbers of detections in the current list and from the new source, respectively. We populate each matrix cell with the Intersection over Union (IoU) value of the corresponding current and new detections, provided it is greater than 0.5. We then perform a linear sum assignment, which matches pairs of objects so as to maximize the total overlap. For matched objects, we update the confidence values, bounding box, and class label based on the new detections' source. Specifically, our analysis reveals that edge detection models can localize objects correctly but often assign class labels incorrectly, whereas cloud detections have higher localization error but lower class-label error. Finally, the unmatched new objects are added to the list of current objects with their returned confidence values, bounding boxes, and class labels. Thus, REACT's fusion algorithm must consider multiple cases, such as misaligned bounding boxes and class label mismatches, to consolidate the edge and cloud detections into a single list.
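To make the matching step concrete, here is a simplified sketch of the fusion routine using NumPy and SciPy's linear_sum_assignment. The merge rule (trust cloud class labels, trust edge/tracker boxes) paraphrases the analysis above; the data layout and tie-breaking details are assumptions, not the paper's code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

IOU_THRESHOLD = 0.5

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def fuse(current, new, source):
    """current: list of dicts; new: list of (label, bbox, conf) tuples."""
    # Purge stale objects that came from the same source as `new`.
    current = [obj for obj in current if obj["source"] != source]

    # (c, n) zero matrix, populated with IoU values above the threshold.
    m = np.zeros((len(current), len(new)))
    for i, cur in enumerate(current):
        for j, (_, bbox, _) in enumerate(new):
            overlap = iou(cur["bbox"], bbox)
            if overlap > IOU_THRESHOLD:
                m[i, j] = overlap

    # Linear sum assignment: match pairs so the total overlap is maximal.
    rows, cols = linear_sum_assignment(m, maximize=True)
    matched = set()
    for i, j in zip(rows, cols):
        if m[i, j] == 0:
            continue                     # assigned but not truly overlapping
        matched.add(j)
        label, bbox, conf = new[j]
        if source == "cloud":
            current[i]["label"] = label  # cloud labels are more reliable
        else:
            current[i]["bbox"] = bbox    # edge boxes localize better
        current[i]["conf"] = conf

    # Unmatched new detections become fresh objects in the current list.
    for j, (label, bbox, conf) in enumerate(new):
        if j not in matched:
            current.append({"label": label, "bbox": bbox,
                            "conf": conf, "source": source})
    return current
```

Passing maximize=True makes the assignment pick the pairing with the greatest total IoU, mirroring the maximum-overlap matching described above.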

| Detector     | Backbone     | Where | #params |
|--------------|--------------|-------|---------|
| Faster R-CNN | ResNet50-FPN | Cloud | 41.5M   |
| RetinaNet    | ResNet50-FPN | Cloud | 36.1M   |
| CenterNet    | DLA34        | Cloud | 20.1M   |
| TinyYOLOv3   | DN19         | Edge  | 8.7M    |
| SSD          | MobileNetV2  | Edge  | 3.4M    |

Table 1: Models used in our evaluation

In our experiments, we leveraged state-of-the-art computer vision algorithms to obtain object detections at the edge and in the cloud (see Table 1). Further, we use mAP@0.5 (mean average precision at 0.5 IoU), a metric popular in the computer vision community, to measure the quality of object detections; a small sketch of this metric follows the dataset list below. To evaluate the efficacy of REACT, we looked at two datasets:

1. VisDrone: drone-based surveillance
2. D2-City: dashcam-based driver assist
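Since all results below are reported in mAP@0.5, the following sketch shows what that metric computes for a single class: an uninterpolated average precision at an IoU threshold of 0.5, reusing the iou helper from the fusion sketch above. mAP averages this value across classes; standard benchmarks use interpolated variants, so treat this as illustrative.

```python
def average_precision_at_50(predictions, ground_truth):
    """predictions: list of (confidence, bbox); ground_truth: list of bbox."""
    predictions = sorted(predictions, key=lambda p: -p[0])  # by confidence
    used = [False] * len(ground_truth)
    hits = []
    for _, box in predictions:
        # Greedily match the best still-unmatched ground truth at IoU >= 0.5.
        best, best_iou = -1, 0.5
        for g, gt in enumerate(ground_truth):
            if not used[g] and iou(box, gt) >= best_iou:
                best, best_iou = g, iou(box, gt)
        if best >= 0:
            used[best] = True
            hits.append(1)   # true positive
        else:
            hits.append(0)   # false positive
    # AP = mean of precision measured at each true-positive rank.
    ap, tp = 0.0, 0
    for k, hit in enumerate(hits, start=1):
        if hit:
            tp += 1
            ap += tp / k
    return ap / len(ground_truth) if ground_truth else 0.0
```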

Based on our evaluation, we observed that REACT outperforms baseline algorithms by as much as 50%. We also noted that the edge and cloud models complement each other, and that overall performance improves due to our edge-cloud fusion algorithm.

As already noted, the object detector runs only once every few frames, and lightweight object tracking is performed on the intermediate frames. Running detection redundantly at both the edge and the cloud allows an application developer to flexibly trade off the frequency of edge versus cloud executions while achieving the same accuracy, as shown in Figure 2. For example, if the edge device experiences thermal throttling, we can pick a lower edge detection frequency (say, once every 20 frames) and complement it with cloud detection once every 30 frames to get an mAP@0.5 of around 22.8. However, if there are fewer constraints at the edge, we can increase the edge detection frequency to once every five frames and reduce cloud detections to once every 120 frames to get similar performance (mAP@0.5 of 22.7). This provides fine-grained programmatic control; a selection sketch follows Figure 2.

    \"The
    Figure 2: mAP@0.5 values for varying cloud and edge detection frequency on the D2-City dataset. Similar shading corresponds to similar mAP@0.5.<\/figcaption><\/figure>\n\n\n\n
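A deployment could exploit this trade-off programmatically. The sketch below picks an operating point from a profiled grid like Figure 2; the only two entries shown are the operating points quoted above, and the selection policy (meet a target mAP at the smallest edge cost, preferring fewer cloud calls) is an assumed example, not an API from the paper.

```python
# (edge detection period, cloud detection period) -> profiled mAP@0.5.
# A real deployment would profile the full grid of Figure 2.
PROFILE = {
    (20, 30): 22.8,
    (5, 120): 22.7,
}

def pick_config(target_map, max_edge_rate):
    """Return a profiled (edge, cloud) period pair meeting the accuracy
    target whose edge detection rate fits the device's current budget."""
    feasible = [(cfg, m) for cfg, m in PROFILE.items()
                if m >= target_map and 1.0 / cfg[0] <= max_edge_rate]
    # Among feasible configs, touch the cloud as rarely as possible.
    return max(feasible, key=lambda x: x[0][1], default=None)

# Under thermal throttling, allow edge detection at most once per 10 frames:
print(pick_config(target_map=22.5, max_edge_rate=1 / 10))
# -> ((20, 30), 22.8)
```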

Further, one can amortize the cost of cloud resources over multiple edge devices by having them share the same cloud-hosted model. Specifically, if an application can tolerate a median latency of up to 500 ms, a single V100 GPU can support over 60 concurrent devices (Figure 3); one possible serving design is sketched after the figure.

    \"A
    Figure 3: 50th<\/sup> percentile response time vs number of edge devices that concurrently share a cloud GPU<\/figcaption><\/figure>\n\n\n\n
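One way to realize this sharing is to batch requests from many devices through a single GPU worker. The sketch below is an assumed asyncio-based design with a hypothetical model.infer_batch; the paper's serving stack may differ.

```python
import asyncio

async def gpu_worker(queue, model, max_batch=8):
    """Drain the request queue and serve detections in batches on one GPU."""
    while True:
        batch = [await queue.get()]
        while len(batch) < max_batch and not queue.empty():
            batch.append(queue.get_nowait())
        frames, futures = zip(*batch)
        results = model.infer_batch(list(frames))  # one pass, many devices
        for fut, dets in zip(futures, results):
            fut.set_result(dets)

async def offload(queue, frame):
    """Called per offloaded frame by each edge device's connection handler."""
    done = asyncio.get_running_loop().create_future()
    await queue.put((frame, done))
    return await done   # resolves once the batch containing this frame runs

async def main(model, device_frames):
    queue = asyncio.Queue()
    asyncio.create_task(gpu_worker(queue, model))
    # Each device gets its own result while the GPU cost is amortized
    # across the whole fleet.
    return await asyncio.gather(
        *(offload(queue, frame) for frame in device_frames))
```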

Conclusion

REACT represents a new paradigm of edge + cloud computing that leverages the resources of each to improve accuracy without sacrificing latency. As we have shown above, the choice between offloading and on-device inference is not binary, and redundant execution at the cloud and the edge allows the two to complement each other when carefully employed. While we have focused on object detection, we believe this approach could be employed in other contexts, such as human pose estimation and instance and semantic segmentation, to get the "best of both worlds."
