{"id":144711,"date":"2020-02-25T11:36:42","date_gmt":"2000-03-27T00:12:25","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/group\/internet-media\/"},"modified":"2022-07-05T20:40:04","modified_gmt":"2022-07-06T03:40:04","slug":"internet-media","status":"publish","type":"msr-group","link":"https:\/\/www.microsoft.com\/en-us\/research\/group\/internet-media\/","title":{"rendered":"Intelligent Multimedia Group"},"content":{"rendered":"
\n
\n

The Intelligent Multimedia (IM) group aims to build seamless yet efficient multimedia applications and services through breakthroughs in fundamental theory and innovations in algorithm and system technology. We address the problems of intelligent multimedia content sensing, processing, analysis, and services, as well as the generic scalability issues of multimedia computing systems. Our current research focus is video analytics to support intelligent cloud and intelligent edge media services.\u00a0Current research interests include, but are not limited to, object detection, tracking, semantic segmentation, human pose estimation, person re-identification (re-ID), action recognition, depth estimation, SLAM, scene understanding, and multimodality analysis.<\/p>\n<\/div>\n<\/div>\n

<\/div>\n

Areas of Focus:<\/strong><\/h2>\n
\n
Deep Video Analytics <\/strong><\/div>\n
\n

Video is the largest form of big data and contains an enormous amount of information. We are leveraging computer vision and deep learning to develop both cloud-based and edge-based intelligence engines that can turn raw video data into insights to facilitate various applications and services. Target application scenarios include video augmented reality, smart home surveillance,\u00a0business (retail store, office) intelligence, public security, and video storytelling and sharing. We take a human-centric approach, with significant effort focused on understanding humans, their attributes, and their behaviors. Our research has\u00a0contributed to\u00a0a number of video APIs offered in Microsoft Cognitive Services (https:\/\/www.microsoft.com\/cognitive-services<\/span><\/a>), Azure Media Analytics Services, Windows Machine Learning, Office Media (Stream\/Teams), and Dynamics\/Connected Store.<\/p>\n

– Video API R&D: three technologies (intelligent motion detection, face detection\/tracking, and face redaction) deployed in Microsoft Cognitive Services and Azure Media Services (2016)
\n\u2022
Announcing: Motion detection for Azure Media Analytics<\/span><\/a> (2016)
\n\u2022
Announcing face and emotion detection for Azure Media Analytics | Azure Blog and Updates | Microsoft Azure<\/span><\/a> (2016)
\n\u2022
Announcing Face Redaction for Azure Media Analytics | Azure Blog and Updates | Microsoft Azure<\/span><\/a> (2016)
\n\u2022
Redact faces with Azure Media Analytics | Microsoft Docs<\/span><\/a><\/p>\n

– Human pose estimation (2019.5) and object tracking (2019.10) technologies developed and released as vision skills on the Windows Machine Learning platform.
\n\u2022
Microsoft releases Windows Vision Skills preview for easy access to computer vision<\/span><\/a>
\n\u2022
NuGet Gallery | Microsoft.AI.Skills.Vision.ObjectTrackerPreview 0.0.0.3<\/span><\/a><\/p>\n

– Speech denoising technologies deployed in Microsoft Stream 1.0 (GA, 2020.6) and 2.0 (Internal Preview 2020.12)
\n\u2022
Extracting crystal-clear voices from noisy video: the speech enhancement model PHASEN has joined the Microsoft video service<\/span><\/a><\/p>\n

– Multi-object tracking (FairMOT), multi-view 3D pose estimation (VoxelPose), and person re-ID technologies shipped to the Microsoft Dynamics\/Connected Store product (2020, and ongoing)
\n\u2022
From FairMOT to VoxelPose: a look at the latest human-centric visual understanding results from Microsoft<\/span><\/a><\/p>\n

– Screen content understanding (element detection\/screen tree) technologies shipped to Microsoft\u2019s mobile robotic process automation (RPA) product (2020, and ongoing)<\/p>\n<\/div>\n

Open Source Projects: <\/strong><\/div>\n
–\u00a0Human Pose Estimation:\u00a0VoxelPose<\/span><\/a>
\n
Cross View Fusion for 3D Human Pose Estimation<\/span><\/a>
\n
Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach<\/span><\/a><\/div>\n
–\u00a0Object Tracking:\u00a0SA-Siam<\/span><\/a>
\n
SPM-Tracker<\/span><\/a>; Siamese network based tracker<\/span><\/a> (a comprehensive PyTorch-based toolbox that supports a series of Siamese-network-based tracking methods such as SiamFC \/ SiamRPN \/ SPM)
\n
A Simple Baseline for One-Shot Multi-Object Tracking<\/span><\/a> (2.2K stars)<\/div>\n
– Re-Identification: Semantics-aligned representation learning for person re-identification (SAN)<\/span><\/a><\/div>\n
– Action Recognition: View adaptive neural networks<\/span><\/a>; Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition<\/span><\/a><\/div>\n
– Domain Generalization\/Adaptation:\u00a0Style Normalization and Restitution for Domain Generalization and Adaptation<\/span><\/a><\/div>\n
<\/div>\n<\/div>\n
<\/div>\n
\n
Titanium (past project) <\/strong><\/div>\n
<\/div>\n
\n

Project Titanium aims to bring new computing experiences through enriched cloud-client computing. While data and programs can be provided as services from the cloud, the screen, meaning the entire collection of data involved in the user interface, constitutes the missing third dimension. Titanium will address the problems of adaptive screen composition, representation, and processing, following the roadmap of Titanium Screen, Titanium Remote, Titanium Live, and Titanium Cloud. As \u201cTitanium\u201d suggests, it will provide a lightweight yet efficient solution towards ultimate computing experiences in the cloud-plus-service era.<\/p>\n<\/div>\n

<\/div>\n
Mira (past project) <\/strong><\/div>\n
<\/div>\n
\n

Project Mira aims to enable multimedia representation and processing oriented towards perceptual quality rather than pixel-wise fidelity, through a joint effort of signal processing, computer vision, and machine learning. In particular, it seeks to build systems that not only incorporate newly developed vision and learning technologies into compression but also inspire new vision technologies by looking at the problem from the viewpoint of signal processing. By bridging vision and signal processing, this project is expected to offer a fresh perspective on multimedia representation and processing.<\/p>\n<\/div>\n

<\/div>\n\t\t\t
\n\t\t\t\t\t\t\t
\n\t\t\t\t\t\n\t\t\t\t\t\t