{"id":664548,"date":"2020-07-20T09:54:40","date_gmt":"2020-07-20T16:54:40","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-group&#038;p=664548"},"modified":"2023-07-26T10:10:50","modified_gmt":"2023-07-26T17:10:50","slug":"cognitive-services-research","status":"publish","type":"msr-group","link":"https:\/\/www.microsoft.com\/en-us\/research\/group\/cognitive-services-research\/","title":{"rendered":"Azure Cognitive Services Research"},"content":{"rendered":"<section class=\"mb-3 moray-highlight\">\n\t<div class=\"card-img-overlay mx-lg-0\">\n\t\t<div class=\"card-background  has-background- card-background--full-bleed\">\n\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"720\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/01\/MLOG.8.png\" class=\"attachment-full size-full\" alt=\"Microsoft at NeurIPS 2020\" style=\"\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/01\/MLOG.8.png 1920w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/01\/MLOG.8-300x113.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/01\/MLOG.8-768x288.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2015\/01\/MLOG.8-1024x384.png 1024w\" sizes=\"(max-width: 1920px) 100vw, 1920px\" \/>\t\t<\/div>\n\t\t<!-- Foreground -->\n\t\t<div class=\"card-foreground d-flex mt-md-n5 my-lg-5 px-g px-lg-0\">\n\t\t\t<!-- Container -->\n\t\t\t<div class=\"container d-flex mt-md-n5 my-lg-5 align-self-center\">\n\t\t\t\t<!-- Card wrapper -->\n\t\t\t\t<div class=\"w-100 w-lg-col-5\">\n\t\t\t\t\t<!-- Card -->\n\t\t\t\t\t<div class=\"card material-md-card py-5 px-md-5\">\n\t\t\t\t\t\t<div class=\"card-body \">\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n<h1 class=\"wp-block-heading h2\" id=\"azure-cognitive-services-research\">Azure Cognitive Services 
Research<\/h1>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n<p>The mission of the Azure Cognitive Services Research group (CSR) is to make fundamental contributions to advancing the state of the art of the most challenging problems in speech, language, and vision, both within Microsoft and in the external research community. CSR includes the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/azure-computer-vision-research\/\">Computer Vision<\/a>,&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/knowledge-and-language\/\">Knowledge and Language<\/a>, and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/speech-research-team\/\">Speech<\/a> teams.<\/p>\n\n\n\n<p>We conduct cutting-edge research in all aspects of spoken language processing and computer vision. This includes audio-visual fusion; visual-semantic reasoning; federated learning; speech recognition; speech enhancement; speaker recognition and diarization; machine reading comprehension; text summarization; multilingual language modeling; and related topics in natural language processing, understanding, and generation; as well as face forgery detection; object detection and segmentation; dense pose, head, and mask tracking; action recognition; image and video captioning; and other topics in image and real-time video understanding. We leverage large-scale GPU and CPU clusters as well as internal and public data sets to develop world-leading deep learning technologies for forward-looking topics such as audio-visual far-field meeting transcription, automatic meeting minutes generation, and multi-modal dialog systems. 
We publish our research on public benchmarks, such as our breakthrough human-parity results on the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-researchers-achieve-new-conversational-speech-recognition-milestone\/\">Switchboard conversational speech recognition task<\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.tau-nlp.org\/csqa-leaderboard\">CommonsenseQA<\/a>, and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/machine-reading-systems-are-becoming-more-conversational\/\">Stanford\u2019s Conversational Question Answering Challenge<\/a> (CoQA).<\/p>\n\n\n\n<p>In addition to expanding our scientific understanding of speech, language, and vision, our work finds outlets in Microsoft products such as <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/#overview\" target=\"_blank\" rel=\"noopener noreferrer\">Azure Cognitive Services<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, HoloLens, Teams, Windows, Office, Bing, Cortana, Skype Translator, Xbox, and more.<\/p>\n\n\n\n<p>The Azure Cognitive Services Research group is managed by <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nzeng\/\">Michael Zeng<\/a>.<\/p>\n\n\n\n\n\n<p>The <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/knowledge-and-language\/\">Knowledge and Language Team<\/a> is part of the Azure AI Cognitive Services Research (CSR) group, focusing on cutting-edge research and the development of the next-generation framework for knowledge and natural language processing.<\/p>\n\n\n\n<p>We are working on problems including knowledge-boosted language modeling, knowledge extraction, knowledge graphs, summarization, and language understanding and generation. 
We conduct large-scale pre-training and domain-specific fine-tuning on internal and public data sets to develop state-of-the-art deep learning technologies for core knowledge and language problems in real-world applications.<\/p>\n\n\n\n<p>Our work has resulted in multiple publications in top NLP conferences and first-place submissions to the CommonsenseQA and FEVER leaderboards.<\/p>\n\n\n\n<p>Our recent work covers:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Jointly pre-training knowledge graphs and language models<\/li>\n\n\n\n<li>Increasing the factual correctness of abstractive summaries via knowledge graphs<\/li>\n\n\n\n<li>Summarizing multi-party meeting transcripts<\/li>\n\n\n\n<li>Utilizing positional bias in news articles for zero-shot summarization<\/li>\n<\/ul>\n\n\n\n\n\n<p>The <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/azure-computer-vision-research\/\">Azure Computer Vision Research<\/a> (ACVR) group is part of the Azure AI Cognitive Services Research (CSR) group, focusing on cutting-edge research in computer vision to advance the state of the art and develop the next-generation framework for visual recognition. The problems that we are interested in include image classification; object detection and segmentation; motion analysis and object tracking; dense pose, head, and mask tracking; action recognition; image generation; real-time video understanding; visual representation learning; multi-modality representation learning; and unsupervised\/self-supervised\/contrastive learning. We leverage large-scale GPU and CPU clusters as well as internal and public data sets to develop world-leading deep learning technologies for core vision problems and generic visual representations that can be customized to a wide range of downstream tasks and real-world applications. 
The team also runs <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/projectflorence\/\">Project Florence<\/a>, which focuses on developing universal backbones with shared representations for a wide spectrum of visual categories, with the aim of accelerating the shipping of Microsoft vision products using state-of-the-art large-scale deep learning models.<\/p>\n\n\n\n\n\n<p class=\"x-hidden-focus\">The&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/speech-research-team\/\">Speech Research Team<\/a>&nbsp;is part of the&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/cognitive-services-research\/\">Azure AI Cognitive Services Research (CSR) group<\/a>&nbsp;and is responsible for fundamental advances in audio, speech, and spoken language processing technologies. We also work closely with engineering and product teams to bring new technologies into Microsoft products.<\/p>\n\n\n\n<p class=\"x-hidden-focus\">We work on a wide range of speech processing problems, including speech enhancement, speech recognition, speaker diarization, multi-lingual speech recognition, spoken language understanding, end-to-end modeling, and self-supervised learning. Our recent work covers the following topics:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Deep learning-based real-time speech enhancement<\/li>\n\n\n\n<li>Monaural and multi-channel speech separation for meeting transcription<\/li>\n\n\n\n<li>Ad hoc microphone arrays<\/li>\n\n\n\n<li>End-to-end modeling for speaker-attributed speech recognition<\/li>\n\n\n\n<li>Unified speech representation learning<\/li>\n\n\n\n<li>Speech-language pre-training<\/li>\n<\/ul>\n\n\n\n<p>The results of our work are incorporated into Microsoft speech technologies and shipped in various products. 
We also contributed to the development of new services, such as&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/speech-service\/conversation-transcription\">Conversation Transcription<\/a>&nbsp;in Azure Cognitive Services, which powers the transcription features of several Microsoft products. Our work earned first place in the speaker diarization track of&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.robots.ox.ac.uk\/~vgg\/data\/voxceleb\/competition2020.html\">VoxSRC-20<\/a>&nbsp;(joint work with other Microsoft researchers) and achieved breakthrough human-parity performance on the&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-researchers-achieve-new-conversational-speech-recognition-milestone\/\" target=\"_blank\" rel=\"noreferrer noopener\">Switchboard conversational speech recognition task<\/a>.<\/p>\n\n\n\n<p>The former Speech and Dialog Research Group (SDRG) was merged with the Azure Computer Vision Group in 2020 to form the Cognitive Services Research Group.<\/p>\n\n\n\n\n\n<p>CSR organizes the Distinguished Talk Series to host discussions with leaders in academia and industry. If you\u2019re interested in giving a talk, please contact Chenguang Zhu (<a href=\"mailto:chezhu@microsoft.com\">chezhu@microsoft.com<\/a>).<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>\n<h4>Presenter<\/h4>\n<\/td><td>\n<h4>Affiliation<\/h4>\n<\/td><td>\n<h4>Date<\/h4>\n<\/td><td> <h4>Title<\/h4> <\/td><\/tr><tr><td>Prof. Zhengyuan Zhou<\/td><td>NYU<\/td><td>TBD<\/td><td>Optimal No-Regret Learning in Repeated First-Price Auctions<\/td><\/tr><tr><td>Prof. Mohit Bansal<\/td><td>University of North Carolina at Chapel Hill<\/td><td>8\/31\/2023<\/td><td>TBD<\/td><\/tr><tr><td>Prof. 
Chen Sun<\/td><td>Brown University<\/td><td>7\/20\/2023<\/td><td>TBD<\/td><\/tr><tr><td>Prof. Haiyi Zhu<\/td><td>CMU<\/td><td>6\/29\/2023<\/td><td>Bridging AI and HCI: Incorporating Human Values into the Design and Use of AI Technologies<\/td><\/tr><tr><td>Prof. Mark Dredze<\/td><td>Johns Hopkins University<\/td><td>5\/18\/2023<\/td><td>LLMs and Health: Challenges and Opportunities<\/td><\/tr><tr><td>Prof. Dongyeop Kang <\/td><td>University of Minnesota<\/td><td>5\/11\/2023<\/td><td>Computational Modeling of Human Disagreements in the Era of Large Language Models<\/td><\/tr><tr><td>Prof. Graham Neubig<\/td><td>Carnegie Mellon University<\/td><td>4\/13\/2023<\/td><td>Learning to Explain and Explaining to Learn<\/td><\/tr><tr><td>Prof. Greg Durrett<\/td><td>UT Austin<\/td><td>3\/9\/2023<\/td><td>Information Synthesis in the Era of GPT-3<\/td><\/tr><tr><td>Prof. Chen-Yu Wei<\/td><td>University of Virginia<\/td><td>1\/19\/2023<\/td><td>Policy Optimization in Adversarial MDPs: Improved Exploration Bonus Design<\/td><\/tr><tr><td>Prof. Xiaoming Liu<\/td><td>Michigan State University<\/td><td>1\/12\/2023<\/td><td>Autonomous Sensing: from 3D Object Detection to Biometric Recognition<\/td><\/tr><tr><td>Prof. Shuran Song<\/td><td>Columbia University<\/td><td>11\/10\/2022<\/td><td>Learning Meets Gravity: Robots that Embrace Dynamics from Pixels<\/td><\/tr><tr><td>Prof. Abhishek Gupta<\/td><td>Ohio State University<\/td><td>10\/13\/2022<\/td><td>Communication-efficient Model Heterogeneous Federated Learning<\/td><\/tr><tr><td>Prof. Serena Yeung<\/td><td>Stanford<\/td><td>9\/22\/2022<\/td><td>Overcoming Data and Label Bottlenecks in Scene Understanding for Healthcare Applications<\/td><\/tr><tr><td>Prof. Lu Wang<\/td><td>University of Michigan<\/td><td>8\/25\/2022<\/td><td>Long Document Summarization with Efficient Attentions and Structures<\/td><\/tr><tr><td>Prof. 
Muhao Chen<\/td><td>University of Southern California<\/td><td>7\/21\/2022<\/td><td>Robust and Indirectly Supervised Information Extraction<\/td><\/tr><tr><td>Prof. Allyson Ettinger<\/td><td>University of Chicago<\/td><td>6\/16\/2022<\/td><td>&#8220;Understanding&#8221; and Prediction: Controlled Assessment of Meaning Sensitivity in Pre-trained Language Models<\/td><\/tr><tr><td>Prof. Han Zhao<\/td><td>UIUC<\/td><td>5\/5\/2022<\/td><td>Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond<\/td><\/tr><tr><td>Prof. Weijie Su<\/td><td>UPenn<\/td><td>4\/7\/2022<\/td><td>Gaussian Differential Privacy and Edgeworth Accountant<\/td><\/tr><tr><td>Prof. Jason D. Lee<\/td><td>Princeton<\/td><td>3\/17\/2022<\/td><td>Provable Representation Learning in Deep Learning<\/td><\/tr><tr><td>Dr. Yi Tay<\/td><td>Google<\/td><td>2\/3\/2022<\/td><td>ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning<\/td><\/tr><tr><td>Prof. Junjie Hu<\/td><td>Univ. of Wisconsin-Madison<\/td><td>1\/20\/2022<\/td><td>Multilingual NLP: Cross-lingual Transfer Learning and Applications<\/td><\/tr><tr><\/tr><tr><td>Prof. Tengyu Ma<\/td><td>Stanford<\/td><td>11\/17\/2021<\/td><td>Understanding Self-supervised Learning: Analysis and Robustness to Imbalance Dataset<\/td><\/tr><tr><td>Prof. Jiantao Jiao<\/td><td>UC Berkeley<\/td><td>10\/28\/2021<\/td><td>Near-optimal algorithms for Imitation Learning<\/td><\/tr><tr><td>Prof. Vered Shwartz<\/td><td>University of British Columbia<\/td><td>9\/23\/2021<\/td><td>Commonsense Knowledge and Reasoning in Natural Language<\/td><\/tr><tr><td>Dr. Jim Glass<\/td><td>MIT<\/td><td>7\/22\/2021<\/td><td>Recent Progress in Self-Supervised and Cross-Modal Speech Processing<\/td><\/tr><tr><td>Prof. Zhiting Hu<\/td><td>UCSD<\/td><td>6\/17\/2021<\/td><td>Text Generation with No (Good) Data: New Reinforcement Learning and Causal Frameworks<\/td><\/tr><tr><td>Prof. 
Nanyun Peng<\/td><td>UCLA<\/td><td>5\/27\/2021<\/td><td>Controllable Text Generation Beyond Auto-regressive Models<\/td><\/tr><tr><td>Prof. Ashton Anderson<\/td><td>University of Toronto<\/td><td>4\/09\/2021<\/td><td>The Cultural Structure of Online Platforms<\/td><\/tr><tr><td>Prof. Aditya Grover<\/td><td>Facebook AI Research\/UCLA<\/td><td>3\/18\/2021<\/td><td>Transformer Language Models as Universal Computation Engines<\/td><\/tr><tr><td>Prof. Diyi Yang<\/td><td>Georgia Tech<\/td><td>2\/18\/2021<\/td><td>Language Understanding in Social Context: Theory and Practice<\/td><\/tr><tr><td>Prof. Song Han<\/td><td>MIT<\/td><td>1\/21\/2021<\/td><td>Putting AI on a Diet: TinyML and Efficient Deep Learning<\/td><\/tr><tr><td>Prof. Tianqi Chen<\/td><td>Carnegie Mellon University<\/td><td>1\/15\/2021<\/td><td>Elements of Learning Systems<\/td><\/tr><tr><td>Prof. Xiang Ren<\/td><td>University of Southern California<\/td><td>12\/18\/2020<\/td><td>Label Efficient Learning with Human Explanations<\/td><\/tr><tr><td>Prof. Jiajun Wu<\/td><td>Stanford<\/td><td>11\/19\/2020<\/td><td>Neuro-Symbolic Visual Concept Learning<\/td><\/tr><tr><td>Prof. Fei Liu<\/td><td>University of Central Florida<\/td><td>10\/30\/2020<\/td><td>Toward Robust Abstractive Multi-Document Summarization and Information Consolidation<\/td><\/tr><tr><td>Prof. Vivian Yun-Nung Chen<\/td><td>National Taiwan University<\/td><td>10\/2\/2020<\/td><td>Are Your Dialogue Systems Robust and Scalable?<\/td><\/tr><tr><td>Prof. 
Meng Jiang<\/td><td>University of Notre Dame<\/td><td>9\/10\/2020<\/td><td>Scientific Knowledge Extraction: New Tasks and Methods<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n","protected":false},"excerpt":{"rendered":"<p>The mission of the Cognitive Services Research group (CSR) is to make fundamental contributions to advancing the state of the art of the most challenging problems in speech, language, and vision both within Microsoft and the external research community.<\/p>\n","protected":false},"featured_media":392255,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"msr_group_start":"","footnotes":""},"research-area":[13556,243062,13562,13545,13559],"msr-group-type":[243694],"msr-locale":[268875],"msr-impact-theme":[],"class_list":["post-664548","msr-group","type-msr-group","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-research-area-audio-acoustics","msr-research-area-computer-vision","msr-research-area-human-language-technologies","msr-research-area-social-sciences","msr-group-type-group","msr-locale-en_us"],"msr_group_start":"","msr_detailed_description":"","msr_further_details":"","msr_hero_images":[],"msr_research_lab":[],"related-researchers":[{"type":"user_nicename","display_name":"Faisal Ahmed","user_id":31810,"people_section":"Current members","alias":"fiahmed"},{"type":"guest","display_name":"Ehsan Azarnasab","user_id":693762,"people_section":"Current members","alias":""},{"type":"user_nicename","display_name":"Dongdong Chen","user_id":40198,"people_section":"Current members","alias":"dochen"},{"type":"user_nicename","display_name":"Yen-Chun Chen","user_id":39672,"people_section":"Current members","alias":"yenche"},{"type":"guest","display_name":"Yi-Ling Chen","user_id":658815,"people_section":"Current members","alias":""},{"type":"user_nicename","display_name":"Noel 
Codella","user_id":41635,"people_section":"Current members","alias":"ncodella"},{"type":"guest","display_name":"Xiyang Dai","user_id":658812,"people_section":"Current members","alias":""},{"type":"user_nicename","display_name":"Sefik Emre Eskimez","user_id":38655,"people_section":"Current members","alias":"seeskime"},{"type":"guest","display_name":"Mei Gao","user_id":658818,"people_section":"Current members","alias":""},{"type":"user_nicename","display_name":"Robert Gmyr","user_id":38487,"people_section":"Current members","alias":"rogmyr"},{"type":"user_nicename","display_name":"Junheng Hao","user_id":42366,"people_section":"Current members","alias":"junhenghao"},{"type":"guest","display_name":"Bin (Leo) Hsiao","user_id":658809,"people_section":"Current members","alias":""},{"type":"user_nicename","display_name":"Naoyuki Kanda","user_id":38661,"people_section":"Current members","alias":"nakanda"},{"type":"user_nicename","display_name":"Mahmoud Khademi","user_id":42297,"people_section":"Current members","alias":"mkhademi"},{"type":"guest","display_name":"Canrun Li","user_id":583888,"people_section":"Current members","alias":""},{"type":"guest","display_name":"Linjie Li","user_id":786499,"people_section":"Current members","alias":""},{"type":"guest","display_name":"Lin Liang","user_id":694149,"people_section":"Current members","alias":""},{"type":"user_nicename","display_name":"Kevin Lin","user_id":39694,"people_section":"Current members","alias":"keli"},{"type":"user_nicename","display_name":"Yang Liu","user_id":39594,"people_section":"Current members","alias":"yaliu10"},{"type":"user_nicename","display_name":"Mengchen Liu","user_id":40213,"people_section":"Current members","alias":"mengcliu"},{"type":"user_nicename","display_name":"Yao Qian","user_id":34976,"people_section":"Current members","alias":"yaoqian"},{"type":"user_nicename","display_name":"Hiteshi Sharma","user_id":40276,"people_section":"Current 
members","alias":"hitshar"},{"type":"guest","display_name":"Manthan Thakker","user_id":654609,"people_section":"Current members","alias":""},{"type":"user_nicename","display_name":"Dongmei Wang","user_id":38490,"people_section":"Current members","alias":"dowan"},{"type":"user_nicename","display_name":"Xiaofei Wang","user_id":38658,"people_section":"Current members","alias":"xiaofewa"},{"type":"user_nicename","display_name":"Shuohang Wang","user_id":39678,"people_section":"Current members","alias":"shuowa"},{"type":"user_nicename","display_name":"Lijuan Wang","user_id":32680,"people_section":"Current members","alias":"lijuanw"},{"type":"guest","display_name":"Jianfeng Wang","user_id":693753,"people_section":"Current members","alias":""},{"type":"guest","display_name":"Zhen Xiao","user_id":583885,"people_section":"Current members","alias":""},{"type":"user_nicename","display_name":"Ziyi Yang","user_id":40561,"people_section":"Current members","alias":"ziyiyang"},{"type":"guest","display_name":"Zhengyuan Yang","user_id":786502,"people_section":"Current members","alias":""},{"type":"user_nicename","display_name":"Midia Yousefi","user_id":42369,"people_section":"Current members","alias":"midiayousefi"},{"type":"user_nicename","display_name":"Michael Zeng","user_id":33141,"people_section":"Current members","alias":"nzeng"},{"type":"user_nicename","display_name":"Dong Chen","user_id":31661,"people_section":"Cognitive Services Research Alumni","alias":"doch"},{"type":"guest","display_name":"Asela Gunawardana","user_id":676728,"people_section":"Cognitive Services Research Alumni","alias":""},{"type":"guest","display_name":"Li Jiang","user_id":676719,"people_section":"Cognitive Services Research Alumni","alias":""},{"type":"user_nicename","display_name":"Kenichi Kumatani","user_id":39321,"people_section":"Cognitive Services Research Alumni","alias":"kekumata"},{"type":"guest","display_name":"Sungjin Lee","user_id":613923,"people_section":"Cognitive Services Research 
Alumni","alias":""},{"type":"guest","display_name":"Abdelrahman Mohamad","user_id":676725,"people_section":"Cognitive Services Research Alumni","alias":""},{"type":"guest","display_name":"Mike Seltzer","user_id":613929,"people_section":"Cognitive Services Research Alumni","alias":""},{"type":"user_nicename","display_name":"Yu Shi","user_id":37950,"people_section":"Cognitive Services Research Alumni","alias":"yushi"},{"type":"guest","display_name":"Andreas Stolcke","user_id":613932,"people_section":"Cognitive Services Research Alumni","alias":""},{"type":"guest","display_name":"Jason Williams","user_id":613935,"people_section":"Cognitive Services Research Alumni","alias":""},{"type":"user_nicename","display_name":"Wayne Xiong","user_id":34811,"people_section":"Cognitive Services Research Alumni","alias":"weixi"},{"type":"guest","display_name":"Dong Yu","user_id":676722,"people_section":"Cognitive Services Research Alumni","alias":""},{"type":"guest","display_name":"Geoff Zweig","user_id":613941,"people_section":"Cognitive Services Research 
Alumni","alias":""}],"related-publications":[389093,626682,704635,768004,829249,907008,481491,654966,723043,784882,891066,990939,168509,590461,695118,747499,814789,897855,419907,626688,704641,769408,842326,913755,481533,655836,726376,785632,892380,168609,595954,695127,747511,814795,897942,467697,630372,704647,771646,845977,920433,502496,656175,726589,785644,893601,215131,603846,695946,755719,815212,897963,480063,631674,704656,771661,847411,932094,503036,658509,732424,785779,897264,215138,603882,696492,763771,815227,897990,480144,640587,708523,772204,848155,940389,557769,658521,740710,786139,897684,215419,603897,697399,763792,815233,898047,480174,640596,709975,772213,851014,941436,557775,658527,741076,786148,897783,215420,605619,701485,764509,815242,898095,480186,644673,711862,773680,880455,942036,574758,664602,741082,786202,897789,350045,607158,701647,764521,817309,898116,480201,644682,712249,779719,883824,942048,578824,665019,741088,787231,897795,350093,608625,701653,767728,817771,904239,480210,648081,715066,783442,886611,942117,167938,578896,669003,744562,792371,897831,357914,612312,702925,767779,826330,905703,480222,649689,722395,783448,886998,942126,168507,583603,677550,744739,802357,897843,388979,626367,703255,767797,827617,905796,480237,650463,722401,784051,887520,942132,168508,589327,683817,747088,804583,897849],"related-downloads":[1015950],"related-videos":[],"related-projects":[],"related-events":[],"related-opportunities":[],"related-posts":[],"tab-content":[{"id":0,"name":"Knowledge and Language","content":"<p class=\"\">The <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/knowledge-and-language\/\">Knowledge and Language Team<\/a> is part of the Azure Cognitive Services Research (CSR) group, focusing on cutting edge research and the development of the next generation framework for knowledge and natural language processing.<\/p>\r\nWe are working on problems including knowledge-boosted language modeling, knowledge extraction, knowledge 
graphs, summarization, and language understanding and generation. We conduct large-scale pre-training and domain-specific fine-tuning on internal and public data sets to develop state-of-the-art deep learning technologies for core knowledge and language problems in real-world applications.\r\n\r\nOur work has resulted in multiple publications in top NLP conferences and first-place submissions to the CommonsenseQA and FEVER leaderboards.\r\n\r\nOur recent work covers:\r\n\u2022 Jointly pre-training knowledge graphs and language models\r\n\u2022 Increasing the factual correctness of abstractive summaries via knowledge graphs\r\n\u2022 Summarizing multi-party meeting transcripts\r\n\u2022 Utilizing positional bias in news articles for zero-shot summarization"},{"id":1,"name":"Computer Vision","content":"The <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/azure-computer-vision-research\/\">Azure Computer Vision Research<\/a> (ACVR) group is part of the Cognitive Services Research (CSR) group, focusing on cutting-edge research in computer vision to advance the state of the art and develop the next-generation framework for visual recognition. The problems that we are interested in include image classification; object detection and segmentation; motion analysis and object tracking; dense pose, head, and mask tracking; action recognition; image generation; real-time video understanding; visual representation learning; multi-modality representation learning; and unsupervised\/self-supervised\/contrastive learning. We leverage large-scale GPU and CPU clusters as well as internal and public data sets to develop world-leading deep learning technologies for core vision problems and generic visual representations that can be customized to a wide range of downstream tasks and real-world applications. 
The team also runs <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/projectflorence\/\">Project Florence<\/a>, which focuses on developing universal backbones with shared representations for a wide spectrum of visual categories, with the aim of accelerating the shipping of Microsoft vision products using state-of-the-art large-scale deep learning models."},{"id":2,"name":"Speech and Dialog","content":"<p class=\"x-hidden-focus\">The\u00a0<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/speech-research-team\/\">Speech Research Team<\/a>\u00a0is part of the\u00a0<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/cognitive-services-research\/\">Azure Cognitive Services Research (CSR) group<\/a>\u00a0and is responsible for fundamental advances in audio, speech, and spoken language processing technologies. We also work closely with engineering and product teams to bring new technologies into Microsoft products.<\/p>\r\n<p class=\"x-hidden-focus\">We work on a wide range of speech processing problems, including speech enhancement, speech recognition, speaker diarization, multi-lingual speech recognition, spoken language understanding, end-to-end modeling, and self-supervised learning. Our recent work covers the following topics:\r\n\u2022 Deep learning-based real-time speech enhancement\r\n\u2022 Monaural and multi-channel speech separation for meeting transcription\r\n\u2022 Ad hoc microphone arrays\r\n\u2022 End-to-end modeling for speaker-attributed speech recognition\r\n\u2022 Unified speech representation learning\r\n\u2022 Speech-language pre-training<\/p>\r\nThe results of our work are incorporated into Microsoft speech technologies and shipped in various products. 
We also contributed to the development of new services, such as\u00a0<a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/speech-service\/conversation-transcription\">Conversation Transcription<\/a>\u00a0in Azure Cognitive Services, which powers the transcription features of several Microsoft products. Our work earned first place in the speaker diarization track of\u00a0<a href=\"https:\/\/www.robots.ox.ac.uk\/~vgg\/data\/voxceleb\/competition2020.html\">VoxSRC-20<\/a>\u00a0(joint work with other Microsoft researchers) and achieved breakthrough human-parity performance on the\u00a0<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-researchers-achieve-new-conversational-speech-recognition-milestone\/\" target=\"_blank\" rel=\"noreferrer noopener\">Switchboard conversational speech recognition task<\/a>.\r\n\r\nThe former Speech and Dialog Research Group (SDRG) was merged with the Azure Computer Vision Group in 2020 to form the Cognitive Services Research Group.\r\n\r\n&nbsp;"},{"id":3,"name":"Talks","content":"CSR organizes the Distinguished Talk Series to host discussions with leaders in academia and industry. 
If you\u2019re interested in giving a talk, please contact Chenguang Zhu (<a href=\"mailto:chezhu@microsoft.com\">chezhu@microsoft.com<\/a>).\r\n\r\n&nbsp;\r\n<table style=\"border-collapse: collapse;width: 100%;border-spacing: inherit;height: 505px\" border=\"1\">\r\n<tbody>\r\n<tr style=\"height: 25px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;height: 25px;text-align: center\">\r\n<h4>Presenter<\/h4>\r\n<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;height: 25px;text-align: center\">\r\n<h4>Affiliation<\/h4>\r\n<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;height: 25px;text-align: center\">\r\n<h4>Date<\/h4>\r\n<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;height: 25px;text-align: center\">\r\n<h4>Title<\/h4>\r\n<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. Tengyu Ma<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Stanford<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">11\/17\/2021<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">TBD<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. 
Jiantao Jiao<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">UC Berkeley<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">10\/28\/2021<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">TBD<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. Vered Shwartz<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">University of British Columbia<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">9\/23\/2021<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Commonsense Knowledge and Reasoning in Natural Language<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Dr. Jim Glass<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">MIT<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">7\/22\/2021<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Recent Progress in Self-Supervised and Cross-Modal Speech Processing<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. 
Zhiting Hu<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">UCSD<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">6\/17\/2021<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Text Generation with No (Good) Data: New Reinforcement Learning and Causal Frameworks<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. Nanyun Peng<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">UCLA<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">5\/27\/2021<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Controllable Text Generation Beyond Auto-regressive Models<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. Ashton Anderson<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">University of Toronto<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">4\/9\/2021<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">The Cultural Structure of Online Platforms<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. 
Aditya Grover<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Facebook AI Research\/UCLA<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">3\/18\/2021<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Transformer Language Models as Universal Computation Engines<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. Diyi Yang<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Georgia Tech<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">2\/18\/2021<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Language Understanding in Social Context: Theory and Practice<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. Song Han<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">MIT<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">1\/21\/2021<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Putting AI on a Diet: TinyML and Efficient Deep Learning<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. 
Tianqi Chen<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Carnegie Mellon University<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">1\/15\/2021<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Elements of Learning Systems<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. Xiang Ren<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">University of Southern California<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">12\/18\/2020<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Label Efficient Learning with Human Explanations<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. Jiajun Wu<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Stanford<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">11\/19\/2020<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Neuro-Symbolic Visual Concept Learning<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. 
Fei Liu<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">University of Central Florida<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">10\/30\/2020<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Toward Robust Abstractive Multi-Document Summarization and Information Consolidation<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. Vivian Yun-Nung Chen<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">National Taiwan University<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">10\/2\/2020<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Are Your Dialogue Systems Robust and Scalable?<\/td>\r\n<\/tr>\r\n<tr style=\"height: 30px\">\r\n<td style=\"width: 19.0517%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Prof. 
Meng Jiang<\/td>\r\n<td style=\"width: 23.4483%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">University of Notre Dame<\/td>\r\n<td style=\"width: 10.1725%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">9\/10\/2020<\/td>\r\n<td style=\"width: 47.3276%;padding: inherit;border: 1px solid;text-align: center;height: 30px\">Scientific Knowledge Extraction: New Tasks and Methods<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>"}],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group\/664548"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-group"}],"version-history":[{"count":61,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group\/664548\/revisions"}],"predecessor-version":[{"id":949812,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group\/664548\/revisions\/949812"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/392255"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=664548"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=664548"},{"taxonomy":"msr-group-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group-type?post=664548"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=664548"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=664548"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}