{"id":803533,"date":"2021-12-14T14:06:35","date_gmt":"2021-12-14T22:06:35","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=803533"},"modified":"2021-12-16T20:57:47","modified_gmt":"2021-12-17T04:57:47","slug":"azure-ai-milestone-new-foundation-model-florence-v1-0-pushing-vision-and-vision-language-state-of-the-art","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/azure-ai-milestone-new-foundation-model-florence-v1-0-pushing-vision-and-vision-language-state-of-the-art\/","title":{"rendered":"Azure AI milestone: New foundation model Florence v1.0 advances state of the art, topping popular computer vision leaderboards"},"content":{"rendered":"\n

The Project Florence Team

\"Animated
With the new computer vision foundation model Florence v1.0, the Project Florence team set the new state of the art on the popular leaderboards TextCaps Challenge 2021, nocaps, Kinetics-400\/Kinetics-600 action classification, and OK-VQA Leaderboard. <\/figcaption><\/figure>\n\n\n\n

Florence v1.0, along with recent milestones in Neural Text-to-Speech and question answering, is part of a larger Azure AI mission to provide relevant, meaningful AI solutions and services that work better for people because they better capture how people learn and work, with improved vision, knowledge understanding, and speech capabilities. At the center of these efforts is XYZ-code, a joint representation of three cognitive attributes: monolingual text (X), audio or visual sensory signals (Y), and multilingual (Z). For more information about these efforts, read the XYZ-code blog post.

Project Florence was launched by Microsoft Azure Cognitive Services in May 2020 to advance its large-scale multitask, multimodal computer vision services. Today, we're thrilled to announce an important milestone: Florence v1.0, a computer vision foundation model that successfully scales across a wide variety of vision and vision-language tasks.

Florence v1.0 demonstrates superior performance on challenging tasks such as zero-shot image classification, image/text retrieval, open-set object detection, and visual question answering. We've achieved new state of the art with large margins on a wide range of benchmarks. Supported by Florence v1.0, we've also achieved the new state of the art on multiple popular vision and vision-language leaderboards, including TextCaps Challenge 2021 and Kinetics-400/Kinetics-600 action classification. Florence v1.0 is currently being deployed in Azure Cognitive Services, helping to enhance its computer vision offerings.
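As a rough illustration of what zero-shot image classification means in this setting, the sketch below follows the standard dual-encoder recipe: class names become text prompts, the prompts and the image are embedded into a shared space, and the best-matching prompt determines the label. The encoders here are stand-ins (a random projection and a character hash), not Florence's actual pretrained encoders.

```python
import numpy as np

# Illustrative stand-ins for an image encoder and a language encoder; the real
# pretrained Florence encoders are not public, so these only demonstrate the
# zero-shot recipe of matching an image embedding against prompt embeddings.
rng = np.random.default_rng(0)
EMBED_DIM = 256

def encode_image(image: np.ndarray) -> np.ndarray:
    """Stand-in image encoder: flatten the pixels and randomly project them."""
    proj = rng.standard_normal((image.size, EMBED_DIM))
    v = image.reshape(-1) @ proj
    return v / np.linalg.norm(v)

def encode_text(prompt: str) -> np.ndarray:
    """Stand-in text encoder: hash characters into the shared embedding space."""
    v = np.zeros(EMBED_DIM)
    for i, byte in enumerate(prompt.encode("utf-8")):
        v[(i * 31 + byte) % EMBED_DIM] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

def zero_shot_classify(image: np.ndarray, class_names: list) -> str:
    """Score the image against one text prompt per class; no task-specific training."""
    prompts = [f"a photo of a {name}" for name in class_names]
    text_embs = np.stack([encode_text(p) for p in prompts])
    image_emb = encode_image(image)
    scores = text_embs @ image_emb  # cosine similarity, since embeddings are unit norm
    return class_names[int(np.argmax(scores))]

dummy_image = rng.random((32, 32, 3))  # placeholder RGB image
print(zero_shot_classify(dummy_image, ["dog", "cat", "bicycle"]))
```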

A holistic, people-centered approach to AI

Project Florence is part of ongoing efforts to develop AI that operates more like people do, a journey that has been challenging but exciting. We take a holistic and people-centered approach to learning and understanding by using multimodality. Our approach examines the relationship between three attributes of human cognition: monolingual text (X), audio or visual sensory cues (Y), and multilingual (Z). It brings them together under XYZ-code, a common representation to enable AI that can speak, hear, see, and understand better. The goal is to create pretrained basic AI models that learn common representations of different modalities and support a wide range of downstream AI tasks, with the ability to leverage additional external domain knowledge, so that the resulting AI systems interpret and interact with the world more like people do.

In helping to advance the ambitious goal of XYZ-code, the Project Florence team achieved its first milestone last year, attaining state-of-the-art performance on the nocaps benchmark. Compared with image descriptions provided by people, captions for the same images generated by the AI system were more detailed and precise. This capability is a key component of the Microsoft mission of inclusive and accessible technology.

\"From<\/a>
Florence v1.0 leverages data curation, unified learning, a Transformer architecture comprising an image encoder and a language encoder, and adaptation. It can be integrated into modern computer vision systems to power real-world vision and multimedia applications. Compared with existing image-text pretraining models, mainly limited to cross-modal shared representations for classification and retrieval (illustrated by the light-green adaptation module above), Florence expands the representation to support object detection, modalities beyond just RGB like image depth, and videos, respectively.<\/figcaption><\/figure><\/div>\n\n\n\n
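The pattern the diagram describes, a shared image-text backbone whose representation is reused by task-specific adaptation heads, can be sketched in a few lines. The toy PyTorch code below is an illustrative assumption about that layout, not the actual Florence architecture; module names, sizes, and the detection head are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoderBackbone(nn.Module):
    """Toy image and language encoders projecting into a shared embedding space."""
    def __init__(self, embed_dim: int = 256, vocab_size: int = 30522):
        super().__init__()
        self.image_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(embed_dim))
        self.text_encoder = nn.EmbeddingBag(vocab_size, embed_dim)

    def forward(self, images: torch.Tensor, token_ids: torch.Tensor):
        img = F.normalize(self.image_encoder(images), dim=-1)
        txt = F.normalize(self.text_encoder(token_ids), dim=-1)
        return img, txt

class DetectionAdapter(nn.Module):
    """Adaptation head: predicts a fixed set of boxes from the shared image embedding."""
    def __init__(self, embed_dim: int = 256, num_queries: int = 10):
        super().__init__()
        self.num_queries = num_queries
        self.box_head = nn.Linear(embed_dim, num_queries * 4)  # (x, y, w, h) per query

    def forward(self, image_embedding: torch.Tensor) -> torch.Tensor:
        return self.box_head(image_embedding).view(-1, self.num_queries, 4)

backbone = DualEncoderBackbone()
detector = DetectionAdapter()
images = torch.rand(2, 3, 64, 64)          # small dummy images
tokens = torch.randint(0, 30522, (2, 16))  # dummy token ids
img_emb, txt_emb = backbone(images, tokens)
retrieval_scores = img_emb @ txt_emb.T     # classification/retrieval from the shared space
boxes = detector(img_emb)                  # detection reuses the same backbone features
```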

Florence v1.0: From research to application

Project Florence's mission is to take the advancements being made in areas such as feature representation learning, transfer learning, and model architecture search and turn them into applications that can empower our partners and customers to achieve more with Azure Cognitive Services. Florence v1.0 and other AI breakthroughs achieved so far are being transferred to the cloud platform, helping to improve model quality for image captioning, tagging, and customized object detection.

The Florence image captioning model is available to customers via the computer vision offering of Azure Cognitive Services, which is part of Azure AI, and can enable developers to incorporate alt text more easily, helping them improve the accessibility of their own products and services. The Florence image captioning model is also being incorporated into Seeing AI, an app that identifies text, objects, and people in a user's surroundings, as well as into Microsoft Word, Outlook, and PowerPoint on various platforms.
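For developers, generating a caption to use as alt text is a single call to the Computer Vision service. The snippet below is a minimal sketch using the Azure Computer Vision Python SDK's describe_image operation; the endpoint, key, and image URL are placeholders to replace with your own resource's values, and the captions returned depend on the model version serving your resource.

```python
# Placeholder endpoint, key, and image URL; substitute your own Computer Vision
# resource values. Requires the azure-cognitiveservices-vision-computervision package.
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

endpoint = "https://<your-resource-name>.cognitiveservices.azure.com/"  # placeholder
key = "<your-computer-vision-key>"                                      # placeholder

client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))

# Ask the service to describe a remote image; each candidate caption includes a confidence score.
description = client.describe_image("https://example.com/photo.jpg", max_candidates=3)
for caption in description.captions:
    print(f"{caption.text} (confidence: {caption.confidence:.2f})")
```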
