{"id":741421,"date":"2021-04-26T10:46:11","date_gmt":"2021-04-26T17:46:11","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=741421"},"modified":"2022-12-22T11:13:45","modified_gmt":"2022-12-22T19:13:45","slug":"alexandria-in-microsoft-viva-topics-from-big-data-to-big-knowledge","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/alexandria-in-microsoft-viva-topics-from-big-data-to-big-knowledge\/","title":{"rendered":"Alexandria in Microsoft Viva Topics: from big data to big knowledge"},"content":{"rendered":"\n

Project Alexandria is a research project within Microsoft Research Cambridge dedicated to discovering entities, or topics of information, and their associated properties from unstructured documents. This research lab has studied knowledge mining for over a decade, using the probabilistic programming framework Infer.NET. Project Alexandria was established seven years ago to build on Infer.NET (opens in new tab)<\/span><\/a> and retrieve facts, schemas, and entities from unstructured data sources while adhering to Microsoft\u2019s robust privacy standards. The goal of the project is to construct a full knowledge base from a set of documents, entirely automatically.<\/p>\n\n\n\n

The Alexandria research team is uniquely positioned to make direct contributions to new Microsoft products. Alexandria technology plays a central role in the recently announced Microsoft Viva Topics (opens in new tab)<\/span><\/a>, an AI product that automatically organizes large amounts of content and expertise, making it easier for people to find information and act on it. Specifically, the Alexandria team is responsible for identifying topics and rich metadata, and combining other innovative Microsoft knowledge mining technologies to enhance the end user experience.<\/p>\n\n\n\n

The Alexandria team continues to contribute to Microsoft\u2019s vision of enterprise knowledge, delivering essential capabilities and focusing on the future of the enterprise knowledge base and the transition from big data to big knowledge.<\/p>\n\n\n\n\t


Part 1: What is Viva Topics?<\/h2>\n\n\n\n

Microsoft Viva Topics<\/a> is one of the four modules of Microsoft Viva<\/a>, an employee experience platform that brings together communications, knowledge, learning, resources, and insights. Viva Topics uses AI to organize resources, information and expertise into topics delivered through apps like SharePoint, Microsoft Search and Office; and coming soon to Yammer, Teams, and Outlook. Extracted descriptions, topical documents, and related sites and people are presented on topic cards that help people learn, develop new skills, and innovate faster while they work.<\/p>\n\n\n\n

The Enterprise Knowledge Sharing Problem<\/em><\/p>\n\n\n\n

Finding information can be hard, and numerous studies suggest that inefficiencies in knowledge searching strongly impact enterprise productivity. Solving the knowledge problem will enable employees to spend less time searching and more time learning and implementing. <\/p>\n\n\n\n

A recent survey (opens in new tab)<\/span><\/a> found that employees could potentially save four to six hours each week if they didn\u2019t have to search for information\u2014or spend time recreating it. This equates to an increase of 11-14 percent in daily productivity.<\/p><\/blockquote><\/figure>\n\n\n\n

Common business scenarios also stand to benefit: onboarding new employees, for example, could be 20-35 percent faster, according to a Forrester study (opens in new tab)<\/span><\/a> of the potential impact of Microsoft\u2019s knowledge and content services.<\/p>\n\n\n\n

Microsoft\u2019s opportunity to serve customers at scale becomes real with privacy-compliant access to a large set of enterprise data in the Microsoft Graph (opens in new tab)<\/span><\/a>. The graph currently contains context from over 18 trillion resources, including emails, events, users, files and more. This massive repository can be compliantly leveraged to create powerful knowledge extraction models and create a bespoke set of topics for each customer. Today, data from the Graph is used to power the knowledge experiences for all customers of Microsoft Viva Topics.<\/p>\n\n\n\n

What are Topics? <\/em><\/p>\n\n\n\n

Viva Topics automatically creates an enterprise knowledge base structured around topics,<\/strong> such as projects, events, and organizations, with related metadata about people, content, acronyms, definitions, conversations and related topics. Topic cards connect the user to knowledge in apps like SharePoint, Teams, Microsoft Search and more. Read more about topic discovery (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n

\"Figure
Figure 1: Viva Topics delivers knowledge in context throughout Microsoft 365.<\/figcaption><\/figure>\n\n\n\n

Knowledge should not be locked in a repository but instead should be useful and consumable where it is needed. Putting topics in the flow of everyday work makes knowledge from across an organization more easily discoverable and actionable.<\/p>\n\n\n\n

Part 2: Alexandria in Viva Topics<\/h2>\n\n\n\n

Viva Topics brings together the richness and breadth of technology investments from across Microsoft.<\/p>\n\n\n\n

Each of these technologies has its own specialty in the rich metadata it can contribute to Viva Topics. For example, one technology specializes in delivering a definition for a topic or bringing in a relevant synopsis from Wikipedia. Alexandria technology is then used to bring all this knowledge together into one coherent knowledge base.<\/p>\n\n\n\n

Naomi Moneypenny, who leads Viva Topics product development, describes the collaboration: \u201cThe Project Alexandria team and technologies have been instrumental to delivering the innovative experiences for customers in Viva Topics. We value their highly collaborative approach to working with many other specialist teams across Microsoft.\u201d<\/p>\n\n\n\n

Project Alexandria plays two fundamental roles in the production of Viva Topics:<\/p>\n\n\n\n

  1. Topic mining: <\/em>This process includes the discovery of topics in documents, as well as the maintenance and upkeep of those topics as documents change or as new documents are created.<\/li>
  2. Topic linking:<\/em> The process of bringing together knowledge from a variety of sources into a single unified knowledge base.<\/li><\/ol>\n\n\n\n

    Alexandria achieves both tasks through a machine learning approach called probabilistic programming, <\/em>which uses a special kind of program to describe the process by which topics and their properties are mentioned in documents. The same program can then effectively be run backwards to extract topics from documents. A big advantage of this approach is that information about the task is included in the probabilistic program itself, rather than using large amounts of labelled data. This enables the process to run unsupervised \u2013 it can perform these tasks automatically without any human input. For a technical description of the Alexandria probabilistic program, see the award-winning Alexandria Research Paper<\/a>.<\/p>\n\n\n\n
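To make the idea of running a program backwards concrete, here is a minimal, purely illustrative sketch in Python (not Infer.NET): a toy generative function renders an entity as text, and inference enumerates candidate entities to find those whose rendering explains an observed snippet. The candidate names and dates are hypothetical.

```python
import itertools

def render(name, date):
    """Forward direction: how a topic mention is generated as text."""
    return f"Project {name} will be released on {date}"

def infer(observed, names, dates):
    """Backward direction: which (name, date) pairs explain the observation?"""
    candidates = [(n, d) for n, d in itertools.product(names, dates)
                  if render(n, d) == observed]
    # With a uniform prior, the posterior is uniform over consistent candidates.
    p = 1.0 / len(candidates) if candidates else 0.0
    return {c: p for c in candidates}

posterior = infer("Project Alpha will be released on 9/12/2021",
                  names=["Alpha", "Beta"],
                  dates=["9/12/2021", "1/1/2022"])
print(posterior)  # {('Alpha', '9/12/2021'): 1.0}
```

Real probabilistic inference handles noise and partial matches via message passing rather than exhaustive enumeration, but the forward/backward structure is the same.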

    \n\t
    \n\t\t
    \n\t\t\t\t\t\tPublication<\/span>\n\t\t\tAlexandria: Unsupervised High-Precision Knowledge Base Construction using a Probabilistic Program<\/span> <\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n

    An overview of the process of mining and linking is shown in the picture below. Topic mining itself consists of two stages: finding relevant pieces of text, and extracting knowledge from them using a probabilistic parser. Let\u2019s take a more detailed look at each of these stages.<\/p>\n\n\n\n

    \"Figure
    Figure 2: The Alexandria pipeline \u2013 from unstructured text to structured knowledge. <\/figcaption><\/figure>\n\n\n\n

    Alexandria topic mining<\/em><\/p>\n\n\n\n

    To narrow down the information that needs to be processed, Alexandria first runs a query engine to extract snippets from each document with a high probability of containing knowledge. This query engine can scale to run over billions of documents. For example, say that the model was parsing a document related to a company initiative, Project Alpha. The query engine would extract phrases likely to contain entity information, such as \u201cProject Alpha will be released on 9\/12\/2021\u201d or \u201cProject Alpha is run by Jane Smith.\u201d<\/p>\n\n\n\n
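The snippet-selection idea can be pictured as a cheap filter over sentences, run before any expensive parsing. The pattern and sample document below are hypothetical stand-ins for Alexandria's actual query engine, assuming a simple regular-expression heuristic:

```python
import re

# Illustrative heuristic: sentences containing fact-like phrasings are kept.
LIKELY_FACT = re.compile(r"\bProject\s+\w+\s+(?:will be released|is run by)\b")

def extract_snippets(document):
    """Split into sentences and keep only those likely to contain entity facts."""
    sentences = re.split(r"(?<=[.!?])\s+", document)
    return [s for s in sentences if LIKELY_FACT.search(s)]

doc = ("Project Alpha will be released on 9/12/2021. "
       "The weather was nice. Project Alpha is run by Jane Smith.")
print(extract_snippets(doc))
```

At production scale this filtering runs as a distributed query rather than a per-document regex scan, but the effect is the same: only high-value snippets reach the parser.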

    The parsing process requires identifying which parts of the text snippet correspond to specific property values. In this approach, known as template matching, the model looks for a set of patterns, or templates, such as \u201cProject {name} will be released on {date}\u201d. By matching a template to the text, the process can identify which parts of the text correspond with certain properties. Alexandria performs unsupervised learning to create templates from both structured and unstructured text, and the model can readily work with thousands of templates. To improve coverage, templates can be augmented by a more sophisticated neural language model, such as an LSTM or a transformer, at increased computational cost.<\/p>\n\n\n\n
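A minimal sketch of template matching, assuming a simple `{slot}` syntax (the helper and template text are illustrative, not the production implementation): each slot is compiled into a named capturing group, so a successful match recovers which span of text fills each property.

```python
import re

def compile_template(template):
    # Replace each {slot} with a named capturing group; assumes the rest of
    # the template contains no regex metacharacters.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>.+?)", template)
    return re.compile("^" + pattern + "$")

t = compile_template("Project {name} will be released on {completion_date}")
m = t.match("Project Alpha will be released on 9/12/2021")
print(m.groupdict())  # {'name': 'Alpha', 'completion_date': '9/12/2021'}
```

With thousands of learned templates, matching amounts to trying each template against a snippet and keeping the property bindings of those that fit.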

    We can extract textual property values from \u201cProject Alpha will be released on 9\/12\/2021\u201d using the template \u201cProject {name} will be released on {completion_date}\u201d. This gives the text \u201cAlpha\u201d for name and \u201c9\/12\/2021\u201d for completion_date. These strings come with inherent uncertainty about which values they represent. For example, the completion_date string could represent the date 12th September 2021 or the date 9th December 2021, depending on which date format was used (D\/M\/YYYY or M\/D\/YYYY). To handle such uncertainty, Alexandria represents values as probability distributions over strongly typed values \u2013 in this case, dates. Here, the probability distribution would retain both possible dates. If another source contained the text \u201cProject Alpha will be released on September 12\u201d, then one of the possibilities would be eliminated, leaving the extracted date as 12th September 2021.<\/p>\n\n\n\n
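The date example can be sketched with sets of strongly typed values standing in for the probability distributions; the helper below is hypothetical, and combining evidence from a second mention is shown as a simple set intersection:

```python
from datetime import date

def candidate_dates(text):
    """All dates a slash-separated string could denote under D/M/YYYY or M/D/YYYY."""
    a, b, year = (int(p) for p in text.split("/"))
    out = set()
    if 1 <= b <= 12:          # D/M/YYYY reading
        out.add(date(year, b, a))
    if 1 <= a <= 12:          # M/D/YYYY reading
        out.add(date(year, a, b))
    return out

ambiguous = candidate_dates("9/12/2021")   # both 12 Sep 2021 and 9 Dec 2021
evidence = {date(2021, 9, 12)}             # "September 12" is unambiguous
print(ambiguous & evidence)                # {datetime.date(2021, 9, 12)}
```

In the real system each candidate carries a probability rather than simple set membership, so weaker evidence reweights the distribution instead of eliminating values outright.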

    This parsing process is achieved by running the probabilistic program backwards using the Infer.NET machine learning framework. Infer.NET uses a process of message passing, as shown in the example below, which illustrates parsing the text \u201cProject Alpha will be released on 9\/12\/2021\u201d:<\/p>\n\n\n\n

    \"Figure
    Figure 3: A factor graph showing the message passing for probabilistic understanding of the sample text \u2018Project Alpha will be released on 9\/12\/2021\u2019<\/figcaption><\/figure>\n\n\n\n

    The output of probabilistic parsing is a probabilistic entity consisting of a set of properties and distributions over the values of these properties. Each parsed text extract gives rise to one such probabilistic entity, leading to a very large number of them. To combine these, they are passed to the next stage: linking.<\/p>\n\n\n\n
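One illustrative way to picture a probabilistic entity is as a mapping from property names to distributions over values. The entities and the merge rule below (a pointwise product of distributions, renormalized) are hypothetical stand-ins for how evidence about the same entity could be combined during linking:

```python
# Two probabilistic entities extracted from different snippets; each property
# maps to a value -> probability dict (values shown as ISO date strings).
entity_a = {"name": {"Alpha": 1.0},
            "completion_date": {"2021-09-12": 0.5, "2021-12-09": 0.5}}
entity_b = {"name": {"Alpha": 1.0},
            "completion_date": {"2021-09-12": 1.0}}

def merge(p, q):
    """Combine two property distributions: pointwise product, then renormalize."""
    prod = {v: p[v] * q[v] for v in p.keys() & q.keys()}
    total = sum(prod.values())
    return {v: w / total for v, w in prod.items()} if total else {}

merged = {k: merge(entity_a[k], entity_b[k]) for k in entity_a}
print(merged["completion_date"])  # {'2021-09-12': 1.0}
```

The sketch assumes the two entities are already known to refer to the same real-world topic; deciding that is itself the linking problem, described next.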

    Alexandria topic linking<\/em><\/p>\n\n\n\n

    The set of probabilistic entities coming out of the parsing stage typically contains many items which correspond to the same real-world entity. For example, we may have these two entities:<\/p>\n\n\n\n