{"id":741421,"date":"2021-04-26T10:46:11","date_gmt":"2021-04-26T17:46:11","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=741421"},"modified":"2022-12-22T11:13:45","modified_gmt":"2022-12-22T19:13:45","slug":"alexandria-in-microsoft-viva-topics-from-big-data-to-big-knowledge","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/alexandria-in-microsoft-viva-topics-from-big-data-to-big-knowledge\/","title":{"rendered":"Alexandria in Microsoft Viva Topics: from big data to big knowledge"},"content":{"rendered":"\n

Project Alexandria is a research project within Microsoft Research Cambridge dedicated to discovering entities, or topics of information, and their associated properties from unstructured documents. This research lab has studied knowledge mining for over a decade, using the probabilistic programming framework Infer.NET. Project Alexandria was established seven years ago to build on Infer.NET (opens in new tab)<\/span><\/a> and retrieve facts, schemas, and entities from unstructured data sources while adhering to Microsoft\u2019s robust privacy standards. The goal of the project is to construct a full knowledge base from a set of documents, entirely automatically.<\/p>\n\n\n\n

The Alexandria research team is uniquely positioned to make direct contributions to new Microsoft products. Alexandria technology plays a central role in the recently announced Microsoft Viva Topics (opens in new tab)<\/span><\/a>, an AI product that automatically organizes large amounts of content and expertise, making it easier for people to find information and act on it. Specifically, the Alexandria team is responsible for identifying topics and rich metadata, and combining other innovative Microsoft knowledge mining technologies to enhance the end user experience.<\/p>\n\n\n\n

The Alexandria team continues to contribute to Microsoft\u2019s vision of enterprise knowledge, delivering essential capabilities and focusing on the future of the enterprise knowledge base and the transition from big data to big knowledge.<\/p>\n\n\n\n\t


Part 1: What is Viva Topics?<\/h2>\n\n\n\n

Microsoft Viva Topics<\/a> is one of the four modules of Microsoft Viva<\/a>, an employee experience platform that brings together communications, knowledge, learning, resources, and insights. Viva Topics uses AI to organize resources, information and expertise into topics delivered through apps like SharePoint, Microsoft Search and Office; and coming soon to Yammer, Teams, and Outlook. Extracted descriptions, topical documents, and related sites and people are presented on topic cards that help people learn, develop new skills, and innovate faster while they work.<\/p>\n\n\n\n

The Enterprise Knowledge Sharing Problem<\/em><\/p>\n\n\n\n

Finding information can be hard, and numerous studies suggest that inefficiencies in knowledge searching strongly impact enterprise productivity. Solving the knowledge problem will enable employees to spend less time searching and more time learning and implementing. <\/p>\n\n\n\n

A recent survey (opens in new tab)<\/span><\/a> found that employees could potentially save four to six hours each week if they didn\u2019t have to search for information\u2014or spend time recreating it. This equates to an increase of 11-14 percent in daily productivity.<\/p><\/blockquote><\/figure>\n\n\n\n

Common business scenarios also stand to benefit: onboarding new employees, for example, could be 20-35 percent faster, according to a Forrester study (opens in new tab)<\/span><\/a> of the potential impact of Microsoft\u2019s knowledge and content services.<\/p>\n\n\n\n

Microsoft\u2019s opportunity to serve customers at scale becomes real with privacy-compliant access to a large set of enterprise data in the Microsoft Graph (opens in new tab)<\/span><\/a>. The graph currently contains context from over 18 trillion resources, including emails, events, users, files and more. This massive repository can be compliantly leveraged to create powerful knowledge extraction models and create a bespoke set of topics for each customer. Today, data from the Graph is used to power the knowledge experiences for all customers of Microsoft Viva Topics.<\/p>\n\n\n\n

What are Topics? <\/em><\/p>\n\n\n\n

Viva Topics automatically creates an enterprise knowledge base structured around topics,<\/strong> such as projects, events, and organizations, with related metadata about people, content, acronyms, definitions, conversations and related topics. Topic cards connect the user to knowledge in apps like SharePoint, Teams, Microsoft Search and more. Read more about topic discovery (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n

\"Figure
Figure 1: Viva Topics delivers knowledge in context throughout Microsoft 365.<\/figcaption><\/figure>\n\n\n\n

Knowledge should not be locked in a repository but instead should be useful and consumable where it is needed. Putting topics in the flow of everyday work makes knowledge from across an organization more easily discoverable and actionable.<\/p>\n\n\n\n

Part 2: Alexandria in Viva Topics<\/h2>\n\n\n\n

Viva Topics brings together the richness and breadth of technology investments from across Microsoft.<\/p>\n\n\n\n

Each of these technologies has its own specialty in the rich metadata it can contribute to Viva Topics. For example, one technology specializes in delivering a definition for a topic or bringing in a relevant synopsis from Wikipedia. Alexandria technology is then used to bring all this knowledge together into one coherent knowledge base.<\/p>\n\n\n\n

Naomi Moneypenny, who leads Viva Topics product development, describes the collaboration: \u201cThe Project Alexandria team and technologies have been instrumental to delivering the innovative experiences for customers in Viva Topics. We value their highly collaborative approach to working with many other specialist teams across Microsoft.\u201d<\/p>\n\n\n\n

Project Alexandria plays two fundamental roles in the production of Viva Topics:<\/p>\n\n\n\n

  1. Topic mining: <\/em>This process includes the discovery of topics in documents, as well as the maintenance and upkeep of those topics as documents change or as new documents are created.<\/li>
  2. Topic linking:<\/em> The process of bringing together knowledge from a variety of sources into a single unified knowledge base.<\/li><\/ol>\n\n\n\n

    Alexandria achieves both tasks through a machine learning approach called probabilistic programming, <\/em>which uses a special kind of program to describe the process by which topics and their properties are mentioned in documents. The same program can then effectively be run backwards to extract topics from documents. A big advantage of this approach is that information about the task is included in the probabilistic program itself, rather than using large amounts of labelled data. This enables the process to run unsupervised \u2013 it can perform these tasks automatically without any human input. For a technical description of the Alexandria probabilistic program, see the award-winning Alexandria Research Paper<\/a>.<\/p>\n\n\n\n
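To make the idea of running a program backwards concrete, here is a minimal, purely illustrative sketch in Python (not Infer.NET): a toy generative function renders an entity as text, and inference enumerates candidate entities to find those whose rendering explains an observed snippet. The candidate names and dates are hypothetical.

```python
import itertools

def render(name, date):
    """Forward direction: how a topic mention is generated as text."""
    return f"Project {name} will be released on {date}"

def infer(observed, names, dates):
    """Backward direction: which (name, date) pairs explain the observation?"""
    candidates = [(n, d) for n, d in itertools.product(names, dates)
                  if render(n, d) == observed]
    # With a uniform prior, the posterior is uniform over consistent candidates.
    p = 1.0 / len(candidates) if candidates else 0.0
    return {c: p for c in candidates}

posterior = infer("Project Alpha will be released on 9/12/2021",
                  names=["Alpha", "Beta"],
                  dates=["9/12/2021", "1/1/2022"])
print(posterior)  # {('Alpha', '9/12/2021'): 1.0}
```

Real probabilistic inference handles noise and partial matches via message passing rather than exhaustive enumeration, but the forward/backward structure is the same.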

    \n\t
    \n\t\t
    \n\t\t\t\t\t\tPublication<\/span>\n\t\t\tAlexandria: Unsupervised High-Precision Knowledge Base Construction using a Probabilistic Program<\/span> <\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n

    An overview of the process of mining and linking is shown in the picture below. Topic mining itself consists of two stages: finding relevant pieces of text, and extracting knowledge from them using a probabilistic parser. Let\u2019s take a more detailed look at each of these stages.<\/p>\n\n\n\n

    \"Figure
    Figure 2: The Alexandria pipeline \u2013 from unstructured text to structured knowledge. <\/figcaption><\/figure>\n\n\n\n

    Alexandria topic mining<\/em><\/p>\n\n\n\n

    To narrow down the information that needs to be processed, Alexandria first runs a query engine to extract snippets from each document with a high probability of containing knowledge. This query engine can scale to run over billions of documents. For example, say that the model was parsing a document related to a company initiative, Project Alpha. The query engine would extract phrases likely to contain entity information, such as \u201cProject Alpha will be released on 9\/12\/2021\u201d or \u201cProject Alpha is run by Jane Smith.\u201d<\/p>\n\n\n\n
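The snippet-selection idea can be pictured as a cheap filter over sentences, run before any expensive parsing. The pattern and sample document below are hypothetical stand-ins for Alexandria's actual query engine, assuming a simple regular-expression heuristic:

```python
import re

# Illustrative heuristic: sentences containing fact-like phrasings are kept.
LIKELY_FACT = re.compile(r"\bProject\s+\w+\s+(?:will be released|is run by)\b")

def extract_snippets(document):
    """Split into sentences and keep only those likely to contain entity facts."""
    sentences = re.split(r"(?<=[.!?])\s+", document)
    return [s for s in sentences if LIKELY_FACT.search(s)]

doc = ("Project Alpha will be released on 9/12/2021. "
       "The weather was nice. Project Alpha is run by Jane Smith.")
print(extract_snippets(doc))
```

At production scale this filtering runs as a distributed query rather than a per-document regex scan, but the effect is the same: only high-value snippets reach the parser.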

    The parsing process requires identifying which parts of the text snippet correspond to specific property values. In this approach, known as template matching, the model looks for a set of patterns, or templates, such as \u201cProject {name} will be released on {date}\u201d. By matching a template to the text, the process can identify which parts of the text correspond with certain properties. Alexandria performs unsupervised learning to create templates from both structured and unstructured text, and the model can readily work with thousands of templates. To improve coverage, templates can be augmented by a more sophisticated neural language model, such as an LSTM or a transformer, at increased computational cost.<\/p>\n\n\n\n
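A minimal sketch of template matching, assuming a simple `{slot}` syntax (the helper and template text are illustrative, not the production implementation): each slot is compiled into a named capturing group, so a successful match recovers which span of text fills each property.

```python
import re

def compile_template(template):
    # Replace each {slot} with a named capturing group; assumes the rest of
    # the template contains no regex metacharacters.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>.+?)", template)
    return re.compile("^" + pattern + "$")

t = compile_template("Project {name} will be released on {completion_date}")
m = t.match("Project Alpha will be released on 9/12/2021")
print(m.groupdict())  # {'name': 'Alpha', 'completion_date': '9/12/2021'}
```

With thousands of learned templates, matching amounts to trying each template against a snippet and keeping the property bindings of those that fit.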

    We can extract textual property values from \u201cProject Alpha will be released on 9\/12\/2021\u201d using the template \u201cProject {name} will be released on {completion_date}\u201d. This gives the text \u201cAlpha\u201d for name and \u201c9\/12\/2021\u201d for completion_date. These strings come with inherent uncertainty about which values they represent. For example, the completion_date string could represent the date 12th September 2021 or the date 9th December 2021, depending on which date format was used (D\/M\/YYYY or M\/D\/YYYY). To handle such uncertainty, Alexandria represents values as probability distributions over strongly typed values \u2013 in this case, dates. Here, the probability distribution would retain both possible dates. If another source contained the text \u201cProject Alpha will be released on September 12\u201d, then one of the possibilities would be eliminated, leaving the extracted date as 12th September 2021.<\/p>\n\n\n\n
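The date example can be sketched with sets of strongly typed values standing in for the probability distributions; the helper below is hypothetical, and combining evidence from a second mention is shown as a simple set intersection:

```python
from datetime import date

def candidate_dates(text):
    """All dates a slash-separated string could denote under D/M/YYYY or M/D/YYYY."""
    a, b, year = (int(p) for p in text.split("/"))
    out = set()
    if 1 <= b <= 12:          # D/M/YYYY reading
        out.add(date(year, b, a))
    if 1 <= a <= 12:          # M/D/YYYY reading
        out.add(date(year, a, b))
    return out

ambiguous = candidate_dates("9/12/2021")   # both 12 Sep 2021 and 9 Dec 2021
evidence = {date(2021, 9, 12)}             # "September 12" is unambiguous
print(ambiguous & evidence)                # {datetime.date(2021, 9, 12)}
```

In the real system each candidate carries a probability rather than simple set membership, so weaker evidence reweights the distribution instead of eliminating values outright.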

    This parsing process is achieved by running the probabilistic program backwards using the Infer.NET machine learning framework. Infer.NET uses a process of message passing, as shown in the example below, which illustrates parsing the text \u201cProject Alpha will be released on 9\/12\/2021\u201d:<\/p>\n\n\n\n

    \"Figure
    Figure 3: A factor graph showing the message passing for probabilistic understanding of the sample text \u2018Project Alpha will be released on 9\/12\/2021\u2019<\/figcaption><\/figure>\n\n\n\n

    The output of probabilistic parsing is a probabilistic entity consisting of a set of properties and distributions over the values of these properties. Each parsed text extract gives rise to one such probabilistic entity, leading to a very large number of them. To combine these, they are passed to the next stage: linking.<\/p>\n\n\n\n
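One illustrative way to picture a probabilistic entity is as a mapping from property names to distributions over values. The entities and the merge rule below (a pointwise product of distributions, renormalized) are hypothetical stand-ins for how evidence about the same entity could be combined during linking:

```python
# Two probabilistic entities extracted from different snippets; each property
# maps to a value -> probability dict (values shown as ISO date strings).
entity_a = {"name": {"Alpha": 1.0},
            "completion_date": {"2021-09-12": 0.5, "2021-12-09": 0.5}}
entity_b = {"name": {"Alpha": 1.0},
            "completion_date": {"2021-09-12": 1.0}}

def merge(p, q):
    """Combine two property distributions: pointwise product, then renormalize."""
    prod = {v: p[v] * q[v] for v in p.keys() & q.keys()}
    total = sum(prod.values())
    return {v: w / total for v, w in prod.items()} if total else {}

merged = {k: merge(entity_a[k], entity_b[k]) for k in entity_a}
print(merged["completion_date"])  # {'2021-09-12': 1.0}
```

The sketch assumes the two entities are already known to refer to the same real-world topic; deciding that is itself the linking problem, described next.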

    Alexandria topic linking<\/em><\/p>\n\n\n\n

    The set of probabilistic entities coming out of the parsing stage typically contains many items which correspond to the same real-world entity. For example, we may have these two entities:<\/p>\n\n\n\n