{"id":1079073,"date":"2024-09-09T09:15:55","date_gmt":"2024-09-09T16:15:55","guid":{"rendered":""},"modified":"2024-11-05T06:40:58","modified_gmt":"2024-11-05T14:40:58","slug":"graphrag-auto-tuning-provides-rapid-adaptation-to-new-domains","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/graphrag-auto-tuning-provides-rapid-adaptation-to-new-domains\/","title":{"rendered":"GraphRAG auto-tuning provides rapid adaptation to new domains"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-1024x576.png\" alt=\"GraphRAG hero: white circles linked on a blue to green gradient.\" class=\"wp-image-1083297\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-960x540.png 960w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>GraphRAG uses large language models (LLMs) to create a comprehensive knowledge graph that details entities and their relationships from any collection of text documents. This graph enables GraphRAG to leverage the semantic structure of the data and generate responses to complex queries that require a broad understanding of the entire text. In previous blog posts, we introduced <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/graphrag-unlocking-llm-discovery-on-narrative-private-data\/\">GraphRAG<\/a> and demonstrated how it could be <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/graphrag-new-tool-for-complex-data-discovery-now-on-github\/\">applied to news articles<\/a>. 
In this blog post, we show that it can also be tuned to any domain to enhance the quality of the results.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-1 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<ul class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<li class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Blog<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/graphrag-unlocking-llm-discovery-on-narrative-private-data\/\" target=\"_self\" class=\"annotations__link font-weight-semibold text-decoration-none\" data-bi-type=\"annotated-link\" aria-label=\"GraphRAG: Unlocking LLM discovery on narrative private data\" data-bi-aN=\"citation\" data-bi-cN=\"GraphRAG: Unlocking LLM discovery on narrative private data\">\n\t\t\t\tGraphRAG: Unlocking LLM discovery on narrative private data&nbsp;<span class=\"glyph-append glyph-append-chevron-right glyph-append-xsmall\"><\/span>\n\t\t\t<\/a>\n\t\t\t\t\t<\/li>\n\t<\/ul>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<ul class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<li class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Blog<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/graphrag-new-tool-for-complex-data-discovery-now-on-github\/\" target=\"_self\" class=\"annotations__link font-weight-semibold text-decoration-none\" data-bi-type=\"annotated-link\" aria-label=\"GraphRAG: New tool for complex data discovery now on GitHub\" data-bi-aN=\"citation\" data-bi-cN=\"GraphRAG: New tool 
for complex data discovery now on GitHub\">\n\t\t\t\tGraphRAG: New tool for complex data discovery now on GitHub&nbsp;<span class=\"glyph-append glyph-append-chevron-right glyph-append-xsmall\"><\/span>\n\t\t\t<\/a>\n\t\t\t\t\t<\/li>\n\t<\/ul>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<p>The knowledge graph creation process is called <em>indexing<\/em>. An LLM, guided by a set of domain-specific prompts, reads all the source content and extracts the relevant information, including entities and relationships, which are then used to construct the graph. For example, when analyzing news articles, important entities include people, places, and organizations, and relationship types might include \u201clives in,\u201d \u201cleads,\u201d and \u201cowns.\u201d&nbsp;<\/p>\n\n\n\n<p>However, each domain has a different set of entity and relationship types. In chemistry, for instance,&nbsp;entity types include molecules, enzymes, and reactions, while relationship types include \u201ccatalyzes\u201d and \u201creduces.\u201d Although GraphRAG\u2019s default news-domain prompts can produce a graph when applied to chemistry, that graph doesn\u2019t capture the specific content a chemist would expect.&nbsp;<\/p>\n\n\n\n<p>Manually creating and tuning a set of domain-specific prompts is time-consuming. We know this firsthand, because all the prompts used for news articles were written manually. To streamline this process, we developed an automated tool that generates domain-specific prompts that are tuned and ready to use. The tool follows a human-like approach: we provided an LLM with a sample of the text data (e.g., 1% of 10,000 chemistry papers) and instructed it to produce the prompts it deemed most applicable to the content. 
Now, with these automatically generated and tuned prompts, we can immediately apply GraphRAG to a new domain of our choosing and expect high-quality results.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"indexing-prompts-in-graphrag\">Indexing prompts in GraphRAG<\/h2>\n\n\n\n<p>During the indexing process, GraphRAG uses a set of prompts to instruct the LLM as it reads through the source content, extracting and organizing relevant information to construct the knowledge graph. GraphRAG\u2019s three main indexing prompts are:&nbsp;<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Entity and relationship extraction<\/strong>: Identifies all the entities present in the text and the relationships among them.<\/li>\n\n\n\n<li><strong>Entity and relationship summarization<\/strong>: Consolidates all instances of an entity or relationship into a single, concise description.&nbsp;<\/li>\n\n\n\n<li><strong>Community report generation<\/strong>: Generates a summary report for each community within the constructed knowledge graph.&nbsp;<\/li>\n<\/ol>\n\n\n\n<p>These prompts work best when tuned to the domain of the source content. In the rest of this blog post, we focus on domain tuning of the first prompt, \u201cEntity and relationship extraction,\u201d but similar methods apply to the second and third prompts.&nbsp;<\/p>\n\n\n\n<p>Code Sample 1 below shows the default few-shot prompt for entity and relationship extraction. This prompt was originally created for news articles and is the default version in the GraphRAG <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/graphrag\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub repository<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. 
The extraction prompt comprises four sections:&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Extraction instructions<\/strong>: Provides the LLM with guidance on how to perform extraction.&nbsp;<\/li>\n\n\n\n<li><strong>Few-shot examples<\/strong>: Supplies the LLM with concrete examples of the types of entities and relationships worth extracting.<\/li>\n\n\n\n<li><strong>Real data<\/strong>: Serves as a placeholder that is replaced with chunks of source content.&nbsp;<\/li>\n\n\n\n<li><strong>Gleanings<\/strong>: Encourages the LLM, over multiple turns, to extract additional information.&nbsp;<\/li>\n<\/ul>\n\n\n\n<p>The goal of auto-tuning is to create customized few-shot examples that are appropriate for the given domain. The default prompt, shown in Code Sample 1, provides the LLM with fifteen entity examples and twelve relationship examples, but it is notably restricted to just a few specific entity types: organization, geography, and person. These examples were invented by our team and do not represent real entities.<\/p>\n\n\n\n<p><strong>Code Sample 1: Default entity and relationship extraction prompt<\/strong><\/p>\n\n\n\n<div class=\"wp-block-columns has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-e597922dc3dd85531c6c1a5076660985 is-layout-flex wp-container-core-columns-is-layout-2 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<b>Goal<\/b><br \/>\n<i>Given a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.<\/i><br \/><br \/>\n\n<b>Steps<\/b><br \/>\n<ol>\n    <li><b>Identify all entities.<\/b> For each identified entity, extract the following information:<\/li>\n    <ul>\n        <li><i>entity_name<\/i>: Name of the entity, capitalized<\/li>\n        <li><i>entity_type<\/i>: One of the following types: [{entity_types}]<\/li>\n        <li><i>entity_description<\/i>: 
Comprehensive description of the entity&#8217;s attributes and activities<\/li>\n    <\/ul>\n    Format each entity as (&#8220;<span style=\"color: orange\">entity<\/span>&#8221;, <b>&lt;entity_name&gt;<\/b>, <b>&lt;entity_type&gt;<\/b>, <i>&lt;entity_description&gt;<\/i>)<br \/><br \/>\n\n    <li><b>From the entities identified in step 1, identify all pairs of (<i>source_entity<\/i>, <i>target_entity<\/i>) that are <b>*clearly related*<\/b> to each other.<\/b><\/li>\n    <ul>\n        <li><i>source_entity<\/i>: name of the source entity, as identified in step 1<\/li>\n        <li><i>target_entity<\/i>: name of the target entity, as identified in step 1<\/li>\n        <li><i>relationship_description<\/i>: explanation as to why you think the source entity and the target entity are related to each other<\/li>\n        <li><i>relationship_strength<\/i>: a numeric score indicating strength of the relationship between the source entity and target entity<\/li>\n    <\/ul>\n    Format each relationship as (&#8220;<span style=\"color: lightgreen\">relationship<\/span>&#8221;, <b>&lt;source_entity&gt;<\/b>, <b>&lt;target_entity&gt;<\/b>, <i>&lt;relationship_description&gt;<\/i>, <b>&lt;relationship_strength&gt;<\/b>)<br \/><br \/>\n\n    <li><b>Return output in English as a single list of all the entities and relationships identified in steps 1 and 2.<\/b> Use <b>{record_delimiter}<\/b> as the list delimiter.<\/li><br \/>\n\n    <li><b>When finished, output <b>{completion_delimiter}<\/b><\/b><\/li>\n<\/ol>\n\n######################<br \/>\n<b>Examples<\/b><br \/>\n######################<br \/><br \/>\n\n<b>Example 1<\/b>:<br \/>\n<i>Entity_types<\/i>: ORGANIZATION,PERSON<br \/>\n<i>Text<\/i>:<br \/>\n<i>The Verdantis&#8217;s Central Institution is scheduled to meet on Monday and Thursday, with the institution planning to release its latest policy decision on Thursday at 1:30 p.m. PDT, followed by a press conference where Central Institution Chair Martin Smith will take questions. 
Investors expect the Market Strategy Committee to hold its benchmark interest rate steady in a range of 3.5%-3.75%.<\/i><br \/><br \/>\n\n<b>Output<\/b>:<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>CENTRAL INSTITUTION<\/b>, ORGANIZATION, <i>The Central Institution is the Federal Reserve of Verdantis, which is setting interest rates on Monday and Thursday<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>MARTIN SMITH<\/b>, PERSON, <i>Martin Smith is the chair of the Central Institution<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>MARKET STRATEGY COMMITTEE<\/b>, ORGANIZATION, <i>The Central Institution committee makes key decisions about interest rates and the growth of Verdantis&#8217;s money supply<\/i>)<br \/><br \/>\n\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>MARTIN SMITH<\/b> &#8211; <b>CENTRAL INSTITUTION<\/b>, <i>Martin Smith is the Chair of the Central Institution and will answer questions at a press conference<\/i>, 9)<br \/><br \/>\n\n<b>Example 2<\/b>:<br \/>\n<i>Entity_types<\/i>: ORGANIZATION<br \/>\n<i>Text<\/i>:<br \/>\n<i>TechGlobal&#8217;s (TG) stock skyrocketed in its opening day on the Global Exchange Thursday. But IPO experts warn that the semiconductor corporation&#8217;s debut on the public markets isn&#8217;t indicative of how other newly listed companies may perform.<\/i><br \/>\n<i>TechGlobal, a formerly public company, was taken private by Vision Holdings in 2014. 
The well-established chip designer says it powers 85% of premium smartphones.<\/i><br \/><br \/>\n\n<b>Output<\/b>:<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>TECHGLOBAL<\/b>, ORGANIZATION, <i>TechGlobal is a stock now listed on the Global Exchange which powers 85% of premium smartphones<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>VISION HOLDINGS<\/b>, ORGANIZATION, <i>Vision Holdings is a firm that previously owned TechGlobal<\/i>)<br \/><br \/>\n\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>TECHGLOBAL<\/b> &#8211; <b>VISION HOLDINGS<\/b>, <i>Vision Holdings formerly owned TechGlobal from 2014 until present<\/i>, 5)<br \/><br \/>\n\n<b>Example 3<\/b>:<br \/>\n<i>Entity_types<\/i>: ORGANIZATION,GEO,PERSON<br \/>\n<i>Text<\/i>:<br \/>\n<i>Five Aurelians jailed for 8 years in Firuzabad and widely regarded as hostages are on their way home to Aurelia.<\/i><br \/>\n<i>The swap orchestrated by Quintara was finalized when $8bn of Firuzi funds were transferred to financial institutions in Krohaara, the capital of Quintara.<\/i><br \/>\n<i>The exchange initiated in Firuzabad&#8217;s capital, Tiruzia, led to the four men and one woman, who are also Firuzi nationals, boarding a chartered flight to Krohaara.<\/i><br \/>\n<i>They were welcomed by senior Aurelian officials and are now on their way to Aurelia&#8217;s capital, Cashion.<\/i><br \/>\n<i>The Aurelians include 39-year-old businessman Samuel Namara, who has been held in Tiruzia&#8217;s Alhamia Prison, as well as journalist Durke Bataglani, 59, and environmentalist Meggie Tazbah, 53, who also holds Bratinas nationality.<\/i><br \/><br \/>\n\n<b>Output<\/b>:<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>FIRUZABAD<\/b>, GEO, <i>Firuzabad held Aurelians as hostages<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>AURELIA<\/b>, GEO, <i>Country seeking to release 
hostages<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>QUINTARA<\/b>, GEO, <i>Country that negotiated a swap of money in exchange for hostages<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>TIRUZIA<\/b>, GEO, <i>Capital of Firuzabad where the Aurelians were being held<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>KROHAARA<\/b>, GEO, <i>Capital city in Quintara<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>CASHION<\/b>, GEO, <i>Capital city in Aurelia<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>SAMUEL NAMARA<\/b>, PERSON, <i>Aurelian who spent time in Tiruzia&#8217;s Alhamia Prison<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>ALHAMIA PRISON<\/b>, GEO, <i>Prison in Tiruzia<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>DURKE BATAGLANI<\/b>, PERSON, <i>Aurelian journalist who was held hostage<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>MEGGIE TAZBAH<\/b>, PERSON, <i>Bratinas national and environmentalist who was held hostage<\/i>)<br \/><br \/>\n\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>FIRUZABAD<\/b> &#8211; <b>AURELIA<\/b>, <i>Firuzabad negotiated a hostage exchange with Aurelia<\/i>, 2)<br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>QUINTARA<\/b> &#8211; <b>AURELIA<\/b>, <i>Quintara brokered the hostage exchange between Firuzabad and Aurelia<\/i>, 2)<br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>QUINTARA<\/b> &#8211; <b>FIRUZABAD<\/b>, <i>Quintara brokered the hostage exchange between Firuzabad and Aurelia<\/i>, 2)<br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>SAMUEL NAMARA<\/b> &#8211; <b>ALHAMIA PRISON<\/b>, <i>Samuel Namara was a prisoner at Alhamia prison<\/i>, 8)<br 
\/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>SAMUEL NAMARA<\/b> &#8211; <b>MEGGIE TAZBAH<\/b>, <i>Samuel Namara and Meggie Tazbah were exchanged in the same hostage release<\/i>, 2)<br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>SAMUEL NAMARA<\/b> &#8211; <b>DURKE BATAGLANI<\/b>, <i>Samuel Namara and Durke Bataglani were exchanged in the same hostage release<\/i>, 2)<br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>MEGGIE TAZBAH<\/b> &#8211; <b>DURKE BATAGLANI<\/b>, <i>Meggie Tazbah and Durke Bataglani were exchanged in the same hostage release<\/i>, 2)<br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>SAMUEL NAMARA<\/b> &#8211; <b>FIRUZABAD<\/b>, <i>Samuel Namara was a hostage in Firuzabad<\/i>, 2)<br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>MEGGIE TAZBAH<\/b> &#8211; <b>FIRUZABAD<\/b>, <i>Meggie Tazbah was a hostage in Firuzabad<\/i>, 2)<br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>DURKE BATAGLANI<\/b> &#8211; <b>FIRUZABAD<\/b>, <i>Durke Bataglani was a hostage in Firuzabad<\/i>, 2)<br \/><br \/>\n\n######################<br \/>\n<b>Real Data<\/b><br \/>\n######################<br \/><br \/>\n\n<b>Entity_types<\/b>: {entity_types}<br \/>\n<b>Text<\/b>: {input_text}<br \/><br \/>\n<b>Output<\/b>:<br \/>\n<\/div>\n<\/div>\n\n\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Customizing prompts by hand is difficult and time-consuming, both in determining the right set of entity and relationship types and in carefully constructing all the prompts for a specific domain. 
We address these challenges with auto-tuning.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"auto-tuning-architecture\">Auto-tuning architecture<\/h2>\n\n\n\n<p>Auto-tuning takes source content and produces an automatically generated set of domain-specific prompts. Figure 1 shows the architecture of the auto-tuning process for our three main indexing prompts.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"610\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/Fig1_Algorithm-conceptual-diagram.png\" alt=\"GraphRAG | Figure 1. Algorithm Conceptual Diagram\" class=\"wp-image-1079931\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/Fig1_Algorithm-conceptual-diagram.png 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/Fig1_Algorithm-conceptual-diagram-300x131.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/Fig1_Algorithm-conceptual-diagram-1024x446.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/Fig1_Algorithm-conceptual-diagram-768x335.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/08\/Fig1_Algorithm-conceptual-diagram-240x105.png 240w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><figcaption class=\"wp-element-caption\">Figure 1. Diagram of the auto-tuning algorithm<\/figcaption><\/figure>\n\n\n\n<p>We start by sending a sample of the source content to the LLM, which first identifies the domain and then creates an appropriate 
persona, which downstream agents use to tune the extraction process. Once the domain and persona are established, several processes run in parallel to create our custom indexing prompts. This way, the few-shot prompts are generated from the actual domain data and from the persona\u2019s perspective.&nbsp;<\/p>\n\n\n\n<p>To illustrate how this works in practice for entity and relationship extraction, let\u2019s shift to a new domain, the <em>Behind the Tech<\/em> podcast.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"auto-tuning-the-behind-the-tech-podcast\">Auto-tuning the <em>Behind the Tech<\/em> podcast&nbsp;<\/h3>\n\n\n\n<p>Kevin Scott, CTO of Microsoft, hosts a podcast series called <em>Behind the Tech<\/em>, in which he interviews a wide variety of tech innovators. Given the podcast\u2019s focus on society and technology, this dataset benefits from its own set of indexing prompts, distinct from those for general news. While the default prompt works with podcast transcripts, we can achieve much higher precision with customized, podcast-tuned prompts.<\/p>\n\n\n\n<p>To demonstrate this, we use Code Sample 2, which contains a raw text input chunk from the podcast.&nbsp;<\/p>\n\n\n\n<p><strong>Code Sample 2: Podcast data sample<\/strong><\/p>\n\n\n\n<div class=\"wp-block-columns has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-b79e075abf56286b1b57674fdb808bd4 is-layout-flex wp-container-core-columns-is-layout-3 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<b>KEVIN SCOTT<\/b>: Our guest today is <b>Ashley Llorens<\/b>. Ashley is a scientist, engineer, and hip-hop artist. 
He worked for two decades at <b>Johns Hopkins Applied Physics Laboratory<\/b>, developing novel AI technologies and served as the founding chief of the lab\u2019s <b>Intelligent Systems Center<\/b>.\r\n<br\/>\r\nHe was recently nominated by the <b>White House Office of Science and Technology Policy<\/b> to serve as an AI expert for the <b>Global Partnership on AI<\/b>. Besides his career in engineering, Ashley actually began his career as a hip-hop artist and serves as a voting member of the <b>Recording Academy<\/b> for the <b>Grammy Awards<\/b>.\r\n<br\/>\r\nAbout a month ago, Ashley joined <b>Microsoft<\/b> as a vice president, distinguished scientist, and managing director for <b>Microsoft Research<\/b>. Welcome to the show, Ashley \u2013 and to Microsoft.\r\n<br\/><br\/>\r\n<b>ASHLEY LLORENS<\/b>: Thanks so much, Kevin, great to be here.\n<\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>The first step in adapting GraphRAG to the target domain is to generate a persona for the LLM to assume when it generates examples for each prompt. Given the podcast text sample as input, the LLM produces the following persona:&nbsp;<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-4 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:10%\"><\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:80%\">\n<p class=\"has-text-align-left\"><em>\u201cYou are an expert in social network analysis with a focus on technology and innovation communities. You are skilled at mapping and interpreting complex networks, identifying key influencers, and understanding the dynamics of community interactions. 
You are adept at helping organizations and researchers identify the relations and structure within specific domains, particularly in rapidly evolving fields like technology and innovation.\u201d<\/em>&nbsp;<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:10%\"><\/div>\n<\/div>\n\n\n\n<p>Given the persona and the text sample as input, the LLM generates the entity and relationship extraction prompt, including custom few-shot examples. The resulting indexing prompt, shown in Code Sample 3, is automatically tuned to our new domain.&nbsp;<\/p>\n\n\n\n<p><strong>Code Sample 3: Podcast-tuned entity and relationship extraction prompt<\/strong><\/p>\n\n\n\n<div class=\"wp-block-columns has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-3870564cb4fd77c5a686fe32551d4d8f is-layout-flex wp-container-core-columns-is-layout-5 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<b>Goal<\/b><br \/>\n<i>Given a text document that is potentially relevant to this activity, first identify all entities needed from the text in order to capture the information and ideas in the text.<\/i><br \/>\n<i>Next, report all relationships among the identified entities.<\/i><br \/><br \/>\n\n<b>Steps<\/b><br \/>\n<ol>\n    <li><b>Identify all entities.<\/b> For each identified entity, extract the following information:<\/li>\n    <ul>\n        <li><i>entity_name:<\/i> Name of the entity, capitalized<\/li>\n        <li><i>entity_type:<\/i> Suggest several labels or categories for the entity. 
The categories should not be specific, but should be as general as possible.<\/li>\n        <li><i>entity_description:<\/i> Comprehensive description of the entity&#8217;s attributes and activities<\/li>\n    <\/ul>\n    Format each entity as (<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>&lt;entity_name&gt;<\/b>, &lt;entity_type&gt;, <i>&lt;entity_description&gt;<\/i>)<br \/><br \/>\n\n    <li><b>From the entities identified in step 1, identify all pairs of (<i>source_entity<\/i>, <i>target_entity<\/i>) that are <b>*clearly related*<\/b> to each other.<\/b> For each pair of related entities, extract the following information:<\/li>\n    <ul>\n        <li><i>source_entity:<\/i> name of the source entity, as identified in step 1<\/li>\n        <li><i>target_entity:<\/i> name of the target entity, as identified in step 1<\/li>\n        <li><i>relationship_description:<\/i> explanation as to why you think the source entity and the target entity are related to each other<\/li>\n        <li><i>relationship_strength:<\/i> a numeric score indicating strength of the relationship between the source entity and target entity<\/li>\n    <\/ul>\n    Format each relationship as (<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>&lt;source_entity&gt;<\/b>, <b>&lt;target_entity&gt;<\/b>, <i>&lt;relationship_description&gt;<\/i>, <b>&lt;relationship_strength&gt;<\/b>)<br \/><br \/>\n\n    <li><b>Return output in English as a single list of all the entities and relationships identified in steps 1 and 2.<\/b> Use <b>{record_delimiter}<\/b> as the list delimiter.<\/li><br \/>\n\n    <li><b>When finished, output <\/b><\/li>\n<\/ol>\n\n<b>Examples<\/b><br \/>\n#############################<br \/><br \/>\n\n<b>Example 1:<\/b><br \/>\n<i>Text:<\/i><br \/>\n<i>CHRIS URMSON: Yeah, no, and it is, right? I think one of the things that people outside of Silicon Valley who haven\u2019t been here don\u2019t realize is that it\u2019s not really. 
That, like, you know, people talk about Silicon Valley engineers being risk-takers. I think it\u2019s actually the opposite. It\u2019s the realization that if you go and try one of these things and you\u2019re actually good at what you do, if it fails, it fails. You\u2019ll have a job the next day at somewhere else, right? And you\u2019ll have this wealth of experience that people will value. And I think that is something that it\u2019s hard, you know, I\u2019ll categorize this as you know east coast people but, you know, kind of more conventional business folks haven\u2019t &#8212; don\u2019t kind of have that sense of the opportunities that are around. And maybe we\u2019ve just been here during a particularly<\/i><br \/><br \/>\n<i>Output:<\/i><br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>CHRIS URMSON<\/b>, PERSON, <i>Chris Urmson is a speaker discussing the culture and dynamics of Silicon Valley, particularly the attitude towards risk and failure<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>SILICON VALLEY<\/b>, LOCATION, <i>A region in California known for its technology industry and innovative environment, where engineers are perceived as risk-takers<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>SILICON VALLEY ENGINEERS<\/b>, GROUP, <i>Engineers working in Silicon Valley, characterized by a culture that values risk-taking and resilience in the face of failure<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>EAST COAST PEOPLE<\/b>, GROUP, <i>People from the East Coast of the United States, implied to have a more conventional and less risk-tolerant approach to business compared to those in Silicon Valley<\/i>)<br \/><br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>CHRIS URMSON<\/b> &#8211; <b>SILICON VALLEY<\/b>, <i>Chris Urmson discusses the culture and dynamics of Silicon Valley, emphasizing the local 
attitude towards risk and failure<\/i>, 8)<br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>SILICON VALLEY ENGINEERS<\/b> &#8211; <b>SILICON VALLEY<\/b>, <i>Silicon Valley Engineers are part of the Silicon Valley ecosystem, embodying the local culture of risk-taking and resilience<\/i>, 9)<br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>EAST COAST PEOPLE<\/b> &#8211; <b>SILICON VALLEY<\/b>, <i>East Coast People are contrasted with Silicon Valley individuals in terms of business culture and risk tolerance<\/i>, 7)<br \/>\n<br \/><br \/>\n\n<b>Example 2:<\/b><br \/>\n<i>Text:<\/i><br \/>\n<i>to ask Dr. Jemison that I think for her, and for me, space was this idea that really inspired us, I think, to go explore new frontiers. You know, it was this imagination of this thing that, you know, for me at least, like made me want to study computer science, because \u2013 like that was the most interesting terrestrial frontier I could go explore. And like you know, the thing that I wonder about is like what that frontier is, like what that inspiration will be for the next generation of scientists, and engineers and explorers. You know, like maybe it\u2019s synthetic biology, but it\u2019s going to be interesting to see whatever it is. [MUSIC] CHRISTINA WARREN: I couldn\u2019t agree more. I look forward to watching and learning from all of that. All right, well, that\u2019s a wrap. Thank you so much to Mae for joining us today. And to our listeners. 
Thank you for joining us and<\/i><br \/><br \/>\n<i>Output:<\/i><br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>SPACE<\/b>, CONCEPT, <i>Space is described as an inspiring concept that motivates exploration and study in new frontiers, particularly in science and technology<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>COMPUTER SCIENCE<\/b>, FIELD, <i>Computer science is highlighted as an interesting terrestrial frontier that the speaker was motivated to explore due to the inspiration from space<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>SYNTHETIC BIOLOGY<\/b>, FIELD, <i>Synthetic biology is mentioned as a potential inspiring frontier for the next generation of scientists, engineers, and explorers<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>CHRISTINA WARREN<\/b>, PERSON, <i>Christina Warren is the speaker who expresses agreement and looks forward to learning from the developments in new scientific frontiers<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>MAE<\/b>, PERSON, <i>Mae is mentioned as a guest who joined Christina Warren in the discussion about future scientific frontiers<\/i>)<br \/><br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>SPACE<\/b> &#8211; <b>COMPUTER SCIENCE<\/b>, <i>Space as a concept inspired the speaker to study computer science<\/i>, 8)<br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>CHRISTINA WARREN<\/b> &#8211; <b>MAE<\/b>, <i>Christina Warren thanks Mae for joining the discussion<\/i>, 7)<br \/>\n<br \/><br \/>\n\n<b>Example 3:<\/b><br \/>\n<i>Text:<\/i><br \/>\n<i>educational outcomes for kids. And if you look at the children of immigrants in East San Jose or East Palo Alto here in the Silicon Valley, like often, the parents are working two, three jobs. 
Like, they\u2019re so busy that they have a hard time being engaged with their kids. And sometimes they don\u2019t speak English. And so, like, they don\u2019t even have the linguistic ability. And you can just imagine what a technology like this could do, where it really doesn\u2019t care what language you speak. It can bridge that gap between the parents and the teacher, and it can be there to help the parent understand where the roadblocks are for the child and to even potentially get very personalized to the child\u2019s needs and sort of help them on the things that they\u2019re struggling with. I think it\u2019s really, really very exciting. BILL GATES: Yeah, just the language barriers, we often forget about that. And that comes up in the developing world. India has<\/i><br \/><br \/>\n<i>Output:<\/i><br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>EAST SAN JOSE<\/b>, GEO, <i>A region in Silicon Valley where many immigrant families reside, and parents often work multiple jobs<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>EAST PALO ALTO<\/b>, GEO, <i>A region in Silicon Valley known for its significant immigrant population and economic challenges<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>SILICON VALLEY<\/b>, GEO, <i>A major hub for technology and innovation in California, USA<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>BILL GATES<\/b>, PERSON, <i>Prominent technology leader and philanthropist who discusses the impact of technology on overcoming language barriers<\/i>)<br \/>\n(<span style=\"color: orange\">&#8220;entity&#8221;<\/span>, <b>TECHNOLOGY<\/b>, CONCEPT, <i>Refers to new technological solutions that can assist in bridging language gaps between parents and teachers, and provide personalized support to children<\/i>)<br \/><br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>EAST SAN JOSE<\/b> 
&#8211; <b>SILICON VALLEY<\/b>, <i>East San Jose is a part of Silicon Valley<\/i>, 9)<br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>EAST PALO ALTO<\/b> &#8211; <b>SILICON VALLEY<\/b>, <i>East Palo Alto is located within Silicon Valley<\/i>, 9)<br \/>\n(<span style=\"color: lightgreen\">&#8220;relationship&#8221;<\/span>, <b>TECHNOLOGY<\/b> &#8211; <b>BILL GATES<\/b>, <i>Bill Gates discusses the potential of technology to solve language barriers and educational challenges<\/i>, 8)<br \/>\n<br \/><br \/>\n\n<b>Real Data<\/b><br \/>\n######################<br \/><br \/>\n<i>Text:<\/i> {input_text}<br \/><br \/>\n<i>Output:<\/i>\n<\/div>\n<\/div>\n\n\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Here, the prompt automatically generated from the sample content in Code Sample 2 includes fourteen entity examples across six entity types (person, location, group, concept, field, and geography) and eight relationship examples.<\/p>\n\n\n\n<p>To assess how this affects extraction across the entire dataset, we used both the default and the auto-tuned prompts to generate entity and relationship outputs. 
Before we explain the results, let\u2019s review the output of the default prompt, which produced seven entities and six relationships, as shown in Code Sample 4.<\/p>\n\n\n\n<p><strong>Code Sample 4: Default extraction output<\/strong><\/p>\n\n\n\n<div class=\"wp-block-columns has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-cabe88b6b451ba8f7a35576aa1566ff1 is-layout-flex wp-container-core-columns-is-layout-6 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<b>Entities and Relationships:<\/b><br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>ASHLEY LLORENS<\/b>, PERSON, <i>Ashley Llorens is a scientist, engineer, hip-hop artist, and vice president at Microsoft. He has worked at Johns Hopkins Applied Physics Laboratory and was nominated by the White House to serve as an AI expert for the Global Partnership on AI. He is also a voting member of the Recording Academy for the Grammy Awards.<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>JOHNS HOPKINS APPLIED PHYSICS LABORATORY<\/b>, ORGANIZATION, <i>An institution where Ashley Llorens worked for two decades developing novel AI technologies and served as the founding chief of the intelligent systems center.<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>WHITE HOUSE OFFICE OF SCIENCE AND TECHNOLOGY POLICY<\/b>, ORGANIZATION, <i>A U.S. 
government office that nominated Ashley Llorens to serve as an AI expert for the Global Partnership on AI.<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>GLOBAL PARTNERSHIP ON AI<\/b>, ORGANIZATION, <i>An international initiative focused on AI where Ashley Llorens serves as an expert.<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>RECORDING ACADEMY<\/b>, ORGANIZATION, <i>An organization responsible for the Grammy Awards, of which Ashley Llorens is a voting member.<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>MICROSOFT<\/b>, ORGANIZATION, <i>A multinational technology company where Ashley Llorens joined as a vice president, distinguished scientist, and managing director for Microsoft Research.<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>KEVIN SCOTT<\/b>, PERSON, <i>The host of the podcast where Ashley Llorens was interviewed and welcomed to Microsoft.<\/i>)<br \/><br \/>\n\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>ASHLEY LLORENS<\/b> &#8211; <b>JOHNS HOPKINS APPLIED PHYSICS LABORATORY<\/b>, <i>Ashley Llorens worked at Johns Hopkins Applied Physics Laboratory for two decades<\/i>, 9)<br \/>\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>ASHLEY LLORENS<\/b> &#8211; <b>WHITE HOUSE OFFICE OF SCIENCE AND TECHNOLOGY POLICY<\/b>, <i>Ashley Llorens was nominated by the White House Office of Science and Technology Policy to serve as an AI expert<\/i>, 8)<br \/>\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>ASHLEY LLORENS<\/b> &#8211; <b>GLOBAL PARTNERSHIP ON AI<\/b>, <i>Ashley Llorens serves as an AI expert for the Global Partnership on AI<\/i>, 8)<br \/>\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>ASHLEY LLORENS<\/b> &#8211; <b>RECORDING ACADEMY<\/b>, <i>Ashley Llorens is a voting member of the Recording Academy for the Grammy 
Awards<\/i>, 7)<br \/>\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>ASHLEY LLORENS<\/b> &#8211; <b>MICROSOFT<\/b>, <i>Ashley Llorens joined Microsoft as a vice president and managing director for Microsoft Research<\/i>, 9)<br \/>\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>KEVIN SCOTT<\/b> &#8211; <b>ASHLEY LLORENS<\/b>, <i>Kevin Scott interviewed Ashley Llorens on the podcast and welcomed him to Microsoft<\/i>, 8)<br \/>\n<\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Using the auto-tuned, domain-specific, automatically generated prompt, we achieved a deeper extraction, producing nine entities and eight relationships, as shown below in Code Sample 5.&nbsp;<\/p>\n\n\n\n<p><strong>Code Sample 5: Auto-tuned extraction output<\/strong><\/p>\n\n\n\n<div class=\"wp-block-columns has-white-color has-black-background-color has-text-color has-background has-link-color wp-elements-9b177358bd6238b6be0c0da3d9ffc848 is-layout-flex wp-container-core-columns-is-layout-7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<b>Entities and Relationships:<\/b><br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>ASHLEY LLORENS<\/b>, PERSON, <i>Ashley Llorens is a scientist, engineer, hip-hop artist, and executive at Microsoft. He has a background in AI technology development and has been involved in the music industry as a hip-hop artist and Grammy voting member<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>KEVIN SCOTT<\/b>, PERSON, <i>Kevin Scott is the host of the podcast where Ashley Llorens is a guest. 
He engages with Ashley about his background and career<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>JOHNS HOPKINS APPLIED PHYSICS LABORATORY<\/b>, ORGANIZATION, <i>An institution where Ashley Llorens worked for two decades, developing novel AI technologies and leading the intelligent systems center<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>WHITE HOUSE OFFICE OF SCIENCE AND TECHNOLOGY POLICY<\/b>, ORGANIZATION, <i>A governmental office that nominated Ashley Llorens to serve as an AI expert for the Global Partnership on AI<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>GLOBAL PARTNERSHIP ON AI<\/b>, ORGANIZATION, <i>An international initiative focused on AI where Ashley Llorens serves as an expert.<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>RECORDING ACADEMY<\/b>, ORGANIZATION, <i>An organization responsible for the Grammy Awards, of which Ashley Llorens is a voting member.<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>MICROSOFT<\/b>, ORGANIZATION, <i>A major technology company where Ashley Llorens recently joined as a vice president, distinguished scientist, and managing director for Microsoft Research<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>CHICAGO<\/b>, LOCATION, <i>The city where Ashley Llorens grew up, specifically mentioned as the south side and south suburbs, which influenced his interest in music and technology<\/i>)<br \/>\n(<span style=\"color:orange\">&#8220;entity&#8221;<\/span>, <b>HIP-HOP<\/b>, MUSIC GENRE, <i>A music genre that significantly influenced Ashley Llorens during his childhood in Chicago, leading him to pursue a career in music alongside his technical career<\/i>)<br \/><br \/>\n\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>ASHLEY LLORENS<\/b> &#8211; <b>JOHNS HOPKINS APPLIED PHYSICS LABORATORY<\/b>, <i>Ashley Llorens 
worked at Johns Hopkins Applied Physics Laboratory for two decades, developing AI technologies<\/i>, 9)<br \/>\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>ASHLEY LLORENS<\/b> &#8211; <b>WHITE HOUSE OFFICE OF SCIENCE AND TECHNOLOGY POLICY<\/b>, <i>Ashley Llorens was nominated by the White House Office of Science and Technology Policy to serve as an AI expert<\/i>, 9)<br \/>\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>ASHLEY LLORENS<\/b> &#8211; <b>GLOBAL PARTNERSHIP ON AI<\/b>, <i>Ashley Llorens serves as an AI expert for the Global Partnership on AI<\/i>, 9)<br \/>\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>ASHLEY LLORENS<\/b> &#8211; <b>RECORDING ACADEMY<\/b>, <i>Ashley Llorens is a voting member of the Recording Academy for the Grammy Awards<\/i>, 7)<br \/>\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>ASHLEY LLORENS<\/b> &#8211; <b>MICROSOFT<\/b>, <i>Ashley Llorens recently joined Microsoft as a vice president and managing director for Microsoft Research<\/i>, 9)<br \/>\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>ASHLEY LLORENS<\/b> &#8211; <b>CHICAGO<\/b>, <i>Ashley Llorens grew up in Chicago, which influenced his early interest in music, particularly hip-hop<\/i>, 7)<br \/>\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>ASHLEY LLORENS<\/b> &#8211; <b>HIP-HOP<\/b>, <i>Ashley Llorens was deeply influenced by hip-hop music during his upbringing in Chicago, leading him to pursue a career in music<\/i>, 8)<br \/>\n(<span style=\"color:lightgreen\">&#8220;relationship&#8221;<\/span>, <b>KEVIN SCOTT<\/b> &#8211; <b>ASHLEY LLORENS<\/b>, <i>Kevin Scott hosts Ashley Llorens on the podcast, discussing his background and career transitions<\/i>, 7)<br \/>\n<\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Compared with the default prompt, the auto-tuned 
prompt yields more entities and more relationships, providing a more comprehensive view of our data. A key difference between the two outputs is the range of entity types extracted. The default prompt\u2019s examples are limited to three entity types: organization, geography, and person. The auto-tuned prompt instead derives its types from the sample input text, and its output here spans organization, person, location, and music genre.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"putting-it-all-together\">Putting it all together<\/h3>\n\n\n\n<p>The difference is clear in the final outputs after using these auto-tuned prompts to index the podcast source data. To measure it, we compared the size of the knowledge graphs built with the default and the auto-tuned prompts, keeping all other parameters constant between the two runs and using GPT-4 Turbo:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>Prompt<\/td><td>Entities<\/td><td>Relationships<\/td><td>Communities<\/td><\/tr><tr><td>Default prompt<\/td><td>1796<\/td><td>2851<\/td><td>352<\/td><\/tr><tr><td>Auto-tuned prompt<\/td><td><strong>4896<\/strong><\/td><td><strong>8210<\/strong><\/td><td><strong>1027<\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>As shown, auto-tuning yields a substantially larger knowledge graph, with roughly 2.7 times as many entities and about 2.9 times as many relationships and communities. This is what we would expect when the extraction prompt matches the domain: a prompt that looks for molecules, for example, will extract much more from a chemistry dataset than one that looks for people and places. More communities in the knowledge graph also mean that it can better serve global search queries, which draw on community-level summaries. While the size of the knowledge graph can be a proxy for its utility, the true measure of utility is observed in end-to-end queries. 
To demonstrate this, the following side-by-side comparison shows how the knowledge graphs built with the default and auto-tuned prompts answer the same question.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dots\"\/>\n\n\n\n<p>We asked GraphRAG the following question: &#8220;What&#8217;s the relationship between Ashley Llorens and Chicago?&#8221;<\/p>\n\n\n\n<p>We first queried the knowledge graph built with the default prompt and then the knowledge graph built with our auto-tuned prompt:<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column has-lighter-gray-background-color has-background is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center\"><strong>Default query results<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-center\">&#8220;<em>I am sorry but I am unable to answer this question given the provided data.<\/em>&#8221;<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column has-lighter-gray-background-color has-background is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center\"><strong>Auto-tuned query results<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-center\">&#8220;<em>**Origin and Influence**: Ashley Llorens has a significant connection to Chicago, which has deeply influenced his personal and professional life. 
Growing up in a city known for its vibrant music scene, particularly hip-hop, has shaped his musical pursuits and artistic expression.<\/em>&#8221;<\/p>\n\n\n\n<p class=\"has-text-align-center\">&#8220;<em>**Professional Impact**: His background and experiences in Chicago have carried over into his professional environment, notably at the Applied Physics Laboratory, where he has integrated elements of hip-hop culture [Data: Reports (940)].<\/em>&#8221;<\/p>\n<\/div>\n<\/div>\n\n\n\n<p>With auto-tuned indexing prompts, our knowledge graph became more representative of the dataset\u2019s entities and relationships, enabling it to answer the query.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"advancing-auto-tuning-and-expanding-graphrag-s-capabilities\">Advancing auto-tuning and expanding GraphRAG\u2019s capabilities<\/h2>\n\n\n\n<p>Looking ahead, we\u2019re exploring other methods to build on this auto-tuning work. We\u2019re excited to investigate new approaches for creating the core GraphRAG knowledge graph and are also studying ways to measure and evaluate the quality of these graph structures. Additionally, we\u2019re researching methods to better assess performance so that we can identify the types of queries where GraphRAG provides unique value. This includes evaluating human-generated versus auto-tuned prompts, as well as exploring potential improvements to the auto-tuner.<\/p>\n\n\n\n<p>Overall, these new auto-tuner developments make GraphRAG much more accessible and turnkey. We hope this auto-tuning work removes many of the challenges of working with new datasets. 
We invite you to try out these capabilities yourself using GraphRAG\u2019s <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/graphrag\" target=\"_blank\" rel=\"noreferrer noopener\">core library<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and our Azure-based solution accelerator, available on <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/Azure-Samples\/graphrag-accelerator\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-fill-github\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/github.com\/microsoft\/graphrag\" target=\"_blank\" rel=\"noreferrer noopener\">Try out GraphRAG<\/a><\/div>\n\n\n\n<div class=\"wp-block-button is-style-fill-github\"><a data-bi-type=\"button\" class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/github.com\/Azure-Samples\/graphrag-accelerator\" target=\"_blank\" rel=\"noreferrer noopener\">Try out GraphRAG Accelerator<\/a><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>GraphRAG uses LLM-generated knowledge graphs to substantially improve complex Q&A over retrieval-augmented generation (RAG). 
Discover automatic tuning of GraphRAG for new datasets, making it more accurate and relevant.<\/p>\n","protected":false},"author":42735,"featured_media":1083297,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"categories":[1],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[269148,243984,269142],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1079073","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-option-blog-homepage-featured","msr-post-option-include-in-river"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[1027041],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Alonso Guevara Fern&aacute;ndez","user_id":42522,"display_name":"Alonso Guevara Fern&aacute;ndez","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/alonsog\/\" aria-label=\"Visit the profile page for Alonso Guevara Fern&aacute;ndez\">Alonso Guevara Fern&aacute;ndez<\/a>","is_active":false,"last_first":"Guevara Fern\u00e1ndez, Alonso","people_section":0,"alias":"alonsog"},{"type":"guest","value":"katy-smith","user_id":"1028889","display_name":"Katy Smith","author_link":"Katy Smith","is_active":true,"last_first":"Smith, Katy","people_section":0,"alias":"katy-smith"},{"type":"user_nicename","value":"Joshua 
Bradley","user_id":43272,"display_name":"Joshua Bradley","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/joshbradley\/\" aria-label=\"Visit the profile page for Joshua Bradley\">Joshua Bradley<\/a>","is_active":false,"last_first":"Bradley, Joshua","people_section":0,"alias":"joshbradley"},{"type":"user_nicename","value":"Darren Edge","user_id":31509,"display_name":"Darren Edge","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/daedge\/\" aria-label=\"Visit the profile page for Darren Edge\">Darren Edge<\/a>","is_active":false,"last_first":"Edge, Darren","people_section":0,"alias":"daedge"},{"type":"user_nicename","value":"Ha Trinh","user_id":43245,"display_name":"Ha Trinh","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/trinhha\/\" aria-label=\"Visit the profile page for Ha Trinh\">Ha Trinh<\/a>","is_active":false,"last_first":"Trinh, Ha","people_section":0,"alias":"trinhha"},{"type":"user_nicename","value":"Sarah Smith","user_id":42579,"display_name":"Sarah Smith","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/smithsarah\/\" aria-label=\"Visit the profile page for Sarah Smith\">Sarah Smith<\/a>","is_active":false,"last_first":"Smith, Sarah","people_section":0,"alias":"smithsarah"},{"type":"user_nicename","value":"Ben Cutler","user_id":31188,"display_name":"Ben Cutler","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/bcutler\/\" aria-label=\"Visit the profile page for Ben Cutler\">Ben Cutler<\/a>","is_active":false,"last_first":"Cutler, Ben","people_section":0,"alias":"bcutler"},{"type":"user_nicename","value":"Steven Truitt","user_id":43143,"display_name":"Steven Truitt","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/steventruitt\/\" aria-label=\"Visit the profile page for Steven Truitt\">Steven Truitt<\/a>","is_active":false,"last_first":"Truitt, 
Steven","people_section":0,"alias":"steventruitt"},{"type":"user_nicename","value":"Jonathan Larson","user_id":32385,"display_name":"Jonathan Larson","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jolarso\/\" aria-label=\"Visit the profile page for Jonathan Larson\">Jonathan Larson<\/a>","is_active":false,"last_first":"Larson, Jonathan","people_section":0,"alias":"jolarso"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-960x540.png\" class=\"img-object-cover\" alt=\"GraphRAG image on blue to green gradient\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-240x135.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1-1280x720.png 1280w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/GraphRag-3-BlogHeroFeature-1400x788-1.png 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"September 9, 2024","formattedExcerpt":"GraphRAG uses LLM-generated knowledge graphs to substantially improve complex Q&amp;A over retrieval-augmented generation (RAG). Discover automatic tuning of GraphRAG for new datasets, making it more accurate and relevant.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1079073","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/42735"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1079073"}],"version-history":[{"count":107,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1079073\/revisions"}],"predecessor-version":[{"id":1085517,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1079073\/revisions\/1085517"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1083297"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1079073"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1079073"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1079073"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsof
t.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1079073"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1079073"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1079073"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1079073"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1079073"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1079073"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1079073"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1079073"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}