{"id":1079073,"date":"2024-09-09T09:15:55","date_gmt":"2024-09-09T16:15:55","guid":{"rendered":""},"modified":"2024-11-05T06:40:58","modified_gmt":"2024-11-05T14:40:58","slug":"graphrag-auto-tuning-provides-rapid-adaptation-to-new-domains","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/graphrag-auto-tuning-provides-rapid-adaptation-to-new-domains\/","title":{"rendered":"GraphRAG auto-tuning provides rapid adaptation to new domains"},"content":{"rendered":"\n

GraphRAG uses large language models (LLMs) to create a comprehensive knowledge graph that details entities and their relationships from any collection of text documents. This graph enables GraphRAG to leverage the semantic structure of the data and generate responses to complex queries that require a broad understanding of the entire text. In previous blog posts,\u00a0<\/a>we introduced GraphRAG<\/a> and demonstrated how it could be applied to news articles<\/a>. In this blog post, we show that it can also be tuned to any domain to enhance the quality of the results.<\/p>\n\n\n\n


The knowledge graph creation process is called indexing<\/em>. An LLM, guided by a set of domain-specific prompts, reads all the source content and extracts the relevant information, including entities and relationships, which are then used to construct the graph. For example, when analyzing news articles, entities like people, places, and organizations are important. Here, relationship types might include \u201clives in,\u201d \u201cleads,\u201d and \u201cowns.\u201d <\/p>\n\n\n\n
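As a rough illustration of the last step, extracted entities and relationships can be assembled into a graph along the following lines. This is a minimal sketch, not GraphRAG's actual implementation: the triples, the entity names, and the `build_graph` helper are all hypothetical and chosen only to mirror the news example above.

```python
from collections import defaultdict

# Hypothetical extraction output: (source entity, relationship, target entity).
triples = [
    ('Alice', 'lives in', 'Seattle'),
    ('Alice', 'leads', 'Contoso'),
    ('Contoso', 'owns', 'Fabrikam'),
]

def build_graph(triples):
    # Map each entity to its outgoing (relationship, target) edges.
    graph = defaultdict(list)
    for source, relation, target in triples:
        graph[source].append((relation, target))
        # Register the target as a node even if it has no outgoing edges.
        graph.setdefault(target, [])
    return dict(graph)

graph = build_graph(triples)
for entity, edges in graph.items():
    print(entity, edges)
```

A plain adjacency mapping keeps the sketch dependency-free; a real pipeline would also carry entity types and descriptions on each node.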

However, each domain has a different set of entity and relationship types. In the field of chemistry, for instance, entity types include molecules, enzymes, and reactions, while relationship types include \u201ccatalyzes\u201d and \u201creduces.\u201d Although our default news domain prompts in GraphRAG can produce a graph when applied to chemistry, they don\u2019t capture the specific content a chemist would expect. <\/p>\n\n\n\n

Manually creating and tuning a set of domain-specific prompts is time-consuming. We know, as all the prompts used for news articles were generated manually. To streamline this process, we developed an automated tool that generates domain-specific prompts, which are tuned and ready to use. This tool follows a human-like approach; we provided an LLM with a sample of text data (e.g., 1% of 10,000 chemistry papers) and instructed it to produce the prompts it deemed most applicable to the content. Now, with these automatically generated and tuned prompts, we can immediately apply GraphRAG to a new domain of our choosing, confident that we\u2019ll get high-quality results.<\/p>\n\n\n\n
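The sampling step described above can be sketched as follows. This is an illustrative example under stated assumptions, not the tool's actual code: the `sample_documents` helper, its parameters, and the placeholder corpus are hypothetical, and uniform random sampling is assumed as the selection method.

```python
import random

def sample_documents(documents, fraction=0.01, seed=42):
    # Return a reproducible uniform random sample covering `fraction` of the corpus.
    if not documents:
        raise ValueError('documents must not be empty')
    if not 0 < fraction <= 1:
        raise ValueError('fraction must be in (0, 1]')
    # Always keep at least one document so small corpora still yield a sample.
    k = max(1, round(len(documents) * fraction))
    return random.Random(seed).sample(documents, k)

# Placeholder corpus standing in for 10,000 chemistry papers.
papers = [f'chemistry paper {i}' for i in range(10_000)]
sample = sample_documents(papers)
print(len(sample))
```

The sampled texts would then be passed to the LLM along with an instruction to propose prompts suited to the content; fixing the seed keeps the sample, and therefore the generated prompts, easier to reproduce.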

Indexing prompts in GraphRAG<\/h2>\n\n\n\n

During the indexing process, GraphRAG uses a set of prompts to instruct the LLM as it reads through the source content, extracting and organizing relevant information to construct the knowledge graph. Three of GraphRAG\u2019s main indexing prompts are: <\/p>\n\n\n\n

    \n
  1. Entity and relationship extraction<\/strong>: Identifies all the entities present and establishes relationships among them.<\/li>\n\n\n\n
  2. Entity and relationship summarization<\/strong>: Consolidates instances of entities and their relationships into a single, concise description. <\/li>\n\n\n\n
  3. Community report generation<\/strong>: Generates a summary report for each community within the constructed knowledge graph. <\/li>\n<\/ol>\n\n\n\n

    These prompts work best when tuned to the domain of the source content. In the rest of this blog post, we focus on domain tuning of the first prompt, \u201cEntity and relationship extraction,\u201d but similar methods apply to the second and third prompts. <\/p>\n\n\n\n

    Below, Code Sample 1<\/strong> shows the default few-shot prompt for entity and relationship extraction. This prompt was originally created for news articles and is the default form found in the GraphRAG GitHub repository<\/span><\/a>. The extraction prompt comprises four sections: <\/p>\n\n\n\n