{"id":1005408,"date":"2024-02-13T12:00:00","date_gmt":"2024-02-13T20:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1005408"},"modified":"2024-04-02T14:41:02","modified_gmt":"2024-04-02T21:41:02","slug":"graphrag-unlocking-llm-discovery-on-narrative-private-data","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/graphrag-unlocking-llm-discovery-on-narrative-private-data\/","title":{"rendered":"GraphRAG: Unlocking LLM discovery on narrative private data"},"content":{"rendered":"\n

Editor\u2019s note, Apr. 2, 2024 \u2013<\/strong> Figure 1 was updated to clarify the origin of each source.<\/em><\/p>\n\n\n\n

Perhaps the greatest challenge \u2013 and opportunity \u2013 of LLMs is extending their powerful capabilities to problems beyond the data on which they have been trained, and achieving comparable results with data the LLM has never seen. This opens new possibilities in data investigation, such as identifying themes and semantic concepts, with context and grounding drawn from the dataset itself. In this post, we introduce GraphRAG, created by Microsoft Research, as a significant advance in enhancing the capability of LLMs.<\/p>\n\n\n\n
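As background for the retrieval-augmented techniques discussed in this post, the sketch below illustrates the search step that baseline RAG relies on: ranking text chunks by vector similarity to the query. This is a toy illustration only, not Microsoft's implementation; bag-of-words counts stand in for a real embedding model, and the sample chunks and function names are hypothetical.

```python
# Toy sketch of vector-similarity retrieval, the search step used by
# baseline RAG. Bag-of-words counts stand in for a real embedding model;
# in practice an LLM embedding API and a vector index would be used.
from collections import Counter
import math


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: lowercase bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank all chunks by similarity to the query; return the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical document chunks from a private dataset.
chunks = [
    "GraphRAG builds an LLM-generated knowledge graph over private data.",
    "Baseline RAG retrieves text chunks by vector similarity to the query.",
    "The retrieved chunks are passed to the LLM as grounding context.",
]
top = retrieve("How does baseline RAG search for relevant text?", chunks, k=1)
```

The retrieved chunks would then be placed into the LLM's prompt as grounding context; GraphRAG replaces this flat similarity search with queries over an LLM-generated knowledge graph.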

<p><strong>Publication:<\/strong> <em>Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine<\/em><\/p>\n\n\n\n

Retrieval-Augmented Generation (RAG) is a technique that searches for information relevant to a user query and supplies the results to the LLM as grounding for the answer it generates. RAG is an important component of most LLM-based tools, and the majority of RAG approaches use vector similarity as the search technique. GraphRAG instead uses LLM-generated knowledge graphs, providing substantial improvements in question-and-answer performance when analyzing complex documents. This builds upon our recent research, which points to the power of prompt augmentation when performing discovery on <em>private datasets<\/em>. Here, we define a <em>private dataset<\/em> as data that the LLM was not trained on and has never seen before, such as an enterprise\u2019s proprietary research, business documents, or communications. <em>Baseline RAG<\/em>[1] was created to help solve this problem, but we observe situations where baseline RAG performs very poorly. For example:<\/p>\n\n\n\n