GraphRAG with MongoDB and LangChain
This tutorial demonstrates how to implement GraphRAG by using MongoDB Atlas and LangChain. GraphRAG is an alternative approach to traditional RAG that structures your data as a knowledge graph instead of as vector embeddings. When combined with an LLM, this approach enables relationship-aware retrieval and multi-hop reasoning.
Work with a runnable version of this tutorial as a Python notebook.
Background
GraphRAG is an alternative approach to traditional RAG that structures data as a knowledge graph of entities and their relationships instead of as vector embeddings. While vector-based RAG finds documents that are semantically similar to the query, GraphRAG finds entities connected to the query and traverses the relationships in the graph to retrieve relevant information.
This approach is particularly useful for answering relationship-based questions like "What is the connection between Company A and Company B?" or "Who is Person X's manager?".
MongoDBGraphStore is a component in the LangChain MongoDB integration that allows you to implement GraphRAG by storing entities (nodes) and their relationships (edges) in a MongoDB collection. This component stores each entity as a document with relationship fields that reference other documents in your collection. It executes queries using the $graphLookup aggregation stage.
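To make the storage model concrete, here is a sketch of how an entity document and a traversal might look. The document shape and field names below are illustrative assumptions, not MongoDBGraphStore's exact schema:

```python
# Illustrative entity document (assumed shape, not the library's exact schema):
# each entity is one document, with relationship fields that reference
# the _id values of other entity documents in the same collection.
entity = {
    "_id": "Sherlock Holmes",
    "type": "Person",
    "relationships": {
        "target_ids": ["Arthur Conan Doyle", "Dr. Joseph Bell"],
        "types": ["created_by", "inspired_by"],
    },
}

# A relationship-aware query can then traverse those references with a
# $graphLookup aggregation stage (sketch):
pipeline = [
    {"$match": {"_id": "Sherlock Holmes"}},
    {"$graphLookup": {
        "from": "wikipedia",                        # traverse the same collection
        "startWith": "$relationships.target_ids",   # seed the traversal
        "connectFromField": "relationships.target_ids",
        "connectToField": "_id",
        "as": "connected_entities",
        "maxDepth": 2,                              # enables multi-hop reasoning
    }},
]
```

Running such a pipeline with `collection.aggregate(pipeline)` would return the matched entity together with every entity reachable within two hops.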
Prerequisites
To complete this tutorial, you must have the following:
An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.
An OpenAI API Key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.
An environment to run interactive Python notebooks such as VS Code or Colab.
Set Up the Environment
Set up the environment for this tutorial.
Create an interactive Python notebook by saving a file with the .ipynb extension. This notebook allows you to run Python code snippets individually, and you'll use it to run the code in this tutorial.
To set up your notebook environment:
Define environment variables.
Copy the following code example, replace the placeholder values with your own, then run the code:

- `<api-key>`: Your OpenAI API key
- `<connection-string>`: Your Atlas cluster's SRV connection string

```python
import os

os.environ["OPENAI_API_KEY"] = "<api-key>"
ATLAS_CONNECTION_STRING = "<connection-string>"
ATLAS_DB_NAME = "langchain_db"    # MongoDB database to store the knowledge graph
ATLAS_COLLECTION = "wikipedia"    # MongoDB collection to store the knowledge graph
```
Note
Your connection string should use the following format:
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
Use Atlas as a Knowledge Graph
This section demonstrates how to use Atlas as a knowledge graph for GraphRAG. Paste and run the following code in your notebook:
Load the sample data.
For this tutorial, you use publicly accessible data from Wikipedia as the data source. To load the sample data, run the following code snippet. It performs the following steps:

- Retrieves a subset of Wikipedia pages, filtered by the query Sherlock Holmes.
- Uses a text splitter to split the data into smaller documents.
- Specifies chunk parameters, which determine the number of tokens in each document and the number of tokens that overlap between two consecutive documents.

```python
from langchain_community.document_loaders import WikipediaLoader
from langchain.text_splitter import TokenTextSplitter

# Load up to three Wikipedia pages that match the query
wikipedia_pages = WikipediaLoader(query="Sherlock Holmes", load_max_docs=3).load()

# Split the pages into 1024-token chunks with no overlap
text_splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=0)
wikipedia_docs = text_splitter.split_documents(wikipedia_pages)
```
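The sliding-window idea behind chunking with overlap can be shown with a minimal character-based sketch (TokenTextSplitter measures length in tokens rather than characters, but the mechanics are the same):

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Sliding-window chunking: each chunk starts chunk_size - chunk_overlap
    units after the previous one, so consecutive chunks share chunk_overlap units."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With chunk_overlap=0, as in the tutorial code, the windows simply tile the text with no shared content between consecutive chunks.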
Instantiate the graph store.

Use the MongoDBGraphStore class to construct the knowledge graph and load it into your Atlas cluster. The store requires a chat model for entity extraction; this example instantiates one with ChatOpenAI:

```python
from langchain_openai import ChatOpenAI
from langchain_mongodb.graphrag.graph import MongoDBGraphStore

# Chat model used to extract entities and relationships from your documents
chat_model = ChatOpenAI(model="gpt-4o", temperature=0)

graph_store = MongoDBGraphStore.from_connection_string(
    connection_string=ATLAS_CONNECTION_STRING,
    database_name=ATLAS_DB_NAME,
    collection_name=ATLAS_COLLECTION,
    entity_extraction_model=chat_model,
)
```
Add documents to the knowledge graph.

Add documents to the collection by using the add_documents method. When you add new documents, this method finds existing entities and updates them, or creates new ones if they do not exist.

This step might take a few minutes. You can ignore any warnings that appear in the output.

```python
graph_store.add_documents(wikipedia_docs)
```
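Conceptually, this merge-or-create behavior resembles an upsert. A simplified in-memory sketch of that logic (illustrative only, not the library's implementation):

```python
def upsert_entity(store, entity):
    """Merge an extracted entity into the store, or create it if new."""
    existing = store.setdefault(entity["_id"],
                                {"_id": entity["_id"], "relationships": []})
    # Append only relationships that the stored entity doesn't already have
    for rel in entity["relationships"]:
        if rel not in existing["relationships"]:
            existing["relationships"].append(rel)

store = {}
upsert_entity(store, {"_id": "Sherlock Holmes",
                      "relationships": [("created_by", "Arthur Conan Doyle")]})
# A second batch mentioning the same entity updates it instead of duplicating it
upsert_entity(store, {"_id": "Sherlock Holmes",
                      "relationships": [("created_by", "Arthur Conan Doyle"),
                                        ("partner_of", "Dr. Watson")]})
print(store["Sherlock Holmes"]["relationships"])
# [('created_by', 'Arthur Conan Doyle'), ('partner_of', 'Dr. Watson')]
```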
After you run the sample code, you can view how your data is stored by navigating to the langchain_db.wikipedia collection in the Atlas UI.
(Optional) Visualize the knowledge graph.

You can visualize the graph structure by using the networkx and pyvis libraries. For an example, see the notebook.
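If you build the visualization yourself, one approach is to flatten the entity documents into edges first and then feed them to networkx. This sketch assumes a hypothetical document shape with relationships.target_ids and relationships.types fields; adapt the field names to what you actually see in your collection:

```python
def to_edges(entity_docs):
    """Flatten entity documents into (source, relation, target) triples."""
    edges = []
    for doc in entity_docs:
        rels = doc.get("relationships", {})
        # Pair each target with its corresponding relationship type
        for target, rel_type in zip(rels.get("target_ids", []),
                                    rels.get("types", [])):
            edges.append((doc["_id"], rel_type, target))
    return edges

docs = [{"_id": "Sherlock Holmes",
         "relationships": {"target_ids": ["Arthur Conan Doyle"],
                           "types": ["created_by"]}}]
print(to_edges(docs))  # [('Sherlock Holmes', 'created_by', 'Arthur Conan Doyle')]
```

Each triple can then become a labeled edge in the graph, for example with networkx's G.add_edge(src, tgt, label=rel).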
Answer Questions about Your Data
Invoke the knowledge graph to answer questions.

Use the chat_response method to invoke the knowledge graph. It retrieves relevant documents from Atlas, and then uses the chat model you specified to generate an answer in natural language. Specifically, the chat model extracts entities from the query, Atlas traverses the knowledge graph to find connected entities by using the $graphLookup stage, and the closest entities and their relationships are sent with the query back to the chat model to generate a response.
```python
query = "Who inspired Sherlock Holmes?"

answer = graph_store.chat_response(query)
print(answer.content)
```
Sherlock Holmes was inspired by Dr. Joseph Bell, a physician known for his keen observation and deductive reasoning, as acknowledged by Sir Arthur Conan Doyle, Holmes' creator.
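The three-step flow described above can be sketched schematically. The helper functions here are placeholders standing in for the LLM calls and the $graphLookup traversal, not real library internals:

```python
def chat_response_sketch(query, extract_entities, traverse_graph, generate):
    """Schematic GraphRAG answer flow: extract -> traverse -> generate."""
    entities = extract_entities(query)   # 1. chat model extracts entities
    context = traverse_graph(entities)   # 2. $graphLookup finds connected entities
    return generate(query, context)      # 3. chat model answers with that context

# Toy stand-ins to show the data flow (not real LLM or database calls)
answer = chat_response_sketch(
    "Who inspired Sherlock Holmes?",
    extract_entities=lambda q: ["Sherlock Holmes"],
    traverse_graph=lambda ents: {"Sherlock Holmes": ["Dr. Joseph Bell"]},
    generate=lambda q, ctx: f"Answered using context: {ctx}",
)
print(answer)
```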