
GraphRAG with MongoDB and LangChain

On this page

  • Background
  • Prerequisites
  • Set Up the Environment
  • Use Atlas as a Knowledge Graph
  • Answer Questions about Your Data

This tutorial demonstrates how to implement GraphRAG by using MongoDB Atlas and LangChain. GraphRAG is an alternative approach to traditional RAG that structures your data as a knowledge graph instead of as vector embeddings. When combined with an LLM, this approach enables relationship-aware retrieval and multi-hop reasoning.

Work with a runnable version of this tutorial as a Python notebook.

Background

GraphRAG is an alternative approach to traditional RAG that structures data as a knowledge graph of entities and their relationships instead of as vector embeddings. While vector-based RAG finds documents that are semantically similar to the query, GraphRAG finds entities connected to the query and traverses the relationships in the graph to retrieve relevant information.

This approach is particularly useful for answering relationship-based questions like "What is the connection between Company A and Company B?" or "Who is Person X's manager?".

MongoDBGraphStore is a component in the LangChain MongoDB integration that allows you to implement GraphRAG by storing entities (nodes) and their relationships (edges) in a MongoDB collection. This component stores each entity as a document with relationship fields that reference other documents in your collection. It executes queries using the $graphLookup aggregation stage.
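
MongoDBGraphStore builds and runs the traversal query for you, but conceptually the lookup resembles the following aggregation. This is an illustrative sketch only; field names such as relationships.target_ids are assumptions for illustration, not the library's documented schema.

# Conceptual $graphLookup sketch. MongoDBGraphStore constructs and runs the real
# query; field names such as "relationships.target_ids" are assumptions.
pipeline = [
    {"$match": {"_id": "Sherlock Holmes"}},            # start from an extracted entity
    {"$graphLookup": {
        "from": "wikipedia",                            # entity collection to traverse
        "startWith": "$relationships.target_ids",       # follow outgoing relationship references
        "connectFromField": "relationships.target_ids",
        "connectToField": "_id",
        "as": "connected_entities",
        "maxDepth": 2,                                   # number of additional hops to traverse
    }},
]
# Run with: collection.aggregate(pipeline)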

Prerequisites

To complete this tutorial, you must have the following:

  • An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.

  • An OpenAI API Key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.

  • An environment to run interactive Python notebooks such as VS Code or Colab.

Set Up the Environment

Set up the environment for this tutorial. Create an interactive Python notebook by saving a file with the .ipynb extension. This notebook allows you to run Python code snippets individually, and you'll use it to run the code in this tutorial.

To set up your notebook environment:

1

Install the dependencies by running the following command:

pip install --quiet --upgrade pymongo langchain_community wikipedia langchain_openai langchain_mongodb
2

Copy the following code example, replace the placeholder values described below, then run the code:

  • <api-key>: Your OpenAI API key

  • <connection-string>: Your Atlas cluster's SRV connection string

import os
os.environ["OPENAI_API_KEY"] = "<api-key>"
ATLAS_CONNECTION_STRING = "<connection-string>"
ATLAS_DB_NAME = "langchain_db" # MongoDB database to store the knowledge graph
ATLAS_COLLECTION = "wikipedia" # MongoDB collection to store the knowledge graph

Note

Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
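
Optionally, before you continue, you can confirm that the connection string you configured is valid. The following minimal check uses the pymongo driver that you installed in the previous step; it is not part of the original tutorial flow:

from pymongo import MongoClient
client = MongoClient(ATLAS_CONNECTION_STRING)
client.admin.command("ping")  # raises an exception if the cluster is unreachable
print("Successfully connected to Atlas")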

Use Atlas as a Knowledge Graph

This section demonstrates how to use Atlas as a knowledge graph for GraphRAG. Paste and run the following code snippets in your notebook:

1

Initialize the LLM by using the init_chat_model method from LangChain:

from langchain.chat_models import init_chat_model
chat_model = init_chat_model("gpt-4o", model_provider="openai", temperature=0)
2

For this tutorial, you use publicly accessible data from Wikipedia as the data source. To load the sample data, run the following code snippet. It performs the following steps:

  • Retrieves a subset of Wikipedia pages, filtered by the query Sherlock Holmes.

  • Uses a text splitter to split the data into smaller documents.

  • Specifies chunk parameters, which determine the number of tokens in each document and the number of tokens that overlap between two consecutive documents.

from langchain_community.document_loaders import WikipediaLoader
from langchain.text_splitter import TokenTextSplitter
wikipedia_pages = WikipediaLoader(query="Sherlock Holmes", load_max_docs=3).load()
text_splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=0)
wikipedia_docs = text_splitter.split_documents(wikipedia_pages)
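
Optionally, you can sanity-check the loaded chunks before building the graph. This quick inspection is not part of the original tutorial flow:

print(f"Number of chunks: {len(wikipedia_docs)}")
print(wikipedia_docs[0].metadata["title"])   # title of the source Wikipedia page
print(wikipedia_docs[0].page_content[:200])  # preview of the first chunk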
3

Use the MongoDBGraphStore class to construct the knowledge graph and load it into your Atlas cluster:

from langchain_mongodb.graphrag.graph import MongoDBGraphStore
graph_store = MongoDBGraphStore.from_connection_string(
    connection_string = ATLAS_CONNECTION_STRING,
    database_name = ATLAS_DB_NAME,
    collection_name = ATLAS_COLLECTION,
    entity_extraction_model = chat_model
)
4

Add documents to the collection by using the add_documents method. When you add new documents, this method finds existing entities and updates them or creates new ones if they do not exist.

This step might take a few minutes. You can ignore any warnings that appear in the output.

graph_store.add_documents(wikipedia_docs)

After you run the sample code, you can view how your data is stored by navigating to the langchain_db.wikipedia collection in the Atlas UI.
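
You can also inspect the stored entities programmatically. The following optional snippet queries the collection directly with pymongo; the document shape you see is whatever MongoDBGraphStore wrote:

from pymongo import MongoClient
collection = MongoClient(ATLAS_CONNECTION_STRING)[ATLAS_DB_NAME][ATLAS_COLLECTION]
print(f"Stored entities: {collection.count_documents({})}")
print(collection.find_one())  # one entity document, including its relationship fields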

5

You can visualize the graph structure using the networkx and pyvis libraries. For an example, see the notebook.
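
If you don't want to open the notebook, the following rough sketch shows one way such a visualization could be built. The relationships.target_ids field name is an assumption about the stored schema; check a stored entity document (as in the previous step) and adjust the field names accordingly. You may also need to install networkx and pyvis first.

# Rough sketch only; field names such as "relationships.target_ids" are assumptions
# about the stored schema, not guaranteed by MongoDBGraphStore.
# pip install networkx pyvis
import networkx as nx
from pyvis.network import Network
from pymongo import MongoClient

collection = MongoClient(ATLAS_CONNECTION_STRING)[ATLAS_DB_NAME][ATLAS_COLLECTION]
G = nx.DiGraph()
for entity in collection.find():
    G.add_node(entity["_id"])
    for target in entity.get("relationships", {}).get("target_ids", []):
        G.add_edge(entity["_id"], target)  # one edge per outgoing relationship

net = Network(notebook=True, directed=True)
net.from_nx(G)
net.show("knowledge_graph.html")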

Answer Questions about Your Data

Invoke the knowledge graph to answer questions by using the chat_response method. It retrieves relevant documents from Atlas, and then uses the chat model you specified to generate an answer in natural language.

Specifically, the chat model extracts entities from the query, Atlas traverses the knowledge graph to find connected entities by using the $graphLookup stage, and the closest entities and their relationships are sent with the query back to the chat model to generate a response.

query = "Who inspired Sherlock Holmes?"
answer = graph_store.chat_response(query)
print(answer.content)
Sherlock Holmes was inspired by Dr. Joseph Bell, a physician known for his keen observation and deductive reasoning, as acknowledged by Sir Arthur Conan Doyle, Holmes' creator.
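
You can reuse the same pattern for other relationship-oriented questions. The exact answer you receive depends on which pages were loaded into the graph:

query = "How are Sherlock Holmes and Dr. Watson related?"
answer = graph_store.chat_response(query)
print(answer.content)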
