Comparing two approaches for retrieving relevant code snippets from a codebase.
To keep testing against open-source repositories simple, I've added samples-typescript as a git submodule.
- Traditional RAG: Uses semantic vector search (via OpenAI embeddings stored in Milvus) to find code snippets.
- GraphRAG: Enhances Traditional RAG by leveraging a code knowledge graph. It first performs a vector search to get initial candidate snippets, then traverses a structural code graph (built in Neo4j) to find related snippets (e.g., functions called by candidates, functions in the same file, functions related via shared dependencies).
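The two-stage idea can be sketched in plain Python. This is a minimal in-memory illustration, not the project's implementation: the toy embeddings, the `CALLS` and `FILE_OF` tables, and all function names are hypothetical stand-ins for Milvus and Neo4j, and the shared-dependency expansion is left out for brevity.

```python
from math import sqrt

# Toy stand-ins for the real stores (all names and values are hypothetical).
EMBEDDINGS = {
    "auth.login": [0.9, 0.1, 0.0],
    "auth.hash_password": [0.6, 0.4, 0.0],
    "db.connect": [0.0, 0.2, 0.9],
    "ui.render": [0.1, 0.9, 0.1],
}
CALLS = {"auth.login": ["auth.hash_password", "db.connect"]}
FILE_OF = {
    "auth.login": "auth.js",
    "auth.hash_password": "auth.js",
    "db.connect": "db.js",
    "ui.render": "ui.js",
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def vector_search(query_vec, top_k=1):
    """Traditional RAG step: rank function IDs by similarity to the query."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda fid: cosine(query_vec, EMBEDDINGS[fid]),
        reverse=True,
    )
    return ranked[:top_k]


def graph_expand(seed_ids):
    """Graph step: collect callees and same-file neighbours of each seed."""
    related = []
    for fid in seed_ids:
        related.extend(CALLS.get(fid, []))
        related.extend(
            other
            for other, path in FILE_OF.items()
            if path == FILE_OF[fid] and other != fid
        )
    return related


def graph_rag(query_vec, top_k=1):
    """Combine vector hits with graph neighbours, deduplicated in order."""
    seeds = vector_search(query_vec, top_k)
    return list(dict.fromkeys(seeds + graph_expand(seeds)))
```

With a query vector close to `auth.login`, `vector_search` returns only that function, while `graph_rag` also surfaces the functions it calls and its file neighbours, even though their embeddings are further from the query.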
- Python 3.10+
- uv
- OpenAI
- LangChain
- tree-sitter: Parser generator tool and Python bindings (tree-sitter, tree-sitter-javascript)
- Docker & Docker Compose
- Milvus: Open-source vector database
- Neo4j: Graph database
- Parsing: The JavaScriptParser reads .js files from the target directory, uses tree-sitter to build Abstract Syntax Trees (ASTs), and extracts structured information.
- Store Embeddings: Takes the extracted function code snippets, embeds them using OpenAI, and stores the embeddings along with snippet IDs in Milvus.
- Building Knowledge Graph: Takes the structured CodeFile data and populates a Neo4j database, creating nodes (CodeFile, Function, Module, Variable) and relationships (CONTAINS, CALLS, REQUIRES, DECLARES_VARIABLE).
- Query & Compare:
  - Traditional RAG: Performs a vector similarity search for the query, returning a ranked list of function snippets (based on semantic meaning).
  - GraphRAG:
    - Performs the same vector search as above to get initial candidate function IDs.
    - Performs a graph traversal starting from these IDs, following relationships (CALLS, CONTAINS, shared REQUIRES) to find structurally related function snippets.
    - The results from vector search and graph traversal are combined and deduplicated.
- Output: An out.json file containing the detailed parsed structure is also generated for inspection.
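The graph traversal step could look roughly like the sketch below. The node labels and relationship names are the ones listed above; the `id` property, the relationship directions, and the helper names (`TRAVERSALS`, `expand`, `run_query`) are assumptions made for illustration and may not match the actual schema. The query runner is passed in as a function so the sketch runs without a database.

```python
# One Cypher query per relationship type described above.
# Assumed schema: (CodeFile)-[:CONTAINS]->(Function),
# (Function)-[:CALLS]->(Function), (CodeFile)-[:REQUIRES]->(Module),
# with a hypothetical `id` property on Function nodes.
TRAVERSALS = {
    "calls": (
        "MATCH (s:Function)-[:CALLS]->(f:Function) "
        "WHERE s.id IN $seed_ids "
        "RETURN DISTINCT f.id AS id"
    ),
    "same_file": (
        "MATCH (c:CodeFile)-[:CONTAINS]->(s:Function), "
        "(c)-[:CONTAINS]->(f:Function) "
        "WHERE s.id IN $seed_ids AND f.id <> s.id "
        "RETURN DISTINCT f.id AS id"
    ),
    "shared_requires": (
        "MATCH (c:CodeFile)-[:CONTAINS]->(s:Function), "
        "(c)-[:REQUIRES]->(:Module)<-[:REQUIRES]-(o:CodeFile)"
        "-[:CONTAINS]->(f:Function) "
        "WHERE s.id IN $seed_ids AND o <> c "
        "RETURN DISTINCT f.id AS id"
    ),
}


def expand(run_query, seed_ids):
    """Return related function IDs that are not already seeds.

    run_query(cypher, params) must return an iterable of dict-like rows
    with an "id" key; it is injected so any driver (or a fake) can be used.
    """
    related = []
    for cypher in TRAVERSALS.values():
        for row in run_query(cypher, {"seed_ids": seed_ids}):
            fid = row["id"]
            if fid not in seed_ids and fid not in related:
                related.append(fid)
    return related
```

With the official `neo4j` Python driver, `run_query` could be a thin wrapper that calls `session.run(cypher, params)` and converts each record with `record.data()`. Keeping one query per relationship type makes it easy to see which kind of structural link contributed each result.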
1. Prerequisites:
- OpenAI API Key
- Python 3.10 or later.
- uv (recommended) or pip for Python package management.
- Docker and Docker Compose installed and running.
2. Setup Repository:
- Clone Repository
git clone https://github.com/dead8309/graph-rag-indexer
cd graph-rag-indexer
- Update Git Submodules:
git submodule update --init --recursive
3. Install Dependencies:
uv sync
4. Configure Environment:
- Copy the example environment file:
cp .env.example .env
- Provide your OpenAI API Key.
- Verify that MILVUS_HOST and MILVUS_PORT match your setup.
- Verify that NEO4J_URI and NEO4J_USER match your setup.
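A quick way to sanity-check these settings before running anything is a small loader like the one below. It is only a sketch: the Milvus and Neo4j variable names come from the steps above, but `OPENAI_API_KEY`, every default value, and the `load_settings` helper are assumptions, so compare them with .env.example. It also assumes the values are already present in the process environment (for example exported in the shell or loaded by a dotenv tool).

```python
import os

# Defaults are assumptions for a local Docker setup, not values taken
# from the repository; adjust them to match .env.example.
DEFAULTS = {
    "MILVUS_HOST": "localhost",
    "MILVUS_PORT": "19530",
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USER": "neo4j",
}
# Assumed name of the API key variable.
REQUIRED = ["OPENAI_API_KEY"]


def load_settings(env=None):
    """Read connection settings, failing early if a required one is missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Ports are stored as text in the environment; convert once here.
    settings["MILVUS_PORT"] = int(settings["MILVUS_PORT"])
    return settings
```

The API key is checked but deliberately not copied into the returned dictionary, so the settings can be printed for debugging without leaking the secret.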
5. Start Databases:
docker compose up -d
6. Run the Pipeline:
uv run python -m src.main
You can visualize the code knowledge graph using the Neo4j Browser:
- Open http://localhost:7474 in your web browser.
- Connect using the URI (bolt://localhost:7687), username (neo4j), and the password you set in .env/docker-compose.yml.
- Run Cypher queries to explore the graph, for example: