Comparing two approaches for retrieving relevant code snippets from a codebase.
To keep it simple to test against OSS repositories, I've added `samples-typescript` as a git submodule.
- Traditional RAG: Uses semantic vector search (via OpenAI embeddings stored in Milvus) to find code snippets.
- GraphRAG: Extends traditional RAG by leveraging a code knowledge graph. It first performs a vector search to get initial candidate snippets, then traverses a structural code graph (built in Neo4j) to find related snippets (e.g., functions called by candidates, functions in the same file, functions related via shared dependencies).
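To make the vector-search step concrete, here is a minimal sketch of ranking snippets by similarity to a query vector. It is not this project's code: the three-dimensional vectors and snippet IDs are made-up stand-ins for real OpenAI embeddings, the in-memory dictionary stands in for Milvus, and cosine similarity is an assumed metric (this README does not say which metric the Milvus collection uses).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def vector_search(query_vec, index, top_k=2):
    """Return the top_k (snippet_id, score) pairs, best match first."""
    scored = [(sid, cosine_similarity(query_vec, vec)) for sid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy index: snippet ID -> embedding (real embeddings are much larger).
toy_index = {
    "auth.js::login": [0.9, 0.1, 0.0],
    "auth.js::hashPassword": [0.7, 0.3, 0.1],
    "cart.js::addItem": [0.0, 0.2, 0.9],
}

results = vector_search([1.0, 0.0, 0.0], toy_index, top_k=2)
for snippet_id, score in results:
    print(f"{snippet_id}: {score:.3f}")
```

Traditional RAG stops at this ranked list; GraphRAG uses the returned IDs as starting points for a graph traversal.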
- Python 3.10+
- uv
- OpenAI
- LangChain
- tree-sitter: Parser generator tool and Python bindings (`tree-sitter`, `tree-sitter-javascript`)
- Docker & Docker Compose
- Milvus: Open-source vector database
- Neo4j: Graph database
- Parsing: The `JavaScriptParser` reads `.js` files from the target directory, uses `tree-sitter` to build Abstract Syntax Trees (ASTs), and extracts structured information.
- Store Embeddings: Takes the extracted function code snippets, embeds them using OpenAI, and stores the embeddings along with snippet IDs in Milvus.
- Build Knowledge Graph: Takes the structured `CodeFile` data and populates a Neo4j database, creating nodes (`CodeFile`, `Function`, `Module`, `Variable`) and relationships (`CONTAINS`, `CALLS`, `REQUIRES`, `DECLARES_VARIABLE`).
- Query & Compare:
  - Traditional RAG: Performs a vector similarity search for the query, returning a ranked list of function snippets (based on semantic meaning).
  - GraphRAG:
    - Performs the same vector search as above to get initial candidate function IDs.
    - Performs a graph traversal starting from these IDs, following relationships (`CALLS`, `CONTAINS`, shared `REQUIRES`) to find structurally related function snippets.
    - The results from vector search and graph traversal are combined and deduplicated.
- Output: An `out.json` file containing the detailed parsed structure is also generated for inspection.
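The GraphRAG query step (expand vector-search candidates through the graph, then combine and deduplicate) can be sketched without a database. This is an illustration under assumptions, not the project's implementation: the `CALLS` and `CONTAINS` dictionaries are hypothetical in-memory stand-ins for Neo4j relationships, the function and file names are invented, the `max_hops` parameter is my own addition, and the shared-`REQUIRES` expansion is left out for brevity.

```python
from collections import deque

# Hypothetical stand-ins for graph relationships (in the project these live in Neo4j).
CALLS = {"login": ["hashPassword", "findUser"], "findUser": ["queryDb"]}
CONTAINS = {"auth.js": ["login", "hashPassword"], "db.js": ["findUser", "queryDb"]}


def same_file_functions(fn):
    """Functions contained in the same file as fn, excluding fn itself."""
    for functions in CONTAINS.values():
        if fn in functions:
            return [f for f in functions if f != fn]
    return []


def expand_via_graph(candidates, max_hops=1):
    """Breadth-first traversal from the candidates, up to max_hops away."""
    seen = set(candidates)
    related = []
    queue = deque((c, 0) for c in candidates)
    while queue:
        fn, depth = queue.popleft()
        if depth >= max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in CALLS.get(fn, []) + same_file_functions(fn):
            if neighbour not in seen:
                seen.add(neighbour)
                related.append(neighbour)
                queue.append((neighbour, depth + 1))
    return related


def graph_rag(candidates, max_hops=1):
    """Vector-search candidates first, then graph neighbours, without duplicates."""
    combined = list(candidates) + expand_via_graph(candidates, max_hops)
    return list(dict.fromkeys(combined))  # order-preserving deduplication


result = graph_rag(["login"])
print(result)
```

Keeping the vector-search candidates at the front of the combined list preserves their semantic ranking, while the graph neighbours add structural context the embedding alone may miss.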
1. Prerequisites:
- OpenAI API Key
- Python 3.10 or later.
- `uv` (recommended) or `pip` for Python package management.
- Docker and Docker Compose installed and running.
2. Setup Repository:
- Clone Repository
git clone https://github.com/dead8309/graph-rag-indexer
cd graph-rag-indexer
- Update Git Submodules:
git submodule update --init --recursive
3. Install Dependencies:
uv sync
4. Configure Environment:
- Copy the example environment file:
cp .env.example .env
- Provide your OpenAI API key in `.env`.
- Verify that `MILVUS_HOST` and `MILVUS_PORT` match your setup.
- Verify that `NEO4J_URI` and `NEO4J_USER` match your setup.
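Before starting the databases, a quick check that `.env` is filled in can save a failed run. The snippet below is a rough sketch rather than part of the project: `OPENAI_API_KEY` is an assumed variable name (check `.env.example` for the real one), the parser handles only simple `KEY=VALUE` lines, and the sample values are illustrative.

```python
# Variable names from this README, plus OPENAI_API_KEY, which is an assumption.
REQUIRED_VARS = ["OPENAI_API_KEY", "MILVUS_HOST", "MILVUS_PORT", "NEO4J_URI", "NEO4J_USER"]


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_settings(settings):
    """Names of required variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not settings.get(name, "").strip()]


# Illustrative file contents; in practice, read the text from your .env file.
sample_env = """
# example values only
OPENAI_API_KEY=sk-test
MILVUS_HOST=localhost
MILVUS_PORT=19530
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=
"""

missing = missing_settings(parse_env_text(sample_env))
print("Missing settings:", missing)
```

To check a real file, pass `open(".env").read()` to `parse_env_text` instead of `sample_env`.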
5. Start Databases:
docker compose up -d

Run the main module:
uv run python -m src.main

You can visualize the code knowledge graph using the Neo4j Browser:
- Open `http://localhost:7474` in your web browser.
- Connect using the URI (`bolt://localhost:7687`), username (`neo4j`), and the password you set in `.env`/`docker-compose.yml`.
- Run Cypher queries to explore the graph, for example: