grag-indexer

Overview

This project compares two approaches for retrieving relevant code snippets from a codebase.

To make it easy to test against open-source repositories, I've added samples-typescript as a git submodule.

Traditional RAG: Uses semantic vector search (via OpenAI embeddings stored in Milvus) to find code snippets.
GraphRAG: Enhances traditional RAG by leveraging a code knowledge graph. It first performs a vector search to get initial candidate snippets, then traverses a structural code graph (built in Neo4j) to find related snippets (e.g., functions called by candidates, functions in the same file, functions related via shared dependencies).
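
The difference between the two approaches can be sketched with toy in-memory data. This is only an illustration: the dictionaries below stand in for Milvus and Neo4j, and every ID, vector, and edge is made up rather than taken from this project.

```python
import math

# Toy stand-ins for Milvus (embeddings) and Neo4j (CALLS edges).
# All function IDs, vectors, and edges are hypothetical.
EMBEDDINGS = {
    "parseConfig": [1.0, 0.0],
    "loadFile": [0.9, 0.1],
    "sendEmail": [0.0, 1.0],
    "readDisk": [0.5, 0.5],
}
CALLS = {
    "parseConfig": ["readDisk"],
    "loadFile": ["readDisk"],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, top_k):
    """Traditional RAG: rank function IDs by similarity to the query."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda fid: cosine(query_vec, EMBEDDINGS[fid]),
        reverse=True,
    )
    return ranked[:top_k]


def graph_rag(query_vec, top_k):
    """GraphRAG: vector search first, then add functions the candidates call."""
    candidates = vector_search(query_vec, top_k)
    results = list(candidates)
    for fid in candidates:
        for callee in CALLS.get(fid, []):
            if callee not in results:  # deduplicate while keeping order
                results.append(callee)
    return results


print(vector_search([1.0, 0.0], top_k=2))
print(graph_rag([1.0, 0.0], top_k=2))
```

In this toy setup the graph step surfaces `readDisk`, which the vector search alone ranks too low to return.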

Knowledge Graphs

Stack

Python 3.10+
uv
OpenAI
LangChain
tree-sitter: Parser generator tool and Python bindings (tree-sitter, tree-sitter-javascript)
Docker & Docker Compose
Milvus: Open-source vector database
Neo4j: Graph database

Workflow

Parsing: The JavaScriptParser reads .js files from the target directory, uses tree-sitter to build Abstract Syntax Trees (ASTs), and extracts structured information.
Store Embeddings: Takes the extracted function code snippets, embeds them using OpenAI, and stores the embeddings along with snippet IDs in Milvus.
Build Knowledge Graph: Takes the structured CodeFile data and populates a Neo4j database, creating nodes (CodeFile, Function, Module, Variable) and relationships (CONTAINS, CALLS, REQUIRES, DECLARES_VARIABLE).
Query & Compare:
- Traditional RAG: Performs a vector similarity search for the query, returning a ranked list of function snippets (based on semantic meaning).
- GraphRAG:
  - Performs the same vector search as above to get initial candidate function IDs.
  - Performs a graph traversal starting from these IDs, following relationships (CALLS, CONTAINS, shared REQUIRES) to find structurally related function snippets.
  - The results from vector search and graph traversal are combined and deduplicated.
Output: An out.json file containing the detailed parsed structure is also generated for inspection.
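
To make the parse-then-build-graph steps more concrete, here is a minimal sketch of how parsed data might map to graph nodes and relationships. The node labels and relationship types come from the list above; the field names, the `path::name` ID scheme, the `to_graph` helper, and the assumption that callees live in the same file are all hypothetical and not taken from this project's source.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Function:
    name: str
    code: str
    calls: list[str] = field(default_factory=list)  # names of called functions


@dataclass
class CodeFile:
    path: str
    functions: list[Function] = field(default_factory=list)
    requires: list[str] = field(default_factory=list)  # required module names


def to_graph(code_file):
    """Return (nodes, relationships) as plain tuples for one parsed file."""
    nodes = [("CodeFile", code_file.path)]
    rels = []
    for module in code_file.requires:
        nodes.append(("Module", module))
        rels.append((code_file.path, "REQUIRES", module))
    for fn in code_file.functions:
        fn_id = f"{code_file.path}::{fn.name}"  # hypothetical ID scheme
        nodes.append(("Function", fn_id))
        rels.append((code_file.path, "CONTAINS", fn_id))
        for callee in fn.calls:
            # Simplification: assume the callee is defined in the same file.
            rels.append((fn_id, "CALLS", f"{code_file.path}::{callee}"))
    return nodes, rels


sample = CodeFile(
    path="app.js",
    requires=["fs"],
    functions=[
        Function(name="main", code="function main() { load(); }", calls=["load"]),
        Function(name="load", code="function load() {}"),
    ],
)
nodes, rels = to_graph(sample)
parsed_json = json.dumps(asdict(sample), indent=2)  # similar in spirit to out.json
print(rels)
```

Keeping the parsed structure as plain dataclasses means the same objects can feed the embedding step, the graph step, and the JSON dump.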

Setup

1. Prerequisites:

OpenAI API Key
Python 3.10 or later.
uv (recommended) or pip for Python package management.
Docker and Docker Compose installed and running.

2. Setup Repository:

Clone Repository

git clone https://github.com/dead8309/graph-rag-indexer
cd graph-rag-indexer

Update Git Submodules:

git submodule update --init --recursive

3. Install Dependencies:

uv sync

4. Configure Environment:

Copy the example environment file:
```
cp .env.example .env
```
Provide your OpenAI API key.
Verify that MILVUS_HOST and MILVUS_PORT match your setup.
Verify that NEO4J_URI and NEO4J_USER match your setup.
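
As a quick sanity check, the connection settings can be read and validated in a few lines. This is a sketch, not the project's actual configuration code: the `Settings` class and `load_settings` helper are hypothetical, the fallback values are assumptions (19530 is Milvus's default port, and the Neo4j values mirror the ones used later in this README), and it assumes the variables are already present in the process environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    milvus_host: str
    milvus_port: int
    neo4j_uri: str
    neo4j_user: str


def load_settings(env=None):
    """Read connection settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        milvus_host=env.get("MILVUS_HOST", "localhost"),  # assumed default
        milvus_port=int(env.get("MILVUS_PORT", "19530")),  # Milvus default port
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
    )


print(load_settings())
```

Converting `MILVUS_PORT` with `int()` makes a malformed value fail early instead of at connection time.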

5. Start Databases:

docker compose up -d

Run

uv run python -m src.main

Exploring the Graph

You can visualize the code knowledge graph using the Neo4j Browser:

Open http://localhost:7474 in your web browser.
Connect using the URI (bolt://localhost:7687), username (neo4j), and the password you set in .env/docker-compose.yml.
Run Cypher queries to explore the graph.
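
A few exploratory queries might look like the following. They are kept as Python strings so they can be pasted into the Neo4j Browser or passed to a driver session. The labels and relationship types are the ones listed under Workflow; the relationship directions (CodeFile to Function, CodeFile to Module) are assumptions, so adjust them if the actual graph differs.

```python
# Exploratory Cypher queries. Relationship directions are assumed, not
# confirmed against the project's graph schema.
QUERIES = {
    "sample_of_graph": "MATCH (n)-[r]->(m) RETURN n, r, m LIMIT 50",
    "functions_per_file": (
        "MATCH (f:CodeFile)-[:CONTAINS]->(fn:Function) "
        "RETURN f, count(fn) AS functions ORDER BY functions DESC LIMIT 10"
    ),
    "call_edges": (
        "MATCH (a:Function)-[:CALLS]->(b:Function) RETURN a, b LIMIT 25"
    ),
    "files_sharing_a_module": (
        "MATCH (a:CodeFile)-[:REQUIRES]->(m:Module)<-[:REQUIRES]-(b:CodeFile) "
        "RETURN a, m, b LIMIT 25"
    ),
}

# Print each query so it can be copied into the Neo4j Browser.
for name, query in QUERIES.items():
    print(f"-- {name}\n{query}\n")
```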

License

MIT
