RepoPilot

AI-powered codebase Q&A using RAG

Live Demo: repo-pilot-black.vercel.app

RepoPilot lets you index any public GitHub repository and ask natural language questions about its code. It uses retrieval-augmented generation (RAG) to find the most relevant code chunks via semantic search, then streams context-grounded answers powered by Claude. Every answer includes source references with expandable code snippets so you can verify the context yourself.

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | React 19 + TypeScript + Tailwind CSS |
| Backend | Python 3.11 / FastAPI |
| Vector DB | ChromaDB (local, file-persisted) |
| Embeddings | OpenAI `text-embedding-3-small` |
| LLM | Anthropic Claude (claude-sonnet-4-20250514) |
| Rate Limiting | SlowAPI (IP-based) |
| Testing | Pytest + GitHub Actions CI |
| Containerization | Docker |
| Deployment | Vercel (frontend) + Render (backend) |

Architecture

┌─────────────────┐     ┌──────────────────────────┐     ┌─────────────┐
│  React Frontend  │────▶│  Python/FastAPI Backend   │────▶│  ChromaDB   │
│  (TypeScript)    │◀────│                          │◀────│ (Vector DB) │
│  Vercel          │     │  /api/upload             │     └─────────────┘
└─────────────────┘     │  /api/query              │
                        │  /api/repos              │     ┌─────────────┐
                        │                          │────▶│ Claude API  │
                        │  Render (Docker)         │◀────│ (Anthropic) │
                        └──────────────────────────┘     └─────────────┘
                                    │
                                    ▼
                        ┌──────────────────────┐
                        │  OpenAI Embeddings   │
                        │  text-embedding-3-sm │
                        └──────────────────────┘

Features

Repository indexing — Paste a GitHub URL to clone, chunk, embed, and store any public repo
Semantic code search — Finds relevant code using vector similarity, not keyword matching
Streaming responses — Answers stream token-by-token with real-time markdown rendering and syntax highlighting
Source transparency — Every answer shows which files and line ranges were used, with expandable code cards
Rate limiting — IP-based limits on expensive endpoints (5 uploads/hr, 30 queries/hr)
Responsive UI — Dark theme with collapsible sidebar for mobile
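
To make the "vector similarity, not keyword matching" idea concrete, here is a minimal sketch of top-k retrieval over embeddings. It assumes cosine similarity as the metric and uses toy 3-dimensional vectors; the metric ChromaDB actually applies depends on how the collection is configured, and the names `cosine_similarity`, `top_k`, and `index` are hypothetical, not taken from the RepoPilot code.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical values); real ones have many more dimensions.
index = [
    ("src/auth.py:1-45", [0.9, 0.1, 0.0]),
    ("src/db.py:10-80", [0.1, 0.9, 0.1]),
    ("README.md:1-20", [0.0, 0.2, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], index, k=2)
```

A query vector pointing in roughly the same direction as a chunk's vector ranks that chunk first, even if the question and the code share no literal keywords.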

How It Works

1. Upload: You provide a GitHub URL. The backend clones the repo and walks its file tree, skipping non-code files (images, lock files, node_modules, etc.).
2. Chunk: Code files are split into semantically meaningful chunks using hybrid file-aware chunking: small files stay whole, large files split at function/class boundaries with ~10-line overlap between chunks.
3. Embed: Each chunk is embedded using OpenAI's text-embedding-3-small model and stored in ChromaDB alongside its metadata (filename, line range, language).
4. Query: When you ask a question, the question is embedded with the same model, the top-k most similar chunks are retrieved, and Claude generates a streamed answer grounded in those code snippets.
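
The chunking step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual chunker: it treats only top-level Python `def`/`class` lines as boundaries, and the 60-line threshold, the helper names `make_chunk` and `chunk_source`, and the dictionary layout are all hypothetical. Only the 10-line overlap comes from the description above.

```python
import re
from typing import Dict, List

# Assumption: only top-level Python definitions count as split boundaries.
BOUNDARY = re.compile(r"^(async def |def |class )")


def make_chunk(lines: List[str], lo: int, hi: int) -> Dict[str, object]:
    """Package lines[lo:hi] with 1-based, inclusive line numbers."""
    return {"start_line": lo + 1, "end_line": hi, "content": "\n".join(lines[lo:hi])}


def chunk_source(text: str, max_lines: int = 60, overlap: int = 10) -> List[Dict[str, object]]:
    lines = text.splitlines()
    if not lines:
        return []
    if len(lines) <= max_lines:
        return [make_chunk(lines, 0, len(lines))]  # small file stays whole

    # 0-based indices of lines that start a new function or class.
    cuts = [i for i, line in enumerate(lines) if i > 0 and BOUNDARY.match(line)]
    chunks = []
    start = 0
    while start < len(lines):
        limit = start + max_lines
        if limit >= len(lines):
            end = len(lines)
        else:
            # Prefer the last boundary within the size limit; otherwise hard-cut.
            candidates = [c for c in cuts if start < c <= limit]
            end = candidates[-1] if candidates else limit
        # Prepend up to `overlap` lines from the previous chunk for context,
        # so a chunk may hold up to max_lines + overlap lines.
        chunks.append(make_chunk(lines, max(0, start - overlap), end))
        start = end
    return chunks


# Four 25-line functions, 100 lines in total.
sample = "\n".join(
    f"def f{n}():\n" + "\n".join(f"    x{i} = {i}" for i in range(24))
    for n in range(4)
)
chunks = chunk_source(sample)
spans = [(c["start_line"], c["end_line"]) for c in chunks]
```

Storing `start_line` and `end_line` with each chunk is what lets an answer cite a file and line range later on.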

API Endpoints

`POST /api/upload`

Index a GitHub repository. Rate limited to 5 requests/hour per IP.

// Request
{ "github_url": "https://github.com/user/repo" }

// Response
{
  "repo_id": "repo-a1b2c3d4",
  "repo_name": "repo",
  "files_processed": 42,
  "chunks_created": 156,
  "status": "indexed"
}
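
As a usage illustration, the request above could be built from Python with only the standard library. This is a sketch that assumes the backend is reachable at http://localhost:8000 (as in the Local Development section); `build_upload_request` and `index_repo` are hypothetical helper names, not part of RepoPilot.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # assumption: backend running locally


def build_upload_request(github_url: str, api_base: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) the POST /api/upload request."""
    body = json.dumps({"github_url": github_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_base}/api/upload",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_repo(github_url: str) -> dict:
    """Send the request and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(build_upload_request(github_url)) as resp:
        return json.load(resp)


request = build_upload_request("https://github.com/user/repo")
```

Calling `index_repo(...)` against a running backend should return a dictionary shaped like the response shown above. Indexing clones and embeds the whole repository, so expect the call to take a while and keep the 5 requests/hour limit in mind.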

`POST /api/query`

Ask a question about an indexed repo. Rate limited to 30 requests/hour per IP. Returns a Server-Sent Events stream.

// Request
{ "repo_id": "repo-a1b2c3d4", "question": "Where is authentication handled?" }

// Response (SSE stream)
data: {"type": "sources", "chunks": [{"filename": "src/auth.py", "start_line": 1, "end_line": 45, "content": "...", "score": 0.87}]}
data: {"type": "token", "content": "The"}
data: {"type": "token", "content": " authentication"}
...
data: {"type": "done"}
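
A client consuming this stream needs to separate the `sources` event from the `token` events and stop at `done`. The sketch below shows one way to do that, assuming every event arrives as a single `data:` line containing JSON, as in the sample above; `parse_sse` and `collect_answer` are hypothetical names.

```python
import json
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of every `data:` line, ignoring anything else."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, or other SSE fields
        yield json.loads(line[len("data:"):].strip())


def collect_answer(events: Iterable[dict]) -> Dict[str, object]:
    """Accumulate source chunks and answer tokens until the `done` event."""
    sources: List[dict] = []
    tokens: List[str] = []
    for event in events:
        if event["type"] == "sources":
            sources = event["chunks"]
        elif event["type"] == "token":
            tokens.append(event["content"])
        elif event["type"] == "done":
            break
    return {"sources": sources, "answer": "".join(tokens)}


# The sample stream from above, with the blank lines that separate SSE events.
stream = [
    'data: {"type": "sources", "chunks": [{"filename": "src/auth.py", '
    '"start_line": 1, "end_line": 45, "content": "...", "score": 0.87}]}',
    "",
    'data: {"type": "token", "content": "The"}',
    "",
    'data: {"type": "token", "content": " authentication"}',
    "",
    'data: {"type": "done"}',
]
result = collect_answer(parse_sse(stream))
```

Tokens are concatenated without adding separators because each token already carries its own leading whitespace.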

`GET /api/repos`

List all indexed repositories.

// Response
{
  "repos": [
    {
      "repo_id": "repo-a1b2c3d4",
      "name": "repo",
      "files": 42,
      "chunks": 156,
      "indexed_at": "2025-01-15T..."
    }
  ]
}

`DELETE /api/repos/{repo_id}`

Delete an indexed repository and its vector data.

`GET /api/health`

Health check. Returns {"status": "ok"}.
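
The per-IP limits quoted for `/api/upload` (5 per hour) and `/api/query` (30 per hour) are enforced by SlowAPI in this project. To illustrate the underlying idea only, here is a standard-library sliding-window limiter. It is a stand-in written for this explanation, not SlowAPI's implementation or API, and `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Simulate the upload limit (5 per hour) with a fake clock.
fake_now = [0.0]
upload_limiter = SlidingWindowLimiter(5, 3600, clock=lambda: fake_now[0])
first_six = [upload_limiter.allow("203.0.113.7") for _ in range(6)]
other_client = upload_limiter.allow("198.51.100.2")
fake_now[0] = 3600.0  # one hour later
after_window = upload_limiter.allow("203.0.113.7")
```

Each client IP gets its own counter, so one heavy user exhausting the limit does not block anyone else.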

Local Development

Backend

cd backend

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env
# Edit .env and add your API keys:
#   ANTHROPIC_API_KEY=sk-ant-...
#   OPENAI_API_KEY=sk-...
#   CHROMA_PERSIST_DIR=./chroma_data
#   ALLOWED_ORIGINS=http://localhost:5173

# Start the server
uvicorn app.main:app --reload --port 8000
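
For orientation, a backend could read the four variables from `.env` along these lines. This is a hypothetical sketch, not the project's configuration code: the `Settings` class, the `load_settings` function, the default values, and the comma-separated format for `ALLOWED_ORIGINS` are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    openai_api_key: str
    chroma_persist_dir: str
    allowed_origins: List[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, failing fast on missing API keys."""
    missing = [k for k in ("ANTHROPIC_API_KEY", "OPENAI_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    # Assumption: multiple origins are separated by commas.
    origins = env.get("ALLOWED_ORIGINS", "http://localhost:5173")
    return Settings(
        anthropic_api_key=env["ANTHROPIC_API_KEY"],
        openai_api_key=env["OPENAI_API_KEY"],
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./chroma_data"),
        allowed_origins=[o.strip() for o in origins.split(",") if o.strip()],
    )


settings = load_settings({
    "ANTHROPIC_API_KEY": "sk-ant-placeholder",
    "OPENAI_API_KEY": "sk-placeholder",
    "ALLOWED_ORIGINS": "http://localhost:5173, https://example.com",
})
```

Failing at startup when a key is missing tends to be easier to diagnose than a failed API call in the middle of an indexing run.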

Frontend

cd frontend

# Install dependencies
npm install

# Configure environment
echo "VITE_API_URL=http://localhost:8000" > .env

# Start dev server
npm run dev

The frontend runs at http://localhost:5173 and sends API calls to the backend at http://localhost:8000, the address set in VITE_API_URL.

Testing

Tests run with pytest and use mocked external services (no API keys needed):

cd backend
pytest -v

Tests also run automatically on pushes and pull requests to main via GitHub Actions. See .github/workflows/test.yml.

Docker

Build and run the backend container:

cd backend

# Build
docker build -t repopilot-backend .

# Run
docker run -p 8000:8000 \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e OPENAI_API_KEY=sk-... \
  -e CHROMA_PERSIST_DIR=/app/chroma_data \
  -e ALLOWED_ORIGINS=http://localhost:5173 \
  repopilot-backend

To persist the vector database across container restarts, mount a volume:

docker run -p 8000:8000 \
  -v repopilot-data:/app/chroma_data \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -e OPENAI_API_KEY=sk-... \
  -e CHROMA_PERSIST_DIR=/app/chroma_data \
  -e ALLOWED_ORIGINS=http://localhost:5173 \
  repopilot-backend
