This project provides a Retrieval-Augmented Generation (RAG) API for querying a codebase. It uses:
- OpenAI embeddings and ChatCompletion (GPT models),
- A FAISS index for vector similarity search,
- FastAPI for the RESTful API.
Prerequisites:

- Python 3.9+ (recommended)
- A virtual environment (e.g., `venv` or conda)
- FAISS (for vector indexing), installed via one of:
  - Conda (recommended): `conda install -c pytorch faiss-cpu`
  - Pip (if precompiled wheels are available for your platform): `pip install faiss-cpu`
  - Building from source (see the FAISS GitHub repository)
- The OpenAI Python library and its dependencies
- A valid OpenAI API key (for embeddings and GPT-based text generation)
- Clone this repository (or copy the code):

      git clone https://github.com/yourusername/rag-code-query.git
      cd rag-code-query

- Create a virtual environment and activate it:

      python -m venv venv
      source venv/bin/activate    # On Linux/Mac
      venv\Scripts\activate       # On Windows

- Install required packages:

      pip install -r requirements.txt

- Set your OpenAI API key:

      export OPENAI_API_KEY="your-openai-api-key"
Project structure:

    /
    ├── app.py                     # Main FastAPI application code
    ├── repository_processor.py
    ├── requirements.txt
    ├── README.md
    └── ...
- Start the Server

Run the FastAPI application (assuming your main file is `app.py`):

    uvicorn app:app --reload

This starts the server at http://127.0.0.1:8000.
- Index a Repository
Before querying a repository, you need to index it. The indexing process:
- Chunks the code using chunk_repository.
- Generates OpenAI embeddings for each chunk.
- Builds a FAISS index (repository_index.faiss) and a metadata file (metadata.json).
You can automatically create or refresh the index by calling the /index endpoint or letting the code auto-initialize if the index/metadata files are missing.
Example:

    curl -X POST "http://127.0.0.1:8000/index"
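The indexing steps above can be sketched end to end. This is a dependency-free illustration: the character-based splitter and its sizes are assumptions (the project's `chunk_repository` may split by tokens or syntax instead), and the comments note where the OpenAI embeddings API and FAISS would plug in:

```python
def chunk_text(text, max_chars=1000, overlap=100):
    """Split text into overlapping character-based chunks
    (a simplified stand-in for chunk_repository)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

def build_index(files):
    """Toy indexing pass: chunk each file and record metadata.
    The real pipeline would also embed each chunk via the OpenAI
    embeddings API, add the vectors to a faiss.IndexFlatL2, and
    persist them with faiss.write_index plus a metadata.json."""
    metadata = []
    for path, text in files.items():
        for i, chunk in enumerate(chunk_text(text)):
            metadata.append({"file": path, "chunk_id": i, "text": chunk})
    return metadata

meta = build_index({"app.py": "x" * 2500})
print(len(meta))  # 3 chunks
```

With a 1000-character window and 100-character overlap, the splitter advances 900 characters per step, so neighboring chunks share context at their boundaries.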
- Query the Repository
Once indexed, you can send queries to the /query endpoint. The endpoint:
- Retrieves the most relevant chunks from FAISS.
- Optionally summarizes if necessary.
- Uses GPT-4 to generate a final answer.
Example (replace placeholders accordingly):

    curl --location 'http://127.0.0.1:8000/query' \
    --header 'Content-Type: application/json' \
    --data '{
        "question": "What does class Grip do?"
    }'
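Conceptually, the `/query` flow reduces to nearest-neighbor search over chunk embeddings followed by a GPT call. Below is a minimal stand-in using cosine similarity; in the real app FAISS performs this search, and the example vectors and texts are made up for illustration:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, index, k=2):
    """Return the k chunk texts most similar to the query vector.
    The real app delegates this search to FAISS, then passes the
    retrieved chunks to the ChatCompletion API to draft an answer."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item["vec"]),
                    reverse=True)
    return [item["text"] for item in scored[:k]]

# Toy index: pretend these 2-d vectors are OpenAI embeddings
index = [
    {"text": "class Grip handles mounting", "vec": [1.0, 0.0]},
    {"text": "README notes", "vec": [0.0, 1.0]},
    {"text": "Grip.release() detaches", "vec": [0.9, 0.1]},
]
print(retrieve([1.0, 0.0], index))
```

The two Grip-related chunks score highest against a Grip-oriented query vector, which is exactly the behavior the `/query` endpoint relies on before prompting GPT-4.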
- Health Check

Verify the server is running:

    curl http://127.0.0.1:8000/health
- FAISS Installation
If `pip install faiss-cpu` does not work on your platform, try:

    conda install -c pytorch faiss-cpu

or build from source.
- OpenAI Rate Limits
If you encounter rate limits or 429 errors, you may need to throttle requests or upgrade your OpenAI plan.
- Large Repositories
- Increase the chunk size or the number of retrieved chunks (k) carefully.
- Summarize or do multi-step retrieval if you exceed GPT’s context window.
- Token Counting
If you’re hitting token limits, install and use `tiktoken` to check prompt lengths, and summarize large chunks before sending them to GPT.
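A small helper for that check might look as follows. The `tiktoken` calls are the library's documented API; the fallback heuristic of roughly 4 characters per token is a rough assumption for when `tiktoken` is unavailable:

```python
def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Count (or estimate) the number of tokens in a prompt."""
    try:
        import tiktoken  # pip install tiktoken
        enc = tiktoken.encoding_for_model(model)
        return len(enc.encode(text))
    except Exception:
        # tiktoken missing or model unrecognized:
        # fall back to ~4 characters per token for English text
        return max(1, len(text) // 4)

prompt = "What does class Grip do?"
if count_tokens(prompt) > 8000:
    print("Prompt too long; summarize chunks first")
```

Running this before every GPT call lets you decide whether to summarize or drop low-ranked chunks instead of failing on a context-window error.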