RAGent: A wise helper for your knowledge forests.

Local-first RAG assistant: query your own docs with LLMs (OpenAI/Ollama). Fast, private, and context-grounded answers.
RAGent lets you store your own (or your customers') documents and use them as a smart helper for complex, context-specific questions.
Never struggle to remember all the details again—just upload your content, select the context, and ask! RAGent will answer strictly from your chosen documents, acting as an external brain for your projects, clients, or personal knowledge.
Privacy: You can use commercial (OpenAI) or local (Ollama) language models. For sensitive or private data, use a local model so nothing is ever sent to the cloud.
- Strict context answering: The agent only answers from the ingested documents—no hallucinations.
- Concise answers: By default, the agent responds in one sentence.
- Multi-client support: Choose a context folder at startup for per-client/document QA.
- Local & remote LLMs: Use OpenAI (gpt-4o, gpt-4, etc.) or Ollama (local models).
- Local embeddings: Uses HuggingFace sentence-transformers or Ollama for embeddings; avoids OpenAI quota issues.
- Flexible backend and embedding selection: Instantly switch between Ollama (local) and OpenAI (cloud) LLMs and embeddings using CLI flags or the `.env` file.
- Per-client/project isolation: Keep each customer's or project's documents in separate folders for clean, context-specific retrieval.
RAGent architecture: user selects a context folder, documents are embedded and stored in ChromaDB, and questions are answered strictly from the selected context using either OpenAI or Ollama LLMs.
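The whole flow fits in a few lines. The sketch below is illustrative rather than RAGent's actual source: it assumes ChromaDB's default embedding function, and `ask_llm` is a hypothetical helper standing in for the OpenAI/Ollama call.

```python
import chromadb

# Persistent store on disk (see CHROMA_DB_DIR in .env).
client = chromadb.PersistentClient(path="chroma_db")
# One collection per context folder keeps clients/projects isolated.
collection = client.get_or_create_collection("clientA")

# Ingest: store document chunks; ChromaDB embeds them with its default model.
collection.add(
    ids=["onboarding-0"],
    documents=["The onboarding code for clientA is XJ-42B."],
)

# Retrieve: embed the question and fetch the nearest chunks.
hits = collection.query(query_texts=["What is the onboarding code?"], n_results=3)
context = "\n".join(hits["documents"][0])

# Generate: answer strictly from the retrieved context.
prompt = f"Answer ONLY from this context:\n{context}\n\nQuestion: What is the onboarding code?"
# answer = ask_llm(prompt)  # hypothetical: routes to OpenAI or Ollama per LLM_BACKEND
```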
To use local models, you need Ollama installed and running:
- Install Ollama: See the Ollama downloads page for your OS.
- Start Ollama: Run `ollama serve` or start the Ollama app.
- Pull a model: For example, `ollama pull llama2` (or your chosen model, e.g. `deepseek-r1:8b`).
- Configure `.env`: Set `LLM_BACKEND=ollama`, `OLLAMA_MODEL=llama2` (or your chosen model), and `EMBEDDING_BACKEND=hf` (or `ollama`/`openai`).
If you do not want to use Ollama:
- Set `LLM_BACKEND=openai` and provide your OpenAI API key and model in `.env`.
Troubleshooting:
- If you see errors like "Could not connect to Ollama" or "model not found," make sure Ollama is running and the model is pulled.
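A quick standalone check from Python (not part of RAGent) that covers both failure modes, using Ollama's standard `/api/tags` endpoint, which lists pulled models:

```python
import json
import urllib.request

try:
    # /api/tags lists every model pulled into the local Ollama instance.
    with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
        models = [m["name"] for m in json.load(resp)["models"]]
    print("Ollama is up. Pulled models:", models or "none - run `ollama pull llama2`")
except OSError as err:
    print(f"Could not reach Ollama ({err}) - is `ollama serve` running?")
```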
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure environment: Copy `.env.template` to `.env` and set your API keys and model names.
- Example for OpenAI:

  ```
  OPENAI_API_KEY=sk-...
  LLM_BACKEND=openai        # or ollama
  OPENAI_MODEL=gpt-4o       # or gpt-4, etc.
  EMBEDDING_BACKEND=hf      # or ollama/openai
  ```

- Example for Ollama:

  ```
  LLM_BACKEND=ollama
  OLLAMA_BASE_URL=http://localhost:11434
  OLLAMA_MODEL=llama2
  EMBEDDING_BACKEND=hf      # or ollama/openai
  DATA_ROOT=data
  CHROMA_DB_DIR=chroma_db
  ```

- Note: If you use `EMBEDDING_BACKEND=hf`, make sure `sentence-transformers` is installed (already included in requirements.txt).
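With `EMBEDDING_BACKEND=hf`, embeddings are computed locally by `sentence-transformers`; nothing is sent to an API. A minimal illustration (the model name is a common default, not necessarily the one RAGent ships with):

```python
from sentence_transformers import SentenceTransformer

# Small, fast local embedding model; downloaded once, then cached.
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(["The onboarding code for clientA is XJ-42B."])
print(vectors.shape)  # (1, 384): one 384-dimensional embedding per input text
```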
The `data` folder (with at least one client/project subfolder) must exist before starting the app!
To create the folders from your project root:
```bash
mkdir -p data/clientA
```

- Place `.txt` files inside folders under `data/` (e.g., `data/clientA/`, `data/clientB/`, `data/facts/`).
- Each folder is a separate knowledge base (brain) for a client, project, or topic.
- Example folder structure:

  ```
  data/
    clientA/
      onboarding.txt
      api_endpoints.txt
    clientB/
      requirements.txt
      meeting_notes.txt
    facts/
      team.txt
      mission.txt
  ```
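At startup RAGent asks which folder to use; conceptually, discovering the available contexts is just a directory listing (a sketch assuming the default `DATA_ROOT=data`):

```python
from pathlib import Path

# Every subdirectory of DATA_ROOT is a selectable knowledge base ("brain").
contexts = sorted(p.name for p in Path("data").iterdir() if p.is_dir())
print(contexts)  # e.g. ['clientA', 'clientB', 'facts']
```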
Run the agent:

```bash
python -m src.main
```

- Select the context folder (e.g., `clientA`) at the prompt.
- Ask your question. RAGent will answer using only the selected folder's content.
To override the backend and model from the command line:

```bash
python -m src.main --backend ollama --model llama2
python -m src.main --backend openai --model gpt-4o
```
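A plausible wiring for these flags, where CLI values fall back to the `.env` settings, might look like the sketch below (illustrative only; it assumes `python-dotenv`, and RAGent's actual argument parsing may differ):

```python
import argparse
import os

from dotenv import load_dotenv

load_dotenv()  # read LLM_BACKEND / OPENAI_MODEL / OLLAMA_MODEL from .env

parser = argparse.ArgumentParser()
# Flags override .env; omitting them keeps the configured defaults.
parser.add_argument("--backend", choices=["openai", "ollama"],
                    default=os.getenv("LLM_BACKEND", "openai"))
parser.add_argument("--model",
                    default=os.getenv("OPENAI_MODEL") or os.getenv("OLLAMA_MODEL"))
args = parser.parse_args()
print(f"Using backend={args.backend} model={args.model}")
```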
RAGent is designed for reliability and simplicity, but can be extended with advanced techniques from the latest RAG research, such as:
- Multi-hop Reasoning / Chain-of-Thought: Answer complex questions that require combining information from multiple documents or steps.
- Query Rewriting / Self-Refinement: Automatically rephrase or expand your question to improve retrieval.
- Retrieval Fusion / Re-ranking: Combine results from multiple retrieval methods (keyword, vector, etc.) and re-rank them for relevance (see the sketch after this list).
- Tool-Augmented or Agentic RAG: Let the agent use plugins, calculators, or external APIs as part of its answer process.
- Answer Verification / Self-Consistency: Double-check answers or generate multiple candidates to select the most reliable response.
- Long-Context Handling / Summarization: Summarize or synthesize information across many documents for concise overviews.
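None of these ship with RAGent yet. As a taste of what retrieval fusion involves, here is a minimal reciprocal rank fusion (RRF) sketch; the function and the example IDs are illustrative, not RAGent APIs:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    rankings: e.g. [keyword_hits, vector_hits]; k=60 is the customary constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)  # best first

# Documents found by both methods rise to the top.
keyword_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc9", "doc3"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```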
For a comprehensive list of state-of-the-art RAG methods and resources, see:
Usage:
- Ask questions about your documents interactively.
- The agent retrieves relevant context and answers, grounded ONLY in your knowledge base.
- Answers are concise—by default, the agent responds in one sentence.
- Example (project-specific context):

  ```
  You: What is the onboarding code for clientA?
  [Retrieved context]: [1] The onboarding code for clientA is XJ-42B.
  Agent: The onboarding code for clientA is XJ-42B.
  ```
- Change `LLM_BACKEND` and the model name (`OPENAI_MODEL` or `OLLAMA_MODEL`) in `.env`, or use the CLI flags (`--backend`, `--model`).
- For Ollama, the server must be running and the model pulled (e.g. `ollama pull llama2`).
- Add more `.txt` files to your data folder and rerun the agent.
- Ollama errors: Make sure you have Ollama installed, running, and the model pulled (e.g. `ollama pull llama2`).
- OpenAI errors: Ensure your API key is set in `.env` and you have access to the selected model.
- Embeddings: If using `EMBEDDING_BACKEND=hf`, confirm that `sentence-transformers` is installed (it is included in requirements.txt).
- General: If you see unexpected errors, check your `.env` configuration and the README examples.
- PDF/HTML support
- Web UI
- Per-customer knowledge base folders
- Advanced chunking and metadata
`.env`, `data/`, and `chroma_db/` are gitignored by default.
License: MIT

