Intelligent AI Consultant that uses RAG (Retrieval-Augmented Generation) to answer questions from custom knowledge bases. It was originally designed around the files in the `data/` folder (the RAG source) and named PokeConsultor because a PokeAPI MCP server is planned, but it is domain-agnostic and can easily be adapted to any context.
- 🧠 Advanced Memory System: Integrated with LangChain's `SummarizationMiddleware` for intelligent context management and automatic summarization of long conversations.
- 🔐 PII Protection Middleware: Built-in `PIIMiddleware` stack (email, credit card, IP, API key, bearer token, database URL) with automatic redaction for inputs and tool results.
- 🔍 Hybrid Search: Combines semantic (vector) search with lexical search using Reciprocal Rank Fusion (RRF).
- ⚡ Incremental Embeddings: Intelligent system that detects new, modified, or deleted files, processing only what's necessary.
- 📚 Multi-format Support: Automatic loading of PDF, CSV, TXT, Markdown, and more via Factory Pattern.
- 🖥️ Dual Interfaces: Choose between a powerful interactive CLI or a modern graphical interface built with PySide6.
- 🎯 LLM Profiles: Granular model configuration for different roles (Executor, Supervisor, Default).
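The multi-format loading mentioned above can be sketched as a factory keyed by file extension. Note this is an illustrative sketch, not the project's actual loader classes, which live in `services/data_loaders/`:

```python
from pathlib import Path

def load_text(path: Path) -> str:
    """Trivial loader for plain-text formats."""
    return path.read_text(encoding="utf-8")

# Illustrative registry: extension -> loader function.
LOADERS = {
    ".txt": load_text,
    ".md": load_text,
    # ".pdf": load_pdf, ".csv": load_csv, ... (one loader per format)
}

def load_document(path: Path) -> str:
    """Factory entry point: pick a loader based on the file extension."""
    try:
        loader = LOADERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"Unsupported file type: {path.suffix}")
    return loader(path)
```

New formats are supported by registering one more entry in the mapping, so the caller never branches on file type itself.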
- Python 3.11 up to 3.13
- uv (highly recommended)
1. Clone the repository:

   ```shell
   git clone https://github.com/frbelotto/PokeConsultor.git
   cd PokeConsultor
   ```

2. Sync the environment:

   ```shell
   uv sync
   ```

3. Configure environment variables. Create a `.env` file based on `.env.example`:

   ```shell
   cp .env.example .env
   ```

   Then edit at least: `GROQ_API_KEY`, `DATA_PATH` (default: `data/`), and `POKEAPI_MCP_SERVER_URL` (use a URL string; if MCP is disabled, keep any valid placeholder URL).

4. Run the application:

   ```shell
   uv run main.py
   ```
The system is divided into decoupled modules for easy maintenance and expansion:
```mermaid
graph TD
    A[User] -->|Query| B[AIAgent]
    B -->|Check Memory| C(Summarization Middleware)
    B -->|Context Request| D[RAG Service]
    D -->|Query| F[Hybrid Executor]
    F -->|Vector Search| G[(ChromaDB)]
    F -->|Lexical Search| H[Lexical Index]
    F -->|Best Context| B
```
| Module | Responsibility |
|---|---|
| `agents/` | Conversation orchestration and LangChain/LangGraph integration. |
| `services/rag/` | Core retrieval engine, including hybrid search and RRF fusion. |
| `services/memory.py` | Checkpointing + middleware stack (PII redaction and optional summarization). |
| `services/data_loaders/` | Extensible system for processing various file types. |
| `ui/` | CLI and GUI (PySide6) implementations. |
The retrieval capability is exposed as a LangChain tool named `retrieve_context`.
This keeps retrieval decoupled from response generation and allows the LLM to call
the tool multiple times whenever needed.
High-level flow:
- User sends a question.
- The agent decides if retrieval is necessary.
- The agent calls `retrieve_context(query)`.
- The tool runs hybrid retrieval (lexical + vector + RRF).
- The tool returns structured output (`context`, `sources`, `retrieved_docs`).
- The model synthesizes the final answer using only retrieved context.
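The structured output in the last steps might look like the following minimal sketch (the field contents and stub body are illustrative; the real retrieval logic lives in `services/rag/`):

```python
from dataclasses import dataclass, field

@dataclass
class RetrievalResult:
    """Shape of the structured output returned by the retrieval tool."""
    context: str                                    # fused text handed to the LLM
    sources: list = field(default_factory=list)     # file paths / citations
    retrieved_docs: list = field(default_factory=list)  # raw ranked documents

def retrieve_context(query: str) -> RetrievalResult:
    # Stub: the real tool runs lexical + vector search, fuses the
    # rankings with RRF, and builds the context string from the top docs.
    docs = [{"text": f"stub passage about {query}", "source": "data/example.md"}]
    return RetrievalResult(
        context="\n\n".join(d["text"] for d in docs),
        sources=[d["source"] for d in docs],
        retrieved_docs=docs,
    )
```

Returning a typed result instead of a raw string lets the UI show sources while the model consumes only the `context` field.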
Run the CLI:

```shell
uv run main.py
```

Run the GUI:

```shell
uv run main.py --gui
```

Available CLI commands:

- `memory`: View the current memory state and summaries.
- `clear_memory`: Reset session history.
- `debug`: Enable detailed retrieval and token logs.
- `exit`: Close the application.
| Variable | Required | Notes |
|---|---|---|
| `GROQ_API_KEY` | Yes (for Groq models) | LLM provider key |
| `HF_TOKEN` | Optional | Enables authenticated Hugging Face downloads |
| `LLM_DEFAULT_*` | Yes | Default profile used by the agent |
| `LLM_PROFILE_EXECUTOR_*` | Yes | Executor profile |
| `LLM_PROFILE_SUPERVISOR_*` | Yes | Supervisor profile |
| `DATA_PATH` | Yes | Folder containing RAG source files |
| `CACHE_DIR` | Optional | Cache base path |
| `POKEAPI_MCP_ENABLED` | Optional | Enables/disables MCP usage |
| `POKEAPI_MCP_SERVER_URL` | Yes | Must be a URL string |
| `SUMMARIZATION_*` | Optional | Controls memory summarization |
Note: `AGENT_RECURSION_LIMIT` is no longer an environment variable. The recursion budget is computed internally by the agent based on the middleware stack size.
The system uses Reciprocal Rank Fusion (RRF) to combine results. You can adjust search sensitivity within the search services if needed.
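For reference, RRF scores each document by summing `1 / (k + rank)` over every ranked list it appears in (`k = 60` is a common default); a minimal sketch:

```python
from collections import defaultdict

def rrf_fuse(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) across all lists that
    contain it (rank is 1-based); results are sorted by total score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a lexical ranking with a vector ranking.
fused = rrf_fuse([["a", "b", "c"], ["c", "a", "b"]])  # ["a", "c", "b"]
```

Because RRF only uses ranks, it needs no score normalization between the lexical and vector retrievers, which is why it suits hybrid search well.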
RAGService automatically calculates token limits based on the configured model (e.g., Llama-3.1, Mixtral), ensuring the final prompt never exceeds the LLM's context window.
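The budget calculation amounts to subtracting fixed reserves from the model's context window. The numbers and names below are illustrative assumptions, not the project's actual values:

```python
# Illustrative per-model context windows (tokens); the real service
# derives these from the configured model name.
CONTEXT_WINDOWS = {
    "llama-3.1-8b-instant": 128_000,
    "mixtral-8x7b-32768": 32_768,
}

def max_context_tokens(model: str, reserved_for_answer: int = 1024,
                       prompt_overhead: int = 512) -> int:
    """Tokens left for retrieved context after reserving room for the
    system prompt and the model's answer."""
    window = CONTEXT_WINDOWS.get(model, 8_192)  # conservative fallback
    return max(0, window - reserved_for_answer - prompt_overhead)
```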
To avoid GraphRecursionError with multiple middlewares, the agent computes a safe budget internally:
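A minimal sketch of the idea (the constant and function name are illustrative, not the project's actual code):

```python
BASE_RECURSION_LIMIT = 25  # LangGraph's default recursion limit

def compute_recursion_limit(n_middlewares: int,
                            steps_per_middleware: int = 2) -> int:
    """Grow the recursion budget with the middleware stack size so that
    the extra graph hops added per turn never exhaust the limit."""
    return BASE_RECURSION_LIMIT + steps_per_middleware * n_middlewares
```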
This keeps the runtime stable without extra tuning in `.env`.
If you still hit `GraphRecursionError`:
- Ensure you are running the latest local code.
- Restart the process (CLI/GUI) after updates.
- Confirm your `.env` does not rely on the legacy `AGENT_RECURSION_LIMIT` behavior.
- Keep middleware stack changes synchronized with the codebase.
Set `HF_TOKEN` in `.env` to increase rate limits and improve download reliability.
Feedback and pull requests are very welcome! If you find a bug or have a feature idea, please open an issue.
Developed with ❤️ by Fábio Radicchi Belotto