This tool was originally created to explore AI-driven learning workflows.
Read the full story and motivations in my blog post: Building a Multi-Agent RAG System for Technical Learning.
The project has a modular structure with a Next.js frontend and a Python backend split across multiple packages using Poetry. LLM logic is decoupled from the API layer to potentially make it easier to swap out frameworks or languages. This decoupling is one of the core architectural goals—and challenges. The setup also serves as a sandbox for experimenting with systems design and working with a modern, service-oriented stack.
- LangGraph – Multi-agent state machine orchestrator for LLM workflows
- Server – Python 3.12, FastAPI, Pydantic, structured as multiple Poetry-managed packages
- Frontend – Next.js App Router with Server Actions and Server-Sent Events
- Databases – PostgreSQL for structured data, Chroma for embeddings, Firebase for blob storage
- Deployment – Docker Compose for local orchestration
As of now, the app is intended for local development. VPS deployment is on the roadmap.
To get everything running, make sure your .env files are correctly configured (see .env.example), then simply run:
docker compose -f docker-compose.dev.yml up
To run the frontend and backend manually (e.g., with hot reload during development), use the following:
Backend:
cd backend
poetry run fastapi dev server/main.py --port 8080
Frontend:
cd frontend
pnpm dev
Make sure the Docker services (Postgres, Chroma, etc.) are still running in the background via Compose.
OAuth2-based authentication follows FastAPI’s official guide.
Routes and security logic are defined in security.py.
- ChatController: Handles WebSocket connections and invokes ChatService.
- EmbeddingController: Provides REST endpoints to initiate EPUB parsing and embedding workflows.
- FlashcardsController: Manages flashcard operations and Anki export via AnkiService.
The statemachine package also powers real-time chat via WebSockets.
- ChatService: Wraps LangGraph calls with get_openai_callback() to capture usage metadata (token counts, timing). Outputs are packaged into ChatOutputStreamDTO.
- WebSocket Endpoint (/ws/{chat_id}): Streams each LLM response chunk together with metadata, allowing the Next.js frontend to render text progressively and display live telemetry.
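The chunk-plus-metadata payload described above could be sketched roughly as follows. This is a framework-free illustration only: the fields on ChatOutputStreamDTO (`chat_id`, `chunk`, `total_tokens`, `elapsed_ms`) and the helper `stream_chunks` are assumptions rather than the project's actual schema, and the whitespace-based token count is a stand-in for the usage data that get_openai_callback() would report.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator


@dataclass
class ChatOutputStreamDTO:
    # Hypothetical fields; the real DTO may look different.
    chat_id: str
    chunk: str
    total_tokens: int
    elapsed_ms: float


def stream_chunks(chat_id: str, chunks: Iterable[str]) -> Iterator[str]:
    """Yield one JSON message per LLM chunk, with running telemetry."""
    started = time.perf_counter()
    total_tokens = 0
    for chunk in chunks:
        # Stand-in for real usage metadata: count whitespace-separated words.
        total_tokens += len(chunk.split())
        dto = ChatOutputStreamDTO(
            chat_id=chat_id,
            chunk=chunk,
            total_tokens=total_tokens,
            elapsed_ms=(time.perf_counter() - started) * 1000,
        )
        yield json.dumps(asdict(dto))


messages = list(stream_chunks("demo", ["Hello ", "from the ", "RAG agent"]))
```

In a WebSocket handler, each yielded string would be sent to the client as one text frame, so the frontend can append `chunk` to the rendered answer and refresh the telemetry display from the same message.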
The parser has been tested with books from three major publishers in the tech publishing space: O'Reilly Media, Manning (Shelter Island), and Packt Publishing. It has also been tested, with some caveats, on No Starch Press titles.
A dedicated EpubProcessingService decouples parsing from API logic. Located in the tools Poetry package, the parser traverses the EPUB file, extracts the raw HTML, and converts it to plain text in preparation for vector embedding.
- Finds the EPUB’s TOC file
- Breaks content into sub-chapters
- Persists chapters in Postgres (Chapter model)
- Queues each chapter for embedding
Libraries used:
- beautifulsoup4 for HTML traversal
- Python’s built-in zipfile to extract EPUB contents
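The extract-then-flatten part of the parser can be illustrated with a small sketch. It is deliberately simplified and is not the project's EpubProcessingService: it skips the TOC lookup and sub-chapter splitting, treats every (X)HTML file in the archive as one chapter, and swaps beautifulsoup4 for the standard library's `html.parser` so that it runs without extra dependencies. The names `html_to_text` and `extract_chapters` are hypothetical.

```python
import io
import zipfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML document to space-separated plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def extract_chapters(epub_bytes: bytes) -> dict[str, str]:
    """Map each (X)HTML file inside the EPUB archive to its plain text."""
    chapters: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                html = archive.read(name).decode("utf-8", errors="replace")
                chapters[name] = html_to_text(html)
    return chapters
```

An EPUB is a ZIP archive, so `zipfile` alone is enough to reach the chapter markup; the resulting plain-text strings are what would then be persisted and queued for embedding.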
A dedicated EmbeddingService handles the transformation of parsed chapters into vectors. It:
- Retrieves sanitized plain text of each chapter from Postgres
- Generates vector embeddings in Chroma DB, organizing collections by learning area
- Applies metadata weighting experiments (e.g. boosting title tokens for more relevant retrieval)
Libraries used:
- chromadb client for vector storage
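One simple way to experiment with title boosting is to repeat the chapter title in the text that gets embedded. The sketch below shows that idea together with a per-learning-area collection name. It is an assumption about how such an experiment could look, not the EmbeddingService's actual logic; `ChapterRecord`, `collection_name`, `build_embedding_input`, and the `area-` naming scheme are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChapterRecord:
    # Hypothetical shape; not the project's Chapter model.
    chapter_id: int
    title: str
    text: str
    learning_area: str


def collection_name(learning_area: str) -> str:
    """Derive a per-area collection name (the naming scheme is an assumption)."""
    slug = "-".join(learning_area.lower().split())
    return f"area-{slug}"


def build_embedding_input(chapter: ChapterRecord, title_boost: int = 2) -> dict:
    """Repeat the title `title_boost` times ahead of the body text."""
    if title_boost < 0:
        raise ValueError("title_boost must be non-negative")
    boosted = " ".join([chapter.title] * title_boost + [chapter.text])
    return {
        "id": f"chapter-{chapter.chapter_id}",
        "document": boosted.strip(),
        "metadata": {
            "title": chapter.title,
            "learning_area": chapter.learning_area,
        },
    }
```

The returned `id`, `document`, and `metadata` values correspond to the `ids`, `documents`, and `metadatas` arguments that a Chroma collection's `add` call accepts, so different `title_boost` values can be compared by re-embedding the same chapters into separate collections.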
statemachine is a self-contained Poetry package that implements multi-agent workflows as a state machine using LangGraph. The main chat functionality and vector DB retrieval are orchestrated by the RagAgent, which leverages LangChain Expression Language (LCEL) for precise control over execution parameters, most notably the selection and weighting of retrieved documents.
- Main Agent: RagAgent
- Streams chat responses and handles vector DB retrieval with metadata-aware prompts.
- Graph as State Machine
- Each agent node represents a state; edges define transitions. The top-level orchestrator, the SupervisorAgent, receives the initial output from RagAgent and drives the workflow based on its SupervisorState.
- Separation of Concerns
- Decision logic—such as agent orchestration, gap analysis, and flashcard creation—remains isolated from external services (DB access, HTTP routes, SSE), ensuring the core graph is clean, testable, and maintainable.
- StateGraph: Builds the directed graph from a configuration of agents and transitions, used by SupervisorAgent.
- StateExecutor: Traverses the graph, invoking each agent’s run() method and evaluating transition conditions defined in supervisor_state.py.
- Agent Interfaces: Abstract base classes (BaseAgent, StreamableAgent) defining the contract for custom agents like KnowledgeIdentificationAgent and FlashcardAgent.
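The relationship between these three pieces can be illustrated with a plain-Python sketch. It borrows the names StateGraph, StateExecutor, and BaseAgent from the list above, but the method signatures, the dict-based state, and the toy `EchoAgent` are assumptions made for illustration; this is neither LangGraph's API nor the project's implementation.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional

State = dict  # shared, mutable workflow state (assumed shape)


class BaseAgent(ABC):
    """Contract every agent node fulfils (simplified)."""

    name: str

    @abstractmethod
    def run(self, state: State) -> State: ...


class EchoAgent(BaseAgent):
    """Toy agent that stores a fixed value under its own name."""

    def __init__(self, name: str, value: object) -> None:
        self.name = name
        self.value = value

    def run(self, state: State) -> State:
        state[self.name] = self.value
        return state


class StateGraph:
    """Directed graph: agents are nodes, guarded edges are transitions."""

    def __init__(self) -> None:
        self.agents: dict[str, BaseAgent] = {}
        self.edges: dict[str, list[tuple[Callable[[State], bool], str]]] = {}

    def add_agent(self, agent: BaseAgent) -> None:
        self.agents[agent.name] = agent
        self.edges.setdefault(agent.name, [])

    def add_edge(self, source: str, target: str,
                 condition: Callable[[State], bool] = lambda s: True) -> None:
        self.edges[source].append((condition, target))

    def next_node(self, current: str, state: State) -> Optional[str]:
        for condition, target in self.edges[current]:
            if condition(state):
                return target
        return None  # no matching edge: the workflow ends


class StateExecutor:
    """Walk the graph from a start node until no transition applies."""

    def __init__(self, graph: StateGraph, max_steps: int = 20) -> None:
        self.graph = graph
        self.max_steps = max_steps

    def execute(self, start: str, state: State) -> State:
        current: Optional[str] = start
        for _ in range(self.max_steps):  # guard against cycles
            if current is None:
                break
            state = self.graph.agents[current].run(state)
            state.setdefault("visited", []).append(current)
            current = self.graph.next_node(current, state)
        return state
```

Because agents only read and write the shared state, the graph can be exercised in unit tests without a database, HTTP route, or LLM call, which is the separation of concerns the section describes.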
Implemented in knowledge_identification_agent.py, the KnowledgeIdentificationAgent inspects a user’s question, identifies knowledge gaps, and outputs a list of missing concepts for follow-up.
Defined in flashcard_agent.py, the FlashcardAgent transforms
identified gaps into study cards in a two-step process:
- Content Generation: Builds front and back text for each card, using LCEL prompts.
- Formatting: Applies categories and color coding, yielding JSON-ready flashcard objects for downstream services.
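The two steps can be pictured with the following sketch. The LLM call is replaced by a trivial placeholder, and the `Flashcard` fields, the category names, and the color values are assumptions chosen for illustration rather than the FlashcardAgent's real schema.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical category-to-color mapping.
CATEGORY_COLORS = {"concept": "#2563eb", "definition": "#16a34a"}
DEFAULT_COLOR = "#6b7280"


@dataclass
class Flashcard:
    front: str
    back: str
    category: str
    color: str


def generate_content(gap: str) -> tuple[str, str]:
    """Step 1 placeholder: the real agent would prompt an LLM here."""
    return (f"What is {gap}?", f"(explanation of {gap})")


def format_card(front: str, back: str, category: str) -> Flashcard:
    """Step 2: attach the category and its color code."""
    color = CATEGORY_COLORS.get(category, DEFAULT_COLOR)
    return Flashcard(front=front.strip(), back=back.strip(),
                     category=category, color=color)


def build_flashcards(gaps: list[str], category: str = "concept") -> list[dict]:
    """Run both steps for every gap and return JSON-ready dicts."""
    return [asdict(format_card(*generate_content(gap), category)) for gap in gaps]


cards = build_flashcards(["LCEL"])
payload = json.dumps(cards)  # ready to hand to a downstream service
```

Keeping formatting separate from generation means the color and category rules can change without touching the prompts.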
We use the genanki library to generate flashcards and push them into Anki decks. Implemented in anki_service.py, the AnkiService:
- Converts JSON flashcard objects into genanki note and deck models.
- Zustand Cache: All data loaded at startup and kept in-memory (no multi-user concerns)
- Separation: Frontend cache (Zustand) vs backend storage (FastAPI + SQLModel)
- UI‑First: Instant filtering, mutation, and coordination for modern SPA interactivity
- Reused from shadcn/UI examples—fully accessible & responsive
- Built with @tanstack/react-table using useReactTable for state management
- Chat: Session interaction UI with live WebSocket streaming and Markdown rendering
- Console: “Bento” view exposing document viewer, flashcards, and creator tabs for RAG embedding & area control
- Dashboard: Central control panel for file uploads, flashcard management, and agent instructions
- Client-side JS Worker (epub-processor.worker.ts): Parses EPUB to extract metadata and cover image before upload
- Logger: Centralized logging via custom logger.ts in UI to capture errors and events
🚧 Currently under active development – basic integrations are in place. The OpenTelemetry collector is streaming traces and metrics to the respective services.
- Metrics & Monitoring: Prometheus
- Log Aggregation: Loki
- Dashboards: Grafana
- Tracing: OpenTelemetry & Tempo
- Deploy to VPS, with Kubernetes and an API Gateway
- Create research agent that finds best articles addressing knowledge gaps
- Use message broker for queue updates instead of SSE
- Optimize WebSocket & rendering performance
- Add logging and other metrics
- Pagination for long chapter lists or chat history
- Delete /area route
- Create agent instruction routes
- Create CI/CD setup with code formatting
- Implement text‑marking for flashcard creation
- Testing endpoints & services
- Token cleanup
- Persist chat input in local storage
- Chat persistence with filter support
- Testing & refining agents (flashcards, knowledge-gap)
- Consolidate session.exec() logic
- Pydantic model validation improvements
