RAG Forest is a context-aware document intelligence system designed to help students, faculty, and professionals rapidly generate:
- 📊 Context-dependent PowerPoint presentations
- 📝 Structured notes
- 🧩 Architecture diagrams, UML diagrams, and flowcharts
- 📚 Comprehensive answers synthesized from multiple documents
Unlike generic LLM-based generators, RAG Forest grounds every output strictly in user-provided documents, ensuring relevance, accuracy, and contextual integrity.
- Project Summary
- Goal of the Project
- Problem Statement
- Target Users
- What Makes RAG Forest Different
- Current Capabilities
- Current Limitations
- How to Run the Project Locally
- LLM Configuration
- Features Under Active Development
- Future Roadmap
- Platform Vision
The core goal of RAG Forest is to:
Reduce the time and effort required to understand a topic by automatically extracting, synthesizing, and structuring knowledge from uploaded documents—while preserving user-defined context.
This project is built around Retrieval-Augmented Generation (RAG) to avoid hallucinated or generic outputs and instead produce document-faithful results.
Traditional approaches to generating notes or presentations using LLMs suffer from key limitations:
- Outputs are often generic and not context-aware
- Documents must be manually read and summarized
- Diagrams and architecture flows require separate tools
- Information from multiple documents is hard to consolidate
- No grounding to what the user actually provided
- Understands user-uploaded documents
- Allows unlimited queries over uploaded content
- Generates answers only from relevant documents
- Produces structured content suitable for:
- PPTs
- Notes
- Diagrams
- 🎓 Students — exam preparation, concept understanding, notes
- 👩🏫 Faculty — lecture slides, structured explanations
- 👨💼 Professionals — architecture design, technical summaries
| Feature | Generic LLM Tools | RAG Forest |
|---|---|---|
| Context awareness | ❌ | ✅ |
| Multi-document synthesis | ❌ | ✅ |
| Grounded responses | ❌ | ✅ |
| Diagram-ready understanding | ❌ | ✅ |
| PPT & notes oriented | ❌ | ✅ |
- Upload PDF documents (at least once)
- Upload multiple documents
- Text extraction from PDFs
- Documents indexed and stored for reuse
- Ask unlimited queries after upload
- Queries answered using all relevant documents
- Responses are context-dependent, not generic
- Combines information across documents
- Produces structured, readable answers
- Designed to feed into:
- PPT generation
- Notes creation
- Diagram design workflows
- FastAPI-based backend
- Modular RAG pipeline
- Local execution support
- Only text content from PDFs is processed
- Images, tables, and diagrams are not yet interpreted
- Manual LLM API setup required
- UI supports basic interaction (early-stage)
- Python 3.10+
- Git
- (Optional but recommended) Virtual environment
git clone https://github.com/Sachin-baba-1/RAG_forest.git
cd RAG_forestpython -m venv env
env\Scripts\activate # Windows
pip install -r requirements.txt
cd forest
fastapi dev backend/main.py
http://127.0.0.1:8000/ui/
- Currently supports Mistral (manual API setup)
- User must configure the API key locally
- Selected for free and accessible experimentation
- 🔑 User authentication (sign-in system)
- 📂 Document selection per query
- 💬 Multiple chat sessions
- 📑 Custom number of pages for notes and PPTs
- 🎨 Template selection for presentations
- 📊 Structured PPT generation
- 📄 Exportable document notes
- Image interpretation inside PDFs
- Table and diagram comprehension
- Context-aware diagram extraction
- UML diagrams
- Architecture flowcharts
- Concept graphs
- Hosted online platform
- No manual API setup required
- Support for multiple LLM providers
- Plug-and-play API key management