MultimodalRAG is an advanced Retrieval-Augmented Generation (RAG) system designed to process and query PDF documents containing text, images, and tables. It leverages multimodal embeddings, semantic retrieval, and Large Language Models (LLMs) via Groq to deliver accurate, source-grounded answers.
- Multi-PDF Upload with automatic preprocessing
- Smart Extraction of text, images, and tabular data
- Advanced Semantic Chunking using hybrid strategies
- Multimodal Embeddings (text and image) via CLIP/BGE
- Vector Indexing with Qdrant
- Context-Aware Retrieval using vector similarity
- LLM-Driven Response Generation (GPT-4o, Claude, LLaMA via Groq API)
- Modern Web Interface powered by Streamlit
- Table Interpretation with caption/context extraction
- OCR and Image Analysis with automatic captioning
- Integrated Monitoring & Performance Metrics
- Containerized Deployment with Docker & Docker Compose
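To illustrate the vector-similarity retrieval listed above, here is a minimal, dependency-free sketch. It is not the project's implementation: in MultimodalRAG the vectors would come from CLIP/BGE and the search would be delegated to Qdrant. The function names `cosine_similarity` and `top_k`, the chunk ids, and the toy 3-dimensional vectors are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: hypothetical chunk ids mapped to made-up 3-d embeddings.
toy_index = {
    "page1-text": [1.0, 0.0, 0.0],
    "page2-table": [0.0, 1.0, 0.0],
    "page3-image": [0.7, 0.7, 0.0],
}

results = top_k([1.0, 0.1, 0.0], toy_index, k=2)
for chunk_id, score in results:
    print(f"{chunk_id}: {score:.3f}")
```

With these toy vectors the text chunk ranks first and the image chunk second, because the query points almost entirely along the first axis. A real vector database performs the same ranking with approximate nearest-neighbour search instead of a full scan.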
| Component | Technology | Description |
|---|---|---|
| Frontend | Streamlit | Interactive web UI |
| Vector DB | Qdrant | High-performance vector search |
| Embeddings | CLIP / BGE / OpenAI | Multimodal encoding models |
| LLMs | Groq API | Access to GPT, Claude, LLaMA |
| Parsing | PyMuPDF / Tesseract | Text/table/OCR extraction |
| Pipeline | LangChain | RAG orchestration framework |
| CI/CD | Pre-commit, GitHub Actions | Dev workflow automation |
git clone <repository-url>
cd multimodalrag
cp .env.example .env
# Edit .env and add your GROQ_API_KEY
docker-compose up -d
open http://localhost:8501

pip install -r requirements.txt
docker run -d -p 6333:6333 qdrant/qdrant
streamlit run streamlit_app/Home.py

- Python 3.8+ installed
- Docker & Docker Compose available
- .env configured with a valid GROQ_API_KEY
- Ports 8501 (Streamlit) and 6333 (Qdrant) available
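The prerequisites above can be checked before launching anything. The following is a rough pre-flight sketch using only the Python standard library; it is not part of the project, and the helper names `port_is_free` and `env_has_key` are hypothetical. It assumes `.env` uses plain `KEY=value` lines (an `export KEY=value` form would not be recognised).

```python
import socket
from pathlib import Path


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if nothing is currently accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def env_has_key(env_text: str, key: str = "GROQ_API_KEY") -> bool:
    """True if env_text contains a non-empty, uncommented KEY=value line."""
    for line in env_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Treat empty values and empty quotes ("" or '') as missing.
        if name.strip() == key and value.strip().strip("\"'"):
            return True
    return False


if __name__ == "__main__":
    env_file = Path(".env")
    env_text = env_file.read_text() if env_file.exists() else ""
    print("GROQ_API_KEY set:", env_has_key(env_text))
    for port in (8501, 6333):
        print(f"port {port} free:", port_is_free(port))
```

Run it from the repository root before starting the stack. Once Qdrant is already running, port 6333 is expected to show as not free.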
- Upload a sample PDF file
- Textual query: "What is this document about?"
- Image query: "Show me related diagrams or illustrations"
- Table query: "What data is shown in the tables?"
Qdrant not reachable
docker ps
curl http://localhost:6333/healthz

Groq API not responding
cat .env | grep GROQ_API_KEY

Port conflict
streamlit run streamlit_app/Home.py --server.port=8502

cp .env.example .env
# Add GROQ_API_KEY
docker-compose up -d

make setup-dev # Install dependencies & pre-commit hooks
make qdrant-start # Launch Qdrant
make run # Launch Streamlit App

make help # List all commands
make setup-dev # Full local dev setup
make run # Launch the app
make reindex # Re-index all documents
make evaluate # Run automatic evaluation
make benchmark # Run benchmark analysis
make clean # Clean temporary files
make docker-build # Build the Docker image
make ci # Run CI pipeline

Other commands for code quality checks, including lint, format, check-all, and bandit, can be found in the Makefile or by running make help.
multimodalrag/
├── src/
│   ├── config.py
│   ├── core/
│   ├── llm/
│   ├── pipeline/
│   └── utils/
├── data/
│   ├── models/
│   └── raw/
├── scripts/
├── streamlit_app/
├── logs/
- Upload PDF via UI
- Automatic Processing and semantic chunking
- Multimodal Indexing (text + image)
- Query using text, image or hybrid inputs
- Response Generation using LLMs
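Two of the workflow steps above, chunking and response generation, can be sketched without any external services. The code below is a simplified stand-in, not the project's code: it packs paragraphs greedily by character count (the README does not detail the hybrid semantic strategy actually used) and assembles a source-grounded prompt. The function names `chunk_paragraphs` and `build_prompt`, the prompt wording, and the sample document are all assumptions.

```python
from typing import List, Tuple


def chunk_paragraphs(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_prompt(question: str, sources: List[Tuple[str, str]]) -> str:
    """Assemble a prompt that asks the LLM to answer only from numbered sources."""
    context = "\n".join(
        f"[{i}] ({label}) {text}" for i, (label, text) in enumerate(sources, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


document = "Intro paragraph.\n\nFigure 1 shows the pipeline.\n\nTable 2 lists latency results."
chunks = chunk_paragraphs(document, max_chars=45)
prompt = build_prompt(
    "What does Table 2 list?",
    [(f"chunk {n}", chunk) for n, chunk in enumerate(chunks, start=1)],
)
print(prompt)
```

In the real pipeline only the retrieved chunks would be passed as sources, and the resulting prompt would be sent to the LLM through the Groq API. Numbering the sources is one simple way to let the model cite where each part of its answer came from.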
make reindex

This project is licensed under the MIT License. See the LICENSE file for more details.