MultimodalRAG

MultimodalRAG is an advanced Retrieval-Augmented Generation (RAG) system designed to process and query PDF documents containing text, images, and tables. It leverages multimodal embeddings, semantic retrieval, and Large Language Models (LLMs) via Groq to deliver accurate, source-grounded answers.

Python 3.8+ · Code style: black · License: MIT


Key Features

  • Multi-PDF Upload with automatic preprocessing
  • Smart Extraction of text, images, and tabular data
  • Advanced Semantic Chunking using hybrid strategies
  • Multimodal Embeddings (text and image) via CLIP/BGE
  • Vector Indexing with Qdrant
  • Context-Aware Retrieval using vector similarity
  • LLM-Driven Response Generation (GPT-4o, Claude, LLaMA via Groq API)
  • Modern Web Interface powered by Streamlit
  • Table Interpretation with caption/context extraction
  • OCR and Image Analysis with automatic captioning
  • Integrated Monitoring & Performance Metrics
  • Containerized Deployment with Docker & Docker Compose
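The "Advanced Semantic Chunking" feature above can be illustrated in miniature. The sketch below is a hedged, dependency-free stand-in, not the project's actual implementation: it splits text into sentence-aligned chunks and carries trailing sentences into the next chunk for context continuity (the `max_chars` and `overlap` values are illustrative, not real settings).

```python
import re

def semantic_chunks(text, max_chars=500, overlap=1):
    """Split text into chunks of whole sentences, carrying `overlap`
    trailing sentences into the next chunk for context continuity."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], []
    for sent in sentences:
        if current and len(" ".join(current)) + len(sent) + 1 > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # keep the last `overlap` sentences
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks

doc = ("Qdrant stores vectors. CLIP embeds images and text. "
       "Chunks keep sentences intact. Overlap preserves context across chunk boundaries.")
print(semantic_chunks(doc, max_chars=60))
```

A hybrid strategy like the one the project describes would add structure-aware rules (headings, tables, image captions) on top of this sentence-level splitting.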

Technology Stack

| Component  | Technology                 | Description                    |
|------------|----------------------------|--------------------------------|
| Frontend   | Streamlit                  | Interactive web UI             |
| Vector DB  | Qdrant                     | High-performance vector search |
| Embeddings | CLIP / BGE / OpenAI        | Multimodal encoding models     |
| LLMs       | Groq API                   | Access to GPT, Claude, LLaMA   |
| Parsing    | PyMuPDF / Tesseract        | Text/table/OCR extraction      |
| Pipeline   | LangChain                  | RAG orchestration framework    |
| CI/CD      | Pre-commit, GitHub Actions | Dev workflow automation        |

Data Flow

(data flow diagram)

Quick Start

1. Clone the repository

git clone <repository-url>
cd multimodalrag

2. Set up API keys

cp .env.example .env
# Edit .env and add your GROQ_API_KEY

3. Run the full stack (App + Qdrant)

docker-compose up -d

4. Open the app

open http://localhost:8501
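The app reads GROQ_API_KEY from the environment at startup. The project most likely uses a library such as python-dotenv for this (an assumption); the behavior can be sketched with a minimal stdlib-only loader:

```python
import os

def load_env(path=".env"):
    """Minimal .env loader (hypothetical stand-in for python-dotenv):
    sets KEY=VALUE lines into os.environ, skipping comments and blanks."""
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                # setdefault: values already in the environment win
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # no .env file is not an error; the key may be set elsewhere

load_env()
print("GROQ_API_KEY set:", bool(os.environ.get("GROQ_API_KEY")))
```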

Local Setup (Alternative)

pip install -r requirements.txt
docker run -d -p 6333:6333 qdrant/qdrant
streamlit run streamlit_app/Home.py

Pre-Launch Checklist

  • Python 3.8+ installed
  • Docker & Docker Compose available
  • .env configured with valid GROQ_API_KEY
  • Ports 8501 (Streamlit) and 6333 (Qdrant) available
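The last checklist item can be verified programmatically. A small stdlib-only sketch that probes whether anything is already listening on the two ports:

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is listening on host:port (connect fails)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) != 0

for port, service in [(8501, "Streamlit"), (6333, "Qdrant")]:
    status = "free" if port_is_free(port) else "in use"
    print(f"{service} port {port}: {status}")
```

Note that "in use" for 6333 is expected once the Qdrant container is already running.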

Functional Test Scenarios

  1. Upload a sample PDF file
  2. Textual query: "What is this document about?"
  3. Image query: "Show me related diagrams or illustrations"
  4. Table query: "What data is shown in the tables?"

Troubleshooting

Qdrant not reachable

docker ps
curl http://localhost:6333/healthz

Groq API not responding

grep GROQ_API_KEY .env

Port conflict

streamlit run streamlit_app/Home.py --server.port=8502

Developer Setup

Docker-based Setup

cp .env.example .env
# Add GROQ_API_KEY
docker-compose up -d

Makefile-based Setup

make setup-dev       # Install dependencies & pre-commit hooks
make qdrant-start    # Launch Qdrant
make run             # Launch Streamlit App

Makefile Commands Reference

make help             # List all commands
make setup-dev        # Full local dev setup
make run              # Launch the app
make reindex          # Re-index all documents
make evaluate         # Run automatic evaluation
make benchmark        # Run benchmark analysis
make clean            # Clean temporary files
make docker-build     # Build the Docker image
make ci               # Run CI pipeline

Additional commands for code-quality checks, including lint, format, check-all, and bandit, can be found in the Makefile or listed by running make help.


Project Structure

multimodalrag/
├── src/                  
│   ├── config.py         
│   ├── core/             
│   ├── llm/              
│   ├── pipeline/         
│   └── utils/            
├── data/                 
│   ├── models/           
│   └── raw/              
├── scripts/              
├── streamlit_app/        
└── logs/

Workflow

  1. Upload PDF via UI
  2. Automatic Processing and semantic chunking
  3. Multimodal Indexing (text + image)
  4. Query using text, image or hybrid inputs
  5. Response Generation using LLMs
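The workflow above can be condensed into a toy, dependency-free sketch. Here a bag-of-words "embedding" and a plain dict stand in for the real CLIP/BGE vectors and the Qdrant collection; all names are illustrative, not the project's API.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for CLIP/BGE vectors)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

index = {}  # chunk_id -> (vector, text); stand-in for a Qdrant collection

def index_chunk(cid, text):
    index[cid] = (embed(text), text)

def retrieve(query, k=2):
    """Return the k chunks most similar to the query (steps 3-4 above)."""
    qv = embed(query)
    scored = sorted(index.items(), key=lambda kv: cosine(qv, kv[1][0]), reverse=True)
    return [(cid, text) for cid, (vec, text) in scored[:k]]

index_chunk("c1", "The table lists quarterly revenue figures")
index_chunk("c2", "This diagram shows the system architecture")
index_chunk("c3", "Revenue grew strongly in the fourth quarter")

print(retrieve("what revenue data is in the tables"))
```

In the real pipeline, step 5 would pass the retrieved chunks as context to an LLM via the Groq API instead of returning them directly.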

Running Indexer Manually

make reindex

License

This project is licensed under the MIT License. See the LICENSE file for more details.


Documentation

