🖼️ VisualIndexer

AI-Powered Image Management & Semantic Search with PyTorch, CLIP, Transformers & Streamlit

📋 Project Description

VisualIndexer is a complete system for image management, automatic indexing, and semantic search, built on deep learning models and libraries (PyTorch, CLIP, Transformers).

This project enables:

📥 Batch ingest and optimize images
🔍 Automatically extract EXIF metadata
📄 Recognize text in images (OCR)
🏷️ Automatically generate intelligent visual tags
🧠 Create semantic vector representations
⚡ Search images by similarity
🎨 Explore results via an interactive web interface

🚀 Key Features

1️⃣ Image Ingestion

Batch upload and ingest image files
Automatic duplicate detection (MD5 hash)
Intelligent optimization and resizing (max 1920x1080)
Adaptive JPEG compression (quality 85%)
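
The duplicate check and the size cap listed above could be sketched as follows, using only the standard library. The function names `md5_of_bytes`, `find_duplicates`, and `fit_within` are hypothetical and are not taken from scripts/ingest.py; the actual resizing and JPEG encoding would be done with Pillow. Note that an MD5 comparison only catches byte-identical files, not re-encoded copies of the same picture.

```python
import hashlib

MAX_WIDTH, MAX_HEIGHT = 1920, 1080  # size cap stated in the feature list


def md5_of_bytes(data: bytes) -> str:
    """Return the hex MD5 digest used as a duplicate fingerprint."""
    return hashlib.md5(data).hexdigest()


def find_duplicates(files: dict[str, bytes]) -> list[tuple[str, str]]:
    """Return (duplicate, original) name pairs for byte-identical files."""
    seen: dict[str, str] = {}  # digest -> first file name with that digest
    duplicates = []
    for name, data in files.items():
        digest = md5_of_bytes(data)
        if digest in seen:
            duplicates.append((name, seen[digest]))
        else:
            seen[digest] = name
    return duplicates


def fit_within(width: int, height: int) -> tuple[int, int]:
    """Scale (width, height) down to fit the cap, keeping the aspect ratio.

    Images already inside the cap are returned unchanged (never upscaled).
    """
    scale = min(MAX_WIDTH / width, MAX_HEIGHT / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The size returned by `fit_within` is what one would pass to a Pillow resize call before saving at JPEG quality 85.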

2️⃣ Metadata Extraction

Complete EXIF extraction (capture date, camera, GPS, etc.)
Image dimensions and format
Automatic CSV generation for analysis
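
As an illustration of the CSV step, the sketch below serializes a list of metadata records with the standard `csv` module. The column names in `FIELDS` are assumptions made for this example; the real data/metadata.csv may use a different layout. Missing EXIF fields are written as empty cells.

```python
import csv
import io

# Hypothetical column set; the real metadata.csv may use different names.
FIELDS = ["filename", "width", "height", "format", "captured_at", "camera"]


def metadata_to_csv(records: list[dict]) -> str:
    """Serialize metadata records to CSV text, leaving missing fields blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        restval="",             # value used when a record lacks a column
        extrasaction="ignore",  # silently drop keys that are not in FIELDS
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```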

3️⃣ Text Recognition (OCR)

Multi-language Tesseract OCR (English + French)
Extract text present in images
JSON caching for optimization
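
One way the JSON cache could work is shown below: OCR runs only when an image has no entry in the cache file. This is a sketch, not the code in scripts/ocr.py. The `ocr_with_cache` helper and its parameters are hypothetical, and the OCR call is passed in as a function so the example runs without Tesseract installed; in practice it could wrap `pytesseract.image_to_string(image, lang="eng+fra")`.

```python
import json
from pathlib import Path
from typing import Callable


def ocr_with_cache(image_key: str, run_ocr: Callable[[str], str], cache_path: Path) -> str:
    """Return cached OCR text for image_key, running OCR only on a cache miss."""
    if cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))
    else:
        cache = {}
    if image_key not in cache:
        cache[image_key] = run_ocr(image_key)
        # Persist immediately so an interrupted batch keeps its progress.
        cache_path.write_text(
            json.dumps(cache, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    return cache[image_key]
```

Keying the cache by file name is the simplest choice; keying by content hash instead would let the cache survive renames.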

4️⃣ Automatic Tagging

Vision Transformer CLIP (OpenAI)
Intelligent visual tag generation
50+ predefined categories (city, portrait, food, document, etc.)
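
With CLIP, tagging is typically done zero-shot: the model scores the image against each candidate label, and the best-scoring labels become tags. The sketch below covers only the last step, turning per-label scores into tags with a softmax and a cutoff. The `top_tags` function, the `k` and `min_prob` defaults, and the example scores are assumptions for illustration; the real scores would come from the CLIP model.

```python
import math


def top_tags(logits: dict[str, float], k: int = 3, min_prob: float = 0.1) -> list[tuple[str, float]]:
    """Softmax the per-label logits and keep up to k labels above min_prob."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((label, e / total) for label, e in exps.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return [(label, prob) for label, prob in ranked[:k] if prob >= min_prob]


# Made-up scores for four of the predefined categories.
tags = top_tags({"city": 5.0, "portrait": 1.0, "food": 4.0, "document": 0.0}, k=2)
```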

5️⃣ Semantic Embeddings

384D vector generation with Sentence-Transformers
Semantic content representation
Advanced similarity search
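
Similarity search over embedding vectors usually comes down to ranking by cosine similarity. The sketch below does this in plain Python on tiny 2D vectors so it stays readable; the same logic applies to 384D vectors. The function names are hypothetical, and a real index of any size would more likely use NumPy or scikit-learn for speed.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: list[float], index: dict[str, list[float]], top_k: int = 5) -> list[tuple[str, float]]:
    """Return the top_k (name, similarity) pairs, best match first."""
    scored = [(name, cosine_similarity(query, vector)) for name, vector in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```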

6️⃣ Advanced Search Engine

Text search with embeddings
Metadata filtering (date, size, format)
Combined tag search
Intelligent result fusion
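
The source does not say how results are fused, so the following is only one plausible approach: a weighted sum of the semantic score and the tag score per image, followed by a metadata filter. `fuse_scores`, `filter_by_format`, and the 0.7 default weight are assumptions for this sketch.

```python
def fuse_scores(
    semantic: dict[str, float],
    tag: dict[str, float],
    semantic_weight: float = 0.7,
) -> list[tuple[str, float]]:
    """Combine two score maps with a weighted sum; a missing score counts as 0."""
    names = set(semantic) | set(tag)
    fused = {
        name: semantic_weight * semantic.get(name, 0.0)
        + (1.0 - semantic_weight) * tag.get(name, 0.0)
        for name in names
    }
    # Highest score first; ties are broken by name so the order is deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


def filter_by_format(
    results: list[tuple[str, float]],
    metadata: dict[str, dict],
    allowed_formats: set[str],
) -> list[tuple[str, float]]:
    """Keep only results whose metadata 'format' is in allowed_formats."""
    return [
        (name, score)
        for name, score in results
        if metadata.get(name, {}).get("format") in allowed_formats
    ]
```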

7️⃣ Interactive Web Interface

Modern Streamlit dashboard
Image visualization
Multi-criteria search
Result export

🛠️ Technologies Used

VisualIndexer uses a modern and performant technology stack:

Python 3.10+ - Primary language
Pip - Package manager

Deep Learning & Vision

Technology	Version	Usage
PyTorch	2.1.1	Deep learning framework
TorchVision	0.16.1	Vision utilities
Transformers	4.35.2	HuggingFace models
Sentence-Transformers	2.2.2	Semantic embeddings
CLIP	0.1.0.post1	Vision-Language model

Image Processing

Technology	Version	Usage
Pillow	10.1.0	Image manipulation
OpenCV	4.8.1	Vision algorithms
Pytesseract	0.3.10	OCR wrapper

Data Science & Analytics

Technology	Version	Usage
NumPy	1.26.2	Numerical computing
Pandas	2.1.3	Dataframes & data processing
Scikit-learn	1.3.2	ML utilities

Web & UI

Technology	Version	Usage
Streamlit	1.29.0	Interactive web interface

Database & Utils

Technology	Version	Usage
PostgreSQL	-	(Optional) Database
Python-dotenv	1.0.0	Environment variables
TQDM	4.66.1	Progress bars
Requests	2.31.0	HTTP client

External Infrastructure

Tesseract OCR - Optical character recognition (Windows/Linux/Mac)

📁 Project Structure

VisualIndexer/
├── main.py                 # Main entry point
├── requirements.txt        # Python dependencies
├── .env                    # Configuration (Tesseract path)
├── .gitignore              # Git exclusions
│
├── config/
│   └── settings.py         # Centralized configuration
│
├── scripts/                # Business logic modules
│   ├── ingest.py           # Ingestion & duplicates
│   ├── extract_metadata.py # EXIF extraction
│   ├── ocr.py              # Tesseract OCR
│   ├── tag_clip.py         # CLIP tagging
│   ├── embeddings.py       # Semantic vectors
│   └── search.py           # Search engine
│
├── ui/
│   └── interface.py        # Streamlit interface
│
├── data/
│   ├── images/
│   │   ├── raw/            # Input images
│   │   └── processed/      # Optimized images
│   ├── metadata.csv        # Metadata
│   ├── embeddings.json     # Embeddings cache
│   └── ocr_results.json    # OCR cache
│
├── models/
│   └── cache/              # ML models cache
│
├── README.md               # Documentation
├── GUIDE_UTILISATION.md    # Usage guide
└── COMMITS_GUIDE.md        # Commits guide

⚙️ Installation & Configuration

Prerequisites

Python 3.10 or later
Git
2 GB of disk space (for the models)

Quick Install

# 1. Clone the repo
git clone https://github.com/IlyasFardaouix/VisualIndexer.git
cd VisualIndexer

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Install Tesseract (Windows)
# Download: https://github.com/tesseract-ocr/tesseract
# Install it and set its path in .env

# 5. Add images
# Put images in: data/images/raw/

# 6. Run the pipeline
python main.py --mode pipeline

# 7. Launch the web interface
python main.py --mode ui
# Access: http://localhost:8501

5-Step Pipeline

Raw Images
    ↓
[1] INGESTION → Duplicate detection, optimization
    ↓
[2] METADATA → EXIF extraction, CSV
    ↓
[3] OCR → Text recognition
    ↓
[4] TAGGING → CLIP vision, tags
    ↓
[5] EMBEDDINGS → Semantic vectors, search
    ↓
Indexed & Searchable Results
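
The stage order above can be pictured as a list of functions applied one after another to a shared state. This is a hypothetical sketch of that idea, not the structure of main.py; `run_pipeline` and the stub steps are invented for illustration.

```python
from typing import Callable

Step = Callable[[dict], dict]  # each step takes the state and returns a new state


def run_pipeline(steps: list[tuple[str, Step]], state: dict) -> dict:
    """Run the named steps in order, recording each completed step name."""
    for name, step in steps:
        state = step(state)
        state.setdefault("completed", []).append(name)
    return state


# Stub steps standing in for the real ingestion and OCR modules.
steps = [
    ("ingestion", lambda s: {**s, "images": ["a.jpg"]}),
    ("ocr", lambda s: {**s, "text": {"a.jpg": ""}}),
]
result = run_pipeline(steps, {})
```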

🎯 Use Cases

✅ Intelligent Archiving - Manage large collections of professional images
✅ Semantic Search - Find images by visual similarity
✅ Automatic Indexing - Tags and metadata without manual intervention
✅ Deduplication - Remove detected duplicates
✅ Documentation - Extract text from scanned documents
✅ E-Commerce - Catalog products from images

📝 Usage

Full Pipeline Mode

python main.py --mode pipeline

Processes all images in the data/images/raw/ folder

Web Interface Mode

python main.py --mode ui

Launches the Streamlit dashboard on http://localhost:8501

Ingestion Only Mode

python main.py --mode ingest

Ingests images only without AI modules
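
For readers curious how a `--mode` switch like this is commonly wired up, here is a minimal `argparse` sketch. It assumes the three modes documented above are the only ones and that the flag is required; the actual main.py may define it differently.

```python
import argparse

MODES = ("pipeline", "ui", "ingest")  # the three modes documented above


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the command line; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="VisualIndexer entry point (sketch)"
    )
    parser.add_argument(
        "--mode",
        choices=MODES,
        required=True,  # assumption: the real script may have a default mode
        help="which part of the system to run",
    )
    return parser.parse_args(argv)
```

An unknown mode makes `argparse` print a usage message and exit with status 2.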

📚 Additional Documentation

GUIDE_UTILISATION.md - Complete usage guide
COMMITS_GUIDE.md - GitHub commits documentation
requirements.txt - Complete dependencies list

💡 Optimizations & Performance

✅ Intelligent ML model caching
✅ Reused embedding vectors
✅ Optimized JPEG compression
✅ Batch processing
✅ Progress tracking with TQDM

🔒 Security Configuration

Sensitive variables are stored in .env:

TESSERACT_PATH=C:\Program Files\Tesseract-OCR\tesseract.exe
OCR_LANGUAGE=eng+fra
DB_HOST=localhost
DB_PORT=5432
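
A sketch of how these variables might be read in config/settings.py is shown below. The `load_settings` function and its fallback values are assumptions for this example. With python-dotenv, calling `load_dotenv()` first would copy the .env entries into `os.environ`.

```python
import os


def load_settings(env=None) -> dict:
    """Read settings from a mapping (os.environ by default), with fallbacks."""
    env = os.environ if env is None else env
    return {
        # Fallbacks below are illustrative defaults, not the project's.
        "tesseract_path": env.get("TESSERACT_PATH", "tesseract"),
        "ocr_language": env.get("OCR_LANGUAGE", "eng+fra"),
        "db_host": env.get("DB_HOST", "localhost"),
        "db_port": int(env.get("DB_PORT", "5432")),  # env values are strings
    }
```

Accepting the mapping as a parameter keeps the function easy to test without touching the real environment.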

📄 License

MIT License - Free to use

👤 Author

Ilyas Fardaouix
GitHub: @IlyasFardaouix

🤝 Support & Contributions

Have questions or improvements? Open an Issue or submit a Pull Request

⭐ If you like this project, don't forget to star it!
