knileshh/visual-product-matcher

Visual Product Matcher

🎯 Production-ready AI-powered visual search engine with 42,700+ fashion products. Find visually similar items using CLIP embeddings and FAISS similarity search.

Created by Nilesh Kumar | [email protected]

🌟 Highlights

  • 🎨 42,700+ Product Database - Fully indexed fashion products with embeddings
  • 🚀 Production Ready - Deployed with Docker (1.62GB optimized image)
  • ⚡ Lightning Fast - <100ms search across entire catalog using FAISS
  • 🤖 AI-Powered - OpenAI CLIP (ViT-B/32) for semantic understanding
  • 🌐 Cloud-Ready - Cloudinary CDN integration for scalable storage
  • 🔒 Enterprise Security - Rate limiting, validation, auto-cleanup
  • 📊 Real-Time Search - Upload image or URL, get instant similar products
  • 🌍 Live Demo - Check the live deployment at https://visualmatch.knileshh.com

Project Scale

📦 Products:        42,700 fashion items
🔍 Embeddings:      ~21.9 million values (42,700 vectors × 512 dimensions)
💾 Database:        187 MB optimized SQLite
🐳 Docker Image:    1.62 GB (87% smaller than the original 12.4 GB)
⚡ Search Index:    FAISS IndexFlat for maximum accuracy
🌐 Image Storage:   Cloudinary CDN (globally distributed)
🎯 Search Speed:    <100ms for entire catalog
📊 Memory Usage:    ~1.5GB runtime
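
The embedding figure above is simple arithmetic, and the same arithmetic gives a rough size for the raw vector matrix. The sketch below assumes the vectors are stored as 4-byte float32 values, which this README does not state; the function name is illustrative only.

```python
# Back-of-the-envelope size of the raw embedding matrix.
NUM_PRODUCTS = 42_700       # from the Project Scale figures
EMBEDDING_DIM = 512         # CLIP ViT-B/32 image embedding size
BYTES_PER_FLOAT32 = 4       # assumption: vectors are stored as float32


def embedding_footprint(num_vectors: int, dim: int,
                        bytes_per_value: int = BYTES_PER_FLOAT32) -> tuple[int, int]:
    """Return (total scalar values, total bytes) for a dense matrix."""
    values = num_vectors * dim
    return values, values * bytes_per_value


values, size_bytes = embedding_footprint(NUM_PRODUCTS, EMBEDDING_DIM)
print(f"{values:,} values, {size_bytes / 1024**2:.1f} MiB")
# prints "21,862,400 values, 83.4 MiB"
```

Under that assumption the vectors themselves account for well under a tenth of the ~1.5GB runtime footprint; the rest is presumably the model and application, though the README does not break this down.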

🚀 Features

  • Visual Search: Upload an image or provide a URL to find visually similar products
  • AI-Powered: Uses OpenAI's CLIP model for semantic image understanding
  • Fast Search: FAISS indexing enables efficient similarity search across 42K+ products
  • Try it Live: Visit the live demo: https://visualmatch.knileshh.com
  • Adjustable Threshold: Fine-tune similarity matching with adjustable threshold slider
  • Responsive UI: Mobile-friendly interface that works on all devices
  • RESTful API: Well-documented API endpoints for integration
  • Auto-Cleanup: Automatic removal of old uploaded files to prevent storage bloat
  • Cloud Storage: Integrated with Cloudinary for scalable image hosting
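
To make the search and threshold features concrete, here is a minimal pure-Python sketch of exact (brute-force) similarity search, which is conceptually what a flat index does. It is a stand-in, not the project's FAISS code: the use of cosine similarity, the meaning of `threshold` as a minimum cosine score, and the `search` and `normalize` helpers are all assumptions.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(query, catalog, k=10, threshold=0.3):
    """Return up to k (product_id, score) pairs with score >= threshold, best first.

    catalog maps a product id to its embedding (hypothetical layout).
    """
    q = normalize(query)
    scored = []
    for product_id, vec in catalog.items():
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        if score >= threshold:
            scored.append((product_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real CLIP ViT-B/32 vectors have 512 dimensions.
catalog = {
    "shirt": [1.0, 0.0, 0.0],
    "jacket": [0.8, 0.6, 0.0],
    "shoe": [0.0, 0.0, 1.0],
}
results = search([1.0, 0.1, 0.0], catalog, k=2, threshold=0.3)
print([product_id for product_id, _ in results])  # prints ['shirt', 'jacket']
```

Raising the threshold trims weaker matches: with `threshold=0.9` only "shirt" survives in this toy catalog.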

📁 Project Structure

visual-product-matcher/
├── app.py                      # Main Flask application
├── config.yaml                 # Application configuration
├── requirements.txt            # Python dependencies
├── Dockerfile                  # Docker configuration
├── .env.example                # Environment variables template
├── .gitignore                  # Git ignore rules
│
├── src/                        # Source code
│   ├── models/                 # Database models
│   ├── services/               # Business logic services
│   ├── routes/                 # API and UI routes
│   └── middleware/             # Security and rate limiting
│
├── templates/                  # HTML templates
├── static/                     # CSS, JS, images
│   ├── css/
│   ├── js/
│   └── images/
│
├── data/                       # Data directory (gitignored)
│   ├── products.db             # SQLite database
│   ├── embeddings/             # Cached embeddings
│   ├── index/                  # FAISS index
│   ├── uploads/                # User uploads (auto-cleaned)
│   └── temp/                   # Temporary files
│
├── scripts/                    # Utility scripts
│   ├── init_data.py            # Initialize database and index
│   ├── upload_to_cloudinary.py # Cloudinary migration
│   ├── quick_api_test.py       # API testing
│   └── README.md               # Scripts documentation
│
├── deployment/                 # Deployment files
│   ├── Dockerfile              # Docker container
│   ├── gunicorn_config_cloud.py  # Production config
│   └── README.md               # Deployment guide
│
├── docs/                       # Documentation
│   ├── API_DOCUMENTATION.md    # API reference
│   ├── DEPLOYMENT_CHECKLIST.md # Deployment guide
│   └── README.md               # Docs index
│
└── logs/                       # Application logs (gitignored)

🔧 Installation

1. Clone the Repository

git clone <repository-url>
cd visual-product-matcher

2. Create Virtual Environment

python -m venv venv

# On Windows
venv\Scripts\activate

# On Linux/Mac
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Configure Environment

# Copy example env file
cp .env.example .env

# Edit .env with your credentials
# Add CLOUDINARY_URL if using Cloudinary

5. Initialize Data

Build the product database and FAISS index:

python scripts/init_data.py

This will:

  • Scan all images in fashion-images/ directory
  • Extract metadata and store in SQLite database
  • Generate CLIP embeddings for all products
  • Build FAISS search index

Note: Initial setup may take 30-60 minutes for 42K images depending on your hardware.

🚀 Usage

Running Locally

python app.py

The application will be available at http://localhost:5000

Running with Docker

# Build image
docker build -t visual-matcher .

# Run container
docker run -d -p 8080:8080 --env-file .env visual-matcher

# Check health
curl http://localhost:8080/api/health

Running in Production

gunicorn --config deployment/gunicorn_config_cloud.py "app:create_app()"

📚 Documentation

Quick API Examples

Upload and Search:

curl -X POST http://localhost:5000/api/upload \
  -F "[email protected]" \
  -F "k=10" \
  -F "threshold=0.3"

Search by URL:

curl -X POST http://localhost:5000/api/search-url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/image.jpg", "k": 10}'

Health Check:

curl http://localhost:5000/api/health
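
The URL search call above can also be issued from Python. The sketch below only builds the request with the standard library, using the endpoint and JSON fields shown in this README; the helper name `build_search_request` is hypothetical, and the response schema is not documented here, so sending and parsing are left as a commented-out step.

```python
import json
import urllib.request


def build_search_request(base_url, image_url, k=10, threshold=None):
    """Build a POST request for /api/search-url (helper name is illustrative)."""
    payload = {"url": image_url, "k": k}
    if threshold is not None:
        payload["threshold"] = threshold
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/search-url",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("http://localhost:5000", "https://example.com/image.jpg", k=10)
print(req.get_method(), req.full_url)

# To send it against a running server:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(json.load(resp))
```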

🧪 Testing

Run API tests:

# Quick test (while server is running)
python scripts/quick_api_test.py

# Comprehensive test suite
python scripts/test_api_endpoints.py

⚙️ Configuration

Edit config.yaml to customize:

  • Upload settings: File size limits, allowed formats, auto-cleanup intervals
  • ML settings: CLIP model, device (CPU/GPU), batch size
  • Search settings: Default results count, similarity thresholds
  • Performance: Caching, indexing options

Auto-Cleanup Feature

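Uploaded files are deleted automatically once they are older than 60 minutes. The cleanup job runs every 30 minutes, so a file can persist for up to about 90 minutes. Both values are configurable: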

upload:
  cleanup:
    enabled: true
    interval_minutes: 30  # Check every 30 minutes
    max_age_minutes: 60   # Delete files older than 60 minutes
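
For intuition, one cleanup pass with these settings could look like the sketch below. This is not the project's implementation: the function name, the use of file modification time as the age, and the flat (non-recursive) directory scan are assumptions.

```python
import os
import tempfile
import time
from pathlib import Path


def cleanup_uploads(upload_dir, max_age_minutes=60, now=None):
    """Delete regular files older than max_age_minutes; return the removed paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_minutes * 60
    removed = []
    for path in Path(upload_dir).iterdir():
        # Age is judged by modification time (an assumption of this sketch).
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


# Demo in a throwaway directory: one stale file, one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp, "old.jpg")
    new_file = Path(tmp, "new.jpg")
    old_file.write_bytes(b"x")
    new_file.write_bytes(b"x")
    two_hours_ago = time.time() - 2 * 3600
    os.utime(old_file, (two_hours_ago, two_hours_ago))  # backdate the stale file

    removed_names = [p.name for p in cleanup_uploads(tmp, max_age_minutes=60)]
    remaining_names = sorted(p.name for p in Path(tmp).iterdir())

print(removed_names, remaining_names)  # prints ['old.jpg'] ['new.jpg']
```

A scheduler would call such a function every `interval_minutes`; how the application schedules it is not described in this README.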

🚢 Deployment

Docker (Optimized: 1.62GB)

The Docker image has been optimized from 12.4GB to 1.62GB (87% reduction):

# Build optimized image
docker build -t visual-product-matcher:latest .

# Run locally
docker run -d -p 8080:8080 --env-file .env visual-product-matcher:latest

# Test
curl http://localhost:8080/api/health

Push to Docker Hub

# Login
docker login

# Tag image
docker tag visual-product-matcher:latest YOUR_USERNAME/visual-product-matcher:latest
docker tag visual-product-matcher:latest YOUR_USERNAME/visual-product-matcher:v1.2.0

# Push
docker push YOUR_USERNAME/visual-product-matcher:latest
docker push YOUR_USERNAME/visual-product-matcher:v1.2.0

See docs/DOCKER_HUB_PUSH.md for detailed instructions.

Railway Deployment

  1. From Docker Hub:

    • Deploy: YOUR_USERNAME/visual-product-matcher:latest
    • Add environment variables
    • Set PORT=8080
  2. From GitHub:

    • Connect your repository
    • Railway auto-detects Dockerfile
    • Configure environment variables
    • Deploy!

See docs/DEPLOYMENT_CHECKLIST.md for complete guide.

Next.js Frontend Integration

// See docs/API_DOCUMENTATION.md for integration examples
const results = await fetch('https://your-api.railway.app/api/search-url', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ url: imageUrl, k: 20, threshold: 0.3 })
});

🔐 Security

  • Rate Limiting: Upload (10/min), Search (30/min), General (100/hour)
  • File Validation: Type, size, and integrity checks
  • SSRF Protection: URL validation and sanitization
  • Auto-Cleanup: Prevents storage exhaustion
  • CORS: Configured for cross-origin requests
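
As an illustration of what a limit such as "10 uploads per minute" means, here is a minimal in-memory sliding-window limiter. It is a sketch only: the README does not say which rate-limiting library or algorithm the project uses, and the class name and per-client keying are assumptions.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id):
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Mirrors the "Upload (10/min)" limit listed above.
upload_limiter = SlidingWindowLimiter(max_requests=10, window_seconds=60)
decisions = [upload_limiter.allow("203.0.113.7") for _ in range(11)]
print(decisions.count(True), decisions[-1])  # prints "10 False"
```

An in-memory limiter like this is per process; with several Gunicorn workers, a shared store would be needed for the limits to hold globally.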

See SECURITY.md for security policy and reporting.
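
For readers unfamiliar with SSRF protection, the sketch below shows the kind of check a URL search endpoint might run before fetching a user-supplied image URL. It is not taken from this project: the function name and the exact rules are assumptions, and it only vets the URL string. A complete defense would also resolve DNS names and vet every resolved address, and re-check after redirects.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_safe_image_url(url):
    """Reject non-HTTP(S) URLs and hosts that are literal non-public IPs or localhost."""
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. malformed IPv6 brackets
        return False
    host = parts.hostname
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so it is a DNS name; this sketch does not resolve it.
        return True
    return ip.is_global  # False for loopback, private, and link-local ranges


for candidate in (
    "https://example.com/image.jpg",
    "http://127.0.0.1/admin",
    "http://169.254.169.254/latest/meta-data",
    "file:///etc/passwd",
):
    print(candidate, is_safe_image_url(candidate))
```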

📊 Performance

  • Docker Image: 1.62GB (optimized, 87% smaller than original 12.4GB)
  • Search Speed: <100ms for 42K products (with FAISS)
  • Embedding Generation: ~50ms per image (GPU) / ~200ms (CPU)
  • Memory Usage: ~1.5GB runtime (with model loaded)
  • Database: 187MB (SQLite with products + metadata)
  • Startup Time: ~30 seconds (includes model loading)

🔬 Technology Stack

  • Backend: Python 3.10, Flask 3.0, Gunicorn
  • ML/AI: PyTorch (CPU-optimized), OpenAI CLIP (ViT-B/32), FAISS
  • Database: SQLite with SQLAlchemy
  • Cloud Storage: Cloudinary CDN
  • Frontend: HTML5, CSS3, Vanilla JavaScript
  • Container: Docker (multi-stage, optimized build)
  • Deployment: Railway-ready, Cloud Run compatible

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Copyright (c) 2025 Nilesh Kumar

🙏 Acknowledgments

👨‍💻 Author

Nilesh Kumar


Built with ❤️ by Nilesh Kumar • Production Ready 🚀

