This project is a multi-modal product search engine that lets users enter a text query or an image query and retrieves the top matching fashion product images from a dataset using deep learning and vector similarity.
It uses CLIP (Contrastive Language–Image Pre-training) for creating embeddings from both images and text, and FAISS for efficient nearest neighbor search.
Real-world applications:
- E-Commerce Product Search
- Visual Search (Search by Image or Text)
- Digital Asset Management (DAM)
- Content Moderation & Compliance
- Game Asset or 3D Model Search
We use a subset of the Fashion Product Images (Small) dataset from Kaggle, consisting of:
- ~15,000 product images
- styles.csv: CSV file mapping each product to its metadata
- styles/: folder containing per-image JSON files with category, subcategory, and description fields
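Loading the metadata might look like the sketch below. The two sample rows and the column names (id, masterCategory, subCategory, productDisplayName) are assumptions about the Kaggle file's shape; real code would simply call pd.read_csv("styles.csv").

```python
import io
import pandas as pd

# Hypothetical two-row sample in the shape of styles.csv; the real
# dataset is read directly from disk. Column names are assumptions.
sample = io.StringIO(
    "id,masterCategory,subCategory,productDisplayName\n"
    "15970,Apparel,Topwear,Turtle Check Men Navy Blue Shirt\n"
    "39386,Apparel,Bottomwear,Peter England Men Party Blue Jeans\n"
)
df = pd.read_csv(sample)

# Derive the image filename for each product so embeddings can be
# joined back to their metadata after indexing.
df["image_path"] = df["id"].astype(str) + ".jpg"
print(df[["image_path", "masterCategory"]].iloc[0].tolist())
# -> ['15970.jpg', 'Apparel']
```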


