An AI-powered image search application that combines text and visual similarity to help you find the right images. This application uses advanced deep learning models (CLIP) to understand both the visual content and textual descriptions of your images.
The application has been refactored into a separate backend and frontend:
- Backend: FastAPI with ChromaDB for vector storage and similarity search
- Frontend: Next.js with React, TypeScript, and Tailwind CSS
Features:
- Multimodal Search: Find images using text, visual similarity, or a combination of both
- Image Management: Upload, browse, and manage your image collection
- Custom Metadata: Add and edit descriptions and custom metadata for each image
- Background Removal: Optionally remove image backgrounds during upload
- Filter System: Create and apply filters to organize your image collection
- Automatic Image Captioning: AI-generated image descriptions for better searchability
- Batch Upload: Process multiple images at once with real-time progress tracking
- Duplicate Detection: Automatically identify and handle duplicate images
- Advanced Progress Tracking: Detailed visibility into processing steps, including filter application
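To make the "combination of both" in multimodal search concrete, here is a minimal sketch of one common way to blend a text embedding and an image embedding into a single query vector and rank stored images by cosine similarity. It is not taken from this project's code: the function names, the `text_weight` parameter, and the toy 3-dimensional vectors are all assumptions for illustration, and the real backend may combine the two signals differently.

```python
import math
from typing import Dict, List, Sequence


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


def combine_embeddings(text_vec: Sequence[float],
                       image_vec: Sequence[float],
                       text_weight: float = 0.5) -> List[float]:
    """Blend text and image embeddings into one unit-length query vector.

    text_weight is a hypothetical knob: 1.0 means text only, 0.0 image only.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    t, i = normalize(text_vec), normalize(image_vec)
    blended = [text_weight * x + (1.0 - text_weight) * y for x, y in zip(t, i)]
    return normalize(blended)


# Toy vectors standing in for real CLIP embeddings.
query = combine_embeddings([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], text_weight=0.5)
stored: Dict[str, List[float]] = {
    "img_a": [1.0, 1.0, 0.0],
    "img_b": [0.0, 0.0, 1.0],
}
ranked = sorted(stored, key=lambda k: cosine_similarity(query, stored[k]), reverse=True)
print(ranked)
```

In the running application the vector store performs the nearest-neighbor ranking, so only the blending step would be relevant there.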
Prerequisites:
- Python 3.8+ for the backend
- Node.js 18+ for the frontend
- Git
- PyTorch 2.0+ (compatible with PyTorch 2.6 using the weights_only=False parameter)
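If you want to check the prerequisites before installing, a small script along these lines may help. It is a sketch, not part of the repository: the helper names are hypothetical, and it only confirms that `node` and `git` are on the PATH without verifying the Node.js version.

```python
import shutil
import sys
from typing import Dict, Sequence, Tuple


def python_is_supported(version: Tuple[int, int] = sys.version_info[:2],
                        minimum: Tuple[int, int] = (3, 8)) -> bool:
    """Return True if the given (major, minor) version meets the minimum."""
    return tuple(version) >= minimum


def find_tools(names: Sequence[str] = ("node", "git")) -> Dict[str, bool]:
    """Report which command-line tools can be found on the PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == "__main__":
    print("Python 3.8+:", python_is_supported())
    for tool, found in find_tools().items():
        print(f"{tool}:", "found" if found else "missing")
```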
Installation:
- Clone the repository

      git clone https://github.com/yourusername/multimodal-image-similarity-search.git
      cd multimodal-image-similarity-search

- Set up the backend

      cd backend
      pip install -r requirements.txt

- Set up the frontend

      cd frontend
      npm install

Configuration:
- Create or edit the .env file in the backend directory with the following variables:

      COLLECTION_NAME=image-match
      CHROMA_PERSIST_DIR=chroma_data

- Create a .env.local file in the frontend directory with:

      NEXT_PUBLIC_API_URL=http://localhost:8000

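As an illustration of how the backend variables could be consumed, the sketch below parses `KEY=VALUE` text and resolves the two settings with fallbacks. How the backend actually loads its `.env` file is not documented here, so the function names are hypothetical, and the fallback values simply mirror the example values above rather than confirmed defaults.

```python
import os
from typing import Dict, Mapping


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_backend_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Resolve backend settings; fallbacks are assumed, not confirmed defaults."""
    return {
        "collection_name": env.get("COLLECTION_NAME", "image-match"),
        "chroma_persist_dir": env.get("CHROMA_PERSIST_DIR", "chroma_data"),
    }


sample = "COLLECTION_NAME=my-images\n# comment\nCHROMA_PERSIST_DIR=/tmp/chroma\n"
settings = load_backend_settings(parse_env_text(sample))
print(settings)
```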
Running the application:
- Start the backend (from the backend directory)

      python run.py

  The API will be available at http://localhost:8000

- Start the frontend development server (from the frontend directory)

      npm run dev

  The frontend will be available at http://localhost:3000

The application supports uploading and processing multiple images at once with real-time progress tracking:
- Select multiple files for upload
- See detailed progress for each file being processed
- Monitor filter application in real-time
- Automatic handling of duplicate images
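The duplicate handling in a batch can be pictured with the following sketch, which flags files whose bytes are identical by comparing SHA-256 digests. This is an assumption made for illustration: the application may instead rely on perceptual hashes or embedding distance, and the function names here are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def split_duplicates(
    files: Iterable[Tuple[str, bytes]]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Separate a batch into unique files and (duplicate, original) pairs."""
    seen: Dict[str, str] = {}  # digest -> first filename with that content
    unique: List[str] = []
    duplicates: List[Tuple[str, str]] = []
    for name, data in files:
        digest = content_hash(data)
        if digest in seen:
            duplicates.append((name, seen[digest]))
        else:
            seen[digest] = name
            unique.append(name)
    return unique, duplicates


batch = [("a.png", b"one"), ("b.png", b"two"), ("c.png", b"one")]
unique, duplicates = split_duplicates(batch)
print(unique, duplicates)
```

Exact-byte hashing only catches identical files; a resized or re-encoded copy of the same picture would pass through as unique.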
Create and apply custom filters to organize your image collection:
- Add new filters with natural language queries
- Track filter processing with a progress indicator
- Apply filters to all existing images
- Search images using filters
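One way to think about applying a filter to existing images while reporting progress is sketched below. It assumes each image already has a similarity score against the filter's natural language query and that a fixed threshold decides membership. Both the threshold value and the `apply_filter` helper are hypothetical, not the project's actual logic.

```python
from typing import Callable, Dict, List, Optional


def apply_filter(scores: Dict[str, float],
                 threshold: float = 0.25,
                 on_progress: Optional[Callable[[int, int], None]] = None) -> List[str]:
    """Return image IDs whose score meets the threshold, reporting progress.

    scores maps image ID -> similarity to the filter query (assumed precomputed).
    on_progress receives (images_done, images_total) after each image.
    """
    matched: List[str] = []
    total = len(scores)
    for done, (image_id, score) in enumerate(scores.items(), start=1):
        if score >= threshold:
            matched.append(image_id)
        if on_progress is not None:
            on_progress(done, total)
    return matched


progress_log = []
matches = apply_filter(
    {"img_a": 0.31, "img_b": 0.12, "img_c": 0.27},
    threshold=0.25,
    on_progress=lambda done, total: progress_log.append((done, total)),
)
print(matches, progress_log)
```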
The backend provides the following REST API endpoints:
- POST /api/upload: Upload and process an image
- POST /api/upload-folder: Batch upload multiple images
- POST /api/search/image: Search by image similarity
- GET /api/search/text: Search by text description
- POST /api/search/multimodal: Search using both image and text
- GET /api/images: Get all images in the collection
- GET /api/filters: Get all saved filters
- POST /api/filters: Add a new filter
- DELETE /api/filters/{filter_query}: Delete a filter
- PUT /api/metadata/{image_id}: Update image metadata
- POST /api/reset: Reset the system (clear all data)
- GET /api/filter-progress: Check the progress of filter processing
- GET /api/image/{image_id}: Get a specific image by ID
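A client can reach these endpoints with nothing but the Python standard library. The sketch below builds requests for text search and filter deletion. The `query` parameter name and the assumption that responses are JSON are guesses, since the request and response shapes are not documented here; check the FastAPI docs page of your running backend for the real ones.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

API_URL = "http://localhost:8000"


def text_search_url(query: str, base: str = API_URL) -> str:
    """Build the GET /api/search/text URL ("query" is an assumed parameter name)."""
    params = urlencode({"query": query})
    return f"{base}/api/search/text?{params}"


def delete_filter_request(filter_query: str, base: str = API_URL) -> Request:
    """Build a DELETE request, percent-encoding the filter text in the path."""
    encoded = quote(filter_query, safe="")
    return Request(f"{base}/api/filters/{encoded}", method="DELETE")


def fetch_json(request_or_url):
    """Send the request and decode the body, assuming a JSON response."""
    with urlopen(request_or_url) as response:
        return json.loads(response.read().decode("utf-8"))


print(text_search_url("red car"))
print(delete_filter_request("outdoor photos").full_url)
```

With the backend running, `fetch_json(text_search_url("red car"))` would send the search. Percent-encoding matters for `DELETE /api/filters/{filter_query}` because filter queries are natural language and may contain spaces or slashes.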
Backend technologies:
- FastAPI - High-performance API framework
- ChromaDB - Vector database for similarity search
- PyTorch - Deep learning framework
- CLIP - Multimodal model from OpenAI
- Moondream - Image captioning model
- Rembg - Background removal
Frontend technologies:
- Next.js - React framework
- TypeScript - Type-safe JavaScript
- Tailwind CSS - Utility-first CSS framework
- Zustand - State management
- Axios - HTTP client
- React Dropzone - File upload component
This project is licensed under the MIT License - see the LICENSE file for details.