This project provides a semantic image search application that allows users to search for images using natural language descriptions. The application leverages the CLIP (Contrastive Language–Image Pre-training) model to generate embeddings for both text and images, and uses FAISS (Facebook AI Similarity Search) for efficient similarity search.
- Project Overview
- Installation
- Usage
- Configuration
- Project Structure
- Dependencies
- Troubleshooting
- License

## Project Overview
The project consists of the following components:
- Embedding Generation: Uses the CLIP model to generate embeddings for images and text.
- Indexing: Creates a FAISS index for efficient similarity search.
- Search Interface: A Gradio-based web interface for searching images using text queries.
The workflow is as follows:
- Generate embeddings for all images in a directory.
- Create a FAISS index using the embeddings.
- Use the index to search for images based on text queries.
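
Under the hood, this workflow boils down to normalized vectors plus a nearest-neighbor lookup. The sketch below illustrates the mechanics with random vectors standing in for CLIP embeddings and a brute-force inner product standing in for FAISS (`faiss.IndexFlatIP` would produce the same ranking on this data):

```python
import numpy as np

# Illustration only: random vectors stand in for CLIP embeddings.
# In the real project they come from the sentence-transformers CLIP model,
# and a FAISS index replaces the brute-force search below.
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(100, 512)).astype("float32")

# Normalize so that inner product == cosine similarity.
image_embeddings /= np.linalg.norm(image_embeddings, axis=1, keepdims=True)

query = rng.normal(size=512).astype("float32")
query /= np.linalg.norm(query)

# Brute-force top-k search, equivalent to faiss.IndexFlatIP on this data.
k = 5
scores = image_embeddings @ query
top_k = np.argsort(-scores)[:k]  # indices of the k most similar "images"
```

FAISS earns its keep at scale: for large collections it replaces this O(n) scan with optimized (and optionally approximate) index structures.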
## Installation

### Prerequisites

- Python 3.9 or higher
- pip (Python package manager)

### Steps

1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/semantic-image-search.git
   cd semantic-image-search
   ```

2. Create a virtual environment (optional but recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

3. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

### Step 1: Generate Embeddings and Create the Index

Before running the application, you need to generate embeddings and create a FAISS index for your images.

1. Place your images in a directory (e.g., `dataset/images`).
2. Run the `create_embeddings.py` script:

   ```bash
   python create_embeddings.py --image_dir /path/to/your/image/directory
   ```

   Replace `/path/to/your/image/directory` with the path to your image directory. This script will:

   - Generate embeddings for all images in the directory.
   - Create a FAISS index and save it to the `dataset/embeddings` directory (as specified in `config.py`).

> Note: Ensure the `dataset/embeddings` directory exists, or update the `Config.DATA_DIR` path in `config.py`.
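
Before any embedding happens, the script has to enumerate the images with supported extensions. A hypothetical helper for that first step (the function name is illustrative, not taken from the actual source; the extension set mirrors the defaults listed in Troubleshooting):

```python
from pathlib import Path

# Extensions assumed from this README; the real list lives in config.py
# as IMAGE_EXTENSIONS.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}

def collect_images(image_dir: str) -> list[str]:
    """Return sorted paths of all supported images under image_dir."""
    root = Path(image_dir)
    return sorted(
        str(p) for p in root.rglob("*")
        if p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Matching on `p.suffix.lower()` keeps the scan case-insensitive, so `photo.JPG` is picked up alongside `photo.jpg`.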
### Step 2: Run the Application

Once the embeddings and index are created, you can start the application.

1. Run the `app.py` script:

   ```bash
   python app.py
   ```

2. Open your browser and navigate to the URL shown in the terminal (usually `http://127.0.0.1:7860`).
3. Use the interface to:
   - Enter a text query (e.g., "a sunny beach scene").
   - Adjust the number of results using the slider.
   - Click the "Search" button to view the results.
## Configuration

The project's configuration is managed in the `config.py` file. Key settings include:

- `IMAGE_EXTENSIONS`: Supported image file extensions.
- `MODEL_NAME`: The CLIP model to use (default: `clip-ViT-B-32`).
- `DEVICE`: The device to use for inference (`cpu` or `cuda`).
- `DATA_DIR`: Directory where embeddings and index files are saved.
- `INDEX_PATH`, `EMBEDDINGS_PATH`, `PATHS_FILE`: Paths to the FAISS index, embeddings, and image paths files.

Update these settings as needed.
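
For orientation, a plausible shape for `config.py` based on the settings listed above. This is a sketch, not the actual file: the `embeddings.npy` filename in particular is an assumption.

```python
import os

class Config:
    # Values mirror the defaults described in this README;
    # the real config.py may differ.
    IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp")
    MODEL_NAME = "clip-ViT-B-32"
    DEVICE = "cpu"  # or "cuda"
    DATA_DIR = os.path.join("dataset", "embeddings")
    INDEX_PATH = os.path.join(DATA_DIR, "image_index.faiss")
    EMBEDDINGS_PATH = os.path.join(DATA_DIR, "embeddings.npy")  # filename assumed
    PATHS_FILE = os.path.join(DATA_DIR, "image_paths.npy")
```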
## Project Structure

```
semantic-image-search/
├── create_embeddings.py   # Script to generate embeddings and create the FAISS index
├── app.py                 # Main application file (Gradio interface)
├── config.py              # Configuration settings
├── test_app.py            # Test suite
├── utils/
│   ├── embeddings.py      # Embedding generation using CLIP
│   └── indexer.py         # FAISS index creation and search
├── dataset/               # Directory for images and embeddings
│   ├── sample_images/     # Place your sample images here
│   └── embeddings/        # Embeddings and index files are saved here
├── requirements.txt       # List of dependencies
└── README.md              # This file
```
## Dependencies

The project relies on the following Python libraries:

- `torch` (PyTorch)
- `sentence-transformers` (for the CLIP model)
- `faiss-cpu` or `faiss-gpu` (for similarity search)
- `gradio` (for the web interface)
- `Pillow` (for image processing)
- `numpy` (for numerical operations)

Install them using:

```bash
pip install -r requirements.txt
```
## Troubleshooting

- CUDA/GPU Issues:
  - If you encounter CUDA-related errors, ensure you have the correct version of PyTorch installed for your GPU.
  - Alternatively, set `DEVICE = "cpu"` in `config.py`.
- File Not Found Errors:
  - Ensure the `dataset/embeddings` directory exists, or update `Config.DATA_DIR` in `config.py`.
- Gradio Interface Not Loading:
  - Ensure the application is running and accessible at `http://127.0.0.1:7860`.
  - Check for port conflicts or firewall settings.
- No Results Found:
  - Ensure the image directory contains valid images with supported extensions (`.png`, `.jpg`, `.jpeg`, `.webp`).
## Architecture

### Inputs

- Image Directory (Dataset): Contains the images to be indexed and searched.
- Text Query (User query): The user's natural language input for searching images.

### Components

- Gradio Interface (UI): The web-based UI for user interaction.
- Embedding Generator (CLIP Model):
  - Converts images and text into high-dimensional vectors (embeddings).
  - Ensures embeddings are normalized for efficient similarity comparison.
- FAISS Index (Indexer):
  - Stores image embeddings in a searchable index.
  - Enables fast retrieval of similar embeddings using L2 distance or cosine similarity.
- Search Engine:
  - Coordinates the search process by integrating the embedding generator and the FAISS index.
  - Retrieves the top `k` most similar images for a given query.

### Stored Artifacts

- FAISS Index File (`image_index.faiss`): Stores the indexed embeddings for fast similarity search.
- Image Paths File (`image_paths.npy`): Maps embeddings to their corresponding image paths for retrieval.
- Logs: Optional logging for debugging and monitoring system performance.
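
The normalization point above is what lets an inner-product index rank by cosine similarity. A quick NumPy check of that equivalence:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=512)
b = rng.normal(size=512)

# Cosine similarity computed from the raw vectors.
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# After normalizing each vector to unit length, a plain inner product
# gives the same number -- which is why an inner-product FAISS index
# (e.g. faiss.IndexFlatIP) can rank by cosine similarity.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
inner = a_n @ b_n

print(abs(cosine - inner) < 1e-12)  # True
```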
### Outputs

- Search Results: The top `k` images most similar to the query are displayed in the Gradio interface.
- Status Messages: Provide feedback to the user (e.g., "Found 5 results for query: 'a sunny beach scene'").
## Data Flow

### Indexing Phase

- Input:
  - Images are read from the image directory.
- Processing:
  - Each image is passed to the Embedding Generator to create a vectorized embedding.
  - Embeddings are stored in the FAISS index.
  - Image paths are saved in the image paths file.
- Output:
  - The FAISS index and image paths are saved to disk for later use.
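
A minimal sketch of that output step, using NumPy in place of the real FAISS call (`faiss.write_index` in the actual project). The key invariant is that row `i` of the index pairs with `image_paths[i]`:

```python
import tempfile
from pathlib import Path
import numpy as np

# Fake data: 4 "image embeddings" and their source paths (illustrative names).
embeddings = np.eye(4, dtype="float32")
image_paths = np.array(["img0.jpg", "img1.jpg", "img2.jpg", "img3.jpg"])

with tempfile.TemporaryDirectory() as data_dir:
    # The positional pairing between index rows and this array is the only
    # contract linking the two files on disk.
    np.save(Path(data_dir) / "image_paths.npy", image_paths)
    # In the real project, the index is persisted alongside it:
    # faiss.write_index(index, str(Path(data_dir) / "image_index.faiss"))
    restored = np.load(Path(data_dir) / "image_paths.npy")

print(list(restored) == list(image_paths))  # True
```

Because the mapping is purely positional, the index and paths files must always be regenerated together; editing one without the other silently misaligns results.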
### Search Phase

- Input:
  - The user submits a text query through the Gradio interface.
- Processing:
  - The query is converted into an embedding using the Embedding Generator.
  - The Search Engine queries the FAISS index for the most similar image embeddings.
  - The corresponding image paths are retrieved from the image paths file.
- Output:
  - The matching images are loaded and displayed in the Gradio interface.
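
The search phase can be sketched as follows, with random vectors standing in for CLIP embeddings and `np.argsort` standing in for `faiss.Index.search`, which returns an analogous scores/indices pair (all names here are illustrative, not from the actual source):

```python
import numpy as np

# Fake corpus: 10 normalized "image embeddings" and their paths.
rng = np.random.default_rng(2)
image_embeddings = rng.normal(size=(10, 64)).astype("float32")
image_embeddings /= np.linalg.norm(image_embeddings, axis=1, keepdims=True)
image_paths = [f"dataset/images/{i:03d}.jpg" for i in range(10)]

def search(query_embedding: np.ndarray, k: int = 3) -> list[tuple[str, float]]:
    """Return the k best (path, score) pairs for a query embedding."""
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = image_embeddings @ q          # cosine similarity per image
    top = np.argsort(-scores)[:k]          # best k indices, descending
    return [(image_paths[i], float(scores[i])) for i in top]

results = search(rng.normal(size=64).astype("float32"))
```

The final lookup `image_paths[i]` is the whole purpose of `image_paths.npy`: FAISS only returns row indices, and the paths file maps them back to files on disk.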
## Example Workflow

1. User runs `create_embeddings.py` with `--image_dir dataset/images`.
2. The system:
   - Processes images and generates embeddings.
   - Creates a FAISS index and saves it to `dataset/embeddings/image_index.faiss`.
   - Saves image paths to `dataset/embeddings/image_paths.npy`.
3. User runs `app.py` to launch the Gradio interface.
4. User enters a query (e.g., "a sunny beach scene") and selects the number of results.
5. The system:
   - Converts the query into an embedding.
   - Searches the FAISS index for similar images.
   - Retrieves and displays the top `k` images.