Welcome to the Image Retrieval Web Application repository! This project builds a web-based image retrieval system using the Gradio library for the frontend, with Convolutional Neural Networks (CNNs) and Transformers from the Hugging Face Transformers library for feature extraction.
- Web Interface: The application provides a user-friendly web interface powered by Gradio, allowing users to easily interact with the image retrieval system.
- Feature Extraction: The backend uses state-of-the-art CNNs and Transformers to extract meaningful feature vectors from images, enabling efficient and accurate image retrieval.
- Model Variety: The project supports various CNN and Transformer architectures, allowing users to choose the best model for their specific needs.
- In this project, we have implemented 8 models for experimentation. They are as follows:
- VisionTransformer
- BEiT
- MobileViTV2
- BiT
- EfficientFormer
- MobileNetV2
- ResNet
- EfficientNet
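As a sketch of how one of these backbones might be plugged in for feature extraction with the Hugging Face Transformers library (the checkpoint name `google/vit-base-patch16-224` and the choice of the CLS token as the image descriptor are illustrative assumptions, not necessarily what this repository uses):

```python
import torch
from transformers import AutoImageProcessor, AutoModel

def extract_features(image, model_name: str = "google/vit-base-patch16-224"):
    """Return a feature vector for a PIL image using a pretrained backbone.

    NOTE: the default checkpoint is an illustrative assumption; any of the
    architectures listed above can be substituted via ``model_name``.
    """
    processor = AutoImageProcessor.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # For ViT-style models, the first (CLS) token summarizes the whole image.
    return outputs.last_hidden_state[:, 0].squeeze(0)
```

The resulting vectors can then be compared with either of the similarity measures described below.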
The "Paris Buildings" dataset comprises a collection of high-resolution images capturing various architectural structures across Paris. The dataset is carefully annotated, providing information on building types, architectural styles, and geographic locations. Researchers and developers can leverage this dataset for tasks such as image classification, object detection, and scene understanding.
The dataset is divided into two parts, which you can download here:

The groundtruth can be downloaded here.
The "Oxford Buildings" dataset focuses on architectural diversity within the city of Oxford. Similar to the Paris dataset, it includes annotated images to facilitate research in computer vision and related fields. This dataset is particularly suitable for tasks involving cross-city analysis or comparative studies between different architectural environments.
The images in the Oxford Buildings dataset can be found here.
The groundtruth can be downloaded here.
The Euclidean distance measures the straight-line distance between two points in Euclidean space. For vectors $\mathbf{a}$ and $\mathbf{b}$ of dimension $n$:

$$d(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$
This method is suitable for scenarios where the absolute magnitude and direction of the vectors are crucial for similarity assessment.
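As a minimal sketch (not code from this repository), the Euclidean distance between two feature vectors can be computed with NumPy:

```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line (L2) distance between two feature vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# The classic 3-4-5 right triangle: distance from (0, 0) to (3, 4) is 5.
print(euclidean_distance(np.array([0.0, 0.0]), np.array([3.0, 4.0])))  # 5.0
```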
Cosine similarity calculates the cosine of the angle between two vectors in multi-dimensional space. For vectors $\mathbf{a}$ and $\mathbf{b}$:

$$\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}$$
This method is effective when the direction of vectors is more important than their magnitudes. It is widely used in text mining, document analysis, and recommendation systems.
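A corresponding NumPy sketch (again illustrative, not this repository's code):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Scaling a vector does not change its direction, so similarity stays ~1.
print(cosine_similarity(np.array([1.0, 2.0]), np.array([10.0, 20.0])))
```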
For a visual representation of the project in action, check out the Video Demo.
- Clone the repository:

  `git clone https://github.com/duongve13112002/ImageRetrieval.git`

- Install dependencies:

  `pip install -r requirements.txt`

- Run the notebook (detailed instructions on how to run are provided here):

  `jupyter notebook notebook.ipynb`

  Then visit http://localhost:7860 in your web browser to access the Image Retrieval Web Application.
- Upload Image: Use the web interface to upload an image for retrieval.
- Retrieve Similar Images: The system will extract features using the selected model and display similar images from the dataset.
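Internally, this retrieval step amounts to a nearest-neighbour search over precomputed feature vectors. A minimal NumPy sketch, using cosine similarity (function and variable names are illustrative, not this repository's API):

```python
import numpy as np

def retrieve_top_k(query: np.ndarray, database: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k database rows most similar to the query (cosine, best first)."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q                  # cosine similarity against every row
    return np.argsort(-scores)[:k]   # highest similarity first

# Toy database of four 2-D "feature vectors"; rows 2 and 0 point nearly
# the same way as the query, so they come back first.
features = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.1], [0.5, 0.5]])
print(retrieve_top_k(np.array([1.0, 0.05]), features, k=2))  # [2 0]
```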
Contributions are welcome! If you find any issues or have ideas for improvements, please create a GitHub issue or submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Gradio: Gradio Documentation
- Hugging Face Transformers: Transformers Documentation
Thank you for using and contributing to the Image Retrieval Web Application!