A web-based exploratory search system leveraging CLIP (Contrastive Language-Image Pre-training) models for enhanced discovery of digital collections, including maps, photographs, and born-digital documents.
This repository accompanies our paper describing the Digital Collections Explorer, available at: https://arxiv.org/abs/2507.00961.
We present Digital Collections Explorer, a web-based, open-source exploratory search platform that leverages CLIP (Contrastive Language-Image Pre-training) for enhanced visual discovery of digital collections. Our Digital Collections Explorer can be installed locally and configured to run on a visual collection of interest on disk in just a few steps. Building upon recent advances in multimodal search techniques, our interface enables natural language queries and reverse image searches over digital collections with visual features. An overview of our system can be seen in the image above.
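As a minimal illustration of the CLIP-based retrieval at the heart of the system, the sketch below embeds a natural language query and a set of images into a shared vector space and ranks the images by similarity to the query. This is a simplified, hypothetical example using a stock Hugging Face checkpoint and placeholder file names, not the project's actual retrieval code:

```python
# Minimal sketch of CLIP-based retrieval: embed a text query and images into
# a shared space, then rank the images by similarity to the query.
# NOTE: illustrative only; not the Digital Collections Explorer's own code.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paths = ["map_001.png", "photo_002.jpg"]  # placeholder file names
images = [Image.open(p).convert("RGB") for p in paths]
inputs = processor(text=["a 19th-century railroad map"], images=images,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# logits_per_text[0] holds the query's similarity score for each image.
ranking = out.logits_per_text[0].argsort(descending=True)
for idx in ranking:
    print(paths[int(idx)])
```

Reverse image search works the same way, except the query is an image embedding rather than a text embedding.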
- Multimodal search capabilities using both text and image inputs
- Support for various digital collection types:
- Historical maps
- Photographs
- Born-digital documents
- Fine-tuned CLIP models for improved accuracy (coming soon)
- User-friendly web interface for exploration
- Python 3.8+
- Node.js 14+
- Git
- Docker (optional, for containerized deployment)
git clone https://github.com/hinxcode/digital-collections-explorer.git
cd digital-collections-explorer
npm install
npm run setup -- --type=photographs

Available collection types:
- `photographs`: For photo collections and image archives
- `maps`: For map collections
- `documents`: For born-digital document collections
This will configure the project for your specific collection type and build the frontend.
# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Add your images to the directory configured as `raw_data_dir` (default: `data/raw`). Supported formats include JPG, JPEG, PNG, GIF, BMP, TIFF, and WebP. Images in subdirectories are also retrieved recursively.
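For reference, recursive discovery of supported image files might look like the following sketch. It is a simplification: the project's own loader and the exact `config.json` schema may differ, so treat the key name and default as assumptions.

```python
# Sketch: recursively collect supported image files from raw_data_dir.
# The "raw_data_dir" key follows the description above; the exact
# config.json schema is an assumption.
import json
from pathlib import Path

SUPPORTED = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp"}

with open("config.json") as f:
    raw_data_dir = Path(json.load(f).get("raw_data_dir", "data/raw"))

image_paths = sorted(p for p in raw_data_dir.rglob("*")
                     if p.suffix.lower() in SUPPORTED)
print(f"Found {len(image_paths)} images under {raw_data_dir}")
```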
Generate embeddings for your collection:
python -m src.models.clip.generate_embeddings

This will process all images found in `raw_data_dir` and create embeddings in `embeddings_dir` (both set in `config.json`).
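Conceptually, this step resembles the sketch below: each image is encoded with CLIP's image encoder, L2-normalized, and saved for later similarity search. This is illustrative only, not the actual `src.models.clip.generate_embeddings` module; the checkpoint and output path are assumptions.

```python
# Sketch of batch image-embedding generation with CLIP. Illustrative only;
# the project's generate_embeddings module may use a different model/format.
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_images(paths, batch_size=16):
    feats = []
    for i in range(0, len(paths), batch_size):
        batch = [Image.open(p).convert("RGB") for p in paths[i:i + batch_size]]
        inputs = processor(images=batch, return_tensors="pt")
        with torch.no_grad():
            f = model.get_image_features(**inputs)
        feats.append(f / f.norm(dim=-1, keepdim=True))  # L2-normalize
    return torch.cat(feats).numpy()

# Hypothetical output path, for illustration:
# np.save("embeddings/image_embeddings.npy", embed_images(image_paths))
```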
Start the backend server:

python -m src.backend.main

The API server will start at http://localhost:8000.
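Once the server is up, you can sanity-check it from Python. The route and query parameter below are hypothetical placeholders, not the project's documented API; the actual endpoints are defined by the backend.

```python
# Hypothetical smoke test of the running backend. The /search route and
# its "q" parameter are placeholders, NOT the project's documented API.
import requests

resp = requests.get("http://localhost:8000/search",
                    params={"q": "aerial photographs of harbors"})
print(resp.status_code, resp.headers.get("content-type"))
```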
For active development with hot-reloading:
# To enable auto-reloading of the backend server whenever code changes,
# set `api_config.debug` in `config.json` from `false` to `true`.
# Then start (or restart) the backend server from the project root:
python -m src.backend.main
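If you prefer to flip the flag programmatically rather than editing the file by hand, a one-off helper might look like this (assuming the nested `api_config.debug` key described above):

```python
# One-off helper to enable backend auto-reloading by setting
# api_config.debug in config.json (key names per the step above).
import json

with open("config.json") as f:
    cfg = json.load(f)
cfg.setdefault("api_config", {})["debug"] = True
with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```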
# Start the frontend development server
cd src/frontend/[photographs|maps|documents]
npm run dev

This will start a frontend dev server at http://localhost:5173 with hot-reloading enabled. The development server will automatically proxy API requests to the backend at http://localhost:8000.
If you have customized the frontend, rebuild it when you are ready to deploy your changes (Step 2 already built the frontend once, so this is only needed after frontend code changes):

npm run frontend-build

Then restart the backend server to serve the updated frontend.
Contributions are welcome! We appreciate bug fixes, new features, and documentation improvements.
- Fork and clone the repository
- Create a feature branch:
git checkout -b feature/my-change
- Set up the environment following the Quick Start guide above
- Make your changes and test locally
- Run linting:
  - Python: `black . && isort .`
  - Frontend: `npm run lint` (in the frontend directory)
- Commit with clear messages (Conventional Commits encouraged)
- Open a Pull Request
For detailed guidelines, please read CONTRIBUTING.md.
- 🐛 Report bugs using our bug report template
- ✨ Suggest features using our feature request template
- 📝 Improve documentation using our documentation template
- 💻 Submit code via Pull Requests following our PR template
This project adheres to a Code of Conduct. By participating, you are expected to uphold this code.
Mahowald, J., & Lee, B. C. G. (2024). Integrating Visual and Textual Inputs for Searching Large-Scale Map Collections with CLIP. arXiv:2410.01190 [cs.IR]. https://arxiv.org/abs/2410.01190
