@NetZissou
Contributor

Gradio web interface for the BioCLIP Vector Database image search.

See docs/interface.md for documentation.

Image handling
- parse uploaded image to PIL
- embed PIL object to vector (placeholder dummy function)
- retrieve images by UUID using a UUID -> (local file path, remote URI) mapping

Python functions to interact with DB server
- check server health -> Boolean
- search by providing vector, top_n, nprobe -> tabular result (UUID, distance)
- Removed db interface, replaced it with client
- Added scripts to handle image retrieval for HDF5
- Implemented a Gradio interface to search using the vector db server
- Updated the interface documentation with instructions to initialize
  the app
- Updated dependencies for this web interface
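
As a rough illustration of the two client functions listed above (health check returning a Boolean, and search by vector, `top_n`, and `nprobe` returning `(UUID, distance)` rows), here is a minimal sketch using only the standard library. The route names `/health` and `/search`, the JSON field names, and the `transport` parameter are assumptions made for this example; the actual client in this PR may differ.

```python
import json
from typing import Callable, List, Optional, Sequence, Tuple
from urllib import request

# Hypothetical routes: the real server's paths are not shown in this PR.
HEALTH_PATH = "/health"
SEARCH_PATH = "/search"


def _http_json(url: str, payload: Optional[dict] = None, timeout: float = 5.0) -> dict:
    """GET when payload is None, otherwise POST the payload as JSON; decode the JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_health(base_url: str, transport: Callable[..., dict] = _http_json) -> bool:
    """Return True if the health route answers with JSON, False on any error."""
    try:
        transport(base_url + HEALTH_PATH)
        return True
    except Exception:
        return False


def search(
    base_url: str,
    vector: Sequence[float],
    top_n: int = 10,
    nprobe: int = 1,
    transport: Callable[..., dict] = _http_json,
) -> List[Tuple[str, float]]:
    """Send a query vector and return (uuid, distance) rows in server order."""
    payload = {"vector": [float(x) for x in vector], "top_n": top_n, "nprobe": nprobe}
    reply = transport(base_url + SEARCH_PATH, payload)
    # Assumed reply shape: {"results": [{"uuid": ..., "distance": ...}, ...]}
    return [(str(r["uuid"]), float(r["distance"])) for r in reply["results"]]
```

Passing the transport as a parameter keeps the functions testable without a running server; a Gradio callback could call `check_health` first and only then call `search`.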
@NetZissou NetZissou requested a review from smenon8 November 10, 2025 15:26
@NetZissou NetZissou marked this pull request as draft November 12, 2025 14:51
@NetZissou
Contributor Author

Here's the updated Gradio interface, with export disabled and with modified parameter names and short descriptions.

[Image: updated Gradio interface]

NetZissou and others added 8 commits December 11, 2025 16:35
- `spark_stratified_sampler.py`: draw a stratified sample of the data for
  leader index training
- `train_leader.py`: train a FAISS leader index with cuVS enabled
- `create_lookup.py`: create partition manifest and ID-mapped lookup
  tables from source Parquets
- `train_index.py`: add data to leader index to build the complete FAISS
  index for the vector dataset
- `convert_to_sqlite.py`: ingest Parquet lookup tables into a SQLite
  database and build an index to improve search performance
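
To make the last step above concrete, here is a minimal sketch of loading lookup rows into SQLite, indexing them, and resolving FAISS integer IDs back to metadata. The table name `lookup` and the columns `faiss_id`, `uuid`, and `source_file` are hypothetical, and the rows are passed in as plain tuples rather than read from Parquet, so this is not the actual `convert_to_sqlite.py`.

```python
import sqlite3
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[int, str, str]  # (faiss_id, uuid, source_file): assumed schema


def build_lookup(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Create the lookup table, bulk-insert rows, and index the uuid column."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lookup ("
        "faiss_id INTEGER PRIMARY KEY, uuid TEXT NOT NULL, source_file TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO lookup VALUES (?, ?, ?)", rows)
    # faiss_id is already indexed as the primary key; add a second index for uuid lookups.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_lookup_uuid ON lookup(uuid)")
    conn.commit()


def lookup_ids(conn: sqlite3.Connection, ids: Sequence[int]) -> List[Row]:
    """Return metadata rows for the given IDs, in the order the IDs were given."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    found = {
        row[0]: row
        for row in conn.execute(
            f"SELECT faiss_id, uuid, source_file FROM lookup WHERE faiss_id IN ({placeholders})",
            list(ids),
        )
    }
    # Preserve the ranking from the index search; silently skip unknown IDs.
    return [found[i] for i in ids if i in found]
```

Reordering in Python matters because an `IN (...)` query does not return rows in the order of the search ranking.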
added a script to merge FAISS index partitions into a single monolithic index
added a conda environment config file, as FAISS installation is best
supported through conda channels
Mocks the server from @smenon8, but uses a monolithic approach: the
merged index is loaded in memory to enable fast search. SQLite maps
FAISS integer IDs to metadata.
added a script that starts an image retrieval server, which takes UUIDs
from the client and returns search results from h5 files on disk. Uses SQLite
with a pre-built index on `uuid` to speed up identifying the right h5 files.
Enabled multi-threading to parallelize the h5 I/O.
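
The retrieval pattern described above (resolve each UUID to its file, then read the files in parallel) might look roughly like the sketch below. It is a simplified stand-in: `uuid_to_file` is a plain mapping in place of the SQLite lookup, and `read_file` is a hypothetical callable in place of the real h5 reader, so no h5 library is needed to run it.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Mapping, Sequence

# Hypothetical reader signature: (file_path, uuids_in_that_file) -> {uuid: image_bytes}
Reader = Callable[[str, List[str]], Dict[str, bytes]]


def fetch_images(
    uuids: Sequence[str],
    uuid_to_file: Mapping[str, str],
    read_file: Reader,
    max_workers: int = 8,
) -> Dict[str, bytes]:
    """Group UUIDs by file so each file is opened once, then read files concurrently."""
    by_file: Dict[str, List[str]] = defaultdict(list)
    for u in uuids:
        path = uuid_to_file.get(u)
        if path is not None:  # UUIDs without a known file are skipped
            by_file[path].append(u)

    results: Dict[str, bytes] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per file; pool.map yields each file's {uuid: bytes} dict.
        for chunk in pool.map(lambda item: read_file(item[0], item[1]), by_file.items()):
            results.update(chunk)

    # Return entries in the order the client asked for them.
    return {u: results[u] for u in uuids if u in results}
```

Grouping by file before dispatching avoids reopening the same file for every UUID. How much the threads help depends on how I/O-bound the real reader is.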
- added a new front-end application server that communicates only with the
  FAISS index search server and the image retrieval server
- added CSS as a temporary fix for the scrollbar [issue](gradio-app/gradio#10033) in Gradio
- fixed a bug in `image.py` to ensure the embedding vector and the model are on
  the same device (both CPU or both GPU)
- added a Slurm job script for quick deployment
- added a guide on how to set up each server