Expanding to a full YFCC-100M filtered dataset #236

@landrumb

We would like to test our submission on a 100M-scale filtered dataset, and would be happy to integrate it into datasets.py if the embeddings and metadata were added to the host that the dataset is currently downloaded from. We would prepare them ourselves, but the dataset preparation file refers to an external script for generating the metadata, and we do not have the full set of CLIP descriptors.

@mdouze, the 10M subset is hosted at https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/yfcc100M/ and we are hoping for the same for the full set. Could you make the full 100M-vector dataset available there as well?
