Katherine L. Mentzer and Andrea Montanari
Accepted to KDD '24
Scaling Training Data with Lossy Image Compression addresses how to handle large computer vision datasets when storage is limited. Storing massive datasets can be costly and challenging, but there are options to optimize storage with lossy image compression. We explore the balance between keeping fewer, full-resolution images and keeping more, lossily-compressed images. Our findings reveal that lossy compression can improve performance in the storage-limited regime. Understanding the relationship between test error, number of images, and compression levels allows us to optimize storage and improve model performance. This approach is valuable for anyone facing storage limitations. Learn more in the full paper.
-
Clone the repository:
git clone https://github.com/yourusername/repo-name.git cd repo-name -
Create the main environment:
conda env create -f environment.yaml
-
Create the MMSegmentation and MMDetection environment:
conda env create -f environment_mm.yaml
See the official MMSegmentation Installation Instructions and MMDetection Installation Instructions for more information on how to install the necessary packages.
The scripts are divided into classification, segmentation, detection, and theory for each set of experiments. Scripts with the prefix train are used for model training. The paper_plots.ipynb file shows the creation of each of the figures in the paper.
For any questions or inquiries, please contact Katherine Mentzer at [email protected].