Scaling Training Data with Lossy Image Compression

Katherine L. Mentzer and Andrea Montanari

Accepted to KDD '24

Summary

This work addresses how to handle large computer vision datasets when storage is limited. Storing massive datasets is costly, and lossy image compression is one way to reduce that cost. We study the trade-off between keeping fewer full-resolution images and keeping more lossily compressed images. Our findings show that lossy compression can improve model performance in the storage-limited regime. Understanding how test error depends on the number of images and the compression level lets us choose how to spend a fixed storage budget and improve model performance. This approach is useful for anyone facing storage limitations. Learn more in the full paper.
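To make the trade-off concrete, here is a minimal sketch of how one might pick a compression level under a fixed storage budget. It is an illustration only, not the code in this repository: the error model `a * n^-alpha + b * L^-beta + c`, every constant in it, and the function names `assumed_test_error` and `best_compression_level` are assumptions made for this example. See the full paper for the fitted relationship.

```python
def assumed_test_error(n_images, bytes_per_image,
                       a=5.0, alpha=0.5, b=50.0, beta=1.0, c=0.05):
    """Hypothetical error model (an assumption for this sketch).

    Error falls as a power law in the number of images and in the
    per-image size, down to an irreducible floor c.
    """
    return a * n_images ** -alpha + b * bytes_per_image ** -beta + c


def best_compression_level(storage_budget_bytes, candidate_sizes):
    """Return (bytes_per_image, n_images, error) minimizing the assumed error.

    Each candidate size stands for one compression level. A smaller size
    means stronger compression and therefore more images in the budget.
    """
    best = None
    for size in candidate_sizes:
        n_images = storage_budget_bytes // size  # images that fit in the budget
        if n_images < 1:
            continue  # this level does not fit even a single image
        err = assumed_test_error(n_images, size)
        if best is None or err < best[2]:
            best = (size, n_images, err)
    if best is None:
        raise ValueError("storage budget is smaller than every candidate size")
    return best


if __name__ == "__main__":
    # 1 GB budget, four hypothetical per-image sizes in bytes.
    size, n_images, err = best_compression_level(
        10**9, [10**3, 10**4, 10**5, 10**6])
    print(f"bytes/image={size}, images={n_images}, assumed error={err:.4f}")
```

With these made-up constants, an intermediate level wins: the most aggressive compression loses too much per-image quality, while the least aggressive leaves too few images in the budget.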

Installation

Clone the repository:

git clone https://github.com/yourusername/repo-name.git
cd repo-name

Create the main environment:
```
conda env create -f environment.yaml
```
Create the MMSegmentation and MMDetection environment:
```
conda env create -f environment_mm.yaml
```
See the official MMSegmentation Installation Instructions and MMDetection Installation Instructions for more information on how to install the necessary packages.

Usage

The scripts are organized by experiment into the classification, semantic_segmentation, object_detection, and theory directories. Scripts with the prefix train are used for model training. The paper_plots.ipynb notebook shows how each figure in the paper was created.

Datasets

Contact

For any questions or inquiries, please contact Katherine Mentzer at [email protected].

Repository: granica-ai/LossyCompressionScalingKDD2024
