
Scaling Training Data with Lossy Image Compression

Katherine L. Mentzer and Andrea Montanari

Accepted to KDD '24

arXiv Preprint
Promo Video

Summary

Scaling Training Data with Lossy Image Compression addresses how to handle large computer vision datasets when storage is limited. Storing massive datasets can be costly and challenging, but lossy image compression offers a way to make better use of a fixed storage budget. We explore the trade-off between keeping fewer full-resolution images and keeping more lossily compressed images. Our findings show that lossy compression can improve performance in the storage-limited regime. Modeling the relationship between test error, the number of images, and the compression level lets us choose how to spend the storage budget and improve model performance. This approach is valuable for anyone facing storage limitations. Learn more in the full paper.
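As a quick, self-contained illustration of the trade-off (not part of the repository; assumes ImageMagick is installed and a local image named sample.png is available), re-encoding a single image at decreasing JPEG quality shows how much more data fits in the same storage budget:

    # Re-encode one image at several JPEG quality levels and compare file sizes;
    # lower quality means smaller files, so more images fit in a fixed budget.
    for q in 95 75 50 25; do
        convert sample.png -quality $q sample_q${q}.jpg
    done
    ls -l sample_q*.jpg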

Installation

  1. Clone the repository:

    git clone https://github.com/granica-ai/LossyCompressionScalingKDD2024.git
    cd LossyCompressionScalingKDD2024
  2. Create the main environment:

    conda env create -f environment.yaml
  3. Create the MMSegmentation and MMDetection environment:

    conda env create -f environment_mm.yaml

    See the official MMSegmentation Installation Instructions and MMDetection Installation Instructions for more information on how to install the necessary packages. A sketch of activating both environments follows this list.
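Before running any scripts, activate the corresponding environment. The environment names are set by the name: field in each YAML file; the placeholders below stand in for whatever names those files define:

    conda activate <name-from-environment.yaml>       # main environment
    conda activate <name-from-environment_mm.yaml>    # MMSegmentation / MMDetection environment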

Usage

The scripts are organized into classification, segmentation, detection, and theory, one group per set of experiments. Scripts prefixed with train are used for model training. The paper_plots.ipynb notebook generates each of the figures in the paper.
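A minimal sketch of a typical workflow, assuming the experiment groups are top-level directories and using a hypothetical script name (check each directory for the actual train scripts and their arguments):

    # Hypothetical example: the training script name and flags are illustrative only.
    cd classification
    python train_<model>.py --help          # inspect the options a train script accepts

    # Open the notebook that generates the paper figures (from the repository root).
    jupyter notebook paper_plots.ipynb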

Datasets

Contact

For any questions or inquiries, please contact Katherine Mentzer at [email protected].
