Removed zip compression when saving checkpoints #1210

CharlelieLrt · 2025-11-04T22:04:01Z

PhysicsNeMo Pull Request

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is it an indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

Signed-off-by: Charlelie Laurent <[email protected]>

CharlelieLrt · 2025-11-04T22:04:32Z

/blossom-ci

greptile-apps

Greptile Overview

Greptile Summary

This PR changes the checkpoint saving mechanism from zipfile.ZIP_DEFLATED to zipfile.ZIP_STORED, effectively disabling compression when creating checkpoint archives.

Key Changes:

Modified physicsnemo/models/module.py:544 to use ZIP_STORED instead of ZIP_DEFLATED
This follows the recent refactoring in #1185 that introduced zip format support

Trade-offs:

Faster I/O: Removes CPU-intensive compression/decompression operations during checkpoint save/load
Larger files: Checkpoint files will be larger on disk, though typically only modestly (torch.save does not compress tensor data, but dense floating-point weights are high-entropy and compress poorly with DEFLATE)
Better for training workflows: Training checkpoints are often written frequently, so faster save times can reduce interruptions

This is a reasonable optimization for ML workloads where checkpoint I/O speed is often more critical than storage space, especially when saving frequently during training.
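The trade-off can be sketched with the standard-library `zipfile` module alone. This is a minimal sketch, not PhysicsNeMo code: the Gaussian float32 payload is an assumed stand-in for a serialized `state_dict`, and `make_fake_weights`, `write_archive`, and `compare_modes` are hypothetical helpers. It writes the same bytes once with `ZIP_STORED` and once with `ZIP_DEFLATED` and records archive size and wall-clock time for each.

```python
import array
import os
import random
import tempfile
import time
import zipfile


def make_fake_weights(n_floats: int, seed: int = 0) -> bytes:
    """Return Gaussian float32 samples as raw bytes (assumed stand-in for model weights)."""
    rng = random.Random(seed)
    return array.array("f", (rng.gauss(0.0, 0.02) for _ in range(n_floats))).tobytes()


def write_archive(path: str, payload: bytes, compression: int) -> float:
    """Write payload as a 'model.pt' member and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    with zipfile.ZipFile(path, "w", compression) as zf:
        zf.writestr("model.pt", payload)
    return time.perf_counter() - start


def compare_modes(n_floats: int = 1_000_000) -> dict:
    """Write the same payload with ZIP_STORED and ZIP_DEFLATED and collect size/time."""
    payload = make_fake_weights(n_floats)
    results = {"payload_bytes": len(payload)}
    with tempfile.TemporaryDirectory() as tmp:
        for name, mode in (("stored", zipfile.ZIP_STORED), ("deflated", zipfile.ZIP_DEFLATED)):
            path = os.path.join(tmp, f"{name}.zip")
            seconds = write_archive(path, payload, mode)
            results[name] = {"bytes": os.path.getsize(path), "seconds": seconds}
    return results


if __name__ == "__main__":
    for key, value in compare_modes().items():
        print(key, value)
```

Absolute numbers depend on the hardware and on real weights; the sketch only illustrates the shape of the trade-off, with DEFLATE spending CPU time for what is typically a small size reduction on float data.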

Confidence Score: 5/5

This PR is safe to merge with minimal risk
The change is a simple one-line modification that switches from compressed to uncompressed zip format. The code is functionally identical otherwise - both ZIP_DEFLATED and ZIP_STORED are valid zip compression modes, and the loading logic handles both transparently. This is a performance optimization trading file size for speed, which is appropriate for ML checkpointing workflows.
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
physicsnemo/models/module.py	5/5	Changed zip compression from `ZIP_DEFLATED` to `ZIP_STORED` (no compression). This trades file size for faster save/load performance.

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Module
    participant ZipFile
    participant FileSystem

    User->>Module: save(file_name)
    Module->>Module: Create temp file
    Module->>ZipFile: ZipFile(path, "w", ZIP_STORED)
    Note over ZipFile: No compression applied
    Module->>ZipFile: writestr("model.pt", state_dict)
    Module->>ZipFile: writestr("args.json", args)
    Module->>ZipFile: writestr("metadata.json", metadata)
    ZipFile-->>Module: Archive created
    Module->>FileSystem: Upload to destination
    Module->>Module: Clean up temp file
    Module-->>User: Checkpoint saved

    User->>Module: load(file_name)
    Module->>FileSystem: Download checkpoint
    Module->>ZipFile: ZipFile(path, "r")
    Note over ZipFile: Reads uncompressed data
    ZipFile->>Module: Read "model.pt"
    Module->>Module: torch.load(model_bytes)
    Module-->>User: Model loaded
```
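The save/load flow in the diagram can be approximated with the standard library. This is a minimal sketch under stated assumptions, not the actual `Module.save`/`Module.load` code: `save_checkpoint` and `load_checkpoint` are hypothetical names, `model_bytes` stands in for the buffer that `torch.save` would produce, and a local `shutil.move` stands in for the "upload to destination" step.

```python
import json
import os
import shutil
import tempfile
import zipfile


def save_checkpoint(dest: str, model_bytes: bytes, args: dict, metadata: dict) -> None:
    """Write an uncompressed checkpoint archive to a temp file, then move it into place."""
    fd, tmp_path = tempfile.mkstemp(suffix=".zip")
    os.close(fd)
    try:
        # ZIP_STORED: members are written as-is, with no DEFLATE pass.
        with zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_STORED) as zf:
            zf.writestr("model.pt", model_bytes)
            zf.writestr("args.json", json.dumps(args))
            zf.writestr("metadata.json", json.dumps(metadata))
        # Local stand-in for uploading the archive to its destination.
        shutil.move(tmp_path, dest)
    finally:
        # Clean up the temp file if the move did not happen.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


def load_checkpoint(path: str) -> tuple[bytes, dict, dict]:
    """Read the three members back; reading needs no compression argument."""
    with zipfile.ZipFile(path, "r") as zf:
        model_bytes = zf.read("model.pt")
        args = json.loads(zf.read("args.json"))
        metadata = json.loads(zf.read("metadata.json"))
    return model_bytes, args, metadata


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "checkpoint.zip")
        save_checkpoint(dest, b"fake-weights", {"hidden": 64}, {"version": "0.1"})
        print(load_checkpoint(dest))
```

In the real code path, `torch.load` would then deserialize the `model.pt` bytes; the sketch stops at returning them.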

1 file reviewed, no comments


coreyjadams

Good catch and fix, lgtm!

coreyjadams · 2025-11-05T14:55:17Z

/blossom-ci

greptile-apps

Greptile Overview

Greptile Summary

This PR removes zip compression overhead from checkpoint saving by changing from zipfile.ZIP_DEFLATED to zipfile.ZIP_STORED in physicsnemo/models/module.py:544.

Key changes:

Changed compression mode from ZIP_DEFLATED (compress data) to ZIP_STORED (store without compression)
This change reduces CPU overhead during checkpoint saves, trading faster save times for larger file sizes
The change is backward compatible - existing checkpoint loading code works with both compressed and uncompressed zip archives
PyTorch model state dicts (.pt files) are not compressed internally by torch.save, but dense floating-point weights are high-entropy, so additional zip compression typically provides minimal benefit
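Whether zip compression pays off depends on how compressible the payload is. A quick check is to run the same DEFLATE algorithm (via `zlib`, which is what `ZIP_DEFLATED` uses) on weight-like data and on highly redundant data. This is a minimal sketch: Gaussian float32 noise is an assumed approximation of trained dense weights, real checkpoints will vary, and `deflate_ratio` and `weight_like_bytes` are hypothetical helpers.

```python
import array
import random
import zlib


def deflate_ratio(data: bytes) -> float:
    """Compressed size divided by raw size at zlib's default compression level."""
    return len(zlib.compress(data)) / len(data)


def weight_like_bytes(n: int, seed: int = 0) -> bytes:
    """Gaussian float32 values as a rough stand-in for dense trained weights (assumption)."""
    rng = random.Random(seed)
    return array.array("f", (rng.gauss(0.0, 0.02) for _ in range(n))).tobytes()


if __name__ == "__main__":
    n = 250_000
    print("weight-like:", round(deflate_ratio(weight_like_bytes(n)), 3))
    print("all zeros:  ", round(deflate_ratio(bytes(4 * n)), 3))
```

Redundant tensors (for example all-zero buffers) shrink dramatically, while noise-like float data barely shrinks, which is why storing weights uncompressed costs relatively little disk space.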

Confidence Score: 5/5

This PR is safe to merge with minimal risk
The change is a simple one-line modification that only affects compression mode, not the zip file format or structure. The loading code in physicsnemo/models/module.py:700 uses zipfile.ZipFile(cached_file_name, "r") which automatically handles both compressed and uncompressed zip files, ensuring backward compatibility. The change trades disk space for performance, which is a reasonable optimization for large model checkpoints.
No files require special attention
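The backward-compatibility argument rests on the fact that each zip member's header records its own compression method, so `zipfile.ZipFile(..., "r")` needs no compression argument to read either kind of archive. The sketch below checks this with in-memory archives instead of a cached checkpoint path; `build_archive` and `read_model_member` are hypothetical names, and the byte string is a stand-in for real `torch.save` output.

```python
import io
import zipfile


def build_archive(payload: bytes, compression: int) -> bytes:
    """Create an in-memory zip with a single 'model.pt' member using the given method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        zf.writestr("model.pt", payload)
    return buf.getvalue()


def read_model_member(archive: bytes) -> tuple[bytes, int]:
    """Open in mode "r" with no compression argument; return the data and recorded method."""
    with zipfile.ZipFile(io.BytesIO(archive), "r") as zf:
        info = zf.getinfo("model.pt")
        return zf.read("model.pt"), info.compress_type


if __name__ == "__main__":
    payload = b"pretend this is torch.save output" * 100
    old = build_archive(payload, zipfile.ZIP_DEFLATED)  # written before the change
    new = build_archive(payload, zipfile.ZIP_STORED)    # written after the change
    for label, archive in (("deflated", old), ("stored", new)):
        data, method = read_model_member(archive)
        print(label, data == payload, method)
```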

Important Files Changed

File Analysis

Filename	Score	Overview
physicsnemo/models/module.py	5/5	Changed from `ZIP_DEFLATED` to `ZIP_STORED` to eliminate compression overhead during checkpoint saves

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Module
    participant ZipFile
    participant TempFile
    participant FileSystem

    User->>Module: save(file_name)
    Module->>TempFile: Create temporary .zip file
    Module->>ZipFile: Open with ZIP_STORED (no compression)
    Module->>Module: Serialize state_dict to BytesIO
    Module->>ZipFile: writestr("model.pt", state_dict_bytes)
    Module->>ZipFile: writestr("args.json", args_json)
    Module->>ZipFile: writestr("metadata.json", metadata_json)
    ZipFile->>TempFile: Write uncompressed archive
    Module->>FileSystem: Upload temp file to destination
    Module->>TempFile: Clean up temporary file
```

1 file reviewed, no comments


CharlelieLrt and others added 2 commits November 4, 2025 13:59

Removed zip compression

e2d07ce

Signed-off-by: Charlelie Laurent <[email protected]>

Merge branch 'main' into bugfix-checkpoints-zip-overhead

718d20c

greptile-apps bot reviewed Nov 4, 2025


CharlelieLrt requested a review from coreyjadams November 4, 2025 23:37

coreyjadams approved these changes Nov 5, 2025


Merge branch 'main' into bugfix-checkpoints-zip-overhead

b0e3337

greptile-apps bot reviewed Nov 5, 2025


coreyjadams merged commit ea7d521 into NVIDIA:main Nov 5, 2025
1 check passed
