Skip to content

Conversation

@CharlelieLrt
Copy link
Collaborator

PhysicsNeMo Pull Request

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@CharlelieLrt
Copy link
Collaborator Author

/blossom-ci

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR changes the checkpoint saving mechanism from zipfile.ZIP_DEFLATED to zipfile.ZIP_STORED, effectively disabling compression when creating checkpoint archives.

Key Changes:

  • Modified physicsnemo/models/module.py:544 to use ZIP_STORED instead of ZIP_DEFLATED
  • This follows the recent refactoring in #1185 that introduced zip format support

Trade-offs:

  • Faster I/O: Removes CPU-intensive compression/decompression operations during checkpoint save/load
  • Larger files: Checkpoint files will be larger on disk (PyTorch model weights are already somewhat compressed by torch.save)
  • Better for training workflows: Training checkpoints are often written frequently, so faster save times can reduce interruptions

This is a reasonable optimization for ML workloads where checkpoint I/O speed is often more critical than storage space, especially when saving frequently during training.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The change is a simple one-line modification that switches from compressed to uncompressed zip format. The code is functionally identical otherwise - both ZIP_DEFLATED and ZIP_STORED are valid zip compression modes, and the loading logic handles both transparently. This is a performance optimization trading file size for speed, which is appropriate for ML checkpointing workflows.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
physicsnemo/models/module.py 5/5 Changed zip compression from ZIP_DEFLATED to ZIP_STORED (no compression). This trades file size for faster save/load performance.

Sequence Diagram

sequenceDiagram
    participant User
    participant Module
    participant ZipFile
    participant FileSystem
    
    User->>Module: save(file_name)
    Module->>Module: Create temp file
    Module->>ZipFile: ZipFile(path, "w", ZIP_STORED)
    Note over ZipFile: No compression applied
    Module->>ZipFile: writestr("model.pt", state_dict)
    Module->>ZipFile: writestr("args.json", args)
    Module->>ZipFile: writestr("metadata.json", metadata)
    ZipFile-->>Module: Archive created
    Module->>FileSystem: Upload to destination
    Module->>Module: Clean up temp file
    Module-->>User: Checkpoint saved
    
    User->>Module: load(file_name)
    Module->>FileSystem: Download checkpoint
    Module->>ZipFile: ZipFile(path, "r")
    Note over ZipFile: Reads uncompressed data
    ZipFile->>Module: Read "model.pt"
    Module->>Module: torch.load(model_bytes)
    Module-->>User: Model loaded
Loading

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

Copy link
Collaborator

@coreyjadams coreyjadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch and fix, lgtm!

@coreyjadams
Copy link
Collaborator

/blossom-ci

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR removes zip compression overhead from checkpoint saving by changing from zipfile.ZIP_DEFLATED to zipfile.ZIP_STORED in physicsnemo/models/module.py:544.

Key changes:

  • Changed compression mode from ZIP_DEFLATED (compress data) to ZIP_STORED (store without compression)
  • This change reduces CPU overhead during checkpoint saves, trading faster save times for larger file sizes
  • The change is backward compatible - existing checkpoint loading code works with both compressed and uncompressed zip archives
  • PyTorch model state dicts (.pt files) are already compressed internally, so additional zip compression provides minimal benefit

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The change is a simple one-line modification that only affects compression mode, not the zip file format or structure. The loading code in physicsnemo/models/module.py:700 uses zipfile.ZipFile(cached_file_name, "r") which automatically handles both compressed and uncompressed zip files, ensuring backward compatibility. The change trades disk space for performance, which is a reasonable optimization for large model checkpoints.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
physicsnemo/models/module.py 5/5 Changed from ZIP_DEFLATED to ZIP_STORED to eliminate compression overhead during checkpoint saves

Sequence Diagram

sequenceDiagram
    participant User
    participant Module
    participant ZipFile
    participant TempFile
    participant FileSystem

    User->>Module: save(file_name)
    Module->>TempFile: Create temporary .zip file
    Module->>ZipFile: Open with ZIP_STORED (no compression)
    Module->>Module: Serialize state_dict to BytesIO
    Module->>ZipFile: writestr("model.pt", state_dict_bytes)
    Module->>ZipFile: writestr("args.json", args_json)
    Module->>ZipFile: writestr("metadata.json", metadata_json)
    ZipFile->>TempFile: Write uncompressed archive
    Module->>FileSystem: Upload temp file to destination
    Module->>TempFile: Clean up temporary file
Loading

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

@coreyjadams coreyjadams merged commit ea7d521 into NVIDIA:main Nov 5, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants