
@dreamtalen (Contributor) commented Oct 13, 2025

PhysicsNeMo Pull Request

Description

Upgrade multi-storage-client to the latest version (0.33.0).

Highlights:

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

@dreamtalen dreamtalen requested a review from ktangsali as a code owner October 13, 2025 23:10
@dreamtalen dreamtalen changed the title chore: bump multi-storage-client to v0.32.0 with rust client chore: bump multi-storage-client to v0.33.0 with rust client Oct 23, 2025

@ktangsali (Collaborator) left a comment

LGTM

@greptile-apps bot (Contributor) left a comment

Greptile Overview

Greptile Summary

Upgraded multi-storage-client from >=0.14.0 to >=0.33.0, introducing Rust client support for improved performance when accessing object storage.

Key Changes:

  • Updated version constraint in pyproject.toml storage extras
  • Added rust_client.skip_signature: True config for unsigned S3 bucket access
  • Documented change in CHANGELOG.md
  • Added explicit fsspec extra and zarr dependency to example requirements

Issue Found:

  • Dockerfile still references the old version constraint (>=0.14.0) and needs updating to match pyproject.toml (>=0.33.0)

Confidence Score: 3/5

  • Safe to merge after updating Dockerfile to match the new version constraint
  • The version bump is a significant jump (0.14.0 to 0.33.0) but the configuration changes suggest backward compatibility. However, there's a version inconsistency: pyproject.toml requires >=0.33.0 while Dockerfile still has >=0.14.0, which could cause issues in containerized environments. The rust client configuration is properly added and the changelog is documented.
  • The Dockerfile requires an update to match the new version constraint in pyproject.toml

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| pyproject.toml | 5/5 | Updated the multi-storage-client[boto3] dependency from >=0.14.0 to >=0.33.0 in the storage optional dependencies |
| CHANGELOG.md | 5/5 | Added a changelog entry documenting the multi-storage-client version bump to 0.33.0 with Rust client support |
| examples/multi_storage_client/msc_config.yaml | 5/5 | Added rust_client.skip_signature: True configuration to enable the Rust client for unsigned S3 bucket access |
| examples/multi_storage_client/requirements.txt | 5/5 | Added an explicit fsspec extra and a zarr dependency (fsspec is already a main dependency; the zarr addition is redundant) |

Sequence Diagram

sequenceDiagram
    participant User
    participant PhysicsNeMo
    participant MSC as Multi-Storage-Client
    participant RustClient as Rust Client (NEW)
    participant S3 as AWS S3
    
    User->>PhysicsNeMo: Install with storage extras
    PhysicsNeMo->>MSC: Install multi-storage-client[boto3]>=0.33.0
    
    User->>PhysicsNeMo: Configure msc_config.yaml
    Note over PhysicsNeMo: rust_client.skip_signature: True
    
    User->>PhysicsNeMo: Access data via zarr.open("msc://...")
    PhysicsNeMo->>MSC: Request data with MSC protocol
    
    alt Rust Client Enabled (v0.33.0+)
        MSC->>RustClient: Use Rust bindings for performance
        RustClient->>S3: Fetch data (unsigned access)
        S3-->>RustClient: Return data
        RustClient-->>MSC: Return data (faster)
    else Legacy Python Client
        MSC->>S3: Fetch via boto3
        S3-->>MSC: Return data
    end
    
    MSC-->>PhysicsNeMo: Return zarr data with caching
    PhysicsNeMo-->>User: Training/analysis ready

Additional Comments (1)

  1. Dockerfile, line 216

    logic: Dockerfile still specifies multi-storage-client[boto3]>=0.14.0 but pyproject.toml now requires >=0.33.0. Update to match.
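A mismatch like this can be caught mechanically. The following is a minimal Python sketch, not part of this PR, that compares the `>=` floors of requirement strings already extracted from pyproject.toml and the Dockerfile. The helper names `min_version` and `pins_agree` are hypothetical, and the parser only handles simple `name[extras]>=X.Y.Z` pins, not full PEP 508 specifiers.

```python
import re
from typing import Tuple

# Matches simple pins such as "multi-storage-client[boto3]>=0.33.0":
# a package name, optional [extras], then a ">=" floor made of dotted integers.
_PIN_RE = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9_.-]+)(?:\[[^\]]*\])?\s*>=\s*(?P<version>\d+(?:\.\d+)*)"
)


def min_version(requirement: str) -> Tuple[str, Tuple[int, ...]]:
    """Return (lower-cased package name, minimum version as an int tuple)."""
    match = _PIN_RE.match(requirement)
    if match is None:
        raise ValueError(f"not a simple '>=' pin: {requirement!r}")
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name").lower(), version


def pins_agree(*requirements: str) -> bool:
    """True if all pins name the same package with the same version floor."""
    return len({min_version(req) for req in requirements}) == 1


pyproject_pin = "multi-storage-client[boto3]>=0.33.0"
dockerfile_pin = "multi-storage-client[boto3]>=0.14.0"  # value flagged by the review
print(pins_agree(pyproject_pin, dockerfile_pin))  # False
```

Comparing parsed tuples rather than raw strings means cosmetic differences such as extra whitespace around `>=` do not count as a mismatch.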

4 files reviewed, 1 comment


@greptile-apps bot (Contributor) left a comment

Greptile Overview

Greptile Summary

This PR upgrades multi-storage-client from v0.14.0 to v0.33.0, introducing Rust client bindings for improved performance when accessing cloud object storage.

Key changes:

  • Updated the minimum version requirement in pyproject.toml and the Dockerfile from >=0.14.0 to >=0.33.0
  • Added rust_client.skip_signature: True configuration in example config to enable unsigned S3 access with Rust client
  • Enhanced example dependencies with fsspec extra and explicit zarr package for better compatibility
  • Updated CHANGELOG.md to document the upgrade

Benefits:

  • Performance improvements from Rust bindings while maintaining unified Python interfaces
  • Enhanced monitoring with granular metrics via Metrics v2 feature
  • Backward compatible upgrade with proper configuration for existing unsigned S3 workflows

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk: it is a well-structured dependency upgrade with appropriate configuration updates
  • The PR makes consistent version updates across all dependency declarations (pyproject.toml and Dockerfile), properly configures the new Rust client feature for existing unsigned S3 workflows, enhances example dependencies for better functionality, and documents changes in the CHANGELOG. The upgrade from 0.14.0 to 0.33.0 is backward compatible, and the addition of rust_client configuration correctly mirrors the existing signature_version: UNSIGNED behavior.
  • No files require special attention
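The observation that the Rust client setting mirrors `signature_version: UNSIGNED` suggests a simple consistency check for the example config. This is a minimal sketch under a stated assumption: it supposes, hypothetically, that the parsed msc_config.yaml nests both keys under `profiles -> <name> -> storage_provider -> options`; verify the real layout against the multi-storage-client configuration reference before relying on it. `unsigned_mismatches` is an illustrative helper, not a multi-storage-client API, and the config is written as a plain dict so the example needs no YAML parser (in practice it could come from PyYAML's `yaml.safe_load`).

```python
from typing import Any, Dict, List


def unsigned_mismatches(config: Dict[str, Any]) -> List[str]:
    """Return names of profiles whose boto3 and Rust-client signing settings disagree.

    Hypothetical layout: profiles -> <name> -> storage_provider -> options,
    with `signature_version` and `rust_client.skip_signature` both in `options`.
    """
    mismatched: List[str] = []
    for name, profile in config.get("profiles", {}).items():
        options = profile.get("storage_provider", {}).get("options", {})
        rust_settings = options.get("rust_client")
        if rust_settings is None:
            continue  # No Rust client section, so there is nothing to compare.
        python_unsigned = options.get("signature_version") == "UNSIGNED"
        rust_unsigned = bool(rust_settings.get("skip_signature", False))
        if python_unsigned != rust_unsigned:
            mismatched.append(name)
    return mismatched


# Illustrative config for the cmip6-pds profile referenced in the example.
example_config = {
    "profiles": {
        "cmip6-pds": {
            "storage_provider": {
                "type": "s3",
                "options": {
                    "base_path": "cmip6-pds",
                    "signature_version": "UNSIGNED",
                    "rust_client": {"skip_signature": True},
                },
            },
        },
    },
}

print(unsigned_mismatches(example_config))  # []
```

Profiles without a `rust_client` section are skipped rather than reported, since they only use the Python client path.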

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| CHANGELOG.md | 5/5 | Added an entry documenting the version bump to 0.33.0 with Rust client support |
| Dockerfile | 5/5 | Updated the multi-storage-client minimum version requirement from 0.14.0 to 0.33.0 |
| pyproject.toml | 5/5 | Updated the storage extra's multi-storage-client dependency from 0.14.0 to 0.33.0 |
| examples/multi_storage_client/requirements.txt | 5/5 | Added the fsspec extra and a zarr dependency for complete example functionality |
| examples/multi_storage_client/msc_config.yaml | 5/5 | Added rust_client configuration with the skip_signature option for unsigned S3 access |

Sequence Diagram

sequenceDiagram
    participant User
    participant PhysicsNeMo
    participant MSC as Multi-Storage-Client
    participant RustClient as Rust Client
    participant S3 as AWS S3

    Note over User,S3: Version Upgrade: 0.14.0 → 0.33.0

    User->>PhysicsNeMo: Install with storage extra
    PhysicsNeMo->>MSC: Install multi-storage-client[boto3]>=0.33.0
    
    User->>PhysicsNeMo: Load msc_config.yaml
    Note over PhysicsNeMo: Config includes rust_client settings
    
    User->>PhysicsNeMo: zarr.open("msc://cmip6-pds/...")
    PhysicsNeMo->>MSC: Request data via msc:// protocol
    
    alt Rust Client Available (v0.33.0+)
        MSC->>RustClient: Use Rust bindings for performance
        RustClient->>S3: Unsigned request (skip_signature: True)
        S3-->>RustClient: Return data
        RustClient-->>MSC: Process with Rust performance
    else Fallback to Python SDK
        MSC->>S3: Use boto3 Python SDK
        S3-->>MSC: Return data
    end
    
    MSC-->>PhysicsNeMo: Return zarr data
    PhysicsNeMo-->>User: Training data ready

5 files reviewed, no comments


@ktangsali (Collaborator)

/blossom-ci


Labels

None yet

Projects

None yet


2 participants