Releases: GoogleCloudPlatform/cluster-toolkit

v1.57.2: automate nvidia-bug-report collection on GCE COS VM

11 Jul 23:11
6e8b14b

What's Changed

Key New Features 🎉

  • feat: add script to automate nvidia-bug-report collection on GCE COS VM. by @ljqg in #4317

New Contributors

Full Changelog: v1.57.1...v1.57.2

v1.57.1: Add Kueue-0.12.2 and make it the default

10 Jul 13:16
d733fad

What's Changed

Module Improvements 🔨

Full Changelog: v1.57.0...v1.57.1

Release v1.57.0

30 Jun 17:56
0bba393

Highlights

What's Changed

Breaking changes 🚨

As part of #4275, the install_cloud_rdma_drivers.sh startup script is removed from H4D blueprints. Users should update to this version of Cluster Toolkit, because the latest HPC VM/Slurm images have compatible versions of the RDMA packages pre-installed.
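
For anyone maintaining a customized copy of an H4D blueprint, a quick text scan can show whether the removed script is still referenced. The following is a minimal sketch, not part of Cluster Toolkit: the function name `find_script_references` and the sample blueprint fragment are hypothetical, and the only assumption is that a blueprint is a YAML text file that mentions the script by file name.

```python
from typing import List, Tuple

# Script name taken from the breaking-change note above.
REMOVED_SCRIPT = "install_cloud_rdma_drivers.sh"


def find_script_references(
    blueprint_text: str, script_name: str = REMOVED_SCRIPT
) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) for each non-comment line naming the script."""
    hits = []
    for number, line in enumerate(blueprint_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip lines that are entirely YAML comments.
        if stripped.startswith("#"):
            continue
        if script_name in stripped:
            hits.append((number, stripped))
    return hits


# Hypothetical blueprint fragment, for illustration only.
SAMPLE = """\
deployment_groups:
- group: primary
  modules:
  - id: startup
    settings:
      runners:
      - type: shell
        destination: install_cloud_rdma_drivers.sh
"""

if __name__ == "__main__":
    for number, line in find_script_references(SAMPLE):
        print(f"line {number}: {line}")
```

A plain substring scan is used here instead of a YAML parser so the sketch needs only the standard library; it reports where the name appears but does not interpret the blueprint structure.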

Key New Features 🎉

New Modules 🧱

  • dependency manager module implementation for helm dependencies by @ighosh98 in #4298

Improvements 🛠

Deprecations 💤

Bug fixes 🐞

Full Changelog: v1.56.0...v1.57.0

Release v1.56.0

23 Jun 21:18
e67a073

What's Changed

Breaking changes 🚨

v1.56.0 introduces a schema change for load_bq.py.

Improvements 🛠

Version Updates ⏫

  • Bump urllib3 from 2.3.0 to 2.5.0 in /community/front-end/ofe by @dependabot in #4296
  • Bump protobuf from 5.29.3 to 5.29.5 in /community/front-end/ofe by @dependabot in #4286
  • Bump requests from 2.32.3 to 2.32.4 in /community/front-end/ofe by @dependabot in #4285

Bug fixes 🐞

Full Changelog: v1.55.1...v1.56.0

v1.55.1 Hotfix: Reduce the severity of missed metadata fetches

17 Jun 19:05
51c51f2

This hotfix reduces the severity of missed metadata fetches for newly supported metadata fields in Slurm-GCP.

What's Changed

Full Changelog: v1.55.0...v1.55.1

Release v1.55.0

16 Jun 23:01
32e03a7

Highlights

  • New blueprint example that lets you create a high-throughput execution environment for Google DeepMind's AlphaFold 3
  • Updated A3-Ultra GCSFuse example blueprint to align with best practices

What's Changed

Key New Features 🎉

  • AlphaFold 3 High Throughput Solution (af3-slurm) by @fschuerm in #4231

Improvements 🛠

Version Updates ⏫

  • Bump django from 5.1.9 to 5.1.10 in /community/front-end/ofe by @dependabot in #4248

Bug fixes 🐞

  • Kueue Config Integration Tests incorporating different Accelerator types for different machines by @ishitachail in #4252

Full Changelog: v1.54.0...v1.55.0

Release v1.54.0

10 Jun 12:12
07fdb16

Highlights

  • Managed Lustre now supports non-default ports for GKE compatibility, GKE cluster deployment has been sped up, and the A3 High network blocking script has been implemented as a startup-script feature.

What's Changed

Module Improvements 🔨

  • Add Managed Lustre support for non-default ports (GKE compatibility) by @tpdownes in #4210
  • Implement A3 High network blocking script as startup-script feature by @tpdownes in #4233

Improvements 🛠

Full Changelog: v1.53.0...v1.54.0

Release v1.53.0

05 Jun 17:16
48a7503

Highlights

  • The A3Mega Slurm solution now standardizes on Ubuntu: the Debian-based custom Slurm image has been deprecated and replaced with a custom Ubuntu Slurm image. Correspondingly, the A3M Slurm Ubuntu solutions have been refactored into a single, consolidated blueprint.

What's Changed

Key New Features 🎉

Module Improvements 🔨

Improvements 🛠

Deprecations 💤

Version Updates ⏫

  • Upgrade Ansible to maximum allowed version on oldest supported OS distributions by @tpdownes in #4139

Bug fixes 🐞

  • Add recurse to condor spool directory by @aneo-ssam in #4178
  • Fixed the parser error in test-gke-a2-highgpu-kueue by @ishitachail in #4204
  • Fixing the missing comma between the mount_options config for gcs A3U and A4 by @raushan2016 in #4207
  • Cleanup GCS Fuse configurations and add required permissions for fio-job-template by @ighosh98 in #4214

New Contributors

Full Changelog: v1.52.0...v1.53.0

Release v1.52.0

22 May 21:19
70b458c

What's Changed

Breaking Changes 🚨

Improvements 🛠

Bug fixes 🐞

New Contributors

Full Changelog: v1.51.1...v1.52.0

Address bug in updated NVIDIA package causing Slurm job failures

20 May 17:04
8b7aae6

What's Changed

Bug fixes 🐞

  • Block broken release of nvidia-container-toolkit by @tpdownes in #4152

Full Changelog: v1.51.0...v1.51.1