Releases: GoogleCloudPlatform/cluster-toolkit
v1.62.2: Fix "assuredCount" calculation for reservations
What's Changed
Bug fixes 🐞
- Hotfix: Calculate correct "assuredCount" for reservations by @abbas1902 in #4550
Full Changelog: v1.62.1...v1.62.2
Release v1.62.1
Adding license to j2 and ps1 files #4517
Full Changelog: v1.62.0...v1.62.1
Release v1.62.0
What's Changed
- Add A4X blueprints and gpu definition by @alyssa-sm in #4461
- Create a new community scheduler module for Slinky (Slurm on Kubernetes) by @ndebuhr in #3862
Improvements 🛠
- add gcluster-build-info to hcs image build by @RachaelSTamakloe in #4465
New Contributors
Full Changelog: v1.61.0...v1.62.0
Release v1.61.0
What's Changed
Module Improvements 🔨
- namespace for gke-modules by @PayalJakhar in #4451
Improvements 🛠
- Added capacity checks for reservations by @PayalJakhar in #4372
- Add optimized gcsfuse configurations to A3 Ultra and A4 blueprints. by @samskillman in #4441
Bug fixes 🐞
- Hold all nvidia software to the same version (fix to develop) by @samskillman in #4459
- Use setsid resume.py to reduce reconfigure time by @samskillman in #4436
Full Changelog: v1.60.0...v1.61.0
Release v1.60.0
What's Changed
Module Improvements 🔨
- [v2][Bugfix] Applying K8s manifests to GKE clusters via URL by @shubpal07 in #4352
Improvements 🛠
- Adding new configurations to support IMEX in slurm by @alyssa-sm in #4418
- Update GKE release channel for a2 high Kueue integ. tests by @shubpal07 in #4438
Bug fixes 🐞
- Fix build-service-image nvidia/kernel/lustre mismatch by @samskillman in #4454
- Hold all nvidia software to the same version by @samskillman in #4458
New Contributors
Full Changelog: v1.59.2...v1.60.0
v1.59.2: Fix Lustre kernel compatibility
What's Changed
Bug fixes 🐞
- Move base image forward, fixing lustre/nvidia/driver compat. by @samskillman in #4453
Full Changelog: v1.59.1...v1.59.2
Release v1.59.1
What's Changed
Version Updates ⏫
- Hotfix release. Update Slurm images to 6.9 > 6.10, Ubuntu20.04 > 22.04, Debian11 > 12 by @mr0re1 in #4442
Full Changelog: v1.59.0...v1.59.1
Release v1.59.0
What's Changed
Module Improvements 🔨
- Adding ips_per_nat config for A family blueprints by @vikramvs-gg in #4366
Improvements 🛠
- Symlink /var/lib/mysql to "state disk" by @mr0re1 in #4374
- update chs commit by @RachaelSTamakloe in #4395
- Implement accelerator topology by @alyssa-sm in #4404
Version Updates ⏫
- gke-node-pool module to use "google" instead of "google-beta" provider by @kadupoornima in #4368
Bug fixes 🐞
- Fix additional disks for login nodes by @annuay-google in #4403
- Fix nvidia version mismatch for service images by @harshthakkar01 in #4413
New Contributors
- @sarthakag made their first contribution in #4406
- @rachit-google made their first contribution in #4408
Full Changelog: v1.58.1...v1.59.0
v1.58.1 Hotfix: Resolve a3u/a4h slurm nvidia version mismatch error
This is a hotfix to resolve the NVIDIA driver and library version mismatch error on a3-ultragpu and a4-highgpu Slurm clusters.
What's Changed
Bug fixes 🐞
- Resolve a3u/a4h slurm nvidia version mismatch error by @RachaelSTamakloe in #4409
Full Changelog: v1.58.0...v1.58.1
Release v1.58.0
Highlights
- Support for GKE H4D instances: A new blueprint has been added for deploying GKE clusters with H4D instances.
- Deprecation of Parallelstore blueprints: The blueprints for deploying Parallelstore have been deprecated and removed.
What's Changed
Key New Features 🎉
- Add Kueue 0.12.2 and make it the new default by @mwysokin in #4312
- Add GKE H4D blueprint and integration test by @SwarnaBharathiMantena in #4396
Module Improvements 🔨
- [Bugfix] Applying K8s manifests to GKE clusters via URL by @shubpal07 in #4292
- Revert "[Bugfix] Applying K8s manifests to GKE clusters via URL" by @shubpal07 in #4350
Improvements 🛠
- Install CHS on A3m and Common image by @RachaelSTamakloe in #4334
- Remove IMEX and use default GPU driver in gke-a4x by @parulbajaj01 in #4367
- Implement async suspend.py by @alyssa-sm in #4363
- Eliminate code duplication and move chs download to shared.yaml by @RachaelSTamakloe in #4364
- Add support for Kueue/TAS in gke-a4x by @parulbajaj01 in #4375
Deprecations 💤
Bug fixes 🐞
- Remove Kueue topology annotation as DWS does not work with TAS (yet) by @SwarnaBharathiMantena in #4335
- Fix bug in tensorflow example 'text input must be of type str' by @nick-stroud in #4391
New Contributors
- @PayalJakhar made their first contribution in #4353
- @jhpriy made their first contribution in #4361
- @agrawalkhushi18 made their first contribution in #4358
Full Changelog: v1.57.2...v1.58.0