Release v1.50.0
Highlights
- New blueprints for Managed Lustre attached to VMs and to Slurm clusters (including an opt-in solution for the A3 Ultra and A4 Slurm blueprints)
- Breaking change: RoCE (RDMA) networks no longer support firewall rules. Older blueprints that define them will fail validation; to fix this, remove the firewall rules as shown in the examples in 312d7fb.
What's Changed
Key New Features 🎉
- Move from auth/munge to auth/slurm by @harshthakkar01 in #3955
- deprecate kueue v0.11.1 and use v0.11.4 by @ighosh98 in #4026
New Modules 🧱
- Cluster Toolkit - new module for creating Artifact Registries by @scott-nag in #3639
Module Improvements 🔨
- Enable specification of system node pool zones in the GKE Cluster module by @ndebuhr in #3976
- Add support for optional GCS bucket module config by @mohitchaurasia91 in #3990
- fix(htcondor): explicitly set region for cm and ap addresses to match subnetwork region by @rbekhtaoui in #3991
- Remove the broken auto_delete_disk system, and replace it with a working snapshot-based alternative, in the NFS Server module by @ndebuhr in #3887
- add non-queue flex-start support in gke by @chengcongdu in #3995
- Extend slurm_conf_tpl to support raw content by @gkcalat in #4010
- Adding GKE support for Managed Lustre by @cdunbar13 in #4022
Improvements 🛠
- Add a simple XPK blueprint example by @ndebuhr in #3980
- Add unique name for resource policy by @parulbajaj01 in #4002
- update kueue configurations and reservations in a3 mega, ultra and A4 by @ighosh98 in #4017
- Remove k8s service account var from gke-a3U blueprint by @parulbajaj01 in #4024
- disable unattended upgrades in a3u and a4h slurm solutions by @RachaelSTamakloe in #4006
- Add improved MIG lifecycle management for flex by @abbas1902 in #4015
- Add dws flex and spot provisioning options to the A4 example by @abbas1902 in #3945
Deprecations 💤
- Removal of the omnia install module and related content by @cdunbar13 in #4021
Version Updates ⏫
- Update a3ultra to 570 and cuda 12-8 by @samskillman in #3859
Bug fixes 🐞
- Address shelve permissions by @casassg in #3951
- Fixing the kernel upgrade flag for slurm image creation by @cdunbar13 in #4005
- Missed setting that breaks integration test by @cdunbar13 in #4029
- Fix placement_max_distance in slurm partitions by @cdunbar13 in #4030
- A3 Ultra Slurm: workaround temporary driver packaging issue by @tpdownes in #4059
New Contributors
- @casassg made their first contribution in #3951
- @gkcalat made their first contribution in #4019
- @rick154 made their first contribution in #4046
Full Changelog: v1.49.1...v1.50