v1.1.0: Google Cloud Batch, Slurm V5, Jumbo Frames, and Advanced Networking in Slurm V4
Key New Features
- Google Cloud Batch support.
- Slurm V5 support & example blueprint.
- Slurm V4 partitions now support advanced networking features such as gVNIC adapters and high egress (Tier 1) bandwidth.
- Slurm V4 partitions now support placement groups for all Compute Engine machine families that support them (A2, C2, C2D, N2, N2D).
- VPC module supports jumbo frames (configurable MTU) for higher bandwidth and lower latency.
New Resources
- schedmd-slurm-gcp-v5-partition: Creates a partition to be used by a slurm-controller.
- schedmd-slurm-gcp-v5-controller: Creates a Slurm controller node using slurm-gcp.
- schedmd-slurm-gcp-v5-login: Creates a Slurm login node using slurm-gcp.
- cloud-batch-job: Creates a Google Cloud Batch job template that works with other Toolkit modules.
- cloud-batch-login-node: Creates a VM that can be used for submission of Google Cloud Batch jobs.
- htcondor-configure: Creates Toolkit runners and service accounts to configure an HTCondor pool.
- htcondor-install: Creates a startup script to install HTCondor and exports a list of required APIs.
Version updates
- github.com/hashicorp/go-getter: from 1.5.11 to 1.6.1
- github.com/SchedMD/slurm-gcp//tf/modules/controller/: from 4.1.8 to 4.2
What's Changed
- Add external IP output to vm-instance module by @tpdownes in #353
- Default to not disabling services upon destroy by @tpdownes in #351
- Support extra args for ansible playbooks by @tpdownes in #352
- Bump github.com/hashicorp/go-getter from 1.5.11 to 1.6.1 by @tpdownes in #350
- Create dependabot configuration file by @tpdownes in #354
- Add support for Slurm to use the startup_script module by @nick-stroud in #349
- Adopt Slurm v4.2.0 module by @tpdownes in #356
- Upgrade to yaml.v3 by @nick-stroud in #347
- Improve Packer module by @tpdownes in #355
- Update VPC module to support setting MTU by @tpdownes in #363
- Add HTCondor Install module by @tpdownes in #359
- Add HTCondor Configure module by @tpdownes in #360
- Reliably detect when nodes fail to be scaled in by @tpdownes in #364
- Fix rare failure modes of monitoring test by @tpdownes in #366
- Improve detection of Slurm startup by @tpdownes in #367
- Install compatible protobuf for older Python by @tpdownes in #370
- Add security setting for go-getter by @mittz in #371
- Add headers to quota sections in README for linking by @nick-stroud in #369
- Add HTCondor Pool blueprint (experimental) by @tpdownes in #361
- Improve Slurm partition module documentation by @tpdownes in #372
- Adopt Google Private Access by default by @tpdownes in #373
- Add integration test for HTCondor by @tpdownes in #362
- Patch omnia-install to continue working with 1.0 by @heyealex in #374
- Update spack resource environments and flags by @douglasjacobsen in #346
- Add variable for slurm UID in omnia-install by @heyealex in #375
- Add provider_meta to htcondor-configure module by @tpdownes in #379
- Extend periodic cleanup to reset Filestore API by @tpdownes in #380
- Add slurm-gcp v5 controller module by @heyealex in #378
- Fix Cloud Build Filestore cleanup by @tpdownes in #383
- Update minimum Packer release by @tpdownes in #384
- Modules/slurm gcp v5 partition by @heyealex in #381
- fix: install_nfs_client_runner was using 'content' instead of 'source' by @nick-stroud in #387
- Maintenance of VPC module by @tpdownes in #386
- Address bug in Shared VPC Filestore blueprint by @tpdownes in #389
- Add slurm-gcp v5 login node module by @heyealex in #388
- Add support for Cloud Batch by @nick-stroud in #394
- Rename documentation to reference Google Cloud Batch by @nick-stroud in #397
- Add community example using slurm-gcp v5 modules by @heyealex in #393
- Update to version v1.1.0 by @nick-stroud in #398
- Release v1.1.0 by @nick-stroud in #396
Full Changelog: v1.0.0...v1.1.0