Releases: ROCm/spur
Spur nightly (20260622 · 38bffdb)
Automated nightly build from main at 2026-06-22 07:57 UTC.
Commit: 38bffdb
1 commit(s) since previous nightly.
Install
curl -fsSL https://raw.githubusercontent.com/ROCm/spur/main/install.sh | bash -s -- nightly
Docker
docker pull rocm/spur:nightly
Spur v0.3.0
What's Changed
- CLAUDE.md: add don'ts for writing unit tests by @shiv-tyagi in #112
- Fix test t50_106: test real RawModeGuard, not simulation (#111) by @powderluv in #114
- GPU over-scheduling fix, cancel deallocation, K8s pending failure det… by @codetap-developer in #115
- Add secretEnv field to SpurJob CRD (#117) by @powderluv in #118
- Add auto-update: startup check + CLI self-update by @powderluv in #119
- fix(spurd): deadlock in exec_in_job due to double mutex acquisition by @shiv-tyagi in #120
- fix(spurctld): propagate Raft propose errors to RPC callers by @shiv-tyagi in #121
- fix(spurctld): refactor submit_array_job to avoid deadlock by @shiv-tyagi in #122
- chore: sync Cargo.lock after spur-update crate addition by @shiv-tyagi in #123
- fix(spurctld): return error from snapshot_state instead of swallowing serde failures by @shiv-tyagi in #124
- fix(spur-cli): group sinfo rows by node state within partition by @shiv-tyagi in #127
- Wire spurctld accounting notifications to spurdbd by @shiv-tyagi in #129
- fix(spurd): drop privilege inside unshare wrapper, not before exec by @powderluv in #130
- chore: bump workspace version to 0.3.0 by @powderluv in #131
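As an illustration of the sinfo change in #127 (grouping rows by node state within each partition), here is a minimal Python sketch. The tuple layout, the `group_rows` name, and the output columns are assumptions made for the example; the release notes do not show Spur's actual row format or implementation.

```python
from collections import defaultdict


def group_rows(nodes):
    """Collapse (partition, state, node_name) tuples into one row per partition and state.

    Hypothetical helper: returns (partition, state, node_count, node_names) rows,
    sorted by partition and then by state.
    """
    grouped = defaultdict(list)
    for partition, state, name in nodes:
        grouped[(partition, state)].append(name)
    return [
        (partition, state, len(names), ",".join(sorted(names)))
        for (partition, state), names in sorted(grouped.items())
    ]
```

Nodes that share a partition but differ in state end up on separate rows, which is the behaviour the changelog entry describes.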
New Contributors
- @codetap-developer made their first contribution in #115
Full Changelog: v0.2.2...v0.3.0
Spur v0.2.2
Changes since v0.2.1
- #107/#108: Set supplementary groups (video, render) for non-root job processes — fixes GPU device access when running as non-root
- #109: Replace GetNode heartbeat hack with proper Heartbeat RPC — cleaner separation of read/write paths, real telemetry in heartbeats
- #110: Sync CLAUDE.md and README.md with current codebase
Spur v0.2.1
Hotfix: fix namespace wrapper mount ordering (PR #106). Removes private /tmp mount that broke job output file visibility.
Spur v0.2.0
Major release with 99 commits since v0.1.0. Highlights:
Raft-based HA (PRs #84, #93)
- Replaced K8s Lease leader election with OpenRaft consensus
- Works on both bare-metal and Kubernetes — unified HA
- Config: controller.peers = ["node1:6817", "node2:6817"]
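When sizing the peer list, it may help to recall that Raft only commits an entry once a majority of voting members has accepted it. The sketch below shows that arithmetic in Python. It describes a general property of Raft, not Spur's code; the function names are hypothetical, and it assumes the peer list names every voting member.

```python
def quorum_size(peer_count: int) -> int:
    """Smallest majority of a Raft cluster with `peer_count` voting members."""
    if peer_count < 1:
        raise ValueError("a cluster needs at least one peer")
    return peer_count // 2 + 1


def tolerated_failures(peer_count: int) -> int:
    """How many members can be unreachable while a majority still exists."""
    return peer_count - quorum_size(peer_count)
```

Under these assumptions a two-member cluster needs both members up (quorum 2, zero tolerated failures), while three members tolerate one failure, so an odd number of controllers is the usual choice.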
Topology-aware scheduling (PR #80)
- topology/tree: fat-tree fabric-aware scheduling, minimize switch hops
- topology/block: rack-level job co-location
- Config: [topology] section with switch hierarchy
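To make "minimize switch hops" concrete, here is a minimal Python sketch of one way a fabric-aware placement could work: pick nodes so that the job touches as few leaf switches as possible. This is a generic greedy heuristic written for illustration, not the algorithm from PR #80; `pick_nodes` and the switch and node names are hypothetical.

```python
def pick_nodes(free_by_switch: dict[str, list[str]], wanted: int) -> list[str]:
    """Choose `wanted` free nodes while touching as few leaf switches as possible.

    Greedy heuristic: prefer the smallest single switch that fits the whole job;
    otherwise fill from the switches with the most free nodes first.
    """
    total = sum(len(nodes) for nodes in free_by_switch.values())
    if wanted < 1 or wanted > total:
        raise ValueError("request cannot be satisfied with the free nodes")

    # Case 1: the job fits under one switch. Best fit keeps larger switches free.
    fitting = [nodes for nodes in free_by_switch.values() if len(nodes) >= wanted]
    if fitting:
        return min(fitting, key=len)[:wanted]

    # Case 2: the job must span switches. Largest-first uses the fewest switches.
    chosen: list[str] = []
    for nodes in sorted(free_by_switch.values(), key=len, reverse=True):
        take = min(wanted - len(chosen), len(nodes))
        chosen.extend(nodes[:take])
        if len(chosen) == wanted:
            break
    return chosen
```

A real fat-tree scheduler would also weigh the upper switch tiers; this sketch only considers a single level of the hierarchy.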
Bare-metal job isolation (PRs #100, #102, #103, #104, #105)
- UID/GID enforcement via setuid (jobs run as submitting user)
- PID + mount namespace isolation (private /tmp, /dev/shm, process view)
- GPU device restriction via selective bind-mount
- seccomp-BPF syscall whitelist (~150 syscalls, blocks ptrace/mount/bpf)
- Landlock LSM filesystem access control (kernel 5.13+)
- [isolation] config section for operator control
- Fork bomb protection (pids.max) and OOM isolation (memory.oom.group)
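The last two protections map onto cgroup v2 control files. The Python sketch below shows what writing them looks like. It is an assumption-laden illustration rather than Spur's implementation: `apply_job_limits` and the per-job directory are hypothetical, and on a real host the directory would be a cgroup v2 path with the pids and memory controllers enabled.

```python
from pathlib import Path


def apply_job_limits(cgroup_dir: Path, max_pids: int) -> None:
    """Write the two cgroup v2 control files named in the release notes.

    `cgroup_dir` is a hypothetical per-job cgroup directory.
    """
    if max_pids < 1:
        raise ValueError("max_pids must be positive")
    # pids.max caps the number of tasks in the cgroup, so a fork bomb
    # hits the cap instead of exhausting the host's PID space.
    (cgroup_dir / "pids.max").write_text(f"{max_pids}\n")
    # memory.oom.group = 1 asks the kernel OOM killer to treat the cgroup
    # as a single unit and kill all of its tasks together.
    (cgroup_dir / "memory.oom.group").write_text("1\n")
```

Taking the directory as a parameter keeps the sketch testable against a temporary directory without root privileges.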
K8s operator improvements (PRs #85, #93, #98)
- hostNetwork, privileged, hostIPC, shmSize fields now applied to pods
- Extra device plugin resources (RDMA, InfiniBand)
- Cross-namespace SpurJob support
- Node state machine with auto-recovery from heartbeat timeout
Scheduler fixes (PR #94)
- Fixed misleading PendingReason::Priority on new jobs
- Dispatch failures now requeue instead of marking Failed
- salloc timeout + pending reason display
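The requeue behaviour can be pictured with a small Python sketch: a job whose dispatch fails goes back to the pending queue with a reason attached instead of being marked failed. All names here (`Job`, `JobState`, `dispatch_next`, and the "DispatchFailed" reason string) are hypothetical and chosen for illustration; the notes do not show the scheduler's actual types.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"


@dataclass
class Job:
    job_id: int
    state: JobState = JobState.PENDING
    pending_reason: str = ""


def dispatch_next(queue: deque, try_dispatch) -> Job:
    """Pop the next job and try to start it; requeue it if dispatch fails."""
    job = queue.popleft()
    if try_dispatch(job):
        job.state = JobState.RUNNING
        job.pending_reason = ""
    else:
        # Requeue rather than fail: the job stays pending with a visible reason.
        job.state = JobState.PENDING
        job.pending_reason = "DispatchFailed"  # hypothetical reason string
        queue.append(job)
    return job
```

Appending the job to the back of the queue is one possible policy; a scheduler could equally keep its original priority position.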
Binaries
- spur-v0.2.0-linux-x86_64.tar.gz: all binaries (dynamically linked, glibc 2.39+)
- spurd-static: statically linked agent (musl, works on Ubuntu 22.04+)
Spur v0.1.0
What's Changed
- Users/powderluv/k8s operator and bare metal deploy by @powderluv in #1
- Multi-node Slurm feature tests + nodelist/exclude/time-limit fixes by @powderluv in #2
- Add release workflow and curl-pipe-bash installer by @powderluv in #3
New Contributors
- @powderluv made their first contribution in #1
Full Changelog: https://github.com/powderluv/spur/commits/v0.1.0