Releases: NVIDIA/KAI-Scheduler
v0.5.4
v0.4.10
v0.5.3
v0.4.9
What's Changed
- cherry-pick to v0.4: Changed to regular ubuntu docker image in tests (#123) by @enoodle in #188
- cherrypick v0.4 - Pre creating binding request, delete any pending status updates for t… by @davidLif in #186
- Don't add nodepool label for empty nodepool by @itsomri in #183
Full Changelog: v0.4.8...v0.4.9
v0.5.2
What's Changed
- docs: improve snapshot tool doc by @enoodle in #157
- [Refactor] Reclaimable api improvements by @itsomri in #148
- add pr coverage report by @enoodle in #154
- scheduler: Add LastStartTimestamp to PodGroup by @ArmedGuy in #153
- Add min-runtime configuration to queues by @ArmedGuy in #155
- chore: coverage update will open pr by @enoodle in #159
- chore: add missing token in action by @enoodle in #160
- Document GPU Sharing with MPS by @omer-dayan in #158
- fix coverage pr reports and badge generation by @enoodle in #166
- fix update coverage badge by @enoodle in #171
- Run scenario filters on the no potential victims scenario by @davidLif in #164
- Update PodGrouper docs to match the latest implementation by @romanbaron in #177
- Prep changelog for v0.5 version branch by @itsomri in #174
- Don't add nodepool label for empty nodepool by @itsomri in #182
Full Changelog: v0.5.1...v0.5.2
v0.5.1
What's Changed
- Scheduling gates by @itsomri in #122
- [Bugfix] Reclaim victim queue order by @itsomri in #62
- make nodeSelector, affinity and tolerations configurable by @gflatters in #127
- Replace run.ai string with kai.scheduler by @omer-dayan in #131
- Made scheduling queue label key configurable by @romanbaron in #129
- Cache GetDeservedShare and GetFairShare by @davidLif in #139
- refactor(binder): add cache to the resource reservation client by @enoodle in #144
- Remove requirement for Worker when using PyTorchJob by @Phlip79 in #149
- PodGroup info caching for some of the results by @davidLif in #138
- refactor(scheduler): patch pod labels concurrently by @enoodle in #147
- [Design] Minimum runtime before preemptions and reclaims by @ArmedGuy in #126
- scheduler: bugfix: Make pod_scenario_builder build scenarios for rest of elastic job by @ArmedGuy in #132
New Contributors
- @gflatters made their first contribution in #127
- @Phlip79 made their first contribution in #149
Full Changelog: v0.5.0...v0.5.1
v0.4.8
Fixed
- Queue order function now takes into account potential victims, resulting in better reclaim scenarios.
Changed
- Cached the GetDeservedShare and GetFairShare functions in the scheduler PodGroupInfo to improve performance.
- Added cache to the binder resource reservation client.
- More caching and improvements to the PodGroupInfo class.
- Pod labels are now updated concurrently in the background after a scheduling decision.
v0.5.0
What's Changed
- Fix typo in priority docs by @Sovietaced in #108
- Remove repeated active departments queue creation by @bgedik in #113
- Fix filename for stalegangeviction by @ArmedGuy in #115
- [Bugfix] Prep for queue comparison changes by @itsomri in #114
- Add/amend test logging to use .TestTopologyBasic.Name by @ArmedGuy in #116
- Made resource reservation parameters configurable by @romanbaron in #106
- Add Changelog to track changes by @itsomri in #117
- Changed to regular ubuntu docker image in tests by @romanbaron in #123
- Cluster autoscaler adjustment for GPU sharing pods by @romanbaron in #119
- Added contributing, maintainer, and owners files by @romanbaron in #74
New Contributors
- @Sovietaced made their first contribution in #108
- @bgedik made their first contribution in #113
- @ArmedGuy made their first contribution in #115
Full Changelog: v0.4.7...v0.5.0
v0.4.7
What's Changed
- fix: snapshot tool cache.Run call by @enoodle in #102
- Docs hotfix: Update and rename pytorch-elasitc.yaml to pytorch-elastic.yaml by @EkinKarabulut in #105
- Adding Issue Templates for bug & feature/enhancement requests by @EkinKarabulut in #103
- fix: gpu resource device count calculation by @enoodle in #107
Full Changelog: v0.4.6...v0.4.7
v0.4.6
What's Changed
- Initialize metrics namespace on scheduler run by @romanbaron in #100
Full Changelog: v0.4.5...v0.4.6