Releases: PKUHPC/CraneSched
Release v1.1.3
What's Changed
- feat: Add supplementary gids by @RileyWen in #476
- chore: Auto format file and commit. by @L-Xiafeng in #472
- feat: Add DefaultMemPerCPU and MaxMemPerCPU in ccontrol by @zhansan114514 in #478
- fix: Fix SIGPIPE handler by @Nativu5 in #486
- chore: Fix CI for pull requests from forked repo by @L-Xiafeng in #483
- chore: Add format for protos by @L-Xiafeng in #487
- fix: Fix default mem not initialized. by @L-Xiafeng in #489
- fix: Change the permissions of the system-generated job script to the submitting user by @zhansan114514 in #488
- Dev/reservation by @NamelessOIer in #351
- fix: delete user by @huerni in #491
- fix: Resv clean thread not joined in destructor. by @L-Xiafeng in #492
- feat: Add crane user for CraneCtld by @Nativu5 in #490
- fix: Fix permission error when using RPM by @Nativu5 in #494
- docs: Add English Readme by @L-Xiafeng in #493
- refactor: Refactor craned up-line logic. by @L-Xiafeng in #475
- Fix: check qos limit at the first time by @NamelessOIer in #459
- fix: Fix crun timeout. by @L-Xiafeng in #497
- Fix: Node select by @NamelessOIer in #500
- refactor: Change CgroupManager to stateless, add jobmanager. by @L-Xiafeng in #461
- chore: Fix clang format. by @L-Xiafeng in #502
- fix: Fix duplicated task status changes when err_before_exec is set by @Nativu5 in #504
- refactor: optimize efficiency by @NamelessOIer in #503
- fix: Fix error env returned. by @L-Xiafeng in #508
- fix: Set keepalive for Cforedclient. by @L-Xiafeng in #509
- fix: handle empty res_avail on node by @NamelessOIer in #510
- feat: Sync ctld and craned status in configuring state. by @L-Xiafeng in #496
- chore: Fix typo by @L-Xiafeng in #511
Release v1.1.2
What's Changed
- fix cacct priority val error by @1daidai1 in #414
- feat: Add macro to disable newer interface of libcgroup by @RileyWen in #432
- Add specific reasons for failed task submission by @1daidai1 in #375
- Fix: remove __ERR_CODE_COUNT by @NamelessOIer in #437
- Dev/partition allow accounts by @huerni in #404
- Dev/x11 refactor by @RileyWen in #421
- Dev/account multi refactor by @huerni in #426
- feat: Add node draining/resuming event by @1daidai1 in #436
- Fix cacct complete task allocCPU is 0 when ctld restart by @1daidai1 in #449
- Fix: fix cpu qos limit. by @NamelessOIer in #455
- chores: Upgrade to GCC 15 / Clang 20 by @Nativu5 in #453
- feat: Add Vault by @huerni in #456
- chore: Add clang-format check. by @L-Xiafeng in #464
- feat: Add cbatch --open-mode option by @1daidai1 in #460
- Add missing unset optional value handling. by @RileyWen in #468
New Contributors
- @wtr0504 made their first contribution in #327
- @huyongqii made their first contribution in #378
Release v1.1.1
What's Changed
- fix: Fix craned not using config file listen port. by @L-Xiafeng in #429
- fix: Fix CraneCtld not using configured craned listen port. by @L-Xiafeng in #430
- fix: Craned QueryTaskIdFromPid not sent to the correct async handle. by @L-Xiafeng in #431
Release v1.1.0
New Features:
- Add X11 forwarding
- Add support for modifying the default user account
- User QoS Resource Limits
- Add multiple GID support
- Add pty support to crun (#362)
- Add a remote call for task completion for plugins running on craned (#378)
- Feat: Add ascend support and refactor (#333)
- Feat: Support cgroupv2 (#327)
- feat: Add CRANE_USE_SYSTEM_LIBCGROUP (#357)
- Feat: Add GitHub Action for Building and Packaging (#350)
Enhancements:
- Scheduling Algorithm Optimization
- optimize batch hold/release (#394)
- Refactor: Replace libevent with libuvw in Craned (#359)
- Separate Authentication and Modification in AccountManager (#321)
- Refactor: Detect proxy environment variable on daemon start (#358)
Bug Fixes:
- fix: Add nullptr checks in DestroyCgroupHookAsync call
- Bugfix/bpf compilation
- Fix race condition in setuid
- Fix libcgroup configuring and DEB packaging size (#409)
- Fix: Skip link-local IPv6 addresses (#400)
- Bugfix: Check permission when update node (#395)
- fix: Exit on node configured without any partition. (#379)
- fix: Fix sigsegv in cranectld (#368)
- Fix freeing a locked lock when releasing a cgroup (#365)
- Bugfix: Fetch resource from total instead of available (#360)
- Bugfix: Fix empty CRANE_PARTITION environment variable when a task uses the default partition.
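To illustrate the link-local IPv6 fix (#400) listed above: addresses in fe80::/10 are only valid on a single network segment, so a daemon picking an address to advertise would normally skip them. The following is a minimal Python sketch of that filtering idea, not CraneSched's actual implementation; the function name `usable_ipv6_addresses` and the sample addresses are hypothetical.

```python
import ipaddress


def usable_ipv6_addresses(candidates):
    """Return the IPv6 addresses from `candidates` that are not link-local.

    Hypothetical helper for illustration only.
    """
    usable = []
    for text in candidates:
        addr = ipaddress.ip_address(text)
        # Keep only IPv6 addresses outside the link-local range fe80::/10.
        if addr.version == 6 and not addr.is_link_local:
            usable.append(str(addr))
    return usable


# Sample input: one link-local IPv6, one documentation-range IPv6, one IPv4.
sample = ["fe80::1", "2001:db8::10", "192.0.2.5"]
print(usable_ipv6_addresses(sample))  # prints ['2001:db8::10']
```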
Release v1.0.0
Overview
This is the first GA (General Availability) release of CraneSched, and can be considered ready for production.
For CraneSched documentation, see CraneSched-document.
New Features
- Submit Batch Jobs via cbatch: Users submit a script describing the entire computation to the system, which schedules and executes it.
  - Support specifying the resources required for the job, including memory, number of cores, parallel tasks per node, number of nodes needed, etc.
  - Support specifying job execution parameters, including specifying/excluding certain compute nodes, specifying cluster partition type, QoS configuration, repeat execution count, timeout duration, environment variables, etc.
  - Support specifying task output information, including task name, account and user associated with the task, email notification method, execution log, and error log redirection, etc.
- Submit Interactive Jobs via calloc and crun: After specifying task resources via command line, the task is launched on the compute node. calloc requires users to log into the compute node manually, while crun automatically connects to the compute node.
  - Support specifying the resources required for the job, including memory, number of cores, parallel tasks per node, number of nodes needed, etc.
  - Support specifying job execution parameters, including specifying/excluding certain compute nodes, specifying cluster partition type, QoS configuration, timeout duration, environment variables, etc.
  - Support specifying task output information, including task name, associated account, log level, etc.
- Cancel Jobs via ccancel: Support unified job cancellation based on conditions such as submission account, submission username, task name, task ID, node, cluster partition, task status, etc.
- View Job Queue via cqueue: Support filtering query results based on conditions such as submission account, user, task name, task ID, cluster partition, QoS configuration, task status, etc.
- View Completed Job Queue via cacct: Support filtering query results based on conditions such as submission account, user, task name, task ID, cluster partition, QoS configuration, task status, execution time, submission time, end time, etc.
- View Node and Partition Status via cinfo: Support filtering query results based on node response status, partition of the node, node work status, etc.; support querying at fixed intervals.
- Dynamically View/Modify Node/Partition/Task Status via ccontrol: Support viewing detailed information of nodes/tasks/partitions; modifying job timeout, priority, etc.; modifying node status; pausing and resuming tasks, etc.
- Manage User and Account Information via cacctmgr: Support adding, deleting, modifying, and querying accounts/users/QoS/partitions; banning/unbanning users/accounts.
- Job Monitor Hook
- Support for Plugin Module
- Device Support
- Support for IPv6
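The condition-based filtering described above for ccancel, cqueue, and cacct can be pictured as a conjunction of optional predicates over job records. The sketch below is a Python illustration under stated assumptions, not CraneSched code: the `Job` fields, the `filter_jobs` helper, and the sample data are all hypothetical and do not reflect the real command-line options or data model.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Job:
    # Hypothetical record; field names are illustrative only.
    job_id: int
    user: str
    account: str
    partition: str
    state: str


def filter_jobs(jobs: Iterable[Job], *,
                user: Optional[str] = None,
                account: Optional[str] = None,
                partition: Optional[str] = None,
                state: Optional[str] = None) -> List[Job]:
    """Keep jobs that match every supplied condition (None means no filter)."""
    wanted = {"user": user, "account": account,
              "partition": partition, "state": state}
    # Drop the conditions the caller did not specify.
    active = {key: value for key, value in wanted.items() if value is not None}
    return [job for job in jobs
            if all(getattr(job, key) == value for key, value in active.items())]


jobs = [
    Job(1, "alice", "physics", "cpu", "Running"),
    Job(2, "bob", "physics", "gpu", "Pending"),
    Job(3, "alice", "biology", "cpu", "Pending"),
]

# Select alice's pending jobs, as a cancel or queue query might.
print([job.job_id for job in filter_jobs(jobs, user="alice", state="Pending")])  # prints [3]
```

Combining the conditions with a logical AND means each extra condition can only narrow the result, and supplying no condition returns every job.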