Releases: It4innovations/hyperqueue
Nightly build 2026-01-13
HyperQueue dev
Artifact summary:
- hq-vdev-*: Main HyperQueue build containing the `hq` binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-dev-*: Wheel containing the `hyperqueue` package with HyperQueue Python bindings.
v0.24.0
HyperQueue 0.24.0
Breaking changes
- The `--no-detect-resources` flag of the `hq worker start` command has been removed.
  You can now configure automatically detected resources in a granular way using the new
  `--detect-resources` flag (see below). `--no-detect-resources` corresponds to `--detect-resources=none`.
New features
- New policies `tight` (and `tight!`), which implement the original behavior of `compact`.
  A tight policy selects the minimal number of resource groups and then tries to take the maximum
  number of resources from the biggest group, then from the second biggest group, etc.
  The policy `compact` now behaves as described in the section "Changes".
- The resource policy `compact!` may now take a fractional resource request.
- New command `hq alloc cat <alloc-id> <stdout/stderr>`, which can be used
  to debug the output of allocations submitted by the automatic allocator.
- New command `hq server wait`, which repeatedly tries to connect to a server with a configurable timeout.
  This is useful for deployment scripts that need to wait for server availability.
- New `hq alloc add` parameter called `--wrap-worker-cmd`. It can be used to start
  workers on allocated nodes using some wrapping mechanism (e.g. Podman).
- New flag `--detect-resources` for `hq worker start`. It can be used to configure which worker
  resources will be automatically detected, e.g. `--detect-resources=cpus,gpus/nvidia`.
  See the documentation for more information.
- The scheduler has better compacting behavior when there is a small number of tasks
  and workers appearing/disappearing.
- The automatic allocator keeps the log file when a probing allocation fails.
- Unstable: resource "coupling".
  You may specify that some resources are coupled, e.g. cpus and gpus.
  This means that the cpus and gpus are organized into NUMA nodes, and the allocation strategy
  will respect that, i.e., it tries to find cpus and gpus from the same NUMA node.
  Note: the current implementation does not detect coupling automatically;
  you have to specify the coupling manually.
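Some of the commands above can be sketched as shell usage. This is an illustrative sketch, not authoritative documentation: the `<alloc-id>` placeholder must be replaced with a real allocation ID, and flag values are taken from the notes above; consult `hq --help` for the exact syntax.

```shell
# Replacement for the removed --no-detect-resources flag:
hq worker start --detect-resources=none

# Detect only CPUs and NVIDIA GPUs (value taken from the notes above):
hq worker start --detect-resources=cpus,gpus/nvidia

# Inspect the stdout of an allocation created by the automatic allocator:
hq alloc cat <alloc-id> stdout

# Block a deployment script until the server becomes reachable:
hq server wait
```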
Changes
- The allocation policy `compact` was updated.
  It still tries to find the minimal number of resource groups,
  but once they are found, resources are taken evenly from the selected groups.
  In rare cases where you need the original behavior, use the new policy `tight`.
  This is not a breaking change, because `compact` previously did not specify
  exactly how resources would be taken from groups.
- A worker process terminated because of an idle timeout now returns a zero exit code.
Fixes
- Fixed possible ignoring of the idle timeout when a time request is used.
- Fixed broken streaming when a job file is used.
- Fixed missing fields in the export of the journal into JSON.
Artifact summary:
- hq-v0.24.0-*: Main HyperQueue build containing the `hq` binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.24.0-*: Wheel containing the `hyperqueue` package with HyperQueue Python bindings.
v0.24.0-rc1
HyperQueue 0.24.0-rc1
Breaking changes
- The `--no-detect-resources` flag of the `hq worker start` command has been removed.
  You can now configure automatically detected resources in a granular way using the new
  `--detect-resources` flag (see below). `--no-detect-resources` corresponds to `--detect-resources=none`.
New features
- New policies `tight` (and `tight!`), which implement the original behavior of `compact`.
  A tight policy selects the minimal number of resource groups and then tries to take the maximum
  number of resources from the biggest group, then from the second biggest group, etc.
  The policy `compact` now behaves as described in the section "Changes".
- The resource policy `compact!` may now take a fractional resource request.
- New command `hq alloc cat <alloc-id> <stdout/stderr>`, which can be used
  to debug the output of allocations submitted by the automatic allocator.
- New command `hq server wait`, which repeatedly tries to connect to a server with a configurable timeout.
  This is useful for deployment scripts that need to wait for server availability.
- New `hq alloc add` parameter called `--wrap-worker-cmd`. It can be used to start
  workers on allocated nodes using some wrapping mechanism (e.g. Podman).
- New flag `--detect-resources` for `hq worker start`. It can be used to configure which worker
  resources will be automatically detected, e.g. `--detect-resources=cpus,gpus/nvidia`.
  See the documentation for more information.
- The scheduler has better compacting behavior when there is a small number of tasks
  and workers appearing/disappearing.
- The automatic allocator keeps the log file when a probing allocation fails.
- Unstable: resource "coupling".
  You may specify that some resources are coupled, e.g. cpus and gpus.
  This means that the cpus and gpus are organized into NUMA nodes, and the allocation strategy
  will respect that, i.e., it tries to find cpus and gpus from the same NUMA node.
  Note: the current implementation does not detect coupling automatically;
  you have to specify the coupling manually.
Changes
- The allocation policy `compact` was updated.
  It still tries to find the minimal number of resource groups,
  but once they are found, resources are taken evenly from the selected groups.
  In rare cases where you need the original behavior, use the new policy `tight`.
  This is not a breaking change, because `compact` previously did not specify
  exactly how resources would be taken from groups.
- A worker process terminated because of an idle timeout now returns a zero exit code.
Fixes
- Fixed possible ignoring of the idle timeout when a time request is used.
- Fixed broken streaming when a job file is used.
Artifact summary:
- hq-v0.24.0-rc1-*: Main HyperQueue build containing the `hq` binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.24.0-rc1-*: Wheel containing the `hyperqueue` package with HyperQueue Python bindings.
v0.23.0
HyperQueue 0.23.0
Breaking changes
- The value 0 is no longer allowed for `--crash-limit`; use `--crash-limit=unlimited` instead.
- The `--workers-per-alloc` flag of the `hq alloc add` command has been replaced with `--max-workers-per-alloc`,
  which determines the maximum number of workers to spawn in each allocation. Previously, the flag caused the
  allocator to (almost) always spawn the given number of workers per allocation, regardless of the actual
  computational load.
Changes
The automatic allocator has finally been reimplemented, and is now much better:
- It now uses information from the scheduler to determine how many allocations to spawn, and thus it can react to the
  current computational load much more accurately. It should also be less "eager".
- It properly supports multi-node tasks.
- It considers computational load across all allocation queues (before, each queue was treated separately, which led to
  creating too many submissions).
- It now exposes a `min-utilization` parameter, which can be used to avoid spawning an allocation that couldn't be
  utilized enough.
As this is a large behavioral change, we would be happy to hear your feedback!
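As a sketch, an allocation queue using the renamed flag and the new utilization guard might be created like this. The exact spelling of the `min-utilization` flag and its value format are assumptions based on the notes above, not verified syntax; check `hq alloc add --help` for the real interface.

```shell
# Hypothetical: add a Slurm allocation queue spawning at most 4 workers per
# allocation, and skip allocations whose estimated utilization is too low.
hq alloc add slurm --max-workers-per-alloc 4 --min-utilization 0.5
```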
New features
- New command `hq task explain <job_id> <task_id>`, which explains why a task cannot be run on a given worker.
- The server scheduler now slightly prioritizes tasks from older jobs and finishing partially computed task graphs.
- New values for `--crash-limit`:
  - `never-restart`: the task is never restarted, even if it "crashes" on a worker that was explicitly terminated.
  - `unlimited`: unlimited crash limit.
- `hq worker info` contains more information.
- `hq job forget` tries to free more memory.
- You can now configure the job name in the Python API.
- `hq job progress` now displays all jobs and tasks that you wait for, rather than only those that were unfinished at the
  time when the command was executed.
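The task diagnostics and crash-limit values above can be sketched as shell usage. The job/task IDs and the submitted script are placeholders; this is an illustrative sketch, not authoritative syntax.

```shell
# Ask the server why task 3 of job 1 cannot currently run on any worker:
hq task explain 1 3

# Submit a task that is never restarted, even after worker termination:
hq submit --crash-limit=never-restart -- ./compute.sh

# Submit a task with an unlimited crash limit:
hq submit --crash-limit=unlimited -- ./compute.sh
```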
Fixes
- Fixed a problem with journal loading when task dependencies are used.
- Fixed restoring crash counters and instance IDs from the journal.
- Fixed some corner cases of load balancing in the server scheduler.
Docs
- CLI documentation (shown when `--help` is used) was cleaned up and improved.
- Our documentation now contains an automatically generated reference of all available HQ CLI commands and options.
- The `hq doc` and `hq generate-completion` commands have been documented.
Experimental
- Added direct data transfers between tasks. The user API is not yet stabilized.
Artifact summary:
- hq-v0.23.0-*: Main HyperQueue build containing the `hq` binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.23.0-*: Wheel containing the `hyperqueue` package with HyperQueue Python bindings.
v0.23.0-rc1
HyperQueue 0.23.0-rc1
Breaking changes
- The value 0 is no longer allowed for `--crash-limit`; use `--crash-limit=unlimited` instead.
- The `--workers-per-alloc` flag of the `hq alloc add` command has been replaced with `--max-workers-per-alloc`,
  which determines the maximum number of workers to spawn in each allocation. Previously, the flag caused the
  allocator to (almost) always spawn the given number of workers per allocation, regardless of the actual
  computational load.
Changes
The automatic allocator has finally been reimplemented, and is now much better:
- It now uses information from the scheduler to determine how many allocations to spawn, and thus it can react to the
  current computational load much more accurately. It should also be less "eager".
- It properly supports multi-node tasks.
- It considers computational load across all allocation queues (before, each queue was treated separately, which led to
  creating too many submissions).
- It now exposes a `min-utilization` parameter, which can be used to avoid spawning an allocation that couldn't be
  utilized enough.
As this is a large behavioral change, we would be happy to hear your feedback!
New features
- New command `hq task explain <job_id> <task_id>`, which explains why a task cannot be run on a given worker.
- The server scheduler now slightly prioritizes tasks from older jobs and finishing partially computed task graphs.
- New values for `--crash-limit`:
  - `never-restart`: the task is never restarted, even if it "crashes" on a worker that was explicitly terminated.
  - `unlimited`: unlimited crash limit.
- `hq worker info` contains more information.
- `hq job forget` tries to free more memory.
- You can now configure the job name in the Python API.
- `hq job progress` now displays all jobs and tasks that you wait for, rather than only those that were unfinished at the
  time when the command was executed.
Fixes
- Fixed a problem with journal loading when task dependencies are used.
- Fixed restoring crash counters and instance IDs from the journal.
- Fixed some corner cases of load balancing in the server scheduler.
Docs
- CLI documentation (shown when `--help` is used) was cleaned up and improved.
- Our documentation now contains an automatically generated reference of all available HQ CLI commands and options.
- The `hq doc` and `hq generate-completion` commands have been documented.
Experimental
- Added direct data transfers between tasks. The user API is not yet stabilized.
Artifact summary:
- hq-v0.23.0-rc1-*: Main HyperQueue build containing the `hq` binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.23.0-rc1-*: Wheel containing the `hyperqueue` package with HyperQueue Python bindings.
v0.22.0
HyperQueue 0.22.0
New features
- Added `hq worker deploy-ssh` to deploy workers to a set of nodes using SSH.
- Added the `hq doc` command for accessing documentation about various HQ features from the command line.
- Added `hq journal replay`. It is similar to `hq journal stream`, but it will not wait for new events.
- More robust initialization of the dashboard.
- Authentication and encryption of the client/worker connection can be disabled. This is mostly for testing
  and benchmarking purposes. Do not use it unless you are in a 100% safe environment.
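A sketch of the new commands above. The hostfile argument to `deploy-ssh` is an assumption for illustration; consult the documentation for the actual interface.

```shell
# Deploy workers over SSH to the nodes listed in a file (argument shape assumed):
hq worker deploy-ssh nodes.txt

# Replay recorded journal events without waiting for new ones:
hq journal replay

# Browse HQ documentation from the command line:
hq doc
```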
Breaking change
- The Python API now requires Python 3.9, up from Python 3.6.
Fixes
- Fixed #848: inefficient scheduling of tasks with priorities.
- HyperQueue will no longer allocate extreme amounts of memory when loading a corrupted journal
Artifact summary:
- hq-v0.22.0-*: Main HyperQueue build containing the `hq` binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.22.0-*: Wheel containing the `hyperqueue` package with HyperQueue Python bindings.
v0.22.0-rc1
HyperQueue 0.22.0-rc1
New features
- Added `hq worker deploy-ssh` to deploy workers to a set of nodes using SSH.
- Added the `hq doc` command for accessing documentation about various HQ features from the command line.
- Added `hq journal replay`. It is similar to `hq journal stream`, but it will not wait for new events.
- More robust initialization of the dashboard.
- Authentication and encryption of the client/worker connection can be disabled. This is mostly for testing
  and benchmarking purposes. Do not use it unless you are in a 100% safe environment.
Breaking change
- The Python API now requires Python 3.9, up from Python 3.6.
Fixes
- Fixed #848: inefficient scheduling of tasks with priorities.
- HyperQueue will no longer allocate extreme amounts of memory when loading a corrupted journal
Artifact summary:
- hq-v0.22.0-rc1-*: Main HyperQueue build containing the `hq` binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.22.0-rc1-*: Wheel containing the `hyperqueue` package with HyperQueue Python bindings.
v0.21.1
HyperQueue 0.21.1
Fixes
- Fixes random task crashes. Details in #823.
Artifact summary:
- hq-v0.21.1-*: Main HyperQueue build containing the `hq` binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.21.1-*: Wheel containing the `hyperqueue` package with HyperQueue Python bindings.
v0.21.0
HyperQueue 0.21.0
Caution
This release contains a critical bug that can sometimes randomly kill tasks. Please use v0.21.1 instead.
Breaking change
- Pre-built HyperQueue releases available from our GitHub repository are now built with GLIBC
  2.28, instead of 2.17. If you need to run HyperQueue on a system with an older GLIBC version, you might need to recompile it from source on your system. If you encounter any issues, please let us know.
Changes
- The `hq event-log` command was renamed to `hq journal`.
- `hq dashboard` has been re-enabled by default.
New features
- Added `hq journal prune` for pruning the journal file.
- Added `hq journal flush` for forcing the server to flush the journal.
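The two journal maintenance commands take no extra arguments in the notes above, so typical usage should be simply:

```shell
# Force the server to write buffered events to the journal file:
hq journal flush

# Prune entries that are no longer needed from the journal file:
hq journal prune
```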
Artifact summary:
- hq-v0.21.0-*: Main HyperQueue build containing the `hq` binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.21.0-*: Wheel containing the `hyperqueue` package with HyperQueue Python bindings.
v0.21.0-rc1
HyperQueue 0.21.0-rc1
Breaking change
- Pre-built HyperQueue releases available from our GitHub repository are now built with GLIBC
  2.28, instead of 2.17. If you need to run HyperQueue on a system with an older GLIBC version, you might need to recompile it from source on your system. If you encounter any issues, please let us know.
Changes
- The `hq event-log` command was renamed to `hq journal`.
- `hq dashboard` has been re-enabled by default.
New features
- Added `hq journal prune` for pruning the journal file.
- Added `hq journal flush` for forcing the server to flush the journal.
Artifact summary:
- hq-v0.21.0-rc1-*: Main HyperQueue build containing the `hq` binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.21.0-rc1-*: Wheel containing the `hyperqueue` package with HyperQueue Python bindings.