v0.24.0

Released by github-actions on 09 Sep at 07:29

HyperQueue 0.24.0

Breaking changes

  • The --no-detect-resources flag of the hq worker start command has been removed.
    You can now configure automatically detected resources in a granular way using the new
    --detect-resources flag (see below). --no-detect-resources corresponds to --detect-resources=none.
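
If your deployment scripts still pass the removed flag, a mechanical rewrite is enough, because --no-detect-resources maps directly to --detect-resources=none. The sketch below assumes the worker commands are stored as shell strings; the helper name migrate_worker_cmd is hypothetical, and it only swaps the exact token --no-detect-resources, leaving every other argument untouched.

```python
import shlex

# Flag removed in HyperQueue 0.24.0 and its documented equivalent.
OLD_FLAG = "--no-detect-resources"
NEW_FLAG = "--detect-resources=none"


def migrate_worker_cmd(cmd: str) -> str:
    """Rewrite an `hq worker start` command line for HyperQueue 0.24.0.

    Hypothetical helper: replaces the exact token --no-detect-resources
    with --detect-resources=none; all other arguments are kept as they are.
    """
    args = shlex.split(cmd)
    migrated = [NEW_FLAG if arg == OLD_FLAG else arg for arg in args]
    return shlex.join(migrated)  # shlex.join needs Python 3.8+


if __name__ == "__main__":
    print(migrate_worker_cmd("hq worker start --no-detect-resources"))
```

Using shlex rather than a plain string replace avoids touching the flag name if it ever appears inside a quoted argument.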

New features

  • New resource policies tight (and tight!), which implement the original behavior of compact.
    They select the minimal number of resource groups, then take as many resources as possible from
    the biggest group, then as many as possible from the second-biggest group, and so on.
    The compact policy now behaves as described in the "Changes" section.
  • The compact! resource policy now accepts fractional resource requests.
  • New command hq alloc cat <alloc-id> <stdout/stderr>, which can be used
    to debug the output of allocations submitted by the automatic allocator.
  • New command hq server wait that repeatedly tries to connect to a server with a configurable timeout.
    This is useful for deployment scripts that need to wait for server availability.
  • New hq alloc add parameter --wrap-worker-cmd. It can be used to start
    workers on allocated nodes through a wrapping mechanism (e.g. Podman).
  • New flag --detect-resources for hq worker start. It configures which worker resources
    are detected automatically, for example --detect-resources=cpus,gpus/nvidia.
    See the documentation for more information.
  • The scheduler has better compacting behavior when there is a small number of tasks
    and workers are appearing or disappearing.
  • The automatic allocator now keeps the log file when a probing allocation fails.
  • Unstable: Resource "coupling".
    You can specify that some resources are coupled, e.g. cpus and gpus.
    This means that cpus and gpus are organized into NUMA nodes, and the allocation strategy
    respects that, i.e. it tries to take cpus and gpus from the same NUMA nodes.
    Note: The current implementation does not detect coupling automatically;
    you have to specify it manually.
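
To make the idea of coupling concrete, here is a toy selection routine. It is not HyperQueue's scheduler code: the NumaNode model, the function pick_coupled_nodes, and the fallback order (largest nodes first) are all hypothetical assumptions chosen for illustration. It prefers a single NUMA node that holds both the requested cpus and gpus, and otherwise grows a set of nodes so that both parts of the request come from the same nodes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NumaNode:
    # Hypothetical model: free CPUs and GPUs attached to one NUMA node.
    name: str
    free_cpus: int
    free_gpus: int


def pick_coupled_nodes(nodes: List[NumaNode], cpus: int, gpus: int) -> Optional[List[NumaNode]]:
    """Return the NUMA nodes from which both CPUs and GPUs would be taken.

    Illustrative only: prefer one node that satisfies the whole request;
    otherwise add nodes (most GPUs first, then most CPUs) until the same
    set of nodes covers both the CPU and GPU parts. Returns None if the
    request cannot be satisfied at all.
    """
    # 1) A single node with enough CPUs *and* GPUs keeps them together.
    for node in nodes:
        if node.free_cpus >= cpus and node.free_gpus >= gpus:
            return [node]

    # 2) Otherwise grow a set of nodes, biggest first.
    chosen: List[NumaNode] = []
    got_cpus = got_gpus = 0
    for node in sorted(nodes, key=lambda n: (n.free_gpus, n.free_cpus), reverse=True):
        if got_cpus >= cpus and got_gpus >= gpus:
            break
        chosen.append(node)
        got_cpus += node.free_cpus
        got_gpus += node.free_gpus

    if got_cpus >= cpus and got_gpus >= gpus:
        return chosen
    return None


if __name__ == "__main__":
    nodes = [NumaNode("numa0", free_cpus=8, free_gpus=0), NumaNode("numa1", free_cpus=8, free_gpus=2)]
    # numa0 has no GPUs, so a 4 CPU + 1 GPU request lands entirely on numa1.
    print([n.name for n in pick_coupled_nodes(nodes, cpus=4, gpus=1)])
```

Without coupling, a naive allocator could take the CPUs from numa0 and the GPU from numa1; the sketch shows why declaring the coupling changes the choice.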

Changes

  • The compact allocation policy was updated.
    It still tries to find the minimal number of resource groups,
    but once they are found, resources are taken evenly from the selected groups.
    In the rare cases when you need the original behavior, use the new tight policy.
    This is not a breaking change, because compact previously did not specify exactly
    how resources would be taken from the groups.
  • A worker process terminated because of the idle timeout now returns a zero exit code.
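
The difference between the new compact behavior and tight can be sketched as follows. This is a simplified model, not HyperQueue's implementation: it assumes integer amounts of a single resource (e.g. cpus) spread over groups (e.g. sockets), ignores the strict "!" variants and fractional requests, and breaks ties between equally sized groups by their original order. The names select_groups, tight and compact are hypothetical. Both policies first pick the minimal number of groups (taking the largest groups first achieves this); tight then fills the biggest group before moving on, while compact spreads the request evenly, capped by each group's free capacity.

```python
from typing import Dict, List


def select_groups(free: List[int], amount: int) -> List[int]:
    """Pick the minimal number of groups (largest first) that can hold `amount`."""
    order = sorted(range(len(free)), key=lambda g: free[g], reverse=True)
    chosen, total = [], 0
    for g in order:
        if total >= amount:
            break
        chosen.append(g)
        total += free[g]
    if total < amount:
        raise ValueError("not enough free resources")
    return chosen


def tight(free: List[int], amount: int) -> Dict[int, int]:
    """Take as much as possible from the biggest selected group, then the next one."""
    taken = {}
    for g in select_groups(free, amount):
        take = min(free[g], amount)
        taken[g] = take
        amount -= take
    return taken


def compact(free: List[int], amount: int) -> Dict[int, int]:
    """Spread the request evenly over the selected groups (water-filling)."""
    groups = select_groups(free, amount)
    taken = {g: 0 for g in groups}
    remaining = amount
    while remaining > 0:
        # Split what is left evenly among groups that still have spare capacity.
        open_groups = [g for g in groups if taken[g] < free[g]]
        share, extra = divmod(remaining, len(open_groups))
        for i, g in enumerate(open_groups):
            want = share + (1 if i < extra else 0)
            give = min(want, free[g] - taken[g])
            taken[g] += give
            remaining -= give
    return taken


if __name__ == "__main__":
    free = [4, 4, 2]  # free cpus per group in this toy example
    print("tight:  ", tight(free, 6))    # {0: 4, 1: 2}
    print("compact:", compact(free, 6))  # {0: 3, 1: 3}
```

In both cases only two of the three groups are used; the policies differ only in how the request is divided between those two groups.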

Fixes

  • Fixed an issue where the idle timeout could be ignored when a time request was used.
  • Fixed broken streaming when a job file is used.
  • Fixed missing fields when exporting the journal to JSON.

Artifact summary:

  • hq-v0.24.0-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.24.0-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.