
[Slurm] Alternatives for job completion monitoring #25

@sylvlecl

  • Do you want to request a feature or report a bug?

Feature

  • What is the current behavior?

To monitor the completion of jobs submitted to Slurm, we use files and poll the filesystem.
Depending on the polling frequency, this introduces latency (the delay between the end of a task and the moment the computation manager detects it as completed) and puts load on the underlying filesystem, in particular when several processes using a computation manager are running.
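To make the trade-off concrete, file-based polling boils down to a loop like the one below. This is a simplified sketch, not the actual powsybl-hpc code: the class `FlagFilePoller`, the method `awaitCompletion`, and the idea that completion is signalled by the appearance of a single file are all assumptions made for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

/** Hypothetical sketch: waits until a job's completion file appears. */
final class FlagFilePoller {

    private final Duration interval;

    FlagFilePoller(Duration interval) {
        this.interval = interval;
    }

    /**
     * Polls for the file; returns true once it exists,
     * or false if the timeout elapses first.
     */
    boolean awaitCompletion(Path flagFile, Duration timeout) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!Files.exists(flagFile)) {            // one filesystem call per poll
            if (System.nanoTime() - deadline >= 0) {
                return false;                        // timed out
            }
            Thread.sleep(interval.toMillis());       // detection delay is up to about one interval
        }
        return true;
    }
}
```

A shorter interval reduces the detection delay but multiplies the number of filesystem calls, and that cost is paid by every process running such a loop.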

  • What is the expected behavior?

It should be possible to configure how completion monitoring is performed.
Polling would be one implementation of this functionality.
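One possible shape for this is a small strategy interface that the computation manager depends on, with polling and push-based variants behind it. The names below (`CompletionMonitor`, `PushCompletionMonitor`, `notifyCompleted`) are hypothetical and do not exist in powsybl-hpc; this is only a sketch of the idea.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical strategy interface: how the computation manager learns that a job ended. */
interface CompletionMonitor {
    /** Returns a future that completes when the job with the given ID finishes. */
    CompletableFuture<Void> whenCompleted(long jobId);
}

/** Push-based variant: an external channel calls notifyCompleted when a job ends. */
final class PushCompletionMonitor implements CompletionMonitor {

    private final Map<Long, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

    @Override
    public CompletableFuture<Void> whenCompleted(long jobId) {
        return pending.computeIfAbsent(jobId, id -> new CompletableFuture<>());
    }

    /** Called by the transport (socket, broker, ...) when a completion message arrives. */
    void notifyCompleted(long jobId) {
        // computeIfAbsent also covers a notification that arrives before anyone subscribed
        pending.computeIfAbsent(jobId, id -> new CompletableFuture<>()).complete(null);
    }
}
```

A polling implementation would satisfy the same interface by completing the future from its polling loop. The sketch never evicts finished entries from the map, which a real implementation would need to do.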

Other interesting implementations would be:

  1. A very simple in-house networking protocol, for example implemented with netty.
  2. A message broker (Kafka, RabbitMQ ...): this should probably be left for client projects to implement.
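As a rough illustration of option 1, the sketch below uses plain `java.net` sockets instead of netty, to stay dependency-free. The one-line message format `DONE <jobId>` and the class `CompletionListener` are assumptions made up for this example, not an existing protocol.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.LongConsumer;

/** Hypothetical listener: each job connects when it ends and sends one line, "DONE <jobId>". */
final class CompletionListener implements AutoCloseable {

    private final ServerSocket server;

    CompletionListener(int port, LongConsumer onCompleted) throws IOException {
        server = new ServerSocket(port);   // port 0 lets the OS pick a free port
        Thread acceptor = new Thread(() -> acceptLoop(onCompleted), "completion-listener");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    /** The port actually bound, useful when 0 was requested. */
    int port() {
        return server.getLocalPort();
    }

    private void acceptLoop(LongConsumer onCompleted) {
        while (!server.isClosed()) {
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
                String line = in.readLine();
                if (line != null && line.startsWith("DONE ")) {
                    onCompleted.accept(Long.parseLong(line.substring(5).trim()));
                }
            } catch (IOException | NumberFormatException e) {
                // Closed socket or malformed message: skip it; the loop exits once the server is closed
            }
        }
    }

    @Override
    public void close() throws IOException {
        server.close();   // makes accept() throw, which ends the loop
    }
}
```

On the job side, the last step of the batch script would open a connection to the listener and send the line, for instance using the `SLURM_JOB_ID` environment variable as the identifier. The sketch handles one connection at a time and has no authentication or retry, so it only shows the shape of the approach.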
  • What is the motivation / use case for changing the behavior?

Improving perceived performance while reducing the load on the filesystem.

  • Please tell us about your environment:
    • powsybl-hpc version: 2.7.0
