
Commit 5cbe5ed

feat(job-manager): add Kueue scheduling option for user workloads (#492)
Introduces Kueue as an alternative way to submit user jobs. Kueue will schedule Kubernetes jobs representing user runtime batch and job workloads.

Co-authored-by: Xavier Tintin <[email protected]>

Closes reanahub/reana#800
1 parent 112afe5 commit 5cbe5ed


3 files changed: +15 -0 lines changed


AUTHORS.md

Lines changed: 1 addition & 0 deletions
@@ -25,3 +25,4 @@ The list of contributors in alphabetical order:
 - [Sinclert Perez](https://www.linkedin.com/in/sinclert)
 - [Tibor Simko](https://orcid.org/0000-0001-7202-5803)
 - [Vladyslav Moisieienkov](https://orcid.org/0000-0001-9717-0775)
+- [Xavier Tintin](https://orcid.org/0000-0002-3150-9112)

reana_job_controller/config.py

Lines changed: 6 additions & 0 deletions
@@ -177,6 +177,12 @@
 SLURM_SSH_AUTH_TIMEOUT = float(os.getenv("SLURM_SSH_AUTH_TIMEOUT", "60"))
 """Seconds to wait for SLURM SSH authentication response."""

+USE_KUEUE = bool(strtobool(os.getenv("USE_KUEUE", "False")))
+"""Whether to use Kueue to manage job execution."""
+
+KUEUE_LOCAL_QUEUE_NAME = "local-queue-job"
+"""Name of the local queue to be used by Kueue."""
+
 REANA_USER_ID = os.getenv("REANA_USER_ID")
 """User UUID of the owner of the workflow."""
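
Review note: the `USE_KUEUE` flag above is parsed with `strtobool`. The sketch below shows the same kind of truth-value parsing without that dependency, which may matter if `strtobool` comes from `distutils.util` (removed from the standard library in Python 3.12). `parse_bool_env` and its `environ` parameter are hypothetical names used for illustration only and are not part of this commit; stripping whitespace is an extra assumption of this sketch.

```python
import os

# Accepted spellings, following the distutils-style convention.
_TRUE = {"y", "yes", "t", "true", "on", "1"}
_FALSE = {"n", "no", "f", "false", "off", "0"}


def parse_bool_env(name, default="False", environ=None):
    """Hypothetical helper: read an environment variable as a boolean."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, default).strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"invalid truth value for {name}: {value!r}")


# Example: Kueue stays disabled unless the variable is set to a true value.
USE_KUEUE = parse_bool_env("USE_KUEUE", environ={"USE_KUEUE": "true"})
```

With this shape, an unset variable falls back to the default of `"False"`, and an unrecognised value raises an error instead of silently disabling Kueue.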

reana_job_controller/kubernetes_job_manager.py

Lines changed: 8 additions & 0 deletions
@@ -52,6 +52,8 @@
     REANA_KUBERNETES_JOBS_MEMORY_LIMIT,
     REANA_KUBERNETES_JOBS_MAX_USER_MEMORY_LIMIT,
     REANA_USER_ID,
+    USE_KUEUE,
+    KUEUE_LOCAL_QUEUE_NAME,
 )
 from reana_job_controller.errors import ComputingBackendSubmissionError
 from reana_job_controller.job_manager import JobManager
@@ -155,12 +157,18 @@ def secrets(self):
     def execute(self):
         """Execute a job in Kubernetes."""
         backend_job_id = build_unique_component_name("run-job")
+
         self.job = {
             "kind": "Job",
             "apiVersion": "batch/v1",
             "metadata": {
                 "name": backend_job_id,
                 "namespace": REANA_RUNTIME_KUBERNETES_NAMESPACE,
+                "labels": (
+                    {"kueue.x-k8s.io/queue-name": KUEUE_LOCAL_QUEUE_NAME}
+                    if USE_KUEUE
+                    else {}
+                ),
             },
             "spec": {
                 "backoffLimit": KubernetesJobManager.MAX_NUM_JOB_RESTARTS,
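
Review note: the hunk above adds the `kueue.x-k8s.io/queue-name` label to the job metadata only when `USE_KUEUE` is true. A minimal standalone sketch of that conditional labelling follows. `build_job_metadata` and its parameters are hypothetical names for illustration and do not exist in the commit; the namespace and job name in the example are made-up values.

```python
# Label key taken from the diff; Kueue uses it to pick the local queue.
KUEUE_QUEUE_LABEL = "kueue.x-k8s.io/queue-name"


def build_job_metadata(name, namespace, use_kueue, queue_name):
    """Hypothetical helper: build the Job metadata with optional Kueue label."""
    # Without Kueue the labels dict stays empty, matching the diff's `else {}`.
    labels = {KUEUE_QUEUE_LABEL: queue_name} if use_kueue else {}
    return {"name": name, "namespace": namespace, "labels": labels}


# Example values are assumptions, except the queue name from config.py.
with_kueue = build_job_metadata(
    "run-job-example", "default", True, "local-queue-job"
)
without_kueue = build_job_metadata(
    "run-job-example", "default", False, "local-queue-job"
)
```

Since the label only names a queue, this presumably requires a local queue called `local-queue-job` to already exist in the runtime namespace; the commit itself does not create one.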
