
AWI-CM1 runs fail on Levante when launched via JupyterHub due to conflicting SLURM memory environment variables #1387

@Pokotiwha

Description


@pgierz @mandresm

When launching AWI-CM1 simulations with ESM-Tools on Levante, runs fail if submitted from a JupyterHub terminal. The same simulations work without problems when launched from a regular terminal (e.g. Ubuntu shell / SSH).

Error log excerpt:
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.

This most likely happens because JupyterHub itself requests memory (e.g. via --mem), which exports SLURM variables such as SLURM_MEM_PER_NODE into the session. When the simulation is then launched from within that environment, the memory options set for the run conflict with the inherited variable, and srun aborts with the mutual-exclusivity error above.
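To confirm which variables were inherited, one can inspect the environment in the JupyterHub terminal before launching. A minimal sketch (the variable names are taken from the error message above; unsetting them by hand is only a stopgap and does not replace the fixes below):

    # List the SLURM memory variables exported by the JupyterHub session
    env | grep -E 'SLURM_MEM_PER_(CPU|GPU|NODE)'

    # Optionally remove them from the current shell before launching the run
    unset SLURM_MEM_PER_CPU SLURM_MEM_PER_GPU SLURM_MEM_PER_NODE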

Workarounds / Solutions:

Best practice (recommended):
Do not launch simulations via JupyterHub. Instead, use a regular login terminal (SSH / Ubuntu shell).

Alternative fix:
Pass the --purge-slurm-env option through the run .yaml file so that the inherited SLURM variables are cleared at submission time:

computer:
    additional_flags: "--purge-slurm-env"

Additional Notes:

The issue is easy to reproduce (a sketch of a typical launch command follows below):
Run the same experiment from a regular terminal (SSH / Ubuntu shell) -> works.
Run the same experiment from a JupyterHub terminal -> fails with the error above.
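For reference, a launch would typically look like the following; the runscript file name and experiment id are placeholders, not taken from this report, and the exact invocation depends on the setup:

    # Hypothetical runscript and experiment id, shown only to illustrate
    # that the same command behaves differently depending on the shell it is run from
    esm_runscripts awicm1-levante.yaml -e my_experiment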
