@pgierz @mandresm
When launching AWI-CM1 simulations with ESM-Tools on Levante, runs fail if submitted from a JupyterHub terminal. The same simulations work without problems when launched from a regular terminal (e.g. Ubuntu shell / SSH).
Error log excerpt:
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.
This probably happens because the JupyterHub session itself runs inside a SLURM allocation that requests memory (e.g. via --mem), which sets SLURM variables like SLURM_MEM_PER_NODE. A simulation launched from within that environment inherits these variables, so any memory option the run sets itself conflicts with them and srun aborts with the mutual exclusivity error.
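To check whether a given terminal is affected, one can look for the three variables named in the error message. The snippet below is a minimal sketch of such a check; the helper name `inherited_slurm_mem_vars` is made up for illustration and is not part of ESM-Tools or SLURM.

```python
import os

# The three variables named in the srun error message above.
SLURM_MEM_VARS = ("SLURM_MEM_PER_CPU", "SLURM_MEM_PER_GPU", "SLURM_MEM_PER_NODE")


def inherited_slurm_mem_vars(env=None):
    """Return the SLURM memory variables present in env, with their values.

    Hypothetical helper for illustration; defaults to the current process
    environment when no mapping is passed.
    """
    env = os.environ if env is None else env
    return {name: env[name] for name in SLURM_MEM_VARS if name in env}


if __name__ == "__main__":
    found = inherited_slurm_mem_vars()
    if found:
        for name, value in found.items():
            print(f"{name}={value}")
    else:
        print("No SLURM memory variables set in this shell.")
```

If the suspected cause is right, running this in a JupyterHub terminal should list at least one variable, while a fresh SSH login shell should list none.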
Workarounds / Solutions:
Best practice (recommended):
Do not launch simulations via JupyterHub. Instead, use a regular login terminal (SSH / Ubuntu shell).
Alternative fix:
Add the --purge-slurm-env option to the run .yaml file to clear the inherited SLURM variables:
computer:
    additional_flags: "--purge-slurm-env"
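How ESM-Tools handles this option internally is not shown here. Purely as an illustration of what "clearing inherited SLURM variables" means, the following sketch builds a copy of the environment without the three memory variables from the error message. The helper `env_without_slurm_mem` is hypothetical, and it is an untested assumption that removing only these variables is enough to avoid the conflict on Levante.

```python
import os

# Variables named in the srun error message; assumed to be the conflicting ones.
SLURM_MEM_VARS = ("SLURM_MEM_PER_CPU", "SLURM_MEM_PER_GPU", "SLURM_MEM_PER_NODE")


def env_without_slurm_mem(env=None):
    """Return a copy of env with the SLURM memory variables removed.

    Hypothetical helper for illustration; the input mapping is not modified.
    """
    cleaned = dict(os.environ if env is None else env)
    for name in SLURM_MEM_VARS:
        cleaned.pop(name, None)  # ignore variables that are not set
    return cleaned
```

The result could be passed as the `env=` argument of `subprocess.run` when starting a command from the JupyterHub session, so that the child process does not see the inherited memory settings.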
Additional Notes:
This issue can easily be reproduced:
Run the same experiment from a regular Ubuntu terminal -> works.
Run the same experiment from a JupyterHub terminal -> fails with the error above.