Hi all,
I am trying to run AWICM3-v3.3 on NHR@ZIB (formerly HLRN).
My branch is similar to 'origin/feat/awicm3_v3.3_and_v3.4_and_geomar'. We added the cdo and nco modules and changed the partition. We also modified the configs/components/xios/xios.yaml file, since the HDF5 libraries have different names on NHR@ZIB.
My problem is that when I set 'taskset : True', I get the error below:
Contents of hostfile_srun:
0-959 ./prog_fesom.sh
960-1151 ./prog_oifs.sh
1152-1152 ./prog_rnfmap.sh
1153-1156 ./prog_xios.sh
Submitting jobscript to batch system...
Output written by slurm:
cd /scratch/usr/hbkufuko/runtime/awicm3-v3.3//taskset-true/run_20000102-20000102/scripts/; sbatch taskset-true_compute_20000102-20000102.run
sbatch: error: --nodes is incompatible with --distribution=arbitrary
Exiting entire Python process!
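For reference, here is a minimal sketch for tallying the rank ranges in the hostfile above, in case the task counts are useful for checking the layout against the node request. It assumes each line has the form `start-end program`, as in the listing. The function name `count_tasks` is hypothetical and not part of esm_tools.

```python
def count_tasks(hostfile_text):
    """Return {program: number_of_ranks} for lines of the form 'start-end program'."""
    counts = {}
    for line in hostfile_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split into the rank range and the program name.
        rank_range, program = line.split(None, 1)
        start, _, end = rank_range.partition("-")
        end = end or start  # tolerate a single rank written without a dash
        counts[program] = counts.get(program, 0) + int(end) - int(start) + 1
    return counts


# Contents of hostfile_srun as printed in the log above.
HOSTFILE = """\
0-959 ./prog_fesom.sh
960-1151 ./prog_oifs.sh
1152-1152 ./prog_rnfmap.sh
1153-1156 ./prog_xios.sh
"""

counts = count_tasks(HOSTFILE)
total = sum(counts.values())
for program, n in counts.items():
    print(f"{program}: {n} ranks")
print(f"total: {total} ranks")
```

For the hostfile shown, this gives 960 ranks for FESOM, 192 for OpenIFS, 1 for rnfmap and 4 for XIOS, 1157 in total.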
If I submit the job manually (below), the model restarts without a problem.
sbatch run../scripts/...run
On the recommendation of @JanStreffing, I tried 'taskset : False'. (There is also a related discussion from a year ago: #1148.)
However, with this setting the job is submitted as a hetjob and never starts; its start time keeps being pushed back. When I kill the job and change the setting back to True, the job starts.
I have attached the runscript and blogin yaml files. Please let me know if anything else is needed.
Kind regards,
Ufuk