@@ -296,7 +296,9 @@ program heap will grow to during execution:

By default the heap will occupy as much of the free memory on the locale
(compute node) as the runtime can acquire, less a certain amount to
- allow for demands from other (system) programs running there. Advanced
+ allow for demands from other (system) programs running there. (Note
+ that the default with slurm job placement is 16 GiB; see "Communication
+ Layer Concurrency and Slurm", below, for more information.) Advanced
users may want to make the heap smaller than this. Programs start more
quickly with a smaller heap, and in the unfortunate event that you need
to produce core files, those will be written more quickly if the heap is
@@ -540,6 +542,25 @@ Parameters associated with the ugni communication layer:
silently increased or reduced so as to fall within it.


+ Communication Layer Concurrency and Slurm
+ -----------------------------------------
+
+ When slurm is used for job placement on Cray systems, it limits the
+ total NIC memory registration in order to allow for job sharing on
+ the compute nodes. In our experience this limit is approximately
+ 240 GiB. The product of CHPL_RT_MAX_HEAP_SIZE and the communication
+ layer concurrency discussed above must be less than this. The ugni
+ communication layer adjusts its heap size and concurrency defaults to
+ reflect this limit when slurm is responsible for job placement. The
+ default heap size is reduced to 16 GiB. The concurrency is computed
+ such that the product of heap size and concurrency is below 240 GiB.
+ Thus under slurm, the ugni communication layer can support programs
+ with very large heaps or programs that need a lot of communication
+ concurrency, but not programs that need both simultaneously. Such
+ programs need to be run using ALPS for job placement instead of
+ slurm.
+
+
Network Atomics
---------------

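The constraint described in the new "Communication Layer Concurrency and Slurm" section comes down to simple arithmetic: heap size times concurrency must stay under the slurm registration limit. The Python sketch below illustrates that trade-off. It is not the ugni communication layer's actual code. It assumes the limit is exactly 240 GiB (the text only says "approximately") and that "below" means strictly less than. The names `GIB`, `SLURM_REG_LIMIT`, `max_concurrency`, and `fits` are hypothetical.

```python
# Sketch of the slurm registration constraint: heap_size * concurrency < limit.
# Assumption: the limit is exactly 240 GiB; the documentation only says
# "approximately", and the real ugni runtime may compute this differently.

GIB = 1024 ** 3                  # bytes in one GiB
SLURM_REG_LIMIT = 240 * GIB      # hypothetical exact value of the limit


def max_concurrency(heap_bytes: int, limit_bytes: int = SLURM_REG_LIMIT) -> int:
    """Largest concurrency c such that heap_bytes * c is strictly below limit_bytes."""
    if heap_bytes <= 0:
        raise ValueError("heap size must be positive")
    # For integers, c * heap < limit is equivalent to c * heap <= limit - 1.
    return (limit_bytes - 1) // heap_bytes


def fits(heap_bytes: int, concurrency: int, limit_bytes: int = SLURM_REG_LIMIT) -> bool:
    """True if this heap size and concurrency satisfy the constraint."""
    return heap_bytes * concurrency < limit_bytes


if __name__ == "__main__":
    # Larger heaps leave room for less concurrency, and vice versa.
    for heap_gib in (4, 16, 64, 128, 256):
        c = max_concurrency(heap_gib * GIB)
        print(f"heap {heap_gib:>3} GiB -> max concurrency {c}")
```

Under the assumed 240 GiB limit and strict inequality, the 16 GiB slurm default heap allows a concurrency of at most 14 (15 if the limit were treated as inclusive). A 128 GiB heap allows only 1, and a 256 GiB heap cannot be accommodated at all. This is the "large heap or high concurrency, but not both" trade-off the section describes.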