Commit 0745f87

Add another section to the ex debug docs (#28413)
Adds another section to the new EX debug docs based on an error a user saw that was not covered [Reviewed by @bradcray]
2 parents 0e2b3f3 + 14935ab commit 0745f87

1 file changed: +11 -0 lines changed


doc/rst/platforms/cray.rst

Lines changed: 11 additions & 0 deletions
@@ -465,6 +465,7 @@ See this :ref:`Slurm troubleshooting section <mem-not-avail>` for help
 with this error.


+.. _ex-register-too-much-mem:

 OFI error: fi_mr_reg(ofi_domain, ...): Cannot allocate memory
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -482,6 +483,16 @@ Note that changing this environment variable, as with any
 recompiling your program.


+Detected 1 oom_kill event in StepId=.... Some of the step tasks have been OOM Killed
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This error is also a sign that the Chapel runtime is being too aggressive in
+registering memory, or that something else is going wrong in the memory
+registration.
+
+See :ref:`the previous section <ex-register-too-much-mem>` for steps to work
+around this error.
+
 OFI error: fi_enable(tcip->txCtx): Invalid resource domain
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
