Test suite runs on head vs fails on compute node #21241
I'm an OpenHPC cluster manager trying to help my team install MOOSE. I'm at the point where the test suite runs on the head node with few failures (mostly from a lack of CPUs). On the compute nodes, however, pretty much every test is skipped, and MPI errors ultimately kill the run. I'm not having much success debugging this.
OpenHPC has no MOOSE meta package. Most of the MOOSE dependencies are already part of OpenHPC, but so many of those packages are built with MOOSE's compile-time dependencies missing that we pretty much built everything per the MOOSE instructions, PETSc being a prime example.
Some examples of things I had to add to get the MOOSE tests to work: libtirpc-devel, python36-devel, and a pip upgrade (python -m pip install --upgrade pip). Most of the Python packages have OpenHPC equivalents, but those don't work with MOOSE either. I have re-run the libmesh upgrade script a few times as these things were added, in case that was necessary.
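Since the same build behaves differently on the head node and on a compute node, one thing worth ruling out is a difference in the environment the two see. The following is a minimal sketch under that assumption, not a MOOSE or OpenHPC tool: it records a few facts about the current node and reports which ones differ between two snapshots. The function names, the tool list and the variable list are all illustrative choices.

```python
import json
import os
import platform
import shutil
import sys


def snapshot_env(tools=("mpirun", "mpicc", "python3"),
                 variables=("PATH", "LD_LIBRARY_PATH", "PYTHONPATH")):
    """Collect a few facts about the current node into a flat dict.

    The tool and variable names are assumptions; adjust them to whatever
    your MOOSE build actually relies on.
    """
    snap = {
        "hostname": platform.node(),
        "python": sys.version.split()[0],
    }
    for tool in tools:
        # shutil.which returns None when the tool is not on PATH
        snap["which:" + tool] = shutil.which(tool) or "<not found>"
    for var in variables:
        snap["env:" + var] = os.environ.get(var, "<unset>")
    return snap


def diff_snapshots(head, compute):
    """Return {key: (head_value, compute_value)} for every key that differs."""
    keys = sorted(set(head) | set(compute))
    return {
        k: (head.get(k, "<missing>"), compute.get(k, "<missing>"))
        for k in keys
        if head.get(k, "<missing>") != compute.get(k, "<missing>")
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be saved once per node.
    json.dump(snapshot_env(), sys.stdout, indent=2, sort_keys=True)
    print()
```

Run it once on the head node and once on a compute node (through whatever launcher you normally use), save each output to a file, then load the two files with json.load and pass them to diff_snapshots. The hostname will always differ; anything else in the result is a candidate for a closer look.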
Replies: 5 comments 45 replies
Hello,
Could you please give us the test log for a few of the errors?
Guillaume
For the record, there has been a bug in OpenHPC (or maybe it's in Slurm) that prevents PMIx from working correctly. Apparently they STILL have not fixed PMIx. They pushed the milestone from OHPC 2.5 to OHPC 2.6. :-( The only answer is to rebuild OpenMPI yourself. I have done that, but then patches don't arrive with dnf.
Edit: I'm told OpenMPI PMIx may get fixed after July.
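To see what a given node actually reports about PMIx, you can scan the output of Open MPI's ompi_info and of Slurm's srun --mpi=list for lines that mention it. The sketch below does only that. The helper names are made up for illustration, and a matching line only shows that PMIx is mentioned, not that the bug described above is present or absent.

```python
import shutil
import subprocess


def pmix_lines(text):
    """Return the lines of `text` that mention PMIx, case-insensitively."""
    return [line.strip() for line in text.splitlines() if "pmix" in line.lower()]


def run_if_available(cmd):
    """Run `cmd` (a list of strings) and return its combined stdout/stderr.

    Returns None when the executable is not on PATH, so the script can be
    run unchanged on nodes where a tool is missing.
    """
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # some tools print their listing to stderr
        universal_newlines=True,    # text mode; also works on Python 3.6
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    for cmd in (["ompi_info"], ["srun", "--mpi=list"]):
        label = " ".join(cmd)
        out = run_if_available(cmd)
        if out is None:
            print(label + ": not found on PATH")
            continue
        hits = pmix_lines(out)
        if not hits:
            print(label + ": no mention of PMIx")
            continue
        print(label + ": " + str(len(hits)) + " line(s) mention PMIx")
        for line in hits:
            print("  " + line)
```

Running this on both the head node and a compute node, with the same modules loaded, gives two short listings that are easy to compare by eye.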
I'll check, but....
[mwmoorcroft@c4 test]$ prun ./moose_test-opt -i ./tests/mesh/concentric_circle_mesh/concentric_circle_mesh.i --mesh-only
[prun] Master compute host = c4
[prun] Resource manager = slurm
[prun] Launch cmd = mpirun ./moose_test-opt -i ./tests/mesh/concentric_circle_mesh/concentric_circle_mesh.i --mesh-only (family=openmpi4)
In UnstructuredMesh::stitch_meshes:
This mesh has 53 nodes on boundary 1.
Other mesh has 53 nodes on boundary 3.
Minimum edge length on both surfaces is 0.002625.
In UnstructuredMesh::stitch_meshes:
Found 53 matching nodes.
In UnstructuredMesh::stitch_meshes:
This mesh has 105 nodes on boundary 2.
Other mesh has 105 nodes on boundary 4.
Minimum edge l…