
Conversation


@Sbozzolo Sbozzolo commented Nov 3, 2025

This commit puts MFEM on the same footing as the other dependencies. Instead of being unconditionally built, it is now built only when building external dependencies. This allowed me to add MFEM as a dependency managed by spack.

In doing this, I found a new bug in spack:
spack/spack#51505

So, I changed our approach to testing the spack package: instead of adding a local repo, I overwrite palace in the builtin repo.

As a side effect, this makes the spack CI action even faster, because now MFEM can be cached.

One of the challenges I faced in implementing this is that hypre and mfem are currently built with autotools (in spack). This makes certain things hard/annoying. For example, passing MFEM_DATA_PATH through turned out to be a pain, so I decided to vend the two meshes instead (we already do that for other meshes).

It should be noted that hypre has switched to CMake with release 3.0, and that there is an open PR in spack-packages to build MFEM with CMake.

@Sbozzolo Sbozzolo added the build Related to building label Nov 3, 2025
@Sbozzolo Sbozzolo force-pushed the gbozzola/external_mfem branch 18 times, most recently from f4929ab to 463f59a on November 5, 2025 18:38
@Sbozzolo Sbozzolo force-pushed the gbozzola/external_mfem branch from 463f59a to a17970f on November 5, 2025 18:42
auto ref_tet_path = fs::path(MFEM_DATA_PATH) / "ref-tetrahedron.mesh";
mfem::Mesh single_tet(ref_tet_path.string());
mfem::ParMesh pmesh(Mpi::World(), single_tet);
mfem::Mesh mesh = mfem::Mesh::MakeCartesian3D(1, 1, 1, mfem::Element::TETRAHEDRON);
Member Author

I changed the mesh here because it doesn't really matter what mesh we use, and I preferred picking something that does not depend on a file.

@Sbozzolo Sbozzolo marked this pull request as ready for review November 5, 2025 18:43
Contributor

@cameronrutherford cameronrutherford left a comment

I might have missed something... LGTM! I think largely just clarifying questions.

Comment on lines 42 to 43
cp -r spack_repo/local/packages/palace "$(spack location --repo builtin)/packages"
cp extern/patch/mfem/* "$(spack location --repo builtin)/packages/palace"
Contributor

Interesting that we have patches in this location, even though spack wants to use them... I suppose once we merge this upstream for users it won't be that confusing of an end-user workflow, but I wonder if we should have something in our docs about local development in this fashion, and how to iterate on mfem patches (maybe we do and I just haven't read it).

Member Author

I added developer documentation to discuss this

Contributor

I guess we do the symlink for developer docs, but then the cp approach for CI... This seems fine, but perhaps we could have some more consistency across these approaches?
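
For reference, a minimal sketch of the symlink variant (assuming the same repo layout as the cp commands above; purely illustrative):

ln -s "$(pwd)/spack_repo/local/packages/palace" "$(spack location --repo builtin)/packages/palace"

One trade-off: with a symlink, anything later copied into the builtin package directory (e.g. the MFEM patches) would land back in the working tree, which may be a reason to keep cp for CI.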

Comment on lines 55 to 56
eigensolver: [arpack, slepc]
solver: [mumps, strumpack, superlu-dist]
Contributor

This is perhaps something we just have to take a stance on, but there are likely always going to be build configurations that we miss out on.

For example before this change, we had:

- @develop+strumpack+arpack+mumps+tests

Now we have:

  • @develop+strumpack+arpack
  • @develop+mumps+arpack
  • @develop+superlu-dist+arpack

And the same for slepc...

However, note that +slepc is the default for palace, and so is +superlu-dist, so those variants will always be enabled.

The question(s) then are:

  • Do we want to perhaps change this matrix to be like [+arpack~slepc, ~arpack+slepc, +arpack+slepc]? This gets more complicated for the solver too...
  • What (eigen)solver is used in testing when more than one is enabled?
  • Do we want to test the full matrix of builds here? What's an acceptable subset?

I think there are some logical answers to these, but wanted to avoid adding my bias...

Collaborator

Just adding arpack or strumpack or mumps into the build is not going to change things from the default superlu-dist/slepc combination, as those defaults all stay enabled. So if the goal is testing those, the defaults need to be explicitly ~'d.

For the non-spack tests, we use a pairwise test matrix, as testing every combination would blow up the number of builds ridiculously. Pairwise testing instead gives a reasonable hope of catching failures induced by poor interactions between dependencies, which can then be investigated more directly.
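
To illustrate (not the actual workflow file; the variant strings are only examples), the matrix entries could carry the negations explicitly:

eigensolver: ["+arpack~slepc", "~arpack+slepc", "+arpack+slepc"]
solver: ["+mumps~superlu-dist", "+strumpack~superlu-dist", "+superlu-dist"]

so that, e.g., a mumps entry actually builds without superlu-dist instead of silently keeping the default.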

Member Author

I added them mostly to check that they compile correctly, but I will try translating our linux matrix to a spack matrix (given that it has better caching and is therefore faster), and maybe in the future we can reduce the size of the linux matrix.

Comment on lines 116 to 122
if(PALACE_WITH_SUNDIALS AND NOT PALACE_BUILD_EXTERNAL_DEPS)
# Spack's sundials requires MPI with C bindings, so we alias
# MPI_C to MPI_CXX (if it does not already exist).
if(NOT TARGET MPI::MPI_C AND TARGET MPI::MPI_CXX)
add_library(MPI::MPI_C ALIAS MPI::MPI_CXX)
endif()
endif()
Contributor

Odd that this isn't the find_package default behavior, given the MPI C++ bindings were deprecated in MPI-2.2 and removed in 3.0 - https://www.mpi-forum.org/docs/mpi-2.2/mpi22-report/node32.htm

Member Author

Interesting! I am not sure about this. We use MPI_CXX in a couple of other places.

Contributor

🤷🏻 I suppose it's an artifact that we leave. No big deal either way, but hopefully the context of my comment makes sense.

Collaborator

@hughcars hughcars left a comment

Some tiny things related to the test matrix and dependencies, but the big one is removing the ability to use mfem data folder meshes in our unit tests. I don't like having to check in every mesh that we're going to test against, rather than pulling them from mfem on demand. If there's a way to have the unit test target populate its data folder within the build folder from mfem at build time, that would be the best outcome. This conflicts with spack's tendency to blow away anything but the built artifacts though, so there'd have to be some kind of temporary folder, or reuse of an existing one.

Comment on lines -94 to -99
set_property(
SOURCE
${CMAKE_CURRENT_SOURCE_DIR}/test-geodata.cpp
${CMAKE_CURRENT_SOURCE_DIR}/test-rap.cpp
APPEND PROPERTY COMPILE_DEFINITIONS "MFEM_DATA_PATH=\"${CMAKE_INSTALL_PREFIX}/share/palace/test/mfem-data\""
)
Collaborator

So this is removing the ability to use mfem reference meshes directly, which seems unfortunate. I presume this is because in a spack build these data files don't persist. I wonder if we can have the reference meshes downloaded directly from the mfem repo during configuration instead, but that might be complicated. The rationale is that it would be great to avoid having to check in duplicates of meshes which we can access from mfem, and which aren't changing any time soon.

It seems like, if the test target is built, we could have a configuration step that copies the meshes from the MFEM GitHub repo to our build-local data folder. We'd need a list of these meshes stored in the cmake, but then adding or removing one in the future consists only of editing that list, rather than checking in another mesh. I'd basically like to maintain the flexibility of being able to use any mfem-vended mesh without copying it into the repo.
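
A rough sketch of that idea (illustrative only: the mesh list, URL layout, and destination directory are assumptions, not what this PR does):

set(MFEM_TEST_MESHES ref-tetrahedron.mesh star.mesh)
foreach(mesh IN LISTS MFEM_TEST_MESHES)
  # fetch each reference mesh from the mfem repo into a build-local data folder
  file(DOWNLOAD
    "https://raw.githubusercontent.com/mfem/mfem/master/data/${mesh}"
    "${CMAKE_CURRENT_BINARY_DIR}/mfem-data/${mesh}"
    STATUS mesh_download_status)
endforeach()

MFEM_DATA_PATH could then point at the build-local mfem-data folder instead of the install prefix.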

Member Author

The reason I had to do this is that MFEM is currently built with autotools (when installed with spack). As a result, the information about MFEM_DATA_PATH is lost, and it is very hard to recover.

I'd be inclined to push this to the future, possibly tackling #506 at the same time.

@Sbozzolo Sbozzolo force-pushed the gbozzola/external_mfem branch 2 times, most recently from afaaaff to 0532542 on November 7, 2025 19:42

Sbozzolo commented Nov 7, 2025

This is the current status of this PR.

I incorporated the feedback from the review and tried having a full test matrix for our spack build. The full test matrix ended up being a lot of work for little success. I made significant progress and got most of the matrix entries to at least concretize, and I fixed all the build errors I could easily address. Now, the various builds fail for a variety of reasons, as you can see for example here. Some of these errors are probably pointers to bugs or edge cases in upstream dependencies. I am not sure how to debug these issues in an efficient way, given that I cannot reproduce the test environment to do interactive debugging.

In addition to this, I found that our OpenMP builds seem to be incorrect (#557), but I am not sure why.

Finally, I also added documentation to give some pointers on how spack can be used for co-development, and I described how to work around some of the known spack bugs for our setup.


Sbozzolo commented Nov 10, 2025

I tracked down the issue with OpenMP and found that it was a missing build option in libCEED. This will be fixed in spack/spack-packages#2361, and I added a temporary workaround while the PR gets merged.

This run shows that now the OpenMP tests pass: https://github.com/awslabs/palace/actions/runs/19241391053

They are still kind of slow (see also #438).

@Sbozzolo Sbozzolo force-pushed the gbozzola/external_mfem branch 3 times, most recently from d253133 to c564838 on November 10, 2025 20:23
@Sbozzolo Sbozzolo force-pushed the gbozzola/external_mfem branch 2 times, most recently from 5b6c007 to 5c56576 on November 10, 2025 21:07
@Sbozzolo Sbozzolo force-pushed the gbozzola/external_mfem branch from 5c56576 to 953f9e7 on November 10, 2025 21:12
Contributor

@cameronrutherford cameronrutherford left a comment

Lots of NIT comments and changes, but this is looking awesome!!!

env:
# Note that each spack version needs its own registry
SPACK_VERSION: develop
Contributor

In this case, it seems like we ended up having the fix for the bug merged into spack develop relatively quickly.

Longer term, it might be nice to have some develop pipelines that we allow to fail, although I don't really know how useful that would be, and surely we don't want to waste compute / duplicate work here...

Contributor

Along the same lines, I can't imagine we want to test all spack versions... Perhaps we should advertise somewhere in the repo / docs what version of Spack we are compatible with?
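
As an illustration only (not a pin this PR actually uses), the env above could reference a release branch instead of develop, and that branch could be what the docs advertise as supported:

env:
# Pin to a spack release branch rather than the moving develop target
SPACK_VERSION: releases/v1.0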

Member Author

Unfortunately, the official getting started page for spack always checks out the most recent develop, which is not great.

I opened this: spack/spack#51547

cuda: ~cuda

runs-on: palace_ubuntu-latest_16-core
# Fails configuring PETSc
Contributor

Did we want to file / tag bug reports for each of these?

Member Author

I don't think we can easily produce a good bug report. We don't have a minimal reproducer and we can only point to some failed github actions jobs, which won't really help the maintainers.

Same for the others below. I think we should do our due diligence before opening bug reports.

Contributor

@cameronrutherford cameronrutherford Nov 10, 2025

I don't think it's unreasonable to at least open a bug report with a link to the failing pipeline, even if you don't have a strong reproducer / minimal example. The more complex example that runs in a GitHub workflow is still useful as a bug report to the upstream projects we depend on, IMO, and it's easy enough for them to reproduce if they spin up their own GitHub action.

I am also suggesting this as it would be good to have some tracking of the bug... If we don't open an upstream issue, then we should open an issue in Palace to track the failing builds?

That, or we should minimize and officially state our supported platforms.

Member Author

Yes, I think we should track them in Palace, and possibly report them later to the relevant upstream project. For example, at this point, it is unclear where the issues are (is it the spack package, the actual build system, ...).

to the `develop` section of your `spack.yaml` pointing to your local MFEM
checkout. Spack will then manage both packages in development mode, rebuilding
each as needed when you make changes.
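
A minimal sketch of such a develop entry, assuming spack's standard develop config schema (the path is a placeholder):

spack:
  develop:
    mfem:
      spec: mfem@=develop
      path: /path/to/local/mfem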

Contributor

You could also include a note here, if you want, about using the CMake-based build after the first spack build, i.e. you can do something like this once you have the first build done and you are just modifying Palace source code (very rough commands from memory):

cd ~/repos/palace
cd palace-<spack-hash>-build   # build directory generated by spack
source spack-build-env.txt     # build environment recorded by spack
# optionally start from a clean slate
make clean
make -j

This way you can iterate without calling out to Spack / the Python interpreter, which is moderately faster when doing small / frequent iterations - certainly way faster now that this doesn't trigger an MFEM rebuild.
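
A possible alternative that avoids sourcing the env file by hand (a hedged sketch; spack build-env is a real command, but the exact layout of develop builds may differ):

cd palace-<spack-hash>-build
spack build-env palace -- make -j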

@Sbozzolo Sbozzolo force-pushed the gbozzola/external_mfem branch from 5e21be0 to dd7a4cf on November 11, 2025 18:22
@Sbozzolo Sbozzolo force-pushed the gbozzola/external_mfem branch from dd7a4cf to b0ea6e5 on November 11, 2025 22:42
@Sbozzolo Sbozzolo force-pushed the gbozzola/external_mfem branch from b0ea6e5 to 854e9ea on November 11, 2025 22:59
@Sbozzolo Sbozzolo force-pushed the gbozzola/external_mfem branch from e343b2a to 6c63e6d on November 11, 2025 23:29
@Sbozzolo Sbozzolo force-pushed the gbozzola/external_mfem branch from 9a2f827 to 581d320 on November 12, 2025 18:01

Sbozzolo commented Nov 12, 2025

I found that some static builds were broken (because our build system relied on transitively linking libraries through MFEM) and fixed that.

In this, I also found that the package.py for SUNDIALS in the released spack-packages is broken because it refers to a non-existent commit. This was fixed here, but the fix is not part of the released packages. So, as of right now, I have resorted to just copying the spack recipes we need, with the fixes we need. We can discuss how best to do this.

Following @cameronrutherford's suggestion, I added an env file and managed to get another combination to build, but one of the regression tests fails: https://github.com/awslabs/palace/actions/runs/19281378654/job/55133032500
I tried reproducing the same palace spec locally to see if the issue was with the choice of solvers, but the test passes locally.

I also think I might have tracked the PETSc configure error down to -lomp not being passed, depending on the petsc/superlu config, but I could not verify this on GitHub Actions.

Failing builds (left for future work)
