Compile MFEM externally #550
Conversation
This commit puts MFEM on the same footing as the other dependencies. Instead of being unconditionally built, it is now built when building external dependencies. This allowed me to add MFEM as a dependency managed by spack. In doing this, I found a new bug in spack: spack/spack#51505. So, I changed our approach to testing the spack package: instead of adding a local repo, I overwrite palace in the `builtin` repo.
auto ref_tet_path = fs::path(MFEM_DATA_PATH) / "ref-tetrahedron.mesh";
mfem::Mesh single_tet(ref_tet_path.string());
mfem::ParMesh pmesh(Mpi::World(), single_tet);

mfem::Mesh mesh = mfem::Mesh::MakeCartesian3D(1, 1, 1, mfem::Element::TETRAHEDRON);
I changed the mesh here because it doesn't really matter what mesh we use, and I preferred picking something that does not depend on a file.
cameronrutherford left a comment
I might have missed something... LGTM! I think largely just clarifying questions.
.github/workflows/docs.yml
Outdated
cp -r spack_repo/local/packages/palace "$(spack location --repo builtin)/packages"
cp extern/patch/mfem/* "$(spack location --repo builtin)/packages/palace"
Interesting that we have patches in this location, even though spack wants to use them... I suppose once we merge this upstream for users it won't be that confusing of an end-user workflow, but I wonder if we should have something in our docs about local development in this fashion, and how to iterate on mfem patches (maybe we do and I just haven't read it).
I added developer documentation to discuss this.
I guess we do the symlink for developer docs, but then do the cp approach for CI... This seems fine, but perhaps we could have some more consistency across these approaches?
.github/workflows/spack.yml
Outdated
eigensolver: [arpack, slepc]
solver: [mumps, strumpack, superlu-dist]
This is perhaps something we just have to take a stance on, but there are likely always going to be build configurations that we miss out on.
For example, before this change we had:
- @develop+strumpack+arpack+mumps+tests
Now we have:
- @develop+strumpack+arpack
- @develop+mumps+arpack
- @develop+superlu-dist+arpack
And the same for slepc...
However, note that +slepc is the default for palace, and so is +superlu-dist, so those variants will always be enabled.
The question(s) then are:
- Do we want to perhaps change this matrix to be something like [+arpack~slepc, ~arpack+slepc, +arpack+slepc] (see the sketch below)? This gets more complicated for the solver too...
- What (eigen)solver is used in testing when more than one is enabled?
- Do we want to test the full matrix of builds here? What's an acceptable subset?
I think there are some logical answers to these, but wanted to avoid adding my bias...
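For concreteness, a rough sketch of what an explicit-variant matrix could look like in the workflow; the variant spellings below are illustrative assumptions, not something tested in this PR:

strategy:
  matrix:
    # Spell the variants out explicitly so the defaults (+slepc, +superlu-dist)
    # are actually disabled in some entries.
    eigensolver: ["+arpack~slepc", "~arpack+slepc", "+arpack+slepc"]
    solver: ["+mumps~strumpack~superlu-dist", "+strumpack~mumps~superlu-dist", "+superlu-dist~mumps~strumpack"]
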
Just adding arpack or strumpack or mumps into the build is not going to change anything from the default superlu-dist + slepc combination, since those defaults are always in. So if the goal is testing those, the defaults need to be explicitly ~'d.
For the non-spack tests, we use a pairwise test matrix, as we can't test every combination; that would blow up the number of builds ridiculously. Instead, pairwise testing gives a reasonable hope of catching failures induced by poor interactions between dependencies, which can then be investigated more directly.
I added them mostly to check that they compile correctly, but I will try translating our Linux matrix to a spack matrix (given that it has better caching, it's faster), and maybe in the future we can reduce the size of the Linux matrix.
palace/CMakeLists.txt
Outdated
if(PALACE_WITH_SUNDIALS AND NOT PALACE_BUILD_EXTERNAL_DEPS)
  # Spack's sundials requires MPI with C bindings, so we alias
  # MPI_C to MPI_CXX (if it does not already exist).
  if(NOT TARGET MPI::MPI_C AND TARGET MPI::MPI_CXX)
    add_library(MPI::MPI_C ALIAS MPI::MPI_CXX)
  endif()
endif()
Odd that this isn't the find_package default behavior, given that the MPI C++ bindings were deprecated in 2.2 and removed in 3.0 - https://www.mpi-forum.org/docs/mpi-2.2/mpi22-report/node32.htm
Interesting! I am not sure about this. We use MPI_CXX in a couple of other places.
🤷🏻 I suppose it's an artifact that we leave. No big deal either way, but hopefully the context of my comment makes sense.
hughcars left a comment
Some tiny things related to the test matrix and dependencies, but the big one is removing the ability to use mfem data folder meshes in our unit tests. I don't like having to check in every mesh that we're going to test against, rather than pulling them from mfem on demand. If there's a way to have the unit test target populate its data folder within the build folder from mfem at build time, that would be the best outcome. This conflicts with spack's tendency to blow away anything but the built artifacts, though, so there'd have to be some kind of temporary folder or reusing an existing one.
set_property(
  SOURCE
  ${CMAKE_CURRENT_SOURCE_DIR}/test-geodata.cpp
  ${CMAKE_CURRENT_SOURCE_DIR}/test-rap.cpp
  APPEND PROPERTY COMPILE_DEFINITIONS "MFEM_DATA_PATH=\"${CMAKE_INSTALL_PREFIX}/share/palace/test/mfem-data\""
)
So this is removing the ability to use mfem reference meshes directly, which seems unfortunate. I presume this is because in a spack build these data files don't persist. I wonder if we can have the reference meshes downloaded directly from the mfem repo during configuration instead, but that might be complicated. Rationale being it would be great to avoid having to check in duplicates of meshes which we can access from mfem, and which aren't changing any time soon.
It seems like if the test target is built we can have a configuration step that copies the meshes from the MFEM GitHub to our build-local data folder as part of configuration. We'd need a list of these meshes stored in the cmake, but then adding or removing them in the future consists only of editing that list, rather than checking in another mesh. I'd basically like to maintain the flexibility of being able to use any mfem-vended mesh without copying it into the repo.
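As a rough sketch of that idea (not part of this PR; the raw-file URL, variable names, and the choice of the build directory are assumptions), the test target's configuration could fetch the meshes it needs:

# Hypothetical configure-time fetch of MFEM reference meshes (illustrative only).
set(MFEM_DATA_URL "https://raw.githubusercontent.com/mfem/mfem/master/data")
set(PALACE_TEST_MFEM_MESHES "ref-tetrahedron.mesh")
set(MFEM_DATA_PATH "${CMAKE_CURRENT_BINARY_DIR}/mfem-data")
file(MAKE_DIRECTORY "${MFEM_DATA_PATH}")
foreach(mesh IN LISTS PALACE_TEST_MFEM_MESHES)
  if(NOT EXISTS "${MFEM_DATA_PATH}/${mesh}")
    # Download the mesh into the build tree so the tests do not rely on the install prefix.
    file(DOWNLOAD "${MFEM_DATA_URL}/${mesh}" "${MFEM_DATA_PATH}/${mesh}" STATUS dl_status)
    list(GET dl_status 0 dl_code)
    if(NOT dl_code EQUAL 0)
      message(FATAL_ERROR "Failed to download ${mesh} from MFEM")
    endif()
  endif()
endforeach()

The MFEM_DATA_PATH compile definition could then point at this build-local folder instead of the install prefix.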
The reason I had to do this is that MFEM is currently built with autotools (when installed with spack). Because of this, the information on MFEM_DATA_PATH is lost and very hard to recover.
I'd be inclined to push this to the future, possibly tackling #506 at the same time.
This is the current status of this PR. I incorporated the feedback from the review and tried having a full test matrix for our spack build. The full test matrix ended up being a lot of work with little success. I made significant progress and got most of the matrix entries to at least concretize, and I fixed all the build errors I could easily address. Now, the various builds fail for various different reasons, as you can see for example here. Some of these errors are probably pointers to bugs or edge cases in upstream dependencies. I am not sure how to debug these issues in an efficient way, given that I cannot reproduce the test environment to do interactive debugging. In addition to this, I found that our OpenMP builds seem to be incorrect (#557), but I am not sure why. Finally, I also added documentation that gives some pointers on how spack can be used for co-development and describes how to work around some of the known spack bugs for our setup.
I tracked down the issue with OpenMP and found that it was a missing build option in libCEED. This will be fixed in spack/spack-packages#2361, and I added a temporary workaround while the PR gets merged. This run shows that the OpenMP tests now pass: https://github.com/awslabs/palace/actions/runs/19241391053 They are still kind of slow (see also #438).
cameronrutherford left a comment
Lots of NIT comments and changes, but this is looking awesome!!!
env:
  # Note that each spack version needs its own registry
  SPACK_VERSION: develop
In this case, it seems like we ended up having the bug fix merged into spack develop relatively quickly.
Longer term, it might be nice to have some develop pipelines that we allow to fail, although I don't really know how useful that would be, and surely we don't want to waste compute / duplicate work here...
Along the same lines, I can't imagine we want to test all spack versions... Perhaps we should advertise somewhere in the repo / docs what version of Spack we are compatible with?
Unfortunately, the official getting started page for spack always checks out the most recent develop, which is not great.
I opened this: spack/spack#51547
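For reference, pinning a released spack tag instead of develop would look something like this (the tag below is illustrative, not a statement of what Palace supports):

# Clone a specific spack release rather than the moving develop branch
git clone --depth 1 --branch v1.0.0 https://github.com/spack/spack.git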
cuda: ~cuda

runs-on: palace_ubuntu-latest_16-core
# Fails configuring PETSc
Did we want to file / tag bug reports for each of these?
I don't think we can easily produce a good bug report. We don't have a minimal reproducer and we can only point to some failed github actions jobs, which won't really help the maintainers.
Same for the others below. I think we should do our due diligence before opening bug reports.
I don't think it's unreasonable to at least open a bug report with a link to the failing pipeline, even if you don't have a strong reproducer / minimum example. The more complex example that runs in a GitHub workflow is still useful as a bug report to our dependent projects IMO, and it's easy enough for them to reproduce if they spin up their own GitHub action.
I am also suggesting this as it would be good to have some tracking of the bug... If we don't open a downstream issue, then we should open an issue in Palace to track the failing builds?
That, or we should minimize and officially state our supported platforms.
Yes, I think we should track them in Palace, and possibly report them later to the relevant downstream project. For example, at this point it is unclear where the issues are (is it the spack package, the actual build system, ...).
to the `develop` section of your `spack.yaml` pointing to your local MFEM
checkout. Spack will then manage both packages in development mode, rebuilding
each as needed when you make changes.
You could also include some note here if you want about using the CMake-based build after the first spack build, i.e. you can do something like this once you have the first build done and you are just modifying Palace source code (very rough commands from memory):

cd ~/repos/palace
cd palace-<spack-hash>-build
source spack-build-env.txt
# optionally
make clean
make -j

This way you can iterate without calling out to the Spack / Python interpreter, which is moderately faster when doing small / frequent iterations - certainly way faster now that this doesn't trigger an MFEM rebuild.
I found that some static builds were broken (because our build system relied on transitively linking libraries through MFEM) and fixed that. In this, I also found that the package.py for SUNDIALS in the released […]
Following @cameronrutherford's suggestion, I added an […]
I also think I might have found an issue with the PETSc configure error in […]
Failing builds (left for future work): […]
This commit puts MFEM on the same footing as the other dependencies. Instead of being unconditionally built, it is now built when building external dependencies. This allowed me to add MFEM as a dependency managed by spack.
In doing this, I found a new bug in spack:
spack/spack#51505
So, I changed our approach to testing the spack package. Instead of adding a local repo, I overwrite palace in the `builtin` repo. As a side effect, this makes the spack CI action even faster, because now MFEM can be cached.
Among the challenges I faced in implementing this is that hypre and mfem are currently built with autotools (in spack). This makes certain things hard/annoying. For example, passing through the `MFEM_DATA_PATH` turned out to be a pain, so I decided to vend the two meshes instead (we are already doing that for other meshes). It should be noted that hypre has switched to cmake with release 3.0, and that there's a PR open in spack-packages to use cmake in MFEM.