Releases: sphexa-org/sphexa
v0.96.2 2026/03
v0.96.1 2026/03
Fixes compilation issues with CUDA 13 and OpenMPI
v0.96 2026/03
Features
- Add option to remove particles for which neighbor search did not converge
Improvements
- Separate halo and MAC peer rank lists
- Fully converge global tree when needed
- Replace focusTransfer and macRefinement with multi-level LET updates and update LET until converged
- Improve performance of segmented reductions with multiple threads per segment
- Eliminate duplicated data structure for CPU and GPU particles data
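For readers unfamiliar with the term, a segmented reduction reduces each of several contiguous segments of one array independently. The sketch below is a sequential reference in plain C++, not the SPH-EXA implementation; the function name and the offsets layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reference: sums[s] = sum of values[offsets[s] .. offsets[s + 1]).
// offsets holds numSegments + 1 non-decreasing indices into values (assumed layout).
std::vector<double> segmentedSum(const std::vector<double>& values,
                                 const std::vector<std::size_t>& offsets)
{
    assert(!offsets.empty() && offsets.back() <= values.size());
    std::vector<double> sums(offsets.size() - 1, 0.0);
    for (std::size_t s = 0; s + 1 < offsets.size(); ++s)
    {
        // On a GPU, several threads could share this inner loop
        // and combine their partial sums at the end.
        for (std::size_t i = offsets[s]; i < offsets[s + 1]; ++i)
        {
            sums[s] += values[i];
        }
    }
    return sums;
}
```

Assigning multiple threads to one segment, as the item above describes, helps when segments are long enough that a single thread per segment would leave most of the hardware idle.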
Fixes
- Add rocthrust and hipcub dependencies in CMake to fix HIP spack package build
- Add custom MPI reduction to prevent overflows during initialization
v0.95 2025/10
LET improvements
- Enforce global octree keys in LET
- Find halos with tight interaction boxes
- Unify MAC and halo flags
Performance improvements
- Avoid duplication of particle data fields on the CPU and GPU
- Find halos with a) tight and b) fp interaction boxes
- Stackless DFS traversals
- GPU-direct MPI communication for all LET parts, including globals
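As an illustration of the stackless DFS item above, one common way to walk a tree depth-first without a stack or recursion is to store the nodes in pre-order and give each node the index of the first node after its subtree. This is a generic sketch, not SPH-EXA's tree layout; the Node fields and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node layout: nodes are stored in depth-first pre-order.
struct Node
{
    int         value; // payload
    bool        prune; // if true, the subtree below this node is not descended into
    std::size_t skip;  // index of the first node after this node's subtree
};

// Visits nodes depth-first using only an index, no stack and no recursion.
// Returns the sum of the payloads of all visited nodes.
int traverseStackless(const std::vector<Node>& nodes)
{
    int         sum = 0;
    std::size_t i   = 0;
    while (i < nodes.size())
    {
        const Node& n = nodes[i];
        assert(n.skip > i); // the skip index must always move forward
        sum += n.value;
        // Descend: in pre-order, the next node to visit is at i + 1.
        // Prune: jump over the whole subtree in one step.
        i = n.prune ? n.skip : i + 1;
    }
    return sum;
}
```

Because the only traversal state is a single index, the per-thread memory footprint is constant, which is the usual motivation for this scheme on GPUs.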
Features and enhancements:
- Replaced h5part with h5hut
- Improved profiling framework, capturing additional information
- Replaced gsl::span with std::span
HIP and Spack
- HIP compatibility without requiring the source code to be hipified
- CMake changes to allow easier integration with Spack
Propagator library
Enhancements:
- Separate library with a translation unit for each propagator to reduce compilation times
Fixes:
- Prevent GPU kernel launches with 0 thread blocks which started to be an issue with CUDA 12.6
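A grid of zero blocks arises whenever the number of elements to process is zero. The sketch below shows one way to guard against it on the host side; it is plain C++ so that it compiles without the CUDA toolkit, and numBlocksFor and launchIfNonEmpty are hypothetical names, not functions from the SPH-EXA code base.

```cpp
#include <cassert>
#include <cstddef>

// Number of blocks needed to cover numElements with threadsPerBlock threads each
// (integer ceiling division). Returns 0 when numElements is 0.
std::size_t numBlocksFor(std::size_t numElements, std::size_t threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    return (numElements + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical launch wrapper: calls launch(numBlocks, threadsPerBlock) only when
// there is work to do. In CUDA code, launch would wrap
// kernel<<<numBlocks, threadsPerBlock>>>(...).
// Returns true if the launch was issued.
template<class Launch>
bool launchIfNonEmpty(std::size_t numElements, std::size_t threadsPerBlock, Launch&& launch)
{
    std::size_t numBlocks = numBlocksFor(numElements, threadsPerBlock);
    if (numBlocks == 0) { return false; } // nothing to do, skip the launch entirely
    launch(numBlocks, threadsPerBlock);
    return true;
}
```

Centralizing the check in one wrapper avoids repeating the same early return at every call site.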
CUDA 12.5 compatibility
Fixes:
- Full encapsulation of thrust::device_vector, because starting from CUDA 12.5 including it in .cpp files is no longer possible
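One common way to keep a container type out of headers seen by plain .cpp files is the pointer-to-implementation idiom. The sketch below illustrates that idiom only and is not necessarily how SPH-EXA does it: DeviceBuffer is a hypothetical name, and std::vector stands in for thrust::device_vector so that the example compiles without CUDA.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// --- would live in a header that plain .cpp files may include ---
class DeviceBuffer // hypothetical name
{
public:
    explicit DeviceBuffer(std::size_t n);
    ~DeviceBuffer();
    std::size_t size() const;
    void        resize(std::size_t n);

private:
    struct Impl;                 // defined only in the implementation file
    std::unique_ptr<Impl> impl_; // callers never see the container type
};

// --- would live in a .cu file; std::vector stands in for thrust::device_vector ---
struct DeviceBuffer::Impl
{
    std::vector<double> data;
};

DeviceBuffer::DeviceBuffer(std::size_t n) : impl_(new Impl{std::vector<double>(n)}) {}
DeviceBuffer::~DeviceBuffer() = default; // defined here, where Impl is a complete type
std::size_t DeviceBuffer::size() const { return impl_->data.size(); }
void        DeviceBuffer::resize(std::size_t n) { impl_->data.resize(n); }
```

With this split, only the implementation file needs the device compiler, and host-only translation units depend on nothing beyond the standard headers.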
Dynamic LET surface refinement and node pruning
New features:
- Refine LET resolution at surface after domain boundary changes
- Prune LET nodes outside focus that exceed the LET resolution on the owning rank
Hierarchical block time steps
New features:
- Hierarchical block time stepping
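In block time stepping schemes generally, each particle is assigned a time step of the form dtMax / 2^k, and particles on finer levels are updated more often than those on coarser levels. The sketch below shows that bookkeeping under these generic assumptions; the function names and the level convention are hypothetical and not taken from SPH-EXA.

```cpp
#include <cassert>

// Smallest level k in [0, maxLevel] such that dtMax / 2^k <= dtDesired.
// Particles on level k advance with step dtMax / 2^k. If even the finest
// level is too coarse, the result is clamped to maxLevel.
int timestepLevel(double dtDesired, double dtMax, int maxLevel)
{
    assert(dtDesired > 0 && dtMax > 0 && maxLevel >= 0);
    int    level = 0;
    double dt    = dtMax;
    while (dt > dtDesired && level < maxLevel)
    {
        dt *= 0.5;
        ++level;
    }
    return level;
}

// One full step of size dtMax is split into 2^maxLevel substeps. A particle on
// level k is updated on every 2^(maxLevel - k)-th substep.
bool isActive(int level, int substep, int maxLevel)
{
    assert(level >= 0 && level <= maxLevel);
    int stride = 1 << (maxLevel - level);
    return substep % stride == 0;
}
```

With maxLevel = 2, for example, a level-0 particle is active only on substep 0, a level-1 particle on substeps 0 and 2, and a level-2 particle on all four substeps.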
Ewald summation
New features:
- Ewald summation on CPUs and GPUs for gravitational forces with periodic boundaries
- New smoothing kernel for SPH: S49
Performance enhancements:
- Improve tree refinement for remote LET nodes so that fewer remote nodes are needed for a successful gravity traversal. This improves performance by reducing the amount of communication
- Run injectKeys on GPUs. This tree resolution-enforcement mechanism is needed more frequently than previously thought, so it was ported to the GPU
Fixes:
- Fix compilation issues with CUDA 12.4 related to thrust::device_vector