Releases: sphexa-org/sphexa

v0.96.2 2026/03

23 Mar 12:27
03cc0ca

Fixes

  • Explicitly enforce rank 0 as the source of truth for parallel HDF step attribute writes.

v0.96.1 2026/03

18 Mar 10:49
ac45b95

Fixes compilation issues with CUDA 13 and OpenMPI

v0.96 2026/03

11 Mar 09:31
8ccb3b6

Features

  • Add option to remove particles for which neighbor search did not converge

Improvements

  • Separate halo and MAC peer rank lists
  • Fully converge global tree when needed
  • Replace focusTransfer and macRefinement with multi-level LET updates and update LET until converged
  • Improve performance of segmented reductions with multiple threads per segment
  • Eliminate duplicated data structure for CPU and GPU particles data

Fixes

  • Add rocthrust and hipcub dependencies in CMake to fix HIP spack package build
  • Add custom MPI reduction to prevent overflows during initialization
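
The release note does not say how the custom reduction works. As a rough illustration only, one common way to keep a sum of counts from wrapping around is a saturating add used as the combine step of a user-defined reduction (in MPI, such a callback would be registered with `MPI_Op_create`). The names `saturatingAdd` and `combineCounts` are hypothetical and do not come from the SPH-EXA sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: add two unsigned counts, clamping at the maximum
// representable value instead of wrapping around on overflow.
inline std::uint32_t saturatingAdd(std::uint32_t a, std::uint32_t b)
{
    constexpr std::uint32_t maxVal = std::numeric_limits<std::uint32_t>::max();
    return (a > maxVal - b) ? maxVal : a + b;
}

// Element-wise combine step: the kind of operation a user-defined
// reduction callback would apply to its input and in/out buffers.
inline void combineCounts(const std::vector<std::uint32_t>& in, std::vector<std::uint32_t>& inout)
{
    assert(in.size() == inout.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        inout[i] = saturatingAdd(in[i], inout[i]);
    }
}
```

Clamping keeps the reduced value monotonic, so a later check such as "count exceeds bucket size" still triggers even when the true sum would not fit in 32 bits.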

v0.95 2025/10

06 Nov 13:54
5e12d29

LET improvements

  • Enforce global octree keys in LET
  • Find halos with tight interaction boxes
  • Unify MAC and halo flags

Performance improvements

  • Avoid duplication of particle data fields on the CPU and GPU
  • Find halos with a) tight and b) fp interaction boxes
  • Stackless DFS traversals
  • GPU-direct MPI communication for all LET parts, including globals
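
For readers unfamiliar with the term, a stackless depth-first traversal walks a tree using parent and sibling links instead of an explicit stack, which avoids per-thread stack storage on GPUs. The sketch below is a generic CPU illustration, assuming a tree stored as `parent`, `firstChild` and `nextSibling` index arrays with `-1` meaning "none". The `LinkedTree` layout and the `stacklessDfs` function are hypothetical and are not taken from SPH-EXA.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical tree layout: one entry per node, -1 marks a missing link.
struct LinkedTree
{
    std::vector<int> parent;
    std::vector<int> firstChild;
    std::vector<int> nextSibling;
};

// Visit nodes in depth-first order without a stack. descendInto(node) decides
// whether the children of a node are explored (e.g. an opening criterion).
// Assumes root is the global root, i.e. parent[root] == -1 and nextSibling[root] == -1.
inline std::vector<int> stacklessDfs(const LinkedTree& t, int root, const std::function<bool(int)>& descendInto)
{
    std::vector<int> order;
    int node = root;
    while (node != -1)
    {
        order.push_back(node);
        if (descendInto(node) && t.firstChild[node] != -1)
        {
            node = t.firstChild[node]; // go down
        }
        else
        {
            // climb until a node with an unvisited sibling is found
            while (node != -1 && t.nextSibling[node] == -1)
            {
                node = t.parent[node];
            }
            if (node != -1) { node = t.nextSibling[node]; } // go sideways
        }
    }
    return order;
}
```

Returning `false` from `descendInto` prunes the whole subtree below that node, which is how a traversal would skip nodes that pass an acceptance criterion.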

Features and enhancements:

  • Replaced h5part with h5hut
  • Improved profiling framework, capturing additional information
  • Replaced gsl::span with std::span

HIP and Spack

13 Dec 08:58

  • HIP compatibility without requiring the source code to be hipified
  • CMake changes to allow easier integration with Spack

Propagator library

21 Nov 14:56
f56fc7c

Enhancements:

  • Separate library with a translation unit for each propagator to reduce compilation times

Fixes:

  • Prevent GPU kernel launches with 0 thread blocks, which became an issue with CUDA 12.6
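
A guard of this kind can be sketched on the host side as follows. This is a minimal illustration in plain C++, assuming the block count is derived from the element count by rounding up. `numBlocksFor` and `launchIfNonEmpty` are hypothetical names, and the `launch` callable stands in for an actual kernel launch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of thread blocks needed to cover numElements,
// rounding up. Returns 0 when there is nothing to process.
constexpr std::size_t numBlocksFor(std::size_t numElements, std::size_t threadsPerBlock)
{
    return (numElements + threadsPerBlock - 1) / threadsPerBlock;
}

// Invoke launch(numBlocks, threadsPerBlock) only if at least one block is needed.
// Returns true if the launch took place.
template<class Launch>
bool launchIfNonEmpty(std::size_t numElements, std::size_t threadsPerBlock, Launch&& launch)
{
    assert(threadsPerBlock > 0);
    std::size_t numBlocks = numBlocksFor(numElements, threadsPerBlock);
    if (numBlocks == 0) { return false; } // empty input: skip the launch entirely
    launch(numBlocks, threadsPerBlock);
    return true;
}
```

Centralizing the check in one wrapper means ranks that own zero particles for some kernel never reach the launch call at all.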

CUDA 12.5 compatibility

21 Nov 13:37

Fixes:

  • Full encapsulation of thrust::device_vector, because starting with CUDA 12.5 its header can no longer be included in .cpp files

Dynamic LET surface refinement and node pruning

21 Nov 13:34

New features:

  • Refine LET resolution at surface after domain boundary changes
  • Prune LET nodes outside focus that exceed the LET resolution on the owning rank

Hierarchical block time steps

20 Nov 17:33

New features:

  • Hierarchical block time stepping
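
As background, block time stepping in general assigns each particle a time step of the form dtMax / 2^k and only updates a particle on the substeps that are multiples of its own step. The sketch below shows that generic idea under simple assumptions. `timestepLevel` and `isActive` are hypothetical helpers and do not describe how SPH-EXA implements the feature.

```cpp
#include <cassert>

// Hypothetical helper: smallest level k in [0, maxLevel] such that
// dtMax / 2^k <= dtParticle. Particles that would need a finer step
// than the deepest level are clamped to maxLevel.
inline int timestepLevel(double dtParticle, double dtMax, int maxLevel)
{
    assert(dtParticle > 0 && dtMax > 0 && maxLevel >= 0);
    int    level = 0;
    double dt    = dtMax;
    while (dt > dtParticle && level < maxLevel)
    {
        dt *= 0.5;
        ++level;
    }
    return level;
}

// Substeps are counted in units of the smallest step, dtMax / 2^maxLevel.
// A particle on `level` is updated every 2^(maxLevel - level) substeps.
inline bool isActive(int level, int maxLevel, unsigned substep)
{
    assert(level >= 0 && level <= maxLevel);
    unsigned stride = 1u << (maxLevel - level);
    return substep % stride == 0;
}
```

Because all step sizes are powers of two of a common base step, every particle is synchronized again at the end of each full dtMax interval.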

Ewald summation

20 Nov 17:17
61a9674

New features:

  • Ewald summation on CPUs and GPUs for gravitational forces with periodic boundaries
  • New smoothing kernel for SPH: S49

Performance enhancements:

  • Improve tree refinement for remote LET nodes so that fewer remote nodes are needed to ensure a successful gravity traversal. This improves performance by reducing the amount of communication.
  • Port injectKeys to GPUs. This mechanism for enforcing tree resolution is needed more frequently than previously thought,
    so porting it to the GPU was worthwhile.

Fixes:

  • Fix compilation issues with CUDA 12.4 related to thrust::device_vector