v2.5.0

mattmartineau released this 21 Dec 15:42
cc1cebd

Summary of Key Changes in AMGX (v2.4.0 → v2.5.0)

CUDA Upgrades

  • Blackwell (B200/GB200/RTX Pro 6000) support
  • CUDA 13 support
  • Minimum CUDA version raised from 10.0 to 12.0, tested up to 13.0
  • Dropped support for older GPU architectures (SM20, SM35, SM52, SM60)
  • New minimum architecture: Volta (SM70); SM75, SM80, SM86, SM89, SM90, SM100, and SM120 are also supported
    • Tested on Hopper, Blackwell (incl. RTX Pro 6000)
  • Consolidated architecture-specific code and removed redundant code paths

Build System Changes

  • Deprecated CUDA_ARCH in favor of the standard CMAKE_CUDA_ARCHITECTURES
  • Removed the Thrust submodule dependency; AMGX now uses the system/CUDA-bundled Thrust
  • Removed OpenMP dependency entirely
  • Removed NVTX linking in favor of NVTX3

cuSPARSE API Updates

  • Removed mixed-precision support, along with the DISABLE_MIXED_PRECISION option
  • Consolidated on the generic cuSPARSE SpGEMM interface only, and added several new flags (use_cusparse_spgemm, cusparse_spgemm_alg, etc.)
  • Removed legacy cusparseCsrgemm2 wrapper implementations

Error Handling Improvements

  • New AMGX_CHECK_API_ERROR_NORSRC macro for resource-independent error checks
  • Improved error handling throughout

Memory Management

  • Added runtime detection for cudaMallocAsync support via cudaDevAttrMemoryPoolsSupported
  • Added fallback behavior for devices that do not support async memory pools

Performance Optimizations

  • Optimized hash_set insertion
  • Fixed a performance bug in fill_A_kernel_1x1

Bug Fixes

  • MPI communicator duplication (comm dup) bug
  • Convergence check for absolute testing against relative
  • Block size handling when resizing A in distributed_arranger
  • Block size handling in the renumbering and reordering components
  • Scaled norm factor calculation
  • Output performance of the matrix writer
  • Integer overflow in dense LU
  • Latency hiding handling, which now uses the global row count
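These notes do not describe the dense LU fix itself. As a generic illustration of this class of bug (the function name `dense_offset` and the row-major layout are assumptions, not AMGX code): computing a flat index as `row * n + col` in 32-bit `int` overflows once n reaches 46,341, because the largest offset, n² − 1, then exceeds 2,147,483,647. Widening to 64 bits before the multiplication keeps the offset exact.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration: flat offset of element (row, col) in a row-major
// dense n x n matrix. Evaluating row * n + col in 32-bit int would overflow
// (undefined behavior) for n >= 46,341; casting `row` to a 64-bit type first
// makes the whole expression 64-bit, so the result stays exact.
std::int64_t dense_offset(int row, int col, int n) {
    assert(n > 0 && row >= 0 && row < n && col >= 0 && col < n);
    return static_cast<std::int64_t>(row) * n + col;
}
```

For example, the last element of a 50,000 × 50,000 matrix sits at offset 2,499,999,999, which does not fit in a 32-bit `int`.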

MPI Example Enhancements (e.g., amgx_mpi_capi.c)

  • Added -cd flag for diagonal dominance checking
  • Added -om flag for writing out the matrix
  • Added -r flag for repeated runs when measuring performance