Releases: NVIDIA/nvloom

v1.3.0

06 Nov 01:53

Added

  • CSV-defined benchmarks
  • Memory access latency benchmarks
  • Sample Dockerfile to build nvloom

Changed

  • Removed the retry mechanism for CUDA multicast allocations (added in v1.1.0)

Fixed

  • Freeing MNNVL memory lacked sufficient MPI barriers, which could cause race conditions in extremely rare edge cases
  • The benchmarking algorithm sometimes recorded the "end event" twice. This had no impact on benchmark results.

v1.2.0

21 Jul 18:58

Added

  • Multicast reductions benchmarks
  • Option to specify iteration count (-i/--iterations)
  • Option to repeat a testcase for a specified number of iterations (-c/--repeat)
  • Option to repeat a testcase for a specified number of seconds (-d/--duration)
  • CUDA Stream-Ordered Memory Allocator as a new allocator option (-a cudapool)

Changed

  • Caching multicast allocations is now much faster, thanks to a multicast-specific memory pool
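For readers unfamiliar with the CUDA feature behind the "-a cudapool" option, the sketch below shows the stream-ordered allocator in the CUDA runtime API (cudaMallocAsync / cudaFreeAsync, CUDA 11.2 or later). It is not nvloom's code and makes no claim about how nvloom uses the allocator. The helper name poolRoundTrip, the CHECK macro, and the choice to raise the default pool's release threshold are illustrative assumptions only.

```cuda
// Illustrative sketch of the CUDA stream-ordered memory allocator.
// NOT taken from nvloom; poolRoundTrip and CHECK are hypothetical helpers.
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// On any CUDA error, print it and make the enclosing function return false.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            return false;                                            \
        }                                                            \
    } while (0)

// Allocates, uses, and frees a device buffer entirely in stream order.
bool poolRoundTrip(size_t bytes) {
    const int device = 0;
    CHECK(cudaSetDevice(device));

    // cudaMallocAsync draws from the device's default memory pool.
    cudaMemPool_t pool;
    CHECK(cudaDeviceGetDefaultMemPool(&pool, device));

    // Assumption for illustration: keep freed memory cached in the pool
    // rather than releasing it back to the system at synchronization points.
    uint64_t threshold = UINT64_MAX;
    CHECK(cudaMemPoolSetAttribute(pool, cudaMemPoolAttrReleaseThreshold,
                                  &threshold));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    void* buf = nullptr;
    CHECK(cudaMallocAsync(&buf, bytes, stream));    // allocation ordered on stream
    CHECK(cudaMemsetAsync(buf, 0, bytes, stream));  // work that uses the buffer
    CHECK(cudaFreeAsync(buf, stream));              // free ordered after the memset

    CHECK(cudaStreamSynchronize(stream));
    CHECK(cudaStreamDestroy(stream));
    return true;
}
```

Calling poolRoundTrip(64u << 20) on a GPU-equipped machine allocates, zeroes, and frees 64 MiB without any device-wide synchronization between the three steps; only the final cudaStreamSynchronize waits. Because freed blocks stay in the pool, a later allocation of similar size on the same device can be served from the cache instead of a fresh system allocation.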

v1.1.0

22 May 20:15
ff08fe0

Added

  • Heatmap plotter
  • Support for CUDA Error Log Management
  • Retry mechanism for CUDA multicast allocations
  • nvloom_cli argument to set the number of samples in gpu-to-rack testcases
  • nvloom_cli now prints its version, the git commit it was built from, and the specified buffer size
  • nvloom_cli now prints units when reporting results
  • Native compilation for sm_103 on CUDA 12.9 toolkits

Changed

  • Expanded README.md
  • Rack-to-rack testcases now come in both unidir and bidir variants, and the bidir rack-to-rack testcases are symmetry-optimized.

Fixed

  • Bug where requesting allocations over 4 GiB would fail with CUDA_OUT_OF_MEMORY

v1.0.0

18 Mar 16:05

Initial release