Releases: NVIDIA/cloudai

v1.4.beta17

21 Aug 13:09
ad8a9eb

v1.4.beta17 Pre-release

What's Changed

  • Add error detection and retry mechanism for worker failures by @TaekyungHeo in #659
  • Use single source of data for reporting and NIXL pass/fail by @amaslenn in #670
  • Write trajectory file for DSE jobs in single-sbatch mode by @amaslenn in #671

Full Changelog: v1.4.beta16...v1.4.beta17

v1.4.beta16

19 Aug 15:28
bf4bdf3

v1.4.beta16 Pre-release

What's Changed

Full Changelog: v1.4.beta15...v1.4.beta16

v1.4.beta15

19 Aug 14:32
4ed3be3

v1.4.beta15 Pre-release

What's Changed

  • Re-use comparison report for NIXL by @amaslenn in #664
  • Handle single-sbatch metadata layout in report by @amaslenn in #666
  • Follow-up for PR647 (Support explicit node assignment for prefill and decode workers) by @TaekyungHeo in #665

Full Changelog: v1.4.beta14...v1.4.beta15

v1.4.beta14

18 Aug 20:19
51340e7

v1.4.beta14 Pre-release

What's Changed

New Contributors

Full Changelog: v1.4.beta13...v1.4.beta14

v1.4.beta13

18 Aug 15:12
2d419ff

v1.4.beta13 Pre-release

What's Changed

Full Changelog: v1.4.beta12...v1.4.beta13

v1.4.beta12

18 Aug 11:51
13b83f2

v1.4.beta12 Pre-release

What's Changed

  • Comparison report for NCCL workloads by @amaslenn in #656
  • Support explicit node assignment for prefill and decode workers by @TaekyungHeo in #647

Full Changelog: v1.4.beta11...v1.4.beta12

v1.4.beta11

15 Aug 23:14
267db18

v1.4.beta11 Pre-release

What's Changed

  • Support for DeepSeekR1 model with SGLang / AI Dynamo by @TaekyungHeo in #641
  • Support mounting any JSON files for --dynamo-deepep-config by @TaekyungHeo in #650
  • Set tp-size and dp-size from args if provided, else use total_gpus by @TaekyungHeo in #649
  • Add environment validation to startup sequence by @TaekyungHeo in #651
  • Follow-up for PR641 (Support for DeepSeekR1 model with SGLang / AI Dynamo) by @TaekyungHeo in #653
  • Reorder the functions in ai_dynamo.sh for improved maintainability by @TaekyungHeo in #654
  • Refactor GPU count to use _gpus_per_node in vllm and env validation by @TaekyungHeo in #657
  • Mount huggingface_home_container_path unconditionally by @TaekyungHeo in #655
  • Refactor nodelist validation to check DYNAMO_NODELIST only if both args empty by @TaekyungHeo in #658

Full Changelog: v1.4.beta10...v1.4.beta11

v1.4.beta10

13 Aug 15:25
4df1de4

v1.4.beta10 Pre-release

What's Changed

Full Changelog: v1.4.beta9...v1.4.beta10

v1.4.beta9

11 Aug 15:51
500b209

v1.4.beta9 Pre-release

What's Changed

  • Updates for SlurmContainer workload by @amaslenn in #638
  • Handle missing tests gracefully by adding MissingTestError to avoid backtrace by @TaekyungHeo in #640
  • Clean up src/cloudai/workloads/ai_dynamo/ai_dynamo.sh by @TaekyungHeo in #639

Full Changelog: v1.4.beta8...v1.4.beta9

v1.4.beta8

11 Aug 09:55
d76029c

v1.4.beta8 Pre-release

What's Changed

  • Add multi-worker-per-node GPU slicing support with dynamic allocation by @TaekyungHeo in #636
  • Log mapping between AI Dynamo nodes and roles by @TaekyungHeo in #617

Full Changelog: v1.4.beta7...v1.4.beta8