Releases: NVIDIA/cloudai

v0.8.rc1

15 Aug 15:30
b13bafe

v0.8.rc1 Pre-release

What's Changed

Full Changelog: v0.8.rc0...v0.8.rc1

v0.8.rc0

02 Aug 19:41
570181a

v0.8.rc0 Pre-release

What's Changed

Full Changelog: v0.7.14...v0.8.rc0

v0.7.14

16 Jul 12:35
81e6278

v0.7.14 Pre-release

What's Changed

  • Make sure to dry-run all tests in a test scenario by @TaekyungHeo in #149
  • Add warning for insufficient epochs in JaxToolbox report generation by @TaekyungHeo in #148
  • Remove subtest name check in NcclTestSlurmCommandGenStrategy by @TaekyungHeo in #147
  • Handle disk quota exceeded error in cache_docker_image method by @TaekyungHeo in #150
  • Remove unused properties from TestTemplateStrategy by @amaslenn in #151
  • Enhance error messages to provide guidance for missing schemas by @TaekyungHeo in #152
  • Bump version to v0.7.14 by @TaekyungHeo in #153

Full Changelog: v0.7.13...v0.7.14

v0.7.13

10 Jul 19:17
b645a1e

v0.7.13 Pre-release

What's Changed

  • Pass all env vars to final command in NeMo launcher test template by @TaekyungHeo in #134
  • Added JaxToolbox (Grok) troubleshooting steps by @TaekyungHeo in #142
  • Improve tokenizer path handling in NeMo Launcher Slurm strategy by @TaekyungHeo in #136
  • Remove identical if-else branches by @amaslenn in #143
  • Move parts of srun CLI generation into base class by @amaslenn in #140

Full Changelog: v0.7.12...v0.7.13

v0.7.12

09 Jul 19:08
d8acc30

v0.7.12 Pre-release

What's Changed

Full Changelog: v0.7.11...v0.7.12

v0.7.11

05 Jul 19:38
a3e568c

v0.7.11 Pre-release

What's Changed

  • Enhance Quick Start guide for Docker repo access and API key by @TaekyungHeo in #99
  • Add copyright headers for TOML, update its format by @amaslenn in #117
  • Add pyxis mktemp error handling and test cases in JaxToolbox strategy by @TaekyungHeo in #118
  • Add section on downloading NeMo datasets to USER_GUIDE.md by @TaekyungHeo in #116
  • Enhance USER_GUIDE.md with system schema description and troubleshooting steps by @TaekyungHeo in #120
  • Fix bug in generating NeMo launcher command by @TaekyungHeo in #124
  • Add section on describing a test scenario to USER_GUIDE.md by @TaekyungHeo in #123
  • Add cache_docker_images_locally field to system schema in USER_GUIDE.md by @TaekyungHeo in #125
  • Allow local docker image caching for JaxToolbox by @TaekyungHeo in #126
  • Update BaseRunner to include scenario name in output directory by @TaekyungHeo in #119
  • Add mpi field to SlurmSystem and allow different MPI options in schema by @TaekyungHeo in #127
  • Bump version to v0.7.11 by @TaekyungHeo in #128

Full Changelog: v0.7.10...v0.7.11

v0.7.10

24 Jun 23:27
c173117

v0.7.10 Pre-release

What's Changed

Full Changelog: v0.7.9...v0.7.10

v0.7.9

18 Jun 11:06
0f7513d

v0.7.9 Pre-release

What's Changed

  • Fix _check_docker_image_accessibility condition and add detailed logging by @TaekyungHeo in #107
  • Refactor handle_install_and_uninstall to identify unique test templates by @TaekyungHeo in #105
  • Bump version to v0.7.9 by @TaekyungHeo in #106

Full Changelog: v0.7.8...v0.7.9

v0.7.8

17 Jun 18:45
4908d9a

v0.7.8 Pre-release

What's Changed

Full Changelog: v0.7.7...v0.7.8

v0.7.7

15 Jun 03:01
1377377

v0.7.7 Pre-release

What's Changed

  • Refactor _check_docker_image_accessibility to remove srun usage by @TaekyungHeo in #98
  • Update NcclTestJobStatusRetrievalStrategy to improve error messages by @TaekyungHeo in #100
  • Update JaxToolboxJobStatusRetrievalStrategy to improve error messages by @TaekyungHeo in #101
  • Bump version to v0.7.7 by @TaekyungHeo in #102

Full Changelog: v0.7.6...v0.7.7