NVIDIA NeMo Run 0.6.0

@chtruong814 released this 09 Oct 16:13
030f862

Detailed changelog:

Executors

  • Added Pre-Launch Commands Support to LeptonExecutor #312
  • Remove breaking torchrun config for single-node runs #292
  • Upgrade skypilot to v0.10.0, introduce network_tier #297
  • Fixes for multi-node execution with torchrun + LocalExecutor #251
  • Add option to specify --container-env for srun #293
  • Fix skypilot archive mount bug #288
  • finetune on dgxcloud with nemo-run and deploy on bedrock example #286

Ray Integration

  • Add nsys patch in ray sub template #318
  • Add logs dir to container mount for ray slurm #287
  • Allow customizing folder for SlurmRayRequest #281

Experiment & Job Management

  • Use thread pool for status, run methods inside experiment + other fixes #295

Packaging & Deployment

  • Correctly append tar files for packaging #317

Documentation

  • Create CHANGELOG.md #314
  • docs: Fixing doc build issue #290
  • fix docs tutorial links and add intro to guides/index.md #285
  • README #277

CI/CD

  • changelog workflow #315
  • Update release.yml #306
  • ci(fix): Use GITHUB_TOKEN for community bot #302
  • ci: Add community-bot #300

Bug Fixes

  • [Bugfix] Adding a check for name length #273
  • misc fixes #280
  • adding fix for lowercase and name length k8s requirements #274
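
For context on the two name-related fixes above: Kubernetes requires many resource names to be RFC 1123 DNS labels, meaning lowercase alphanumerics or '-', starting and ending with an alphanumeric, and at most 63 characters. The sketch below illustrates that kind of check in general terms. It is not the code from #273 or #274, and the helper names `is_valid_k8s_label_name` and `normalize_job_name` are hypothetical.

```python
import re

# RFC 1123 DNS label: lowercase alphanumerics or '-',
# starting and ending with an alphanumeric character.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_LABEL_LENGTH = 63  # Kubernetes limit for DNS label names


def is_valid_k8s_label_name(name: str) -> bool:
    """Return True if `name` is a valid RFC 1123 DNS label (hypothetical helper)."""
    return len(name) <= MAX_LABEL_LENGTH and _DNS_LABEL.fullmatch(name) is not None


def normalize_job_name(name: str) -> str:
    """Lowercase, replace disallowed characters with '-', and truncate.

    Hypothetical helper for illustration only; NeMo Run may instead
    reject invalid names rather than rewrite them.
    """
    lowered = re.sub(r"[^a-z0-9-]+", "-", name.lower())
    trimmed = lowered[:MAX_LABEL_LENGTH].strip("-")
    if not trimmed:
        raise ValueError(f"cannot derive a valid name from {name!r}")
    return trimmed


if __name__ == "__main__":
    for candidate in ("nemo-run-job", "NeMo-Run", "My_Experiment.Run"):
        print(candidate, is_valid_k8s_label_name(candidate), normalize_job_name(candidate))
```

Whether to reject an invalid name or normalize it silently is a design choice: rejecting makes the problem visible early, while normalizing can cause two distinct names to collide.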

Others

  • Specify nodes for gpu metrics collection and split data to each rank #320
  • Apply '_enable_goodbye_message' check to both goodbye messages. #319
  • Update refs #278
  • chore: Bump to version 0.6.0rc0.dev0 #272