This guide explains how to create and configure workload recipes using the metadata.yaml file. It covers all available configuration options and patterns for defining workloads.
Each workload recipe requires a metadata.yaml file that defines:
- General Information: Workload identification and framework
- Container Images: Runtime environment containers
- Repositories: Git repositories for dependencies
- Downloads: Offline assets (tokenizers, models, datasets)
- Setup: Virtual environment and dependency installation
- Tools: Workload-specific tool versions (e.g., nsys)
- Run Configuration: GPU configs, model sizes, and test scales
A complete metadata.yaml follows this structure:
general:
  # Workload identification
container:
  # Container images
repositories:  # Optional
  # Git repositories
downloads:     # Optional
  # Offline assets (tokenizers, models, datasets)
tools:         # Optional
  # Tool versions
setup:         # Optional
  # Dependencies and setup tasks
run:
  # Launch configuration and GPU configs

Identifies the workload at a high level:
general:
  workload: nemotron4      # Workload model name
  workload_type: pretrain  # Type of workload
  framework: nemo2         # Framework used
  model: nemotron4         # Optional: override model name in llmb-config

- `workload` (string, required): Name of the workload; must match the directory name.
- `workload_type` (enum, required): One of:
  - `pretrain` - Pre-training workloads
  - `inference` - Inference workloads
  - `finetune` - Fine-tuning workloads
- `framework` (string, required): Framework name (e.g., `nemo2`, `maxtext`, `megatron`).
- `model` (string, optional): Model name to use in `llmb-config_jobid.yaml` for `model_info.model_name`. If not specified, defaults to the `workload` value. Useful when multiple workload directories share the same base model (e.g., `llama3.1` and `llama3.3` both use `model: llama3`).
Note: Version information is managed centrally in release.yaml at the repository root and does not need to be specified in individual recipe metadata files.
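Since `model` simply falls back to the `workload` value, the name resolution can be sketched in a few lines. This is an illustrative helper, not the actual llmb-install code:

```python
def resolve_model_name(general: dict) -> str:
    """Resolve the model name written to llmb-config_jobid.yaml.

    Hypothetical sketch of the rule above: 'model' wins when present,
    otherwise the workload name is used.
    """
    return general.get("model") or general["workload"]

# Two workload directories sharing one base model resolve to the same name
print(resolve_model_name({"workload": "llama3.1", "model": "llama3"}))  # llama3
print(resolve_model_name({"workload": "nemotron4"}))                    # nemotron4
```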
Defines the OCI container images that provide the runtime environment.
container:
  images:
    - 'nvcr.io#nvidia/nemo:25.07.01'

container:
  images:
    - 'nvcr.io#nvidia/nemo:25.07.01'
    - 'nvcr.io#nvidia/pytorch:24.12-py3'

Override the automatically generated filename:
container:
  images:
    - url: 'nvcr.io#nvidia/nemo:25.07.01'
      name: 'my-custom-name.sqsh'

Use different containers for different GPU types:

container:
  images:
    by_gpu:
      h100: 'nvcr.io#nvidia/nemo:25.01'
      gb200: 'nvcr.io#nvidia/nemo:25.05'
      default: 'nvcr.io#nvidia/nemo:25.07.01'  # Fallback for other GPUs

Note: Image URLs use `#` instead of `/` between the registry and the image path.
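Since the `name` field overrides an automatically generated filename, that default name is presumably derived from the image URL. A plausible sketch, assuming an enroot-style scheme where separators become `+` (the real naming logic may differ; `default_sqsh_name` is a hypothetical helper):

```python
def default_sqsh_name(image_url: str) -> str:
    """Derive a filesystem-safe .sqsh filename from an image URL.

    Assumption: separators ('#', '/', ':') are replaced with '+',
    similar to how enroot names imported images. This is a sketch,
    not the installer's actual implementation.
    """
    return image_url.replace("#", "+").replace("/", "+").replace(":", "+") + ".sqsh"

print(default_sqsh_name("nvcr.io#nvidia/nemo:25.07.01"))
# nvcr.io+nvidia+nemo+25.07.01.sqsh
```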
Defines Git repositories to clone during setup. These can be used as dependencies or referenced in the setup.
repositories:
  nemo:
    url: "https://github.com/NVIDIA/NeMo.git"
    commit: "763ffa8b00a2fca9f7a204e14111ed190de7d947"  # Full 40-char SHA
  megatron_core:
    url: "https://github.com/NVIDIA/Megatron-LM.git"
    commit: "ac198fc0d60a8c748597e01ca4c6887d3a7bcf3d"

repositories:
  by_gpu:
    h100:
      nemo:
        url: "https://github.com/NVIDIA/NeMo.git"
        commit: "abc123..."
    gb200:
      nemo:
        url: "https://github.com/NVIDIA/NeMo.git"
        commit: "def456..."
    default:
      nemo:
        url: "https://github.com/NVIDIA/NeMo.git"
        commit: "789abc..."

Important: The commit must be the full 40-character SHA hash, not a short hash or tag.
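The pin format is mechanical enough to check with a short regex. This is an illustrative sketch of the rule the schema enforces; `is_full_commit_sha` is a hypothetical helper, not part of the tooling:

```python
import re

# A full Git object name: exactly 40 lowercase hex characters
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")

def is_full_commit_sha(commit: str) -> bool:
    """Return True only for full 40-character SHA pins (sketch of the
    validation rule described above)."""
    return bool(FULL_SHA.match(commit))

assert is_full_commit_sha("763ffa8b00a2fca9f7a204e14111ed190de7d947")
assert not is_full_commit_sha("763ffa8")   # short hash: rejected
assert not is_full_commit_sha("v2.0.0")    # tag: rejected
```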
Specifies offline assets to download during installation. This section is used to ensure models and tokenizers are available in air-gapped or offline environments.
The recommended way to specify HuggingFace assets is using the huggingface list. This allows you to specify both tokenizers and model configurations.
downloads:
  huggingface:
    - repo_id: Qwen/Qwen3-30B-A3B
      assets: [tokenizer, config]  # Optional: defaults to both if omitted

- `repo_id` (string, required): The HuggingFace repository ID.
- `assets` (list of enums, optional): Which assets to download. Allowed values: `tokenizer`, `config`. If omitted, defaults to both `[tokenizer, config]`.
- No Weights: This section does NOT download model weights (SafeTensors/Pickle). It only downloads metadata, tokenizers, and configuration files.
- Download vs. Verify: Downloads run first, then a separate verification step checks that required assets load offline (`local_files_only=True`). This is an internal implementation split (two functions), not a separate lifecycle phase.
- Caching: Assets are cached in `$LLMB_INSTALL/.cache/huggingface` and made available to workloads via the `HF_HOME` environment variable.
The hf_tokenizers key is supported for backward compatibility but is restricted to tokenizers only. It does not download model configurations.
downloads:
  hf_tokenizers:
    - 'meta-llama/Meta-Llama-3-70B'

Important (Exclusivity Rule): You cannot use both hf_tokenizers and huggingface within the same metadata.yaml file. Mixing them will result in a validation error.
Existing recipes using hf_tokenizers should eventually migrate to the huggingface structure. Note that hf_tokenizers only downloads the tokenizer, while the new huggingface key defaults to both tokenizer and config.
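The exclusivity rule amounts to a one-line check. Here is an illustrative sketch only; the real validation is performed by the schema/CI tooling, and `validate_downloads` is a hypothetical helper:

```python
def validate_downloads(downloads: dict) -> None:
    """Reject a downloads section that mixes the legacy and new keys.

    Sketch of the exclusivity rule above: 'hf_tokenizers' (legacy)
    and 'huggingface' may not appear together.
    """
    if "hf_tokenizers" in downloads and "huggingface" in downloads:
        raise ValueError(
            "downloads: use either 'hf_tokenizers' (legacy) or "
            "'huggingface', not both"
        )

# OK: only the new key is present
validate_downloads({"huggingface": [{"repo_id": "nvidia/Nemotron-4-340B-Base"}]})
```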
Legacy (Tokenizer only):
downloads:
  hf_tokenizers:
    - 'nvidia/Nemotron-4-340B-Base'

Migrated (Tokenizer only):

downloads:
  huggingface:
    - repo_id: nvidia/Nemotron-4-340B-Base
      assets: [tokenizer]

Omit the assets field to download both.
downloads:
  huggingface:
    - repo_id: Qwen/Qwen3-30B-A3B

downloads:
  huggingface:
    - repo_id: nvidia/Nemotron-4-340B-Base
      assets: [tokenizer]

downloads:
  huggingface:
    - repo_id: meta-llama/Llama-3.1-405B
      assets: [config]

Note: Accessing private or gated models requires the HF_TOKEN environment variable to be set during the installation phase.
Specifies workload-specific tool versions (currently supports nsys for profiling).
Only use this section when you need a specific tool version. If your container's tools work fine, omit this section.
tools:
  nsys: "2025.5.1.121-3638078"

Use different versions for different GPU types:

tools:
  nsys:
    by_gpu:
      h100: "2025.1.1.118-3638078"
      gb200: "2025.5.1.121-3638078"
      default: "2025.4.1.172-3634357"  # Optional: fallback version

Only specify versions for GPUs that need custom tools (others use the container version):
tools:
  nsys:
    by_gpu:
      h100: "2025.1.1.118-3638078"
      gb200: "2025.5.1.121-3638078"
      # b200 and other GPUs will use container nsys (no download)

Resolution Logic:
- If the GPU is explicitly listed in `by_gpu`, use that version.
- Else, if a `default` key exists, use the default version.
- Else, use the container version (no download).
For more details, see tools.md.
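The resolution logic above can be sketched as a small pure function. This is an illustrative sketch, not the installer's actual code; `resolve_by_gpu` is a hypothetical helper, and `None` stands in for "fall back to the container version":

```python
def resolve_by_gpu(value, gpu_type):
    """Resolve a possibly GPU-specific config value.

    Sketch of the resolution order described above:
    explicit by_gpu entry -> 'default' entry -> None
    (None meaning: use whatever the container provides).
    """
    if not isinstance(value, dict) or "by_gpu" not in value:
        return value                  # plain value, same for every GPU
    by_gpu = value["by_gpu"]
    if gpu_type in by_gpu:
        return by_gpu[gpu_type]       # explicit entry wins
    return by_gpu.get("default")      # else 'default' if present, else None

nsys = {"by_gpu": {"h100": "2025.1.1.118-3638078",
                   "gb200": "2025.5.1.121-3638078"}}
print(resolve_by_gpu(nsys, "h100"))  # 2025.1.1.118-3638078
print(resolve_by_gpu(nsys, "b200"))  # None -> use container nsys
```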
Defines virtual environment creation, dependencies, and setup tasks.
setup:
  venv_req: true  # Create a Python virtual environment
  dependencies:
    pip:
      - package: nemo
        repo_key: nemo
        install_target: '.[nlp]'
      - 'scipy<1.13.0'
      - 'bitsandbytes==0.46.0'
      - package: megatron-core
        repo_key: megatron_core

Simple string format (PyPI package):
dependencies:
  pip:
    - 'numpy==1.24.0'
    - 'torch>=2.0'

Repository-based package:

dependencies:
  pip:
    - package: nemo            # Package name
      repo_key: nemo           # References key in repositories section
      install_target: '.[nlp]' # Optional: extras or specific target
      editable: true           # Optional: install in editable mode (-e)

dependencies:
  git:
    my_package:
      repo_key: my_repo     # References key in repositories section
      install_method:
        type: clone         # 'clone' or 'script'
        path: 'subdir'      # Optional: subdirectory within repo

Run custom commands during setup:
setup:
  venv_req: true
  tasks:
    - name: "Download dataset"
      cmd: "python download_data.py --output $DATASET_DIR"
      job_type: local        # 'local', 'nemo2', 'srun', or 'sbatch'
      requires_gpus: false   # Optional: whether task needs GPUs
      env:                   # Optional: environment variables
        DATASET_DIR: "/data"

Task Types:
- `local`: Run on the current node
- `nemo2`: Run with the nemo2 launcher
- `srun`: Run via SLURM srun
- `sbatch`: Submit as a SLURM batch job
⚠️ DEPRECATED: The `setup_script` functionality is deprecated and will be removed in a future release. Please migrate to the `tasks` feature above for all setup operations.
For backward compatibility only:
setup:
  setup_script: "setup.sh"  # Path to setup script (DEPRECATED - use tasks instead)
  venv_req: true

Defines how the workload is launched and what configurations to test.
run:
  launcher_type: 'nemo'       # Launcher type
  launch_script: 'launch.sh'  # Launch script path
  gpu_configs:                # Per-GPU configurations
    h100:
      model_configs:
        - model_size: '405b'
          dtypes: ['fp8', 'bf16']
          scales: [512, 1024, 2048]

- `nemo`: NeMo launcher (nemo2 workloads)
- `megatron_bridge`: Megatron bridge launcher
- `sbatch`: Direct SLURM sbatch submission
Define test configurations for each GPU type:
gpu_configs:
  h100:
    model_configs:
      - model_size: '15b'
        dtypes: ['fp8', 'bf16']
        scales: [16, 32, 64, 128]
  b200:
    model_configs:
      - model_size: '15b'
        dtypes: ['fp8']
        scales: [32, 64, 128, 256]

Supported GPU Types: h100, b200, gb200, gb300
Each model config specifies:
model_configs:
  - model_size: '340b'
    dtypes: ['fp8', 'bf16']
    scales: [128, 256, 512, 1024]
    exact_scales: false  # Optional: allow power-of-2 extension

Define different scales for different dtypes:

model_configs:
  - model_size: '405b'
    dtypes:
      fp8: [128, 256, 512]  # Short form
      bf16:                 # Long form with exact_scales
        scales: [256, 512]
        exact_scales: true

Fields:
- `model_size` (string, required): Model size identifier (e.g., `'7b'`, `'405b'`)
- `dtypes` (required): Precision types to test. Can be:
  - Single dtype: `'fp8'`
  - List: `['fp8', 'bf16']`
  - Mapping: `fp8: [128, 256]` or `fp8: {scales: [128, 256], exact_scales: true}`
- `scales` (list, optional): GPU counts to test (legacy; used when dtypes is not a mapping)
- `exact_scales` (bool, optional): If `false` (the default), scales are extended to the maximum with power-of-2 values
Supported dtypes: fp8, bf16, nvfp4
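Since `dtypes` accepts three shapes (single string, list, or mapping), a parser would normalize them into one form before expanding the test matrix. The sketch below is illustrative only; `normalize_model_config` is a hypothetical helper, not the actual installer code:

```python
def normalize_model_config(cfg: dict) -> dict:
    """Expand the three dtype forms into {dtype: {'scales': [...],
    'exact_scales': bool}} (illustrative sketch).

    Legacy top-level 'scales'/'exact_scales' apply when dtypes is a
    string or list; the mapping form carries its own scales.
    """
    dtypes = cfg["dtypes"]
    default_scales = cfg.get("scales", [])
    default_exact = cfg.get("exact_scales", False)
    out = {}
    if isinstance(dtypes, str):           # single dtype -> one-element list
        dtypes = [dtypes]
    if isinstance(dtypes, list):          # list form: all dtypes share scales
        for d in dtypes:
            out[d] = {"scales": default_scales, "exact_scales": default_exact}
    else:                                 # mapping form
        for d, spec in dtypes.items():
            if isinstance(spec, list):    # short form: bare scale list
                out[d] = {"scales": spec, "exact_scales": default_exact}
            else:                         # long form: explicit keys
                out[d] = {"scales": spec["scales"],
                          "exact_scales": spec.get("exact_scales", default_exact)}
    return out

cfg = {"model_size": "405b",
       "dtypes": {"fp8": [128, 256, 512],
                  "bf16": {"scales": [256, 512], "exact_scales": True}}}
print(normalize_model_config(cfg))
```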
Many sections support GPU-specific overrides using the by_gpu pattern.
section_name:
  by_gpu:
    h100: <value_for_h100>
    b200: <value_for_b200>
    gb200: <value_for_gb200>
    gb300: <value_for_gb300>
    default: <fallback_value>  # Optional

- Check if the GPU type is explicitly listed; if so, use that value
- Else, if a `default` key exists, use the default value
- Else, use the top-level value or system default

Sections supporting by_gpu:
- `container.images`: Different containers per GPU
- `repositories`: Different repository versions per GPU
- `tools`: Different tool versions per GPU
Here's a complete metadata.yaml example:
general:
  workload: nemotron4
  workload_type: pretrain
  framework: nemo2

container:
  images:
    - 'nvcr.io#nvidia/nemo:25.07.01'

repositories:
  nemo:
    url: "https://github.com/NVIDIA/NeMo.git"
    commit: "763ffa8b00a2fca9f7a204e14111ed190de7d947"
  megatron_core:
    url: "https://github.com/NVIDIA/Megatron-LM.git"
    commit: "ac198fc0d60a8c748597e01ca4c6887d3a7bcf3d"
  nemo_run:
    url: "https://github.com/NVIDIA/NeMo-Run.git"
    commit: "04f900a9c1cde79ce6beca6a175b4c62b99d7982"

downloads:
  huggingface:
    - repo_id: 'nvidia/Nemotron-4-340B-Base'
      assets: [tokenizer]

tools:
  nsys:
    by_gpu:
      h100: "2025.5.1.121-3638078"
      gb200: "2025.5.1.121-3638078"
      default: "2025.4.1.172-3634357"

setup:
  venv_req: true
  dependencies:
    pip:
      - package: nemo
        repo_key: nemo
        install_target: '.[nlp]'
      - 'scipy<1.13.0'
      - 'bitsandbytes==0.46.0'
      - package: megatron-core
        repo_key: megatron_core
      - package: nemo_run
        repo_key: nemo_run

run:
  launcher_type: 'nemo'
  launch_script: 'launch.sh'
  gpu_configs:
    h100:
      model_configs:
        - model_size: '15b'
          dtypes: ['fp8', 'bf16']
          scales: [16, 32, 64, 128, 256, 512, 1024, 2048]
        - model_size: '340b'
          dtypes: ['fp8', 'bf16']
          scales: [256, 512, 1024, 2048]
    b200:
      model_configs:
        - model_size: '15b'
          dtypes: ['fp8', 'bf16']
          scales: [16, 32, 64, 128, 256, 512, 1024]
        - model_size: '340b'
          dtypes: ['fp8', 'bf16']
          scales: [128, 256, 512, 1024]

Validate your metadata file:

python -m yamale -s .gitlab/ci/metadata_schema.yaml <workload>/metadata.yaml

The schema validates:
- Required vs optional fields
- Field types (string, int, bool, list, etc.)
- Enum values (GPU types, dtypes, launcher types)
- Format requirements (commit SHA length, version patterns)
Only use by_gpu when configurations truly differ by GPU type. Simple deployments should use the same config across GPUs.
# Good
dependencies:
  pip:
    - 'scipy==1.12.0'
    - 'numpy>=1.24,<2.0'

# Avoid
dependencies:
  pip:
    - 'scipy'  # No version = unpredictable behavior

Always use full 40-character SHA hashes for repository commits:
repositories:
  nemo:
    url: "https://github.com/NVIDIA/NeMo.git"
    commit: "763ffa8b00a2fca9f7a204e14111ed190de7d947"  # Good
    # commit: "763ffa8"  # BAD: short hash will fail validation

Include comments explaining why certain scales are chosen:

scales: [128, 256, 512, 1024]  # Tested scales for memory-optimal configs
exact_scales: true             # Don't extend - these are the only supported scales

Always validate and test the install after modifying metadata:
# Validate schema
python -m yamale -s .gitlab/ci/metadata_schema.yaml workload/metadata.yaml

# Test installation
llmb-install express /tmp/test-install --workloads your_workload

run:
  gpu_configs:
    h100:
      model_configs:
        - model_size: '7b'
          dtypes: ['fp8', 'bf16']
          scales: [8, 16, 32]
        - model_size: '70b'
          dtypes: ['fp8', 'bf16']
          scales: [64, 128, 256]
        - model_size: '405b'
          dtypes: ['fp8']
          scales: [512, 1024, 2048]

setup:
  venv_req: true
  tasks:
    - name: "Download model weights"
      cmd: "python download_weights.py --model $MODEL_NAME"
      job_type: local
      requires_gpus: false
      env:
        MODEL_NAME: "llama-3.1-405b"
        HF_TOKEN: "$HF_TOKEN"  # References environment variable
  dependencies:
    pip:
      - 'transformers>=4.35'
      - 'accelerate>=0.24'

container:
  images:
    by_gpu:
      h100: ['nvcr.io#nvidia/nemo:25.01']
      gb200: ['nvcr.io#nvidia/nemo:25.05-gb']
      default: ['nvcr.io#nvidia/nemo:25.07.01']

tools:
  nsys:
    by_gpu:
      gb200: "2025.6.0.125-3638078"  # GB200 needs newer nsys
      # Other GPUs use container nsys

Error: workload_type: 'training' is not valid under any of the given enum values
Solution: Use valid enum values. Check the schema for allowed values:
- workload_type: `pretrain`, `inference`, `finetune`
- GPU types: `h100`, `b200`, `gb200`, `gb300`
- dtypes: `fp8`, `bf16`, `nvfp4`
Error: commit: '763ffa8' is not valid - must be 40 characters
Solution: Use full commit hash:
# Get full hash
git rev-parse HEAD
# Or from GitHub: click the commit and copy the full SHA from the URL or UI

Error: ModuleNotFoundError: No module named 'megatron'

Solution: Ensure the package is in dependencies and repo_key references the correct repository:
repositories:
  megatron_core:
    url: "https://github.com/NVIDIA/Megatron-LM.git"
    commit: "..."

setup:
  dependencies:
    pip:
      - package: megatron-core
        repo_key: megatron_core  # Must match repository key

- Tools Configuration Guide: Detailed tool version configuration
- Main README: Installation and usage guide
- Headless Installation: Automated deployment guide
The complete schema is defined in .gitlab/ci/metadata_schema.yaml. Key enums and types:
- GPU Types: `h100`, `b200`, `gb200`, `gb300`, `default` (for by_gpu only)
- Workload Types: `pretrain`, `inference`, `finetune`, `tools`
- Dtypes: `fp8`, `bf16`, `nvfp4`
- Launcher Types: `nemo`, `megatron_bridge`, `sbatch`
- Job Types: `local`, `nemo2`, `srun`, `sbatch`
- commit: Full 40-character SHA hash
- Image URLs: Use `#` instead of `/` (e.g., `nvcr.io#nvidia/nemo:25.07.01`)