Open
Labels: enhancement (New feature or request)
Description
Describe the bug
I'm trying to install microbenchmarks via headless mode, but they don't seem to be recognized. Because of #26, I'm using v25.10 for now, since it can install the other workloads, but it fails for the microbenchmarks:
=== HEADLESS INSTALLATION MODE ===
✓ Configuration loaded from: /cm-tests/.../dgx-automation/config/dgx-headless-config-gb300.yaml
Environment type: uv
Install path: /cm-tests/dgxc-benchmarking/gb300/workloads
GPU type: gb300
Node architecture: aarch64
Install method: local
Selected workloads: pretrain_nemotron4-15b, pretrain_nemotron4-340b, pretrain_llama3.1, pretrain_deepseek-v3, pretrain_grok1, pretrain_nemotron-h, microbenchmark_cpu_overhead, microbenchmark_nccl
Development mode: Using repository at /cm-tests/dgxc-benchmarking/gb300
Error: Selected workloads not found: ['microbenchmark_cpu_overhead', 'microbenchmark_nccl']
Custom script failed for run gb300, version v25.10
Preparation step failed with code 1.
I tried it with different names like nccl and microbenchmark-nccl but none of them worked.
Steps/Code to reproduce bug
Here is my headless play file:
venv_type: uv
install_path: /cm-tests/dgxc-benchmarking/gb300/workloads
slurm_info:
  slurm:
    account: root
    gpu_partition: main
    cpu_partition: main
    gpu_partition_gres: 8
    cpu_partition_gres: null
  node_architecture: aarch64
  gpu_type: gb300
node_architecture: aarch64
install_method: local
selected_workloads:
  - pretrain_nemotron4-15b
  - pretrain_nemotron4-340b
  - pretrain_llama3.1
  - pretrain_deepseek-v3
  - pretrain_grok1
  - pretrain_nemotron-h
  - microbenchmark_cpu_overhead
  - microbenchmark_nccl
env_vars:
  HF_TOKEN: hf_
I run it with this command: ./install.sh --play config.yaml -v -d
Expected behavior
The installation should succeed, and if it fails, the error message should clearly indicate the cause.
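To illustrate what a clearer error could look like, here is a minimal sketch of a name check that suggests close matches and lists the accepted names. It is not the installer's actual code: `check_selected_workloads` is a hypothetical helper, and the list of available names passed to it is assumed to come from wherever the installer discovers workloads.

```python
import difflib


def check_selected_workloads(selected, available):
    """Return human-readable error lines; an empty list means every name is known.

    Hypothetical helper for illustration only, not part of the installer.
    """
    available = sorted(available)
    problems = []
    for name in selected:
        if name in available:
            continue
        # Suggest up to three known names that are textually close to the unknown one.
        hints = difflib.get_close_matches(name, available, n=3, cutoff=0.8)
        if hints:
            problems.append(
                f"Unknown workload '{name}'. Did you mean: {', '.join(hints)}?"
            )
        else:
            problems.append(f"Unknown workload '{name}'. No similar name found.")
    if problems:
        # Always finish with the full list so the user can pick a valid name.
        problems.append("Available workloads: " + ", ".join(available))
    return problems


if __name__ == "__main__":
    # The available list here is a placeholder, not the real set of workloads.
    available = ["pretrain_llama3.1", "pretrain_grok1"]
    for line in check_selected_workloads(["pretrain_lama3.1"], available):
        print(line)
```

With output like this, a rejected name such as microbenchmark_nccl would come with the list of names the installer does accept for the selected version, which is the information missing from the current error.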
Environment details (please complete the following information):
Environment location: Cloud(Nebius)
Method of DGXC Benchmarking install: From source with UV
Run print_env.sh from the project root and paste the results here