Getting started on DGX Spark #942

@michaelfuckner

Hi,

I have a DGX Spark running DGX OS 7.5.0 (which is basically Ubuntu 24.04 LTS), and I don't know how to get started. I created a new user, added it to the sudo and docker groups, created a venv, installed the MLC scripts via pip, and then...? detect,os looks fine, but I can't get any of the benchmarks from the docs to run.

(mlc) molli123@spark-7994:~$ mlcr detect,os -j
[2026-05-07 10:18:33,480 script_utils.py:  88 INFO ] - * mlcr detect,os
[2026-05-07 10:18:33,484 module.py      :1037 INFO ] -      ! load /home/molli123/MLC/repos/local/cache/detect-os_aac747bd/mlc-cached-state.json
[2026-05-07 10:18:33,485 module.py      :1901 INFO ] - {
  "return": 0,
  "env": {
    "MLC_USER_RUN_DIR": "/home/molli123",
    "MLC_HOST_OS_TYPE": "linux",
    "MLC_HOST_OS_BITS": "64",
    "MLC_HOST_OS_FLAVOR": "ubuntu",
    "MLC_HOST_OS_FLAVOR_LIKE": "debian",
    "MLC_HOST_OS_VERSION": "24.04",
    "MLC_HOST_OS_KERNEL_VERSION": "6.17.0-1014-nvidia",
    "MLC_HOST_OS_GLIBC_VERSION": "2.39",
    "MLC_HOST_OS_MACHINE": "aarch64",
    "MLC_HOST_OS_PACKAGE_MANAGER": "apt",
    "MLC_HOST_OS_PACKAGE_MANAGER_INSTALL_CMD": "DEBIAN_FRONTEND=noninteractive apt-get install -y",
    "MLC_HOST_OS_PACKAGE_MANAGER_UPDATE_CMD": "apt-get update -y",
    "+MLC_HOST_OS_DEFAULT_LIBRARY_PATH": [
      "/usr/local/lib/aarch64-linux-gnu",
      "/lib/aarch64-linux-gnu",
      "/usr/lib/aarch64-linux-gnu",
      "/usr/local/lib",
      "/lib",
      "/usr/lib",
      "/usr/aarch64-linux-gnu/lib"
    ],
    "MLC_HOST_PLATFORM_FLAVOR": "aarch64",
    "MLC_HOST_PYTHON_BITS": "64",
    "MLC_HOST_SYSTEM_NAME": "spark-7994",
    "MLC_HOST_FILESYSTEMS": "ext4 vfat"
  },
  "new_env": {
    "MLC_HOST_OS_TYPE": "linux",
    "MLC_HOST_OS_BITS": "64",
    "MLC_HOST_OS_FLAVOR": "ubuntu",
    "MLC_HOST_OS_FLAVOR_LIKE": "debian",
    "MLC_HOST_OS_VERSION": "24.04",
    "MLC_HOST_OS_KERNEL_VERSION": "6.17.0-1014-nvidia",
    "MLC_HOST_OS_GLIBC_VERSION": "2.39",
    "MLC_HOST_OS_MACHINE": "aarch64",
    "MLC_HOST_OS_PACKAGE_MANAGER": "apt",
    "MLC_HOST_OS_PACKAGE_MANAGER_INSTALL_CMD": "DEBIAN_FRONTEND=noninteractive apt-get install -y",
    "MLC_HOST_OS_PACKAGE_MANAGER_UPDATE_CMD": "apt-get update -y",
    "+MLC_HOST_OS_DEFAULT_LIBRARY_PATH": [
      "/usr/local/lib/aarch64-linux-gnu",
      "/lib/aarch64-linux-gnu",
      "/usr/lib/aarch64-linux-gnu",
      "/usr/local/lib",
      "/lib",
      "/usr/lib",
      "/usr/aarch64-linux-gnu/lib"
    ],
    "MLC_HOST_PLATFORM_FLAVOR": "aarch64",
    "MLC_HOST_PYTHON_BITS": "64",
    "MLC_HOST_SYSTEM_NAME": "spark-7994",
    "MLC_HOST_FILESYSTEMS": "ext4 vfat"
  },
  "state": {
    "os_uname_machine": "aarch64",
    "os_uname_all": "Linux spark-7994 6.17.0-1014-nvidia #14-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 17 19:01:40 UTC 2026 aarch64 aarch64 aarch64 GNU/Linux"
  },
  "new_state": {
    "os_uname_machine": "aarch64",
    "os_uname_all": "Linux spark-7994 6.17.0-1014-nvidia #14-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 17 19:01:40 UTC 2026 aarch64 aarch64 aarch64 GNU/Linux"
  },
  "deps": []
}
(mlc) molli123@spark-7994:~$

I believe this looks fine, but the benchmark does not run:

(mlc) molli123@spark-7994:~$ mlcr run-mlperf,inference,_full,_r5.1-dev,_performance-only    --model=resnet50    --implementation=nvidia    --framework=tensorrt    --category=datacenter    --scenario=Offline    --execution_mode=valid    --device=cuda    --quiet
[2026-05-07 10:42:29,705 script_utils.py:  88 INFO ] - * mlcr run-mlperf,inference,_full,_r5.1-dev,_performance-only
[2026-05-07 10:42:29,706 script_utils.py:  88 INFO ] -   * mlcr detect,os
[2026-05-07 10:42:29,710 module.py      :1037 INFO ] -        ! load /home/molli123/MLC/repos/local/cache/detect-os_aac747bd/mlc-cached-state.json
[2026-05-07 10:42:29,711 script_utils.py:  88 INFO ] -   * mlcr detect,cpu
[2026-05-07 10:42:29,712 module.py      :1037 INFO ] -        ! load /home/molli123/MLC/repos/local/cache/detect-cpu_ccbf2776/mlc-cached-state.json
[2026-05-07 10:42:29,714 script_utils.py:  88 INFO ] -   * mlcr get,python3
[2026-05-07 10:42:29,715 module.py      :1037 INFO ] -        ! load /home/molli123/MLC/repos/local/cache/get-python3_09388c6a/mlc-cached-state.json
[2026-05-07 10:42:29,719 script_utils.py:  88 INFO ] -   * mlcr get,mlcommons,inference,src
[2026-05-07 10:42:29,720 module.py      :1037 INFO ] -        ! load /home/molli123/MLC/repos/local/cache/get-mlperf-inference-src_49bff69c/mlc-cached-state.json
[2026-05-07 10:42:29,722 script_utils.py:  88 INFO ] -   * mlcr get,sut,description
[2026-05-07 10:42:29,723 script_utils.py:  88 INFO ] -     * mlcr detect,os
[2026-05-07 10:42:29,724 module.py      :1037 INFO ] -          ! load /home/molli123/MLC/repos/local/cache/detect-os_aac747bd/mlc-cached-state.json
[2026-05-07 10:42:29,725 script_utils.py:  88 INFO ] -     * mlcr detect,cpu
[2026-05-07 10:42:29,726 module.py      :1037 INFO ] -          ! load /home/molli123/MLC/repos/local/cache/detect-cpu_ccbf2776/mlc-cached-state.json
[2026-05-07 10:42:29,727 script_utils.py:  88 INFO ] -     * mlcr get,python3
[2026-05-07 10:42:29,728 module.py      :1037 INFO ] -          ! load /home/molli123/MLC/repos/local/cache/get-python3_09388c6a/mlc-cached-state.json
[2026-05-07 10:42:29,736 script_utils.py:  88 INFO ] -     * mlcr get,compiler
[2026-05-07 10:42:29,737 module.py      :1037 INFO ] -          ! load /home/molli123/MLC/repos/local/cache/get-llvm_1df55db8/mlc-cached-state.json
[2026-05-07 10:42:29,737 script_utils.py:  88 INFO ] -     * mlcr get,cuda-devices
[2026-05-07 10:42:29,737 main.py        : 134 ERROR] - Error during 'script' action: Script run execution failed in /home/molli123/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py.
Error : no scripts were found with tags: ['get', 'cuda-devices'] (when variations ignored)
[2026-05-07 10:42:29,738 main.py        : 148 ERROR] -   at /home/molli123/mlc/lib/python3.12/site-packages/mlc/script_action.py:308 in call_script_module_function
[2026-05-07 10:42:29,738 main.py        : 178 ERROR] - Failed script: run-mlperf,inference,_full,_r5.1-dev,_performance-only
[2026-05-07 10:42:29,738 main.py        : 179 ERROR] - To rerun just the failed part: mlcr run-mlperf,inference,_full,_r5.1-dev,_performance-only --model=resnet50 --implementation=nvidia --framework=tensorrt --category=datacenter --scenario=Offline --execution_mode=valid --device=cuda
[2026-05-07 10:42:29,738 main.py        : 188 ERROR] - Please file an issue at https://github.com/mlcommons/mlperf-automations/issues with the full console log.
[2026-05-07 10:42:29,738 main.py        : 191 ERROR] - mlcflow 1.2.1
[2026-05-07 10:42:29,743 main.py        : 196 ERROR] -   mlcommons@mlperf-automations: dev 97b03ddb5
(mlc) molli123@spark-7994:~$
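
The run does not fail inside the benchmark itself: it aborts while resolving the dependency with tags get,cuda-devices (requested under get,sut,description, judging by the log indentation), because no script with those tags was found in the checked-out mlcommons@mlperf-automations repo. When comparing logs across attempts, a small Python sketch like the following can pull the missing tags and the suggested rerun command out of a saved console log. It relies only on the two message formats visible in the log above; the function names are hypothetical and not part of mlcflow.

```python
import ast
import re

# Message formats copied from the mlcflow console log above; other versions may differ.
MISSING_TAGS_RE = re.compile(r"no scripts were found with tags: (\[[^\]]*\])")
RERUN_RE = re.compile(r"To rerun just the failed part: (.+)$", re.MULTILINE)


def missing_script_tags(console_output: str) -> list[str]:
    """Return each missing dependency as a comma-joined tag string, e.g. 'get,cuda-devices'."""
    found = []
    for match in MISSING_TAGS_RE.finditer(console_output):
        # The tags are printed as a Python list literal, so parse it safely.
        tags = ast.literal_eval(match.group(1))
        found.append(",".join(tags))
    return found


def rerun_commands(console_output: str) -> list[str]:
    """Return every 'rerun just the failed part' command printed in the log."""
    return [m.group(1).strip() for m in RERUN_RE.finditer(console_output)]
```

For example, after saving the console output to a file, missing_script_tags(open("run.log").read()) would return ["get,cuda-devices"] for the log above.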

Whatever benchmark I try to run, it fails. Any idea how to get started?