Isaac Sim (Omniverse) crashes with HAMi scheduler #935

Open
@davidliyutong

When using HAMi scheduler on a K3S cluster, Isaac Sim crashes. The nvidia-device-plugin works fine though.

I created a container with TurboVNC and VirtualGL and installed Isaac Sim in it to run simulation tasks. However, when I launch the program, it crashes with this message:

2025-03-11 19:34:00 [1,767ms] [Fatal] [carb.crashreporter-breakpad.plugin] [crash] Thread 3411 backtrace follows:
2025-03-11 19:34:00 [1,800ms] [Fatal] [carb.crashreporter-breakpad.plugin] 000: libpthread.so.0!funlockfile+0x60 (??:?)
2025-03-11 19:34:01 [1,832ms] [Fatal] [carb.crashreporter-breakpad.plugin] 001: libvgpu.so!hacked_cuMemGetInfo_V2+0x4f2 (/libvgpu/src/cuda/memory.c:512 (discriminator 4))
2025-03-11 19:34:01 [1,837ms] [Fatal] [carb.crashreporter-breakpad.plugin] 002: libnvoptix.so.1!rtGetSymbolTable+0xc5d71 (??:?)
2025-03-11 19:34:01 [1,842ms] [Fatal] [carb.crashreporter-breakpad.plugin] 003: libnvoptix.so.1!rtGetSymbolTable+0xc9a99 (??:?)
2025-03-11 19:34:01 [1,854ms] [Fatal] [carb.crashreporter-breakpad.plugin] 004: librtx.optidenoising.plugin.so!std::string::find(char, unsigned long) const+0x30b4 (??:?)
2025-03-11 19:34:01 [1,864ms] [Fatal] [carb.crashreporter-breakpad.plugin] 005: librtx.optidenoising.plugin.so!_init+0x19a3 (??:0)
2025-03-11 19:34:01 [1,877ms] [Fatal] [carb.crashreporter-breakpad.plugin] 006: librtx.optidenoising.plugin.so!_init+0x335e (??:0)
2025-03-11 19:34:01 [1,903ms] [Fatal] [carb.crashreporter-breakpad.plugin] 007: libcarb.scenerenderer-rtx.plugin.so!std::string::replace(unsigned long, unsigned long, char const*, unsigned long)+0x312be (??:?)
2025-03-11 19:34:01 [1,913ms] [Fatal] [carb.crashreporter-breakpad.plugin] 008: libcarb.scenerenderer-rtx.plugin.so!std::string::replace(unsigned long, unsigned long, char const*, unsigned long)+0x3ddd8 (??:?)
2025-03-11 19:34:01 [1,925ms] [Fatal] [carb.crashreporter-breakpad.plugin] 009: libcarb.scenerenderer-rtx.plugin.so!carbOnPluginShutdown+0x2529 (??:?)
2025-03-11 19:34:01 [1,938ms] [Fatal] [carb.crashreporter-breakpad.plugin] 010: libomni.hydra.rtx.plugin.so!carbOnPluginPostShutdown+0xe3e (??:?)
2025-03-11 19:34:01 [1,958ms] [Fatal] [carb.crashreporter-breakpad.plugin] 011: libomni.usd.so!unsigned long& std::vector<unsigned long, std::allocator<unsigned long> >::emplace_back<unsigned long>(unsigned long&&)+0x2337 (??:?)
2025-03-11 19:34:01 [1,970ms] [Fatal] [carb.crashreporter-breakpad.plugin] 012: libomni.usd.so!unsigned long& std::vector<unsigned long, std::allocator<unsigned long> >::emplace_back<unsigned long>(unsigned long&&)+0x2bb2 (??:?)
2025-03-11 19:34:01 [1,980ms] [Fatal] [carb.crashreporter-breakpad.plugin] 013: libomni.usd.so!unsigned long& std::vector<unsigned long, std::allocator<unsigned long> >::emplace_back<unsigned long>(unsigned long&&)+0x2c86 (??:?)
2025-03-11 19:34:01 [2,000ms] [Fatal] [carb.crashreporter-breakpad.plugin] 014: libcarb.tasking.plugin.so!char* std::string::_construct<char*>(char*, char*, std::allocator<char> const&, std::forward_iterator_tag)+0x453b (??:?)
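
For anyone triaging a longer log, here is a minimal sketch of pulling the frames out of this kind of backtrace and finding the first frame that belongs to a given library. It assumes the line layout shown above (a three-digit frame index, then module!symbol+0xoffset); the helper names parse_frames and first_frame_in are made up for illustration and are not part of HAMi or Isaac Sim. In this log the first libvgpu.so frame is 001, called from libnvoptix.so.1, which suggests (but does not prove) that the crash happens inside HAMi's hooked memory-info call during OptiX initialization.

```python
import re

# Frame part of a crash reporter line, e.g.
# "001: libvgpu.so!hacked_cuMemGetInfo_V2+0x4f2 (...)"
# Assumption: the index is three digits and the offset is hexadecimal.
FRAME_RE = re.compile(
    r"\b(?P<index>\d{3}): (?P<module>[^!\s]+)!(?P<symbol>.+?)\+0x(?P<offset>[0-9a-fA-F]+)"
)


def parse_frames(log_text):
    """Return (index, module, symbol, offset) tuples for every frame line found."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(
                (int(m["index"]), m["module"], m["symbol"].strip(), int(m["offset"], 16))
            )
    return frames


def first_frame_in(frames, module):
    """Return the first frame whose module name equals `module`, or None."""
    for frame in frames:
        if frame[1] == module:
            return frame
    return None


# First three frames copied from the backtrace in this issue.
SAMPLE = """\
2025-03-11 19:34:00 [1,800ms] [Fatal] [carb.crashreporter-breakpad.plugin] 000: libpthread.so.0!funlockfile+0x60 (??:?)
2025-03-11 19:34:01 [1,832ms] [Fatal] [carb.crashreporter-breakpad.plugin] 001: libvgpu.so!hacked_cuMemGetInfo_V2+0x4f2 (/libvgpu/src/cuda/memory.c:512 (discriminator 4))
2025-03-11 19:34:01 [1,837ms] [Fatal] [carb.crashreporter-breakpad.plugin] 002: libnvoptix.so.1!rtGetSymbolTable+0xc5d71 (??:?)
"""

if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    print(first_frame_in(frames, "libvgpu.so"))
```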

To reproduce the problem, you need an ubuntu:20.04 Pod with TurboVNC, VirtualGL and xfce4 installed. Launch the VNC server with vncserver :1 -geometry 1280x800 -depth 24 -SecurityTypes None, which makes it accessible on port 5901, and expose that port via a NodePort service. Then connect to the container over VNC and install the latest Isaac Sim Full (4.5.0) with pip.
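
As a rough sketch of the networking part of the reproduction, the snippet below derives the VNC port from the display number (display :N listens on 5900 + N, hence 5901 for :1) and prints a NodePort Service manifest as JSON, which kubectl apply -f accepts. The Service name isaac-sim-vnc and the selector label app: isaac-sim are placeholders I made up; substitute whatever labels your Pod actually carries. No nodePort is set, so Kubernetes picks one from its configured range.

```python
import json

VNC_BASE_PORT = 5900  # VNC convention: display :N listens on TCP port 5900 + N


def vnc_port(display):
    """TCP port for a VNC display number, e.g. display 1 -> 5901."""
    return VNC_BASE_PORT + display


def nodeport_service(name, selector, display):
    """Build a NodePort Service manifest exposing the VNC port of `display`."""
    port = vnc_port(display)
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": selector,  # must match the labels on the Pod
            "ports": [
                {"name": "vnc", "protocol": "TCP", "port": port, "targetPort": port}
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical name and label, chosen only for this example.
    manifest = nodeport_service("isaac-sim-vnc", {"app": "isaac-sim"}, display=1)
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it should give a Service whose assigned node port forwards to 5901 in the Pod.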

Note that:

  • OpenGL hardware acceleration works, and glxgears runs at 1400+ fps.
  • nvidia-smi works inside the container.
  • I have no problem running other workloads (e.g. PyTorch) with HAMi.
  • I can launch Isaac Sim when nvidia-device-plugin is installed and HAMi is removed.

Other information:

  • HAMi version is 2.5.0.
  • The host is a single-node K3S cluster (v1.32.2) with 8x 4090 GPUs (driver 535.230.02).
  • Host OS is Ubuntu Server 24.04, kernel version is 5.15.0-134.
  • The Pod is scheduled successfully, so there are no errors in the hami-device-plugin and hami-scheduler outputs.
