cuSolver internal error when running simulations #53

@Zolkin1

I occasionally get this error when running simulations:

Traceback (most recent call last):
  File "/hydrax/examples/pusht.py", line 34, in <module>
    run_interactive(
  File "/hydrax/hydrax/simulation/deterministic.py", line 177, in run_interactive
    policy_params, rollouts = jit_optimize(mjx_data, policy_params)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/hydrax/lib/python3.12/site-packages/jax/_src/traceback_util.py", line 180, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/hydrax/lib/python3.12/site-packages/jax/_src/pjit.py", line 339, in cache_miss
    pgle_profiler) = _python_pjit_helper(fun, jit_info, *args, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/hydrax/lib/python3.12/site-packages/jax/_src/pjit.py", line 194, in _python_pjit_helper
    out_flat, compiled, profiler = _pjit_call_impl_python(*args_flat, **p.params)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/hydrax/lib/python3.12/site-packages/jax/_src/pjit.py", line 1659, in _pjit_call_impl_python
    ).compile()
      ^^^^^^^^^
  File "/miniconda3/envs/hydrax/lib/python3.12/site-packages/jax/_src/interpreters/pxla.py", line 2448, in compile
    executable = UnloadedMeshExecutable.from_hlo(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/hydrax/lib/python3.12/site-packages/jax/_src/interpreters/pxla.py", line 2967, in from_hlo
    xla_executable = _cached_compilation(
                     ^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/hydrax/lib/python3.12/site-packages/jax/_src/interpreters/pxla.py", line 2758, in _cached_compilation
    xla_executable = compiler.compile_or_get_cached(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/hydrax/lib/python3.12/site-packages/jax/_src/compiler.py", line 470, in compile_or_get_cached
    return _compile_and_write_cache(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/hydrax/lib/python3.12/site-packages/jax/_src/compiler.py", line 687, in _compile_and_write_cache
    executable = backend_compile(
                 ^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/hydrax/lib/python3.12/site-packages/jax/_src/profiler.py", line 334, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/hydrax/lib/python3.12/site-packages/jax/_src/compiler.py", line 327, in backend_compile
    raise e
  File "/miniconda3/envs/hydrax/lib/python3.12/site-packages/jax/_src/compiler.py", line 321, in backend_compile
    return backend.compile(built_c, compile_options=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: cuSolver internal error
Xlib:  extension "NV-GLX" missing on display ":1".

(This was run with JAX_TRACEBACK_FILTERING=off.)
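
For anyone trying to capture the same unfiltered traceback, here is a minimal sketch of setting that variable from inside a script instead of the shell. Setting it before JAX is imported is an assumption on my part about the safest ordering, not something verified against the hydrax examples.

```python
import os

# Assumption: set the variable before JAX is imported so it is in place
# when JAX reads its configuration.
os.environ["JAX_TRACEBACK_FILTERING"] = "off"

# Import JAX (and anything that imports it, such as the hydrax example)
# only after this point, e.g.:
# import jax
```

The shell equivalent is to prefix the command, e.g. JAX_TRACEBACK_FILTERING=off python examples/pusht.py.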

I believe others have hit this error too. It is intermittent: sometimes it happens several times in a row, and sometimes not at all.

Does anyone have any insight here? It is not a blocker, but it can be annoying. I might be able to spend some time chasing it down, but I can't promise.
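
Since the failure is intermittent, one way to start chasing it down is to measure how often it occurs. Below is a rough sketch that launches a command several times in fresh processes and counts the runs whose stderr contains the error text from the traceback above. The helper name count_failures, the timeout handling, and the idea of launching the example as a subprocess are all assumptions for illustration; nothing here is part of hydrax. The timeout is there because an interactive simulation does not exit on its own when compilation succeeds.

```python
import subprocess
import sys

# Error text copied from the traceback in this issue.
ERROR_MARKER = "cuSolver internal error"


def count_failures(command: list[str], trials: int, timeout_s: float) -> int:
    """Run `command` `trials` times; count runs whose stderr shows the error.

    Hypothetical helper: each run is a fresh process, so compilation
    happens again every time instead of hitting an in-process cache.
    """
    failures = 0
    for _ in range(trials):
        try:
            result = subprocess.run(
                command, capture_output=True, text=True, timeout=timeout_s
            )
            stderr = result.stderr
        except subprocess.TimeoutExpired as exc:
            # The run outlived the timeout; inspect whatever was captured.
            stderr = exc.stderr or ""
            if isinstance(stderr, bytes):
                stderr = stderr.decode(errors="replace")
        if ERROR_MARKER in stderr:
            failures += 1
    return failures


if __name__ == "__main__":
    # Stand-in command that always fails with the marker; replace it with
    # the real invocation (e.g. the pusht.py example) when reproducing.
    fake = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('cuSolver internal error'); sys.exit(1)",
    ]
    print(count_failures(fake, trials=3, timeout_s=30.0))
```

Comparing the count across machines, driver versions, or GPU memory settings could help narrow down when the error appears.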
