Fix ROCm whisper-server startup: add TheRock lib dir to LD_LIBRARY_PATH #2293

Open
matthewjhunter wants to merge 2 commits into
lemonade-sdk:mainfrom
matthewjhunter:fix/whisper-rocm-therock-libpath

Conversation

@matthewjhunter

Fixes #2292.

WhisperServer was the only ROCm backend launcher not prepending the TheRock ROCm library directory to LD_LIBRARY_PATH, so the rocm whisper-server aborted at startup (exit 134) failing to dlopen libamd_comgr.so.3. This mirrors the existing logic in sd_server.cpp and llamacpp_server.cpp, gated on the rocm backend.

// src/cpp/server/backends/whisper_server.cpp, in WhisperServer::load (#ifndef _WIN32 block)
    std::string lib_path = exe_dir.string();

    // ROCm whisper-server needs the TheRock ROCm libs (libamd_comgr.so.3, etc.)
    // on LD_LIBRARY_PATH, exactly as llamacpp_server.cpp and sd_server.cpp do.
    // Without this it aborts at startup (dlopen libamd_comgr.so.3) on gfx1151.
    if (whispercpp_backend == "rocm") {
        std::string rocm_arch = SystemInfo::get_rocm_arch();
        if (!rocm_arch.empty()) {
            std::string therock_lib = BackendUtils::get_therock_lib_path(rocm_arch);
            if (!therock_lib.empty()) {
                lib_path = therock_lib + ":" + lib_path;
            }
        }
    }

Note: the gate uses the raw whispercpp_backend == "rocm" option value, not the resolved_backend == "rocm-stable" form used by llamacpp/sd_server. This is intentional -- whisper has no resolve step and uses the literal "rocm" string throughout (see the download path in get_install_params).

Testing

Built lemond from the v10.8.0 tag (the patched file is byte-identical on main) in build-environment:ubuntu24.04, then ran it on a gfx1151 (Radeon 8060S, Strix Halo) host with whispercpp.backend=rocm:

  • The rocm whisper-server now boots: (WhisperServer) Using backend: rocm -> whisper-server is ready!
  • POST /api/v1/load {"model_name":"Whisper-Base"} -> status: success
  • POST /api/v1/audio/transcriptions -> returns transcribed text

Causation control on the same bundled rocm whisper-server binary:

  • exe-dir-only LD_LIBRARY_PATH (pre-patch): implib-gen: libamd_comgr.so.3: failed to load library ... Aborted (core dumped)
  • TheRock dir prepended (this patch): found 1 ROCm devices ... Radeon 8060S Graphics, gfx1151, then runs.
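
As a rough illustration of why the two LD_LIBRARY_PATH values behave differently, the sketch below walks a colon-separated search path and reports the first directory that holds a given library file. It is a simplified model only: the real dynamic loader also consults RUNPATH, the ld.so cache, and default system directories. The function name first_dir_containing and the temporary directory layout are made up for this example and are not part of lemonade.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: returns the first directory in a colon-separated
// search path that contains `lib_name`, or std::nullopt if none does.
std::optional<std::string> first_dir_containing(const std::string& search_path,
                                                const std::string& lib_name) {
    std::istringstream parts(search_path);
    std::string dir;
    while (std::getline(parts, dir, ':')) {
        // The real loader gives empty entries a special meaning; this sketch ignores them.
        if (dir.empty()) continue;
        std::error_code ec;
        if (fs::exists(fs::path(dir) / lib_name, ec)) return dir;
    }
    return std::nullopt;
}

// Recreates the causation control with stand-in directories (no GPU, no ROCm).
void demo_causation_control() {
    const fs::path root = fs::temp_directory_path() / "ld_path_sketch";
    fs::remove_all(root);
    fs::create_directories(root / "exe");
    fs::create_directories(root / "therock");
    { std::ofstream stub(root / "therock" / "libamd_comgr.so.3"); }  // empty stand-in file

    const std::string exe = (root / "exe").string();
    const std::string rock = (root / "therock").string();

    // Pre-patch: exe-dir only, so the library is not found.
    assert(!first_dir_containing(exe, "libamd_comgr.so.3").has_value());
    // Patched: TheRock dir prepended, so the lookup resolves there.
    assert(first_dir_containing(rock + ":" + exe, "libamd_comgr.so.3").value() == rock);

    fs::remove_all(root);
}
```

Calling demo_causation_control() exercises both cases; it only checks file presence and says nothing about whether the library would actually load.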

@github-actions (bot) added the labels bug, engine::whispercpp, and runtime::rocm on Jun 17, 2026

@fl0rianr fl0rianr left a comment

Collaborator

LGTM from code review, thanks! This patch is narrowly scoped to ROCm whisper-server startup and mirrors the existing TheRock LD_LIBRARY_PATH handling used by the other ROCm backends.

The current CI failure is unrelated to this PR, but sadly it prevents me from merging until we have that fixed...

@fl0rianr fl0rianr enabled auto-merge June 18, 2026 14:28
@iswaryaalex

Contributor

Thanks for the fix. Are we missing any tests on lemond that would catch this in CI?

@fl0rianr

Collaborator

Sadly, my fix was not as successful as I had hoped. CI is still too brittle and fails in some cases for "no reason".

@matthewjhunter

Author

The cause is more testable than the symptom. The ROCm LD_LIBRARY_PATH setup is copy-pasted inline into llamacpp_server.cpp, sd_server.cpp, and whisper_server.cpp. Two of them prepend the TheRock lib dir; whisper had drifted and didn't. Nothing structurally ties the three together, so the next backend can make the same omission.

I'd suggest:

  1. Factor the duplicated block into one pure helper in backend_utils -- something like build_rocm_ld_library_path(exe_dir, prepend_therock, therock_lib, existing_ld_path). The gate decision stays in the caller and is passed in as the bool, because the backends don't agree on how they express it: llamacpp/sd key off resolved_backend == "rocm-stable", but whisper has no resolve step at all, so it keys off the raw whispercpp_backend == "rocm" option (the same literal its download path in get_install_params uses). The helper shouldn't assume a resolved_backend exists.
  2. Add a test/cpp/ unit test for that helper in the existing assert style -- no GPU needed. Cases: with a non-empty therock lib the result starts with <therock>:; with the gate off it doesn't contain the therock path; empty therock lib falls back to exe-dir only with no stray colon; an existing LD_LIBRARY_PATH is appended in order. The pre-patch whisper behavior fails that first assertion.

That makes the regression catchable on any CI box and removes the copy-paste that caused it in the first place.
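
A minimal sketch of what points 1 and 2 could look like is below. The name build_rocm_ld_library_path and its parameter order are taken from the suggestion above; nothing here exists in backend_utils today, the paths are made up, and the join rules (skip empty parts, therock first, then exe dir, then the existing value) are an assumption that should be checked against the llamacpp and sd call sites.

```cpp
#include <cassert>
#include <string>

// Hypothetical pure helper. The caller decides the gate (prepend_therock),
// because the backends express "is this the ROCm backend" differently.
std::string build_rocm_ld_library_path(const std::string& exe_dir,
                                       bool prepend_therock,
                                       const std::string& therock_lib,
                                       const std::string& existing_ld_path) {
    std::string result;
    // Append a part with a ':' separator, skipping empty parts so no stray colon appears.
    auto append = [&result](const std::string& part) {
        if (part.empty()) return;
        if (!result.empty()) result += ':';
        result += part;
    };
    if (prepend_therock) append(therock_lib);
    append(exe_dir);
    append(existing_ld_path);
    return result;
}

// Assert-style test covering the four cases listed above. No GPU needed.
void test_build_rocm_ld_library_path() {
    const std::string exe = "/opt/whisper";       // made-up path
    const std::string rock = "/opt/therock/lib";  // made-up path

    // Gate on, non-empty therock lib: result starts with "<therock>:".
    assert(build_rocm_ld_library_path(exe, true, rock, "").rfind(rock + ":", 0) == 0);
    // Gate off: the therock path does not appear at all.
    assert(build_rocm_ld_library_path(exe, false, rock, "").find(rock) == std::string::npos);
    // Empty therock lib: exe-dir only, no stray colon.
    assert(build_rocm_ld_library_path(exe, true, "", "") == exe);
    // Existing LD_LIBRARY_PATH is appended last, in order.
    assert(build_rocm_ld_library_path(exe, true, rock, "/usr/lib") ==
           rock + ":" + exe + ":/usr/lib");
}
```

In this sketch the whisper call site would pass whispercpp_backend == "rocm" as the bool, while llamacpp/sd would pass their resolved_backend == "rocm-stable" check, so the helper never needs to know which form the caller uses.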

What would catch this exactly is launching the rocm whisper-server and hitting /api/v1/load on a gfx-class GPU, but that's the same coverage gap every backend has today: CI builds the ROCm binaries but never starts them, and there's no ROCm runner in the matrix. Adding a hardware smoke test just for whisper would be the flakiest, most expensive option, so I don't think that's the right call on its own.

Caveat: I've only glanced at this code, so weigh the specifics accordingly -- the duplication and the refactor are clear-cut, but whoever owns the backends should sanity-check the helper's signature against the other call sites before collapsing them.

Happy to put the helper + test up as a separate follow-up PR if you'd like -- I'd keep it out of this one so the fix stays a clean cherry-pick.

@fl0rianr

Collaborator

@matthewjhunter thanks for looking at it. I would really say it's unrelated, since we see this issue on multiple PRs, including this tiny one that touches neither CI nor source code: #2273.
No action needed on your side; I'm working on another attempt at a fix.

Development

Successfully merging this pull request may close these issues.

ROCm whisper-server aborts at startup (exit 134, libamd_comgr.so.3): WhisperServer omits TheRock lib dir from LD_LIBRARY_PATH

3 participants