Fix ROCm whisper-server startup: add TheRock lib dir to LD_LIBRARY_PATH #2293
matthewjhunter wants to merge 2 commits into
WhisperServer was the only ROCm backend launcher not prepending the TheRock ROCm library directory to LD_LIBRARY_PATH, so the rocm whisper-server aborted at startup (exit 134) failing to dlopen libamd_comgr.so.3. This mirrors the existing logic in sd_server.cpp and llamacpp_server.cpp, gated on the rocm backend. The gate uses the raw whispercpp_backend == "rocm" option value, not the resolved_backend == "rocm-stable" form used by llamacpp/sd_server -- whisper has no resolve step and uses the literal "rocm" string throughout. Fixes lemonade-sdk#2292.
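A minimal sketch of the gating and prepending described above, not the code from this PR. The function names, parameter names, and the idea of passing the current value in as an argument are assumptions made for illustration; only the `"rocm"` comparison and the `LD_LIBRARY_PATH` prepend come from the description.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: put `dir` in front of a colon-separated search path.
// An empty `dir` leaves the path untouched; an empty path yields just `dir`.
std::string prepend_search_path(const std::string& dir,
                                const std::string& existing) {
    if (dir.empty()) return existing;
    if (existing.empty()) return dir;
    return dir + ":" + existing;
}

// Hypothetical wrapper: compute the LD_LIBRARY_PATH value for the child
// whisper-server process. The gate compares the raw option value against
// the literal "rocm", as the description says whisper does.
std::string whisper_ld_library_path(const std::string& whispercpp_backend,
                                    const std::string& therock_lib_dir,
                                    const std::string& existing) {
    if (whispercpp_backend != "rocm") {
        return existing;  // non-ROCm backends keep the inherited value
    }
    return prepend_search_path(therock_lib_dir, existing);
}
```

In a real launcher, `existing` would presumably come from the parent's environment and the result would be set on the child process before it is spawned. Keeping the computation a pure function means it can be checked without a GPU.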
fl0rianr
left a comment
LGTM from code review, thanks! This patch is narrowly scoped to ROCm whisper-server startup and mirrors the existing TheRock LD_LIBRARY_PATH handling used by the other ROCm backends.
The current CI failure is unrelated to this PR, but sadly it prevents me from merging until we have that fixed...
Thanks for the fix. Are we missing any tests on lemond to catch this in CI?
Sadly, my fix did not turn out as successful as I intended. CI is still too brittle and fails in some cases for "no reason".
The cause is more testable than the symptom. The ROCm I'd suggest:

That makes the regression catchable on any CI box and removes the copy-paste that caused it in the first place. What would catch this exactly is launching the rocm

Caveat: I've only glanced at this code, so weigh the specifics accordingly -- the duplication and the refactor are clear-cut, but whoever owns the backends should sanity-check the helper's signature against the other call sites before collapsing them.

Happy to put the helper + test up as a separate follow-up PR if you'd like -- I'd keep it out of this one so the fix stays a clean cherry-pick.
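One possible shape for such a helper and a GPU-free test, sketched under assumptions. The suggestion list in the comment above is not preserved here, so this is not the commenter's actual proposal. `Env`, `with_rocm_lib_dir`, `LauncherCase`, and the library path used in the check are all hypothetical; the backend strings `"rocm"` and `"rocm-stable"` are the ones named in the PR description.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Env = std::map<std::string, std::string>;

// Hypothetical shared helper: return a copy of `base` with `lib_dir` at the
// front of LD_LIBRARY_PATH when the launcher is running a ROCm backend.
Env with_rocm_lib_dir(Env base, bool uses_rocm, const std::string& lib_dir) {
    if (!uses_rocm || lib_dir.empty()) return base;
    std::string& path = base["LD_LIBRARY_PATH"];
    path = path.empty() ? lib_dir : lib_dir + ":" + path;
    return base;
}

// One row per ROCm launcher: the backend value it sees and the string its
// gate compares against (illustrative table, not taken from the code base).
struct LauncherCase {
    std::string name;
    std::string backend;
    std::string rocm_token;
};

std::vector<LauncherCase> rocm_launcher_cases() {
    return {
        {"whisper", "rocm", "rocm"},
        {"llamacpp", "rocm-stable", "rocm-stable"},
        {"sd", "rocm-stable", "rocm-stable"},
    };
}

// True only if every launcher in the table ends up with `lib_dir` first.
bool all_rocm_launchers_get_lib_dir(const std::string& lib_dir) {
    for (const LauncherCase& c : rocm_launcher_cases()) {
        Env base;
        base["LD_LIBRARY_PATH"] = "/usr/lib";
        Env env = with_rocm_lib_dir(base, c.backend == c.rocm_token, lib_dir);
        if (env["LD_LIBRARY_PATH"].rfind(lib_dir + ":", 0) != 0) return false;
    }
    return true;
}
```

Because the check only inspects the computed environment, it needs no ROCm device. A launcher that skipped the helper would still have to be caught by routing its spawn path through the same function.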
@matthewjhunter thanks for looking at it. I would really say it's unrelated, since we have this issue with multiple PRs, including this tiny one, which affects neither CI nor source code: #2273.
Fixes #2292.
WhisperServer was the only ROCm backend launcher not prepending the TheRock ROCm library directory to LD_LIBRARY_PATH, so the rocm whisper-server aborted at startup (exit 134) failing to dlopen libamd_comgr.so.3. This mirrors the existing logic in sd_server.cpp and llamacpp_server.cpp, gated on the rocm backend.

Note: the gate uses the raw whispercpp_backend == "rocm" option value, not the resolved_backend == "rocm-stable" form used by llamacpp/sd_server. This is intentional -- whisper has no resolve step and uses the literal "rocm" string throughout (see the download path in get_install_params).

Testing

Built lemond from the v10.8.0 tag (the patched file is byte-identical on main) in build-environment:ubuntu24.04, then ran it on a gfx1151 (Radeon 8060S, Strix Halo) host with whispercpp.backend=rocm:

- whisper-server now boots: (WhisperServer) Using backend: rocm -> whisper-server is ready!
- POST /api/v1/load {"model_name":"Whisper-Base"} -> status: success
- POST /api/v1/audio/transcriptions -> returns transcribed text

Causation control on the same bundled rocm whisper-server binary:

- LD_LIBRARY_PATH (pre-patch): implib-gen: libamd_comgr.so.3: failed to load library ... Aborted (core dumped)
- found 1 ROCm devices ... Radeon 8060S Graphics, gfx1151, then runs.