fix Dockerfile.dev breakage for llmd_fs_backend by saikat-royc · Pull Request #620 · llm-d/llm-d-kv-cache

saikat-royc · 2026-05-29T17:26:25Z

Summary

This PR fixes two build issues in Dockerfile.dev that caused C++/CUDA source compilation to fail during docker builds:

CUDA Version Mismatch ( RuntimeError ): Resolves the compiler mismatch error by synchronizing package defaults to CUDA 13.0.
Missing Development Headers: Resolves header lookup failures by repointing the /usr/local/cuda symlink at the newly installed developer toolkit directory during package installation.

Fixes:

Out-of-Sync Package Defaults (CUDA Version Mismatch)
Problem: The default base image argument VLLM_IMAGE was set to vllm/vllm-openai:v0.21.0 (which packages PyTorch built with CUDA 13.0). However, the default developer toolkit argument CUDA_TOOLKIT_PKG was hardcoded to cuda-toolkit-12-9 (CUDA 12.9). This mismatch triggered a hard PyTorch safety check failure during the source compilation step:

Failure log:

    RuntimeError: ('The detected CUDA version (%s) mismatches the version that was used to compilePyTorch (%s). Please make sure to use the same CUDA versions.', '12.9', '13.0')

Fix: Updated the default build argument CUDA_TOOLKIT_PKG to cuda-toolkit-13-0 in Dockerfile.dev so that standard local builds compile against the same CUDA version as the default base image.
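The parity requirement above can be checked before the compilation step. The following is a minimal sketch, not code from this PR: the function names `pkg_cuda_version` and `check_cuda_parity` are hypothetical, and it assumes the toolkit package follows the `cuda-toolkit-<major>-<minor>` naming seen in the text.

```shell
# Hypothetical helpers; the names are illustrative and not taken from the PR.

# Convert a package name such as "cuda-toolkit-13-0" into "13.0".
pkg_cuda_version() {
  printf '%s\n' "${1#cuda-toolkit-}" | sed 's/-/./g'
}

# Succeed only when the toolkit package and PyTorch's CUDA version agree.
#   $1: toolkit package name, e.g. the CUDA_TOOLKIT_PKG build argument
#   $2: CUDA version PyTorch was built with, e.g. the value of torch.version.cuda
check_cuda_parity() {
  pkg_ver="$(pkg_cuda_version "$1")"
  if [ "$pkg_ver" != "$2" ]; then
    echo "CUDA mismatch: toolkit ${pkg_ver}, PyTorch ${2}" >&2
    return 1
  fi
}
```

Inside the image, the second argument could be obtained with `python3 -c 'import torch; print(torch.version.cuda)'`, so a call would look like `check_cuda_parity "$CUDA_TOOLKIT_PKG" "$(python3 -c 'import torch; print(torch.version.cuda)')"`.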

Multi-CUDA Symlink Conflict (Missing cusparse.h / Dev Headers)
Problem: The base deployment image packages a minimal, runtime-only CUDA setup (under /usr/local/cuda-13.0/ ) and symlinks /usr/local/cuda to it. This folder lacks all developer headers. When installing our target developer package ( cuda-toolkit-13-0 ), the tools are correctly installed into a separate folder ( /usr/local/cuda-13.0/ ), but the existing symlink /usr/local/cuda is not updated (it still points to the runtime-only environment). During the compilation phase, PyTorch's builder searches for system headers in the standard symlink path, -I/usr/local/cuda/include/ , which points to the headerless directory, causing the build to fail.

Failure Log Snippet:

    In file included from /usr/local/lib/python3.12/dist-packages/torch/include/ATen/cuda/CUDAContext.h:4,
                     from /workspace/llmd_fs_backend/kv_connectors/llmd_fs_backend/csrc/storage/tensor_copier.cu:17:
    /usr/local/lib/python3.12/dist-packages/torch/include/ATen/cuda/CUDAContextLight.h:10:10: fatal error: cusparse.h: No such file or directory
       10 | #include <cusparse.h>
          |          ^~~~~~~~~~~~
    compilation terminated.
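To confirm this diagnosis inside a container, one can check where the symlink resolves and whether the header is present there. This is a minimal sketch under assumptions: the function name `cuda_include_status` is hypothetical, and it relies on `readlink -f` as provided by GNU coreutils.

```shell
# Hypothetical diagnostic; the function name is illustrative.
#   $1: path of the CUDA symlink to inspect (defaults to /usr/local/cuda)
# Returns 0 when include/cusparse.h exists under the resolved directory.
cuda_include_status() {
  link="${1:-/usr/local/cuda}"
  resolved="$(readlink -f "$link")" || return 2
  if [ -f "${resolved}/include/cusparse.h" ]; then
    echo "ok: ${link} -> ${resolved} (cusparse.h present)"
  else
    echo "missing: ${link} -> ${resolved} (no include/cusparse.h)"
    return 1
  fi
}
```

A "missing" result for /usr/local/cuda would be consistent with the failure log above.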

Fix: Added a symlink resolution step to the dependencies RUN block:

Parses the exact versioned folder dynamically from the target ${CUDA_TOOLKIT_PKG} argument (e.g., cuda-toolkit-13-0 → /usr/local/cuda-13.0 ).
Forces the standard /usr/local/cuda symlink to point to the newly installed developer path containing the standard headers.
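The two steps above can be sketched as a small shell function. This is an illustration under stated assumptions, not the code merged in this PR: the name `relink_cuda` and its optional root argument are hypothetical, and it assumes the toolkit installs under `<root>/cuda-<major>.<minor>`, as the example in the text suggests.

```shell
# Hypothetical helper; the function name and the optional root argument are illustrative.
#   $1: toolkit package name, e.g. "cuda-toolkit-13-0"
#   $2: install root (defaults to /usr/local; overridable for dry runs)
relink_cuda() {
  root="${2:-/usr/local}"
  # Step 1: derive the versioned folder from the package name (13-0 -> 13.0).
  ver="$(printf '%s\n' "${1#cuda-toolkit-}" | sed 's/-/./g')"
  target="${root}/cuda-${ver}"
  # Refuse to relink if the target does not contain developer headers.
  if [ ! -f "${target}/include/cusparse.h" ]; then
    echo "no developer headers under ${target}" >&2
    return 1
  fi
  # Step 2: replace the existing symlink (-n avoids descending into the old target).
  ln -sfn "${target}" "${root}/cuda"
}
```

In a Dockerfile this would run after the package installation in the same RUN block, for example `relink_cuda "${CUDA_TOOLKIT_PKG}"`. Checking for cusparse.h first is a design choice of this sketch, so that a wrong package name fails the build early instead of leaving a dangling symlink.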

Verification of the fixes:

Ensure the build commands work:

    make image-fs-backend-build IMAGE_TAG_BASE=gcr.io/<project> FS_BACKEND_NAME=vllm-llmd-fs DEV_VERSION=vllm-0.21-cu130-client-cache-v1

    make image-fs-backend-push IMAGE_TAG_BASE=gcr.io/<project> FS_BACKEND_NAME=vllm-llmd-fs DEV_VERSION=vllm-0.21-cu130-client-cache-v1

Basic inference using inference-perf
Run existing tests kv_connectors/llmd_fs_backend/tests/

saikat-royc · 2026-05-29T17:28:44Z

/cc @kfirtoledo

fix CUDA version mismatch and dev headers symlink

- Update default CUDA_TOOLKIT_PKG to cuda-toolkit-13-0 to match the CUDA 13.0 base image and prevent PyTorch compilation version mismatch.
- Explicitly parse and update the standard /usr/local/cuda symlink after GKE package installation to resolve missing dev headers (cusparse.h) during compilation

Signed-off-by: Saikat Roychowdhury <saikat.royc85@gmail.com>

saikat-royc · 2026-06-01T21:06:34Z

/cc @kfirtoledo request a review for this PR

kfirtoledo · 2026-06-02T06:10:18Z

/lgtm
/approve

saikat-royc requested review from dannyharnik, kfirtoledo, liu-cong and vMaroon as code owners May 29, 2026 17:26

github-actions Bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 29, 2026

github-actions Bot requested review from hyeongyun0916, sagearc and yankay May 29, 2026 17:26

saikat-royc force-pushed the fix-dockerfile-05-28 branch from b678b95 to f6d9d1a Compare May 29, 2026 17:31

saikat-royc mentioned this pull request May 30, 2026

llmd fsconnector metadata cache #621

Closed

github-actions Bot added the lgtm Looks good to me, indicates that a PR is ready to be merged. label Jun 2, 2026

github-actions Bot approved these changes Jun 2, 2026

github-actions Bot merged commit c8fff80 into llm-d:main Jun 2, 2026
11 checks passed

miroslavln mentioned this pull request Jun 10, 2026

fix/issue 656 default block size factor miroslavln/llm-d-kv-cache#1

Closed

github-actions[bot] merged 1 commit into llm-d:main from saikat-royc:fix-dockerfile-05-28
