
bug: lmcache remote server consistently disconnected from client side #723

@cchen777


Describe the bug

Hi team, I found that the shared-storage example no longer seems to work: it consistently fails at the wait-for-cache-server initContainer due to an "unknown format code" error. Could this be related to a bug in lmcache/vllm-openai:latest-nightly? Is there a more stable version tag that can be used instead of the latest image?

Server side:

cache-server's lmcache-server container logs

/opt/venv/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report t
  import pynvml  # type: ignore[import]
[2025-10-06 00:48:17,743] LMCache INFO: Initializing cpu-only cache server (__init__.py:14:lmcache.v1.server.storage_backend)
[2025-10-06 00:48:17,743] LMCache INFO: Server started at 0.0.0.0:8080 (__main__.py:138:lmcache.v1.server.__main__)
[2025-10-06 00:48:20,901] LMCache INFO: Connected by ('10.181.98.109', 53200) (__main__.py:142:lmcache.v1.server.__main__)
[2025-10-06 00:48:20,902] LMCache INFO: Client disconnected (__main__.py:134:lmcache.v1.server.__main__)

Client side:

mistral-deployment's wait-for-cache-server initContainer logs

 Waiting for LMCache server...
 /opt/venv/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report t
   import pynvml  # type: ignore[import]
 Error during health check: Unknown format code 'x' for object of type 'str'
 Waiting for LMCache server...
 /opt/venv/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report t
   import pynvml  # type: ignore[import]
 Error during health check: Unknown format code 'x' for object of type 'str'
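For context, this Python error is raised when the hex format spec `x` is applied to a string instead of an integer. A minimal, purely illustrative reproduction of the same error class (not the actual LMCache health-check code; `port` here is hypothetical):

```python
# The 'x' (hex) format spec only accepts integers, so formatting a str with it
# raises ValueError: Unknown format code 'x' for object of type 'str'.
port = "8080"  # hypothetical: e.g. a port read from an env var as a string
try:
    print(f"checking port {port:x}")
except ValueError as e:
    print(f"Error during health check: {e}")
```

This suggests the health check in the nightly image may be formatting a string (e.g. a host/port value) with an integer format spec.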

To Reproduce

Following the tutorial, using the values file: https://github.com/vllm-project/production-stack/blob/main/tutorials/assets/values-06-shared-storage.yaml

$ helm install vllm-shared-storage vllm/vllm-stack -f tutorials/assets/values-06-shared-storage.yaml

Expected behavior

Able to run the tutorial without issue. This is a blocker for production use cases that want to leverage the benefits of LMCache remote cache sharing.

Additional context

Confirmed that a connection can be made to the cache server from another pod, so this is not a DNS resolution issue; it is the health check itself that is failing:

root@vllm-shared-storage-deployment-router-6ff6794767-z7bdg:/app# curl -v vllm-shared-storage-cache-server-service:81
* Host vllm-shared-storage-cache-server-service:81 was resolved.
* IPv6: (none)
* IPv4: 172.20.211.86
*   Trying 172.20.211.86:81...
* Connected to vllm-shared-storage-cache-server-service (172.20.211.86) port 81
* using HTTP/1.x
> GET / HTTP/1.1
> Host: vllm-shared-storage-cache-server-service:81
> User-Agent: curl/8.14.1
> Accept: */*
>
* Request completely sent off
... hang here ...
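As a temporary workaround while the image's health check is broken, the initContainer check could be replaced with a plain TCP connect that sidesteps any string formatting. A minimal sketch, assuming only that the cache server accepts TCP connections on the configured host/port; CACHE_HOST and CACHE_PORT are hypothetical environment variable names, not ones defined by the chart:

```python
import os
import socket
import sys
import time

# Hypothetical env var names; substitute whatever the chart actually injects.
host = os.environ.get("CACHE_HOST", "vllm-shared-storage-cache-server-service")
port = int(os.environ.get("CACHE_PORT", "81"))

while True:
    print("Waiting for LMCache server...")
    try:
        # A bare TCP connect is enough to tell whether the server is listening,
        # and avoids any format-spec handling of host/port values.
        with socket.create_connection((host, port), timeout=5):
            print(f"LMCache server reachable at {host}:{port}")
            sys.exit(0)
    except OSError as e:
        print(f"Error during health check: {e}")
    time.sleep(5)
```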
