feature: add RBLNLMCacheConnectorV1 for LMCache KV cache offloading #523

Open
rebel-jinhwan wants to merge 3 commits into dev from jinhwan/lmcache-connector-clean

Conversation

@rebel-jinhwan
Contributor

@rebel-jinhwan rebel-jinhwan commented Apr 13, 2026

🚀 Summary of Changes

What does this PR do? What feature, fix, or improvement does it bring?

Register RBLNLMCacheConnectorV1 in vllm-rbln's KV connector factory as a thin re-export shim over the lmcache_rbln package, and wire the connector into vllm-rbln's worker and model runner so that vLLM can load it by name.

LMCache provides hierarchical KV cache offloading (CPU / disk) for vLLM. All RBLN-specific connector logic lives in the lmcache-rbln package.
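The "load it by name" idea can be illustrated with a small, self-contained sketch. It does not use vLLM or lmcache_rbln: `ConnectorRegistry`, `register`, and `get` are hypothetical stand-ins for a connector factory, and a standard-library class plays the role of the re-exported connector. The point is only that registration stores a module path and class name, and the import happens when the connector is first requested.

```python
import importlib
from typing import Callable, Dict, Type


class ConnectorRegistry:
    """Toy stand-in for a KV connector factory (hypothetical, not vLLM's class)."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Type]] = {}

    def register(self, name: str, module_path: str, class_name: str) -> None:
        if name in self._loaders:
            raise ValueError(f"connector {name!r} is already registered")

        def loader() -> Type:
            # Import lazily so an optional dependency is only needed on use.
            module = importlib.import_module(module_path)
            return getattr(module, class_name)

        self._loaders[name] = loader

    def get(self, name: str) -> Type:
        if name not in self._loaders:
            raise KeyError(f"unknown connector {name!r}")
        return self._loaders[name]()


registry = ConnectorRegistry()
# Stand-in target: a stdlib class plays the role of the re-exported connector.
registry.register("DemoConnector", "collections", "OrderedDict")
connector_cls = registry.get("DemoConnector")
```

Because the import is deferred, a registry like this can list a connector even when its backing package is not installed; the failure only surfaces when that connector is actually selected.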


📌 Related Issues / Tickets


✅ Type of Change

  • 🚀 Release (release)
  • ✨ Feature (feature)
  • 🧠 Model support (model)
  • 🧬 Core engine changes (core)
  • 🛠 Bug fix (fix)
  • ⚙️ Performance improvement (perf)
  • 🔁 Refactor or code cleanup (refactor)
  • 📄 Documentation (docs)
  • ❓ Other (other): please describe

🧪 How to Test

  1. Run ...
  2. Verify output: ...
  3. Edge case tested: ...

📸 Screenshots / Logs (if applicable)


📋 Checklist

  • PR title follows Conventional Commits format
  • This PR is linked to an existing issue
  • The test method is described, and the expected result is clearly stated
  • Relevant documentation has been updated (if applicable)

💬 Notes

Why lmcache-rbln is a separate repository

LMCache is designed to be engine-agnostic (vLLM, SGLang, …), so the RBLN connector lives in a standalone lmcache-rbln package and this PR only adds a thin registration shim. Keeping the two repositories separate preserves that engine-agnostic boundary and keeps our options open: merging them later is cheap, but splitting them later would not be.

  • Preserves the engine-agnostic boundary: avoids coupling connector logic to vLLM internals.
  • Easier upstream sync: clean rebases and contributions against LMCache upstream.

@rebel-jinhwan rebel-jinhwan self-assigned this Apr 13, 2026
@rebel-jinhwan rebel-jinhwan added the torch.compile torch.compile based implementation label Apr 13, 2026
@rebel-jinhwan rebel-jinhwan changed the title feat: add RBLNLMCacheConnectorV1 for LMCache KV cache offloading feat(connector): add RBLNLMCacheConnectorV1 for LMCache KV cache offloading Apr 13, 2026
@rebel-jinhwan rebel-jinhwan changed the title feat(connector): add RBLNLMCacheConnectorV1 for LMCache KV cache offloading feature: add RBLNLMCacheConnectorV1 for LMCache KV cache offloading Apr 13, 2026
@RBLN-SW RBLN-SW deleted a comment from github-actions bot Apr 13, 2026
@rebel-jinhwan rebel-jinhwan marked this pull request as ready for review April 14, 2026 05:04
@rebel-jaehwang rebel-jaehwang requested a review from Copilot April 14, 2026 05:06

Copilot AI left a comment

Pull request overview

This PR integrates the RBLNLMCacheConnectorV1 KV-cache connector into vllm-rbln as a thin registration/re-export shim over the external lmcache_rbln package. It also wires the KV-transfer lifecycle and hooks into the V1 worker and model runner so that the connector can be selected by name and used safely during compilation warmup.

Changes:

  • Register RBLNLMCacheConnectorV1 via KVConnectorFactory and expose it under a stable vllm_rbln... import path.
  • Add V1 worker support for KV connector handshake metadata retrieval and connector shutdown.
  • Adjust V1 model runner KV-connector integration to bypass upstream assertions during compilation warmup and to provide an RBLN-specific host transfer buffer copy implementation.
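The warmup bypass described in the last bullet can be sketched as follows. This is a toy model, not the code in this PR: `SchedulerOutput`, `BaseRunner`, and `WarmupSafeRunner` are hypothetical stand-ins, and the base class only imitates an upstream method that asserts connector metadata is present. The override returns a no-op context manager when the metadata is `None`, which is the situation the PR describes for compilation warmup.

```python
from contextlib import contextmanager, nullcontext
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchedulerOutput:
    # Hypothetical minimal stand-in; None models the warmup case.
    kv_connector_metadata: Optional[dict] = None


class BaseRunner:
    """Imitates upstream behaviour: metadata is asserted to be present."""

    def __init__(self) -> None:
        self.events = []

    @contextmanager
    def _connector_scope(self, scheduler_output):
        assert scheduler_output.kv_connector_metadata is not None
        self.events.append("bind")
        try:
            yield
        finally:
            self.events.append("clear")

    def maybe_get_kv_connector_output(self, scheduler_output):
        return self._connector_scope(scheduler_output)


class WarmupSafeRunner(BaseRunner):
    def maybe_get_kv_connector_output(self, scheduler_output):
        if scheduler_output.kv_connector_metadata is None:
            # Warmup: nothing to transfer, so skip the connector entirely.
            return nullcontext()
        return super().maybe_get_kv_connector_output(scheduler_output)


runner = WarmupSafeRunner()
with runner.maybe_get_kv_connector_output(SchedulerOutput()):
    pass  # a warmup forward pass would run here
warmup_events = list(runner.events)

with runner.maybe_get_kv_connector_output(SchedulerOutput({"req": 1})):
    pass  # a normal forward pass would run here
normal_events = list(runner.events)
```

Guarding in the subclass keeps the upstream assertion intact for real requests, so a missing metadata object outside warmup would still fail loudly if the base path were reached.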

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `vllm_rbln/v1/worker/rbln_worker.py` | Adds handshake metadata helper and ensures the KV connector is shut down during worker shutdown. |
| `vllm_rbln/v1/worker/rbln_model_runner.py` | Adds warmup-safe KV connector output handling and an RBLN host transfer buffer copy hook for the KV transfer group. |
| `vllm_rbln/distributed/kv_transfer/kv_connector/v1/rbln_lmcache_connector.py` | Thin re-export shim for RBLNLMCacheConnectorV1 from lmcache_rbln. |
| `vllm_rbln/distributed/kv_transfer/kv_connector/v1/__init__.py` | Package init to support a stable import path. |
| `vllm_rbln/distributed/kv_transfer/kv_connector/factory.py` | Registers RBLNLMCacheConnectorV1 in KVConnectorFactory. |
| `vllm_rbln/distributed/kv_transfer/kv_connector/__init__.py` | Package init for the KV connector namespace. |
| `vllm_rbln/distributed/kv_transfer/__init__.py` | Package init for the KV transfer namespace. |
| `vllm_rbln/__init__.py` | Ensures connector factory registration is imported during plugin ops registration. |

Comment thread vllm_rbln/v1/worker/rbln_model_runner.py
Comment thread vllm_rbln/v1/worker/rbln_model_runner.py
Register ``RBLNLMCacheConnectorV1`` in vllm-rbln's KV connector factory
as a thin shim that re-exports the connector from the ``lmcache_rbln``
package. All connector logic lives in ``lmcache_rbln``; this side only
wires the connector into vllm-rbln's worker/model_runner.

Changes:
- New ``vllm_rbln.distributed.kv_transfer`` package with factory that
  registers ``RBLNLMCacheConnectorV1``.
- ``RBLNModelRunner``:
  - Initialize ``cross_layers_kv_cache`` / ``cross_layers_attn_backend``
    to ``None`` so the fallback KV allocation path works when
    ``has_kv_transfer_group()`` is True.
  - Override ``maybe_get_kv_connector_output`` to bypass the upstream
    assertion during compilation warmup (kv_connector_metadata is None).
  - Treat warmup scheduler output as a no-op in ``execute_model``.
  - Replace CUDA-only ``copy_kv_blocks`` with ``rbln_copy_kv_blocks``
    that uses the rebel runtime via ``runtime_holder``.
  - Pass ``runtime_holder`` to the connector via ``set_runtime_holder``.
- ``RBLNWorker``: shut down the KV transfer group on worker shutdown.
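As a rough illustration of what a block-wise KV copy between a device-side cache and a host transfer buffer involves, here is a toy sketch. Everything in it is an assumption made for illustration: `copy_blocks`, its parameters, and the list-of-bytes cache layout are hypothetical, and the actual ``rbln_copy_kv_blocks`` goes through the rebel runtime via ``runtime_holder``, which is not reproduced here.

```python
from typing import Dict, List, Literal

# Toy representation: layer name -> list of blocks.
Cache = Dict[str, List[bytes]]


def copy_blocks(
    device_cache: Cache,
    host_buffer: Cache,
    src_block_ids: List[int],
    dst_block_ids: List[int],
    direction: Literal["d2h", "h2d"],
) -> None:
    """Copy selected blocks layer by layer in the given direction (hypothetical)."""
    if len(src_block_ids) != len(dst_block_ids):
        raise ValueError("src and dst block id lists must have the same length")
    if direction == "d2h":
        src, dst = device_cache, host_buffer
    else:
        src, dst = host_buffer, device_cache
    for layer, src_blocks in src.items():
        dst_blocks = dst[layer]
        # Each source block id is paired with the destination id at the same index.
        for s, d in zip(src_block_ids, dst_block_ids):
            dst_blocks[d] = src_blocks[s]


device = {"layer0": [b"a", b"b", b"c"]}
host = {"layer0": [b"", b"", b""]}

# Offload device blocks 0 and 2 into host slots 1 and 0.
copy_blocks(device, host, [0, 2], [1, 0], "d2h")
host_after_offload = list(host["layer0"])

# Load host block 1 back into device slot 2.
copy_blocks(device, host, [1], [2], "h2d")
device_after_load = list(device["layer0"])
```

In a real backend the inner assignment would be a device-to-host or host-to-device transfer issued through the accelerator runtime rather than a Python list write.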
@rebel-jinhwan rebel-jinhwan force-pushed the jinhwan/lmcache-connector-clean branch from 0d8fff1 to fa9de2a Compare April 15, 2026 04:47
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Labels

torch.compile torch.compile based implementation

2 participants