feature: add RBLNLMCacheConnectorV1 for LMCache KV cache offloading #523
Open
rebel-jinhwan wants to merge 3 commits into dev from
Pull request overview
This PR integrates the RBLNLMCacheConnectorV1 KV-cache connector into vllm-rbln as a thin registration/re-export shim over the external lmcache_rbln package, and wires KV-transfer lifecycle/hooks into the V1 worker/model-runner so the connector can be selected by name and used safely during compilation warmup.
Changes:
- Register `RBLNLMCacheConnectorV1` via `KVConnectorFactory` and expose it under a stable `vllm_rbln...` import path.
- Add V1 worker support for KV connector handshake metadata retrieval and connector shutdown.
- Adjust V1 model runner KV-connector integration to bypass upstream assertions during compilation warmup and to provide an RBLN-specific host transfer buffer copy implementation.
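The register-by-name idea in the list above can be illustrated with a minimal, self-contained sketch. `ToyConnectorFactory` and the `OrderedDictDemo` entry are hypothetical stand-ins invented for this example; they are not the `KVConnectorFactory` API or the code in this PR. The sketch only shows the general pattern: a name maps to a lazy loader, so the module behind a connector is imported only when that connector is requested.

```python
import collections
import importlib
from typing import Callable, Dict, Type


class ToyConnectorFactory:
    """Hypothetical name-based registry; not the real KVConnectorFactory."""

    _registry: Dict[str, Callable[[], Type]] = {}

    @classmethod
    def register_connector(cls, name: str, module_path: str, class_name: str) -> None:
        if name in cls._registry:
            raise ValueError(f"Connector '{name}' is already registered.")

        def loader() -> Type:
            # The import is deferred until the connector is actually requested,
            # so registering a name does not require the package at import time.
            module = importlib.import_module(module_path)
            return getattr(module, class_name)

        cls._registry[name] = loader

    @classmethod
    def get_connector_class(cls, name: str) -> Type:
        if name not in cls._registry:
            raise ValueError(f"Unsupported connector type: {name}")
        return cls._registry[name]()


# Demo: a stdlib class stands in for a connector class (assumption for illustration).
ToyConnectorFactory.register_connector("OrderedDictDemo", "collections", "OrderedDict")
resolved = ToyConnectorFactory.get_connector_class("OrderedDictDemo")
```

With this shape, a thin shim module only needs to re-export the class from the external package; the registry entry points at the shim's stable import path.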
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `vllm_rbln/v1/worker/rbln_worker.py` | Adds handshake metadata helper and ensures the KV connector is shut down during worker shutdown. |
| `vllm_rbln/v1/worker/rbln_model_runner.py` | Adds warmup-safe KV connector output handling and an RBLN host transfer buffer copy hook for the KV transfer group. |
| `vllm_rbln/distributed/kv_transfer/kv_connector/v1/rbln_lmcache_connector.py` | Thin re-export shim for `RBLNLMCacheConnectorV1` from `lmcache_rbln`. |
| `vllm_rbln/distributed/kv_transfer/kv_connector/v1/__init__.py` | Package init to support a stable import path. |
| `vllm_rbln/distributed/kv_transfer/kv_connector/factory.py` | Registers `RBLNLMCacheConnectorV1` in `KVConnectorFactory`. |
| `vllm_rbln/distributed/kv_transfer/kv_connector/__init__.py` | Package init for the KV connector namespace. |
| `vllm_rbln/distributed/kv_transfer/__init__.py` | Package init for the KV transfer namespace. |
| `vllm_rbln/__init__.py` | Ensures connector factory registration is imported during plugin ops registration. |
Register ``RBLNLMCacheConnectorV1`` in vllm-rbln's KV connector factory
as a thin shim that re-exports the connector from the ``lmcache_rbln``
package. All connector logic lives in ``lmcache_rbln``; this side only
wires the connector into vllm-rbln's worker/model_runner.
Changes:
- New ``vllm_rbln.distributed.kv_transfer`` package with factory that
  registers ``RBLNLMCacheConnectorV1``.
- ``RBLNModelRunner``:
  - Initialize ``cross_layers_kv_cache`` / ``cross_layers_attn_backend``
    to ``None`` so the fallback KV allocation path works when
    ``has_kv_transfer_group()`` is True.
  - Override ``maybe_get_kv_connector_output`` to bypass the upstream
    assertion during compilation warmup (kv_connector_metadata is None).
  - Treat warmup scheduler output as a no-op in ``execute_model``.
  - Replace CUDA-only ``copy_kv_blocks`` with ``rbln_copy_kv_blocks``
    that uses the rebel runtime via ``runtime_holder``.
  - Pass ``runtime_holder`` to the connector via ``set_runtime_holder``.
- ``RBLNWorker``: shut down the KV transfer group on worker shutdown.
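The warmup bypass described for ``maybe_get_kv_connector_output`` can be sketched with toy classes. Everything below (``ToySchedulerOutput``, ``ToyUpstreamMixin``, ``ToyRunner``, the ``events`` list) is hypothetical and only mirrors the shape of the override; the real upstream signature and the PR's implementation are not reproduced here. The assumption is that the upstream method asserts that connector metadata is present, while warmup passes ``None``.

```python
from contextlib import contextmanager, nullcontext
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySchedulerOutput:
    # None models the compilation-warmup case from the commit message.
    kv_connector_metadata: Optional[dict] = None


class ToyUpstreamMixin:
    """Hypothetical stand-in for the upstream behaviour being overridden."""

    def maybe_get_kv_connector_output(self, scheduler_output):
        # The upstream-style assertion that fails during warmup.
        assert scheduler_output.kv_connector_metadata is not None
        return self._connector_context()

    @contextmanager
    def _connector_context(self):
        self.events.append("bind")  # e.g. bind metadata, start KV loads
        try:
            yield
        finally:
            self.events.append("clear")  # e.g. wait for saves, clear metadata


class ToyRunner(ToyUpstreamMixin):
    def __init__(self):
        self.events = []

    def maybe_get_kv_connector_output(self, scheduler_output):
        # Warmup: no metadata, so skip the connector path entirely.
        if scheduler_output.kv_connector_metadata is None:
            return nullcontext()
        return super().maybe_get_kv_connector_output(scheduler_output)


runner = ToyRunner()
with runner.maybe_get_kv_connector_output(ToySchedulerOutput()):
    pass
warmup_events = list(runner.events)

with runner.maybe_get_kv_connector_output(ToySchedulerOutput({"req": 1})):
    pass
normal_events = list(runner.events)
```

Returning a no-op context manager keeps the call site in the runner unchanged: the same ``with`` block serves both warmup and real steps.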
0d8fff1 to fa9de2a
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🚀 Summary of Changes
Register `RBLNLMCacheConnectorV1` in vllm-rbln's KV connector factory as a thin re-export shim over the `lmcache_rbln` package. All connector logic lives in `lmcache-rbln`. This PR wires the connector into vllm-rbln's worker / model runner so that vLLM can load it by name.
LMCache provides hierarchical KV cache offloading (CPU / disk) for vLLM. The RBLN-specific implementation lives in the `lmcache-rbln` package.
📌 Related Issues / Tickets
✅ Type of Change
- `release`
- `feature`
- `model`
- `core`
- `fix`
- `perf`
- `refactor`
- `docs`
- `other`: please describe
🧪 How to Test
.........
📸 Screenshots / Logs (if applicable)
📋 Checklist
💬 Notes
Why `lmcache-rbln` is a separate repository
LMCache is designed to be engine-agnostic (vLLM, SGLang, …), so the RBLN connector lives in a standalone `lmcache-rbln` package and this PR only adds a thin registration shim. Keeping them separate preserves that boundary and keeps the option open: merging the two repos later is cheap, but splitting them later would not be.