feature: add RBLNLMCacheConnectorV1 for LMCache KV cache offloading by rebel-jinhwan · Pull Request #523 · RBLN-SW/vllm-rbln

rebel-jinhwan · 2026-04-13T04:30:43Z

🚀 Summary of Changes

What does this PR do? What feature, fix, or improvement does it bring?

Register RBLNLMCacheConnectorV1 in vllm-rbln's KV connector factory as a thin re-export shim over the lmcache_rbln package. All connector logic lives in lmcache-rbln.

This PR wires the connector into vllm-rbln's worker / model runner so that vLLM can load it by name.
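The "load it by name" behavior can be illustrated with a minimal sketch of a lazy, name-based registry. This is not vLLM's `KVConnectorFactory` or the code in this PR: `ConnectorRegistry`, `register`, and `get` are hypothetical names, and a standard-library class stands in for the connector so the example runs without `lmcache_rbln` installed.

```python
import importlib
from typing import Callable, Dict


class ConnectorRegistry:
    """Hypothetical stand-in for a KV connector factory (not vLLM's class)."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], type]] = {}

    def register(self, name: str, module_path: str, class_name: str) -> None:
        """Record where a connector class lives without importing it yet."""
        if name in self._loaders:
            raise ValueError(f"connector {name!r} is already registered")

        def loader() -> type:
            # Import lazily so that a missing optional dependency only fails
            # when this connector is actually requested.
            module = importlib.import_module(module_path)
            return getattr(module, class_name)

        self._loaders[name] = loader

    def get(self, name: str) -> type:
        """Resolve a registered name to its class."""
        if name not in self._loaders:
            raise KeyError(f"unknown connector {name!r}")
        return self._loaders[name]()


registry = ConnectorRegistry()
# Demonstration only: OrderedDict plays the role of the connector class.
registry.register("DemoConnector", "collections", "OrderedDict")
connector_cls = registry.get("DemoConnector")
```

A re-export shim fits this pattern because the registry only needs a stable module path and class name; the implementation behind that path can live in another package.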

LMCache provides hierarchical KV cache offloading (CPU / disk) for vLLM. The RBLN-specific implementation lives in the lmcache-rbln package.
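As a rough illustration of selecting the connector by name, the sketch below builds a launch command, assuming vLLM's `--kv-transfer-config` JSON option with `kv_connector` and `kv_role` keys. The role value `kv_both` and the model name are placeholders chosen for the example, and the PR does not state which additional LMCache settings (if any) are required.

```python
import json
import shlex

# Assumed settings: only the connector name comes from this PR.
kv_transfer_config = {
    "kv_connector": "RBLNLMCacheConnectorV1",
    "kv_role": "kv_both",  # assumption: store and load on the same instance
}


def build_serve_command(model: str, config: dict) -> str:
    """Return a shell-safe `vllm serve` command carrying the KV config."""
    parts = [
        "vllm",
        "serve",
        shlex.quote(model),
        "--kv-transfer-config",
        shlex.quote(json.dumps(config)),
    ]
    return " ".join(parts)


# "your-org/your-model" is a placeholder, not a model named in the PR.
command = build_serve_command("your-org/your-model", kv_transfer_config)
print(command)
```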

📌 Related Issues / Tickets

Resolves #
Related to feature(pdd): enable P/D disaggregation with NIXL host KV transfer #477

✅ Type of Change

🚀 Release (release)
✨ Feature (feature)
🧠 Model support (model)
🧬 Core engine changes (core)
🛠 Bug fix (fix)
⚙️ Performance improvement (perf)
🔁 Refactor or code cleanup (refactor)
📄 Documentation (docs)
❓ Other (other): please describe

🧪 How to Test

Run ...
Verify output: ...
Edge case tested: ...

📸 Screenshots / Logs (if applicable)

📋 Checklist

PR title follows Conventional Commits format
This PR is linked to an existing issue
The test method is described, and the expected result is clearly stated
Relevant documentation has been updated (if applicable)

💬 Notes

Why `lmcache-rbln` is a separate repository

LMCache is designed to be engine-agnostic (vLLM, SGLang, …), so the RBLN connector lives in a standalone lmcache-rbln package and this PR only adds a thin registration shim. Keeping them separate preserves that boundary and keeps the option open — merging the two repos later is cheap, but splitting them later would not be.

Preserves the engine-agnostic boundary — avoids coupling connector logic to vLLM internals.
Easier upstream sync — clean rebases and contributions against LMCache upstream.

Copilot

Pull request overview

This PR integrates the RBLNLMCacheConnectorV1 KV-cache connector into vllm-rbln as a thin registration/re-export shim over the external lmcache_rbln package, and wires KV-transfer lifecycle/hooks into the V1 worker/model-runner so the connector can be selected by name and used safely during compilation warmup.

Changes:

Register RBLNLMCacheConnectorV1 via KVConnectorFactory and expose it under a stable vllm_rbln... import path.
Add V1 worker support for KV connector handshake metadata retrieval and connector shutdown.
Adjust V1 model runner KV-connector integration to bypass upstream assertions during compilation warmup and to provide an RBLN-specific host transfer buffer copy implementation.
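The warmup bypass described above can be sketched as follows. This is a simplified illustration under stated assumptions: `SchedulerOutput`, `UpstreamRunner`, and `WarmupSafeRunner` are hypothetical stand-ins, and the method is modeled as a plain function call rather than whatever form the upstream method actually takes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchedulerOutput:
    # None models the dummy scheduler output used during compilation warmup.
    kv_connector_metadata: Optional[dict] = None


class UpstreamRunner:
    """Hypothetical stand-in for the upstream behavior that asserts."""

    def maybe_get_kv_connector_output(self, scheduler_output: SchedulerOutput):
        assert scheduler_output.kv_connector_metadata is not None
        return {"loaded": scheduler_output.kv_connector_metadata}


class WarmupSafeRunner(UpstreamRunner):
    """Skips the connector when there is no metadata, as during warmup."""

    def maybe_get_kv_connector_output(self, scheduler_output: SchedulerOutput):
        if scheduler_output.kv_connector_metadata is None:
            # Warmup: nothing to load or save, so return before the assertion.
            return None
        return super().maybe_get_kv_connector_output(scheduler_output)


runner = WarmupSafeRunner()
warmup_result = runner.maybe_get_kv_connector_output(SchedulerOutput())
normal_result = runner.maybe_get_kv_connector_output(
    SchedulerOutput(kv_connector_metadata={"request": "r0"})
)
```

Guarding in the subclass keeps the upstream path unchanged for real requests, so only the warmup case diverges.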

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.


| File | Description |
| --- | --- |
| `vllm_rbln/v1/worker/rbln_worker.py` | Adds a handshake metadata helper and ensures the KV connector is shut down during worker shutdown. |
| `vllm_rbln/v1/worker/rbln_model_runner.py` | Adds warmup-safe KV connector output handling and an RBLN host transfer buffer copy hook for the KV transfer group. |
| `vllm_rbln/distributed/kv_transfer/kv_connector/v1/rbln_lmcache_connector.py` | Thin re-export shim for `RBLNLMCacheConnectorV1` from `lmcache_rbln`. |
| `vllm_rbln/distributed/kv_transfer/kv_connector/v1/__init__.py` | Package init to support a stable import path. |
| `vllm_rbln/distributed/kv_transfer/kv_connector/factory.py` | Registers `RBLNLMCacheConnectorV1` in `KVConnectorFactory`. |
| `vllm_rbln/distributed/kv_transfer/kv_connector/__init__.py` | Package init for the KV connector namespace. |
| `vllm_rbln/distributed/kv_transfer/__init__.py` | Package init for the KV transfer namespace. |
| `vllm_rbln/__init__.py` | Ensures connector factory registration is imported during plugin ops registration. |


Register ``RBLNLMCacheConnectorV1`` in vllm-rbln's KV connector factory as a thin shim that re-exports the connector from the ``lmcache_rbln`` package. All connector logic lives in ``lmcache_rbln``; this side only wires the connector into vllm-rbln's worker/model_runner.

Changes:
- New ``vllm_rbln.distributed.kv_transfer`` package with factory that registers ``RBLNLMCacheConnectorV1``.
- ``RBLNModelRunner``:
  - Initialize ``cross_layers_kv_cache`` / ``cross_layers_attn_backend`` to ``None`` so the fallback KV allocation path works when ``has_kv_transfer_group()`` is True.
  - Override ``maybe_get_kv_connector_output`` to bypass the upstream assertion during compilation warmup (kv_connector_metadata is None).
  - Treat warmup scheduler output as a no-op in ``execute_model``.
  - Replace CUDA-only ``copy_kv_blocks`` with ``rbln_copy_kv_blocks`` that uses the rebel runtime via ``runtime_holder``.
  - Pass ``runtime_holder`` to the connector via ``set_runtime_holder``.
- ``RBLNWorker``: shut down the KV transfer group on worker shutdown.
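To make the block-copy hook more concrete, here is a minimal sketch of copying selected KV blocks between a device-side and a host-side cache through a runtime handle. Everything in it is an assumption for illustration: `FakeRuntimeHolder`, its `transfer` method, the function signature, and the `"h2d"` / `"d2h"` direction strings are hypothetical, and plain Python lists stand in for real tensors. The actual ``rbln_copy_kv_blocks`` and ``runtime_holder`` interfaces are not shown in this PR page.

```python
from typing import Dict, List, Tuple


class FakeRuntimeHolder:
    """Hypothetical runtime handle; records transfers instead of doing DMA."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, int, int]] = []

    def transfer(self, src_layer: list, src_id: int,
                 dst_layer: list, dst_id: int, direction: str) -> None:
        # Copy one block; a real runtime would move device or host memory.
        dst_layer[dst_id] = list(src_layer[src_id])
        self.log.append((direction, src_id, dst_id))


def copy_kv_blocks_sketch(
    src_caches: Dict[str, list],
    dst_caches: Dict[str, list],
    src_block_ids: List[int],
    dst_block_ids: List[int],
    direction: str,
    runtime_holder: FakeRuntimeHolder,
) -> None:
    """Copy src blocks to dst blocks for every layer, pairwise by index."""
    if direction not in ("h2d", "d2h"):
        raise ValueError(f"unsupported direction: {direction!r}")
    if len(src_block_ids) != len(dst_block_ids):
        raise ValueError("source and destination block lists differ in length")
    for layer_name, src_layer in src_caches.items():
        dst_layer = dst_caches[layer_name]
        for src_id, dst_id in zip(src_block_ids, dst_block_ids):
            runtime_holder.transfer(src_layer, src_id, dst_layer, dst_id,
                                    direction)


# Usage: offload device blocks 2 and 0 into host slots 0 and 1.
device = {"layer0": [[1, 2], [3, 4], [5, 6]]}
host = {"layer0": [[0, 0], [0, 0]]}
holder = FakeRuntimeHolder()
copy_kv_blocks_sketch(device, host, [2, 0], [0, 1], "d2h", holder)
```

Routing every copy through a single handle is one way to keep device-specific transfer logic out of the connector, which matches the stated goal of replacing a CUDA-only path.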

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

rebel-jinhwan self-assigned this Apr 13, 2026

rebel-jinhwan added the torch.compile torch.compile based implementation label Apr 13, 2026

rebel-jinhwan changed the title ~~feat: add RBLNLMCacheConnectorV1 for LMCache KV cache offloading~~ feat(connector): add RBLNLMCacheConnectorV1 for LMCache KV cache offloading Apr 13, 2026

rebel-jinhwan changed the title ~~feat(connector): add RBLNLMCacheConnectorV1 for LMCache KV cache offloading~~ feature: add RBLNLMCacheConnectorV1 for LMCache KV cache offloading Apr 13, 2026

RBLN-SW deleted a comment from github-actions bot Apr 13, 2026

rebel-jinhwan marked this pull request as ready for review April 14, 2026 05:04

rebel-jinhwan requested review from rebel-jiwoopark and rebel-ykchoi April 14, 2026 05:04

rebel-jaehwang requested a review from Copilot April 14, 2026 05:06

Copilot started reviewing on behalf of rebel-jaehwang April 14, 2026 05:06

Copilot AI reviewed Apr 14, 2026


Comment thread vllm_rbln/v1/worker/rbln_model_runner.py

Comment thread vllm_rbln/v1/worker/rbln_model_runner.py

rebel-jinhwan added 2 commits April 15, 2026 13:47

add get_kv_connector_handshake_metadata

fa9de2a

rebel-jinhwan force-pushed the jinhwan/lmcache-connector-clean branch from 0d8fff1 to fa9de2a Compare April 15, 2026 04:47

chore: remove accidentally committed benchmark file

2afc02d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

rebel-jinhwan wants to merge 3 commits into dev from jinhwan/lmcache-connector-clean
2 participants