
feature: enable P/D disaggregation with NIXL host KV transfer #477

Draft
rebel-ykchoi wants to merge 2 commits into dev from feat_pd_disag

Conversation

@rebel-ykchoi (Contributor)

🚀 Summary of Changes

Wire vLLM KV transfer to an RBLN-specific NIXL connector and host-side buffers so prefill and decode can run on separate engines with host-to-host (H2H) KV transfer.

KV connector / registration

  • add RblnNixlConnector (scheduler and worker sides), extending the upstream NixlConnector.
  • register the connector under the name "RblnNixlConnector" in the kv_connector factory.
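
The registration pattern above can be sketched with a toy factory. Everything here is a stand-in for illustration: the base class, the registry, and the method shown are hypothetical simplifications, not vLLM's actual KVConnectorFactory API; only the connector names mirror the PR.

```python
# Toy sketch of name-based connector registration. NixlConnector here is a
# stand-in for the upstream connector class that RblnNixlConnector extends.
class NixlConnector:
    """Stand-in for the upstream NIXL KV connector."""

    def get_num_new_matched_tokens(self, request) -> int:
        # Placeholder behavior for the sketch.
        return 0


class RblnNixlConnector(NixlConnector):
    """RBLN-specific connector extending the upstream one (as in the PR)."""


_CONNECTOR_REGISTRY: dict[str, type] = {}


def register_connector(name: str, cls: type) -> None:
    """Register a connector class under a string name."""
    _CONNECTOR_REGISTRY[name] = cls


def create_connector(name: str):
    """Instantiate a registered connector by name."""
    return _CONNECTOR_REGISTRY[name]()


# Mirrors the registration of "RblnNixlConnector" in the kv_connector factory.
register_connector("RblnNixlConnector", RblnNixlConnector)
```

The point of the name-keyed registry is that configuration can select the RBLN connector purely by string, without the engine importing the class directly.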

Platform

  • expose NIXL hints: get_nixl_supported_devices (rbln -> cpu) and get_nixl_memory_type ("DRAM").
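
The two platform hints might look like the following. The class and the exact return shapes are assumptions for illustration; only the method names and the rbln -> cpu / "DRAM" values come from the PR text.

```python
# Hypothetical sketch of the RBLN platform's NIXL hints: transfers for "rbln"
# devices are staged through host ("cpu") buffers, and the NIXL memory type
# reported is "DRAM". Return shapes are guesses, not the real Platform API.
class RblnPlatform:
    @classmethod
    def get_nixl_supported_devices(cls) -> dict[str, tuple[str, ...]]:
        # Map device type -> peer device types reachable via NIXL.
        return {"rbln": ("cpu",)}

    @classmethod
    def get_nixl_memory_type(cls) -> str:
        return "DRAM"
```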

Scheduler (rbln_scheduler.py)

  • allow kv_consumer requests to be scheduled together with other requests in the decode stage.
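
A minimal sketch of that scheduling behavior, under the assumption that a kv_consumer request (one receiving its KV from a prefill engine) simply joins the ordinary decode batch rather than being held in a separate phase. The Request shape and function are hypothetical.

```python
# Toy decode-stage scheduler: kv_consumer requests are batched together with
# regular decode requests instead of being deferred. All names are illustrative.
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    is_kv_consumer: bool = False  # True if KV arrives from a prefill engine


def schedule_decode_batch(waiting: list[Request], max_batch: int) -> list[Request]:
    batch: list[Request] = []
    for req in waiting:
        if len(batch) >= max_batch:
            break
        # No special-casing: kv_consumer requests are admitted like any other.
        batch.append(req)
    return batch
```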

Model runner (rbln_model_runner.py)

  • override maybe_get_kv_connector_output(..., wait_for_save) using the last prefill chunk.
  • replace the generic copy_kv_blocks with rbln_copy_kv_blocks, which goes through the runtime's _update_kv_cache / _fetch_kv_cache.
  • add bind_kv_cache_name plus per-layer names for mark_static_address when compiling.
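
The block-copy replacement can be sketched as follows. The runtime object here is a dict-backed stand-in; only the hook names _update_kv_cache / _fetch_kv_cache come from the PR, and the src -> dst mapping signature is an assumption.

```python
# Toy runtime exposing the two hooks named in the PR. In the real system these
# would move KV data between device and host-side NIXL buffers.
class FakeRuntime:
    def __init__(self):
        self._kv: dict[int, list] = {}

    def _update_kv_cache(self, block_id: int, data: list) -> None:
        self._kv[block_id] = list(data)

    def _fetch_kv_cache(self, block_id: int) -> list:
        return self._kv[block_id]


def rbln_copy_kv_blocks(runtime, src_to_dst: dict[int, int]) -> None:
    """Copy KV blocks via the runtime's fetch/update hooks (hypothetical shape)."""
    for src, dst in src_to_dst.items():
        # Fetch the source block, then write it back under the destination id.
        runtime._update_kv_cache(dst, runtime._fetch_kv_cache(src))
```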

Attention backend (flash_attention.py)

  • report backend name as FLASH_ATTN for upstream compatibility.
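
This change is small enough to show in full spirit: the backend advertises the upstream name so code that special-cases "FLASH_ATTN" keeps working. The base class is a stand-in for vLLM's attention backend interface.

```python
# Minimal sketch: the RBLN flash-attention backend reports the upstream name
# "FLASH_ATTN" for compatibility. AttentionBackend here is a toy base class.
class AttentionBackend:
    @staticmethod
    def get_name() -> str:
        raise NotImplementedError


class RblnFlashAttentionBackend(AttentionBackend):
    @staticmethod
    def get_name() -> str:
        return "FLASH_ATTN"
```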

Examples

  • add an experimental examples/experimental/pd_disaggregation/toy_proxy_server.py (a FastAPI proxy that routes chat completions to the prefill and decode HTTP backends).
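
The routing decision in such a proxy can be sketched without FastAPI. The backend URLs and the max_tokens=1 trick (forcing the prefill backend to stop after prefill) are assumptions about how P/D disaggregation proxies are commonly built, not a description of toy_proxy_server.py's exact code.

```python
# Toy sketch of P/D request splitting: the request is first sent to the prefill
# backend with max_tokens forced to 1, then to the decode backend unmodified.
# URLs are hypothetical placeholders.
PREFILL_URL = "http://localhost:8100/v1/chat/completions"  # assumed
DECODE_URL = "http://localhost:8200/v1/chat/completions"   # assumed


def split_request(body: dict) -> tuple[tuple[str, dict], tuple[str, dict]]:
    """Return (prefill target, decode target) for one chat-completion body."""
    prefill_body = dict(body)
    prefill_body["max_tokens"] = 1   # run prefill only, no real generation
    prefill_body["stream"] = False   # prefill response is discarded
    return (PREFILL_URL, prefill_body), (DECODE_URL, body)
```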

What does this PR do? What feature, fix, or improvement does it bring?


📌 Related Issues / Tickets

  • Resolves #
  • Related to #

✅ Type of Change

  • 🚀 Release (release)
  • ✨ Feature (feature)
  • 🧠 Model support (model)
  • 🧬 Core engine changes (core)
  • 🛠 Bug fix (fix)
  • ⚙️ Performance improvement (perf)
  • 🔁 Refactor or code cleanup (refactor)
  • 📄 Documentation (docs)
  • ❓ Other (other): please describe

🧪 How to Test

  1. Run ...
  2. Verify output: ...
  3. Edge case tested: ...

📸 Screenshots / Logs (if applicable)


📋 Checklist

  • PR title follows Conventional Commits format
  • This PR is linked to an existing issue
  • The test method is described, and the expected result is clearly stated
  • Relevant documentation has been updated (if applicable)

💬 Notes


@rebel-jiwoopark added the torch.compile (torch.compile based implementation) label on Apr 1, 2026.