
feature: pd disaggregation docker #510

Open
rebel-jindol21 wants to merge 4 commits into main from feat_pd_disag_docker


@rebel-jindol21
Contributor

🚀 Summary of Changes

Changes to support PD (prefill/decode) disaggregation in Docker.


📌 Related Issues / Tickets

  • Resolves #
  • Related to #

✅ Type of Change

  • 🚀 Release (release)
  • ✨ Feature (feature)
  • 🧠 Model support (model)
  • 🧬 Core engine changes (core)
  • 🛠 Bug fix (fix)
  • ⚙️ Performance improvement (perf)
  • 🔁 Refactor or code cleanup (refactor)
  • 📄 Documentation (docs)
  • ❓ Other (other): please describe

🧪 How to Test

  1. Run ...
  2. Verify output: ...
  3. Edge case tested: ...

📸 Screenshots / Logs (if applicable)


📋 Checklist

  • PR title follows Conventional Commits format
  • This PR is linked to an existing issue
  • The test method is described, and the expected result is clearly stated
  • Relevant documentation has been updated (if applicable)

💬 Notes


rebel-ykchoi and others added 3 commits March 24, 2026 18:56
Wire vLLM KV transfer to an RBLN-specific NIXL connector and host-side
buffers so that prefill and decode can run on separate engines with
host-to-host (H2H) transfer.

KV connector / registration
- add RblnNixlConnector (scheduler/worker), extending the upstream NixlConnector.
- register the connector name "RblnNixlConnector" in the kv_connector factory.
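Registering a connector by name, assuming the factory follows upstream vLLM's lazy `KVConnectorFactory.register_connector(name, module_path, class_name)` pattern, stores only a module path and class name and defers the import until the connector is first requested. The class below is a hypothetical, self-contained sketch of that pattern, not the factory itself; the real RBLN module path is not shown in this PR, so the usage example registers a standard-library class as a stand-in.

```python
import importlib
from typing import Dict, Tuple


class ConnectorRegistrySketch:
    """Hypothetical name -> class registry that defers the import until first use."""

    def __init__(self) -> None:
        # Maps a connector name to (module path, class name); nothing is imported yet.
        self._entries: Dict[str, Tuple[str, str]] = {}

    def register(self, name: str, module_path: str, class_name: str) -> None:
        if name in self._entries:
            raise ValueError(f"connector {name!r} is already registered")
        self._entries[name] = (module_path, class_name)

    def get_class(self, name: str) -> type:
        try:
            module_path, class_name = self._entries[name]
        except KeyError:
            raise ValueError(f"unknown connector {name!r}") from None
        # Import lazily so that registering a connector stays cheap.
        module = importlib.import_module(module_path)
        return getattr(module, class_name)


# Usage with a stand-in class (the real connector module path is not given here).
registry = ConnectorRegistrySketch()
registry.register("StandInConnector", "collections", "OrderedDict")
connector_cls = registry.get_class("StandInConnector")
```

Lazy registration keeps the factory import cheap: an optional dependency such as NIXL is only imported when a config actually selects that connector.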

Platform
- expose NIXL hints: get_nixl_supported_devices (rbln -> cpu) and
  get_nixl_memory_type ("DRAM").
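The two platform hints can be read together: the device type `rbln` maps to the allowed KV buffer device `cpu`, and the memory registered with NIXL is `DRAM`. The sketch below assumes the upstream hooks are classmethods returning a `device_type -> tuple of buffer devices` mapping and an optional memory-type string; `RblnPlatformSketch` and `check_kv_buffer_device` are hypothetical names used only for illustration.

```python
from typing import Dict, Optional, Tuple


class RblnPlatformSketch:
    """Hypothetical stand-in for the RBLN platform's NIXL hints."""

    device_type = "rbln"

    @classmethod
    def get_nixl_supported_devices(cls) -> Dict[str, Tuple[str, ...]]:
        # In this sketch, KV blocks for an "rbln" device are staged in host
        # (CPU) buffers, so "cpu" is the only allowed kv buffer device.
        return {"rbln": ("cpu",)}

    @classmethod
    def get_nixl_memory_type(cls) -> Optional[str]:
        # Host-side staging buffers live in ordinary system memory.
        return "DRAM"


def check_kv_buffer_device(platform, kv_buffer_device: str) -> str:
    """Hypothetical helper: validate the buffer device and return the NIXL memory type."""
    supported = platform.get_nixl_supported_devices().get(platform.device_type, ())
    if kv_buffer_device not in supported:
        raise ValueError(
            f"kv_buffer_device={kv_buffer_device!r} is not supported on "
            f"{platform.device_type!r}; expected one of {supported}"
        )
    memory_type = platform.get_nixl_memory_type()
    if memory_type is None:
        raise ValueError("platform does not report a NIXL memory type")
    return memory_type
```

With these hints, requesting a `cpu` buffer resolves to `DRAM`, while requesting the accelerator itself as the buffer device is rejected.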

Scheduler (rbln_scheduler.py)
- allow kv_consumer requests to be scheduled together with other requests
  in the decode stage.
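One way to picture this scheduler change: on the decode engine, a kv_consumer request arrives with its prompt KV already computed by the prefill engine, so once the transfer completes it needs no local prefill and can join the regular decode batch. The sketch below is a hypothetical model of that decision only; `RequestSketch`, the `remote_kv_loaded` flag, and `split_prefill_decode` are illustrative names, not the scheduler's actual fields or methods.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RequestSketch:
    req_id: str
    num_prompt_tokens: int
    num_computed_tokens: int = 0
    # Hypothetical flag: True once the prompt KV produced by a remote
    # prefill engine has been received into local buffers.
    remote_kv_loaded: bool = False


def split_prefill_decode(requests: List[RequestSketch]) -> Tuple[List[str], List[str]]:
    """Hypothetically split requests into a prefill group and a decode group."""
    prefill: List[str] = []
    decode: List[str] = []
    for req in requests:
        if req.remote_kv_loaded:
            # The prompt KV already arrived from the prefill engine: mark the
            # prompt as computed so the request is batched with decode requests.
            req.num_computed_tokens = max(req.num_computed_tokens, req.num_prompt_tokens)
        if req.num_computed_tokens >= req.num_prompt_tokens:
            decode.append(req.req_id)
        else:
            prefill.append(req.req_id)
    return prefill, decode
```

In this model, a kv_consumer request never takes a prefill slot on the decode engine; it is decoded alongside requests that were already running.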

Model runner (rbln_model_runner.py)
- override maybe_get_kv_connector_output(..., wait_for_save) to use the
  last prefill chunk.
- replace generic copy_kv_blocks with rbln_copy_kv_blocks using runtime
  _update_kv_cache / _fetch_kv_cache
- add bind_kv_cache_name and per-layer names for mark_static_address when
  compiling.
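The copy path above moves whole KV blocks between device memory and host buffers through the runtime rather than through generic tensor copies. The sketch below assumes `_fetch_kv_cache` and `_update_kv_cache` take a layer index and a block id, and uses `"d2h"`/`"h2d"` direction strings; those signatures are assumptions (the PR does not show them), and `FakeRblnRuntime` and `rbln_copy_kv_blocks_sketch` are hypothetical stand-ins operating on plain lists.

```python
from typing import List, Sequence

# Layout used by this sketch: kv[layer][block_id] -> list of values.
KVCache = List[List[List[int]]]


class FakeRblnRuntime:
    """Hypothetical stand-in for the RBLN runtime's per-layer KV cache accessors."""

    def __init__(self, num_layers: int, num_blocks: int) -> None:
        self.device_kv: KVCache = [
            [[0, 0] for _ in range(num_blocks)] for _ in range(num_layers)
        ]

    def _fetch_kv_cache(self, layer: int, block_id: int) -> List[int]:
        # Return a copy so later device writes do not alias the host buffer.
        return list(self.device_kv[layer][block_id])

    def _update_kv_cache(self, layer: int, block_id: int, data: Sequence[int]) -> None:
        self.device_kv[layer][block_id] = list(data)


def rbln_copy_kv_blocks_sketch(
    runtime: FakeRblnRuntime,
    host_kv: KVCache,
    src_block_ids: Sequence[int],
    dst_block_ids: Sequence[int],
    direction: str,
) -> None:
    """Copy whole KV blocks between the device (through the runtime) and a host buffer.

    "d2h": device -> host, e.g. staging prefill KV before it is sent.
    "h2d": host -> device, e.g. loading KV received on the decode side.
    """
    if direction not in ("d2h", "h2d"):
        raise ValueError(f"unknown direction {direction!r}")
    if len(src_block_ids) != len(dst_block_ids):
        raise ValueError("src and dst block lists must have the same length")
    for layer in range(len(host_kv)):
        for src, dst in zip(src_block_ids, dst_block_ids):
            if direction == "d2h":
                host_kv[layer][dst] = runtime._fetch_kv_cache(layer, src)
            else:
                runtime._update_kv_cache(layer, dst, host_kv[layer][src])
```

Copying per block and per layer keeps the block-id mapping explicit, which is what lets the prefill and decode engines use different local block ids for the same request.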

Attention backend (flash_attention.py)
- report the backend name as FLASH_ATTN for upstream compatibility.
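Reporting the name is a small override: vLLM attention backends expose a static `get_name()`, and any upstream code that branches on that string will treat this backend as `FLASH_ATTN`. The class below is a hypothetical, dependency-free sketch of that override; the actual RBLN backend class name and base class are not shown here.

```python
class RblnFlashAttentionBackendSketch:
    """Hypothetical sketch of an RBLN attention backend reporting the upstream name."""

    @staticmethod
    def get_name() -> str:
        # Code that compares the backend name against "FLASH_ATTN" will take
        # the same path for this backend as for upstream FlashAttention.
        return "FLASH_ATTN"
```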

Examples
- add experimental examples/experimental/pd_disaggregation/toy_proxy_server.py
  (FastAPI proxy routing chat completions to prefill vs decode HTTP backends).
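A toy PD proxy typically handles each chat completion in two phases: it first sends the request to a prefill backend with generation capped at one token and a request to keep the KV for remote decode, then forwards the original request to a decode backend together with the `kv_transfer_params` returned by the prefill response. The sketch below assumes the RBLN example follows the same flow and field names as upstream vLLM's NIXL toy proxy; it builds only the two payloads (no HTTP), and the function names are hypothetical.

```python
import copy
from typing import Any, Dict


def build_prefill_request(body: Dict[str, Any]) -> Dict[str, Any]:
    """Prefill phase: compute the prompt KV, emit one token, keep the KV for remote decode."""
    req = copy.deepcopy(body)
    req["kv_transfer_params"] = {
        "do_remote_decode": True,
        "do_remote_prefill": False,
        "remote_engine_id": None,
        "remote_block_ids": None,
        "remote_host": None,
        "remote_port": None,
    }
    req["stream"] = False
    req["max_tokens"] = 1
    if "max_completion_tokens" in req:
        req["max_completion_tokens"] = 1
    # stream_options only makes sense for streaming requests.
    req.pop("stream_options", None)
    return req


def build_decode_request(
    body: Dict[str, Any], prefill_response: Dict[str, Any]
) -> Dict[str, Any]:
    """Decode phase: the original request plus the KV handle returned by prefill."""
    req = copy.deepcopy(body)
    req["kv_transfer_params"] = prefill_response.get("kv_transfer_params", {})
    return req
```

The client's original settings (streaming, `max_tokens`) are applied only on the decode side; the prefill call is an internal, non-streaming step whose output the proxy discards after extracting `kv_transfer_params`.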

Signed-off-by: Jinseok Lee <jindol21@rebellions.ai>
rebel-jindol21 changed the title from "draft: pd disaggregation docker" to "feature: pd disaggregation docker" on Apr 7, 2026.