Conversation

@cheese-head
Contributor

@cheese-head cheese-head commented Jan 27, 2026

Overview:

Add infrastructure to track and report block IDs that fail during asynchronous KV cache load operations in the KVBM connector. This enables vLLM to identify failed blocks and take corrective action (e.g., mark for recomputation).

Details:

  • Add block_ids field to WorkerTransferRequest protocol for tracking which blocks belong to each transfer operation
  • Add failed_block_ids field and get_block_ids_with_load_errors() method to KvConnectorWorker
  • Expose the method through Python bindings (connector_worker.py, dynamo_connector.py)
  • Add failure_tx/failure_rx channel between Scheduler and WorkerSchedulerClient to propagate transfer failures
  • Add drain_failures() method to collect pending failure notifications
  • Track request_id → uuid → block_ids mapping in worker to convert failure notifications to block IDs (see the sketch after this list)
  • Update handle_immediate_result() to send failure notifications when result.status.is_err()
  • Clean up tracking state when requests complete
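
The sketch below illustrates the shape of these additions. The field and map names (block_ids, request_to_blocks, failed_block_ids, the #[serde(default)] attribute) come from this PR; the surrounding struct fields and exact types are simplified stand-ins, not the actual definitions.

```rust
use std::collections::{HashMap, HashSet};
use serde::{Deserialize, Serialize};
use uuid::Uuid;

// Simplified stand-in for the real protocol struct: only the fields needed to
// show the new block_ids plumbing are included here.
#[derive(Serialize, Deserialize)]
struct WorkerTransferRequest {
    request_id: String,    // assumed: identifies the originating request
    uuid: Uuid,            // assumed: identifies this transfer operation
    #[serde(default)]      // older senders that omit the field still deserialize
    block_ids: Vec<usize>, // new: blocks covered by this transfer
}

// Worker-side tracking state (names per the PR, container types assumed):
struct LoadErrorTracking {
    // request_id -> (operation uuid -> block ids), recorded when a load is enqueued
    request_to_blocks: HashMap<String, HashMap<Uuid, Vec<usize>>>,
    // accumulated ids of blocks whose loads failed; drained and cleared by
    // get_block_ids_with_load_errors()
    failed_block_ids: HashSet<u32>,
}
```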

Where should the reviewer start?

  1. lib/llm/src/block_manager/connector/scheduler.rs - Core failure notification channel implementation
  2. lib/bindings/kvbm/src/block_manager/vllm/connector/worker.rs - Block ID tracking and error aggregation logic
  3. lib/llm/src/block_manager/connector/protocol.rs - Protocol changes (block_ids field)

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features
    • Added error tracking for block load operations so the system can report which blocks fail to load, improving visibility into failures and supporting diagnostics and recovery.


Add infrastructure to track block IDs that fail during async KV cache
loading, enabling vLLM to identify and handle failed blocks gracefully.

Changes:
- protocol.rs: Add block_ids field to WorkerTransferRequest
- scheduler.rs: Update tests with new block_ids field
- slot.rs: Populate block_ids for onboard/offload operations
- worker.rs: Add failed_block_ids tracking and get_block_ids_with_load_errors method
- connector_worker.py: Expose get_block_ids_with_load_errors to Python
- dynamo_connector.py: Implement KVConnectorBase_V1 interface method

Signed-off-by: Patrick Riel <priel@nvidia.com>
…acking

- Add failure_tx/failure_rx channel between Scheduler and WorkerSchedulerClient
- Add drain_failures() method to collect pending failure notifications
- Track request_id -> uuid -> block_ids mapping in KvConnectorWorker
- Populate failed_block_ids from scheduler failure notifications
- Clean up block_ids tracking when requests complete

Signed-off-by: Patrick Riel <priel@nvidia.com>
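
A minimal sketch of the failure-notification channel described in the commits above. The names failure_tx, failure_rx, and drain_failures() are from the PR; the channel flavor, payload shape, and signatures are assumptions for illustration only.

```rust
use std::collections::HashMap;
use tokio::sync::mpsc;
use uuid::Uuid;

// Scheduler side: per the PR description, handle_immediate_result() sends a
// notification on this channel when a transfer's result.status is an error.
struct Scheduler {
    failure_tx: mpsc::UnboundedSender<(String, Uuid)>, // (request_id, operation uuid)
}

impl Scheduler {
    fn notify_failure(&self, request_id: String, uuid: Uuid) {
        // Ignore send errors: if the worker side is gone, there is nobody to notify.
        let _ = self.failure_tx.send((request_id, uuid));
    }
}

// Worker side: the client drains whatever failure notifications are pending.
struct WorkerSchedulerClient {
    failure_rx: mpsc::UnboundedReceiver<(String, Uuid)>,
}

impl WorkerSchedulerClient {
    /// Collect all failure notifications queued so far, grouped by request.
    fn drain_failures(&mut self) -> HashMap<String, Vec<Uuid>> {
        let mut failures: HashMap<String, Vec<Uuid>> = HashMap::new();
        while let Ok((request_id, uuid)) = self.failure_rx.try_recv() {
            failures.entry(request_id).or_default().push(uuid);
        }
        failures
    }
}
```

The worker then maps each (request_id, uuid) pair back to block IDs via its request_id → uuid → block_ids table and adds them to failed_block_ids, which is what get_block_ids_with_load_errors() hands back to vLLM.
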
@copy-pr-bot

copy-pr-bot bot commented Jan 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the feat label Jan 27, 2026
@cheese-head cheese-head marked this pull request as ready for review January 27, 2026 00:39
@cheese-head cheese-head requested review from a team as code owners January 27, 2026 00:39
Signed-off-by: Patrick Riel <priel@nvidia.com>
@coderabbitai
Contributor

coderabbitai bot commented Jan 27, 2026

Walkthrough

This pull request introduces a block load error tracking system that propagates failed block IDs through the KV connector stack. Methods are added to Python and Rust layers to expose and retrieve failed block IDs, with supporting infrastructure to capture, track, and drain failure notifications across workers and schedulers.

Changes

Cohort / File(s) | Summary
Python connector API layer
lib/bindings/kvbm/python/kvbm/vllm_integration/connector/dynamo_connector.py, lib/bindings/kvbm/python/kvbm/vllm_integration/connector_worker.py
Added public get_block_ids_with_load_errors() methods to expose failed block IDs; methods delegate to underlying worker/connector or return empty set if unavailable.
Rust protocol
lib/llm/src/block_manager/connector/protocol.rs
Added block_ids: Vec<usize> field to WorkerTransferRequest struct with #[serde(default)] annotation for error tracking during transfers.
Rust worker tracking
lib/bindings/kvbm/src/block_manager/vllm/connector/worker.rs
Implemented error tracking state (request_to_blocks, failed_block_ids); added get_block_ids_with_load_errors() trait method; extended onboarding flow to record block IDs per request; exposed method to Python binding. Block IDs now propagated through transfer requests during onboard/offload operations.
Rust scheduler and failure channels
lib/llm/src/block_manager/connector/scheduler.rs
Added failure notification channel infrastructure (failure_rx, failure_tx); new drain_failures() method aggregates pending failures; immediate-transfer failures now propagated via channel; test updates for block_ids field in WorkerTransferRequest.
Rust slot management
lib/bindings/kvbm/src/block_manager/vllm/connector/leader/slot.rs
Captures block IDs from onboarding/offload paths and includes them in WorkerTransferRequest payloads for downstream error correlation.
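
For the slot-management cohort, the change amounts to carrying the operation's block IDs on the request it builds. A rough illustration follows; the real slot.rs code differs, every name other than block_ids and WorkerTransferRequest is invented, and the struct is a simplified stand-in rather than the actual protocol type.

```rust
// Hypothetical helper showing the shape of the change: whatever block ids an
// onboard/offload operation covers are copied onto the transfer request so the
// worker can later correlate a failed operation back to concrete blocks.
struct WorkerTransferRequest {
    request_id: String,
    uuid: u128,            // stand-in for the real operation uuid type
    block_ids: Vec<usize>, // field added by this PR
}

fn build_transfer_request(request_id: String, uuid: u128, blocks: &[usize]) -> WorkerTransferRequest {
    WorkerTransferRequest {
        request_id,
        uuid,
        block_ids: blocks.to_vec(),
    }
}
```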

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit hops through the transfer lane,
Tracking blocks through bytes and pain,
When loads go wrong, we note with care,
Each failed ID stored with flair!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title accurately summarizes the main feature: adding infrastructure to track and report block IDs that fail during asynchronous KV cache load operations.
Description check | ✅ Passed | The description follows the template structure with all required sections (Overview, Details, Where should the reviewer start, Related Issues) and provides comprehensive information about the changes.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.




Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@lib/bindings/kvbm/src/block_manager/vllm/connector/worker.rs`:
- Around line 293-305: When enqueuing onboarding_operations we currently stash
request->blocks in request_to_blocks but later remove that mapping in
get_finished before we call get_block_ids_with_load_errors, which can drop the
failure->block correlation; fix by ensuring failures are drained/translated into
block IDs before cleanup: call get_block_ids_with_load_errors (or a shared
helper that drains scheduler failures and maps them to block IDs using
request_to_blocks) prior to removing the request's entry from request_to_blocks
in get_finished; factor the draining/mapping logic into a reusable helper (e.g.,
drain_and_map_load_failures) and use it from both the onboarding loop code paths
and the get_finished cleanup to preserve failure→block mappings.
🧹 Nitpick comments (1)
lib/llm/src/block_manager/connector/protocol.rs (1)

152-154: Confirm whether #[serde(default)] should mask missing block_ids.
Defaulting to empty accepts older senders but silently drops error tracking. If deployments are atomic and you want strictness, consider making the field required so deserialization fails fast when it’s absent.

💡 Possible tweak to keep the protocol explicit
-    #[serde(default)]
-    pub block_ids: Vec<usize>,
+    pub block_ids: Vec<usize>,
Based on learnings, keep the spec explicit if backward compatibility isn't required.
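
A small, self-contained illustration of the trade-off flagged above (simplified struct, not the real one): with #[serde(default)] a payload from an older sender that omits block_ids still deserializes, just with an empty vector, so error tracking is silently disabled; without the attribute, deserialization fails fast.

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct WithDefault {
    request_id: String,
    #[serde(default)]
    block_ids: Vec<usize>,
}

#[derive(Deserialize)]
struct Strict {
    request_id: String,
    block_ids: Vec<usize>,
}

fn main() {
    let old_sender = r#"{ "request_id": "req-1" }"#; // no block_ids field

    // Accepted: block_ids defaults to [], so the failure->block mapping is empty.
    let lenient: WithDefault = serde_json::from_str(old_sender).unwrap();
    assert!(lenient.block_ids.is_empty());

    // Rejected: surfaces the protocol mismatch immediately.
    assert!(serde_json::from_str::<Strict>(old_sender).is_err());
}
```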

Comment on lines 293 to +305
// immediately enqueue the onboarding operations
for operation in onboarding_operations {
    let request_id = operation.request_id.clone();
    let uuid = operation.uuid;
    let block_ids = operation.block_ids.clone();

    // Store block_ids mapping for error tracking (only for load operations)
    if !block_ids.is_empty() {
        self.request_to_blocks
            .entry(request_id.clone())
            .or_default()
            .insert(uuid, block_ids);
    }

⚠️ Potential issue | 🟠 Major

Avoid losing failure → block mappings on request cleanup.
request_to_blocks is removed in get_finished, but failures are only mapped when get_block_ids_with_load_errors() drains the scheduler. If get_finished runs first, failure notifications for that request can no longer be correlated to block IDs.

🛠️ Suggested fix: drain failures before cleanup and reuse a helper
+    fn absorb_failure_notifications(&mut self) {
+        let failures = self.connector.drain_failures();
+        for (request_id, failed_uuids) in failures {
+            if let Some(uuid_to_blocks) = self.request_to_blocks.get(&request_id) {
+                for uuid in failed_uuids {
+                    if let Some(block_ids) = uuid_to_blocks.get(&uuid) {
+                        for &block_id in block_ids {
+                            self.failed_block_ids.insert(block_id as u32);
+                        }
+                        tracing::warn!(
+                            request_id = %request_id,
+                            operation_id = %uuid,
+                            block_ids = ?block_ids,
+                            "load operation failed; marking blocks as failed"
+                        );
+                    }
+                }
+            }
+        }
+    }
@@
-        for request_id in &is_finished_onboarding {
+        // Drain failures while block-id mappings still exist.
+        self.absorb_failure_notifications();
+        for request_id in &is_finished_onboarding {
             self.maybe_finished_onboarding.remove(request_id);
             // Clean up block_ids tracking for this request
             self.request_to_blocks.remove(request_id);
@@
-    fn get_block_ids_with_load_errors(&mut self) -> HashSet<u32> {
-        // Drain failures from the scheduler and convert (request_id, uuid) -> block_ids
-        let failures = self.connector.drain_failures();
-        for (request_id, failed_uuids) in failures {
-            if let Some(uuid_to_blocks) = self.request_to_blocks.get(&request_id) {
-                for uuid in failed_uuids {
-                    if let Some(block_ids) = uuid_to_blocks.get(&uuid) {
-                        for &block_id in block_ids {
-                            self.failed_block_ids.insert(block_id as u32);
-                        }
-                        tracing::warn!(
-                            request_id = %request_id,
-                            operation_id = %uuid,
-                            block_ids = ?block_ids,
-                            "load operation failed; marking blocks as failed"
-                        );
-                    }
-                }
-            }
-        }
-
-        // Return and clear the failed block IDs
-        std::mem::take(&mut self.failed_block_ids)
-    }
+    fn get_block_ids_with_load_errors(&mut self) -> HashSet<u32> {
+        self.absorb_failure_notifications();
+        // Return and clear the failed block IDs
+        std::mem::take(&mut self.failed_block_ids)
+    }

Also applies to: 473-509


