
[#12116][fix] prevent KVBM hang when requests abort during KV cache transfer#12117

Open
zyang-Modular wants to merge 1 commit intoNVIDIA:mainfrom
zyang-Modular:zyang/fix-kvbm-hang-on-request-abort

Conversation

@zyang-Modular
Contributor

@zyang-Modular zyang-Modular commented Mar 11, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added request abort capability to improve resource management and cleanup.
  • Bug Fixes

    • Enhanced error handling to prevent execution stalls during worker operations.
    • Improved request cancellation with proper system resource cleanup.
  • Tests

    • Added validation tests for request cleanup and error recovery scenarios.

Description

This PR addresses the root cause of #12116. While it can be prevented upstream (fixed by this PR), the engine should not hang if a request fails during an active KV transfer, regardless of the failure reason. This change improves robustness by ensuring that failed requests are always cleaned up before worker.get_finished() is called, avoiding the collective stall entirely.

When a request fails (e.g. prompt exceeds max_seq_len, GPU OOM, or a KVBM Rust panic), its in-flight async KV transfer state is not cleaned up.
On the next iteration, worker.get_finished() on rank 0 blocks indefinitely waiting for a transfer that will never complete. This prevents rank 0 from reaching the mpi_allgather barrier, causing a collective stall across all TP ranks. The HangDetector fires after 300s and shuts down all ranks.
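The defensive pattern this fix applies can be sketched as follows. This is an illustrative stand-in, not the actual TRT-LLM code: `get_finished_safely`, `FlakyWorker`, and the identity "allgather" are assumptions made for the example. The point is that a failing worker-side poll must still let the rank reach the collective.

```python
# Illustrative sketch of the fix's pattern (names are hypothetical, not the
# real TRT-LLM API): if the worker-side poll raises, report empty completion
# lists so the subsequent collective still runs on every rank instead of
# rank 0 blocking forever on a transfer that will never complete.

def get_finished_safely(worker, allgather):
    try:
        finished_saving, finished_loading = worker.get_finished()
    except Exception as e:
        # A failed/aborted request must not block the collective below;
        # report no completions this iteration instead of stalling.
        print(f"worker.get_finished() raised: {e}; reporting no completions")
        finished_saving, finished_loading = [], []
    # Every rank reaches the collective even if this rank's worker failed.
    return allgather((finished_saving, finished_loading))

class FlakyWorker:
    def get_finished(self):
        raise RuntimeError("simulated KVBM panic")

# Single-rank stand-in: the "allgather" is just the identity over one rank.
result = get_finished_safely(FlakyWorker(), lambda x: [x])
```

With a healthy worker the same wrapper simply passes the real completion lists through to the collective; the try/except only changes behavior on the failure path.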

Test Coverage

I added two new test cases: one verifying that abort_request() cleans up the request tracking state, and one verifying that an exception raised inside get_finished() does not stall the MPI barrier.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

…nsfer

Signed-off-by: Zhu Yang <zyang@modular.com>
@zyang-Modular zyang-Modular marked this pull request as ready for review March 11, 2026 18:36
@zyang-Modular zyang-Modular requested a review from a team as a code owner March 11, 2026 18:36
@zyang-Modular zyang-Modular requested a review from joyang-nv March 11, 2026 18:36
@coderabbitai
Contributor

coderabbitai bot commented Mar 11, 2026

📝 Walkthrough


This change adds request abortion capability to the KV cache connector manager, enabling explicit cleanup of request tracking structures during error handling and cancellation. Additionally, exception handling is added to get_finished to prevent MPI barrier stalling when workers encounter errors.

Changes

  • Core KV Cache Connector Updates (tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py): Added an abort_request() method to remove requests from the async tracking structures (new_async_requests, pending_async_requests, local_finished_async_requests) and the scheduler output manager. Wrapped the worker.get_finished() call in try/except to handle exceptions gracefully and prevent MPI barrier stalling.
  • Executor Error Handling Integration (tensorrt_llm/_torch/pyexecutor/py_executor.py): Integrated abort_request() calls into the error handling and cancellation paths. Failed requests are aborted via kv_connector_manager before the standard termination logic, providing early connector-level cleanup.
  • Test Coverage (tests/unittest/_torch/test_connector.py): Added tests for abort_request cleanup behavior and exception handling in get_finished across MPI ranks, verifying proper resource cleanup and barrier unblocking.
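The cleanup half of the change can be sketched as below. The class and attribute layout are assumptions inferred from the summary above (the real manager lives in kv_cache_connector.py and also touches the scheduler output manager, which is omitted here); the sketch only shows the idea that an aborted request is dropped from every async tracking structure before the next get_finished() poll.

```python
# Minimal sketch (assumed structure, not the real KvCacheConnectorManager) of
# the abort_request() cleanup described above: a failed request is removed
# from every async tracking structure so no rank later waits on a transfer
# that will never complete.

class KvConnectorManagerSketch:
    def __init__(self):
        self.new_async_requests = {}
        self.pending_async_requests = {}
        self.local_finished_async_requests = set()

    def abort_request(self, req_id):
        # pop(..., None) / discard() are no-ops if the request was never
        # registered in that particular structure, so this is safe to call
        # from any error-handling or cancellation path.
        self.new_async_requests.pop(req_id, None)
        self.pending_async_requests.pop(req_id, None)
        self.local_finished_async_requests.discard(req_id)

mgr = KvConnectorManagerSketch()
mgr.pending_async_requests[7] = "in-flight transfer handle"
mgr.local_finished_async_requests.add(7)
mgr.abort_request(7)
```

Calling this before the next worker.get_finished() poll is what keeps the collective from stalling on a request that has already failed.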

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 63.64%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Description check ⚠️ Warning: The PR description explains the issue and solution, but the 'Description' section is incomplete and the 'Test Coverage' section lacks detail. Resolution: complete the Description section with full context and name the specific test cases added.

✅ Passed checks (1 passed)

  • Title check ✅ Passed: The title clearly identifies the issue (#12116), the fix type, and the main change (preventing a KVBM hang when requests abort during KV cache transfer). It is specific and descriptive.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py (1)

553-564: Consider narrowing the exception type if specific failures are known.

The broad except Exception catch-all is used here to prevent MPI stalls regardless of failure cause. While this approach achieves the resilience goal described in the PR, the coding guidelines recommend catching specific exceptions.

If the expected failure modes are known (e.g., RuntimeError from KVBM panics as mentioned in tests), consider narrowing the exception type. However, if truly any failure should trigger this recovery path, the current approach is acceptable as a defensive measure.
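The narrowing the reviewer suggests could look like the sketch below. The function and class names are hypothetical, and RuntimeError standing in for a surfaced KVBM panic is an assumption taken from the review comment; the real handler would live inside get_finished() in kv_cache_connector.py.

```python
# Sketch of the reviewer's suggestion (assumed names, not the actual code):
# handle the known failure mode explicitly, keep a generic fallback only as a
# defensive measure, and use logger.exception so the full traceback is kept.
import logging

logger = logging.getLogger("kv_cache_connector_sketch")

def poll_finished(worker):
    try:
        return worker.get_finished()
    except RuntimeError:
        # Known failure mode, e.g. a KVBM Rust panic surfaced as RuntimeError.
        logger.exception("KVBM worker failed; reporting no completions")
        return [], []
    except Exception:
        # Defensive fallback so the mpi_allgather barrier is still reached
        # even for failure modes not anticipated above.
        logger.exception("Unexpected get_finished() failure")
        return [], []

class PanickyWorker:
    def get_finished(self):
        raise RuntimeError("kvbm panic")

finished_saving, finished_loading = poll_finished(PanickyWorker())
```

Both branches return the same empty completion lists; the split only buys clearer logs and a place to add failure-mode-specific handling later.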

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py` around lines 553 - 564,
The current broad except Exception around self.worker.get_finished(...) should
be narrowed: catch known failure types first (e.g., RuntimeError) and handle
them the same way (set finished_saving/finished_loading = [] and unblock
mpi_allgather), then keep a secondary broad fallback only if truly required;
also replace logger.error(...) with logger.exception(...) or include the
traceback to capture full error details. Update the try/except surrounding
worker.get_finished to explicitly handle RuntimeError (and any other known
worker/KVBM error classes) before an optional generic fallback, ensuring
finished_saving and finished_loading are still set to [] in those branches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1ea3c1d1-709e-4a42-a8df-8f9b6e55ae9e

📥 Commits

Reviewing files that changed from the base of the PR and between 0350b7f and 699246f.

📒 Files selected for processing (3)
  • tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tests/unittest/_torch/test_connector.py

Comment on lines +558 to +562
except Exception as e:
    logger.error(
        f"KV connector worker get_finished() raised an exception: {e}. "
        "Reporting no completions this iteration to unblock mpi_allgather."
    )

⚠️ Potential issue | 🔴 Critical

logger is not imported — this will raise NameError at runtime.

The exception handler references logger.error(...), but logger is not imported in this file. This will cause a NameError when an exception occurs, defeating the purpose of the exception handling.

🐛 Proposed fix: Add the logger import

Add at the top of the file with other imports:

from tensorrt_llm.logger import logger
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py` around lines 558 - 562,
The except block in get_finished() (the handler that calls logger.error on
exception 'e') references logger which is not imported; add the missing import
at the top of the module with the other imports by importing the shared logger
symbol (e.g., from tensorrt_llm.logger import logger) so the exception handler
and any other usages of logger resolve at runtime.

@svc-trtllm-gh-bot svc-trtllm-gh-bot added the Community want to contribute PRs initiated from Community label Mar 11, 2026
