
[None][chore] Mass integration of release/1.2 weekly - 6th #11934

Merged
chzblych merged 14 commits into NVIDIA:main from dominicshanshan:mi-release-1.2-6
Mar 7, 2026

Conversation

@dominicshanshan
Collaborator

@dominicshanshan dominicshanshan commented Mar 5, 2026

Summary by CodeRabbit

Release Notes

  • Documentation

    • Updated documentation links and standardized terminology across deployment and feature guides
    • Added known issue: Disaggregated serving may hang with context pipeline parallelism and generation tensor parallelism
  • Bug Fixes

    • Fixed tokenizer initialization error when special tokens attribute is unavailable
    • Fixed bench command help display functionality
  • Tests

    • Added new release sanity test suite
    • Updated test waives configuration
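The tokenizer fix noted above is the usual guarded-attribute pattern: read the special-tokens attribute defensively instead of assuming it exists. A minimal sketch (function and class names here are hypothetical, not the actual TensorRT-LLM code):

```python
def collect_special_tokens(tokenizer):
    """Gather special tokens without assuming the attribute exists.

    Some tokenizer implementations do not expose `special_tokens_map`,
    which previously raised AttributeError during initialization.
    """
    # getattr with a default avoids crashing when the attribute is missing
    tokens_map = getattr(tokenizer, "special_tokens_map", None)
    if not tokens_map:
        return []
    return sorted(set(tokens_map.values()))


class MinimalTokenizer:
    special_tokens_map = {"bos_token": "<s>", "eos_token": "</s>"}


class BareTokenizer:
    pass  # no special_tokens_map attribute at all


print(collect_special_tokens(MinimalTokenizer()))  # ['</s>', '<s>']
print(collect_special_tokens(BareTokenizer()))     # []
```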

Description

This is the weekly Mass Integration (MI) for release/1.2. The following PRs will not be cherry-picked back to main:
#11702 and #11778 will merge into main with #11898 (dependency and container image security update) @yiqingy0
#11744 is a duplicate of #11743 @lancelly
#11771 is a duplicate of #11296 @yechank-nvidia
#11683 @yibinl-nvidia
#11757 @eopXD
#11775 (the waived test is already in main)
and 13 infra PRs with the title: [None][infra] Check in most recent lock file from nightly pipeline

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@coderabbitai
Contributor

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

Walkthrough

Multiple documentation updates standardize TensorRT LLM naming and hyperlinks across deployment and feature guides. Code additions include VSWA window-size memory management and synchronization in resource allocation, click option enhancements for help handling, attribute guarding in tokenizer initialization, and request failure tracking in test reporting. A new release sanity test list was introduced.

Changes

Cohort / File(s) Summary
Documentation – Naming and Link Updates
docs/source/blogs/.../blog5_Disaggregated_Serving_in_TensorRT-LLM.md, docs/source/deployment-guide/configuring-cpu-affinity.md, docs/source/features/disagg-serving.md, examples/models/core/multimodal/README.md, triton_backend/all_models/multimodal/Deprecation_notice.md
Standardized TensorRT-LLM naming to TensorRT LLM and updated NVIDIA Dynamo and documentation hyperlinks to current versions.
Resource Manager – VSWA Block Synchronization & Memory Allocation
tensorrt_llm/_torch/pyexecutor/resource_manager.py
Added VSWA window-size logic including block synchronization across MPI ranks, per-window memory share configuration via environment variable, max block calculation, and window-size adjustment with expanded logging.
CLI & Utility Code Enhancements
tensorrt_llm/commands/bench.py, tensorrt_llm/tokenizer/tokenizer.py
Introduced click.Option subclass for optional required parameters during help display and added guarded attribute access for tokenizer special tokens.
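One common way to let `--help` render on a command that has required options is a `click.Option` subclass that relaxes the requirement when a help flag is present. The sketch below illustrates that pattern only; it is not the actual `bench.py` implementation, and the `--model` option and command name are hypothetical:

```python
import sys
import click
from click.testing import CliRunner


class RequiredUnlessHelp(click.Option):
    """Relax `required` when the invocation is only asking for help,
    so `--help` renders instead of failing with 'Missing option'."""

    def handle_parse_result(self, ctx, opts, args):
        # If a help flag appears on the command line, make this option
        # temporarily optional so parsing can proceed to the help text.
        if any(flag in sys.argv for flag in ctx.help_option_names):
            self.required = False
        return super().handle_parse_result(ctx, opts, args)


@click.command()
@click.option("--model", cls=RequiredUnlessHelp, required=True,
              help="Model to benchmark (hypothetical option).")
def bench(model):
    click.echo(f"benchmarking {model}")


runner = CliRunner()
print(runner.invoke(bench, ["--help"]).exit_code)   # 0: help is shown
print(runner.invoke(bench, []).exit_code == 0)      # False: option still required
```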
Test Infrastructure – Failure Tracking & Lists
tests/integration/defs/perf/disagg/execution/executor.py, tests/integration/defs/perf/disagg/reporting/report.py, tests/integration/defs/perf/disagg/testlist/release_sanity.txt, tests/integration/test_lists/waives.txt
Added failed/total request extraction and tracking in performance reporting, introduced new release sanity test list, and updated test skip entries in waives.
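The failed/total request extraction amounts to pulling two counters out of benchmark output. A minimal sketch, assuming a hypothetical log format (the real executor output and parsing in `report.py` may differ):

```python
import re

# Hypothetical summary line format; adjust the pattern to the real log.
SUMMARY_RE = re.compile(r"requests:\s*(\d+)\s*total,\s*(\d+)\s*failed")


def extract_request_counts(log_text):
    """Return (total, failed) parsed from a benchmark log.

    Falls back to (0, 0) when no summary line is found, so reporting
    code can always record the counters.
    """
    match = SUMMARY_RE.search(log_text)
    if not match:
        return 0, 0
    return int(match.group(1)), int(match.group(2))


log = "... benchmark done. requests: 512 total, 3 failed ..."
print(extract_request_counts(log))  # (512, 3)
```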

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description lacks critical details required by the template: no Description section explaining the issue and solution, and Test Coverage section is empty despite significant code changes. Provide a substantive Description section explaining the purpose of this MI PR and its key changes, and list relevant test coverage for the code changes, particularly for resource_manager.py, bench.py, tokenizer.py, and integration test changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title '[None][chore] Mass integration of release/1.2 weekly - 6th' accurately describes a weekly mass integration merge from a release branch, which aligns with the PR's purpose as a scheduled integration of changes from mi-release-1.2-6 into main.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tensorrt_llm/_torch/pyexecutor/resource_manager.py (2)

422-486: ⚠️ Potential issue | 🟠 Major

Cross-rank block reduction is executed twice.

Lines 455-486 duplicate the same MIN allreduce loop already run in Lines 422-453, which doubles cross-rank synchronization overhead and log noise.

Suggested cleanup
                 blocks_per_window = self.calculate_max_num_blocks_for_vswa(
                     kv_cache_config=kv_cache_config,
                     model_config=model_config,
                     extra_cost_memory=0,
                 )
                 if mapping.world_size > 1:
+                    original_blocks_per_window = blocks_per_window.copy()
                     # make sure all ranks use the same number of primary/secondary blocks
                     if mpi_disabled():
@@
                             blocks_per_window[window_size] = (
                                 reduced_primary_blocks,
                                 reduced_secondary_blocks)
                     logger.info(
-                        f"[MPI rank={mapping.rank}] Original blocks_per_window: {blocks_per_window}"
+                        f"[MPI rank={mapping.rank}] Original blocks_per_window: {original_blocks_per_window}"
                     )
                     logger.info(
                         f"[MPI rank={mapping.rank}] Reduced blocks_per_window: {blocks_per_window}"
                     )
-
-                if mapping.world_size > 1:
-                    ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/resource_manager.py` around lines 422 - 486,
The code runs the same cross-rank MIN allreduce and logging twice for
blocks_per_window; remove the duplicate second if mapping.world_size > 1 block
(the one that repeats the allreduce loops and the two logger.info calls) so the
reduction and logs only occur once; locate the duplicate by searching for the
repeated use of mapping.world_size, mpi_disabled(), torch_comm()/mpi_comm(),
MPI.MIN or torch.distributed.ReduceOp.MIN, and logger.info related to "Original
blocks_per_window" / "Reduced blocks_per_window" and delete the redundant
section.
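The effect of the MIN allreduce (which the cleanup above ensures runs only once) can be emulated without MPI: every rank must converge on the element-wise minimum of the per-window block counts. A pure-Python sketch with hypothetical numbers:

```python
def min_reduce_blocks(per_rank_blocks):
    """Emulate an MPI.MIN allreduce over per-rank KV-cache block counts.

    Each rank reports {window_size: (primary_blocks, secondary_blocks)};
    all ranks must end up with the element-wise minimum so they allocate
    identical cache pools.
    """
    reduced = {}
    for blocks in per_rank_blocks:
        for window, (primary, secondary) in blocks.items():
            if window not in reduced:
                reduced[window] = (primary, secondary)
            else:
                p, s = reduced[window]
                reduced[window] = (min(p, primary), min(s, secondary))
    return reduced


# Two ranks report slightly different capacities (illustrative values).
rank0 = {512: (100, 20), 4096: (40, 8)}
rank1 = {512: (96, 24), 4096: (44, 6)}
print(min_reduce_blocks([rank0, rank1]))  # {512: (96, 20), 4096: (40, 6)}
```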

1178-1333: ⚠️ Potential issue | 🔴 Critical

Fix parser-blocking merge artifacts in VSWA section.

The file cannot be imported due to syntax errors caused by duplicated lines:

  • Duplicate self parameter in adjust_window_sizes_for_vswa signature (line 1180)
  • Duplicated logger.info( statements (lines 1214–1215, 1228–1229, and 1222–1223)
  • Duplicate calculate_max_num_blocks_for_vswa definition (line 1303)

These appear to be unintended merge artifacts. Remove all duplications to restore valid Python syntax.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/resource_manager.py` around lines 1178 - 1333,
The VSWA section contains merge-artifact duplicates causing syntax errors: in
adjust_window_sizes_for_vswa remove the extra duplicated self parameter in the
signature and delete duplicate statements (the repeated total_kv_heads
assignment and all duplicated logger.info calls inside
adjust_window_sizes_for_vswa), then remove the duplicate definition of
calculate_max_num_blocks_for_vswa so only one valid def remains; ensure
adjust_window_sizes_for_vswa and calculate_max_num_blocks_for_vswa have single,
correctly formatted signatures and that logging calls are not repeated.
🧹 Nitpick comments (2)
tensorrt_llm/_torch/pyexecutor/resource_manager.py (2)

415-417: Remove the stray tuple assignment before the real VSWA call.

Lines 415-416 assign blocks_per_window to a tuple and then Line 417 overwrites it immediately. This is dead/confusing residue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/resource_manager.py` around lines 415 - 417,
There is a stray tuple assignment to blocks_per_window before the real call;
remove the redundant line so blocks_per_window is only set by the actual call to
calculate_max_num_blocks_for_vswa(...). Locate the assignments around the
calculate_max_num_blocks_for_vswa invocation in resource_manager.py (references:
blocks_per_window and calculate_max_num_blocks_for_vswa, and the KvCacheConfig
assertion) and delete the dead tuple-assignment statement so the single, correct
assignment remains.

1410-1473: Drop the duplicated pre-pass in calculate_max_num_blocks_for_vswa.

This block duplicates work that is executed again in Lines 1474-1506, adding redundant compute/logging and increasing drift risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/resource_manager.py` around lines 1410 - 1473,
The code contains a duplicated "pre-pass" that computes blocks_per_window and
logs window/allocation info inside calculate_max_num_blocks_for_vswa (the loop
using window_size_to_layers, window_size_shares, calculate_cache_size_per_token,
kv_cache_config, primary_tokens/secondary_tokens,
primary_blocks/secondary_blocks and logger) which is repeated later; remove this
earlier duplicate pass and keep a single canonical computation (the later block
at ~1474-1506), or consolidate both into one function, ensuring
calculate_cache_size_per_token remains defined once (used by the single pass),
window_size_shares logic (env TRTLLM_WINDOW_SIZE_SHARES fallback) is preserved,
and all logging and kv_cache_config/is_vswa handling is performed only in that
single location to avoid redundant compute and logs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2e576cef-f543-4a34-8abd-dd4512a612ac

📥 Commits

Reviewing files that changed from the base of the PR and between 12f2f39 and eb53b26.

📒 Files selected for processing (13)
  • docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
  • docs/source/deployment-guide/configuring-cpu-affinity.md
  • docs/source/features/disagg-serving.md
  • docs/source/release-notes.md
  • examples/models/core/multimodal/README.md
  • tensorrt_llm/_torch/pyexecutor/resource_manager.py
  • tensorrt_llm/commands/bench.py
  • tensorrt_llm/tokenizer/tokenizer.py
  • tests/integration/defs/perf/disagg/execution/executor.py
  • tests/integration/defs/perf/disagg/reporting/report.py
  • tests/integration/defs/perf/disagg/testlist/release_sanity.txt
  • tests/integration/test_lists/waives.txt
  • triton_backend/all_models/multimodal/Deprecation_notice.md

@dominicshanshan dominicshanshan changed the title [None][chroe] Mass integration of release/1.2 - 6th [None][chore] Mass integration of release/1.2 - 6th Mar 5, 2026
@dominicshanshan dominicshanshan changed the title [None][chore] Mass integration of release/1.2 - 6th [None][chroe] Mass integration of release/1.2 - 6th Mar 5, 2026
@dominicshanshan dominicshanshan changed the title [None][chroe] Mass integration of release/1.2 - 6th [None][chroe] Mass integration of release/1.2 weekly - 6th Mar 5, 2026
@dominicshanshan
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37847 [ run ] triggered by Bot. Commit: fd41d49 Link to invocation

kaiyux and others added 2 commits March 5, 2026 01:52
…ughput_mtp test (NVIDIA#11717)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
FrankD412 and others added 8 commits March 5, 2026 05:15
…mand help to work. (NVIDIA#11722)

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
…ing hang with asymmetric PP/TP (NVIDIA#11789)

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
…1779)

Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
@dominicshanshan
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37875 [ run ] triggered by Bot. Commit: 7e92ce0 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #37875 [ run ] completed with state SUCCESS. Commit: 7e92ce0
/LLM/main/L0_MergeRequest_PR pipeline #29326 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@dominicshanshan
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37940 [ run ] triggered by Bot. Commit: 7e92ce0 Link to invocation

@dominicshanshan
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37940 [ run ] completed with state SUCCESS. Commit: 7e92ce0
/LLM/main/L0_MergeRequest_PR pipeline #29384 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #37977 [ run ] triggered by Bot. Commit: 7e92ce0 Link to invocation

@dominicshanshan
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37979 [ run ] triggered by Bot. Commit: 7e92ce0 Link to invocation

@2ez4bz 2ez4bz changed the title [None][chroe] Mass integration of release/1.2 weekly - 6th [None][chore] Mass integration of release/1.2 weekly - 6th Mar 6, 2026
@tensorrt-cicd
Collaborator

PR_Github #37979 [ run ] completed with state SUCCESS. Commit: 7e92ce0
/LLM/main/L0_MergeRequest_PR pipeline #29414 completed with status: 'SUCCESS'

Link to invocation

full:DGX_H100/unittest/_torch/auto_deploy/unit/multigpu/transformations/library/test_bmm_sharding.py::test_sharding[1-1] SKIP (https://nvbugs/5936322) is duplicated by unittest/_torch/auto_deploy/unit/multigpu/transformations/library/test_bmm_sharding.py::test_sharding[1-1] SKIP (https://nvbugs/5875203).

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
@chzblych
Collaborator

chzblych commented Mar 7, 2026

/bot reuse-pipeline

@tensorrt-cicd
Collaborator

PR_Github #38094 [ reuse-pipeline ] triggered by Bot. Commit: d170f8d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #38094 [ reuse-pipeline ] completed with state SUCCESS. Commit: d170f8d
Reusing PR_Github #37979 for commit d170f8d

Link to invocation

@chzblych chzblych merged commit a0a9e33 into NVIDIA:main Mar 7, 2026
5 checks passed
