
[TRTLLM-10303][feat] Deprecate trtllm-serve CLI options#12106

Open
JunyiXu-nv wants to merge 1 commit into NVIDIA:main from JunyiXu-nv:dev-junyix-feat-serve-cli-deprecations

Conversation


@JunyiXu-nv JunyiXu-nv commented Mar 11, 2026

  • TRTLLM-10303: Deprecate --moe_cluster_parallel_size / --cluster_size (smart router feature no longer supported)
  • TRTLLM-10230: Deprecate --metrics-log-interval in disaggregated command (not connected to any functionality)
  • TRTLLM-10228: Deprecate --fail_fast_on_attention_window_too_large, default to True (only affects TRT backend which is being removed)

All deprecated options emit DeprecationWarning when used and have updated help text. API stability references updated accordingly.
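The "emit DeprecationWarning when used" behavior can be sketched with Click (assuming the trtllm-serve CLI is Click-based); the option name, default, and messages below are illustrative stand-ins, not the actual serve.py code:

```python
import warnings

import click


def _warn_deprecated(ctx, param, value):
    # Click option callback: warn only when the user actually supplied the flag.
    if value is not None:
        warnings.warn(
            f"--{param.name} is deprecated and will be removed in a future release.",
            DeprecationWarning,
            stacklevel=2,
        )
    return value


@click.command()
@click.option("--cluster_size", type=int, default=None,
              callback=_warn_deprecated,
              help="[DEPRECATED] The smart router feature is no longer supported.")
def serve(cluster_size):
    # Stand-in for the real entry point; just echoes the parsed value.
    click.echo(f"cluster_size={cluster_size}")
```

Keeping the warning in a shared callback means each deprecated option only needs the callback wired up plus an updated help string.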

Summary by CodeRabbit

  • Configuration Updates
    • Attention window size validation now defaults to fail-fast behavior for improved error detection during initialization.
    • Three configuration options marked as deprecated: MOE cluster parallel sizing, attention window validation handling, and metrics logging intervals.
    • Runtime warnings display when deprecated options are used.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

…parallel_size, metrics-log-interval, and fail_fast_on_attention_window_too_large

- TRTLLM-10303: Deprecate --moe_cluster_parallel_size / --cluster_size
  (smart router feature no longer supported)
- TRTLLM-10230: Deprecate --metrics-log-interval in disaggregated command
  (not connected to any functionality)
- TRTLLM-10228: Deprecate --fail_fast_on_attention_window_too_large, default
  to True (only affects TRT backend which is being removed)

All deprecated options emit DeprecationWarning when used and have updated
help text. API stability references updated accordingly.

Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Made-with: Cursor
@JunyiXu-nv JunyiXu-nv requested review from LinPoly, QiJune and arysef March 11, 2026 08:00
@JunyiXu-nv JunyiXu-nv requested review from a team as code owners March 11, 2026 08:00

coderabbitai bot commented Mar 11, 2026

📝 Walkthrough

This PR deprecates three CLI options (moe_cluster_parallel_size, fail_fast_on_attention_window_too_large, and metrics_log_interval) across the CLI and API layers, flips the default value of fail_fast_on_attention_window_too_large from False to True, and adds runtime deprecation warnings.

Changes

  • CLI Deprecation and Defaults (tensorrt_llm/commands/serve.py): Marked moe_cluster_parallel_size, fail_fast_on_attention_window_too_large, and metrics_log_interval as deprecated in CLI help text; flipped the fail_fast_on_attention_window_too_large default from False to True; added runtime warnings for deprecated options in the serve() and disaggregated() entry points.
  • API Layer Deprecation (tensorrt_llm/llmapi/llm_args.py): Updated moe_cluster_parallel_size status from beta to deprecated; updated fail_fast_on_attention_window_too_large status from prototype to deprecated and changed its default from False to True.
  • Test Reference Updates (tests/unittest/api_stability/references/llm.yaml): Updated moe_cluster_parallel_size status from beta to deprecated to align with the API layer changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Description check ❓ Inconclusive: the PR description gives a clear rationale for the deprecations with JIRA ticket references, but lacks the formal section structure of the template. Resolution: expand the Description section with why these options are being deprecated and what users should do instead, and fill out the Test Coverage section.

✅ Passed checks (1 passed)

  • Title check ✅ Passed: the title clearly summarizes the main change (deprecating trtllm-serve CLI options) and is directly related to all changes in the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/commands/serve.py`:
- Around line 1011-1015: The help text for the CLI option metrics_log_interval
incorrectly claims the option is disconnected while its value is still threaded
into OpenAIDisaggServer(metrics_interval_secs=metrics_log_interval); either stop
passing metrics_log_interval into OpenAIDisaggServer (remove the argument or
pass None/default) or soften the help text to indicate it is deprecated but
still forwarded to the server (e.g., "Deprecated: will be removed in a future
release; currently forwarded to OpenAIDisaggServer as metrics_interval_secs").
Update both occurrences where the flag is defined and where it's passed to
OpenAIDisaggServer to keep behavior and messaging consistent.
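The first resolution path suggested above (stop forwarding the value so behavior matches the help text) could look like this minimal sketch; the OpenAIDisaggServer class here is a stand-in that models only the keyword mentioned in the comment:

```python
import warnings

import click


class OpenAIDisaggServer:
    # Stand-in for the real server class; only metrics_interval_secs is modeled.
    def __init__(self, metrics_interval_secs=None):
        self.metrics_interval_secs = metrics_interval_secs


@click.command()
@click.option("--metrics-log-interval", type=float, default=None,
              help="[DEPRECATED] Not connected to any functionality; ignored.")
def disaggregated(metrics_log_interval):
    if metrics_log_interval is not None:
        warnings.warn(
            "--metrics-log-interval is deprecated and is ignored.",
            DeprecationWarning,
            stacklevel=2,
        )
    # Deliberately do NOT forward the deprecated value, so the help text
    # ("not connected to any functionality") and the behavior agree.
    server = OpenAIDisaggServer()
    click.echo(f"metrics_interval_secs={server.metrics_interval_secs}")
```

The alternative path is the reverse trade-off: keep forwarding the value and soften the help text instead.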

In `@tensorrt_llm/llmapi/llm_args.py`:
- Around line 2233-2236: Add a runtime deprecation warning in the model-layer
validators so explicitly set deprecated fields (e.g., moe_cluster_parallel_size
and the other deprecated fields around lines 2627-2631) trigger a warning when
callers provide them; update BaseLlmArgs (and TrtLlmArgs if it overrides
validation) to include a root_validator or field-specific `@validator` that checks
if the field value is not None and then emits a deprecation warning (use
warnings.warn(..., DeprecationWarning) or the project logging facility) while
preserving the existing value, and ensure the validator names reference the
exact field names (moe_cluster_parallel_size and the other deprecated field
names) so config-driven and direct API usage both surface the deprecation.
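The suggested model-layer validator could be sketched as follows, assuming pydantic v2 and modeling only two illustrative fields of BaseLlmArgs:

```python
import warnings
from typing import Optional

from pydantic import BaseModel, model_validator

# Names of the deprecated fields; illustrative, mirroring the review comment.
_DEPRECATED_FIELDS = {
    "moe_cluster_parallel_size",
    "fail_fast_on_attention_window_too_large",
}


class BaseLlmArgs(BaseModel):
    moe_cluster_parallel_size: Optional[int] = None
    fail_fast_on_attention_window_too_large: bool = True

    @model_validator(mode="after")
    def _warn_deprecated_fields(self):
        # model_fields_set lists only fields the caller explicitly set,
        # so leaving a deprecated field at its default stays silent.
        for name in _DEPRECATED_FIELDS & self.model_fields_set:
            warnings.warn(
                f"{name} is deprecated and will be removed in a future release.",
                DeprecationWarning,
                stacklevel=2,
            )
        return self
```

Because the check runs in the model rather than the CLI, both config-driven and direct LLM-API usage surface the same warning.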

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b143a766-b418-4c82-a2a6-db463801f9ca

📥 Commits

Reviewing files that changed from the base of the PR and between e03e361 and 9ba721c.

📒 Files selected for processing (3)
  • tensorrt_llm/commands/serve.py
  • tensorrt_llm/llmapi/llm_args.py
  • tests/unittest/api_stability/references/llm.yaml

@JunyiXu-nv (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator) commented:

PR_Github #38562 [ run ] triggered by Bot. Commit: 9ba721c Link to invocation

@tensorrt-cicd (Collaborator) commented:

PR_Github #38562 [ run ] completed with state SUCCESS. Commit: 9ba721c
/LLM/main/L0_MergeRequest_PR pipeline #29904 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation
