[None][fix] Fix FP8 per-tensor torch.compile graph break in dynamic quantization #11759
Conversation
📝 Walkthrough: A namespace path is updated for a per-tensor quantization operator in the linear module.
…uantization

The C++ op tensorrt_llm::quantize_e4m3_per_tensor (registered via TORCH_LIBRARY_FRAGMENT in fp8Op.cpp) lacks a register_fake implementation. Without register_fake, torch.compile's Dynamo tracer cannot infer output shape/dtype metadata, causing a graph break at every dynamic quantization call.

Add register_fake for tensorrt_llm::quantize_e4m3_per_tensor in cpp_custom_ops.py, matching the pattern already used for the static variant (static_quantize_e4m3_per_tensor).

Impact on FLUX.2 (B200, 1024x1024, 50 steps, torch.compile):

- Before: 36 subgraphs, 491 traced nodes (~8% compile coverage)
- After: 1 subgraph, 6,431 traced nodes (full compile coverage)

No latency change observed (GEMMs dominate runtime), but the fix produces a correct monolithic FX graph that enables future Inductor optimizations requiring whole-graph visibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
Force-pushed from e9829e3 to 3e67458 (Compare)
/bot run
PR_Github #36981 [ run ] triggered by Bot. Commit:
PR_Github #36981 [ run ] completed with state
Merged as discussed with @liji-nv ~ thank you!
…uantization (NVIDIA#11759)

Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary

The tensorrt_llm::quantize_e4m3_per_tensor op lacks a register_fake implementation. Without register_fake, torch.compile's Dynamo tracer cannot infer output shape/dtype metadata, causing a graph break at every dynamic quantization call.

Added register_fake for tensorrt_llm::quantize_e4m3_per_tensor in cpp_custom_ops.py, matching the pattern already used for the static variant (static_quantize_e4m3_per_tensor).

Test plan
Observation

Impact on FLUX.2 (B200, 1024x1024, 50 steps, torch.compile):

- Before: 36 subgraphs, 491 traced nodes (~8% compile coverage)
- After: 1 subgraph, 6,431 traced nodes (full compile coverage)

No latency change observed (GEMMs dominate runtime), but the fix produces a correct monolithic FX graph that enables future Inductor optimizations requiring whole-graph visibility.
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.