[pull] main from NVIDIA:main by pull[bot] · Pull Request #17 · yingguo-trt/TensorRT-LLM

pull · 2025-12-29T21:28:52Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


…kernels (#10304)

Add a transform to replace torch.ops.auto_deploy.torch_quant_nvfp4_moe with the optimized torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused.

The fused op currently generates wrong results when the number of rows in the MoE FC1 weights is not divisible by 128, so torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused is not set as the default FP4 MoE implementation (i.e. the transform is disabled).

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
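The divisibility condition in the commit message can be illustrated with a small sketch. This is not the code from the commit: the commit disables the transform entirely, and the helper names and the per-shape fallback shown here are hypothetical. Only the two op names and the 128-row figure come from the message above.

```python
# Hypothetical sketch: choose between the two ops named in the commit message
# based on the FC1 row count. The real transform is simply disabled by default.

FC1_ROW_ALIGNMENT = 128  # row multiple cited in the commit message

REFERENCE_OP = "torch.ops.auto_deploy.torch_quant_nvfp4_moe"
FUSED_OP = "torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused"


def fc1_rows_are_aligned(fc1_weight_rows: int, alignment: int = FC1_ROW_ALIGNMENT) -> bool:
    """Return True when the FC1 row count is a positive multiple of `alignment`."""
    return fc1_weight_rows > 0 and fc1_weight_rows % alignment == 0


def choose_moe_op(fc1_weight_rows: int) -> str:
    """Return the name of the op a shape-aware guard would select (assumption)."""
    if fc1_rows_are_aligned(fc1_weight_rows):
        return FUSED_OP
    return REFERENCE_OP


if __name__ == "__main__":
    for rows in (256, 200):
        print(rows, "->", choose_moe_op(rows))
```

A guard like this would keep the fused path only for shapes outside the failure case the commit reports; whether the project intends such a fallback is not stated in the message.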

pull Bot locked and limited conversation to collaborators Dec 29, 2025

pull Bot added the ⤵️ pull label Dec 29, 2025

pull Bot merged commit 966231d into yingguo-trt:main Dec 29, 2025
1 of 3 checks passed

pull[bot] merged 1 commit into yingguo-trt:main from NVIDIA:main
