
[pull] main from NVIDIA:main#17

Merged
pull[bot] merged 1 commit into yingguo-trt:main from NVIDIA:main
Dec 29, 2025


pull[bot] commented Dec 29, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


…kernels (#10304)

Add a transform that replaces torch.ops.auto_deploy.torch_quant_nvfp4_moe
with the optimized torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused.

The fused op currently generates wrong results when the number of rows in the MoE FC1 weights is not divisible by 128,
so torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused is not set as the default FP4 MoE implementation (i.e. the transform is disabled).

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
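
The commit message above describes an op-for-op replacement that stays disabled because of a shape restriction. A minimal sketch of that idea follows, in plain Python rather than the actual auto_deploy transform code. Only the two op names and the 128-row condition come from the commit message. The `Node` class, the `rewrite_nvfp4_moe` function, the `enabled` flag, and the per-node divisibility guard are hypothetical illustrations; the commit itself only says the transform is disabled, not that it checks shapes.

```python
from dataclasses import dataclass

# Op names are taken from the commit message; everything else is hypothetical.
REFERENCE_OP = "torch.ops.auto_deploy.torch_quant_nvfp4_moe"
FUSED_OP = "torch.ops.auto_deploy.trtllm_quant_nvfp4_moe_fused"
ROW_ALIGNMENT = 128  # fused op is reported wrong when FC1 rows % 128 != 0


@dataclass
class Node:
    """Toy stand-in for a graph node (not a torch.fx node)."""
    target: str
    fc1_rows: int  # number of rows in the MoE FC1 weights


def rewrite_nvfp4_moe(nodes, enabled=False):
    """Swap the reference op for the fused op where assumed safe.

    Returns the number of nodes rewritten. With enabled=False (mirroring
    the disabled-by-default state in the commit message) nothing changes.
    """
    if not enabled:
        return 0
    swapped = 0
    for node in nodes:
        # Assumed guard: only rewrite when the FC1 row count is 128-aligned.
        if node.target == REFERENCE_OP and node.fc1_rows % ROW_ALIGNMENT == 0:
            node.target = FUSED_OP
            swapped += 1
    return swapped
```

Under these assumptions, a node with 256 FC1 rows would be rewritten when the flag is on, while a node with 192 rows would keep the reference op.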
pull[bot] locked and limited conversation to collaborators Dec 29, 2025
pull[bot] added the ⤵️ pull label Dec 29, 2025
pull[bot] merged commit 966231d into yingguo-trt:main Dec 29, 2025
1 of 3 checks passed