fix: Use fake mxfp8 quant on intermediate tensor of MoE #1

Open
guyueh1 wants to merge 1 commit into TomerBN-Nvidia:fake-mxfp8 from guyueh1:fake-mxfp8

guyueh1 commented Dec 1, 2025

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before-and-after comparison of the results, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
TomerBN-Nvidia pushed a commit that referenced this pull request Dec 2, 2025
…iton-kernel

Triton kernel, CUDA graphs, and start of prefix caching fixes
