Simplified NVFP4 quantize kernel for Torch API (#152) by jwfromm · Pull Request #152 · meta-pytorch/MSLK

jwfromm · 2026-02-14T00:15:08Z

Summary:

This diff reworks the MSLK NVFP4 stacked quantize kernel to make it simpler. As gemm_ops.py shows, the new op minimizes the extra artifacts needed to use the Torch API for fp4fp4bf16_grouped_mm. The kernel is as performant as the mega kernel, and the added tests indicate that it is robust.

Reviewed By: jiawenliu64

Differential Revision: D93169309
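
For readers unfamiliar with the format, here is a rough pure-Python sketch of what blockwise FP4 (E2M1) quantization does conceptually. It is not the kernel in this diff and does not use any MSLK or Torch API. All names (`quantize_block`, `quantize_rows`, `dequantize`) are hypothetical. It assumes 16-element blocks that share one scale, and it keeps that scale as a plain Python float, whereas NVFP4 stores block scales in FP8 alongside a per-tensor scale.

```python
# Illustrative reference only; NOT the MSLK kernel. All names are hypothetical.

# Magnitudes representable by a 4-bit E2M1 value (sign is stored separately).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumption: one shared scale per 16 elements


def quantize_block(block):
    """Return (scale, codes) for one block; codes are signed E2M1 values."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest E2M1 value (6.0).
    # An all-zero block gets a scale of 1.0 to avoid dividing by zero.
    scale = amax / E2M1_VALUES[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        # Pick the nearest representable magnitude. Ties go to the smaller
        # magnitude here, which is a simplification of hardware rounding.
        mag = min(E2M1_VALUES, key=lambda v: abs(v - target))
        codes.append(-mag if x < 0 else mag)
    return scale, codes


def quantize_rows(values):
    """Quantize a flat list whose length is a multiple of BLOCK_SIZE."""
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of BLOCK_SIZE")
    return [
        quantize_block(values[i:i + BLOCK_SIZE])
        for i in range(0, len(values), BLOCK_SIZE)
    ]


def dequantize(blocks):
    """Reconstruct approximate values from (scale, codes) pairs."""
    return [scale * code for scale, codes in blocks for code in codes]
```

A call such as `dequantize(quantize_rows(values))` returns an approximation of `values`. In this sketch the per-element error is at most one block scale, because the widest gap between adjacent E2M1 magnitudes is 2.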

meta-codesync · 2026-02-14T00:15:15Z

@jwfromm has exported this pull request. If you are a Meta employee, you can view the originating Diff in D93169309.


meta-cla bot added the cla signed label Feb 14, 2026

meta-codesync bot added fb-exported meta-exported labels Feb 14, 2026

jwfromm force-pushed the export-D93169309 branch from 7d6210e to 5f8c934 March 10, 2026 21:27

jwfromm force-pushed the export-D93169309 branch from 5f8c934 to 5a07c6e March 10, 2026 21:57

jwfromm force-pushed the export-D93169309 branch 2 times, most recently from 33966e0 to a8a9ea5 March 12, 2026 19:51

jwfromm force-pushed the export-D93169309 branch from a8a9ea5 to ae1f096 March 12, 2026 19:53

meta-codesync bot changed the title ~~Simplified NVFP4 quantize kernel for Torch API~~ Simplified NVFP4 quantize kernel for Torch API (#152) Mar 13, 2026

jwfromm force-pushed the export-D93169309 branch from ae1f096 to 662a315 March 13, 2026 16:52

jwfromm wants to merge 1 commit into meta-pytorch:main from jwfromm:export-D93169309