
Simplified NVFP4 quantize kernel for Torch API (#152)#152

Open
jwfromm wants to merge 1 commit into meta-pytorch:main from jwfromm:export-D93169309
Conversation


@jwfromm jwfromm commented Feb 14, 2026

Summary:

This diff reworks the MSLK NVFP4 stacked quantize kernel to be simpler. As can be seen in gemm_ops.py, the new op minimizes the extra artifacts needed to use the torch API for fp4fp4bf16_grouped_mm. The kernel matches the performance of the mega kernel and, as the added tests show, should be robust.

Reviewed By: jiawenliu64

Differential Revision: D93169309
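For readers unfamiliar with the format: the following is an illustrative, pure-Python sketch of NVFP4-style block quantization (FP4 e2m1 values with one shared scale per 16-element block). It is a hypothetical reference for the scheme only, not the CUDA kernel or the op added in this diff; function names here are invented for illustration.

```python
# Hypothetical reference sketch of NVFP4-style block quantization.
# NVFP4 stores FP4 e2m1 elements with a per-block scale; e2m1 can
# represent the magnitudes below (max magnitude 6.0).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_nvfp4_block(block):
    """Quantize one block (<=16 floats) to e2m1 values plus a scale.

    The scale maps the block's absolute max onto 6.0, the largest
    e2m1 magnitude, so the dynamic range of the block is preserved.
    """
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Round to the nearest representable e2m1 magnitude.
        q = min(E2M1_VALUES, key=lambda v: abs(v - mag))
        codes.append(q if x >= 0 else -q)
    return codes, scale


def dequantize_nvfp4_block(codes, scale):
    """Recover approximate values by rescaling the e2m1 codes."""
    return [c * scale for c in codes]
```

In the real kernel the codes are packed two per byte and the scales are themselves stored in a low-precision format; this sketch keeps everything in Python floats to show only the rounding and scaling logic.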

@meta-cla meta-cla bot added the cla signed label Feb 14, 2026

meta-codesync bot commented Feb 14, 2026

@jwfromm has exported this pull request. If you are a Meta employee, you can view the originating Diff in D93169309.

jwfromm added a commit to jwfromm/MSLK-1 that referenced this pull request Mar 10, 2026
jwfromm added a commit to jwfromm/MSLK-1 that referenced this pull request Mar 10, 2026
jwfromm added a commit to jwfromm/MSLK-1 that referenced this pull request Mar 10, 2026
@jwfromm jwfromm force-pushed the export-D93169309 branch 2 times, most recently from 33966e0 to a8a9ea5 on March 12, 2026 19:51
jwfromm added a commit to jwfromm/MSLK-1 that referenced this pull request Mar 12, 2026
jwfromm added a commit to jwfromm/MSLK-1 that referenced this pull request Mar 12, 2026
@meta-codesync meta-codesync bot changed the title Simplified NVFP4 quantize kernel for Torch API Simplified NVFP4 quantize kernel for Torch API (#152) Mar 13, 2026