
Fix trace-bmm-fp8 test: B should be K-major for subword types#3184

Open
xrq-phys wants to merge 1 commit into flashinfer-ai:main from xrq-phys:fix/trace-bmm-fp8

Conversation


@xrq-phys xrq-phys commented Apr 26, 2026

📌 Description

Closes #3188

Issue: An upstream change introduced a failing CI test case: tests/trace/test_reference_correctness.py::test_bmm_fp8_reference_correctness

Cause: flashinfer.bmm_bf16 and flashinfer.bmm_fp8 (and any other sub-32-bit dtype) expect K-major B inputs. The cutlass backend checks for this, but the default fp8 backend does not, silently producing wrong results.
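The fix can be sketched in plain PyTorch (shapes here are hypothetical; flashinfer itself is not needed to see the stride effect). The transpose→contiguous→transpose round-trip leaves the logical tensor unchanged but rewrites the memory layout so the innermost stride walks along K:

```python
import torch

# Hypothetical shapes for illustration only.
B, M, K, N = 2, 4, 8, 16
b = torch.randn(B, K, N)  # row-major: strides (K*N, N, 1), i.e. N-major

# Round-trip through contiguous() to obtain a K-major (column-major) layout:
# logical values are unchanged, but the K dimension now has stride 1.
b_kmaj = b.transpose(1, 2).contiguous().transpose(1, 2)

assert torch.equal(b, b_kmaj)            # same logical tensor
assert b_kmaj.stride() == (K * N, 1, K)  # K-stride == 1 -> K-major
```

Because only the strides change, the reference computation can keep consuming the original `b` while the kernel receives `b_kmaj`.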

🔍 Related Issues

Current CI runs

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or via your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Tests
    • Improved correctness validation for BF16 and FP8 matrix operations by adjusting test inputs to align with kernel expectations.
    • Kept original reference comparisons unchanged to ensure consistent validation.
    • Preserved existing behavior for skipping when kernels are unavailable and retained the same closeness thresholds.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 25e3b61a-6e00-4f0b-9a76-ae2be0fed923

📥 Commits

Reviewing files that changed from the base of the PR and between 162eca5 and cebc7a3.

📒 Files selected for processing (1)
  • tests/trace/test_reference_correctness.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/trace/test_reference_correctness.py

📝 Walkthrough

Walkthrough

The BMM correctness tests now create K-major versions of the batched b operands (b_kmaj, b_fp8_kmaj) via transpose→contiguous→transpose before invoking flashinfer.bmm_bf16 / flashinfer.bmm_fp8; reference trace comparisons still use the original b inputs. No public APIs changed.

Changes

Cohort / File(s) Summary
Test Preprocessing
tests/trace/test_reference_correctness.py
Create layout-adjusted tensors for b (b_kmaj, b_fp8_kmaj) using transpose→contiguous→transpose; call flashinfer.bmm_bf16 / flashinfer.bmm_fp8 with these adjusted tensors while keeping reference traces computed from original b/b_fp8. Exception/skip logic and closeness checks unchanged.
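A minimal sketch of the test-side adjustment described above, using torch.bmm as a stand-in for both paths (flashinfer.bmm_bf16 and the real test shapes are assumptions here):

```python
import torch

B, M, K, N = 2, 4, 8, 16  # hypothetical shapes
a = torch.randn(B, M, K)
b = torch.randn(B, K, N)

# K-major copy goes to the kernel; the reference still reads the original b.
b_kmaj = b.transpose(1, 2).contiguous().transpose(1, 2)

ref = torch.bmm(a, b)       # reference path: original layout
out = torch.bmm(a, b_kmaj)  # kernel-path stand-in, e.g. flashinfer.bmm_bf16(a, b_kmaj)

# Logical values are identical, so the existing closeness thresholds still apply.
assert torch.allclose(ref, out)
```

Since `b_kmaj` holds the same values as `b`, keeping the reference on the original tensor is correct; only the kernel call needs the layout-adjusted copy.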

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

op: gemm

Suggested reviewers

  • saltyminty
  • bkryu
  • aleozlx
  • sricketts
  • yongwww
  • yzh119
  • cyx-6

Poem

🐰 I hop through tensors, neat and spry,
I flip and hold them, nice and dry,
I make them solid, row by row,
Kernels dance where numbers flow,
A snack of bytes — then off I fly.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check — ✅ Passed: Title clearly and concisely summarizes the main change: fixing the K-major layout requirement for the B tensor in BMM tests for subword types (FP8, BF16).
  • Description check — ✅ Passed: Description includes issue closure, root-cause analysis, and checked pre-commit/test verification items, meeting template requirements.
  • Linked Issues check — ✅ Passed: Code changes enforce K-major layout for the B tensor in BMM tests, directly addressing issue #3188's requirement to fix the failing test by ensuring proper tensor layout for subword dtypes.
  • Out of Scope Changes check — ✅ Passed: All modifications are confined to test input preprocessing that enforces K-major layout, staying within the scope of fixing the BMM correctness test.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which exceeds the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@xrq-phys
Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !605 has been created, and the CI pipeline #49554921 is currently running. I'll report back once the pipeline job completes.

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
tests/trace/test_reference_correctness.py (1)

2170-2172: Correct K-major preprocessing for bmm_bf16.

The transpose(1,2).contiguous().transpose(1,2) idiom produces a (B, K, N) view with stride (N*K, 1, K) (K-stride = 1), which matches the column-major layout bmm_bf16 documents for B. Logical values are preserved, so passing original b to the reference remains correct.

Optional: a one-line comment would help future readers understand why the seemingly-noop pattern is necessary — i.e., sub-32-bit BMMs require K-major B, but only the cutlass backend enforces it.

📝 Optional clarifying comment
     a = torch.randn(B, M, K, dtype=torch.bfloat16, device="cuda")
     b = torch.randn(B, K, N, dtype=torch.bfloat16, device="cuda")
+    # bmm_bf16 requires B in K-major (column-major) layout; round-trip through
+    # contiguous() to get strides (N*K, 1, K) without changing logical values.
     b_kmaj = b.transpose(1, 2).contiguous().transpose(1, 2)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/trace/test_reference_correctness.py` around lines 2170 - 2172, The
pre-processing using b_kmaj = b.transpose(1, 2).contiguous().transpose(1, 2) is
a no-op for logical values but was used to get K-major strides required only by
the cutlass backend; update the test to pass the original b (not b_kmaj) to the
reference path and keep the cutlass call as-is (api = flashinfer.bmm_bf16(a,
b_kmaj, backend="cutlass")), and add a one-line comment near b_kmaj explaining
that the transpose/contiguous/transpose is only to enforce K-major memory layout
for cutlass and that logical values are unchanged so the reference uses the
original b.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c60b9edf-6ad1-467f-958f-dcc573d8be88

📥 Commits

Reviewing files that changed from the base of the PR and between 5e1318c and 1c1a4bd.

📒 Files selected for processing (1)
  • tests/trace/test_reference_correctness.py

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request modifies the bmm_bf16 and bmm_fp8 reference correctness tests to utilize K-major layout tensors when calling the FlashInfer API. Feedback suggests also using these K-major tensors in the reference implementation calls to maintain consistency and ensure that both the kernel and the reference are tested against the same memory representation.

Comment thread tests/trace/test_reference_correctness.py
Comment thread tests/trace/test_reference_correctness.py
@xrq-phys
Author

@saltyminty could you approve / merge this PR?

#2711 SageAttn (presumably other CI runs also) is blocked by this failure.

CC @YangXu1990uiuc for vis.

Thanks!

@xrq-phys
Author

/bot help

@flashinfer-bot
Collaborator

FlashInfer CI Bot

Available Commands:

  • /bot run - Mirror this PR to GitLab and run CI pipeline
  • /bot status - Check current pipeline status
  • /bot stop - Cancel running pipeline
  • /bot help - Show this help message

Authorization:

Only whitelisted users can trigger CI. Contact a maintainer for access.

How It Works:

  1. Authorized user comments /bot run on a PR
  2. Bot mirrors PR to internal GitLab
  3. GitLab CI pipeline runs automatically
  4. Results are posted back to this PR

Note: Any whitelisted user can trigger CI for any PR, not just their own.

@saltyminty
Collaborator

CI looks good (failures are node allocation timeouts)

@saltyminty saltyminty self-assigned this Apr 27, 2026
@saltyminty saltyminty enabled auto-merge (squash) April 27, 2026 22:56
@xrq-phys
Author

@saltyminty can we skip CI here? Or do we have to wait until nodes are back?

@saltyminty saltyminty disabled auto-merge April 28, 2026 05:32
@saltyminty
Collaborator

We can skip internal CI since this change should be safe, but need the pre-merge checks to pass before the merge button appears

Signed-off-by: Ruqing Xu <7891482+xrq-phys@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

[Bug] test_bmm_fp8_reference_correctness fails with cos_sim=-0.0019 < 0.99
