
[fmhav2] skip fp8 tests and add warning #3050

Merged
jimmyzho merged 4 commits into flashinfer-ai:main from jimmyzho:skip-fmhav2
Apr 21, 2026

Conversation

@jimmyzho
Contributor

@jimmyzho jimmyzho commented Apr 13, 2026

📌 Description

This PR re-enables the FMHA v2 prefill test suite while properly isolating the known FP8 hang issue.

Previously, the entire test_fmha_v2_prefill.py test file was skipped via a blanket pytestmark = pytest.mark.skip(...), which meant no FMHA v2 prefill tests ran at all — including non-FP8 configurations
(float16, bfloat16) that work correctly.

Changes:

  1. Removed the file-level pytestmark skip — non-FP8 tests (float16/bfloat16) now run again in CI.

  2. Commented out FP8 dtype parametrize entries (float8_e4m3fn) in all test functions instead of skipping
    them at runtime. This avoids test collection overhead for known-broken configurations and makes it clear which
    combinations are disabled.

  3. Removed now-redundant runtime skips — the per-case pytest.skip() calls for FP8 sliding window bugs,
    FP8→FP8 output hangs, and sliding window hangs are no longer needed since those dtype combinations are no longer
    parametrized.

  4. Added a logging.warning() in trtllm_fmha_v2_prefill() when FP8 e4m3 inputs are detected, alerting
    users that these kernels are known to hang on SM90.
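The dtype guard described in item 4 can be sketched as follows. This is a minimal, hypothetical sketch: the helper name and the string stand-in for torch dtypes are illustrative only; the actual code in flashinfer/prefill.py compares query.dtype against torch.float8_e4m3fn (as quoted later in the review thread).

```python
import logging

def warn_if_fp8_e4m3(dtype_name: str) -> bool:
    """Hypothetical sketch of the FP8 e4m3 guard; the real check compares
    query.dtype against torch.float8_e4m3fn when that attribute exists."""
    is_e4m3 = dtype_name == "float8_e4m3fn"
    if is_e4m3:
        # Alert the user before any kernel launch is attempted.
        logging.warning(
            "The FP8 (e4m3) kernels are currently known to hang on SM90."
        )
    return is_e4m3
```

The warning fires once per call; callers that want to suppress it can filter the root logger in the usual way.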

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Bug Fixes

    • Added a runtime warning when using FP8 e4m3 kernels that are known to hang on SM90 devices.
  • Tests

    • Re-enabled FMHA v2 prefill tests for non-FP8 configurations by removing blanket skips.
    • Simplified test skip logic to a single FP8-specific skip; sliding-window cases are permitted where applicable.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a73c242f-3ca4-4397-807d-07cca90cb998

📥 Commits

Reviewing files that changed from the base of the PR and between acb0192 and 0cf9d6f.

📒 Files selected for processing (1)
  • flashinfer/prefill.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • flashinfer/prefill.py

📝 Walkthrough

Walkthrough

Adds a runtime logging.warning in trtllm_fmha_v2_prefill when input dtype is FP8 e4m3, and updates tests/attention/test_fmha_v2_prefill.py to remove the module-level skip, simplify FP8 skips (early skip for e4m3), change one FP8 parameterization tuple, and allow sliding-window cases where applicable.

Changes

Cohort / File(s) and Summary:

  • Production Warning (flashinfer/prefill.py): Add a logging.warning when query is FP8 e4m3 in trtllm_fmha_v2_prefill, warning about the known SM90 hang before the existing FP8/device checks.
  • Test Reorganization (tests/attention/test_fmha_v2_prefill.py): Remove the module-level skip; replace multiple in-test FP8/sliding-window skips with a single early pytest.skip for torch.float8_e4m3fn; change one FP8 (dtype, o_dtype) entry from (float8_e4m3fn, bfloat16) to (float8_e4m3fn, float16); allow SLIDING_WINDOW mask cases where other preconditions permit.
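The single early skip mentioned above could look roughly like this. The helper name and the string dtype stand-in are hypothetical; the real tests compare the parametrized dtype directly against torch.float8_e4m3fn at the top of each test function.

```python
import pytest

def maybe_skip_fp8(dtype_name: str) -> None:
    """Hypothetical early-skip helper; the real tests check
    dtype == torch.float8_e4m3fn before doing any setup work."""
    if dtype_name == "float8_e4m3fn":
        # Skip before allocating tensors or launching kernels,
        # so the known hang is never reached.
        pytest.skip("FP8 e4m3 prefill kernels are known to hang on SM90")
```

Skipping at the top of the test body (rather than at collection time) keeps the cases visible as "skipped" in the test report.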

Sequence Diagram(s)

(omitted — changes are a warning + test skip/parameterization updates and do not introduce a new multi-component sequential flow)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • yzh119
  • cyx-6
  • aleozlx
  • samuellees
  • nv-yunzheq
  • sricketts

Poem

🐰 A tiny warning hops along,
FP8 e4m3 hums a cautious song,
Tests now wake and paths unwind,
Sliding windows free to mind,
Carrots, logs, and code — all strong. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title clearly summarizes the main changes: skipping FP8 tests and adding a warning for SM90 hangs.
  • Description check ✅ Passed: The description provides detailed context about re-enabling non-FP8 tests while isolating the FP8 hang issue, with clear explanations of all four main changes.




Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a warning for FP8 (e4m3) kernels known to hang on SM90 architectures and updates the test suite to skip FP8-related test cases. Feedback was provided regarding the placement of the SM90 hang warning, suggesting it should be moved after the SM120 compatibility check to avoid misleading users on Blackwell hardware.

Comment thread flashinfer/prefill.py
is_e4m3 = (
    query.dtype == torch.float8_e4m3fn if hasattr(torch, "float8_e4m3fn") else False
)
if is_e4m3:
    logging.warning("The FP8 (e4m3) kernels are currently known to hang on SM90.")
Contributor


Severity: medium

The warning about FP8 kernels hanging on SM90 is issued before the check for SM120 (Blackwell) support. If a user is on an SM120 device, they will see this warning before receiving a ValueError stating that FP8 is not yet supported on their architecture. This is confusing as the hang is specific to SM90. It would be better to move the warning after the SM120 check so it only appears for SM90 users.
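The reordering suggested in this review comment could be sketched as below. The function name, the sm_version parameter, and the error message are illustrative assumptions, not the actual flashinfer API; the point is only the ordering of the architecture checks.

```python
import logging

def check_fp8_kernel_support(sm_version: int, is_e4m3: bool) -> None:
    """Illustrative ordering: reject unsupported architectures first, so
    SM120 users get a clear ValueError rather than an SM90-specific hang
    warning immediately followed by an error."""
    if not is_e4m3:
        return
    if sm_version >= 120:
        # Blackwell-class devices: FP8 path not supported yet.
        raise ValueError("FP8 is not yet supported on this architecture.")
    if sm_version == 90:
        # Hopper-class devices: FP8 runs but is known to hang.
        logging.warning(
            "The FP8 (e4m3) kernels are currently known to hang on SM90."
        )
```

With this ordering, the hang warning is only ever emitted for SM90, which is exactly the narrowing the reviewer asked for.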

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
tests/attention/test_fmha_v2_prefill.py (1)

785-788: Prefer explicit skipped params over commented-out cases.

Using pytest.param(..., marks=pytest.mark.skip(...)) keeps FP8 cases visible in test reports while still avoiding hangs.

Suggested refactor
 @pytest.mark.parametrize(
     ("dtype", "o_dtype"),
     [
         (torch.float16, torch.float16),
         (torch.bfloat16, torch.bfloat16),
-        # todo(jimmyzho) skip all fp8 tests due to unmitigated hangs
-        # (torch.float8_e4m3fn, torch.float8_e4m3fn),
-        # (torch.float8_e4m3fn, torch.bfloat16),
-        # (torch.float8_e4m3fn, torch.float16),
+        pytest.param(
+            torch.float8_e4m3fn,
+            torch.float8_e4m3fn,
+            marks=pytest.mark.skip(reason="Known FP8 e4m3 hangs (tracked)"),
+        ),
+        pytest.param(
+            torch.float8_e4m3fn,
+            torch.bfloat16,
+            marks=pytest.mark.skip(reason="Known FP8 e4m3 hangs (tracked)"),
+        ),
+        pytest.param(
+            torch.float8_e4m3fn,
+            torch.float16,
+            marks=pytest.mark.skip(reason="Known FP8 e4m3 hangs (tracked)"),
+        ),
     ],
 )
@@
 @pytest.mark.parametrize(
     ("dtype", "o_dtype"),
     [
         (torch.float16, torch.float16),
         (torch.bfloat16, torch.bfloat16),
-        # todo(jimmyzho) skip all fp8 tests due to unmitigated hangs
-        # (torch.float8_e4m3fn, torch.float16),
+        pytest.param(
+            torch.float8_e4m3fn,
+            torch.float16,
+            marks=pytest.mark.skip(reason="Known FP8 e4m3 hangs (tracked)"),
+        ),
     ],
 )

Also applies to: 861-863

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/attention/test_fmha_v2_prefill.py` around lines 785 - 788, Replace the
commented-out FP8 param tuples with explicit pytest skip params so they remain
visible in test reports; for each commented tuple (e.g. the FP8 cases near the
block in tests/attention/test_fmha_v2_prefill.py and the similar block at lines
~861-863), add them back as pytest.param((torch.float8_e4m3fn,
torch.float8_e4m3fn), marks=pytest.mark.skip(reason="skipping FP8 tests due to
hangs")) (and likewise for the other two combinations) so the cases are present
but skipped, preserving the original tuple values and adding a clear skip
reason.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0b33414a-2524-46f6-b6e6-a53e458da02d

📥 Commits

Reviewing files that changed from the base of the PR and between e64ae8b and c4c4559.

📒 Files selected for processing (2)
  • flashinfer/prefill.py
  • tests/attention/test_fmha_v2_prefill.py

@jimmyzho
Contributor Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !540 has been created, and the CI pipeline #48429687 is currently running. I'll report back once the pipeline job completes.

Comment thread tests/attention/test_fmha_v2_prefill.py Outdated
@jimmyzho jimmyzho mentioned this pull request Apr 14, 2026
@jimmyzho jimmyzho requested a review from qsang-nv as a code owner April 15, 2026 21:08
@jimmyzho
Contributor Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !540 has been updated with latest changes, and the CI pipeline #48633505 is currently running. I'll report back once the pipeline job completes.

@jimmyzho
Contributor Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !540 has been updated with latest changes, and the CI pipeline #48806071 is currently running. I'll report back once the pipeline job completes.

@jimmyzho jimmyzho enabled auto-merge (squash) April 21, 2026 00:15
@jimmyzho jimmyzho merged commit 9e3d8b9 into flashinfer-ai:main Apr 21, 2026
27 of 41 checks passed
