
[fmhav2] skip fp8 tests and add warning #3050

Merged
jimmyzho merged 4 commits into flashinfer-ai:main from jimmyzho:skip-fmhav2
Apr 21, 2026

Conversation

@jimmyzho
Contributor

@jimmyzho jimmyzho commented Apr 13, 2026

📌 Description

This PR re-enables the FMHA v2 prefill test suite while properly isolating the known FP8 hang issue.

Previously, the entire test_fmha_v2_prefill.py test file was skipped via a blanket pytestmark = pytest.mark.skip(...), which meant no FMHA v2 prefill tests ran at all — including non-FP8 configurations
(float16, bfloat16) that work correctly.

Changes:

  1. Removed the file-level pytestmark skip — non-FP8 tests (float16/bfloat16) now run again in CI.

  2. Commented out FP8 dtype parametrize entries (float8_e4m3fn) in all test functions instead of skipping
    them at runtime. This avoids test collection overhead for known-broken configurations and makes it clear which
    combinations are disabled.

  3. Removed now-redundant runtime skips — the per-case pytest.skip() calls for FP8 sliding window bugs,
    FP8→FP8 output hangs, and sliding window hangs are no longer needed since those dtype combinations are no longer
    parametrized.

  4. Added a logging.warning() in trtllm_fmha_v2_prefill() when FP8 e4m3 inputs are detected, alerting
    users that these kernels are known to hang on SM90.
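The dtype guard described in item 4 can be sketched as follows. This is a minimal, hypothetical sketch: the helper name and the string stand-in for torch dtypes are illustrative only; the actual code in flashinfer/prefill.py compares query.dtype against torch.float8_e4m3fn (as quoted later in the review thread).

```python
import logging

def warn_if_fp8_e4m3(dtype_name: str) -> bool:
    """Hypothetical sketch of the FP8 e4m3 guard; the real check compares
    query.dtype against torch.float8_e4m3fn when that attribute exists."""
    is_e4m3 = dtype_name == "float8_e4m3fn"
    if is_e4m3:
        # Alert the user before any kernel launch is attempted.
        logging.warning(
            "The FP8 (e4m3) kernels are currently known to hang on SM90."
        )
    return is_e4m3
```

The warning fires once per call; callers that want to suppress it can filter the root logger in the usual way.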

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Bug Fixes

    • Added a runtime warning when using FP8 e4m3 kernels that are known to hang on SM90 devices.
  • Tests

    • Re-enabled FMHA v2 prefill tests for non-FP8 configurations by removing blanket skips.
    • Simplified test skip logic to a single FP8-specific skip; sliding-window cases are permitted where applicable.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a73c242f-3ca4-4397-807d-07cca90cb998

📥 Commits

Reviewing files that changed from the base of the PR and between acb0192 and 0cf9d6f.

📒 Files selected for processing (1)
  • flashinfer/prefill.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • flashinfer/prefill.py

📝 Walkthrough

Walkthrough

Adds a runtime logging.warning in trtllm_fmha_v2_prefill when input dtype is FP8 e4m3, and updates tests/attention/test_fmha_v2_prefill.py to remove the module-level skip, simplify FP8 skips (early skip for e4m3), change one FP8 parameterization tuple, and allow sliding-window cases where applicable.

Changes

Cohort / File(s) and Summary:

  • Production Warning (flashinfer/prefill.py): Add a logging.warning when query is FP8 e4m3 in trtllm_fmha_v2_prefill, warning about the known SM90 hang before the existing FP8/device checks.
  • Test Reorganization (tests/attention/test_fmha_v2_prefill.py): Remove the module-level skip; replace multiple in-test FP8/sliding-window skips with a single early pytest.skip for torch.float8_e4m3fn; change one FP8 (dtype, o_dtype) entry from (float8_e4m3fn, bfloat16) to (float8_e4m3fn, float16); allow SLIDING_WINDOW mask cases where other preconditions permit.
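The single early skip mentioned above could look roughly like this. The helper name and the string dtype stand-in are hypothetical; the real tests compare the parametrized dtype directly against torch.float8_e4m3fn at the top of each test function.

```python
import pytest

def maybe_skip_fp8(dtype_name: str) -> None:
    """Hypothetical early-skip helper; the real tests check
    dtype == torch.float8_e4m3fn before doing any setup work."""
    if dtype_name == "float8_e4m3fn":
        # Skip before allocating tensors or launching kernels,
        # so the known hang is never reached.
        pytest.skip("FP8 e4m3 prefill kernels are known to hang on SM90")
```

Skipping at the top of the test body (rather than at collection time) keeps the cases visible as "skipped" in the test report.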

Sequence Diagram(s)

(omitted — changes are a warning + test skip/parameterization updates and do not introduce a new multi-component sequential flow)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • yzh119
  • cyx-6
  • aleozlx
  • samuellees
  • nv-yunzheq
  • sricketts

Poem

🐰 A tiny warning hops along,
FP8 e4m3 hums a cautious song,
Tests now wake and paths unwind,
Sliding windows free to mind,
Carrots, logs, and code — all strong. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title clearly summarizes the main changes: skipping FP8 tests and adding a warning for SM90 hangs.
  • Description check ✅ Passed: The description provides detailed context about re-enabling non-FP8 tests while isolating the FP8 hang issue, with clear explanations of all four main changes.




Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a warning for FP8 (e4m3) kernels known to hang on SM90 architectures and updates the test suite to skip FP8-related test cases. Feedback was provided regarding the placement of the SM90 hang warning, suggesting it should be moved after the SM120 compatibility check to avoid misleading users on Blackwell hardware.

Comment thread flashinfer/prefill.py
is_e4m3 = (
    query.dtype == torch.float8_e4m3fn if hasattr(torch, "float8_e4m3fn") else False
)
if is_e4m3:
    logging.warning("The FP8 (e4m3) kernels are currently known to hang on SM90.")
Contributor


Severity: medium

The warning about FP8 kernels hanging on SM90 is issued before the check for SM120 (Blackwell) support. If a user is on an SM120 device, they will see this warning before receiving a ValueError stating that FP8 is not yet supported on their architecture. This is confusing as the hang is specific to SM90. It would be better to move the warning after the SM120 check so it only appears for SM90 users.
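The reordering suggested in this review comment could be sketched as below. The function name, the sm_version parameter, and the error message are illustrative assumptions, not the actual flashinfer API; the point is only the ordering of the architecture checks.

```python
import logging

def check_fp8_kernel_support(sm_version: int, is_e4m3: bool) -> None:
    """Illustrative ordering: reject unsupported architectures first, so
    SM120 users get a clear ValueError rather than an SM90-specific hang
    warning immediately followed by an error."""
    if not is_e4m3:
        return
    if sm_version >= 120:
        # Blackwell-class devices: FP8 path not supported yet.
        raise ValueError("FP8 is not yet supported on this architecture.")
    if sm_version == 90:
        # Hopper-class devices: FP8 runs but is known to hang.
        logging.warning(
            "The FP8 (e4m3) kernels are currently known to hang on SM90."
        )
```

With this ordering, the hang warning is only ever emitted for SM90, which is exactly the narrowing the reviewer asked for.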

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
tests/attention/test_fmha_v2_prefill.py (1)

785-788: Prefer explicit skipped params over commented-out cases.

Using pytest.param(..., marks=pytest.mark.skip(...)) keeps FP8 cases visible in test reports while still avoiding hangs.

Suggested refactor
 @pytest.mark.parametrize(
     ("dtype", "o_dtype"),
     [
         (torch.float16, torch.float16),
         (torch.bfloat16, torch.bfloat16),
-        # todo(jimmyzho) skip all fp8 tests due to unmitigated hangs
-        # (torch.float8_e4m3fn, torch.float8_e4m3fn),
-        # (torch.float8_e4m3fn, torch.bfloat16),
-        # (torch.float8_e4m3fn, torch.float16),
+        pytest.param(
+            torch.float8_e4m3fn,
+            torch.float8_e4m3fn,
+            marks=pytest.mark.skip(reason="Known FP8 e4m3 hangs (tracked)"),
+        ),
+        pytest.param(
+            torch.float8_e4m3fn,
+            torch.bfloat16,
+            marks=pytest.mark.skip(reason="Known FP8 e4m3 hangs (tracked)"),
+        ),
+        pytest.param(
+            torch.float8_e4m3fn,
+            torch.float16,
+            marks=pytest.mark.skip(reason="Known FP8 e4m3 hangs (tracked)"),
+        ),
     ],
 )
@@
 @pytest.mark.parametrize(
     ("dtype", "o_dtype"),
     [
         (torch.float16, torch.float16),
         (torch.bfloat16, torch.bfloat16),
-        # todo(jimmyzho) skip all fp8 tests due to unmitigated hangs
-        # (torch.float8_e4m3fn, torch.float16),
+        pytest.param(
+            torch.float8_e4m3fn,
+            torch.float16,
+            marks=pytest.mark.skip(reason="Known FP8 e4m3 hangs (tracked)"),
+        ),
     ],
 )

Also applies to: 861-863

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/attention/test_fmha_v2_prefill.py` around lines 785 - 788, Replace the
commented-out FP8 param tuples with explicit pytest skip params so they remain
visible in test reports; for each commented tuple (e.g. the FP8 cases near the
block in tests/attention/test_fmha_v2_prefill.py and the similar block at lines
~861-863), add them back as pytest.param((torch.float8_e4m3fn,
torch.float8_e4m3fn), marks=pytest.mark.skip(reason="skipping FP8 tests due to
hangs")) (and likewise for the other two combinations) so the cases are present
but skipped, preserving the original tuple values and adding a clear skip
reason.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0b33414a-2524-46f6-b6e6-a53e458da02d

📥 Commits

Reviewing files that changed from the base of the PR and between e64ae8b and c4c4559.

📒 Files selected for processing (2)
  • flashinfer/prefill.py
  • tests/attention/test_fmha_v2_prefill.py

@jimmyzho
Contributor Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !540 has been created, and the CI pipeline #48429687 is currently running. I'll report back once the pipeline job completes.

Comment thread tests/attention/test_fmha_v2_prefill.py Outdated
@jimmyzho jimmyzho mentioned this pull request Apr 14, 2026
@jimmyzho jimmyzho requested a review from qsang-nv as a code owner April 15, 2026 21:08
@jimmyzho
Contributor Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !540 has been updated with latest changes, and the CI pipeline #48633505 is currently running. I'll report back once the pipeline job completes.

@jimmyzho
Contributor Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !540 has been updated with latest changes, and the CI pipeline #48806071 is currently running. I'll report back once the pipeline job completes.

@jimmyzho jimmyzho enabled auto-merge (squash) April 21, 2026 00:15
@jimmyzho jimmyzho merged commit 9e3d8b9 into flashinfer-ai:main Apr 21, 2026
27 of 41 checks passed
