fix: pass skip_softmax_threshold_scale_factor to prefill wrapper in test#3154

Merged
saltyminty merged 1 commit into flashinfer-ai:main from PerkzZheng:fix/trtllm-gen-skip-softmax-wrapper-test on Apr 23, 2026

Conversation

@PerkzZheng
Contributor

@PerkzZheng PerkzZheng commented Apr 23, 2026

📌 Description

The wrapper consistency check in _test_trtllm_batch_prefill was calling wrapper_trtllm_gen.run() without skip_softmax_threshold_scale_factor, causing it to default to None (standard attention kernel) while the raw API used 1e-30 (skipsSoftmax kernel variant). Different cubin kernels produce bit-different results, failing the exact-equality assert.
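The mismatch can be illustrated with a minimal, self-contained sketch. The `run_attention` function and its two code paths below are hypothetical stand-ins, not the actual FlashInfer API; they only model how an omitted keyword argument silently selects a different kernel and breaks an exact-equality comparison:

```python
# Hypothetical stand-in for an attention entry point whose kernel choice
# depends on whether skip_softmax_threshold_scale_factor is provided.
def run_attention(q, skip_softmax_threshold_scale_factor=None):
    if skip_softmax_threshold_scale_factor is None:
        # Stand-in for the standard attention kernel.
        return [x for x in q]
    # Stand-in for the skip-softmax kernel variant, which produces
    # bit-different (here: rounded) results.
    return [round(x, 6) for x in q]

q = [0.123456789]
raw_out = run_attention(q, skip_softmax_threshold_scale_factor=1e-30)

# Before the fix: the wrapper path omitted the kwarg, so it defaulted to
# None and ran the other kernel; the exact-equality assert then fails.
wrapper_out_buggy = run_attention(q)
assert raw_out != wrapper_out_buggy

# After the fix: the wrapper forwards the same factor as the raw API,
# so both select the same kernel and the results match exactly.
wrapper_out_fixed = run_attention(q, skip_softmax_threshold_scale_factor=1e-30)
assert raw_out == wrapper_out_fixed
```

The same principle drives the actual fix: the test's wrapper call must forward the same `skip_softmax_threshold_scale_factor` value that the raw API call uses, so both exercise the same cubin kernel.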

🔍 Related Issues

#3029

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Tests
    • Improved test to verify that prefill execution respects the softmax-skip threshold configuration, ensuring backend and reference execution paths align.

Re-opening of #3075, which was closed by accident. The decode counterpart was already fixed in main via #2959; this PR applies the equivalent fix to the prefill wrapper consistency check.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bee192ce-eff4-4065-b7c6-7ef5636a0f18

📥 Commits

Reviewing files that changed from the base of the PR and between 805fc16 and fb4c91e.

📒 Files selected for processing (1)
  • tests/attention/test_trtllm_gen_attention.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/attention/test_trtllm_gen_attention.py

📝 Walkthrough

Walkthrough

The test now forwards skip_softmax_threshold_scale_factor into flashinfer.prefill.trtllm_batch_context_with_kv_cache, ensuring the trtllm-gen prefill path uses the configured softmax-skip value.

Changes

  • Test Parameter Threading — tests/attention/test_trtllm_gen_attention.py: forward skip_softmax_threshold_scale_factor into the trtllm_batch_context_with_kv_cache call in the trtllm-gen prefill test, making the test respect configured softmax-skip behavior.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

run-ci

Suggested reviewers

  • yzh119
  • bkryu
  • nv-yunzheq
  • cyx-6

Poem

🐰 A tiny flag hops into the test,
One forwarded value does its best,
Prefill now listens to the skip,
Quiet as a carrot-nibbled blip! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check — ✅ Passed: the title accurately describes the main change: passing skip_softmax_threshold_scale_factor to the prefill wrapper in the test, which directly addresses the root cause of the test failure.
  • Description check — ✅ Passed: the description provides context for the fix, links the related issue, and confirms pre-commit checks and tests are passing. All template sections are addressed appropriately.
  • Linked Issues check — ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed: check skipped because no linked issues were found for this pull request.


Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms



@PerkzZheng
Contributor Author

/bot run

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the _test_trtllm_batch_prefill function in the TRT-LLM attention test suite to include the skip_softmax_threshold_scale_factor parameter. I have no feedback to provide as there were no review comments to evaluate.

@flashinfer-bot
Collaborator

GitLab MR !587 has been created, and the CI pipeline #49272474 is currently running. I'll report back once the pipeline job completes.

Collaborator

@saltyminty saltyminty left a comment


The wrapper consistency check in _test_trtllm_batch_prefill was calling
wrapper_trtllm_gen.run() without skip_softmax_threshold_scale_factor,
causing it to default to None (standard attention kernel) while the raw
API used 1e-30 (skipsSoftmax kernel variant). Different cubin kernels
produce bit-different results, failing the exact-equality assert.

The decode counterpart was already fixed; this mirrors that fix for the
prefill test path.
@saltyminty saltyminty force-pushed the fix/trtllm-gen-skip-softmax-wrapper-test branch from 805fc16 to fb4c91e Compare April 23, 2026 17:14
@saltyminty saltyminty merged commit a457a5e into flashinfer-ai:main Apr 23, 2026
42 of 43 checks passed
