fix: pass skip_softmax_threshold_scale_factor to prefill wrapper in test#3154

Merged
saltyminty merged 1 commit into flashinfer-ai:main from PerkzZheng:fix/trtllm-gen-skip-softmax-wrapper-test on Apr 23, 2026

Conversation

@PerkzZheng
Contributor

@PerkzZheng PerkzZheng commented Apr 23, 2026

📌 Description

The wrapper consistency check in _test_trtllm_batch_prefill was calling wrapper_trtllm_gen.run() without skip_softmax_threshold_scale_factor, causing it to default to None (standard attention kernel) while the raw API used 1e-30 (skipsSoftmax kernel variant). Different cubin kernels produce bit-different results, failing the exact-equality assert.
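The mismatch can be illustrated with a minimal, self-contained sketch. The `run_attention` function and its two code paths below are hypothetical stand-ins, not the actual FlashInfer API; they only model how an omitted keyword argument silently selects a different kernel and breaks an exact-equality comparison:

```python
# Hypothetical stand-in for an attention entry point whose kernel choice
# depends on whether skip_softmax_threshold_scale_factor is provided.
def run_attention(q, skip_softmax_threshold_scale_factor=None):
    if skip_softmax_threshold_scale_factor is None:
        # Stand-in for the standard attention kernel.
        return [x for x in q]
    # Stand-in for the skip-softmax kernel variant, which produces
    # bit-different (here: rounded) results.
    return [round(x, 6) for x in q]

q = [0.123456789]
raw_out = run_attention(q, skip_softmax_threshold_scale_factor=1e-30)

# Before the fix: the wrapper path omitted the kwarg, so it defaulted to
# None and ran the other kernel; the exact-equality assert then fails.
wrapper_out_buggy = run_attention(q)
assert raw_out != wrapper_out_buggy

# After the fix: the wrapper forwards the same factor as the raw API,
# so both select the same kernel and the results match exactly.
wrapper_out_fixed = run_attention(q, skip_softmax_threshold_scale_factor=1e-30)
assert raw_out == wrapper_out_fixed
```

The same principle drives the actual fix: the test's wrapper call must forward the same `skip_softmax_threshold_scale_factor` value that the raw API call uses, so both exercise the same cubin kernel.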

🔍 Related Issues

#3029

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Tests
    • Improved test to verify that prefill execution respects the softmax-skip threshold configuration, ensuring backend and reference execution paths align.

Re-opening of #3075, which was closed by accident. The decode counterpart was already fixed in main via #2959; this PR applies the equivalent fix to the prefill wrapper consistency check.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bee192ce-eff4-4065-b7c6-7ef5636a0f18

📥 Commits

Reviewing files that changed from the base of the PR and between 805fc16 and fb4c91e.

📒 Files selected for processing (1)
  • tests/attention/test_trtllm_gen_attention.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/attention/test_trtllm_gen_attention.py

📝 Walkthrough

Walkthrough

The test now forwards skip_softmax_threshold_scale_factor into flashinfer.prefill.trtllm_batch_context_with_kv_cache, ensuring the trtllm-gen prefill path uses the configured softmax-skip value.

Changes

  • Test Parameter Threading — tests/attention/test_trtllm_gen_attention.py: forward skip_softmax_threshold_scale_factor into the trtllm_batch_context_with_kv_cache call in the trtllm-gen prefill test, making the test respect configured softmax-skip behavior.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

run-ci

Suggested reviewers

  • yzh119
  • bkryu
  • nv-yunzheq
  • cyx-6

Poem

🐰 A tiny flag hops into the test,
One forwarded value does its best,
Prefill now listens to the skip,
Quiet as a carrot-nibbled blip! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check — ✅ Passed: the title accurately describes the main change: passing skip_softmax_threshold_scale_factor to the prefill wrapper in the test, which directly addresses the root cause of the test failure.
  • Description check — ✅ Passed: the description provides context for the fix, links the related issue, and confirms pre-commit checks and tests are passing. All template sections are addressed appropriately.
  • Linked Issues check — ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed: check skipped because no linked issues were found for this pull request.


Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms



@PerkzZheng
Contributor Author

/bot run

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the _test_trtllm_batch_prefill function in the TRT-LLM attention test suite to include the skip_softmax_threshold_scale_factor parameter. I have no feedback to provide as there were no review comments to evaluate.

@flashinfer-bot
Collaborator

GitLab MR !587 has been created, and the CI pipeline #49272474 is currently running. I'll report back once the pipeline job completes.

Collaborator

@saltyminty saltyminty left a comment


The wrapper consistency check in _test_trtllm_batch_prefill was calling
wrapper_trtllm_gen.run() without skip_softmax_threshold_scale_factor,
causing it to default to None (standard attention kernel) while the raw
API used 1e-30 (skipsSoftmax kernel variant). Different cubin kernels
produce bit-different results, failing the exact-equality assert.

The decode counterpart was already fixed; this mirrors that fix for the
prefill test path.
@saltyminty saltyminty force-pushed the fix/trtllm-gen-skip-softmax-wrapper-test branch from 805fc16 to fb4c91e Compare April 23, 2026 17:14
@saltyminty saltyminty merged commit a457a5e into flashinfer-ai:main Apr 23, 2026
42 of 43 checks passed
