fix: pass skip_softmax_threshold_scale_factor to prefill wrapper in test #3154
/bot run
Code Review
This pull request updates the _test_trtllm_batch_prefill function in the TRT-LLM attention test suite to include the skip_softmax_threshold_scale_factor parameter. I have no feedback to provide as there were no review comments to evaluate.
The wrapper consistency check in _test_trtllm_batch_prefill was calling wrapper_trtllm_gen.run() without skip_softmax_threshold_scale_factor, causing it to default to None (standard attention kernel) while the raw API used 1e-30 (skipsSoftmax kernel variant). Different cubin kernels produce bit-different results, failing the exact-equality assert. The decode counterpart was already fixed; this mirrors that fix for the prefill test path.
805fc16 to fb4c91e
📌 Description
The wrapper consistency check in _test_trtllm_batch_prefill was calling wrapper_trtllm_gen.run() without skip_softmax_threshold_scale_factor, causing it to default to None (standard attention kernel) while the raw API used 1e-30 (skipsSoftmax kernel variant). Different cubin kernels produce bit-different results, failing the exact-equality assert.
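The failure mode described above can be illustrated with a minimal, CPU-only sketch. Everything in it is a hypothetical stand-in rather than FlashInfer code: `raw_api`, `Wrapper`, and the two `_attention_*` helpers are invented for illustration, and the "kernel variants" are modeled simply as two accumulation orders. Only the parameter name `skip_softmax_threshold_scale_factor` and the value `1e-30` come from the description.

```python
import math


# Hypothetical stand-ins: none of these functions are FlashInfer APIs.
# Two "kernel variants" are modeled as two floating-point accumulation orders.
def _attention_standard(values):
    acc = 0.0
    for v in values:  # accumulate left to right
        acc += v
    return acc


def _attention_skip_softmax(values):
    acc = 0.0
    for v in reversed(values):  # accumulate right to left
        acc += v
    return acc


def raw_api(values, skip_softmax_threshold_scale_factor=None):
    # None selects the standard variant; any other value selects the other one.
    if skip_softmax_threshold_scale_factor is None:
        return _attention_standard(values)
    return _attention_skip_softmax(values)


class Wrapper:
    def run(self, values, skip_softmax_threshold_scale_factor=None):
        # The wrapper only dispatches to the same variant as the raw API
        # when the caller forwards the parameter.
        return raw_api(values, skip_softmax_threshold_scale_factor)


values = [0.1, 0.2, 0.3]
threshold = 1e-30

reference = raw_api(values, skip_softmax_threshold_scale_factor=threshold)
mismatched = Wrapper().run(values)  # parameter omitted: defaults to None
matched = Wrapper().run(values, skip_softmax_threshold_scale_factor=threshold)

print(mismatched == reference)              # exact equality fails
print(math.isclose(mismatched, reference))  # results are still numerically close
print(matched == reference)                 # exact equality holds once forwarded
```

In this sketch the mismatched result is numerically close to the reference but not bit-identical, which is why an exact-equality assert only passes when both call paths receive the same argument.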
🔍 Related Issues
#3029
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
pre-commitby runningpip install pre-commit(or used your preferred method).pre-commit install.pre-commit run --all-filesand fixed any reported issues.🧪 Tests
unittest, etc.).Reviewer Notes
This PR re-opens #3075, which was closed by accident. The decode counterpart was already fixed on main via #2959; this PR applies the equivalent fix to the prefill wrapper consistency check.