
Commit fb4c91e

PerkzZheng authored and saltyminty committed
fix: pass skip_softmax_threshold_scale_factor to prefill wrapper in test
The wrapper consistency check in _test_trtllm_batch_prefill was calling wrapper_trtllm_gen.run() without skip_softmax_threshold_scale_factor, causing it to default to None (standard attention kernel) while the raw API used 1e-30 (skip-softmax kernel variant). Different cubin kernels produce bitwise-different results, failing the exact-equality assert. The decode counterpart was already fixed; this mirrors that fix for the prefill test path.
1 parent c9eb3cd

1 file changed: tests/attention/test_trtllm_gen_attention.py (1 addition & 0 deletions)
@@ -849,6 +849,7 @@ def _test_trtllm_batch_prefill(
         v_scale=v_scale / o_scale,
         enable_pdl=enable_pdl,
         sinks=(sink if enable_sink else None),
+        skip_softmax_threshold_scale_factor=skip_softmax_threshold_scale_factor,
     )
     # v_scale, o_scale in wrapper is emulated by multiplying output by v_scale instead of fused into kernel.
     if v_scale == o_scale == 1.0:
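
For context, here is a minimal sketch of the failure mode this commit fixes, using a hypothetical run_kernel stand-in (not FlashInfer's actual API): None selects the standard attention kernel, a tiny positive threshold selects the skip-softmax variant, and the two cubins are not bitwise identical, so both call paths in a consistency check must receive the same flag.

import torch

# Hypothetical stand-in for kernel dispatch (illustrative only): the flag
# selects a different cubin, which perturbs the output at a small scale.
def run_kernel(q, skip_softmax_threshold_scale_factor=None):
    out = torch.softmax(q, dim=-1)
    if skip_softmax_threshold_scale_factor is not None:
        out = out + 1e-7  # emulate the bitwise-different kernel variant
    return out

q = torch.randn(4, 8)
scale = 1e-30  # the value used by the raw API in the test

raw_out = run_kernel(q, skip_softmax_threshold_scale_factor=scale)

# Buggy check (before this commit): the wrapper path omits the kwarg,
# so it defaults to None and dispatches the other kernel variant.
wrapper_out_buggy = run_kernel(q)
assert not torch.equal(raw_out, wrapper_out_buggy)  # exact-equality assert fails

# Fixed check (this commit): thread the same flag through both paths.
wrapper_out_fixed = run_kernel(q, skip_softmax_threshold_scale_factor=scale)
assert torch.equal(raw_out, wrapper_out_fixed)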
