Commit fb4c91e
fix: pass skip_softmax_threshold_scale_factor to prefill wrapper in test
The wrapper consistency check in _test_trtllm_batch_prefill was calling
wrapper_trtllm_gen.run() without skip_softmax_threshold_scale_factor,
causing it to default to None (standard attention kernel) while the raw
API used 1e-30 (skipsSoftmax kernel variant). Different cubin kernels
produce bit-different results, failing the exact-equality assert.
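
Why two numerically equivalent kernels can fail an exact-equality check can be illustrated with a small analogy. The sketch below is not the actual cubin kernels; it assumes, as one common cause that this commit does not confirm, that the two code paths accumulate floating-point values in a different order. The helper names are hypothetical.

```python
import math


def sum_left_to_right(values):
    """Accumulate in the given order (stand-in for one kernel variant)."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_right_to_left(values):
    """Accumulate in reverse order (stand-in for a second kernel variant)."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


values = [0.1, 0.2, 0.3]
a = sum_left_to_right(values)
b = sum_right_to_left(values)

# Floating-point addition is not associative, so the two results differ
# in the last bits even though both are valid answers.
exact_match = a == b
close_match = math.isclose(a, b, rel_tol=1e-12)
print(exact_match, close_match)
```

An exact comparison is only meaningful when both sides run the same code path, which is why the test has to select the same kernel variant for the wrapper and the raw API.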
The decode counterpart was already fixed; this mirrors that fix for the
prefill test path.
1 parent c9eb3cd commit fb4c91e
1 file changed
Lines changed: 1 addition & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 849 | 849 | |
| 850 | 850 | |
| 851 | 851 | |
| | 852 | + |
| 852 | 853 | |
| 853 | 854 | |
| 854 | 855 | |
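
The shape of the fix can be sketched with a small mock. This is a minimal illustration, not the real test or the real library: `raw_attention` and `MockWrapper` are hypothetical stand-ins, and only the parameter name `skip_softmax_threshold_scale_factor` and the value `1e-30` come from the commit message.

```python
def raw_attention(skip_softmax_threshold_scale_factor=None):
    """Hypothetical stand-in for the raw API.

    Returns the name of the kernel variant that would be selected.
    """
    if skip_softmax_threshold_scale_factor is None:
        return "standard"
    return "skip_softmax"


class MockWrapper:
    """Hypothetical stand-in for the prefill wrapper."""

    def run(self, skip_softmax_threshold_scale_factor=None):
        # Forward the argument unchanged; omitting it falls back to None.
        return raw_attention(
            skip_softmax_threshold_scale_factor=skip_softmax_threshold_scale_factor
        )


wrapper = MockWrapper()
reference = raw_attention(skip_softmax_threshold_scale_factor=1e-30)

# Before the fix: the argument is omitted, so the wrapper takes the default path.
before_fix = wrapper.run()

# After the fix: the same value is passed to both the raw API and the wrapper.
after_fix = wrapper.run(skip_softmax_threshold_scale_factor=1e-30)

print(reference, before_fix, after_fix)
```

Under these assumptions, the two sides select the same variant only once the argument is passed explicitly to the wrapper call.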