Commit 0e7a632

Copilot and titaiwangms committed

Set softcap to 0.0f explicitly with comment

Co-authored-by: titaiwangms <18010845+titaiwangms@users.noreply.github.com>

1 parent 042ff32 commit 0e7a632

File tree

1 file changed: +1 −1 lines changed


onnxruntime/core/providers/cuda/llm/attention.cc

Lines changed: 1 addition & 1 deletion
```diff
@@ -218,7 +218,7 @@ Status Attention<T>::ComputeInternal(OpKernelContext* context) const {
   gqa_parameters.rotary_interleaved = false;
   gqa_parameters.use_smooth_softmax = false;
   gqa_parameters.scale = parameters.scale;
-  gqa_parameters.softcap = parameters.softcap;
+  gqa_parameters.softcap = 0.0f;  // Validated to be 0.0f above
   gqa_parameters.mask_type = onnxruntime::contrib::AttentionMaskType::MASK_NONE;
   gqa_parameters.qkv_format = contribop_parameters.qkv_format;
   gqa_parameters.past_kv_format = onnxruntime::contrib::AttentionQkvFormat::Q_K_V_BNSH;
```
