[CUDA] Support FP8 (E4M3) KV Cache for Group Query Attention #50649

Triggered via pull request February 14, 2026 04:12
Status Success
Total duration 31m 31s

lint.yml

on: pull_request

Annotations

1 error and 15 warnings
Optional Lint C++
reviewdog: Too many results (annotations) in diff. You may miss some annotations due to GitHub limitation for annotation created by logging command. Please check GitHub Actions log console to see all results. Limitation:
- 10 warning annotations and 10 error annotations per step
- 50 annotations per job (sum of annotations from all the steps)
- 50 annotations per run (separate from the job annotations, these annotations aren't created by users)
Source: https://github.com/orgs/community/discussions/26680#discussioncomment-3252835
Python format
CodeQL Action v3 will be deprecated in December 2026. Please update all occurrences of the CodeQL Action in your workflow files to v4. For more information, see https://github.blog/changelog/2025-10-28-upcoming-deprecation-of-codeql-action-v3/
Python format
The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
Python format
The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
Python format
The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
Python format
The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
Optional Lint C++: onnxruntime/contrib_ops/cuda/bert/xqa/xqa_loader_bf16_fp8_128.cu#L8
[cpplint] reported by reviewdog 🐶 Include the directory when naming header files [build/include_subdir] [4] Raw Output: onnxruntime/contrib_ops/cuda/bert/xqa/xqa_loader_bf16_fp8_128.cu:8: Include the directory when naming header files [build/include_subdir] [4]
Optional Lint C++: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qkv.cuh#L355
[cpplint] reported by reviewdog 🐶 If an else has a brace on one side, it should have it on both [readability/braces] [5] Raw Output: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qkv.cuh:355: If an else has a brace on one side, it should have it on both [readability/braces] [5]
Optional Lint C++: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qkv.cuh#L222
[cpplint] reported by reviewdog 🐶 If/else bodies with multiple statements require braces [readability/braces] [4] Raw Output: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qkv.cuh:222: If/else bodies with multiple statements require braces [readability/braces] [4]
Optional Lint C++: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qkv.cuh#L222
[cpplint] reported by reviewdog 🐶 If an else has a brace on one side, it should have it on both [readability/braces] [5] Raw Output: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qkv.cuh:222: If an else has a brace on one side, it should have it on both [readability/braces] [5]
Optional Lint C++: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qdq.cuh#L290
[cpplint] reported by reviewdog 🐶 If/else bodies with multiple statements require braces [readability/braces] [4] Raw Output: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qdq.cuh:290: If/else bodies with multiple statements require braces [readability/braces] [4]
Optional Lint C++: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qdq.cuh#L290
[cpplint] reported by reviewdog 🐶 If an else has a brace on one side, it should have it on both [readability/braces] [5] Raw Output: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qdq.cuh:290: If an else has a brace on one side, it should have it on both [readability/braces] [5]
Optional Lint C++: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qdq.cuh#L283
[cpplint] reported by reviewdog 🐶 Using C-style cast. Use static_cast<int64_t>(...) instead [readability/casting] [4] Raw Output: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qdq.cuh:283: Using C-style cast. Use static_cast<int64_t>(...) instead [readability/casting] [4]
Optional Lint C++: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qdq.cuh#L282
[cpplint] reported by reviewdog 🐶 Using C-style cast. Use static_cast<int64_t>(...) instead [readability/casting] [4] Raw Output: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qdq.cuh:282: Using C-style cast. Use static_cast<int64_t>(...) instead [readability/casting] [4]
Optional Lint C++: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qdq.cuh#L281
[cpplint] reported by reviewdog 🐶 Using C-style cast. Use static_cast<int64_t>(...) instead [readability/casting] [4] Raw Output: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qdq.cuh:281: Using C-style cast. Use static_cast<int64_t>(...) instead [readability/casting] [4]
Optional Lint C++: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qdq.cuh#L158
[cpplint] reported by reviewdog 🐶 If an else has a brace on one side, it should have it on both [readability/braces] [5] Raw Output: onnxruntime/contrib_ops/cuda/bert/group_query_attention_qdq.cuh:158: If an else has a brace on one side, it should have it on both [readability/braces] [5]
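Each cpplint message above names its own fix. The sketch below shows code that satisfies the two most frequent categories, readability/casting and readability/braces. `ElementOffset` and `ScaleAndClampToE4M3Range` are hypothetical illustrations. They are not the code at the flagged lines of group_query_attention_qdq.cuh or group_query_attention_qkv.cuh. The 448 bound assumes the OCP FP8 E4M3 ("FN") format's largest finite value. The clamp only illustrates brace style and does not perform an actual FP8 conversion.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helpers (not code from this PR) illustrating the cpplint
// categories readability/casting and readability/braces.

// readability/casting: use static_cast<int64_t>(...) instead of (int64_t)x.
// Casting each index to int64_t before multiplying by an int stride keeps the
// product from overflowing a 32-bit int.
int64_t ElementOffset(int batch, int head, int seq,
                      int batch_stride, int head_stride, int seq_stride) {
  // C-style form that cpplint flags:
  //   return (int64_t)batch * batch_stride + (int64_t)head * head_stride + ...;
  return static_cast<int64_t>(batch) * batch_stride +
         static_cast<int64_t>(head) * head_stride +
         static_cast<int64_t>(seq) * seq_stride;
}

// Assumed largest finite magnitude of OCP FP8 E4M3 (the "FN" variant).
constexpr float kE4M3MaxFinite = 448.0f;

// readability/braces: a multi-statement body needs braces, and if one side of
// an if/else is braced, the other side must be braced too.
float ScaleAndClampToE4M3Range(float value, float scale, bool apply_scale) {
  float result;
  if (apply_scale) {
    result = value * scale;
    // Clamp to the assumed representable E4M3 range.
    result = std::min(std::max(result, -kE4M3MaxFinite), kE4M3MaxFinite);
  } else {
    result = value;
  }
  return result;
}
```

For example, `ElementOffset(70000, 0, 0, 70000, 1, 1)` yields 4900000000. Without the widening cast, the `int * int` product would overflow.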