[CUDA] GroupQueryAttention with XQA and Quantized KV Cache Support · microsoft/onnxruntime@4ac7cfd

Triggered via pull request February 9, 2026 18:36

tianleiwu

synchronize #27246

Status: Success

Total duration: 31m 32s

Artifacts: none

windows_x64_release_xnnpack.yml

on: pull_request

6 warnings

build_x64_release_xnnpack: onnxruntime/core/mlas/lib/amd64/QgemmU8X8KernelAvx2.asm#L1234: epilog offset from end of function exceeds 4095
build_x64_release_xnnpack: onnxruntime/core/mlas/lib/amd64/QgemmU8X8KernelAvx2.asm#L1227: epilog offset from end of function exceeds 4095
build_x64_release_xnnpack: onnxruntime/core/mlas/lib/amd64/QgemmU8X8KernelAvx2.asm#L1220: epilog offset from end of function exceeds 4095
build_x64_release_xnnpack: onnxruntime/core/mlas/lib/amd64/QgemmU8X8KernelAvx2.asm#L1213: epilog offset from end of function exceeds 4095
build_x64_release_xnnpack: onnxruntime/core/mlas/lib/amd64/QgemmU8X8KernelAvx2.asm#L1206: epilog offset from end of function exceeds 4095
build_x64_release_xnnpack: onnxruntime/core/mlas/lib/amd64/QgemmU8X8KernelAvx2.asm#L1199: epilog offset from end of function exceeds 4095