[CUDA] Support FP8 (E4M3) KV Cache for Group Query Attention #9934

Triggered via pull request: February 14, 2026 04:12
Status: Success
Total duration: 1h 25m 8s
Artifacts: 1

linux_cuda_ci.yml

on: pull_request
Build Linux CUDA x64 Release / build_test_pipeline: 38m 2s
Test Linux CUDA x64 Release: 41m 19s

Annotations

8 warnings
Build Linux CUDA x64 Release / build_test_pipeline
Wheel output directory /mnt/vss/_work/_temp/Release/dist does not exist.
Build Linux CUDA x64 Release / build_test_pipeline
stderr:
+ PATH=/opt/python/cp310-cp310/bin:/usr/local/dotnet:/opt/rh/gcc-toolset-14/root/usr/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/lib/jvm/msopenjdk-17/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ python3 -m pip install --user -r tools/ci_build/github/linux/python/requirements.txt
WARNING: The script isympy is installed in '/home/onnxruntimedev/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script pygmentize is installed in '/home/onnxruntimedev/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts f2py and numpy-config are installed in '/home/onnxruntimedev/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts dmypy, mypy, mypyc, stubgen and stubtest are installed in '/home/onnxruntimedev/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts py.test and pytest are installed in '/home/onnxruntimedev/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts backend-test-tools, check-model and check-node are installed in '/home/onnxruntimedev/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[notice] A new release of pip is available: 25.1.1 -> 26.0.1
[notice] To update, run: pip install --upgrade pip
+ python3 tools/ci_build/build.py --build_dir build/Release --config Release --cmake_generator Ninja --skip_submodule_sync --build_shared_lib --parallel --use_vcpkg --use_vcpkg_ms_internal_asset_cache --enable_onnx_tests --use_cuda --use_binskim_compliant_compile_flags --build_wheel --parallel --nvcc_threads 1 --cuda_version=12.8 --cuda_home=/usr/local/cuda-12.8 --cudnn_home=/usr/local/cuda-12.8 --enable_cuda_profiling --build_java --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=90 onnxruntime_BUILD_UNIT_TESTS=ON onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON --build
2026-02-14 04:27:42,353 build [DEBUG] - Command line arguments: --build_dir build/Release --config Release --cmake_generator Ninja --skip_submodule_sync --build_shared_lib --parallel --use_vcpkg --use_vcpkg_ms_internal_asset_cache --enable_onnx_tests --use_cuda --use_binskim_compliant_compile_flags --build_wheel --parallel --nvcc_threads 1 --cuda_version=12.8 --cuda_home=/usr/local/cuda-12.8 --cudnn_home=/usr/local/cuda-12.8 --enable_cuda_profiling --build_java --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=90 onnxruntime_BUILD_UNIT_TESTS=ON onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON --build
2026-02-14 04:27:42,357 build [INFO] - Build started
2026-02-14 04:27:42,357 build [INFO] - Building targets for Release configuration
2026-02-14 04:27:42,357 build [INFO] - /usr/bin/cmake --build build/Release/Release --config Release -- -j16
2026-02-14 04:48:37,492 build [INFO] - /opt/python/cp310-cp310/bin/python3 /onnxruntime_src/setup.py bdist_wheel --nightly_build --wheel_name_suffix=gpu --cuda_version=12.8
/opt/python/cp310-cp310/lib/python3.10/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
!!
********************************************************************************
Please consider removing the following classifiers in favor of a SPDX license expression:
License :: OSI Approved :: MIT License
See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
********************************************************************************
!!
self._finalize_license_expression()
/opt/python/cp310-cp310/lib/python3.10/site
Build Linux CUDA x64 Release / build_test_pipeline
stderr:
+ PATH=/opt/python/cp310-cp310/bin:/usr/local/dotnet:/opt/rh/gcc-toolset-14/root/usr/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/lib/jvm/msopenjdk-17/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ python3 -m pip install --user -r tools/ci_build/github/linux/python/requirements.txt
WARNING: The script isympy is installed in '/home/onnxruntimedev/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script pygmentize is installed in '/home/onnxruntimedev/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts f2py and numpy-config are installed in '/home/onnxruntimedev/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts dmypy, mypy, mypyc, stubgen and stubtest are installed in '/home/onnxruntimedev/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts py.test and pytest are installed in '/home/onnxruntimedev/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts backend-test-tools, check-model and check-node are installed in '/home/onnxruntimedev/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
[notice] A new release of pip is available: 25.1.1 -> 26.0.1
[notice] To update, run: pip install --upgrade pip
+ python3 tools/ci_build/build.py --build_dir build/Release --config Release --cmake_generator Ninja --skip_submodule_sync --build_shared_lib --parallel --use_vcpkg --use_vcpkg_ms_internal_asset_cache --enable_onnx_tests --use_cuda --use_binskim_compliant_compile_flags --build_wheel --parallel --nvcc_threads 1 --cuda_version=12.8 --cuda_home=/usr/local/cuda-12.8 --cudnn_home=/usr/local/cuda-12.8 --enable_cuda_profiling --build_java --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=90 onnxruntime_BUILD_UNIT_TESTS=ON onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON --update
2026-02-14 04:22:47,450 build [DEBUG] - Command line arguments: --build_dir build/Release --config Release --cmake_generator Ninja --skip_submodule_sync --build_shared_lib --parallel --use_vcpkg --use_vcpkg_ms_internal_asset_cache --enable_onnx_tests --use_cuda --use_binskim_compliant_compile_flags --build_wheel --parallel --nvcc_threads 1 --cuda_version=12.8 --cuda_home=/usr/local/cuda-12.8 --cudnn_home=/usr/local/cuda-12.8 --enable_cuda_profiling --build_java --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=90 onnxruntime_BUILD_UNIT_TESTS=ON onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON --update
2026-02-14 04:22:47,453 build [INFO] - Build started
2026-02-14 04:22:47,453 build [INFO] - Generating CMake build tree
2026-02-14 04:22:47,466 build [INFO] - /usr/bin/cmake /onnxruntime_src/cmake -Donnxruntime_ENABLE_EXTERNAL_CUSTOM_OP_SCHEMAS=OFF -Donnxruntime_RUN_ONNX_TESTS=ON -Donnxruntime_GENERATE_TEST_REPORTS=ON -DPython_EXECUTABLE=/opt/python/cp310-cp310/bin/python3 -Donnxruntime_USE_VCPKG=ON -Donnxruntime_USE_MIMALLOC=OFF -Donnxruntime_ENABLE_PYTHON=ON -Donnxruntime_BUILD_CSHARP=OFF -Donnxruntime_BUILD_JAVA=ON -Donnxruntime_BUILD_NODEJS=OFF -Donnxruntime_BUILD_OBJC=OFF -Donnxruntime_BUILD_SHARED_LIB=ON -Donnxruntime_BUILD_APPLE_FRAMEWORK=OFF -Donnxruntime_USE_DNNL=OFF -Donnxruntime_USE_NNAPI_BUILTIN=OFF -Donnxruntime_USE_VSINPU=OFF -Donnxruntime_USE_RKNPU=OFF -Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF -Donnxruntime_USE_VITISAI=OFF -Donnxruntime_USE_TENSORRT=OFF -Donnxruntime_USE_NV=OFF -Donnxruntime_USE_TENSORRT_BUILTIN_PARSER=ON -Donnxruntime_USE_TENSORRT_INTERFACE=OFF -Donnxruntime_USE_CUDA_INTERFACE=OFF -Donnxruntime_USE_NV_INTERFACE=OFF -Donnxrun
Build Linux CUDA x64 Release / build_test_pipeline
stderr: WARNING! Your credentials are stored unencrypted in '/home/cloudtest/.docker/config.json'. Configure a credential helper to remove this warning. See https://docs.docker.com/go/credential-store/
Test Linux CUDA x64 Release
stderr:
+ PATH=/opt/python/cp310-cp310/bin:/usr/local/dotnet:/opt/rh/gcc-toolset-14/root/usr/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/lib/jvm/msopenjdk-17/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ python3 tools/ci_build/build.py --build_dir build/Release --config Release --cmake_generator Ninja --skip_submodule_sync --build_shared_lib --parallel --use_vcpkg --use_vcpkg_ms_internal_asset_cache --enable_onnx_tests --use_cuda --use_binskim_compliant_compile_flags --cuda_version=12.8 --cuda_home=/usr/local/cuda-12.8 --cudnn_home=/usr/local/cuda-12.8 --enable_cuda_profiling --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=90 onnxruntime_BUILD_UNIT_TESTS=ON onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON --test
2026-02-14 05:27:48,268 tools_python_utils [INFO] - flatbuffers module is not installed. parse_config will not be available
2026-02-14 05:27:48,304 build [DEBUG] - Command line arguments: --build_dir build/Release --config Release --cmake_generator Ninja --skip_submodule_sync --build_shared_lib --parallel --use_vcpkg --use_vcpkg_ms_internal_asset_cache --enable_onnx_tests --use_cuda --use_binskim_compliant_compile_flags --cuda_version=12.8 --cuda_home=/usr/local/cuda-12.8 --cudnn_home=/usr/local/cuda-12.8 --enable_cuda_profiling --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=90 onnxruntime_BUILD_UNIT_TESTS=ON onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON --test
2026-02-14 05:27:48,307 build [INFO] - Build started
2026-02-14 05:27:48,308 build [DEBUG] - create symlink /data/models -> build/Release/models
2026-02-14 05:27:48,308 build [INFO] - Running tests for Release configuration
2026-02-14 05:27:48,308 build [INFO] - /usr/bin/ctest --build-config Release --verbose --timeout 10800
2026-02-14 05:37:04,889 build [INFO] - Build complete
Test Linux CUDA x64 Release
stderr:
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile.manylinux2_28_cuda
#1 transferring dockerfile:
#1 transferring dockerfile: 1.88kB done
#1 DONE 0.4s
#2 [auth] internal/azureml/onnxruntime/build/cuda12_x64_almalinux8_gcc14:pull token for onnxruntimebuildcache.azurecr.io
#2 DONE 0.0s
#3 [internal] load metadata for onnxruntimebuildcache.azurecr.io/internal/azureml/onnxruntime/build/cuda12_x64_almalinux8_gcc14:20251017.1
#3 DONE 1.0s
#4 [internal] load .dockerignore
#4 transferring context: 2B done
#4 DONE 0.0s
#5 [1/6] FROM onnxruntimebuildcache.azurecr.io/internal/azureml/onnxruntime/build/cuda12_x64_almalinux8_gcc14:20251017.1@sha256:f9faa2397d114b46b5c281353e2d50ccba0ffce77fde89753bedc07217f7eff2
#5 resolve onnxruntimebuildcache.azurecr.io/internal/azureml/onnxruntime/build/cuda12_x64_almalinux8_gcc14:20251017.1@sha256:f9faa2397d114b46b5c281353e2d50ccba0ffce77fde89753bedc07217f7eff2 0.0s done
#5 ...
#6 [internal] load build context
#6 transferring context: 31.83kB 0.0s done
#6 DONE 0.2s
#5 [1/6] FROM onnxruntimebuildcache.azurecr.io/internal/azureml/onnxruntime/build/cuda12_x64_almalinux8_gcc14:20251017.1@sha256:f9faa2397d114b46b5c281353e2d50ccba0ffce77fde89753bedc07217f7eff2
#5 DONE 1.1s
#5 [1/6] FROM onnxruntimebuildcache.azurecr.io/internal/azureml/onnxruntime/build/cuda12_x64_almalinux8_gcc14:20251017.1@sha256:f9faa2397d114b46b5c281353e2d50ccba0ffce77fde89753bedc07217f7eff2
#5 sha256:695e22347fb7c112af8a97154f75ec253a6a704707330387906a0c962eaca183 0B / 322B 0.2s
#5 sha256:695e22347fb7c112af8a97154f75ec253a6a704707330387906a0c962eaca183 322B / 322B 0.3s
#5 sha256:7899b1f065e89b1c38de5889fdb6bed5d7cefc983af58530b5082c64add78991 0B / 12.53MB 0.2s
#5 sha256:4e48097af40f49731459ab5dce3e3a4cd6a1c6662a9a4661218f9117f38ecd4a 0B / 13.03MB 0.2s
#5 sha256:f32bacc35bda8f176e88329303032804f6dd0a57ec19505d32965e7f8daa5d1b 0B / 341.57kB 0.2s
#5 sha256:7899b1f065e89b1c38de5889fdb6bed5d7cefc983af58530b5082c64add78991 1.05MB / 12.53MB 0.5s
#5 sha256:f32bacc35bda8f176e88329303032804f6dd0a57ec19505d32965e7f8daa5d1b 341.57kB / 341.57kB 0.5s
#5 sha256:695e22347fb7c112af8a97154f75ec253a6a704707330387906a0c962eaca183 322B / 322B 0.7s done
#5 sha256:7899b1f065e89b1c38de5889fdb6bed5d7cefc983af58530b5082c64add78991 12.53MB / 12.53MB 0.8s
#5 sha256:f32bacc35bda8f176e88329303032804f6dd0a57ec19505d32965e7f8daa5d1b 341.57kB / 341.57kB 0.6s done
#5 sha256:7899b1f065e89b1c38de5889fdb6bed5d7cefc983af58530b5082c64add78991 12.53MB / 12.53MB 0.8s done
#5 sha256:4e48097af40f49731459ab5dce3e3a4cd6a1c6662a9a4661218f9117f38ecd4a 5.24MB / 13.03MB 0.9s
#5 sha256:cdc8085a23f343004fa5874223972150c5e86217a37e04b6f89f8b59645d8cb3 0B / 311B 0.2s
#5 sha256:4564c69d9d5fab85b558f5c755765aef0e47996f677c8dc1b893dc2121029a86 0B / 56.55MB 0.2s
#5 sha256:6f491630317942bc72b01405eba4e4fd9bafbd89da5a0c8a3344522f9e7b1b26 634B / 634B 0.2s done
#5 sha256:4e48097af40f49731459ab5dce3e3a4cd6a1c6662a9a4661218f9117f38ecd4a 8.39MB / 13.03MB 1.1s
#5 sha256:cdc8085a23f343004fa5874223972150c5e86217a37e04b6f89f8b59645d8cb3 311B / 311B 0.4s done
#5 sha256:f0763b7a2035ebd7aaa92325e7a6a26b84938cbfaf625eae80f3e248a5edbd4d 0B / 105.51MB 0.2s
#5 sha256:4e48097af40f49731459ab5dce3e3a4cd6a1c6662a9a4661218f9117f38ecd4a 10.49MB / 13.03MB 1.2s
#5 sha256:4564c69d9d5fab85b558f5c755765aef0e47996f677c8dc1b893dc2121029a86 4.19MB / 56.55MB 0.6s
#5 sha256:75541d4d3243d3f4e1155f7746a13f86c7f28d74ecdd702d07899dc9ea6e2e31 0B / 767.93kB 0.2s
#5 sha256:4e48097af40f49731459ab5dce3e3a4cd6a1c6662a9a4661218f9117f38ecd4a 13.03MB / 13.03MB 1.4s done
#5 sha256:4564c69d9d5fab85b558f5c755765aef0e47996f677c8dc1b893dc2121029a86 12.58MB / 56.55MB 0.8s
#5 sha256:4564c69d9d5fab85b558f5c755765aef0e47996f677c8dc1b893dc2121029a86 20.97MB / 56.55MB 0.9s
#5 sha256:75541d4d3243d3f4e1155f7746a13f86c7f28d74ecdd702d07899dc9ea6e2e31 767.93kB / 767.93kB 0.4s done
#5 sha256:2cd82f876e5f0cfebcbb33a2e32b78cd5c4cd6f9209b5f79d30f94a330652e77 0B / 12.36MB 0.2s
#5 sha256:4564c69d9d5fab85b558f5c755765aef0e47996f677c8dc1b893dc2121029a86 29.36MB / 56.55MB 1.1s
#5 sha25
Test Linux CUDA x64 Release
stderr: WARNING! Your credentials are stored unencrypted in '/home/cloudtest/.docker/config.json'. Configure a credential helper to remove this warning. See https://docs.docker.com/go/credential-store/

Artifacts

Produced during runtime
Name: build-output-x64-Release
Size: 1.12 GB
Digest: sha256:6b792b2768703915eacfa375733be07729246492e4104759a388ccd24ff964f9