[MLAS/NEON] Add dedicated kernel for depthwise convolution for ARM64 using NEON intrinsics #9606

Triggered via pull request on January 16, 2026 at 04:15
Status: Success
Total duration: 6h 53m 25s
Artifacts: 1

windows_cuda.yml

on: pull_request

Windows GPU CUDA CI Pipeline: 40m 36s
Windows GPU CUDA CI Pipeline Test Job: 30m 10s

Annotations

6 warnings, all from the Windows GPU CUDA CI Pipeline job:
onnxruntime/core/mlas/lib/amd64/QgemmU8X8KernelAvx2.asm#L1234: epilog offset from end of function exceeds 4095
onnxruntime/core/mlas/lib/amd64/QgemmU8X8KernelAvx2.asm#L1227: epilog offset from end of function exceeds 4095
onnxruntime/core/mlas/lib/amd64/QgemmU8X8KernelAvx2.asm#L1220: epilog offset from end of function exceeds 4095
onnxruntime/core/mlas/lib/amd64/QgemmU8X8KernelAvx2.asm#L1213: epilog offset from end of function exceeds 4095
onnxruntime/core/mlas/lib/amd64/QgemmU8X8KernelAvx2.asm#L1206: epilog offset from end of function exceeds 4095
onnxruntime/core/mlas/lib/amd64/QgemmU8X8KernelAvx2.asm#L1199: epilog offset from end of function exceeds 4095

Artifacts

Produced during runtime
Name: build-artifacts (expired)
Size: 1.97 GB
Digest: sha256:2a6c61574c6fdff3615b42b9a431d620d7dcf8aaad263f5a65d2fbff938d5b6c