Problem Description
Summary
We're seeing build errors like the one in https://github.com/ROCm/TheRock/actions/runs/25201744067/job/73894021744:
-- [AOTriton] Skipping triton due to AOTRITON_NOIMAGE_MODE
CMAKE_SOURCE_DIR /__w/TheRock/TheRock/external-builds/pytorch/pytorch/build/aotriton/src/aotriton_runtime
CMAKE_CURRENT_SOURCE_DIR /__w/TheRock/TheRock/external-builds/pytorch/pytorch/build/aotriton/src/aotriton_runtime/v3src
CMAKE_CURRENT_SOURCE_PARENT_DIR /__w/TheRock/TheRock/external-builds/pytorch/pytorch/build/aotriton/src/aotriton_runtime
CMAKE_CURRENT_LIST_DIR /__w/TheRock/TheRock/external-builds/pytorch/pytorch/build/aotriton/src/aotriton_runtime/v3src
CMAKE_CURRENT_BINARY_DIR /__w/TheRock/TheRock/external-builds/pytorch/pytorch/build/aotriton/src/aotriton_runtime-build/v3src
-- AOTRITON_TARGET_ARCH gfx900
-- AOTRITON_OVERRIDE_TARGET_GPUS
-- EFFECTIVE_TARGET_GPUS
AOTRITON_COMPILER /__w/TheRock/TheRock/external-builds/pytorch/pytorch/build/aotriton/src/aotriton_runtime/v3python/compile.py
'/opt/_internal/cpython-3.12.10/lib/python3.12/site-packages/cmake/data/bin/cmake' '-E' 'env' 'VIRTUAL_ENV=/__w/TheRock/TheRock/external-builds/pytorch/pytorch/build/aotriton/src/aotriton_runtime-build/venv' 'AOTRITON_ENABLE_FP32=1' '/__w/TheRock/TheRock/external-builds/pytorch/pytorch/build/aotriton/src/aotriton_runtime-build/venv/bin/python' '-X' 'utf8' '-m' 'v3python.generate' '--target_gpus' '--build_dir' '/__w/TheRock/TheRock/external-builds/pytorch/pytorch/build/aotriton/src/aotriton_runtime-build/v3src' '--noimage_mode'
usage: generate.py [-h]
                   [--target_gpus {gfx90a_mod0,gfx942_mod0,gfx950_mod0,gfx1100_mod0,gfx1101_mod0,gfx1102_mod0,gfx1151_mod0,gfx1150_mod0,gfx1201_mod0,gfx1200_mod0,gfx1250_mod0} [{gfx90a_mod0,gfx942_mod0,gfx950_mod0,gfx1100_mod0,gfx1101_mod0,gfx1102_mod0,gfx1151_mod0,gfx1150_mod0,gfx1201_mod0,gfx1200_mod0,gfx1250_mod0} ...]]
                   [--build_dir BUILD_DIR] [--root_dir ROOT_DIR]
                   [--archive_only] [--library_suffix LIBRARY_SUFFIX]
                   [--noimage_mode] [--build_for_tuning]
                   [--build_for_tuning_second_pass]
                   [--build_for_tuning_but_skip_kernel [BUILD_FOR_TUNING_BUT_SKIP_KERNEL ...]]
                   [--verbose] [--lut_sanity_check]
generate.py: error: argument --target_gpus: expected at least one argument
CMake Error at v3src/CMakeLists.txt:71 (execute_process):
execute_process failed command indexes:
1: "Child return code: 2"
-- Configuring incomplete, errors occurred!
Context
In https://github.com/ROCm/TheRock we build pytorch using https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/build_prod_wheels.py in a few configurations:
- Per-family releases: separate builds for
PYTORCH_ROCM_ARCH=gfx900, PYTORCH_ROCM_ARCH=gfx1151, PYTORCH_ROCM_ARCH=gfx942, etc.
- Multi-arch releases (new): a single build for
PYTORCH_ROCM_ARCH=gfx1100;gfx1101;gfx1102;gfx1103;gfx1151;gfx1200;gfx1201;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1011;gfx1012;gfx1030;gfx1031;gfx1032;gfx1033;gfx1034;gfx1035;gfx1036;gfx1150;gfx1152;gfx1153
See also:
Analysis and remediation
If there is at least one supported architecture in the list, target filtering and fallback behavior appears to be working. If the list contains only unsupported architectures, we hit the above error.
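The "expected at least one argument" failure matches argparse's behavior when a `nargs='+'` option is passed with no values, which is what happens when every requested architecture gets filtered out before `v3python.generate` is invoked: CMake still passes the `--target_gpus` flag, just with nothing after it. A minimal reproduction (the parser below is a sketch that mirrors generate.py's usage text, not aotriton's actual code, and uses plain arch names without the `_mod0` suffixes):

```python
import argparse
import contextlib
import io

# Minimal stand-in for generate.py's CLI: --target_gpus takes one or
# more values (nargs='+'), so passing the flag with no values is an error.
parser = argparse.ArgumentParser(prog="generate.py")
parser.add_argument("--target_gpus", nargs="+", default=[])
parser.add_argument("--noimage_mode", action="store_true")

# Filtering gfx900 against aotriton's supported set leaves nothing:
requested = ["gfx900"]
supported = {"gfx90a", "gfx942", "gfx950", "gfx1100", "gfx1101",
             "gfx1102", "gfx1150", "gfx1151", "gfx1200", "gfx1201",
             "gfx1250"}
effective = [g for g in requested if g in supported]  # -> []

# The flag is still appended, so argparse sees --target_gpus followed
# immediately by another option and exits with status 2.
argv = ["--target_gpus", *effective, "--noimage_mode"]
code = None
try:
    with contextlib.redirect_stderr(io.StringIO()):
        parser.parse_args(argv)
except SystemExit as e:
    code = e.code

print(f"exit code: {code}")  # 2, matching "Child return code: 2" above
```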
For per-family releases in TheRock we solved this by disabling flash attention and aotriton entirely if any unsupported architecture was included. Now that we see the filtering and fallback code paths working, I'm going to invert this for multi-arch releases: disable flash attention and aotriton only if all architectures are unsupported.
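The any/all inversion can be sketched as follows (the supported set and helper names here are illustrative, not TheRock's actual code):

```python
# Architectures aotriton can build kernels for, per the usage text above
# (suffix-stripped; illustrative only).
SUPPORTED = {"gfx90a", "gfx942", "gfx950", "gfx1100", "gfx1101",
             "gfx1102", "gfx1150", "gfx1151", "gfx1200", "gfx1201",
             "gfx1250"}

def enable_aotriton_per_family(arch_list):
    # Old per-family rule: any unsupported arch disables aotriton.
    return all(a in SUPPORTED for a in arch_list)

def enable_aotriton_multi_arch(arch_list):
    # New multi-arch rule: disable only when every arch is unsupported;
    # aotriton's own filtering handles the mixed case.
    return any(a in SUPPORTED for a in arch_list)

print(enable_aotriton_per_family(["gfx900"]))            # False
print(enable_aotriton_multi_arch(["gfx900"]))            # False
print(enable_aotriton_per_family(["gfx942", "gfx900"]))  # False
print(enable_aotriton_multi_arch(["gfx942", "gfx900"]))  # True
```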
If aotriton is patched not to error during generate.py (and instead to produce a library where check_gpu() always returns failure) for target lists containing only unsupported architectures, then we can remove our downstream filtering in TheRock entirely.
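One possible shape for such a patch, expressed as a decision function (hypothetical; generate.py's internals may differ):

```python
def plan_generation(effective_target_gpus):
    """Decide what generation step to run for a filtered GPU list.

    Hypothetical sketch of the proposed behavior: instead of erroring
    on an empty list, emit a stub runtime whose check_gpu() reports
    every device as unsupported, so linking and runtime probing both
    succeed and callers fall back gracefully.
    """
    if effective_target_gpus:
        return f"generate kernels for {sorted(effective_target_gpus)}"
    return "generate stub runtime (check_gpu() always fails)"

print(plan_generation([]))
print(plan_generation(["gfx942"]))
```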
Operating System
Linux and Windows
Steps to Reproduce
Build pytorch with USE_FLASH_ATTENTION=ON and PYTORCH_ROCM_ARCH=gfx900 (or any set of archs with no aotriton support)