Auto-build flash-attn wheels on push, upload to S3 #910
Draft
mgehre-amd wants to merge 8 commits into
Replace the GitHub Releases / gh-pages publishing path with a direct upload to s3://aig-embd-gfx11-wheels/simple/flash-attn/ (the same PEP 503 index used by build-rocm-wheels.yml).

Each push to gfx11 runs a check job that resolves the upstream Dao-AILab/flash-attention `main` HEAD and queries S3; the build job is skipped when a wheel matching the upstream short SHA already exists.

The wheel version is derived from `git describe` against the latest v2.* tag, e.g. `2.8.4.dev472+gb995b246` for 472 commits past v2.8.3, and becomes plain `2.8.3` (or `2.8.4`) again once upstream lands a new tag.

Changes:
- Switch source from the v2.* release-list to upstream `main` because the latest release (v2.8.3, Aug 2025) is too old to include the gfx11 improvements we want.
- Drop schedule and workflow_dispatch triggers (the workflow file is not on the default branch, so neither would actually fire).
- Drop the create-release and publish-to-gh-pages jobs.
- Drop the FLASH_ATTN_LOCAL_VERSION suffix; the SHA in the version string is enough to identify the build.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
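The version rule described above can be sketched in Python. This is only an illustration inferred from the single example in the commit message (`2.8.4.dev472+gb995b246` for 472 commits past v2.8.3); the workflow's actual script, the exact `git describe` flags, and the function name `version_from_describe` are assumptions, not taken from the PR.

```python
import re

# Hypothetical helper: turn `git describe` output into a PEP 440 version.
# Assumed input shapes: "v2.8.3" (exactly on a tag) or
# "v2.8.3-472-gb995b246" (472 commits past the tag, short SHA b995b246).
_DESCRIBE_RE = re.compile(r"v(\d+(?:\.\d+)*)(?:-(\d+)-g([0-9a-f]+))?")


def version_from_describe(describe: str) -> str:
    match = _DESCRIBE_RE.fullmatch(describe.strip())
    if match is None:
        raise ValueError(f"unrecognized git describe output: {describe!r}")
    base, distance, sha = match.groups()

    # On the tag itself (or with `--long` and a distance of 0): plain release.
    if distance is None or int(distance) == 0:
        return base

    # Past the tag: bump the last release component and mark it as a dev
    # build, keeping the short SHA as the local version segment.
    parts = [int(p) for p in base.split(".")]
    parts[-1] += 1
    bumped = ".".join(str(p) for p in parts)
    return f"{bumped}.dev{distance}+g{sha}"
```

Bumping the last component before appending `.devN` keeps the development build ordered after the `2.8.3` release and before a future `2.8.4` under PEP 440, which matches the example in the commit message.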
Add a pull_request trigger so PRs targeting gfx11 exercise the build, and gate the upload-wheel job on `github.event_name == 'push'` so PR runs validate the build without populating the S3 index.
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
b91bd8a to ce06bc5
Upstream commit 3f94643 ("[AMD] Migrate to Triton Backend to Aiter")
introduced a hard triton==3.5.1 pin and moved the AMD Triton backend
out of the flash_attn package into aiter. This breaks ROCm users:
the triton pin downgrades their ROCm triton, and the wheel is no
longer self-contained.
Pin to bbe25ba (the parent commit) which still bundles
flash_attn_triton_amd/ and has no triton version constraint.
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Remove the push-only guard so the upload-wheel job runs on both push and pull_request events, enabling upload from this PR.
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Extend the workflow with check/build/upload jobs for AITER, mirroring
the flash-attn pipeline:
* check-aiter resolves AITER_REF (default branch matthias.gfx11) to a
  SHA and short-circuits when a matching wheel is already in
  s3://aig-embd-gfx11-wheels/simple/amd-aiter/.
* build-aiter-wheel runs on ubuntu-latest, recursive-clones aiter,
  derives a PEP 440 version via git describe + SETUPTOOLS_SCM_PRETEND_VERSION
  (aiter uses setuptools_scm), installs torch from the same ROCm
  nightly index, and builds bdist_wheel with PREBUILD_KERNELS=0 so no
  HIP toolchain is required on the runner.
* upload-aiter-wheel is wired up but gated `if: false` so nothing
  lands in S3 until validated; flipping to the standard
  `github.repository_owner == 'ROCm'` guard enables it.
GPU_ARCHS=gfx1151 is set in env now even though aiter/setup.py ignores
it at PREBUILD_KERNELS=0, so switching to kernels-on later is a single
env-var change (plus moving the build job into a ROCm container).
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
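The short-circuit used by both check jobs (skip the build when a wheel for the resolved SHA is already in the index) can be sketched as below. This is a sketch under assumptions: the helper name `wheel_already_published` is hypothetical, the list of filenames is assumed to have been fetched from the S3 prefix already, and the match relies on the `+g<short sha>` local version segment seen in the flash-attn example; the workflow's real check is not shown in this PR text.

```python
from urllib.parse import unquote


def wheel_already_published(index_filenames, project, short_sha):
    """Return True if any wheel of `project` carries `+g<short_sha>` in its version.

    `index_filenames` is an iterable of file names (or hrefs) from the
    PEP 503 project page; names may be percent-encoded ("+" as "%2B").
    """
    # Wheel file names use underscores in the distribution part.
    prefix = project.replace("-", "_").lower() + "-"
    marker = "+g" + short_sha.lower()
    for raw in index_filenames:
        name = unquote(raw).lower()
        if not (name.startswith(prefix) and name.endswith(".whl")):
            continue
        # Wheel names look like {dist}-{version}-{python}-{abi}-{platform}.whl,
        # so the version is the field right after the distribution name.
        version = name[len(prefix):].split("-", 1)[0]
        if version.endswith(marker):
            return True
    return False
```

A wheel built exactly on a tag has no SHA in its version, so this check would never match it; handling that case would need a different key, such as the plain version string.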
Rename jobs: `check` -> `check-flash-attn`, `build-wheel` -> `build-flash-attn-wheel`.
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Switch the AITER build from PREBUILD_KERNELS=0 (no kernels) to PREBUILD_ONLY=module_gemm_w4a16, which restricts HIP compilation to that single module while leaving the rest of the package as JIT.

NOTE: this enables HIP compilation, so `ubuntu-latest` without ROCm will no longer satisfy the build. The build-aiter-wheel job will need a ROCm container (e.g. `rocm/dev-ubuntu-22.04`) or an in-job ROCm install before the next run can succeed. This commit changes the env var only; the runner upgrade is a follow-up.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
PREBUILD_ONLY=module_gemm_w4a16 needs hipcc + ROCm headers on the runner. Instead of switching to a `rocm/dev-ubuntu-22.04` container, pip-install the multi-arch ROCm wheels (libraries + devel + the gfx1151 device package) from rocm.nightlies.amd.com, run `rocm-sdk init`, and export ROCM_PATH / ROCM_HOME / HIP_PATH + PATH so setup.py and torch.utils.cpp_extension find hipcc.
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
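The environment export step can be illustrated with a small Python sketch. It assumes the ROCm root directory is already known and that `hipcc` lives in its `bin/` subdirectory; how the workflow actually discovers that root after `rocm-sdk init` is not shown in this PR text, and `rocm_build_env` and its `rocm_root` parameter are hypothetical names.

```python
import os


def rocm_build_env(rocm_root, base_env):
    """Return a copy of `base_env` with the ROCm variables named in the commit set.

    `rocm_root` is assumed to be the install root of the pip-installed ROCm
    SDK; `<rocm_root>/bin` is assumed to contain hipcc.
    """
    env = dict(base_env)
    # Point every variable the build scripts may consult at the same root.
    for var in ("ROCM_PATH", "ROCM_HOME", "HIP_PATH"):
        env[var] = rocm_root
    # Prepend the bin directory so hipcc is found before any other toolchain.
    bin_dir = os.path.join(rocm_root, "bin")
    old_path = env.get("PATH", "")
    env["PATH"] = bin_dir + os.pathsep + old_path if old_path else bin_dir
    return env
```

The returned mapping could be passed as `env=` to `subprocess.run` when invoking the wheel build, which keeps the change scoped to that one process rather than the whole job.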