Auto-build flash-attn wheels on push, upload to S3 #910

Draft
mgehre-amd wants to merge 8 commits into gfx11 from matthias.flash-attn-s3-auto

mgehre-amd commented Apr 30, 2026

  • Revert trigger on PR

mgehre-amd added 2 commits May 4, 2026 18:05
Replace the GitHub Releases / gh-pages publishing path with a direct
upload to s3://aig-embd-gfx11-wheels/simple/flash-attn/ (the same PEP 503
index used by build-rocm-wheels.yml).

Each push to gfx11 runs a check job that resolves the upstream
Dao-AILab/flash-attention `main` HEAD and queries S3; the build job is
skipped when a wheel matching the upstream short SHA already exists.
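
The skip decision described above can be sketched as a pure function. This is only an illustration, not the workflow's actual check step: the function name `wheel_exists_for_sha`, the example wheel filename, and the assumption that the short SHA appears in the wheel name as a `+g<sha>` local version segment are all hypothetical, and how the index is listed from S3 is left out.

```python
from urllib.parse import unquote


def wheel_exists_for_sha(index_entries, short_sha):
    """Return True if any listed wheel was built from `short_sha`.

    `index_entries` is an iterable of S3 keys or PEP 503 hrefs (hypothetical
    input; the real workflow's listing mechanism is not shown in this PR).
    """
    # Assumed naming: the version carries "+g<sha>" and is followed by "-<python tag>".
    needle = f"+g{short_sha}-"
    for entry in index_entries:
        # hrefs in an index page may percent-encode "+" as "%2B".
        name = unquote(entry).rsplit("/", 1)[-1]
        if name.endswith(".whl") and needle in name:
            return True
    return False
```

A build would then be skipped when the function returns True for the resolved upstream short SHA.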

The wheel version is derived from `git describe` against the latest
v2.* tag, e.g. `2.8.4.dev472+gb995b246` for 472 commits past v2.8.3, and
becomes plain `2.8.3` (or `2.8.4`) again once upstream lands a new tag.
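
As a minimal sketch of that mapping, assuming the input is the output of `git describe --tags --long --match 'v2.*'` (the exact flags the workflow uses are an assumption) and that the helper name `describe_to_version` is hypothetical:

```python
import re

# Matches e.g. "v2.8.3-472-gb995b246": tag, commits since tag, abbreviated SHA.
_DESCRIBE = re.compile(r"^v(?P<tag>\d+(?:\.\d+)*)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def describe_to_version(describe):
    """Turn `git describe --long` output into a PEP 440 version string."""
    match = _DESCRIBE.match(describe.strip())
    if match is None:
        raise ValueError(f"unexpected git describe output: {describe!r}")
    count = int(match["count"])
    if count == 0:
        # Exactly on a tag: use the tag's version unchanged.
        return match["tag"]
    # Past the tag: bump the last component and mark it as a dev pre-release.
    parts = [int(p) for p in match["tag"].split(".")]
    parts[-1] += 1
    base = ".".join(str(p) for p in parts)
    return f"{base}.dev{count}+g{match['sha']}"
```

Bumping the last component keeps the dev build sorting after `2.8.3` and before a future `2.8.4` under PEP 440 ordering. Tags with suffixes such as `.post1` are not handled by this sketch and raise `ValueError`.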

Changes:
- Switch source from the v2.* release-list to upstream `main` because
  the latest release (v2.8.3, Aug 2025) is too old to include the gfx11
  improvements we want.
- Drop schedule and workflow_dispatch triggers (the workflow file is
  not on the default branch, so neither would actually fire).
- Drop the create-release and publish-to-gh-pages jobs.
- Drop the FLASH_ATTN_LOCAL_VERSION suffix; the SHA in the version
  string is enough to identify the build.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Adds pull_request trigger so PRs targeting gfx11 exercise the build,
and gates the upload-wheel job on github.event_name == 'push' so PR
runs validate the build without populating the S3 index.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
mgehre-amd changed the base branch from matthias.pep503-index to gfx11 May 4, 2026 16:09
mgehre-amd force-pushed the matthias.flash-attn-s3-auto branch from b91bd8a to ce06bc5 May 4, 2026 16:09
mgehre-amd added 6 commits May 4, 2026 18:42
Upstream commit 3f94643 ("[AMD] Migrate to Triton Backend to Aiter")
introduced a hard triton==3.5.1 pin and moved the AMD Triton backend
out of the flash_attn package into aiter. This breaks ROCm users:
the triton pin downgrades their ROCm triton, and the wheel is no
longer self-contained.

Pin to bbe25ba (the parent commit) which still bundles
flash_attn_triton_amd/ and has no triton version constraint.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Remove the push-only guard so the upload-wheel job runs on both
push and pull_request events, enabling upload from this PR.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Extend the workflow with check/build/upload jobs for AITER, mirroring
the flash-attn pipeline:

  * check-aiter resolves AITER_REF (default branch matthias.gfx11) to a
    SHA and short-circuits when a matching wheel is already in
    s3://aig-embd-gfx11-wheels/simple/amd-aiter/.
  * build-aiter-wheel runs on ubuntu-latest, recursive-clones aiter,
    derives a PEP 440 version via git describe + SETUPTOOLS_SCM_PRETEND_VERSION
    (aiter uses setuptools_scm), installs torch from the same ROCm
    nightly index, and builds bdist_wheel with PREBUILD_KERNELS=0 so no
    HIP toolchain is required on the runner.
  * upload-aiter-wheel is wired up but gated `if: false` so nothing
    lands in S3 until validated; flipping to the standard
    `github.repository_owner == 'ROCm'` guard enables it.

GPU_ARCHS=gfx1151 is set in env now even though aiter/setup.py ignores
it at PREBUILD_KERNELS=0, so switching to kernels-on later is a single
env-var change (plus moving the build job into a ROCm container).
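
The environment for the kernel-less AITER build described above might be assembled roughly as follows. This is a sketch, not the workflow's actual step: the helper name `aiter_build_env` is hypothetical, and only the three variable names come from this commit message.

```python
def aiter_build_env(version, base_env=None, gpu_archs="gfx1151"):
    """Return a copy of `base_env` with the AITER build variables set."""
    env = dict(base_env or {})
    # setuptools_scm reads this variable instead of inspecting git metadata.
    env["SETUPTOOLS_SCM_PRETEND_VERSION"] = version
    # No kernels are prebuilt, so no HIP toolchain is needed on the runner.
    env["PREBUILD_KERNELS"] = "0"
    # Ignored while PREBUILD_KERNELS=0; set now so enabling kernels is one change.
    env["GPU_ARCHS"] = gpu_archs
    return env
```

The resulting dict could be passed as `env=` to `subprocess.run` when invoking `setup.py bdist_wheel` in the aiter checkout.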

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Rename jobs: `check` -> `check-flash-attn`, `build-wheel` -> `build-flash-attn-wheel`.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Switch the AITER build from PREBUILD_KERNELS=0 (no kernels) to
PREBUILD_ONLY=module_gemm_w4a16, which restricts HIP compilation to
that single module while leaving the rest of the package as JIT.

NOTE: this enables HIP compilation, so `ubuntu-latest` without ROCm
will no longer satisfy the build. The build-aiter-wheel job will need
a ROCm container (e.g. `rocm/dev-ubuntu-22.04`) or an in-job ROCm
install before the next run can succeed. This commit changes the env
var only; the runner upgrade is a follow-up.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
PREBUILD_ONLY=module_gemm_w4a16 needs hipcc + ROCm headers on the
runner. Instead of switching to a `rocm/dev-ubuntu-22.04` container,
pip-install the multi-arch ROCm wheels (libraries + devel + the
gfx1151 device package) from rocm.nightlies.amd.com, run
`rocm-sdk init`, and export ROCM_PATH / ROCM_HOME / HIP_PATH + PATH
so setup.py and torch.utils.cpp_extension find hipcc.
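
The export step can be sketched as below, assuming the ROCm root directory has already been determined and that `hipcc` lives in its `bin/` subdirectory. Both are assumptions, and `rocm_env` is a hypothetical helper, not part of the workflow.

```python
import os


def rocm_env(rocm_root, base_env=None):
    """Return an environment in which build tools can locate hipcc."""
    env = dict(os.environ if base_env is None else base_env)
    # Point all three variables named in the commit message at the same root.
    for var in ("ROCM_PATH", "ROCM_HOME", "HIP_PATH"):
        env[var] = rocm_root
    # Prepend <root>/bin so hipcc is found before any other install on PATH.
    bin_dir = os.path.join(rocm_root, "bin")
    old_path = env.get("PATH", "")
    env["PATH"] = bin_dir + (os.pathsep + old_path if old_path else "")
    return env
```

In a GitHub Actions step the same values would typically be appended to `$GITHUB_ENV` and `$GITHUB_PATH` so that later steps inherit them.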

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>