## Motivation

Some release reruns need to reuse already-built TheRock artifacts, patch them manually, and reupload them to a fixed S3 prefix. The release workflow should then point at that prefix and build tarballs, Python packages, and native packages from those artifacts without starting TheRock source builds.

## Changes

artifact_manager.py copy:
- Add --source-prefix-only that builds the source backend with lookup_workflow_run=False, reusing the dest backend's env-based bucket selection. Use this when the source artifacts live under a manually populated prefix in the active run's artifact namespace.
- Add --stage=all that unions every build stage's produced artifacts.
- Add --require-matches that fails when a requested family yields no matching source artifact. With --expand-family-to-targets, a family is satisfied by either a family-named artifact or any of its expanded target-named artifacts.
- Accept either ',' or ';' in --amdgpu-families so callers passing CSV workflow inputs can pass them through unchanged.

build_tools/github_actions/verify_artifacts_ready.py:
- New helper that picks between the source-build-result and prebuilt-copy-result based on the prebuilt_prefix value and exits non-zero when the active producer failed. Encapsulates the gate logic so it is testable and reusable across Linux and Windows release workflows.

.github/workflows/copy_prebuilt_artifacts.yml:
- New reusable + workflow_dispatch workflow that invokes artifact_manager.py copy. Real copy steps are gated on prebuilt_prefix != '' so the reusable workflow is a successful no-op in source-build mode and downstream jobs can keep plain needs: wiring.

multi_arch_release.yml, multi_arch_release_linux.yml, multi_arch_release_windows.yml:
- Plumb prebuilt_prefix through both triggers and into the Linux and Windows sub-workflows.
- Split source-build vs prebuilt-copy producers with a build_artifacts verifier gate that runs `if: always()` so producer failures are not masked by skipped jobs.
- Skip the PyTorch dispatch in prebuilt mode.

release_portable_linux_packages.yml, release_windows_packages.yml:
- Add prebuilt_prefix input on workflow_call and workflow_dispatch.
- Add a top-level always-runs reusable copy job.
- Per-family job depends on the copy job.
- Source-build steps gated to source mode.
- Prebuilt mode runs two artifact_manager fetches sharing a download cache: one per-artifact layout into BUILD_DIR/artifacts for build_python_packages.py, one flattened into BUILD_DIR/dist/rocm for the dist tarball, then tars the flattened tree so the existing upload step works in both modes.
- PyTorch (and JAX, on Linux) dispatches gated to source mode.
- Native package dispatches kept in both modes.
- TODO note flagging that empty `inputs.families` (defaults from fetch_package_targets.py) does not work in prebuilt mode yet.

## Test Plan

- python -m pytest build_tools/tests/artifact_manager_tool_test.py build_tools/github_actions/tests/verify_artifacts_ready_test.py
- YAML parse for the six touched/added workflows.
- git diff --check.

## Test Result

- 48 tests pass (29 existing + 8 new copy + 11 new verifier tests).
- All workflows parse cleanly.
- No whitespace issues.

Co-Authored-By: Claude <noreply@anthropic.com>
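For readers skimming the gate described above, here is a minimal sketch of the producer-selection logic, not the actual contents of verify_artifacts_ready.py. The function names and the use of GitHub Actions job result strings ("success", "failure", "skipped") as inputs are assumptions made for illustration.

```python
def select_active_result(
    prebuilt_prefix: str,
    source_build_result: str,
    prebuilt_copy_result: str,
) -> str:
    """Return the result of the producer that was expected to run.

    Hypothetical helper: an empty prebuilt_prefix means source-build mode,
    any other value means prebuilt-copy mode.
    """
    if prebuilt_prefix != "":
        return prebuilt_copy_result
    return source_build_result


def artifacts_ready(
    prebuilt_prefix: str,
    source_build_result: str,
    prebuilt_copy_result: str,
) -> bool:
    """True only when the active producer finished with 'success'."""
    active = select_active_result(
        prebuilt_prefix, source_build_result, prebuilt_copy_result
    )
    return active == "success"
```

A gate script built on this would exit non-zero when `artifacts_ready(...)` is false, so a skipped inactive producer never hides a failure of the active one.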
The single-family release workflows passed `inputs.families` directly to the copy_prebuilt job. When a caller leaves `families` empty and relies on fetch_package_targets.py to fill in the default list, the copy job receives an empty amdgpu_families and copy --require-matches exits 1.

Resolve the family list from the already-computed package_targets matrix output instead, which honors the same defaults the matrix itself uses. Removes the corresponding TODO comments.

Changes:
- release_portable_linux_packages.yml and release_windows_packages.yml: amdgpu_families is now `join(fromJSON(needs.setup_metadata.outputs.package_targets).*.amdgpu_family, ';')`.

Co-Authored-By: Claude <noreply@anthropic.com>
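As an illustration of what the workflow expression computes, the following Python sketch mirrors `join(fromJSON(...).*.amdgpu_family, ';')` and the separator handling described for --amdgpu-families. Both function names are hypothetical, and the family names in the usage note are placeholder values, not a statement about the real matrix.

```python
import json
import re


def families_from_package_targets(package_targets_json: str) -> str:
    """Join every entry's amdgpu_family with ';', like the workflow expression."""
    targets = json.loads(package_targets_json)
    return ";".join(entry["amdgpu_family"] for entry in targets)


def split_families(value: str) -> list[str]:
    """Split a family list on ',' or ';', dropping empty entries."""
    return [part.strip() for part in re.split(r"[;,]", value) if part.strip()]
```

For example, a matrix of `[{"amdgpu_family": "famA"}, {"amdgpu_family": "famB"}]` yields the string `famA;famB`, which `split_families` turns back into a two-element list.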
The empty-copy_requests early return in do_copy short-circuited before checking --require-matches. The flag's contract is "fail if the source delivers no matching artifacts," and that contract has to hold whether the empty result comes from a per-family miss or from an empty source prefix. Without this, a prebuilt copy job can succeed while delivering nothing - exactly the silent regression --require-matches exists to prevent.

Move the require_matches check before the early return.

Test: new test covers --require-matches + no families + empty source prefix, expecting SystemExit(1).

Co-Authored-By: Claude <noreply@anthropic.com>
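The ordering fix can be sketched as follows. This is a simplified stand-in for do_copy, assuming a plain list of copy requests and a boolean flag; the real function's signature and return value are not shown in this PR text, so `do_copy_sketch` and its return convention are hypothetical.

```python
import sys


def do_copy_sketch(copy_requests: list, require_matches: bool) -> int:
    """Return the number of artifacts that would be copied.

    The require_matches check runs before the empty-list early return,
    so an empty source prefix cannot pass silently.
    """
    if require_matches and not copy_requests:
        print("error: no matching source artifacts", file=sys.stderr)
        sys.exit(1)

    if not copy_requests:
        # Nothing to copy and the caller did not ask for strictness.
        return 0

    return len(copy_requests)
```

With the two checks in the opposite order, `do_copy_sketch([], require_matches=True)` would return 0 instead of exiting, which is the behavior this commit removes.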
`fetch --stage=all` returns set(topology.artifacts.keys()) - every artifact in the topology. `copy --stage=all` was looping over every build stage and unioning topology.get_produced_artifacts(stage), so any artifact present in the topology but not produced by any stage was silently skipped on copy.

Make `copy --stage=all` use the same direct topology-artifacts assignment as fetch.

Test: extend the test topology with an orphan-group whose artifact no build stage produces, and assert it is included in `copy --stage=all` results.

Co-Authored-By: Claude <noreply@anthropic.com>
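The difference between the two selection strategies shows up with a toy topology. `TopologySketch` below is a hypothetical stand-in that only mimics the `artifacts` mapping and `get_produced_artifacts(stage)` named in the commit message; the real topology class is not reproduced here.

```python
class TopologySketch:
    """Hypothetical stand-in for the build topology."""

    def __init__(self, artifacts: dict, stage_products: dict):
        self.artifacts = artifacts            # artifact name -> metadata
        self.stage_products = stage_products  # stage name -> produced names

    def get_produced_artifacts(self, stage: str) -> set:
        return set(self.stage_products.get(stage, ()))


def artifacts_by_stage_union(topology: TopologySketch) -> set:
    """Old copy behavior: union of what each build stage produces."""
    selected = set()
    for stage in topology.stage_products:
        selected |= topology.get_produced_artifacts(stage)
    return selected


def artifacts_for_stage_all(topology: TopologySketch) -> set:
    """New copy behavior, matching fetch: every artifact in the topology."""
    return set(topology.artifacts.keys())
```

An artifact that is listed in the topology but produced by no stage appears only in the second result.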
The prebuilt-mode "Fetch prebuilt artifacts" and "Build dist tarball
from prebuilt artifacts" steps were guarded with both
`prebuilt_prefix != ''` and `github.repository_owner == 'ROCm'`.
Source-build steps are guarded only on `prebuilt_prefix == ''`, so
in any non-ROCm-owned context (forks, manual reruns from a fork)
prebuilt mode silently broke: source build was skipped, fetch was
skipped, then "Build Python Packages" ran against an empty
${BUILD_DIR}/artifacts.
Make the prebuilt gates symmetric with the source-build gates: only
prebuilt_prefix == '' / != '' decides which producer runs.
Co-Authored-By: Claude <noreply@anthropic.com>
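To make the asymmetry in the preceding commit concrete, here is a small model of which steps run under the old and new guards. The step labels and the `owner_gated_prebuilt` switch are hypothetical names for illustration; only the guard conditions come from the commit message.

```python
def steps_that_run(
    prebuilt_prefix: str,
    repository_owner: str,
    owner_gated_prebuilt: bool,
) -> list[str]:
    """List the steps that execute for one job, in order."""
    steps = []
    if prebuilt_prefix == "":
        steps.append("source_build")
    elif not owner_gated_prebuilt or repository_owner == "ROCm":
        # Old guard also required the ROCm owner; new guard does not.
        steps.append("fetch_prebuilt")
        steps.append("build_dist_tarball")
    steps.append("build_python_packages")
    return steps
```

With the old guard, a fork running in prebuilt mode reaches `build_python_packages` with no producer step before it. With the new guard, the fetch and tarball steps run regardless of the owner.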
The Linux release workflow correctly skips the PyTorch wheel dispatch when prebuilt_prefix is set (framework wheel builds do not yet accept the prebuilt artifact source). Windows was missed - it checked only github.repository_owner and expect_pytorch_failure. Add the prebuilt_prefix == '' guard so the Windows gate matches.

Co-Authored-By: Claude <noreply@anthropic.com>
_create_source_backend in prefix-only mode previously re-derived the source bucket via WorkflowOutputRoot.from_workflow_run with lookup_workflow_run=False - i.e. the same env-only path. That was equivalent to dest today only because both went through env. If do_copy ever passes a CLI --run-github-repo override into create_backend_from_env, dest would pick it up and source would not, silently diverging.

Have prefix-only construct WorkflowOutputRoot directly from dest's resolved bucket and external_repo (with the source's own run_id and platform). Construct dest first in do_copy and pass its output_root in. Bake the same invariant into FailingBackend so existing tests that go through do_copy keep working.

Test: replace the lookup_workflow_run=False assertion with one that gives dest a deliberately non-env bucket/external_repo and verifies the source S3Backend is constructed with those exact values plus the source run_id.

Co-Authored-By: Claude <noreply@anthropic.com>
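The invariant described above (source shares dest's bucket and external_repo but keeps its own run_id and platform) can be sketched with a small dataclass. `OutputRootSketch` and `source_root_from_dest` are hypothetical; the real WorkflowOutputRoot may carry different or additional fields.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OutputRootSketch:
    """Hypothetical stand-in for WorkflowOutputRoot."""

    bucket: str
    external_repo: str
    run_id: str
    platform: str


def source_root_from_dest(
    dest: OutputRootSketch, source_run_id: str, source_platform: str
) -> OutputRootSketch:
    """Copy dest's resolved bucket/external_repo, swap in the source identity."""
    return replace(dest, run_id=source_run_id, platform=source_platform)
```

Because the source root is derived from the already-resolved dest root, any override that changed dest's bucket is inherited automatically instead of being re-derived from the environment.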
The single-family release workflows had a static one-line run-name that did not surface whether a run was in source-build or prebuilt-copy mode, making it hard to spot prebuilt reruns in the run list. Convert both run-names to folded scalars that append " | prebuilt: <prefix>" when inputs.prebuilt_prefix is set, matching the pattern already used by multi_arch_release.yml and the rockrel wrappers.

Co-Authored-By: Claude <noreply@anthropic.com>
Without an explicit run-name, workflow_dispatch runs of this reusable workflow show only the workflow name in the run list, so parameters (target platform, release type, source prefix) only become visible after opening the run. Add a run-name that surfaces those at the top level for easier triage.

Co-Authored-By: Claude <noreply@anthropic.com>
Two stale comments lied about behavior after the recent fixes, and the Windows PyTorch dispatch was missing the clarifying comment its Linux sibling has.

Update artifact_manager.py copy --stage help to say "all" copies every artifact in the topology (mirrors fetch --stage=all) instead of "unions every build stage". Rewrite the Linux release comment that claimed native package dispatches are skipped in prebuilt mode - only PyTorch and JAX are skipped; native packages still run from the copied artifacts. Add the same explanation on the Windows PyTorch dispatch.

Co-Authored-By: Claude <noreply@anthropic.com>
ScottTodd (Member) reviewed on May 6, 2026 and left a comment:
We discussed this PR offline. Some ideas:
- Frame as "release repackaging" or "release promotion" instead of "prebuilt artifacts"
- Allow CI workflows to mix prebuilt and built-from-source artifacts, draw a harder line in release workflows to only either use all prebuilt (for repacking) or all built-from-source
- Move to https://github.com/ROCm/rockrel if focused on promotion
ScottTodd (Member) reviewed on May 6, 2026 and commented on lines +63 to +65:

    copy_prebuilt:
      name: Copy Prebuilt Artifacts
      uses: ./.github/workflows/copy_prebuilt_artifacts.yml

Another thing to watch for: the base_lib_generic.tar.zst artifact contains a dist_info.json with contents like
{
  "dist_amdgpu_targets" : "gfx942;gfx1100;gfx1101;gfx1102;gfx1103;gfx1151;gfx1200;gfx1201;gfx950"
}
Just repackaging a subset of existing artifacts (e.g. choosing to release a subset of the targets that were built in a nightly release for a stable release) with a changed version will carry the original list along, so tools like rocm-sdk targets may return a list that does not match expectations (see also #4687).
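One way to catch the mismatch described in this comment is a pre-release check that compares the recorded target list against the subset being released. The sketch below is an assumption about how such a check could look, not an existing tool; `unexpected_targets` is a hypothetical name, and only the `dist_amdgpu_targets` key and its ';'-separated format come from the comment.

```python
import json


def unexpected_targets(dist_info_text: str, released_targets: list[str]) -> list[str]:
    """Return targets recorded in dist_info.json that are not being released."""
    info = json.loads(dist_info_text)
    recorded = [t for t in info["dist_amdgpu_targets"].split(";") if t]
    released = set(released_targets)
    # Preserve the recorded order so the report is easy to compare by eye.
    return [t for t in recorded if t not in released]
```

A non-empty result means the repackaged artifact still advertises targets outside the release subset, so dist_info.json would need to be regenerated or patched before publishing.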
Motivation
In rare cases, release reruns need to reuse already-built artifacts. These artifacts might have been manually patched, or the rerun might be restricted to a subset of targets, in which case the release run only repackages that subset. The release workflows then build tarballs, Python packages, and native Linux packages without starting a source build.
Technical Details
Test Plan
Test Result
Submission Checklist