add mosaic ecr publish workflow #173

Draft
madan-oss wants to merge 18 commits into main from feat/str-3103-mosaic-ci

madan-oss (Collaborator) commented Apr 17, 2026

Description

Add a manual GitHub Actions workflow to build and publish the Mosaic Docker image to both private ECR and public ECR.

The workflow:

  • accepts optional ref and image_tag inputs
  • validates those manual inputs inside the publish job
  • checks out the selected ref, resolves the final SHA and image tag, and builds docker/Dockerfile for linux/amd64
  • assumes the private AWS role with OIDC, logs in to private ECR, and pushes 888577024788.dkr.ecr.us-east-1.amazonaws.com/tn1/mosaic:${IMAGE_TAG}
  • switches to the public AWS role, logs in to public ECR, and pushes public.ecr.aws/z5c7y9u9/mosaic:${IMAGE_TAG}
  • runs a Trivy vulnerability scan, uploads the scan artifact, and adds the pushed image references plus private digest to the workflow summary
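The input handling in the list above could look roughly like the following. This is a sketch, not the code in the workflow: the helper names `validate_image_tag` and `resolve_image_tag`, the tag pattern (Docker's documented tag grammar), and the 12-character short SHA used as the default tag are all assumptions for illustration.

```shell
# Hypothetical helper: accept only tags that match Docker's tag grammar
# (first character alphanumeric or underscore, the rest [A-Za-z0-9_.-],
# at most 128 characters in total).
validate_image_tag() {
  case "$1" in
    '' | [!A-Za-z0-9_]* | *[!A-Za-z0-9_.-]*) return 1 ;;
  esac
  [ "${#1}" -le 128 ]
}

# Hypothetical helper: use the image_tag input when it is set; otherwise
# fall back to the first 12 characters of the resolved SHA (assumed default).
resolve_image_tag() {
  local tag="$1"
  if [ -z "$tag" ]; then
    tag="$(printf '%.12s' "$2")"
  fi
  if ! validate_image_tag "$tag"; then
    echo "invalid image tag: $tag" >&2
    return 1
  fi
  printf '%s\n' "$tag"
}
```

Validating before any AWS credentials are assumed keeps a malformed manual input from reaching the push steps.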

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature/Enhancement (non-breaking change which adds functionality or enhances an existing one)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactor
  • New or updated tests
  • Dependency update
  • Security fix

Notes to Reviewers

  • This PR adds a new workflow only: .github/workflows/docker-publish-ecr.yml.
  • The workflow uses a single build_and_publish job.
  • The private ECR role for Mosaic is arn:aws:iam::888577024788:role/github-actions-mosaic-ecr-publish.
  • The GitHub repo variables now include AWS_REGION, AWS_ROLE_TO_ASSUME, ECR_REGISTRY, ECR_REPOSITORY, and PUBLIC_AWS_ROLE_TO_ASSUME.
  • The Trivy step is report-only right now because it uses --exit-code 0; it does not block publication.
  • End-to-end publish was not exercised locally because manual workflow_dispatch still requires the workflow file to be present on the default branch.
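To illustrate the report-only note above, here is a sketch of how the Trivy invocation might be switched between reporting and gating. The function name, the `gate` mode, the severity filter, the report filename, and the `TRIVY_BIN` override are all assumptions; only `--exit-code 0` comes from this PR.

```shell
# Hypothetical wrapper. "report" mirrors the current behavior (--exit-code 0,
# the step never fails on findings). "gate" is an assumed alternative that
# makes Trivy exit non-zero when matching vulnerabilities are found.
# TRIVY_BIN is an assumed override for dry runs (e.g. TRIVY_BIN=echo).
run_trivy_scan() {
  local image_ref="$1" mode="${2:-report}" exit_code=0
  if [ "$mode" = "gate" ]; then
    exit_code=1
  fi
  "${TRIVY_BIN:-trivy}" image \
    --exit-code "$exit_code" \
    --severity HIGH,CRITICAL \
    --format table \
    --output trivy-report.txt \
    "$image_ref"
}
```

In this PR the scan runs after the push, so switching to a gating mode would also mean moving the scan ahead of publication for it to block anything.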

Validation

  • actionlint .github/workflows/docker-publish-ecr.yml

Checklist

  • I have performed a self-review of my code.
  • I have commented my code where necessary.
  • I have updated the documentation if needed.
  • My changes do not introduce new warnings.
  • I have added tests that prove my changes are effective or that my feature works.
  • New and existing tests pass with my changes.

Related Issues

…rdering

- Switch DOCKER_PLATFORMS from linux/amd64,linux/arm64 to linux/amd64 only.
  The mosaic node group uses c6id.2xlarge (Intel x86_64), and the ARM64 build
  under QEMU was hitting OOM/timeout after 73 minutes
- Add --cache-from/--cache-to type=gha to the buildx build step so Rust
  compile layers are reused across runs
- Split build into two steps: build+push to private ECR first (with private
  credentials active), then configure public ECR credentials and re-tag via
  imagetools create -- fixes credential overwrite bug
- Fix concurrency group key (was hardcoded SHA, now uses github.ref)
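The single-platform, cached build described in this commit might be invoked roughly as below. This is a sketch only: the wrapper name, the `DOCKER_BIN` override, and the use of the repository root as build context are assumptions. Note that the `gha` cache backend needs the Actions cache URL and runtime token exposed to the step, so the command only works as-is inside a suitably configured GitHub Actions job.

```shell
# Hypothetical wrapper around the build step. DOCKER_BIN is an assumed
# override so the command can be dry-run (e.g. DOCKER_BIN=echo).
build_and_push_private() {
  local image_ref="$1"
  # linux/amd64 only: no QEMU emulation, matching the x86_64 node group.
  # type=gha reuses layers from earlier workflow runs.
  "${DOCKER_BIN:-docker}" buildx build \
    --platform linux/amd64 \
    --file docker/Dockerfile \
    --cache-from type=gha \
    --cache-to type=gha \
    --tag "$image_ref" \
    --push \
    .
}
```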
… step

imagetools create failed because AWS env vars were pointing at the public ECR
role when it tried to pull from private ECR. Fix by authenticating to both
registries upfront (docker login creds persist in config.json regardless of
AWS env var changes), then pass both --tag flags to the single buildx build
so private and public ECR are written in one push.

buildx --push with two --tag flags fails with 403 on public ECR because
the buildx container builder does not share the host docker config where
public ECR login is stored. Fix: push to private ECR only during build,
then use imagetools create to copy the manifest to public ECR after
authenticating with the public ECR role.

PUBLIC_ECR_NAMESPACE was set to z5c7y9u9 (wrong); all public ECR repos
in this account use r1l9t0r6. Also created the public.ecr.aws/r1l9t0r6/mosaic
repository which did not exist, causing 403 on every push attempt.

r1l9t0r6 belongs to account 888577024788; the PUBLIC_AWS_ROLE_TO_ASSUME
(github-actions-ecr-push) lives in account 496607027995 whose public ECR
alias is z5c7y9u9 -- same as strata-bridge.

buildx --push does not load the image into the local Docker daemon.
Trivy runs inside its own container and cannot access host docker login
credentials to pull from private ECR. Pulling explicitly first puts the
image in the local daemon so Trivy resolves it via the docker socket.
…pdated actions

- Remove push trigger on feature branch; workflow_dispatch only
- Remove hardcoded DEFAULT_BUILD_REF; fall back to github.sha
- Login to both private and public ECR upfront before build so credentials
  are stable throughout — eliminates mid-workflow credential swap
- Remove continue-on-error from public ECR steps; failures are now fatal
- Remove redundant Reconfigure + Login to ECR for follow-up steps
- Reduce timeout from 180m to 30m
- Upgrade actions to Node.js 24 compatible versions:
  configure-aws-credentials v4.0.2 -> v6.1.0
  setup-buildx-action v3.12.0 -> v4.0.0
  upload-artifact v4.6.2 -> v7.0.1
…ctive

imagetools create uses the ECR credential helper, which re-authenticates
using AWS env vars and ignores cached docker login tokens. Move the
public ECR credential configure + login steps to immediately before the
copy step, then restore private credentials afterward for digest lookup,
pull, and Trivy scan.
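For reference, the two registry logins that this commit reorders can be written as plain CLI calls, sketched below. The workflow itself may use a login action instead; the function names and the `AWS_BIN`/`DOCKER_BIN` overrides are assumptions for illustration. Each call uses whichever AWS credentials are active in the environment at that moment, which is why the order of the configure and login steps matters.

```shell
# Hypothetical helper: log in to a private ECR registry using the AWS
# credentials currently present in the environment.
login_private_ecr() {
  local region="$1" registry="$2"
  "${AWS_BIN:-aws}" ecr get-login-password --region "$region" |
    "${DOCKER_BIN:-docker}" login --username AWS --password-stdin "$registry"
}

# Hypothetical helper: log in to public ECR. Public ECR authorization
# tokens are issued from us-east-1 regardless of where the job runs.
login_public_ecr() {
  "${AWS_BIN:-aws}" ecr-public get-login-password --region us-east-1 |
    "${DOCKER_BIN:-docker}" login --username AWS --password-stdin public.ecr.aws
}
```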
…ic ECR

imagetools create uses BuildKit's credential helper which re-authenticates
via AWS env vars and fails cross-account. strata-bridge uses docker tag +
docker push which reads the stored docker login token from config.json and
works reliably. Pull the image into the local daemon after the private ECR
push, then tag and push to public ECR while public credentials are active.
Image is already in daemon for Trivy so the separate pull step is removed.
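
The pull, tag, and push sequence described in this commit can be sketched as follows. The function name and the `DOCKER_BIN` override are assumptions; the sketch presumes that docker login has already stored tokens for both registries, as the commit message describes.

```shell
# Hypothetical helper: copy an image from private to public ECR through the
# local Docker daemon. DOCKER_BIN is an assumed override for dry runs.
copy_image_to_public() {
  local private_ref="$1" public_ref="$2"
  local docker="${DOCKER_BIN:-docker}"
  # Pull uses the stored private ECR login and loads the image locally,
  # which also leaves it in the daemon for the Trivy scan.
  "$docker" pull "$private_ref" &&
    "$docker" tag "$private_ref" "$public_ref" &&
    # Push uses the stored public ECR login from config.json.
    "$docker" push "$public_ref"
}
```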