ci(INFRA-3595): Phase 3 — Namespace cache and GRADLE_USER_HOME fix for Android build#29777

Merged: alucardzom merged 22 commits into main from phase3-namespace-android on May 7, 2026

Conversation

@alucardzom (Contributor) commented May 6, 2026

Description

INFRA-3595 Phase 3 — adds Namespace Cache Volumes integration to the Android build and setup workflows, and fixes the GRADLE_USER_HOME path mismatch that blocked Android builds on Namespace runners.

Changes:

  1. build-android-e2e.yml: Make GRADLE_USER_HOME conditional on runner_provider (/home/runner/_work/.gradle on Namespace vs /home/admin/_work/.gradle on Cirrus). Add nscloud-cache-action covering yarn, .metamask, node_modules, .yarn/cache, and Gradle caches/wrapper. Gate all 4 cirruslabs/cache steps on runner_provider != 'namespace'. On Namespace, always run full build (no APK fingerprint cache since cirruslabs/cache is skipped).

  2. setup-node-modules.yml: Add nscloud-cache-action before dependency install. Make the setup-node cache: input conditional (disabled on Namespace). Gate the .metamask actions/cache step so it runs only on the current (non-Namespace) path.

Both cache paths coexist for rollback safety. When runner_provider: current (default on all triggers), all ternaries collapse to prior values — cirruslabs/cache runs, nscloud-cache-action is skipped. Behavior is byte-identical to the base branch.
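
A minimal sketch of the GRADLE_USER_HOME conditional from change 1, assuming it is set as job-level env. The job id is hypothetical; the two paths and the runner_provider input are taken from the description above.

```yaml
jobs:
  build-android-apks:   # hypothetical job id
    env:
      # Namespace jobs run as user "runner", Cirrus jobs as user "admin",
      # so the Gradle home has to follow the provider.
      GRADLE_USER_HOME: ${{ inputs.runner_provider == 'namespace' && '/home/runner/_work/.gradle' || '/home/admin/_work/.gradle' }}
```

With runner_provider unset or set to current, the comparison is false and the expression falls through to the Cirrus path, which is how the "collapse to prior values" claim would hold for this variable.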

Builds on Phase 1 (PR #29716) and Phase 0 (PR #29557).

Changelog

CHANGELOG entry: null

Related issues

Fixes: INFRA-3595 (parent epic INFRA-3511)
Refs: INFRA-3592 (Phase 0, PR #29557), INFRA-3593 (Phase 1, PR #29716)

Manual testing steps

Feature: Android build on Namespace runners

  Scenario: dispatch with namespace provider — Android build succeeds
    Given the branch phase3-namespace-android
    When user runs `gh workflow run ci.yml --ref phase3-namespace-android -f runner_provider=namespace`
    Then Build Android E2E APKs succeeds on namespace-profile-metamask-android-build
    And GRADLE_USER_HOME resolves to /home/runner/_work/.gradle
    And nscloud-cache-action caches Gradle deps and yarn/node_modules
    And cirruslabs/cache steps are skipped

  Scenario: dispatch with current provider — byte-identical to base
    Given the branch phase3-namespace-android
    When user runs `gh workflow run ci.yml --ref phase3-namespace-android -f runner_provider=current`
    Then Build Android E2E APKs succeeds on Cirrus ubuntu-runner-amd64
    And GRADLE_USER_HOME resolves to /home/admin/_work/.gradle
    And cirruslabs/cache steps run as before
    And nscloud-cache-action steps are skipped

  Scenario: implicit current via PR/push trigger
    Given a push or pull_request event (no workflow_dispatch)
    Then inputs.runner_provider is undefined/empty
    And all ternaries collapse to existing behavior

Screenshots/Recordings

Before

N/A

After

N/A — CI infrastructure PR, no UI surface.

Pre-merge author checklist

Performance checks (if applicable)

N/A — workflow YAML only, no app code.

Pre-merge reviewer checklist

  • I've manually tested the PR (e.g. pull and build branch, run the app, test code being changed).
  • I confirm that this PR addresses all acceptance criteria described in the ticket it closes and includes the necessary testing evidence, such as recordings and/or screenshots.

Made with Cursor


Note

Medium Risk
Moderate risk because it changes runner selection and caching behavior across core CI/build/E2E workflows, which can impact build determinism and job reliability. Default behavior remains current, but the new namespace path introduces new cache tooling and environment differences (e.g., Gradle home).

Overview
Adds an opt-in runner_provider switch across CI, build, and E2E workflows to route jobs to Namespace runner profiles (new labels added to actionlint.yaml) instead of the existing GitHub/Cirrus runners.

When runner_provider=namespace, enables namespacelabs/nscloud-cache-action for Yarn/node_modules/.metamask (and Gradle caches for Android), disables existing actions/cache/cirruslabs/cache steps where incompatible, and adjusts Android GRADLE_USER_HOME to the Namespace filesystem layout. Manual dispatch workflows (e.g., ci.yml, build.yml, Expo dev build, and E2E regression runs) now expose runner_provider as an input and forward it through reusable workflows.

Reviewed by Cursor Bugbot for commit 207bf39. Bugbot is set up for automated code reviews on this repo. Configure here.

jluque0101 and others added 22 commits April 30, 2026 17:17
  Add metamask-ci-linux profile label, a placeholder for the canonical
  Namespace Linux label (to be replaced before the trial dispatch with
  runner_provider: namespace), and the common nscloud-ubuntu-* inline
  labels so Phase 2 can pick any of them without a follow-up config edit.

  Phase 0 of INFRA-3592. No workflow references these labels yet.
…-4 entry points

Adds the choice input current|namespace (default current) to the five
Phase 1-4 entry-point workflows. No runs-on or job behavior changes
yet — caller forwarding and runs-on ternary land in a follow-up commit.

Phase 0 of INFRA-3592.
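
For illustration, a choice input of this shape on a manually dispatched workflow might look like the following. The description text is an assumption; the input name, options, and default come from the commit message above.

```yaml
on:
  workflow_dispatch:
    inputs:
      runner_provider:
        description: Runner provider for this run   # hypothetical wording
        type: choice
        options:
          - current
          - namespace
        default: current
```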
…eusables

Adds the optional string input runner_provider (default current) to the
seven Phase 1-4 reusable workflows. Phase 7 reusables (runway-*, nightly,
testflight, etc.) are intentionally not modified — they continue to call
without forwarding, and the default keeps behavior byte-identical.

Phase 0 of INFRA-3592.
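
A sketch of the matching declaration on a reusable workflow, assuming the input is declared under workflow_call. Only the input name, type, and default are from the commit message; nothing else about the real files is implied.

```yaml
on:
  workflow_call:
    inputs:
      runner_provider:
        type: string
        required: false
        # Callers that do not forward the input get the existing behavior.
        default: current
```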
Adds with: runner_provider: ${{ inputs.runner_provider }} at every
in-scope caller site (55 sites across 7 caller workflows). Two iOS
build-ios-e2e.yml call sites had no with: block; a new minimal one
is added for them.

Phase 7 caller sites are intentionally not modified — push-eas-update,
nightly-build, runway-*, build-and-upload-to-testflight, build-rc-auto
continue to call without forwarding, the callee defaults to current,
and behavior is byte-identical.

Behavior is unchanged at this point: no runs-on consumes runner_provider
yet — that lands in I.3b.

Phase 0 of INFRA-3592.
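
The forwarding described above could be sketched as follows at a single caller site. The job id is hypothetical; build-android-e2e.yml is one of the reusable workflows named in this PR.

```yaml
jobs:
  build-android-e2e:   # hypothetical job id
    uses: ./.github/workflows/build-android-e2e.yml
    with:
      # Pass the caller's own input straight through to the callee.
      runner_provider: ${{ inputs.runner_provider }}
```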
Replaces every runs-on line in the in-scope Phase 1-4 workflows with the
additive ternary:

  runs-on: ${{ inputs.runner_provider == 'namespace' && 'nscloud-PLACEHOLDER-CONFIRM-LABEL' || <existing> }}

Where <existing> is the previous literal label or expression. Three sites
already had a ${{ ... }} platform ternary (build.yml setup-dependencies,
run-e2e-workflow.yml test-e2e-mobile, setup-node-modules.yml setup); for
those the existing expression is preserved verbatim as the || fallback of
the runner_provider == 'namespace' ternary.

29 sites across 10 workflows. With runner_provider: current (the default
on every existing trigger), each ternary collapses to its prior literal
and behavior is byte-identical. The 'namespace' branch points at the
PLACEHOLDER label by design — replacement happens before any
runner_provider: namespace dispatch (see .phase0/namespace-artifacts.md).

Phase 0 of INFRA-3592.
…abels

Resolves Q1 of INFRA-3592 Phase 0. The four profile labels confirmed
live in the metamask Namespace workspace (format: namespace-profile-<name>):

  - namespace-profile-metamask-ci-linux       (Linux CI — Phase 1)
  - namespace-profile-metamask-android-build  (Android — Phase 3)
  - namespace-profile-metamask-ios-build      (iOS build / xl — Phase 4)
  - namespace-profile-metamask-ios-e2e        (iOS E2E test — Phase 4)

Each runs-on ternary now points at the profile that matches the existing
runner class (ubuntu-latest → ci-linux; macos-latest → ios-build; Cirrus
ubuntu-runner-amd64 → android-build; Cirrus macos-runner:tahoe-xl →
ios-build; Cirrus macos-runner:tahoe → ios-e2e). The three pre-existing
platform-driven dynamic expressions are preserved in both branches of
the ternary so Namespace dispatch follows the same iOS/Android branching
as the current runner choice.

actionlint.yaml drops the speculative nscloud-* and metamask-ci-linux
labels (never used) and registers the four canonical labels above.

Behavior on runner_provider: current is unchanged (every ternary still
collapses to its prior literal/expression).

Phase 0 of INFRA-3592.
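
To illustrate how a platform-driven expression can be kept in both branches, here is a hedged sketch. The inputs.platform name, the job id, and the two GitHub-hosted fallback labels are assumptions for illustration and are not copied from the real workflows; the Namespace labels are the ones listed above.

```yaml
jobs:
  setup:   # hypothetical job id
    # Outer ternary picks the provider; each side keeps the same
    # iOS/Android split. Every label is a non-empty string, so the
    # && / || pattern cannot fall through by accident.
    runs-on: ${{ inputs.runner_provider == 'namespace' && (inputs.platform == 'ios' && 'namespace-profile-metamask-ios-build' || 'namespace-profile-metamask-android-build') || (inputs.platform == 'ios' && 'macos-latest' || 'ubuntu-latest') }}
    steps:
      - run: echo "Running on ${{ runner.os }}"
```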
…g for Phase 1 Linux CI trial

Add nscloud-cache-action before dependency installation in 8 Linux CI
jobs when runner_provider is 'namespace', and disable actions/setup-node
Yarn caching on the Namespace path to avoid duplicate network-backed
cache traffic.

Jobs with both nscloud-cache-action and conditional cache: yarn:
  dedupe, git-safe-dependencies, scripts, js-bundle-size-check,
  unit-tests, merge-unit-and-component-view-tests, component-view-tests

Jobs with nscloud-cache-action only (no cache: yarn to modify):
  sonar-cloud

Jobs with actions/cache for node_modules (component-view-tests,
merge-unit-and-component-view-tests) are gated to skip on Namespace
since nscloud-cache-action already covers node_modules.

Phase 1 of INFRA-3593 / parent epic INFRA-3511.

Co-authored-by: Cursor <cursoragent@cursor.com>
…scloud cache paths

nscloud-cache-action mounts each path as a directory, so listing
.yarn/install-state.gz as a path creates a directory mount where Yarn
expects a file, causing EISDIR errors on yarn install.

Replace .yarn/cache and .yarn/install-state.gz with .yarn in all 8
nscloud-cache-action steps. This only affects the Namespace path
(guarded by inputs.runner_provider == 'namespace'); the GitHub-hosted
runner path with actions/cache is unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
…tted files

Mounting .yarn as a cache volume hides committed subdirectories
(.yarn/releases/, .yarn/patches/, .yarn/plugins/) that are checked into
git, causing "Cannot find module yarn-4.10.3.cjs" errors.

Use .yarn/cache (runtime-generated package tarballs) instead. The
install-state.gz file is not cached on Namespace — Yarn regenerates
it during install with negligible overhead.

Co-authored-by: Cursor <cursoragent@cursor.com>
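
A sketch of the resulting cache step, assuming the action's path input takes a newline-separated list of directories. The step name and the v1 tag are assumptions.

```yaml
steps:
  # Cache only runtime-generated directories. Mounting .yarn itself would
  # hide the committed releases/patches/plugins folders, and mounting
  # install-state.gz would create a directory where Yarn expects a file.
  - name: Set up Namespace cache   # hypothetical step name
    if: inputs.runner_provider == 'namespace'
    uses: namespacelabs/nscloud-cache-action@v1
    with:
      path: |
        node_modules
        .yarn/cache
```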
Resolve conflict in ci.yml: main removed the all-jobs-pass job and
refactored check-all-jobs-pass (PR #29619). Keep main's refactored
structure and apply the runner_provider ternary to the new
check-all-jobs-pass runs-on.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ng on Namespace

The expression `inputs.runner_provider == 'namespace' && '' || 'yarn'`
always evaluates to 'yarn' because empty string is falsy in GitHub
Actions expressions: true && '' produces '', then '' || 'yarn' falls
through to 'yarn'.

Invert to `inputs.runner_provider != 'namespace' && 'yarn' || ''` so
the cache is correctly disabled when runner_provider is namespace.

Co-authored-by: Cursor <cursoragent@cursor.com>
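
The corrected form, sketched on a setup-node step. The step name and action version are assumptions; the expression is the one quoted in the commit message.

```yaml
steps:
  - name: Set up Node.js   # hypothetical step name
    uses: actions/setup-node@v4
    with:
      # Broken:  inputs.runner_provider == 'namespace' && '' || 'yarn'
      #          ('' is falsy, so || always yields 'yarn')
      # Fixed:   put the non-empty value in the && position.
      cache: ${{ inputs.runner_provider != 'namespace' && 'yarn' || '' }}
```

The general rule this suggests: in an `a && b || c` expression, `b` must be truthy, so the empty string belongs in the `c` position.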
…vent OOM

Namespace metamask-ci-linux profile (8x16, 16GB RAM) lacks swap space
unlike GitHub-hosted ubuntu-latest runners. NODE_OPTIONS with
max_old_space_size=20480 (20GB) causes OOM kills (SIGKILL) on Jest
workers.

Conditionally lower to 12288 (12GB) when runner_provider is namespace.
The current path retains 20480 unchanged.

Affects: unit-tests (10 shards), component-view-tests (2 shards).
Co-authored-by: Cursor <cursoragent@cursor.com>
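
A sketch of the conditional heap limit, assuming NODE_OPTIONS is set as job-level env and uses the --max_old_space_size flag spelling. The job id is hypothetical; the two values are from the commit message.

```yaml
jobs:
  unit-tests:
    env:
      # 12 GB on the 16 GB Namespace profile (no swap), 20 GB elsewhere.
      NODE_OPTIONS: ${{ inputs.runner_provider == 'namespace' && '--max_old_space_size=12288' || '--max_old_space_size=20480' }}
```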
…oid E2E build

- Make GRADLE_USER_HOME conditional on runner_provider: use
  /home/runner/_work/.gradle on Namespace (runs as user runner) vs
  /home/admin/_work/.gradle on Cirrus (runs as user admin).
- Add nscloud-cache-action covering yarn, .metamask, node_modules,
  .yarn/cache, and Gradle caches/wrapper when runner_provider is
  namespace.
- Gate all 4 cirruslabs/cache steps (APK fingerprint + Gradle deps)
  on runner_provider != namespace. Both cache paths coexist for
  rollback safety.
- Gate .metamask actions/cache on runner_provider != namespace
  (nscloud-cache-action covers it).
- On Namespace, always run full build (no APK fingerprint cache hit
  logic since cirruslabs/cache is skipped).

Phase 3 of INFRA-3595 / parent epic INFRA-3511.

Co-authored-by: Cursor <cursoragent@cursor.com>
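
The coexistence of the two cache paths could be sketched like this. Step names, action versions, the cache key, and the caches/wrapper subpaths under each Gradle home are assumptions for illustration; only the gating conditions and the two Gradle home directories come from the commit message.

```yaml
steps:
  # Existing Cirrus-side cache: skipped on Namespace.
  - name: Cache Gradle dependencies (current path)
    if: inputs.runner_provider != 'namespace'
    uses: cirruslabs/cache@v4
    with:
      path: |
        /home/admin/_work/.gradle/caches
        /home/admin/_work/.gradle/wrapper
      # Hypothetical key, not the one used in the real workflow.
      key: gradle-${{ runner.os }}-${{ hashFiles('**/*.gradle*') }}

  # Namespace-side cache volume: skipped on the current path.
  - name: Set up Namespace cache (Gradle)
    if: inputs.runner_provider == 'namespace'
    uses: namespacelabs/nscloud-cache-action@v1
    with:
      path: |
        /home/runner/_work/.gradle/caches
        /home/runner/_work/.gradle/wrapper
```

Because exactly one of the two conditions is true for any run, rolling back is a matter of dispatching with runner_provider=current rather than reverting the workflow.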
Add nscloud-cache-action before dependency install, covering yarn,
.metamask, node_modules, and .yarn/cache. Make setup-node cache:
conditional (disabled on Namespace). Gate .metamask actions/cache on
current path. Foundry install runs on Namespace regardless of cache
state since nscloud-cache-action covers .metamask transparently.

Co-authored-by: Cursor <cursoragent@cursor.com>
@alucardzom alucardzom self-assigned this May 6, 2026

github-actions Bot commented May 6, 2026

CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.

@github-actions github-actions Bot added the size-M label May 6, 2026
@alucardzom added the skip-smart-e2e-selection and team-dev-ops labels May 6, 2026

github-actions Bot commented May 6, 2026

🔍 Smart E2E Test Selection

⏭️ Smart E2E selection skipped - skip-smart-e2e-selection label found

All E2E tests pre-selected.

View GitHub Actions results

sonarqubecloud Bot commented May 6, 2026

@alucardzom alucardzom marked this pull request as ready for review May 6, 2026 10:49
@alucardzom alucardzom requested review from a team as code owners May 6, 2026 10:49

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 207bf39. Configure here.

Comment thread .github/workflows/ci.yml
  check-diff:
    name: Check diff
-   runs-on: macos-latest
+   runs-on: ${{ inputs.runner_provider == 'namespace' && 'namespace-profile-metamask-ios-build' || 'macos-latest' }}

check-diff job missing Namespace cache setup

Medium Severity

The check-diff job has its runs-on updated to use a Namespace runner when runner_provider == 'namespace', but unlike every other similarly modified job in ci.yml, it is missing the nscloud-cache-action step, the conditional cache: parameter on actions/setup-node (still hardcoded to yarn), and the conditional gating of the .metamask actions/cache step. On Namespace runners, this job will run without the persistent disk cache that all other jobs use, creating an inconsistent caching strategy and causing slower builds. Compare to e.g. the dedupe or unit-tests jobs which received the full Namespace cache treatment.

@alucardzom (Contributor, Author) commented

Verification evidence

All three validation scenarios have been executed and passed:

| Scenario | Run | Result |
| --- | --- | --- |
| Current path (PR trigger) | 25421406400 | All Android jobs passed on Cirrus (27/27 E2E shards, APK build, all Linux CI jobs). One iOS E2E flake re-running. |
| Namespace path (manual dispatch) | 25427754071 | Android build + all 27 E2E smoke shards passed on namespace-profile-metamask-android-build (16x32). |
| Rollback drill (runner_provider=current) | 25430829673 | Completed successfully. All jobs on Cirrus/GitHub-hosted runners. |

Key findings during validation

  • metamask-android-build profile required scale-up from 8x16 to 16x32 due to Gradle daemon OOM (Cirrus lg runner has 48GB)
  • GRADLE_USER_HOME conditional fix confirmed working (/home/runner/_work/.gradle on Namespace vs /home/admin/_work/.gradle on Cirrus)
  • One transient Gradle Plugin Portal 503 outage resolved on retry
  • Unit test shard 8 OOM on Namespace is a pre-existing issue from Phase 1 (not related to Android changes)

@github-project-automation github-project-automation Bot moved this to Needs dev review in PR review queue May 6, 2026
@github-project-automation github-project-automation Bot moved this from Needs dev review to Review finalised - Ready to be merged in PR review queue May 7, 2026
@alucardzom alucardzom added this pull request to the merge queue May 7, 2026
Merged via the queue into main with commit d55db0f May 7, 2026
565 of 649 checks passed
@alucardzom alucardzom deleted the phase3-namespace-android branch May 7, 2026 12:59
@github-project-automation github-project-automation Bot moved this from Review finalised - Ready to be merged to Merged, Closed or Archived in PR review queue May 7, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators May 7, 2026
@metamaskbotv2 Bot added the release-7.77.0 label (issue or pull request that will be included in release 7.77.0) May 7, 2026

Labels

  • release-7.77.0: Issue or pull request that will be included in release 7.77.0
  • size-M
  • skip-smart-e2e-selection: Skip Smart E2E selection, i.e. select all E2E tests to run
  • team-dev-ops: DevOps team

4 participants