ci(INFRA-3631): Phase 5d — Shadow CI benchmark workflow for Namespace evaluation (#30158)

alucardzom · cursoragent · bsgrigorov · web-flow · commit 0b33167e2aae · 2026-05-15T10:57:36.000Z
## **Description**

INFRA-3631 Phase 5d: Adds a shadow CI workflow that runs the full CI pipeline on Namespace runners, enabling side-by-side performance and reliability comparison with the current Cirrus/GitHub runner path.

**Changes:**

### Shadow CI Workflow

- **`ci-namespace-shadow.yml`**: Calls `ci.yml` with `runner_provider: namespace` via `workflow_call`. As committed, it triggers on `pull_request`, `push` to `main`, an hourly cron, and `workflow_dispatch`.
- **`ci.yml`**: Added `workflow_call:` trigger with `runner_provider` input (string, default `current`). The 44 existing `inputs.runner_provider` references already handle undefined gracefully, so this is backward-compatible.

### Benchmark Script

- **`scripts/namespace-benchmark.sh`**: Collects per-job p50/p95 wall-clock durations comparing `ci.yml` vs `ci-namespace-shadow.yml` runs over a configurable time window.

### Rollback Safety

- All Namespace-specific logic gated on `inputs.runner_provider == 'namespace'`
- Shadow workflow concurrency group uses caller's workflow name, so it never cancels normal CI runs
- `runner_provider=current` path unchanged and validated

### Includes Phase 5 (INFRA-3597) cache architecture

This branch is based on `phase5/cache-and-artifacts` (PR #29886) rebased on latest `main`, so it includes the full cache and artifact architecture work. The shadow-specific changes are in the last commit.

## **Validation Runs**

| Run | Provider | Result |
|-----|----------|--------|
| [25826373445](https://github.com/MetaMask/metamask-mobile/actions/runs/25826373445) | namespace | 81 success, Android 27/27 E2E pass, iOS build pass, 2 iOS E2E flakes (pre-existing) |
| [25828982183](https://github.com/MetaMask/metamask-mobile/actions/runs/25828982183) | current | Rollback validation (in progress) |

## **Changelog**

CHANGELOG entry: null

## **Related issues**

Fixes: INFRA-3631 (parent epic INFRA-3511)

## **Manual testing steps**

```gherkin
Feature: INFRA-3631 Shadow CI benchmark workflow

  Scenario: Current runner path is unchanged
    Given ci.yml runs with runner_provider current (or default)
    When all jobs execute
    Then the same runner selection and caching behavior runs as before
    And no shadow or namespace-specific logic is triggered

  Scenario: Namespace runner path works via workflow_call
    Given ci.yml is called with runner_provider namespace (via workflow_dispatch or workflow_call)
    When all jobs execute
    Then Namespace runner profiles are selected for all jobs
    And nscloud-cache-action is used for caching
    And all Android E2E shards pass

  Scenario: Shadow workflow calls ci.yml correctly
    Given ci-namespace-shadow.yml is dispatched (requires merge to main)
    When the shadow-ci job invokes ci.yml with runner_provider namespace
    Then the full CI pipeline runs on Namespace runners
    And the shadow run does not cancel or interfere with normal CI runs

  Scenario: Benchmark script produces comparison data
    Given at least one successful ci.yml and ci-namespace-shadow.yml run exist
    When scripts/namespace-benchmark.sh is executed
    Then per-job p50 and p95 durations are printed for both workflows
```

## **Screenshots/Recordings**

N/A: CI infrastructure PR.

## **Pre-merge author checklist**

- [x] I've followed MetaMask Contributor Docs and Coding Standards.
- [x] I've completed the PR template to the best of my ability
- [x] I've included tests if applicable
- [x] I've documented my code using JSDoc format if applicable
- [x] I've applied the right labels on the PR

## **Pre-merge reviewer checklist**

- [ ] I've manually tested the PR
- [ ] I confirm that this PR addresses all acceptance criteria

Made with [Cursor](https://cursor.com)

---

> [!NOTE]
> **Medium Risk**
> Touches GitHub Actions orchestration by introducing a new scheduled/PR-triggered shadow workflow and making `ci.yml` reusable via `workflow_call`, which could affect CI load or execution paths if misconfigured, though it remains advisory and defaults preserve existing behavior.
>
> **Overview**
> Adds a new **advisory** GitHub Actions workflow, `ci-namespace-shadow.yml`, that runs the full CI pipeline by calling `ci.yml` with `runner_provider: namespace` (including PR/push/hourly schedule triggers) to benchmark Namespace runners without gating merges.
>
> Updates `ci.yml` to support `workflow_call` with a `runner_provider` input (defaulting to `current`) so it can be invoked by the shadow workflow, and adds `scripts/namespace-benchmark.sh` to compute per-job p50/p95 wall-clock times across `ci.yml` vs `ci-namespace-shadow.yml` runs. Also documents `[shadow]` CI jobs in `CONTRIBUTING.md`.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 0a3a5cc. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Borislav Grigorov <11405770+bsgrigorov@users.noreply.github.com>
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
@@ -23,4 +23,8 @@ When you're done with your project / bugfix / feature and ready to submit a PR,
 - [ ] **Get the PR reviewed by code owners**: At least two code owner approvals are mandatory before merging any PR.
 - [ ] **Ensure the PR is correctly labeled.**: More detail about labels definitions can be found [here](https://github.com/MetaMask/metamask-mobile/blob/main/.github/guidelines/LABELING_GUIDELINES.md).
 
+### Shadow CI jobs
+
+CI jobs prefixed with `[shadow]` (e.g., from `ci-namespace-shadow.yml`) are **advisory only** and never gate merge. They run the same test suite on Namespace runners for performance benchmarking. If a shadow job fails, it does not indicate a problem with your PR -- it reflects the state of the Namespace runner migration trial.
+
 And that's it! Thanks for helping out.
diff --git a/.github/workflows/ci-namespace-shadow.yml b/.github/workflows/ci-namespace-shadow.yml
@@ -0,0 +1,31 @@
+name: CI (Namespace shadow)
+
+on:
+  pull_request:
+    types: [opened, synchronize, reopened, ready_for_review]
+    paths-ignore:
+      - 'docs/**'
+      - '**/*.md'
+      - '.github/CODEOWNERS'
+  push:
+    branches: [main]
+  schedule:
+    - cron: '0 * * * *'
+  workflow_dispatch:
+
+concurrency:
+  group: ns-shadow-${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+  actions: read
+  id-token: write
+
+jobs:
+  shadow-ci:
+    name: '[shadow] CI'
+    uses: ./.github/workflows/ci.yml
+    with:
+      runner_provider: namespace
+    secrets: inherit
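For the manual `workflow_dispatch` path, a dispatch helper might look like the sketch below. It is not part of this PR: the function name `dispatch_shadow` and the `DRY_RUN` variable are hypothetical, and it assumes the GitHub CLI (`gh`) is installed and authenticated. By default it only prints the command it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# dispatch_shadow [ref]: trigger the shadow workflow on the given ref.
# Hypothetical helper. With DRY_RUN=1 (the default here) it prints the
# gh command instead of executing it.
dispatch_shadow() {
  local ref="${1:-main}"
  local cmd=(gh workflow run ci-namespace-shadow.yml
             --repo MetaMask/metamask-mobile --ref "${ref}")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

dispatch_shadow main
```

Setting `DRY_RUN=0` would perform the real dispatch. As the PR description notes, the workflow can only be dispatched once the file has been merged to `main`.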
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -13,6 +13,12 @@ on:
     # Run the full suite "overnight," once every hour from 2:00am UTC until 6:00am UTC.
     # This helps to identify the flaky and failed tests on main branch
     - cron: '0 2-6 * * *'
+  workflow_call:
+    inputs:
+      runner_provider:
+        type: string
+        required: false
+        default: current
   workflow_dispatch:
     inputs:
       runner_provider:
diff --git a/scripts/namespace-benchmark.sh b/scripts/namespace-benchmark.sh
@@ -0,0 +1,52 @@
+#!/usr/bin/env bash
+# Usage: scripts/namespace-benchmark.sh [hours-ago]
+# Prints median + p95 wall-clock per job for current vs namespace shadow
+# over the last N hours (default 24).
+
+set -euo pipefail
+
+HOURS=${1:-24}
+REPO=MetaMask/metamask-mobile
+
+# macOS date vs GNU date
+if date -v-1H +%s >/dev/null 2>&1; then
+  SINCE=$(date -u -v-${HOURS}H +%Y-%m-%dT%H:%M:%SZ)
+else
+  SINCE=$(date -u -d "${HOURS} hours ago" +%Y-%m-%dT%H:%M:%SZ)
+fi
+
+echo "Comparing runs since ${SINCE} (last ${HOURS}h)"
+echo
+
+for WF in ci.yml ci-namespace-shadow.yml; do
+  echo "=== ${WF} ==="
+
+  gh run list --repo "${REPO}" --workflow "${WF}" --status success --limit 100 \
+    --json databaseId,createdAt \
+    --jq "[.[] | select(.createdAt >= \"${SINCE}\")] | length" \
+  | xargs -I{} echo "  Successful runs in window: {}"
+
+  gh run list --repo "${REPO}" --workflow "${WF}" --status success --limit 100 \
+    --json databaseId,createdAt \
+    --jq "[.[] | select(.createdAt >= \"${SINCE}\") | .databaseId][]" \
+  | while read -r RUN_ID; do
+      gh run view "${RUN_ID}" --repo "${REPO}" --json jobs \
+        --jq '.jobs[] | select(.conclusion == "success" and .startedAt != "0001-01-01T00:00:00Z" and .completedAt != "0001-01-01T00:00:00Z") | [.name, ((.completedAt[:19] | strptime("%Y-%m-%dT%H:%M:%S") | mktime) - (.startedAt[:19] | strptime("%Y-%m-%dT%H:%M:%S") | mktime))] | @tsv' 2>/dev/null
+    done \
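+  | LC_ALL=C sort -t$'\t' -k1,1 -k2,2n \
+  | awk -F'\t' '
+      # Rows arrive grouped by job name with durations ascending, so
+      # percentiles are plain index lookups (no gawk-only asort needed).
+      function report() {
+        if (k == 0) return
+        p50=d[int(k*0.5)+1]
+        p95=d[int(k*0.95)+1]
+        printf "  %-55s n=%-4d p50=%6.0fs  p95=%6.0fs\n", job, k, p50, p95
+      }
+      $1 != job { report(); job=$1; k=0 }
+      { d[++k]=$2 }
+      END { report() }
+    '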
+
+  echo
+done
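The percentile step can be exercised in isolation, without calling the GitHub API. The sketch below is not part of the PR: it applies the same index rule as the script (element `int(n*q)+1` of the ascending durations, 1-based) to made-up rows, pre-sorting with `sort` so the awk program only needs index lookups. The function name `percentiles` and the sample job names and durations are illustrative assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# percentiles: read "job<TAB>seconds" rows on stdin and print one line per
# job with n, p50, and p95. Rows are sorted by job name, then numerically by
# duration, so each job's durations reach awk contiguous and ascending.
percentiles() {
  LC_ALL=C sort -t$'\t' -k1,1 -k2,2n \
  | awk -F'\t' '
      function report() {
        if (k == 0) return
        printf "%s n=%d p50=%d p95=%d\n", job, k, d[int(k*0.5)+1], d[int(k*0.95)+1]
      }
      $1 != job { report(); job = $1; k = 0 }
      { d[++k] = $2 }
      END { report() }
    '
}

# Made-up durations (seconds) for two hypothetical job names.
printf 'lint\t60\nlint\t40\nlint\t50\nbuild\t300\nbuild\t100\nbuild\t200\nbuild\t400\n' \
  | percentiles
# prints:
#   build n=4 p50=300 p95=400
#   lint n=3 p50=50 p95=60
```

With only a handful of samples per job, p95 resolves to the maximum duration, so short time windows say little about tail latency.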