Merged
29 commits
168cc3d
ci: Add rolling update kind testing
fegmorte Mar 25, 2026
8efacd2
chore: add pr trigger
fegmorte Mar 26, 2026
f752aa6
chore: fix typo for TLS
fegmorte Mar 26, 2026
27bdc39
fix: Fix docker login
fegmorte Mar 26, 2026
dae6cea
Merge branch 'main' into fred/ci/testing-rolling-update-mix
fegmorte Mar 26, 2026
4e04a17
fix: Fix docker login
fegmorte Mar 26, 2026
09e1a76
fix: Fix chart version for old and new version
fegmorte Mar 26, 2026
fe39b0d
fix: Fix chart version for old and new version
fegmorte Mar 26, 2026
d46fb86
fix: fix ghcr url
fegmorte Mar 26, 2026
6587bfe
fix: fix ghcr url
fegmorte Mar 26, 2026
b76e57b
fix: fix image for upgrade and helm uninstall
fegmorte Mar 26, 2026
2fc9cdf
fix: fix chart version for kms version upgrade
fegmorte Mar 26, 2026
b68afa6
fix: fix core-client command with -a
fegmorte Mar 26, 2026
4c668c4
fix: add new workflow for rolling update
fegmorte Mar 26, 2026
74720f7
fix: update workflow summary
fegmorte Mar 26, 2026
84e236b
fix: update workflow without -a
fegmorte Mar 26, 2026
5ec24d1
Merge branch 'main' into fred/ci/testing-rolling-update-mix
fegmorte Mar 27, 2026
14a7160
fix: try with 4 parties
fegmorte Mar 27, 2026
dc96fa7
Merge branch 'fred/ci/testing-rolling-update-mix' of github.com:zama-…
fegmorte Mar 27, 2026
387d0e0
fix: try with 4 parties
fegmorte Mar 27, 2026
846753a
fix: update core-client v0.13.10-rc.2
fegmorte Mar 27, 2026
f597efe
fix: update core-client v0.13.10-rc.2
fegmorte Mar 27, 2026
86c8172
fix: test with migration fix 26e1273
fegmorte Mar 28, 2026
1a57a54
fix: fix zizmor and remove pr trigger
fegmorte Mar 30, 2026
6cf97fa
Merge branch 'main' into fred/ci/testing-rolling-update-mix
fegmorte Mar 30, 2026
ddaa8d7
fix: upgrade readme and comment
fegmorte Mar 30, 2026
2baa2f7
Merge branch 'main' into fred/ci/testing-rolling-update-mix
fegmorte Mar 30, 2026
0c5c96d
Merge branch 'main' into fred/ci/testing-rolling-update-mix
fegmorte Apr 1, 2026
fda3467
Merge branch 'main' into fred/ci/testing-rolling-update-mix
fegmorte Apr 2, 2026
58 changes: 58 additions & 0 deletions .github/workflows/README.md
@@ -49,6 +49,7 @@ Release
| [`docker-build.yml`](docker-build.yml) | Reusable Docker build | Called by other workflows, releases |
| [`kind-testing.yml`](kind-testing.yml) | Kind cluster integration tests | Called by build-and-test, nightly |
| [`performance-testing.yml`](performance-testing.yml) | Performance benchmarks | Manual, nightly |
| [`rolling-upgrade-testing.yml`](rolling-upgrade-testing.yml) | Enclave rolling upgrade (mixed-version) perf tests | Manual only |
| [`pr-preview-deploy.yml`](pr-preview-deploy.yml) | Ephemeral PR environments | Called by build-and-test |
| [`pr-preview-destroy.yml`](pr-preview-destroy.yml) | Cleanup PR environments | PR close, label removal, nightly |

@@ -539,6 +540,63 @@ The workflow now uses `ci/scripts/deploy.sh` which:

---

## 🔄 Rolling Upgrade Testing

[`.github/workflows/rolling-upgrade-testing.yml`](rolling-upgrade-testing.yml)

End-to-end test of **partial rolling upgrades** for `thresholdWithEnclave`. The workflow deploys 13 parties on an **old** KMS Core image, upgrades two configurable batches to a **new** image, and runs Argo performance workflows in **mixed-version** states (default progression: all old → 5/13 upgraded → 9/13 upgraded). It validates per-party AWS KMS policies, dual `trustedReleases` PCRs for TLS, and selective Helm upgrades via [`ci/scripts/rolling_upgrade.sh`](../../ci/scripts/rolling_upgrade.sh).
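As a rough illustration of how the default progression falls out of the two batches, here is a minimal Python sketch. It is not part of the workflow or of `rolling_upgrade.sh`; the function names are hypothetical, and it assumes party IDs run from 1 to 13 and that a party appears in at most one batch.

```python
def parse_party_ids(csv: str) -> list[int]:
    """Parse a comma-separated list of party IDs such as "1,2,3"."""
    return [int(part) for part in csv.split(",") if part.strip()]


def upgrade_progression(total_parties: int, batches: list[str]) -> list[tuple[int, int]]:
    """Return (upgraded, total) after each stage, starting from the all-old state.

    Assumption (not confirmed by the workflow): IDs are 1..total_parties
    and no party is listed in more than one batch.
    """
    upgraded: set[int] = set()
    states = [(0, total_parties)]  # baseline: every party on the old image
    for batch in batches:
        for party_id in parse_party_ids(batch):
            if not 1 <= party_id <= total_parties:
                raise ValueError(f"party ID {party_id} is out of range")
            if party_id in upgraded:
                raise ValueError(f"party ID {party_id} is listed twice")
            upgraded.add(party_id)
        states.append((len(upgraded), total_parties))
    return states


# Default batches from the workflow parameters.
print(upgrade_progression(13, ["1,2,3,4,5", "6,7,8,9"]))
# → [(0, 13), (5, 13), (9, 13)]
```

With the defaults, four parties (10 to 13) remain on the old image at the end of the run, so the final state is still mixed-version.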

### Trigger Types

| Trigger | Timing | Purpose |
|---------|--------|---------|
| 🔄 **Manual Dispatch** | On demand | Rolling upgrade scenarios with chosen old/new tags and batches |

### Workflow Parameters

| Parameter | Default | Purpose |
|-----------|---------|---------|
| **old_image_tag** | (required) | Baseline KMS Core image tag for the initial full deploy |
| **new_image_tag** | (required) | Target KMS Core image tag for upgraded parties (ignored when `build=true`) |
| **build** | `false` | Build a new image with `docker-build.yml`; use build output as the new tag |
| **kms_branch** | (optional) | Branch used for the image build when `build=true` and for the chart checkout when `new_kms_chart_version` is `repository` |
| **fhe_params** | `Test` | `Default` or `Test` — FHE parameters for Argo keygen/preprocessing |
| **old_kms_chart_version** | `1.5.1` | KMS Helm chart version for the all-old deployment |
| **new_kms_chart_version** | `repository` | KMS Helm chart for upgraded parties: a version string, or `repository` to use the charts from the checked-out repository |
| **tkms_infra_chart_version** | `0.3.2` | TKMS Infra Helm chart version |
| **first_batch_parties** | `1,2,3,4,5` | Comma-separated party IDs for the first upgrade wave |
| **second_batch_parties** | `6,7,8,9` | Comma-separated party IDs for the second upgrade wave |
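
Because the workflow is manual only, one way to start it is the GitHub CLI (`gh workflow run` with `-f key=value` inputs). The Python sketch below only assembles such a command line and does not execute it. It assumes the workflow input names match the parameter names in the table above; `OLD_TAG` and `NEW_TAG` are placeholders, and `build_dispatch_command` is a hypothetical helper, not something the repository provides.

```python
import shlex


def build_dispatch_command(old_image_tag: str, new_image_tag: str, **overrides: str) -> list[str]:
    """Assemble a `gh workflow run` argv for the rolling upgrade workflow.

    Assumption: workflow inputs are named exactly like the documented
    parameters (old_image_tag, new_image_tag, first_batch_parties, ...).
    """
    inputs = {
        "old_image_tag": old_image_tag,
        "new_image_tag": new_image_tag,
        **overrides,  # any other documented parameter, e.g. fhe_params
    }
    argv = ["gh", "workflow", "run", "rolling-upgrade-testing.yml"]
    for name, value in inputs.items():
        argv += ["-f", f"{name}={value}"]
    return argv


# Placeholder tags; replace with real image tags before running the command.
command = build_dispatch_command(
    "OLD_TAG",
    "NEW_TAG",
    first_batch_parties="1,2,3,4,5",
    second_batch_parties="6,7,8,9",
)
print(shlex.join(command))
```

Parameters left out of the call fall back to the defaults listed in the table.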

### Jobs

| Job | Purpose | Notes |
|-----|---------|--------|
| **docker-build** | Optional image build | Runs only when `build=true`; calls reusable `docker-build.yml` |
| **start-runner** | EC2 runner (SLAB) | `small-instance` profile for the long test job |
| **rolling-upgrade-testing** | Deploy, baseline perf, two upgrade batches, mixed perf, cleanup | Uses `aws-perf`, namespace `kms-ci`, Argo workflows under `ci/perf-testing/argo-workflow/` |
| **stop-runner** | Tear down EC2 runner | Runs `always()` after the main job |

### Job and Step Flow

```mermaid
graph TD
dispatch[workflow_dispatch] --> buildChoice{build_true}
buildChoice -->|yes| dockerBuild[docker_build]
buildChoice -->|no| startRunner[start_runner]
dockerBuild --> startRunner
startRunner --> mainJob[rolling_upgrade_testing]
mainJob --> step1[deploy_13_nodes_old]
step1 --> step2[baseline_perf_tests]
step2 --> step3[upgrade_first_batch]
step3 --> step4[perf_tests_mixed_first_batch]
step4 --> step5[upgrade_second_batch]
step5 --> step6[perf_tests_mixed_second_batch]
step6 --> cleanup[cleanup]
mainJob --> stopRunner[stop_runner]
```

---

## 🛠️ Reusable Workflow Infrastructure

### 1. 🖥️ Big Instance Testing