Commit 39ac602

ci: Add rolling update kind testing (#480)
1 parent 628264f commit 39ac602

11 files changed: +1872 -2 lines changed

.github/workflows/README.md

Lines changed: 58 additions & 0 deletions
@@ -49,6 +49,7 @@ Release
 | [`docker-build.yml`](docker-build.yml) | Reusable Docker build | Called by other workflows, releases |
 | [`kind-testing.yml`](kind-testing.yml) | Kind cluster integration tests | Called by build-and-test, nightly |
 | [`performance-testing.yml`](performance-testing.yml) | Performance benchmarks | Manual, nightly |
+| [`rolling-upgrade-testing.yml`](rolling-upgrade-testing.yml) | Enclave rolling upgrade (mixed-version) perf tests | Manual only |
 | [`pr-preview-deploy.yml`](pr-preview-deploy.yml) | Ephemeral PR environments | Called by build-and-test |
 | [`pr-preview-destroy.yml`](pr-preview-destroy.yml) | Cleanup PR environments | PR close, label removal, nightly |
 

@@ -539,6 +540,63 @@ The workflow now uses `ci/scripts/deploy.sh` which:
 
 ---
 
+## 🔄 Rolling Upgrade Testing
+
+[`.github/workflows/rolling-upgrade-testing.yml`](rolling-upgrade-testing.yml)
+
+End-to-end test of **partial rolling upgrades** for `thresholdWithEnclave`: deploy 13 parties on an **old** KMS Core image, upgrade two configurable batches to a **new** image, and run Argo performance workflows in **mixed-version** states (default progression: all old → 5/13 upgraded → 9/13 upgraded). Validates per-party AWS KMS policies, dual `trustedReleases` PCRs for TLS, and selective Helm upgrades via [`ci/scripts/rolling_upgrade.sh`](../../ci/scripts/rolling_upgrade.sh).
+
+### Trigger Types
+
+| Trigger | Timing | Purpose |
+|---------|--------|---------|
+| 🔄 **Manual Dispatch** | On demand | Rolling upgrade scenarios with chosen old/new tags and batches |
+
+### Workflow Parameters
+
+| Parameter | Default | Purpose |
+|-----------|---------|---------|
+| **old_image_tag** | (required) | Baseline KMS Core image tag for the initial full deploy |
+| **new_image_tag** | (required) | Target KMS Core image tag for upgraded parties (ignored when `build=true`) |
+| **build** | `false` | Build a new image with `docker-build.yml`; use build output as the new tag |
+| **kms_branch** | (optional) | Branch for `build=true` and/or chart checkout when `new_kms_chart_version` is `repository` |
+| **fhe_params** | `Test` | `Default` or `Test`: FHE parameters for Argo keygen/preprocessing |
+| **old_kms_chart_version** | `1.5.1` | KMS Helm chart version for the all-old deployment |
+| **new_kms_chart_version** | `repository` | KMS Helm chart for upgraded parties; version string or `repository` for repo charts |
+| **tkms_infra_chart_version** | `0.3.2` | TKMS Infra Helm chart version |
+| **first_batch_parties** | `1,2,3,4,5` | Comma-separated party IDs for the first upgrade wave |
+| **second_batch_parties** | `6,7,8,9` | Comma-separated party IDs for the second upgrade wave |
+
+### Jobs
+
+| Job | Purpose | Notes |
+|-----|---------|--------|
+| **docker-build** | Optional image build | Runs only when `build=true`; calls reusable `docker-build.yml` |
+| **start-runner** | EC2 runner (SLAB) | `small-instance` profile for the long test job |
+| **rolling-upgrade-testing** | Deploy, baseline perf, two upgrade batches, mixed perf, cleanup | Uses `aws-perf`, namespace `kms-ci`, Argo workflows under `ci/perf-testing/argo-workflow/` |
+| **stop-runner** | Tear down EC2 runner | Runs `always()` after the main job |
+
+### Job and step flow
+
+```mermaid
+graph TD
+dispatch[workflow_dispatch] --> buildChoice{build_true}
+buildChoice -->|yes| dockerBuild[docker_build]
+buildChoice -->|no| startRunner[start_runner]
+dockerBuild --> startRunner
+startRunner --> mainJob[rolling_upgrade_testing]
+mainJob --> step1[deploy_13_nodes_old]
+step1 --> step2[baseline_perf_tests]
+step2 --> step3[upgrade_first_batch]
+step3 --> step4[perf_tests_mixed_first_batch]
+step4 --> step5[upgrade_second_batch]
+step5 --> step6[perf_tests_mixed_second_batch]
+step6 --> cleanup[cleanup]
+mainJob --> stopRunner[stop_runner]
+```
+
+---
+
 ## 🛠️ Reusable Workflow Infrastructure
 
 ### 1. 🖥️ Big Instance Testing
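
Review note: the `first_batch_parties` and `second_batch_parties` inputs added in the Rolling Upgrade Testing section determine the mixed-version states the workflow passes through (all old, then 5/13, then 9/13 with the defaults). The sketch below is a minimal illustration of that arithmetic, not the logic in `ci/scripts/rolling_upgrade.sh`. The function names, the assumption that party IDs run from 1 to 13, and the rule that the two batches must not overlap are all hypothetical and are not confirmed by the diff.

```python
from typing import List

# The README describes a 13-party deployment; the ID range 1..13 is an assumption.
TOTAL_PARTIES = 13


def parse_batch(raw: str) -> List[int]:
    """Parse a comma-separated party-ID string such as "1,2,3,4,5"."""
    ids = [int(part.strip()) for part in raw.split(",") if part.strip()]
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate party IDs in {raw!r}")
    for party in ids:
        if not 1 <= party <= TOTAL_PARTIES:
            raise ValueError(f"party ID {party} outside 1..{TOTAL_PARTIES}")
    return ids


def upgrade_progression(first: str, second: str) -> List[int]:
    """Return how many parties run the new image after each stage.

    Stage 0 is the all-old baseline, stage 1 follows the first batch,
    stage 2 follows the second batch.
    """
    first_ids = parse_batch(first)
    second_ids = parse_batch(second)
    # Hypothetical rule: a party should be upgraded in only one wave.
    overlap = set(first_ids) & set(second_ids)
    if overlap:
        raise ValueError(f"parties listed in both batches: {sorted(overlap)}")
    return [0, len(first_ids), len(first_ids) + len(second_ids)]


if __name__ == "__main__":
    # Defaults from the Workflow Parameters table.
    for upgraded in upgrade_progression("1,2,3,4,5", "6,7,8,9"):
        print(f"{upgraded}/{TOTAL_PARTIES} upgraded")
```

With the default inputs this reproduces the progression stated in the README text (0, 5, then 9 upgraded parties out of 13). A check like this could run before dispatch to catch a mistyped batch list early, though whether the real script validates its inputs this way is not shown in the commit.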
