feat: add e2e tests for deployment policy #112

t0mmylam · 2025-10-30T15:16:30Z

Adds comprehensive end-to-end tests for DeploymentPolicy functionality, including multi-compartment rollouts, various strategies (Fixed, Linear, Exponential), and backwards compatibility with InterruptionBudget.

Changes

4 new E2E test suites covering core deployment policy features
Prometheus metrics validation for all 8 rollout metrics
CI integration with new deployment-policy-tests job
Makefile targets for easy local testing
Podman keyring quota fix for macOS and Linux

Testing

All 4 test suites passing locally
Metrics validation working correctly
CI job configured and ready to run
Tested on 15-node Kind cluster

Test Coverage

✅ Multi-compartment rollouts (15 nodes)
✅ Linear ramp-up strategy (8 nodes)
✅ Overlapping label selectors (6 nodes)
✅ Legacy InterruptionBudget compatibility (6 nodes)
✅ Prometheus metrics export and validation
✅ Budget enforcement (count and percent-based)
✅ Batch progression and state tracking

lockwobr

These tests don't seem to make any assertions.

lockwobr · 2025-10-31T16:28:17Z

k8s-tests/chainsaw/deployment-policy/legacy-compatibility/chainsaw-test.yaml

+    try:
+    - apply:
+        file: skyhook.yaml
+    - script:


Sleep is built into Chainsaw:
https://kyverno.github.io/chainsaw/latest/operations/sleep/#examples

lockwobr · 2025-10-31T16:30:16Z

k8s-tests/chainsaw/deployment-policy/legacy-compatibility/chainsaw-test.yaml

+    - script:
+        content: |
+          echo "=== Verifying synthetic __default__ compartment ==="
+          
+          COMPARTMENTS=$(kubectl get skyhook legacy-interruption-budget-test -o jsonpath='{.status.compartmentStatuses}' | jq -r 'keys[]')
+          echo "Compartments found: $COMPARTMENTS"
+          
+          DEFAULT=$(echo "$COMPARTMENTS" | grep -c "__default__" || echo "0")
+          
+          if [ "$DEFAULT" != "1" ]; then
+            echo "ERROR: Expected __default__ compartment to be created"
+            echo "Found compartments: $COMPARTMENTS"
+            kubectl get skyhook legacy-interruption-budget-test -o yaml
+            exit 1
+          fi
+          
+          echo "✓ Synthetic __default__ compartment created"


I think this should be done with an assertion; it's cleaner, and it's how the other tests work.
https://kyverno.github.io/chainsaw/latest/operations/assert/#examples

riceriley59 · 2025-10-31T17:04:47Z

k8s-tests/chainsaw/metrics_test.py

Isn't there already a metrics_test.py?

riceriley59 · 2025-10-31T17:05:51Z

k8s-tests/chainsaw/deployment-policy/legacy-compatibility/assert-default-compartment.yaml

I believe there is already a script to do this as well.

Never mind, there isn't, but I see this same script a few times in different tests. I would make one top-level script that can add or remove given labels, then call it from the tests when needed instead of writing a new script for each test.

riceriley59 · 2025-10-31T17:13:36Z

k8s-tests/chainsaw/deployment-policy/linear-strategy/setup-nodes.sh

+# Label all 8 nodes as production tier
+echo "Labeling production nodes (0-7)..."
+for i in {0..7}; do
+  kubectl label node ${WORKER_ARRAY[$i]} tier=production skyhook.nvidia.com/test-node=skyhooke2e --overwrite


Same thing here with these scripts; I see this a few times. I would make a somewhat generalized script at the top level and then call it from these tests.
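For illustration, a generalized helper along these lines might look like the sketch below. The function name `label_nodes` and the `DRY_RUN` switch are hypothetical and not part of this PR; the only real CLI behavior it relies on is `kubectl label node <node> key=value --overwrite` for adding a label and `kubectl label node <node> key-` for removing one.

```shell
#!/bin/sh
# Hypothetical shared helper for the test suites (name and location are assumptions).
# Usage: label_nodes <add|remove> <key[=value]> <node>...
# Set DRY_RUN=1 to print the kubectl commands instead of running them.

label_nodes() {
  ln_action="$1"
  ln_label="$2"
  shift 2

  case "$ln_action" in
    add)
      ln_arg="$ln_label"
      ln_extra="--overwrite"
      ;;
    remove)
      # kubectl removes a label when the key is followed by "-"
      ln_arg="${ln_label%%=*}-"
      ln_extra=""
      ;;
    *)
      echo "unknown action: $ln_action" >&2
      return 2
      ;;
  esac

  for ln_node in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "kubectl label node $ln_node $ln_arg${ln_extra:+ $ln_extra}"
    else
      kubectl label node "$ln_node" "$ln_arg" ${ln_extra:+"$ln_extra"}
    fi
  done
}
```

Each test's setup script could then source this file and call, for example, `label_nodes add tier=production node-a node-b`, with a matching `label_nodes remove tier node-a node-b` during cleanup. The helper takes one label per call, so a setup that applies two labels (as in the hunk above) would call it twice.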

riceriley59 · 2025-10-31T17:39:58Z

operator/Makefile

+	@if command -v podman >/dev/null 2>&1; then \
+		echo "📝 Detected Podman - increasing kernel keyring quota..."; \
+		if [ "$$(uname)" = "Darwin" ]; then \
+			podman machine ssh sudo sysctl -w kernel.keys.maxkeys=20000 2>/dev/null || true; \


Why was this needed? Were the 15 nodes causing memory issues?

Kind of. Podman has default kernel keyring limits, since each container uses keys for security purposes (I think). This just raises those limits so the nodes don't end up in a crash loop.
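As a rough sketch of the same idea for a Linux host, the quota could be raised only when it is below the target instead of being set unconditionally. The function names are hypothetical, the target of 20000 is taken from the Makefile hunk above, and this does not cover the macOS case, where the sysctl has to run inside the Podman machine.

```shell
#!/bin/sh
# Sketch only: raise kernel.keys.maxkeys when it is below a target value.
# TARGET_MAXKEYS mirrors the value in the Makefile hunk; helper names are hypothetical.

TARGET_MAXKEYS=20000

# Exit status 0 when the current quota is lower than the target.
needs_keyring_bump() {
  [ "$1" -lt "$2" ]
}

# Prints the current quota, or nothing when the sysctl is unavailable.
current_maxkeys() {
  sysctl -n kernel.keys.maxkeys 2>/dev/null
}

raise_keyring_quota() {
  rk_current="$(current_maxkeys)"
  if [ -z "$rk_current" ]; then
    echo "kernel.keys.maxkeys not available; skipping"
    return 0
  fi
  if needs_keyring_bump "$rk_current" "$TARGET_MAXKEYS"; then
    sudo sysctl -w kernel.keys.maxkeys="$TARGET_MAXKEYS"
  else
    echo "kernel.keys.maxkeys is already $rk_current; nothing to do"
  fi
}
```

Calling `raise_keyring_quota` before creating the 15-node cluster would leave a host that already has a higher limit untouched.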

lockwobr · 2025-11-01T00:08:25Z

.github/workflows/operator-ci.yaml

+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-tags: true
+          fetch-depth: 0
+      - name: Setup Go ${{ env.GO_VERSION }}
+        uses: actions/setup-go@v5
+        with:
+          go-version: ${{ env.GO_VERSION }}
+          cache-dependency-path: operator/go.sum
+      - name: Log in to the Container registry
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ env.REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+      - name: Create 15-node Kind Cluster
+        id: kind
+        uses: helm/kind-action@v1
+        with:
+          version: v0.30.0
+          node_image: kindest/node:v1.34.0
+          config: k8s-tests/chainsaw/deployment-policy/kind-config.yaml
+          cluster_name: skyhook-dp-test
+      # Cache build tools and dependencies for faster builds
+      - name: Restore cached Binaries
+        id: cached-binaries
+        uses: actions/cache/restore@v4
+        with:
+          key: ${{ env.GO_VERSION }}-${{ runner.os }}-${{ runner.arch }}-bin-${{ hashFiles('operator/deps.mk') }}
+          restore-keys: ${{ env.GO_VERSION }}-${{ runner.os }}-${{ runner.arch }}-bin-
+          path: |
+            ${{ github.workspace }}/operator/bin
+            ~/.cache/go-build
+      - name: Install dependencies
+        if: steps.cached-binaries.outputs.cache-hit != 'true'
+        run: |
+          cd operator
+          make install-deps
+      - name: Save cached Binaries
+        id: save-cached-binaries
+        if: steps.cached-binaries.outputs.cache-hit != 'true'
+        uses: actions/cache/save@v4
+        with:
+          key: ${{ env.GO_VERSION }}-${{ runner.os }}-${{ runner.arch }}-bin-${{ hashFiles('operator/deps.mk') }}
+          path: |
+            ${{ github.workspace }}/operator/bin
+            ~/.cache/go-build
+      # Run deployment policy E2E tests


This looks like a clone of the other test job. Is there a way we can reduce this boilerplate, maybe with a YAML anchor? Also, this seems to run pretty fast. I wonder if we could make the other tests faster by breaking them up and running them in parallel. Maybe that should be a different PR.

We could split unit/e2e/helm into separate matrix jobs, but with 4 K8s versions that would create 12+ parallel jobs. Maybe we could split just the e2e tests into their own matrix as a follow-up PR?

t0mmylam requested review from ayuskauskas, lockwobr, mskalka and riceriley59 as code owners October 30, 2025 15:16

feat: add e2e tests for deployment policy

c0cb322

t0mmylam changed the title ~~add e2e tests for deployment policy~~ feat: add e2e tests for deployment policy Oct 30, 2025

t0mmylam force-pushed the tests branch from aedd4f8 to c0cb322 Compare October 30, 2025 15:17

t0mmylam added 2 commits October 30, 2025 08:28

fix shell issue

cc13e9f

update wording

4c7a525

t0mmylam force-pushed the tests branch from 83ff686 to 4c7a525 Compare October 30, 2025 15:44

fix webhook timing

30274a3

lockwobr reviewed Oct 31, 2025

riceriley59 reviewed Oct 31, 2025

replace scripts with Chainsaw assertions in deployment policy tests

e7b3bfe

lockwobr reviewed Nov 1, 2025

consolidate test jobs in ci

543a98d

t0mmylam force-pushed the tests branch from 22619eb to 543a98d Compare November 3, 2025 16:23
