feat: add e2e tests for deployment policy #112
base: main
Conversation
These tests are not making assertions, it seems.
```yaml
try:
  - apply:
      file: skyhook.yaml
  - script:
```
Sleep is built into chainsaw
https://kyverno.github.io/chainsaw/latest/operations/sleep/#examples
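For reference, a minimal sketch of the built-in operation (the duration value here is illustrative):

```yaml
# Chainsaw's built-in sleep operation; no shell sleep needed.
- sleep:
    duration: 30s
```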
```yaml
- script:
    content: |
      echo "=== Verifying synthetic __default__ compartment ==="
      COMPARTMENTS=$(kubectl get skyhook legacy-interruption-budget-test -o jsonpath='{.status.compartmentStatuses}' | jq -r 'keys[]')
      echo "Compartments found: $COMPARTMENTS"
      DEFAULT=$(echo "$COMPARTMENTS" | grep -c "__default__" || echo "0")
      if [ "$DEFAULT" != "1" ]; then
        echo "ERROR: Expected __default__ compartment to be created"
        echo "Found compartments: $COMPARTMENTS"
        kubectl get skyhook legacy-interruption-budget-test -o yaml
        exit 1
      fi
      echo "✓ Synthetic __default__ compartment created"
```
This should be done with an assertion, I think; it's cleaner and is how the other tests work.
https://kyverno.github.io/chainsaw/latest/operations/assert/#examples
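For illustration, a rough sketch of what that check could look like as an assertion; the apiVersion and the exact shape of `status.compartmentStatuses` are assumptions based on the script above, not confirmed against the CRD:

```yaml
# Sketch only: assumes the Skyhook CRD is served at skyhook.nvidia.com/v1alpha1 and that
# status.compartmentStatuses is a map keyed by compartment name.
- assert:
    resource:
      apiVersion: skyhook.nvidia.com/v1alpha1
      kind: Skyhook
      metadata:
        name: legacy-interruption-budget-test
      status:
        # JMESPath check that the synthetic __default__ key exists
        (contains(keys(compartmentStatuses), '__default__')): true
```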
Isn't there already a metrics_test.py?
I believe there is already a script to do this as well.
Never mind, there isn't, but I see this same script a few times in different tests. I would just make a top-level script that can add or remove given labels and then call that in the tests when needed, instead of writing a new script for each test.
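For example, a minimal sketch of such a shared helper (the path and interface are hypothetical, not something already in the repo):

```bash
#!/usr/bin/env bash
# Hypothetical k8s-tests/scripts/label-nodes.sh: add or remove labels on a list of nodes.
# Usage:
#   label-nodes.sh add    "tier=production skyhook.nvidia.com/test-node=skyhooke2e" <node>...
#   label-nodes.sh remove "tier skyhook.nvidia.com/test-node" <node>...
set -euo pipefail

action="$1"   # add | remove
labels="$2"   # space-separated key=value pairs (add) or keys (remove)
shift 2       # remaining args are node names

for node in "$@"; do
  for label in $labels; do
    if [ "$action" = "add" ]; then
      kubectl label node "$node" "$label" --overwrite
    else
      # kubectl removes a label when the key is suffixed with '-'
      kubectl label node "$node" "${label}-"
    fi
  done
done
```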
```bash
# Label all 8 nodes as production tier
echo "Labeling production nodes (0-7)..."
for i in {0..7}; do
  kubectl label node ${WORKER_ARRAY[$i]} tier=production skyhook.nvidia.com/test-node=skyhooke2e --overwrite
```
Same thing here with these scripts; I see this pattern a few times. I would make a somewhat generalized script at the top level and then call that in these tests.
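Then each test step would just call the shared helper, e.g. (relative path, arguments, and the WORKER_ARRAY setup are assumed to match the existing inline scripts):

```yaml
- script:
    content: |
      # assumes WORKER_ARRAY is populated the same way as in the current inline scripts
      ../../scripts/label-nodes.sh add "tier=production skyhook.nvidia.com/test-node=skyhooke2e" "${WORKER_ARRAY[@]:0:8}"
```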
```make
@if command -v podman >/dev/null 2>&1; then \
	echo "📝 Detected Podman - increasing kernel keyring quota..."; \
	if [ "$$(uname)" = "Darwin" ]; then \
		podman machine ssh sudo sysctl -w kernel.keys.maxkeys=20000 2>/dev/null || true; \
```
Why was this needed? Were the 15 nodes causing memory issues?
Kinda, basically Podman has default kernel keyring limits, since each container uses keys for security-related state (I think). This just bumps those limits up so the nodes don't get into a crash loop.
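For context, the equivalent bump on a Linux host running Podman natively is just the sysctls themselves (a sketch; the exact values are illustrative):

```bash
# Raise the kernel keyring quotas so many Kind "nodes" (containers) can each hold session keys.
sudo sysctl -w kernel.keys.maxkeys=20000
sudo sysctl -w kernel.keys.maxbytes=400000
```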
.github/workflows/operator-ci.yaml (Outdated)
```yaml
steps:
  - uses: actions/checkout@v4
    with:
      fetch-tags: true
      fetch-depth: 0
  - name: Setup Go ${{ env.GO_VERSION }}
    uses: actions/setup-go@v5
    with:
      go-version: ${{ env.GO_VERSION }}
      cache-dependency-path: operator/go.sum
  - name: Log in to the Container registry
    uses: docker/login-action@v3
    with:
      registry: ${{ env.REGISTRY }}
      username: ${{ github.actor }}
      password: ${{ secrets.GITHUB_TOKEN }}
  - name: Create 15-node Kind Cluster
    id: kind
    uses: helm/kind-action@v1
    with:
      version: v0.30.0
      node_image: kindest/node:v1.34.0
      config: k8s-tests/chainsaw/deployment-policy/kind-config.yaml
      cluster_name: skyhook-dp-test
  # Cache build tools and dependencies for faster builds
  - name: Restore cached Binaries
    id: cached-binaries
    uses: actions/cache/restore@v4
    with:
      key: ${{ env.GO_VERSION }}-${{ runner.os }}-${{ runner.arch }}-bin-${{ hashFiles('operator/deps.mk') }}
      restore-keys: ${{ env.GO_VERSION }}-${{ runner.os }}-${{ runner.arch }}-bin-
      path: |
        ${{ github.workspace }}/operator/bin
        ~/.cache/go-build
  - name: Install dependencies
    if: steps.cached-binaries.outputs.cache-hit != 'true'
    run: |
      cd operator
      make install-deps
  - name: Save cached Binaries
    id: save-cached-binaries
    if: steps.cached-binaries.outputs.cache-hit != 'true'
    uses: actions/cache/save@v4
    with:
      key: ${{ env.GO_VERSION }}-${{ runner.os }}-${{ runner.arch }}-bin-${{ hashFiles('operator/deps.mk') }}
      path: |
        ${{ github.workspace }}/operator/bin
        ~/.cache/go-build
  # Run deployment policy E2E tests
```
This seems like a clone of the other test job. Is there a way we can reduce this boilerplate, maybe a YAML anchor? Also, this seems to run pretty fast. I wonder if we could make the other tests faster by breaking them up and running them in parallel? Maybe that should be a different PR.
We could split unit/e2e/helm into separate matrix jobs, but with 4 K8s versions that'd create 12+ parallel jobs. Maybe we could split just the e2e tests into its own matrix as a follow-up PR?
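For illustration, a rough sketch of what that follow-up split could look like (job name, suite names, K8s versions, and make targets are placeholders, not the current workflow):

```yaml
# Placeholder sketch of a matrix that splits e2e suites per K8s version.
e2e-tests:
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      k8s-version: ["v1.31", "v1.32", "v1.33", "v1.34"]
      suite: ["core", "deployment-policy"]
  steps:
    - name: Run ${{ matrix.suite }} e2e on K8s ${{ matrix.k8s-version }}
      run: make e2e-${{ matrix.suite }} K8S_VERSION=${{ matrix.k8s-version }}
```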
Adds comprehensive end-to-end tests for DeploymentPolicy functionality, including multi-compartment rollouts, various strategies (Fixed, Linear, Exponential), and backwards compatibility with InterruptionBudget.
Changes
New `deployment-policy-tests` CI job for testing.
Test Coverage