fix(eks): stabilize UDP NetworkPolicy e2e coverage #2666
Greptile Summary
This PR stabilises the UDP NetworkPolicy E2E test on EKS by fixing two independent root causes: missing cross-node UDP SG rules and a flaky one-shot listener pattern. The approach replaces one-shot
Confidence Score: 5/5
Safe to merge: all three changes are scoped to test infrastructure and do not touch production paths.
All changes are confined to EKS test infra, a test-only workload manifest, and the E2E spec file. The logic changes are well-reasoned and validated in CI. Two minor observations exist (an overly broad SG port range and an early-exit condition in waitForUdpLog that is never reached), but neither causes test failures or behavioral regressions.
.github/test-infra/aws/eks/cluster.tf (UDP SG port range) and test/vitest/network.spec.ts (waitForUdpLog early-exit logic)
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Test as E2E Test
participant Client as udp-echo-client pod
participant Server as udp-echo-server pod
participant Log as /tmp/udp.log
Note over Server,Log: Container runs persistent nc loop
Server->>Log: nc -u -l -p 5000 -w 1 loop
Test->>Server: clearUdpLog truncate log
Test->>Client: execInPod send 3 pings to serverIP port 5000
Client->>Server: UDP ping x3
Server->>Log: append ping per nc session
loop poll every 250ms up to 5s
Test->>Log: readUdpLog
Log-->>Test: log content
end
Test->>Test: expectUdpPingLog all lines equal ping
Note over Test: Denied path
Test->>Server: clearUdpLog
Test->>Client: execInPod from deny-all namespace
Client--xServer: blocked by NetworkPolicy
loop poll every 250ms up to 2s
Test->>Log: readUdpLog
Log-->>Test: empty string
end
Test->>Test: expect deniedLog toBe empty
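The two final checks in the diagram (every logged line equals ping on the allowed path, and an empty log on the denied path) can be sketched as pure functions over the raw log text. This is a minimal illustration, not the code from network.spec.ts: the names parseUdpLog, isAllPings, and isEmptyLog are hypothetical, and it assumes the listener writes one ping per line.

```typescript
// Hypothetical helper: split the raw log into non-empty, trimmed lines.
// Assumes the server appends one "ping" per line to /tmp/udp.log.
function parseUdpLog(raw: string): string[] {
  return raw
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Allowed path: at least one line arrived and every line is exactly "ping".
function isAllPings(raw: string): boolean {
  const lines = parseUdpLog(raw);
  return lines.length > 0 && lines.every((line) => line === "ping");
}

// Denied path: nothing reached the listener, so the log has no content.
function isEmptyLog(raw: string): boolean {
  return parseUdpLog(raw).length === 0;
}
```

Requiring at least one line in isAllPings matters: without it, an empty log would pass the "all lines equal ping" check vacuously and hide a blocked connection.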
Reviews (2): Last reviewed commit: "final cleanup"
@greptileai review this PR
jasonwashburn
left a comment
One nit and a copyright update, otherwise everything looks good to me!
e50b377
jasonwashburn
left a comment
Looks good (pending CI)
🤖 I have created a release *beep* *boop*

---

## [1.5.0](v1.4.0...v1.5.0) (2026-05-26)

### Bug Fixes

* avoid virtual threads in Keycloak ([#2686](#2686)) ([e07ddb2](e07ddb2))
* broken grafana tests ([#2696](#2696)) ([202c8ac](202c8ac))
* **eks:** stabilize UDP NetworkPolicy e2e coverage ([#2666](#2666)) ([3d45af4](3d45af4))

### Miscellaneous

* add 1.5.0 release notes ([#2700](#2700)) ([197dc46](197dc46))
* **ci:** add test to verify loki able to flush to s3 ([#2673](#2673)) ([4783ffb](4783ffb))
* **deps:** migrate unicorn flavor images from RapidFort to Chainguard ([#2650](#2650)) ([b0d4c87](b0d4c87))
* **deps:** update grafana ([#2584](#2584)) ([f07a6a7](f07a6a7))
* **deps:** update grafana to v2.7.3 ([#2691](#2691)) ([0aaf351](0aaf351))
* **deps:** update iac support dependencies to v2.0.1 ([#2677](#2677)) ([40cf6a6](40cf6a6))
* **deps:** update iac-support-deps ([#2670](#2670)) ([ab1b90d](ab1b90d))
* **deps:** update loki ([#2586](#2586)) ([396bb53](396bb53))
* **deps:** update loki to v2.7.3 ([#2690](#2690)) ([6b773ed](6b773ed))
* **deps:** update prometheus-stack ([#2644](#2644)) ([1bfbfaf](1bfbfaf))
* **deps:** update prometheus-stack ([#2684](#2684)) ([1fae685](1fae685))
* **deps:** update prometheus-stack ([#2687](#2687)) ([ceab924](ceab924))
* **deps:** update support-deps ([#2683](#2683)) ([f725d10](f725d10))
* **deps:** update support-deps ([#2689](#2689)) ([83622c3](83622c3))
* **deps:** update velero ([#2678](#2678)) ([70f0106](70f0106))
* **docs:** add legacy upgrade notes and local demo deploy warning ([#2667](#2667)) ([ded7c08](ded7c08))
* updating cert bundle ([#2675](#2675)) ([7da8b6c](7da8b6c))

### Documentation

* add time-sync prereqs callout in docs ([#2679](#2679)) ([3d45a2c](3d45a2c))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Description
Problem
The UDP NetworkPolicy E2E test was flaky on EKS for two independent reasons:
1. Cross-node UDP traffic was not actually allowed by the EKS node security group configuration.
   The EKS test infra only allowed node-to-node UDP/53 by default, and the earlier SG change used the cluster security group as the source instead of a self-referencing node security group rule. That left cross-node UDP/5000 traffic unreliable.
2. The test itself depended on a one-shot UDP listener started via execInPod.
   The old test launched nc -u -l -p 5000 at assertion time and immediately raced client sends against listener startup and exec WebSocket timing. That made the test flaky even when networking was healthy.
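To make the race concrete: once the listener is already running in the server container, the test no longer has to start anything at assertion time. It can clear the log, send the pings, and poll until something appears or a deadline passes. The sketch below illustrates that ordering under stated assumptions. The UdpProbeDeps interface and probeUdp function are hypothetical names, the three callbacks stand in for whatever pod-exec calls the real spec makes, and the 250 ms default interval is taken from the sequence diagram in the review summary.

```typescript
// Hypothetical dependencies; in a real spec these would wrap pod exec calls.
interface UdpProbeDeps {
  clearLog: () => Promise<void>; // truncate the server-side log
  sendPings: (count: number) => Promise<void>; // send UDP pings from the client pod
  readLog: () => Promise<string>; // read the server-side log
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Clear, send, then poll the log until it is non-empty or the deadline passes.
// Returns the last log content read, which is "" if nothing ever arrived.
async function probeUdp(
  deps: UdpProbeDeps,
  timeoutMs: number,
  intervalMs = 250,
): Promise<string> {
  await deps.clearLog();
  await deps.sendPings(3);

  const deadline = Date.now() + timeoutMs;
  let log = await deps.readLog();
  while (log.trim() === "" && Date.now() < deadline) {
    await sleep(intervalMs);
    log = await deps.readLog();
  }
  return log;
}
```

On the allowed path the caller would assert on the returned content; on the denied path the same function simply runs out its (shorter) timeout and returns an empty string, so listener startup timing never enters the picture.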
Fix
This PR fixes both issues with the smallest set of real changes:
Why this approach
This keeps the test focused on what UDS Core owns:
It avoids unrelated sources of flake:
Validation
Validated successfully across CI flavors after these changes.
Type of change
Checklist before merging