-
Notifications
You must be signed in to change notification settings - Fork 616
draft: fast e2e tests #12993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
draft: fast e2e tests #12993
Conversation
| // Install kgateway | ||
| testInstallation.InstallKgatewayFromLocalChart(ctx) | ||
|
|
||
| common.SetupBaseConfig(ctx, t, testInstallation, filepath.Join("manifests", "agent-gateway-base.yaml")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to make you aware of this PR which added nightly e2e tests with different Gateway API versions. It mostly works by tagging tests with minimum versions, but it also introduced a mechanism to use different manifests for suite setup. The main use case is to switch between a gateway that defines "allowedListeners" and one that doesn't based on the API version.
I don't think its relevant for agentgateway at the moment, but may be in the future, and will be if we extend these improvements to the kgateway tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes e2e test execution by implementing several performance improvements, reducing test time from ~36s to ~0.2s for sample tests. The main optimization is deploying a single shared Gateway and backend that is reused across tests, replacing individual per-test deployments. Additional improvements include replacing kubectl with Istio's native apply/delete, using native Go HTTP requests instead of curl pods, and optimizing helper functions.
Key changes:
- Shared test infrastructure with reusable Gateway/backend in agentgateway-base namespace
- Native Go HTTP client implementation replacing curl pods
- Istio client integration for faster resource operations
- Test manifest updates to reference shared resources
Reviewed changes
Copilot reviewed 89 out of 92 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| test/e2e/common/base.go | New shared gateway setup and native HTTP client |
| test/e2e/tests/base/base_suite.go | Replaced kubectl with Istio apply/delete |
| test/e2e/testutils/cluster/kind.go | Added IstioClient integration |
| pkg/utils/requestutils/curl/native_request.go | New native Go HTTP request implementation |
| test/e2e/tests/manifests/agent-gateway-base.yaml | New shared base resources |
| Various test manifests | Updated to use shared agentgateway-base namespace |
| pkg/agentgateway/translator/ | TLS/mTLS support improvements |
| go.mod | Local replace directive and dependency updates |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
test/e2e/tests/base/base_suite.go
Outdated
| //for _, manifest := range testCase.Manifests { | ||
| // log.Errorf("howardjohn: apply..") | ||
| err := s.TestInstallation.ClusterContext.IstioClient.ApplyYAMLFiles("", testCase.Manifests...) | ||
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | ||
| //gomega.Eventually(func() error { | ||
| // err := s.TestInstallation.Actions.Kubectl().ApplyFile(s.Ctx, manifest) | ||
| // return err | ||
| //}, 10*time.Second, 1*time.Second).Should(gomega.Succeed(), "can apply "+manifest) | ||
| //} |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented-out debug code before merging. The commented code includes a debug log statement and the old implementation, which should be cleaned up for production code.
test/e2e/tests/base/base_suite.go
Outdated
| // TODO: this is a bottleneck. Doing it async is great but we need barriers to make sure if we do | ||
| // delete(foo); add(foo) we end up in the right state. | ||
| // We also need to make sure we finish the deletion before exiting the process | ||
| //go func() { | ||
| err := s.TestInstallation.ClusterContext.IstioClient.DeleteYAMLFiles("", testCase.Manifests...) | ||
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | ||
| //}() | ||
| return |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address the TODO comment about async deletion barriers. The commented async implementation and early return suggest incomplete work that needs proper synchronization before enabling.
test/e2e/tests/base/base_suite.go
Outdated
| //for _, manifest := range testCase.Manifests { | ||
| //gomega.Eventually(func() error { | ||
| // err := s.TestInstallation.Actions.Kubectl().DeleteFileSafe(s.Ctx, manifest) | ||
| // return err | ||
| //}, 10*time.Second, 1*time.Second).Should(gomega.Succeed(), "can delete "+manifest) | ||
| //} |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented-out old delete implementation code. This dead code should be cleaned up before merging.
go.mod
Outdated
| sigs.k8s.io/kind | ||
| ) | ||
|
|
||
| replace github.com/agentgateway/agentgateway => /home/john/solo/agentgateway |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove local replace directive that points to a developer's local filesystem path. This will break builds for anyone else and should not be committed.
| replace github.com/agentgateway/agentgateway => /home/john/solo/agentgateway |
| //agentgatewaySuiteRunner.Register("A2A", a2a.NewTestingSuite) | ||
| //agentgatewaySuiteRunner.Register("BasicRouting", agentgateway.NewTestingSuite) |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple test suites are commented out. If these are intentionally disabled for this draft PR, add a comment explaining why or create a tracking issue to re-enable them with the new optimizations.
| // Handle SNI with custom host resolution | ||
| // TODO | ||
| if c.sni != "" { | ||
| panic("sni is not implemented") | ||
| } |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SNI support is not implemented and will panic at runtime if used. Either implement this feature or remove SNI as an option until it can be properly supported.
| kubeCtx := os.Getenv(testutils.KubeCtx) | ||
| if len(kubeCtx) == 0 { | ||
| kubeCtx = fmt.Sprintf("kind-%s", clusterName) | ||
| } | ||
|
|
||
| restCfg, err := kubeutils.GetRestConfigWithKubeContext(kubeCtx) |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the default context name fallback logic. If KubeCtx environment variable is empty, this will attempt to use an empty context which may fail. Consider restoring the default logic or documenting the requirement to set KubeCtx.
| case *gatewayx.XListenerSet: | ||
| return wellknown.XListenerSetGVK | ||
| default: | ||
| panic("Uknown GVK") |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'Unknown' to 'Unknown'.
| panic("Uknown GVK") | |
| panic("Unknown GVK") |
f252461 to
0a9f683
Compare
|
how do you make sure there's no test pollution? i.e. that a test passes and/or flakes because some policy from previous test didn't propagate yet for whatever reason (maybe async delete took its time)? for example, when testing proxy protocol, no other test will pass until the proxy protocol is fully removed from the data plane |
FWIW I disabled async delete, I don't think its worth the benefit. However, you can still have pollution due to eventual consistency (as could you before this change, though it was much less likely). In my experience in Istio we would occasionally have this issue. Usually it shows up pretty quickly as flakes that are then diagnosed as "test was really broken and incorrectly passed sometimes". In kgw I think we also randomize test order which would make the chance of it correctly failing much much higher. If we need to, we can add some additional barriers to check the config is cleaned up end to end (e.g. send requests that should fail?). Or, we can also make the tests use non-overlapping configs like distinct hostnames. Given 90% of the tests are just "apply a policy, send some traffic" we could also just have the framework entirely manage an assigned hostname too. |
71c90d3 to
b172b1e
Compare
e257db3 to
207ac79
Compare
Signed-off-by: John Howard <[email protected]>
Signed-off-by: John Howard <[email protected]>
Signed-off-by: John Howard <[email protected]>
df6efa1 to
f68db2a
Compare
Signed-off-by: John Howard <[email protected]>
Signed-off-by: John Howard <[email protected]>
Signed-off-by: John Howard <[email protected]>
Description
Read #12981 for background.
This PR optimizes 2 e2e tests. I randomly picked these as samples, there is nothing special about these tests, and the same could be applied to most or all of our tests.
Before this PR:
After this PR:
Changes I applied:
helm statuswith native code: Cuts 150ms off the test suiteedit: did some more.
--- PASS: TestAgentgatewayIntegration (1.15s)
--- PASS: TestAgentgatewayIntegration/ApiKeyAuth (0.16s)
--- PASS: TestAgentgatewayIntegration/ApiKeyAuth/TestGatewayPolicy (0.05s)
--- PASS: TestAgentgatewayIntegration/ApiKeyAuth/TestRoutePolicy (0.07s)
--- PASS: TestAgentgatewayIntegration/JwtAuth (0.21s)
--- PASS: TestAgentgatewayIntegration/JwtAuth/TestGatewayPolicy (0.04s)
--- PASS: TestAgentgatewayIntegration/JwtAuth/TestGatewayPolicyWithRbac (0.04s)
--- PASS: TestAgentgatewayIntegration/JwtAuth/TestRoutePolicy (0.04s)
--- PASS: TestAgentgatewayIntegration/JwtAuth/TestRoutePolicyWithRbac (0.04s)
--- PASS: TestAgentgatewayIntegration/CSRF (0.10s)
--- PASS: TestAgentgatewayIntegration/CSRF/TestGatewayLevelCSRF (0.06s)
--- PASS: TestAgentgatewayIntegration/RBAC (0.08s)
--- PASS: TestAgentgatewayIntegration/RBAC/TestRBACHeaderAuthorization (0.01s)
--- PASS: TestAgentgatewayIntegration/Transformation (0.11s)
--- PASS: TestAgentgatewayIntegration/Transformation/TestGatewayWithTransformedRoute (0.01s)
--- PASS: TestAgentgatewayIntegration/BackendTLSPolicy (0.27s)
--- PASS: TestAgentgatewayIntegration/BackendTLSPolicy/TestBackendTLSPolicyAndStatus (0.12s)
--- PASS: TestAgentgatewayIntegration/BasicAuth (0.20s)
--- PASS: TestAgentgatewayIntegration/BasicAuth/TestGatewayPolicy (0.04s)
--- PASS: TestAgentgatewayIntegration/BasicAuth/TestRoutePolicy (0.12s)
End to end full GHA flow: 8minutes
Actual time of testing: 23.956s
Change Type
/kind cleanup
Changelog
Additional Notes