feat: system test framework proposal by SamBarker · Pull Request #111 · kroxylicious/design

SamBarker · 2026-05-20T04:00:59Z

Summary

- Introduces a layered abstraction for system tests, separating test intent (ProxyScenario) from deployment mechanism (ProxyFixture), with convergence gating (ProxyHandle)
- Organises tests into four modules by concern: systemtest-feature, systemtest-operator, systemtest-webhook, systemtest-installer
- Defines Installer as the primary downstream extension point: downstream varies by installation method, not proxy deployment
- Feature tests are portable across all fixtures (CRD, manifest, sidecar, standalone) with no Kubernetes dependency
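
As a rough illustration of the layering summarised above, the sketch below shows one way the types could relate. Only the type names `ProxyScenario`, `FilterSpec`, `ProxyFixture`, `ProxyHandle` and `CrdProxyFixture` come from the proposal; every field, method signature, the `StandaloneProxyFixture` class and the returned addresses are assumptions made for illustration, not the proposal's actual API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical shapes: the fields and methods below are assumptions.
record FilterSpec(String type, Map<String, Object> config) {}

// Test intent, expressed without reference to any deployment mechanism.
record ProxyScenario(String clusterName, List<FilterSpec> filters) {}

// Token of convergence: a test can only obtain one from a fixture.
interface ProxyHandle {
    String bootstrapServers();
}

// Deployment mechanism: turns a scenario into a running proxy.
interface ProxyFixture {
    // Assumed contract: returns only once the proxy serves the scenario.
    ProxyHandle apply(ProxyScenario scenario);
}

// Stand-in fixtures; real ones would start a process or apply resources.
final class StandaloneProxyFixture implements ProxyFixture {
    @Override
    public ProxyHandle apply(ProxyScenario scenario) {
        return () -> "localhost:9192"; // made-up address
    }
}

final class CrdProxyFixture implements ProxyFixture {
    @Override
    public ProxyHandle apply(ProxyScenario scenario) {
        return () -> scenario.clusterName() + ".example.svc:9292"; // made-up address
    }
}

final class FeatureTestSketch {
    // The test body depends only on the abstractions, never on a concrete fixture.
    static String runFeatureTest(ProxyFixture fixture) {
        ProxyScenario scenario = new ProxyScenario(
                "demo",
                List.of(new FilterSpec("ExampleFilter", Map.of("mode", "test"))));
        ProxyHandle handle = fixture.apply(scenario);
        return handle.bootstrapServers();
    }
}
```

Calling `FeatureTestSketch.runFeatureTest(...)` with either fixture exercises the same test body, which is the portability property the summary describes.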

Test plan

- Review proposal for internal consistency across all sections
- Verify the abstraction model covers existing system test patterns
- Confirm TCK model works for downstream distributors

🤖 Generated with Claude Code

Introduces layered abstractions (ProxyScenario, FilterSpec, ProxyFixture, ProxyHandle) that separate test intent from deployment mechanism, enabling deployment-agnostic feature tests and test-first development. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

OperatorCapability now models the operator's externally observable state via generation-based reconciliation observation rather than the checksum annotation. Tests that assert on specific operator mechanisms (e.g. checksum change detection) observe resource state directly. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

OperatorCapability handles convergence and deployment agnosticism. Tests that assert on specific resource state (e.g. checksum annotations) use an injected KubernetesClient to observe resources directly, keeping the two concerns separate. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
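
The generation-based observation mentioned in these commit messages can be sketched as a comparison between a resource's `metadata.generation` and its `status.observedGeneration`, which is the usual Kubernetes convention. This is only an assumption about how `OperatorCapability` might decide that reconciliation has happened; the `ResourceSnapshot` record is hypothetical and is not taken from the proposal.

```java
// Hypothetical, minimal view of a custom resource for illustration only.
// generation         : metadata.generation, bumped by the API server on spec changes
// observedGeneration : status.observedGeneration, written by the controller (null until first reconcile)
record ResourceSnapshot(long generation, Long observedGeneration) {

    // Reconciled once the controller has observed at least the current spec generation.
    boolean isReconciled() {
        return observedGeneration != null && observedGeneration >= generation;
    }
}
```

A check like this says nothing about how the operator detected the change, which is why tests that care about a specific mechanism (such as the checksum annotation) would still inspect the resource directly.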

Three distinct test categories become separate modules with different compile-time dependencies: systemtest-feature (no K8s dependency), systemtest-operator (K8s client and CRD types), and systemtest-installer (one test per install method). All three are TCK-consumable. Installer is a public interface — the primary downstream extension point. CrdProxyFixture replaces OperatorProxyFixture to reflect that the fixture uses the CRD API, not the operator's internals. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

Fixture/installer composition is extension-internal, driven by system properties. Installer tests use a single smoke test — the test does not vary between installers, CI provides the matrix. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
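
One possible reading of "driven by system properties" is sketched below. The property name `systemtest.fixture`, the `FixtureKind` enum, the `FixtureSelector` class and the choice of `standalone` as the default are all hypothetical; only the four fixture kinds (CRD, manifest, sidecar, standalone) come from the PR description.

```java
import java.util.Locale;
import java.util.Properties;

// The four fixture kinds named in the PR description.
enum FixtureKind { CRD, MANIFEST, SIDECAR, STANDALONE }

final class FixtureSelector {
    // Hypothetical property name; the proposal may use a different one.
    static final String FIXTURE_PROPERTY = "systemtest.fixture";

    // Resolves the fixture kind from the given properties.
    // Falls back to STANDALONE (an assumed default) when the property is unset.
    static FixtureKind select(Properties props) {
        String raw = props.getProperty(FIXTURE_PROPERTY, "standalone");
        try {
            return FixtureKind.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "Unknown fixture '" + raw + "' for -D" + FIXTURE_PROPERTY, e);
        }
    }
}
```

In a test extension this would be called as `FixtureSelector.select(System.getProperties())`, so a CI matrix could vary `-Dsystemtest.fixture=crd` without any change to the test classes.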

- Summary now mentions all four test categories and the Installer
- "Three categories/modules" → "Four" throughout
- Add systemtest-webhook to TCK module list with description
- Feature test example: remove namespace parameter, use kafkaClient
- Tags section: clarify module boundaries as primary separation
- Rejected alternatives: @operator tag "ensures present", not "selects fixture"
- Webhook section: rewrite as two-module story (installer + behaviour)
- Affected projects: add systemtest-webhook module
- Fix grammar: "An CrdProxyFixture" → "A CrdProxyFixture"

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

franvila · 2026-05-29T09:49:09Z

+
+2. **No convergence contract**: the framework does not define when the proxy is ready. Each test class independently polls for readiness, with varying strategies and varying reliability.
+
+3. **Operator coupling**: every test implicitly requires the operator. Feature tests — which care only that a correctly-configured proxy is serving traffic — cannot run without the full operator installation. This conflates feature correctness with operator correctness and prevents fast local iteration.


The purpose of system tests is to have everything installed as it would be in a real environment. We decided to cover only the Kubernetes installation because we didn't have the resources or time to cover everything. Ideally, we would cover all kinds of installation, such as bare-metal, Kubernetes...

I totally agree with decoupling the system tests from the installation. The test cases should not care how Kroxylicious has been installed; they should only check what is needed. A quicker installation will ease test-first development, as we no longer need to depend on the operator.

franvila · 2026-05-29T09:55:18Z

+- **Test-first development**: a developer writing a new filter can write a failing system test as the first commit of their feature branch, without reading framework documentation or asking QE for help.
+- **Deployment-agnostic feature tests**: the same test runs against an operator-managed proxy, a manifest-managed proxy, or a Helm installation, with no changes to the test body.
+- **Reliable convergence**: `proxyFixture.apply()` is a blocking call with a defined contract — when it returns, the proxy is serving the requested configuration. Manual polling disappears from test classes.
+- **A TCK for downstream distributions**: downstream distributors implement `Installer` for their distribution and run upstream's test modules — feature, operator, installer, and webhook — without forking.


What does TCK mean?

Sam's borrowing the Java parlance https://en.wikipedia.org/wiki/Technology_Compatibility_Kit
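
On the "Reliable convergence" bullet quoted above: a blocking `proxyFixture.apply()` implies that the polling lives in the framework exactly once instead of in each test class. A minimal sketch of such a helper follows, assuming a boolean readiness probe and a fixed poll interval; the `Convergence` class and its signature are hypothetical and not part of the proposal.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class Convergence {

    // Blocks until the readiness probe passes, or fails once the timeout elapses.
    // A fixture's apply() could call this before handing a ProxyHandle to the test.
    static void awaitConverged(BooleanSupplier ready, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!ready.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("proxy did not converge within " + timeout);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                throw new IllegalStateException("interrupted while waiting for convergence", e);
            }
        }
    }
}
```

Failing loudly on timeout, rather than returning a flag, keeps the contract simple: if the call returns, the test can assume the proxy is ready.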

k-wall · 2026-06-01T09:26:27Z

+
+A `ProxyScenario` describes the desired proxy configuration in deployment-agnostic terms; a `ProxyFixture` translates that into running infrastructure and blocks until convergence; the resulting `ProxyHandle` is a token of convergence that gates all subsequent interaction. An `Installer` — the primary downstream extension point — handles getting the operator, CRDs, and RBAC into the cluster independently of how proxies are deployed.
+
+Feature tests become portable across deployment mechanisms — the same test runs against a CRD-deployed proxy, a manifest-managed proxy, a standalone process, or a downstream distribution — and cheap enough to write before the production code, as a specification.


Aside: I'd like to get Kroxylicious into OperatorHub soon, so having tests that are agnostic to their install methodology will be an enabler.

k-wall · 2026-06-01T09:28:33Z

+
+Every feature test class has a private `deployXxx()` method that reimplements the same builder/template pattern against the operator's CRD types. Adding a new optional parameter (e.g. `ExperimentalKmsConfig`) requires touching every one of them. Timing workarounds are scattered across test classes with comments pointing at unresolved issues. The convergence question — "is the proxy actually serving the configuration I just applied?" — is answered by ad hoc polling in each test class rather than by a framework-level contract.
+
+This setup cost has a second-order effect: system tests are written after features merge, delegated to QE because they are too expensive for a developer to include in a feature PR. The test framework is the bottleneck, not the assertions.


Let's describe the problem without the Red Hat terminology.

k-wall · 2026-06-01T09:57:10Z

+These move to a `KubernetesClientCapability`:
+
+```java
+interface KafkaClient {


One thing I dislike about the system tests is that all our client interactions are via a client CLI spawned in a separate process.

CLIs expose pretty basic operations: they boil down to producing to or fetching from a topic. Consumer groups are possible, but testing things like produce transactions + commit offsets is pretty much impossible. This will become a real impediment when we come to test things like Router implementations.

I think we need to be able to program the KafkaClient from the test so we are freer to express the full richness of Kafka.

A secondary issue is that relying on a CLI is slow: you pay the process startup time.

I have had an itch to scratch for a while, which I think I've expressed to you both. Could we expose a common Java interface for librdkafka and Sarama using Java Foreign Functions? We could even partially implement the Kafka Consumer, Producer and Admin client interfaces. The system tests could then be written in terms of the Java Kafka API and exercise Java/Go/librdkafka using the same test.
Now we've got AI assistance, I think writing a POC for this is within reach.
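
To make the point about programmability concrete, here is a small sketch of the kind of interaction a test could express against an in-process client but not through a produce/consume CLI: a transaction that is aborted, followed by one that commits. The `ProgrammableKafkaClient` interface is hypothetical and deliberately not the `KafkaClient` from the proposal, and `InMemoryKafkaDriver` is only a stand-in; a real driver would wrap the Java client, or librdkafka/Sarama through FFI as suggested above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface for illustration; not the KafkaClient defined in the proposal.
interface ProgrammableKafkaClient {
    void beginTransaction();
    void produce(String topic, String value);
    void commitTransaction();
    void abortTransaction();
    // Returns only committed records, mirroring a read_committed consumer.
    List<String> consume(String topic);
}

// In-memory stand-in for a real driver, so the sketch runs without a broker.
final class InMemoryKafkaDriver implements ProgrammableKafkaClient {
    private record PendingRecord(String topic, String value) {}

    private final Map<String, List<String>> committed = new HashMap<>();
    private List<PendingRecord> pending; // null while no transaction is open

    @Override
    public void beginTransaction() {
        if (pending != null) throw new IllegalStateException("transaction already open");
        pending = new ArrayList<>();
    }

    @Override
    public void produce(String topic, String value) {
        if (pending != null) {
            pending.add(new PendingRecord(topic, value)); // buffered until commit
        } else {
            append(topic, value); // non-transactional produce is visible immediately
        }
    }

    @Override
    public void commitTransaction() {
        requireOpenTransaction();
        pending.forEach(r -> append(r.topic(), r.value()));
        pending = null;
    }

    @Override
    public void abortTransaction() {
        requireOpenTransaction();
        pending = null; // buffered records are discarded
    }

    @Override
    public List<String> consume(String topic) {
        return List.copyOf(committed.getOrDefault(topic, List.of()));
    }

    private void append(String topic, String value) {
        committed.computeIfAbsent(topic, t -> new ArrayList<>()).add(value);
    }

    private void requireOpenTransaction() {
        if (pending == null) throw new IllegalStateException("no transaction open");
    }
}
```

A test written against the interface could produce inside a transaction, abort it, and assert that `consume` returns nothing, then repeat with a commit and assert that the record is visible. The same test body would then apply to any driver that implements the interface.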

k-wall · 2026-06-01T09:59:35Z

I like the ideas you are expressing in the proposal. I think it is the right way to go.

Replace "delegated to QE" with community-neutral language and frame setup cost as a perceived barrier rather than a statement of fact. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

The same separation of concerns that motivates the fixture model applies to how tests interact with Kafka. Three independent axes: test intent (what), client driver (which implementation), and execution environment (where). The KafkaClient interface shows a richer target shape including transactions and consumer group management, enabled by in-process drivers (Java client, librdkafka/Sarama via FFI). CLI drivers implement produce/consume only. Produce and consume are the starting point; richer operations are the target. Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

henryZrncik

Looks great to me.
I think most of the proposed solution fits perfectly. To a great extent, the current system tests mainly assume the operator and, as a result, use the resource manager mostly at the test level rather than inside function calls. Because different environments are also mentioned (minikube, OCP, etc.), we also need to make HTTP calls and client calls work regardless of whether the cluster is local or remote, but that represents minimal changes to be made as part of this. 👍

SamBarker added 6 commits May 14, 2026 16:17

SamBarker requested a review from a team as a code owner May 20, 2026 04:01

Rename proposal to use PR number

a68aec3

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

franvila reviewed May 29, 2026

k-wall reviewed Jun 1, 2026

SamBarker added 3 commits June 2, 2026 14:12

Expand TCK acronym at first use

4a8abb3

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

henryZrncik approved these changes Jul 1, 2026

feat: system test framework proposal #111
SamBarker wants to merge 10 commits into kroxylicious:main from SamBarker:feat/system-test-framework-proposal

SamBarker commented May 20, 2026 • edited by github-actions Bot


4 participants

