
feat: system test framework proposal #111

Open
SamBarker wants to merge 10 commits into kroxylicious:main from SamBarker:feat/system-test-framework-proposal

SamBarker (Member) commented May 20, 2026

Summary

  • Introduces a layered abstraction for system tests separating test intent (ProxyScenario) from deployment mechanism (ProxyFixture) with convergence gating (ProxyHandle)
  • Organises tests into four modules by concern: systemtest-feature, systemtest-operator, systemtest-webhook, systemtest-installer
  • Defines Installer as the primary downstream extension point — downstream varies by installation method, not proxy deployment
  • Feature tests are portable across all fixtures (CRD, manifest, sidecar, standalone) with no Kubernetes dependency
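A minimal Java sketch of how the layers in the summary might fit together. The type names `ProxyScenario`, `FilterSpec`, `ProxyFixture` and `ProxyHandle` come from the proposal; every field, every method signature, the `InMemoryProxyFixture` stand-in and the placeholder bootstrap address are assumptions for illustration only.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: the proposal names these types, but all signatures here are assumptions.

/** A filter the scenario wants installed, described without any CRD or manifest types. */
record FilterSpec(String type, Map<String, Object> config) {
    FilterSpec {
        Objects.requireNonNull(type, "type");
        config = Map.copyOf(config); // defensive, immutable copy
    }
}

/** Test intent: the desired proxy configuration in deployment-agnostic terms. */
record ProxyScenario(String name, List<FilterSpec> filters) {
    ProxyScenario {
        filters = List.copyOf(filters);
    }
}

/** Token of convergence: a fixture hands one out only once the proxy serves the scenario. */
interface ProxyHandle {
    ProxyScenario scenario();

    String bootstrapServers();
}

/** Deployment mechanism (CRD, manifest, sidecar, standalone, ...). */
interface ProxyFixture {
    /** Blocks until the proxy is serving {@code scenario}, then returns a handle. */
    ProxyHandle apply(ProxyScenario scenario);
}

/** Trivial in-memory fixture standing in for a real deployment mechanism. */
final class InMemoryProxyFixture implements ProxyFixture {
    @Override
    public ProxyHandle apply(ProxyScenario scenario) {
        // A real fixture would create resources here and wait for convergence before returning.
        return new ProxyHandle() {
            @Override
            public ProxyScenario scenario() {
                return scenario;
            }

            @Override
            public String bootstrapServers() {
                return "localhost:9192"; // placeholder address
            }
        };
    }
}
```

The point of the split is that a test depends only on `ProxyScenario` and `ProxyHandle`; swapping `InMemoryProxyFixture` for a CRD or manifest fixture would not change the test body.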

Test plan

  • Review proposal for internal consistency across all sections
  • Verify the abstraction model covers existing system test patterns
  • Confirm TCK model works for downstream distributors

🤖 Generated with Claude Code

SamBarker added 6 commits May 14, 2026 16:17
Introduces layered abstractions (ProxyScenario, FilterSpec, ProxyFixture,
ProxyHandle) that separate test intent from deployment mechanism, enabling
deployment-agnostic feature tests and test-first development.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
OperatorCapability now models the operator's externally observable state
via generation-based reconciliation observation rather than the checksum
annotation. Tests that assert on specific operator mechanisms (e.g.
checksum change detection) observe resource state directly.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
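The commit above describes generation-based reconciliation observation. A minimal sketch of such a check, assuming the usual Kubernetes convention of `metadata.generation`, `status.observedGeneration` and a `Ready` condition; whether the Kroxylicious CRDs expose exactly these fields is an assumption, and the record types below are hypothetical stand-ins, not the operator's model classes.

```java
import java.util.List;

// Hypothetical stand-ins for a custom resource's metadata.generation and
// status.observedGeneration/conditions; not the operator's real model classes.
record Condition(String type, String status, long observedGeneration) {}

record ResourceState(long generation, Long observedGeneration, List<Condition> conditions) {}

final class Reconciliation {
    private Reconciliation() {}

    /**
     * True once the operator has observed the latest spec (observedGeneration has caught up
     * with generation) and reports Ready=True for that same generation.
     */
    static boolean isReconciled(ResourceState state) {
        if (state.observedGeneration() == null || state.observedGeneration() < state.generation()) {
            return false; // the operator has not yet processed the latest spec change
        }
        return state.conditions().stream()
                .anyMatch(c -> c.type().equals("Ready")
                        && c.status().equals("True")
                        && c.observedGeneration() == state.generation());
    }
}
```

Unlike a checksum annotation, this relies only on state the operator publishes, so it does not depend on how the operator detects changes internally.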
OperatorCapability handles convergence and deployment agnosticism.
Tests that assert on specific resource state (e.g. checksum annotations)
use an injected KubernetesClient to observe resources directly, keeping
the two concerns separate.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Three distinct test categories become separate modules with different
compile-time dependencies: systemtest-feature (no K8s dependency),
systemtest-operator (K8s client and CRD types), and systemtest-installer
(one test per install method). All three are TCK-consumable.

Installer is a public interface — the primary downstream extension
point. CrdProxyFixture replaces OperatorProxyFixture to reflect that
the fixture uses the CRD API, not the operator's internals.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
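A minimal sketch of `Installer` as the downstream extension point. The proposal confirms only that `Installer` is a public interface responsible for getting the operator, CRDs and RBAC into the cluster; the `name`/`install`/`close` methods and the `RecordingInstaller` example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: method names are assumptions, not the proposal's signatures.

/** Gets the operator, CRDs and RBAC into a cluster, independently of how proxies are deployed. */
interface Installer extends AutoCloseable {
    /** A stable name CI can use to select this installer. */
    String name();

    /** Installs the operator and its prerequisites, returning once they are usable. */
    void install();

    /** Removes everything install() created; narrowed to throw no checked exceptions. */
    @Override
    void close();
}

/** What a downstream distribution might supply; this example only records its actions. */
final class RecordingInstaller implements Installer {
    final List<String> actions = new ArrayList<>();

    @Override
    public String name() {
        return "downstream-bundle";
    }

    @Override
    public void install() {
        actions.add("install");
    }

    @Override
    public void close() {
        actions.add("uninstall");
    }
}
```

A downstream distributor would supply its own implementation and run the upstream test modules unchanged; only the installer varies.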
Fixture/installer composition is extension-internal, driven by system
properties. Installer tests use a single smoke test — the test does
not vary between installers, CI provides the matrix.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
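One way the property-driven composition could look, as a sketch. The property name `systemtest.fixture`, the registered names and `StandaloneProxyFixture` are hypothetical; `CrdProxyFixture` is named in the proposal but its shape here is assumed.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: property name and fixture registry are illustrative assumptions.

interface ProxyFixture {
    String describe();
}

record CrdProxyFixture() implements ProxyFixture {
    public String describe() {
        return "CRD-deployed proxy";
    }
}

record StandaloneProxyFixture() implements ProxyFixture {
    public String describe() {
        return "standalone process";
    }
}

final class FixtureSelector {
    private static final Map<String, Supplier<ProxyFixture>> FIXTURES = Map.of(
            "crd", CrdProxyFixture::new,
            "standalone", StandaloneProxyFixture::new);

    private FixtureSelector() {}

    /** Resolves the fixture named by a system property, defaulting to the CRD fixture. */
    static ProxyFixture select(String propertyName) {
        String name = System.getProperty(propertyName, "crd");
        Supplier<ProxyFixture> factory = FIXTURES.get(name);
        if (factory == null) {
            throw new IllegalArgumentException(
                    "Unknown fixture '" + name + "', expected one of " + FIXTURES.keySet());
        }
        return factory.get();
    }
}
```

The test body never sees this choice: CI sets the property for each matrix cell, and an unknown name fails fast instead of silently falling back.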
- Summary now mentions all four test categories and the Installer
- "Three categories/modules" → "Four" throughout
- Add systemtest-webhook to TCK module list with description
- Feature test example: remove namespace parameter, use kafkaClient
- Tags section: clarify module boundaries as primary separation
- Rejected alternatives: @operator tag "ensures present", not "selects fixture"
- Webhook section: rewrite as two-module story (installer + behaviour)
- Affected projects: add systemtest-webhook module
- Fix grammar: "An CrdProxyFixture" → "A CrdProxyFixture"

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
SamBarker requested a review from a team as a code owner May 20, 2026 04:01
Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

2. **No convergence contract**: the framework does not define when the proxy is ready. Each test class independently polls for readiness, with varying strategies and varying reliability.

3. **Operator coupling**: every test implicitly requires the operator. Feature tests — which care only that a correctly-configured proxy is serving traffic — cannot run without the full operator installation. This conflates feature correctness with operator correctness and prevents fast local iteration.
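To illustrate what a framework-level convergence contract could replace, here is a sketch of a single shared wait that a fixture's `apply()` might call instead of each test class polling on its own. The `Convergence` class, its parameters and `ConvergenceTimeoutException` are hypothetical, not part of the proposal.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of one shared readiness wait; names and behaviour are assumptions.

final class ConvergenceTimeoutException extends RuntimeException {
    ConvergenceTimeoutException(String message) {
        super(message);
    }
}

final class Convergence {
    private Convergence() {}

    /**
     * Polls {@code condition} every {@code interval} until it returns true, or throws
     * ConvergenceTimeoutException naming {@code what} once {@code timeout} has elapsed.
     */
    static void await(String what, BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return;
            }
            if (System.nanoTime() - deadline >= 0) { // overflow-safe deadline comparison
                throw new ConvergenceTimeoutException(what + " did not converge within " + timeout);
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                throw new IllegalStateException("Interrupted while waiting for " + what, e);
            }
        }
    }
}
```

Centralising the wait means timeouts, intervals and failure messages are consistent across every test, and a fix to the readiness check is made once.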


The purpose of system tests is to have everything installed the way you want it, in a real environment. We decided to cover only Kubernetes installation because we didn't have enough resources or time to cover everything. Ideally, we should cover all kinds of installation, such as bare metal, Kubernetes...

Totally agree with decoupling the system tests from the installation. The test cases should not care how Kroxylicious has been installed; they should just check what is needed. A quicker installation will ease test-first development, as we won't need to depend on the operator.

Comment thread on proposals/111-system-test-framework.md (outdated)
- **Test-first development**: a developer writing a new filter can write a failing system test as the first commit of their feature branch, without reading framework documentation or asking QE for help.
- **Deployment-agnostic feature tests**: the same test runs against an operator-managed proxy, a manifest-managed proxy, or a Helm installation, with no changes to the test body.
- **Reliable convergence**: `proxyFixture.apply()` is a blocking call with a defined contract — when it returns, the proxy is serving the requested configuration. Manual polling disappears from test classes.
- **A TCK for downstream distributions**: downstream distributors implement `Installer` for their distribution and run upstream's test modules — feature, operator, installer, and webhook — without forking.
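A sketch of what a deployment-agnostic feature test body could look like. `ProxyFixture`, `ProxyHandle` and `KafkaClient` are named in the proposal, but all signatures here, the `PassThroughFeatureTest` class and the in-memory `FakeProxyFixture` are hypothetical; a real test would presumably be a JUnit test with the fixture injected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: signatures are assumptions, not the proposal's API.

record ProxyScenario(List<String> filters) {}

interface KafkaClient {
    void produce(String topic, String value);

    List<String> consume(String topic);
}

interface ProxyHandle {
    KafkaClient kafkaClient();
}

interface ProxyFixture {
    ProxyHandle apply(ProxyScenario scenario);
}

/** The test body sees only the fixture, never the deployment mechanism behind it. */
final class PassThroughFeatureTest {
    static void roundTripsRecords(ProxyFixture fixture) {
        ProxyHandle proxy = fixture.apply(new ProxyScenario(List.of())); // blocks until converged
        KafkaClient client = proxy.kafkaClient();
        client.produce("orders", "hello");
        List<String> received = client.consume("orders");
        if (!received.equals(List.of("hello"))) {
            throw new AssertionError("expected [hello] but got " + received);
        }
    }
}

/** In-memory stand-in so the sketch runs without a cluster. */
final class FakeProxyFixture implements ProxyFixture {
    private final Map<String, List<String>> topics = new HashMap<>();

    @Override
    public ProxyHandle apply(ProxyScenario scenario) {
        KafkaClient client = new KafkaClient() {
            @Override
            public void produce(String topic, String value) {
                topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(value);
            }

            @Override
            public List<String> consume(String topic) {
                return List.copyOf(topics.getOrDefault(topic, List.of()));
            }
        };
        return () -> client; // ProxyHandle has a single abstract method
    }
}
```

Because the test never touches a namespace or CRD type, the same method can be pointed at a CRD, manifest or standalone fixture without edits.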


What does TCK mean?



A `ProxyScenario` describes the desired proxy configuration in deployment-agnostic terms; a `ProxyFixture` translates that into running infrastructure and blocks until convergence; the resulting `ProxyHandle` is a token of convergence that gates all subsequent interaction. An `Installer` — the primary downstream extension point — handles getting the operator, CRDs, and RBAC into the cluster independently of how proxies are deployed.

Feature tests become portable across deployment mechanisms — the same test runs against a CRD-deployed proxy, a manifest-managed proxy, a standalone process, or a downstream distribution — and cheap enough to write before the production code, as a specification.


Aside: I'd like to get Kroxylicious into OperatorHub soon, so having tests that are agnostic to the installation method will be an enabler.

Comment thread on proposals/111-system-test-framework.md (outdated)

Every feature test class has a private `deployXxx()` method that reimplements the same builder/template pattern against the operator's CRD types. Adding a new optional parameter (e.g. `ExperimentalKmsConfig`) requires touching every one of them. Timing workarounds are scattered across test classes with comments pointing at unresolved issues. The convergence question — "is the proxy actually serving the configuration I just applied?" — is answered by ad hoc polling in each test class rather than by a framework-level contract.

This setup cost has a second-order effect: system tests are written after features merge, delegated to QE because they are too expensive for a developer to include in a feature PR. The test framework is the bottleneck, not the assertions.
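A sketch of how a single shared scenario builder could replace the per-class `deployXxx()` methods, so a new optional parameter is added in one place. The builder API and the `withExperimentalKms` option are hypothetical; the option only gestures at the `ExperimentalKmsConfig` example and does not model that type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: builder methods and the experimentalKms option are assumptions.

record FilterSpec(String type, Map<String, String> config) {}

record ProxyScenario(List<FilterSpec> filters, Optional<Map<String, String>> experimentalKms) {}

final class ProxyScenarioBuilder {
    private final List<FilterSpec> filters = new ArrayList<>();
    private Map<String, String> experimentalKms; // optional; null means "not set"

    static ProxyScenarioBuilder scenario() {
        return new ProxyScenarioBuilder();
    }

    ProxyScenarioBuilder withFilter(String type, Map<String, String> config) {
        filters.add(new FilterSpec(type, Map.copyOf(config)));
        return this;
    }

    /** A new optional parameter is added here once, not in every test class. */
    ProxyScenarioBuilder withExperimentalKms(Map<String, String> config) {
        this.experimentalKms = Map.copyOf(config);
        return this;
    }

    ProxyScenario build() {
        return new ProxyScenario(List.copyOf(filters), Optional.ofNullable(experimentalKms));
    }
}
```

Each fixture would translate a `ProxyScenario` into its own resources once, so tests that do not set the option are unaffected when it is introduced.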


Let's describe the problem without the Red Hat terminology.

These move to a `KubernetesClientCapability`:

```java
interface KafkaClient {
    // ...
}
```


One thing I dislike about the system tests is that all our client interactions are via a client CLI spawned in a separate process.

CLIs expose pretty basic operations: they boil down to producing to or fetching from a topic. Consumer groups are possible, but testing things like transactional produces with offset commits is pretty much impossible. This will become a real impediment when we come to test things like Router implementations.

I think we need to be able to drive the KafkaClient programmatically from the test, so we are freer to express the full richness of Kafka.

A secondary issue is that relying on a CLI is slow: you pay the process startup time.

I have had an itch to scratch for a while, which I think I've expressed to you both. Could we expose a common Java interface for librdkafka and Sarama using Java's Foreign Function & Memory API? We could even partially implement the Kafka Consumer, Producer and Admin client interfaces. The system tests could then be written in terms of the Java Kafka API and exercise the Java, Go and librdkafka clients with the same test.
Now that we have AI assistance, I think writing a POC for this is within reach.

k-wall (Member) commented Jun 1, 2026

I like the ideas you are expressing in the proposal. I think it is the right way to go.

SamBarker added 3 commits June 2, 2026 14:12
Replace "delegated to QE" with community-neutral language and
frame setup cost as a perceived barrier rather than a statement of fact.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
The same separation of concerns that motivates the fixture model
applies to how tests interact with Kafka. Three independent axes:
test intent (what), client driver (which implementation), and
execution environment (where).

The KafkaClient interface shows a richer target shape including
transactions and consumer group management, enabled by in-process
drivers (Java client, librdkafka/Sarama via FFI). CLI drivers
implement produce/consume only. Produce and consume are the
starting point; richer operations are the target.

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
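A sketch of the "client driver" axis described above: produce/consume as the baseline, richer operations as the target, and CLI drivers supporting only the baseline. The `Capability` enum, method names, `CliKafkaClient` stub and `Drivers.supports` helper are hypothetical.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names and signatures are assumptions, not the proposal's interface.

enum Capability { PRODUCE, CONSUME, TRANSACTIONS, CONSUMER_GROUPS }

interface KafkaClient {
    Set<Capability> capabilities();

    void produce(String topic, String value);

    List<String> consume(String topic, int maxRecords);

    /** Target shape: only in-process drivers are expected to support this. */
    default void produceTransactionally(String transactionalId, String topic, List<String> values) {
        throw new UnsupportedOperationException(
                getClass().getSimpleName() + " does not support transactions");
    }
}

/** CLI-backed driver stub: declares produce/consume only; process spawning is omitted here. */
final class CliKafkaClient implements KafkaClient {
    @Override
    public Set<Capability> capabilities() {
        return EnumSet.of(Capability.PRODUCE, Capability.CONSUME);
    }

    @Override
    public void produce(String topic, String value) {
        // stubbed: a real driver would spawn a CLI producer process
    }

    @Override
    public List<String> consume(String topic, int maxRecords) {
        return List.of(); // stubbed: a real driver would spawn a CLI consumer process
    }
}

final class Drivers {
    private Drivers() {}

    /** Lets a test skip, rather than fail, when the selected driver lacks a capability. */
    static boolean supports(KafkaClient client, Capability required) {
        return client.capabilities().contains(required);
    }
}
```

Declaring capabilities up front keeps test intent independent of the driver: a transactional test can be skipped under a CLI driver and run unchanged under an in-process one.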

3 participants