
[Operator] Reconciler returns error when multiple IntelligentRoutes coexist in a namespace, blocking per-CR validation status updates #1908

@shraderdm

Description


Summary

Reconciler.getIntelligentRoute (src/semantic-router/pkg/k8s/reconciler.go:222-237) returns an error when more than one IntelligentRoute exists in the watched namespace. Its caller, reconcile(), returns early on this error and never reaches the validation code path that calls updateRouteStatus. As a result, when a second IntelligentRoute is applied alongside a valid one, neither CR receives a status update for as long as both exist.
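A minimal Go sketch of the failure mode, assuming simplified in-memory types instead of the real CRD and client. The `IntelligentRoute` struct, its `Status` string, and the zero-route message are illustrative placeholders, not the code at reconciler.go:222-237. Only the multiple-routes error text is modeled on the observed log line.

```go
package main

import "fmt"

// IntelligentRoute is a hypothetical stand-in for the CRD type; only what
// is needed to show the failure mode is modeled.
type IntelligentRoute struct {
	Name   string
	Status string // empty until the status-update step runs
}

// getIntelligentRoute enforces an "exactly one" contract: any count other
// than 1 is an error. The zero-route message is a placeholder.
func getIntelligentRoute(namespace string, routes []*IntelligentRoute) (*IntelligentRoute, error) {
	switch len(routes) {
	case 0:
		return nil, fmt.Errorf("no IntelligentRoute found in namespace %s", namespace)
	case 1:
		return routes[0], nil
	default:
		return nil, fmt.Errorf("multiple IntelligentRoutes found in namespace %s, expected exactly 1", namespace)
	}
}

// reconcile returns early on the lookup error, so the validation and
// status-update step is never reached while two CRs exist.
func reconcile(namespace string, routes []*IntelligentRoute) error {
	route, err := getIntelligentRoute(namespace, routes)
	if err != nil {
		return fmt.Errorf("failed to get IntelligentRoute: %w", err)
	}
	route.Status = "Ready" // stands in for validation + updateRouteStatus
	return nil
}

func main() {
	good := &IntelligentRoute{Name: "good-route"}
	bad := &IntelligentRoute{Name: "bad-audio-route"}
	err := reconcile("default", []*IntelligentRoute{good, bad})
	fmt.Println(err)
	fmt.Printf("good=%q bad=%q\n", good.Status, bad.Status) // both stay empty
}
```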

This was surfaced while attempting to write an end-to-end test that exercises the embedding-modality validator landed in #1895 against a real cluster. The unit test in src/semantic-router/pkg/k8s/reconciler_embedding_modality_test.go (also from #1895) hand-feeds the validator one CR at a time through controller-runtime's fake client, so it never trips the exactly-one check and passes. On a live cluster, that check blocks the same test.

Reproduction

  1. Apply a valid IntelligentRoute to namespace default. It reconciles to Ready=True.
  2. Apply a second IntelligentRoute to the same namespace. (The test fixture used a queryModality: audio rule, but the constraint is independent of CR contents - any second CR triggers it.)
  3. Observe the reconciler's watchLoop. Every 5 seconds, reconcile() returns the same error.

Observed log output

reconciler.go:159 "Reconciliation check: failed to get IntelligentRoute: multiple IntelligentRoutes found in namespace default, expected exactly 1"

That message repeats indefinitely while both CRs exist. The reconciler does not attempt to validate either route; it does not call updateRouteStatus; both CRs sit with empty status: {}.

Why the e2e test gets stuck

The embedding-signal-modality-validation testcase in the draft of #1881 polls for Ready=False, Reason=ValidationFailed on the bad CR within a 60-second window. The reconciler can't reach the validator while the good CR is also present, so the bad CR's status remains empty for the full polling window and the test times out.

Receipts from a kind run on 2026-05-14:

Test FAIL: embedding-signal-modality-validation
  timed out (1m0s) waiting for Ready=False+Reason=ValidationFailed on intelligentroute/default/bad-audio-route
  last observed status="" reason=""

Suggested direction

Make Reconciler reconcile each IntelligentRoute in the namespace independently rather than asserting exactly-one. Concretely:

  • getIntelligentRoute becomes listIntelligentRoutes (or similar) returning the full slice.
  • The reconcile loop iterates the slice and calls validateAndUpdate per CR, so each gets its own Ready/ValidationFailed status.
  • The "exactly one" assumption appears to be load-bearing only inside validateAndUpdate, where a single canonical config is emitted. That step would need a decision about what to do when multiple valid CRs disagree: merge, last-wins, deny-on-conflict, or only-one-active-at-a-time with an explicit selector. None of those decisions are made here; that's the design conversation this issue exists to start.
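A minimal sketch of the per-CR shape from the bullets above, using in-memory stand-ins rather than the real client and CRD types. The `Valid` field stands in for the real validator, and emitting the canonical config is left to a caller-supplied `resolve` function so that the sketch does not pick a conflict policy. All names other than listIntelligentRoutes and validateAndUpdate are hypothetical.

```go
package main

import "fmt"

// IntelligentRoute is an illustrative stand-in for the CRD type.
type IntelligentRoute struct {
	Namespace string
	Name      string
	Valid     bool   // hypothetical: the real validator's verdict
	Ready     string // "True" / "False"; empty until reconciled
	Reason    string
}

// listIntelligentRoutes returns every route in ns rather than asserting
// exactly one. A real version would list through the Kubernetes client.
func listIntelligentRoutes(all []*IntelligentRoute, ns string) []*IntelligentRoute {
	var out []*IntelligentRoute
	for _, r := range all {
		if r.Namespace == ns {
			out = append(out, r)
		}
	}
	return out
}

// validateAndUpdate validates a single CR and writes that CR's own status.
func validateAndUpdate(r *IntelligentRoute) error {
	if !r.Valid {
		r.Ready, r.Reason = "False", "ValidationFailed"
		return fmt.Errorf("IntelligentRoute %s/%s: validation failed", r.Namespace, r.Name)
	}
	r.Ready, r.Reason = "True", ""
	return nil
}

// reconcile gives every CR a status. Emitting one canonical config from the
// valid routes is delegated to resolve, since the policy is still open.
func reconcile(routes []*IntelligentRoute, resolve func(valid []*IntelligentRoute) error) []error {
	var errs []error
	var valid []*IntelligentRoute
	for _, r := range routes {
		if err := validateAndUpdate(r); err != nil {
			errs = append(errs, err) // keep going: one bad CR must not block the rest
			continue
		}
		valid = append(valid, r)
	}
	if resolve != nil {
		if err := resolve(valid); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

func main() {
	all := []*IntelligentRoute{
		{Namespace: "default", Name: "good-route", Valid: true},
		{Namespace: "default", Name: "bad-audio-route", Valid: false},
	}
	routes := listIntelligentRoutes(all, "default")
	errs := reconcile(routes, nil) // nil: no conflict policy chosen here
	for _, r := range routes {
		fmt.Printf("%s: Ready=%s Reason=%s\n", r.Name, r.Ready, r.Reason)
	}
	fmt.Println("errors:", len(errs))
}
```

Passing the valid subset to `resolve` keeps the open question (merge, last-wins, deny-on-conflict, or selector) isolated in one place, while status reporting no longer depends on it.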

IntelligentPool has the same shape today (line 216 of reconciler.go has the equivalent "expected exactly 1" check). The same lift would apply if and when multi-pool composition becomes a real use case.

What this unblocks

Once per-CR reconcile is in, the embedding-signal-modality-validation e2e testcase in the draft branch multimodal-routing-e2e-readd can land cleanly. The shape is in e2e/testcases/embedding_signal_modality_validation.go and the bad fixture is in e2e/profiles/multimodal-routing/crds/intelligentroute-bad-audio.yaml on that branch (currently being dropped from #1881 ahead of merge for the reason above).

The validator coverage today is the controller-runtime fake-client unit test in src/semantic-router/pkg/k8s/reconciler_embedding_modality_test.go, which exercises all six validator branches. That coverage is sufficient until live-cluster validation becomes feasible.
