fix: close override snapshot transition race on update by ytimocin · Pull Request #691 · kubefleet-dev/kubefleet

Yetkin Timocin (ytimocin) · 2026-05-05T14:16:35Z

Description of your changes

When a ClusterResourceOverride or ResourceOverride is updated, the controller previously flipped is-latest-snapshot=false on the old snapshot before creating the new one. The rollout snapshot watcher only handles Create/Generic events (no UpdateFunc), so a crash between flip and create left rollout un-enqueued until the next resync (6 h hub-agent default) or parent-CR mutation, and FetchAllMatchingOverridesForResourceSnapshot would silently drop the affected override.

This PR reorders the controller to create-first, demote-after, with two additional safety steps so transient duplicate latest=true is benign:

New snapshot is Created with latest=true first; the previous snapshot is flipped to false afterwards.
On AlreadyExists, the existing snapshot is fetched via mgr.GetAPIReader() (uncached) and its OverrideHash verified before proceeding. Mismatches surface as a Warning event on the parent CRO/RO and an ExpectedBehaviorError (no stack trace, requeues for transient causes like etcd restore).
A new cleanupStaleLatestSiblings helper flips any older sibling that still carries latest=true. Runs on both the hash-match and new-snapshot paths.
FetchAllMatchingOverridesForResourceSnapshot deduplicates at read time, picking the highest OverrideIndexLabel per parent (CRO key: tracking label; RO key: namespace + tracking label). Malformed labels log loudly and skip rather than blocking the entire rollout hot path.
IsNotFound on the demote Update is tolerated so the sibling audit still runs after a concurrent prune or parent deletion.

Fixes #

I have:

Associated this change with a known KubeFleet Issue (Bug, Feature, etc).
Run make reviewable to ensure this PR is ready for review.

How has this code been tested

New unit tests in pkg/utils/overrider/overrider_test.go cover the generic dedup helper (CRO and RO, including the namespace-collision and malformed-label cases) and a FetchAllMatchingOverridesForResourceSnapshot integration case with two latest=true snapshots for the same parent.
New integration tests in pkg/controllers/overrider/common_test.go cover the sibling-audit helper (steady-state no-op, post-crash recovery) and the AlreadyExists recovery branch (matching hash → demote proceeds; mismatched hash → ExpectedBehaviorError + Warning event, no demote).
All existing override controller integration tests (envtest) and pkg/utils/overrider unit tests pass.

Special notes for your reviewer

Placement == nil rollout watcher skip (pkg/controllers/rollout/controller.go:849, :882) and the same flip-then-create antipattern in pkg/utils/controller/resource_snapshot_resolver.go are left out of scope; both deserve their own follow-up issues.
Long-term we should consider migrating to a status.latestSnapshotName pointer on the parent CRO/RO so labels stop being load-bearing for "current" semantics; that would be a separate, larger change.

Copilot

Pull request overview

This PR fixes a race in the override controllers’ snapshot “latest” transition by switching from demote-old-then-create-new to create-new-first-then-demote-old, preventing a crash window where no snapshot is labeled is-latest-snapshot=true and rollout would not be enqueued until a long resync. To make the brief “double latest=true” window safe, it adds read-time deduplication in the overrider hot path and introduces a sibling-audit helper to clean up stale latest=true labels.

Changes:

Reordered CRO/RO snapshot updates to create-first, demote-after, including an AlreadyExists recovery path that verifies snapshot hash via an uncached reader and emits a Warning event on mismatch.
Added cleanupStaleLatestSiblings to flip stale latest=true labels on older snapshots, and added read-time dedup in FetchAllMatchingOverridesForResourceSnapshot.
Added/updated unit + integration tests for dedup logic and sibling cleanup behaviors; updated controller wiring to pass an uncached reader.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
pkg/utils/overrider/overrider.go	Deduplicates “latest” override snapshots at read time to tolerate transient duplicates.
pkg/utils/overrider/overrider_test.go	Adds unit tests for generic dedup + a fetch test covering duplicate latest snapshots.
pkg/controllers/overrider/common.go	Adds `UncachedReader`, event recorder field, snapshot pruning slice trimming, and sibling cleanup helper.
pkg/controllers/overrider/common_test.go	Adds integration tests for sibling cleanup and AlreadyExists recovery branches.
pkg/controllers/overrider/clusterresource_controller.go	Implements create-first/demote-after ordering + AlreadyExists hash verification + event emission.
pkg/controllers/overrider/resource_controller.go	Same ordering + recovery logic for namespaced overrides; sets up recorder.
pkg/controllers/overrider/suite_test.go	Wires `UncachedReader` into envtest manager setup.
cmd/hubagent/workload/setup.go	Wires `UncachedReader` into hub-agent override controller setup.

codecov · 2026-05-05T14:52:45Z

Codecov Report

❌ Patch coverage is 73.10345% with 39 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
pkg/controllers/overrider/resource_controller.go	57.50%	13 Missing and 4 partials ⚠️
...ontrollers/overrider/clusterresource_controller.go	62.16%	10 Missing and 4 partials ⚠️
pkg/controllers/overrider/common.go	71.42%	7 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

Yetkin Timocin (ytimocin) · 2026-05-05T18:32:12Z

E2E validation on Kind clusters

Ran four targeted edge-case tests against this branch on a fresh make setup-clusters (hub + 3 members).

Setup

make setup-clusters MEMBER_CLUSTER_COUNT=3 (fresh build, hub-agent rev cc4657694)
make join-members
4 Kind clusters on Apple Silicon (Mac), envtest already covers the in-process unit/integration paths

Edge cases

#	Test	Verifies	Result
1	Hub-agent restart mid-reconcile	Recovery from a crash during snapshot churn (the original bug scenario)	✅ PASS
2	10 concurrent CRO updates	Workqueue contention doesn't break invariants	✅ PASS
3	Inject duplicate `latest=true` sibling	`cleanupStaleLatestSiblings` audit fires on next reconcile	✅ PASS
4	RO same name in two namespaces	`(namespace, OverrideTrackingLabel)` dedup key isolates parents	✅ PASS

Sample results

Test 1 — after force-killing all hub-agent pods (--grace-period=0 --force) during 3 rapid CRO updates:

NAME                       LATEST   INDEX
edge-restart-cro-53101-0   false    0
edge-restart-cro-53101-1   false    1
edge-restart-cro-53101-2   true     2

Exactly one latest=true at the highest index — the create-first ordering recovered cleanly.

Test 2 — 10 patches submitted in parallel via &:

edge-concurrent-cro-53313-1  false  1
edge-concurrent-cro-53313-10 true   10   ← single latest
edge-concurrent-cro-53313-2  false  2
…
edge-concurrent-cro-53313-9  false  9

Test 3 — manually re-flipped snapshot 0 to latest=true to simulate a post-crash duplicate, then patched the CRO. Hub-agent log:

overrider/clusterresource_controller.go:187  "Created new overrideSnapshot"  newOverrideSnapshot="edge-dup-cro-53784-2"
overrider/clusterresource_controller.go:203  "Marked previous overrideSnapshot as inactive"  overrideSnapshot="edge-dup-cro-53784-1"
overrider/common.go:198                       "Flipped stale latest sibling to false"        overrideSnapshot="edge-dup-cro-53784-0"

The new audit helper fired exactly once on the injected duplicate.

Test 4 — two ResourceOverrides named edge-ro-shared-… in two different namespaces. Updating one produced index-1 in that namespace only; the other namespace's RO snapshot stayed at index 0.

Hub-agent log scan across all four runs

Created new overrideSnapshot: 19 occurrences (create-first ordering live)
Marked previous overrideSnapshot as inactive: 14 occurrences (demote-after live)
Flipped stale latest sibling to false: 1 occurrence (audit helper from test 3)
OverrideSnapshotHashMismatch events: 0
recovering from prior partial reconcile: 0
Failed to create new overrideSnapshot: 0
Failed to set the isLatestSnapshot label to false: 0
panics: 0

Side observation

The override controller uses predicate.GenerationChangedPredicate{}, so reconcile only fires on spec changes — not annotation/label edits. This means a duplicate latest=true left behind by a crash will persist until either (a) the next CRO/RO spec change, or (b) the informer's full-list resync (6 h hub-agent default) emits a synthetic Create event. Read-time dedup in FetchAllMatchingOverridesForResourceSnapshot masks this in the meantime, so callers always see the correct authoritative snapshot — just worth noting that the on-disk cleanup is gated on the next reconcile, not wall-clock immediate.

Baseline `make e2e-tests` caveat

The full suite under -p (9 parallel workers vs 4 Kind clusters on a single Mac) produced ~99 failures distributed uniformly across all placement test types. Failures were 10-second Eventually() timeouts — environmental, not regressions. No override-controller errors in hub-agent logs during the run. Should not affect CI/dedicated runners with proper resource isolation.

Wei Weng (weng271190436) · 2026-05-05T21:32:18Z

+		}
+		// AlreadyExists here normally means a prior reconcile succeeded the Create but failed to
+		// flip the old snapshot. Verify the existing object's hash before proceeding so we don't
+		// silently promote stale content (etcd restore from backup, manual edit, future hash bug).


Have a question: why would "etcd restore from backup" result in AlreadyExists path?

What does "future hash bug" mean? Does it mean hash function change?

For the 1st question, I was trying to be extra defensive by thinking about a partial restore. And, for the 2nd question, yes, I was thinking about a hash function change in between versions.

Wei Weng (weng271190436) · 2026-05-05T21:33:12Z

+			klog.ErrorS(getErr, "Failed to get existing overrideSnapshot for hash verification", "resourceOverride", roKObj, "newOverrideSnapshot", klog.KObj(newSnapshot))
+			return controller.NewAPIServerError(false, getErr)
+		}
+		if string(existing.Spec.OverrideHash) != overrideSpecHash {


Maybe we can recompute overrideHash of existing to avoid situations where someone modifies the overrideHash?

API Server applies defaults on Create, and, that is why, the hashes would be different between Create and re-hash of a read-back object.

The defaults come from kubebuilder:default markers on the API types — that's what makes the original and read-back objects differ.

Wei Weng (weng271190436) · 2026-05-05T21:49:26Z

+			// The content has not changed; no new snapshot is needed. Ensure the highest-index
+			// snapshot is marked latest, then audit siblings to clean up any duplicate latest=true
+			// left by a prior partial run.
+			if err := r.ensureSnapshotLatest(ctx, latestSnapshot); err != nil {


When would we need to mark the snapshot as latest?

If we always create a new snapshot with is-latest = true, the latest snapshot already has is-latest label set to true even if a crash happened right after a new snapshot was created?

Want to understand the scenario that you have in mind so that we can think about how to handle.

ensureSnapshotLatest was already here before this PR — I just kept calling it from the hash-match path as defensive insurance against out-of-band label edits. It's a no-op in steady state. Want me to drop it as part of this PR, or leave it alone since it's pre-existing?

michaelawyu (michaelawyu)

Hi Yetkin! I added some comments; would be really happy to discuss about this issue further.

michaelawyu (michaelawyu) · 2026-05-08T00:57:59Z

 	if len(snapshotList.Items) != 0 {
-		// convert the last unstructured snapshot to the typed object
-		latestSnapshot := &placementv1beta1.ClusterResourceOverrideSnapshot{}
+		latestSnapshot = &placementv1beta1.ClusterResourceOverrideSnapshot{}


Hi Yetkin! Thanks so much for this. It is a very interesting PR and I know that such fixes take a long while and a lot of efforts to complete. It is just that, I couldn't help feeling that given the nature of the bug, i.e., rollout controller failing to catch the state change where a resource snapshot is marked as latest again, it can just be easily fixed by tweaking the handler funcs in the rollout controller a bit:

The system just need an UpdateFunc that enqueues placement if it sees that for a snapshot update event, (e.ObjectOld.GetLabels()[isLatestLabelName] != e.ObjectNew.GetLabels()[isLatestLabelName]) && e.ObjectNew.GetLabels()[isLatestLabelName] == "true" (very roughly written).

(There is actually a bit more to this bug -> please see some of the other comments, happy to discuss about this a bit more and make sure that I understand it correctly)

michaelawyu (michaelawyu) · 2026-05-08T01:25:42Z

-				klog.ErrorS(err, "Failed to set the isLatestSnapshot label to false", "clusterResourceOverride", croKObj, "overrideSnapshot", klog.KObj(latestSnapshot))
-				return controller.NewUpdateIgnoreConflictError(err)
+			// Hash matches: re-mark this snapshot latest (idempotent) and audit siblings.
+			if err := r.ensureSnapshotLatest(ctx, latestSnapshot); err != nil {


This part is a bit interesting I think. If we do switch to the create-then-demote model, this part is a bit redundant (it is either always a no-op or always fail due to conflicts). And it cannot be reliably used to correct unexpected cases either (e.g., a latest label is set to false unexpectedly), as it would also trigger the issue we want to fix, i.e., the rollout controller missing the update.

michaelawyu (michaelawyu) · 2026-05-08T01:27:41Z

+				"existing snapshot %s has different content than the current spec; deleting it for the next reconcile to recreate. If this persists, investigate the snapshot for manual edits or a hash-function change between controller versions.", newSnapshot.Name)
+			// Drive observed → desired: delete the mismatched snapshot so the next reconcile
+			// recreates it with the correct content.
+			if delErr := r.Client.Delete(ctx, existing); delErr != nil && !errors.IsNotFound(delErr) {


This deletion part is a bit problematic I fear.

Aside from the semantic concerns (the snapshots should have monotonically, continuous indexes), the snapshot we would like to delete might have already been referenced in some bindings.

This might lead to the work generator getting stuck in a loop of errors when processing some bindings as it can no longer find a referenced snapshot. As our rollout (mostly rolling update) checks for Applied/Availability info, it might also lead a rollout to be stuck.

michaelawyu (michaelawyu) · 2026-05-08T14:40:04Z

+// cycle until the snapshot's labels are repaired.
+//
+// PT is *T constrained to client.Object so we can extract metadata and labels without copying.
+func dedupLatestSnapshots[T any, PT interface {


Hi Yetkin! This part is particularly interesting I think.

One of the major benefits that the demote then create pattern yields is that, it guarantees that at any given point of watches (with the cache lagging or not), the consumer (whoever needs to read/get notified on the appearance of a latest snapshot) will either see:

a) there is no latest snapshot -> which means the system observes an intermediate state and can simply wait for the next event; or
b) there is exactly one latest snapshot.

So the consumer can just do a list op with a label selector, and decide what to do based on the number of items returned. No additional sorting/ranking is needed.

But with the create then demote pattern,

e.g,

0 (latest) - | -> 1 (latest) - | -> 0 (demoted) - | -> 2 (latest) - | -> 1 (demoted) - | -> ...

Roughly (very roughly) 50% of the time, the consumer will see that there are multiple (2) latest snapshots. And it must do the sorting/ranking to determine the actual latest snapshot (or it can just back off until the situation is resolved by the snapshotting controller). The correctness part, as Yetkin you have acutely catches in this part of the code, can still be guaranteed, but it kinds of defeats the purpose of having the latest label. As the source of truth always lies in the monotonically increasing indexes, the consumer can simply just list every snapshot relevant and do the sorting/ranking every time to find the latest snapshot -> it's more costly, but more straightforward.

michaelawyu (michaelawyu) · 2026-05-08T15:13:32Z

+		// Compare stored OverrideHash, not a re-hash of existing.Spec.OverrideSpec: API-server
+		// defaulting on Create (selectionScope, scope, overrideType) makes the read-back spec
+		// differ from the input, so a recompute would never match.
+		if string(existing.Spec.OverrideHash) != overrideSpecHash {


This part is particularly tricky I assume (the flow itself, not specific to this PR)...

Generally speaking, if the user makes different edits each time (A -> B -> C -> ...) and we do not consider hash collision an issue, then this branch will never run and the issue of a past snapshot being marked as latest again will never occur.

The problem will only happen when the user does an A -> B -> A fashion of edits:

In the case of a demote then create flow, there are two possibilities:

case 1 =>

there is a snapshot 1 with hash A -> the user edits overrides that has hash B -> the controller is triggered -> the controller has snapshot 1 demoted, but failed to create snapshot 2 with hash B -> the user edits overrides back to the state of hash A -> the controller is triggered again and marked snapshot 1 as latest

case 1 is harmless, as it still aligns with our eventually consistent model; the system is just behaving as if the edit of hash B never happened. There is no rollout needed either.

case 2 =>

there is a snapshot 1 with hash A -> the user edits overrides that has hash B -> the controller is triggered -> the controller has snapshot 1 demoted, and created snapshot 2 with hash B -> the user edits overrides back to the state of hash A -> the controller is triggered again, but due to cache lagging, it just sees snapshot 1 in its demoted state, snapshot 2 hasn't appeared yet -> the controller marks snapshot 1 as latest again -> the system is trapped in an erred state with two latest snapshots

case 2 is harmful; the consumer side needs to handle this by doing its own sorting/ranking, and the producer side needs to fix the situation in later reconciliations.

And this is not a demote then create flow specific problem -> create then demote can have this as well. And this PR has fixed this by adding additional logic to clean up the history and setting the consumer to de-dup as needed.

But these occurrences are rare in nature -> if we bring object generation into the hash calculation the chances would get even more slim (to the brim of being impossible). Instead of having the cleanup logic and de-duppiing run often, I wonder if it would actually be easier to just:

a) on the consumer side, list with latest label selector, and back off if there are multiple latest snapshots -> wait for the producer side to fix things
b) on the producer side, hash with object generation embedded, and in very rare cases where multiple latest snapshots do appear, do the cleanup.

Important to note that any change we make must guarantee that the new flow won't trigger rollouts on existing placements, so the hashing part might not be easily editable.

The override controllers previously flipped is-latest-snapshot=false on the old snapshot before creating the new one. Because the rollout watcher only handles Create/Generic events (no UpdateFunc), a crash between flip and create left rollout un-enqueued until the next resync (6h hub-agent default) or parent-CR mutation, silently dropping overrides on member clusters. Reorder to create-first + flip-after, with a sibling audit and read-time dedup so transient duplicate latest=true is benign and self-healing: - Create new snapshot (latest=true) before flipping the old one to false. - On AlreadyExists, Get via UncachedReader and verify OverrideHash; on mismatch emit a Warning event on the parent CRO/RO and return ExpectedBehaviorError so retries don't carry stack traces. - Tolerate IsNotFound on the demote Update (concurrent prune / parent deletion) so the sibling audit still runs. - New cleanupStaleLatestSiblings helper flips any older sibling that still carries latest=true; runs on both the hash-match and new-snapshot paths. - Read-time dedup in FetchAllMatchingOverridesForResourceSnapshot picks the highest OverrideIndexLabel per parent (CRO key: tracking label; RO key: namespace + tracking label). Malformed labels log loudly and skip rather than blocking the rollout hot path. - removeExtraSnapshot now trims its in-memory list in place so the same list is safe to reuse for the audit step. Boy Scout: typo and misleading-comment cleanups in the touched test files. Signed-off-by: Yetkin Timocin <ytimocin@microsoft.com>

…ists tests - Wire a record.NewFakeRecorder into the CRO mismatched-hash test and assert the OverrideSnapshotHashMismatch Warning event is actually emitted (Copilot review kubefleet-dev#1). - Mirror both AlreadyExists branches (matching + mismatched hash) for the ResourceReconciler so the namespaced controller has parity coverage. The RO mismatched-hash test also asserts the Warning event on the parent RO. - Reword a stale comment in overrider_test.go that said the dedup path "errors" on malformed labels — it actually logs and skips (Copilot review kubefleet-dev#2). Signed-off-by: Yetkin Timocin <ytimocin@microsoft.com>

Add CRO and RO integration tests that exercise the hash-match short-circuit in ensureClusterResourceOverrideSnapshot / ensureResourceOverrideSnapshot. Pre-create a snapshot whose OverrideHash matches the parent's current spec hash; the controller must take the hash-match branch (no new snapshot, ensureSnapshotLatest re-flips, cleanupStaleLatestSiblings runs) instead of creating an extra snapshot. Lifts overrider package coverage from ~71% to ~73% by hitting the hash-match branch that the existing AlreadyExists tests bypassed. Signed-off-by: Yetkin Timocin <ytimocin@microsoft.com>

- Reword the AlreadyExists comment in both controllers to drop the loose "etcd restore from backup" example (a full etcd restore brings the CRO and snapshots back together consistently, so it's not the realistic trigger). Name the actual triggers explicitly: a manual edit of the snapshot or a hash-function change between controller versions. - On hash mismatch, delete the existing snapshot before returning the error. The override snapshot is owned by the CRO/RO and the reconciler's job is to drive observed → desired state; if the existing snapshot's content doesn't match the spec, deleting it so the next reconcile recreates it cleanly is more correct than letting stale content persist. The Warning event message is updated to reflect this recovery. - Document why we trust the stored OverrideHash field rather than recomputing from existing.Spec.OverrideSpec. The API server applies CRD defaulting on Create (selectionScope, scope, overrideType), so the read-back spec differs from the writer's input and a recompute would never match the original-intent hash. - Reword the dedupLatestSnapshots sort comment. Within a single call all items share scope (one keyOf, one type), so the namespace tier is just a defensive tie-breaker — not a mixed-scope handler as the previous wording implied. - Tighten the rest of the new comments throughout the touched files. Tests: the CRO + RO mismatched-hash integration tests now construct a genuinely divergent spec and assert the bad snapshot is deleted via Eventually(IsNotFound). Signed-off-by: Yetkin Timocin <ytimocin@microsoft.com>

Add focused unit tests using controller-runtime fake client + Delete interceptors to exercise the three Delete branches in the AlreadyExists recovery path that envtest can't reach reliably: - Delete succeeds → ExpectedBehaviorError (recovery proceeds) - Delete returns IsNotFound → swallowed, ExpectedBehaviorError - Delete returns non-NotFound error → APIServerError Same coverage for both CRO and RO controllers. Lifts overrider controller package coverage from ~73.6% to ~74.3%. Signed-off-by: Yetkin Timocin <ytimocin@microsoft.com>

Three small fixes to the unit tests added in the previous commit: - Bump copyright year on the new files from 2025 to 2026. - Mirror the CRO test's deleteCalls counter and assertion in the RO test so all three branches verify the controller actually attempted the Delete, not just the returned error sentinel. - Replace mutation of the global k8s.io/client-go/kubernetes/scheme with a fresh runtime.NewScheme() per test for safe test isolation (matches the pattern in pkg/controllers/workgenerator/override_test.go). Signed-off-by: Yetkin Timocin <ytimocin@microsoft.com>

Four small fixes following Go-hygiene review: - Fix CRO test asymmetry: the "non-NotFound delete error" case was missing wantSnapDelete: true, so the controller's Delete attempt on the error branch wasn't asserted. Now matches the RO mirror. - Add t.Parallel() to both Test* functions and their table-driven subtests; fresh fakeClient per case makes parallelism safe and satisfies the tparallel linter. - Rename ae* test variables (aeReconciler, aeCRO, aeCROName, aeRO, aeROName) to recovery* in common_test.go for clarity. The "ae" abbreviation was opaque outside the immediate Context block. - Drop the r.recorder != nil guard in both controllers' hash-mismatch branch. The recorder is wired in SetupWithManager and tests that reach this path always set it; the guard was inconsistent with the rest of the package (workgenerator, workapplier all call r.recorder.Eventf directly). Signed-off-by: Yetkin Timocin <ytimocin@microsoft.com>

Yetkin Timocin (ytimocin) requested a review from Copilot May 5, 2026 14:17

Copilot started reviewing on behalf of Yetkin Timocin (ytimocin) May 5, 2026 14:18 View session

Copilot AI reviewed May 5, 2026

View reviewed changes

Comment thread pkg/controllers/overrider/common_test.go Outdated

Comment thread pkg/utils/overrider/overrider_test.go Outdated

Wei Weng (weng271190436) reviewed May 5, 2026

View reviewed changes

Yetkin Timocin (ytimocin) force-pushed the fix/override-snapshot-transition-race branch 2 times, most recently from a01c978 to 203995e Compare May 6, 2026 16:47

michaelawyu (michaelawyu) reviewed May 8, 2026

View reviewed changes

Yetkin Timocin (ytimocin) force-pushed the fix/override-snapshot-transition-race branch from c452a28 to 0ff472d Compare May 9, 2026 02:35

Yetkin Timocin (ytimocin) added 7 commits June 2, 2026 10:15

Yetkin Timocin (ytimocin) force-pushed the fix/override-snapshot-transition-race branch from 0ff472d to de99cf5 Compare June 2, 2026 17:15

Conversation

Yetkin Timocin (ytimocin) commented May 5, 2026

Description of your changes

How has this code been tested

Special notes for your reviewer

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Yetkin Timocin (ytimocin) commented May 5, 2026

E2E validation on Kind clusters

Setup

Edge cases

Sample results

Hub-agent log scan across all four runs

Side observation

Baseline make e2e-tests caveat

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

michaelawyu (michaelawyu) left a comment

Choose a reason for hiding this comment

Uh oh!

michaelawyu (michaelawyu) May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

michaelawyu (michaelawyu) May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

michaelawyu (michaelawyu) May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

michaelawyu (michaelawyu) May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

codecov Bot commented May 5, 2026 •

edited

Loading

Baseline `make e2e-tests` caveat

michaelawyu (michaelawyu) May 8, 2026 •

edited

Loading

michaelawyu (michaelawyu) May 8, 2026 •

edited

Loading

michaelawyu (michaelawyu) May 8, 2026 •

edited

Loading

michaelawyu (michaelawyu) May 8, 2026 •

edited

Loading