watch only known configmaps for WVA #49

Gregory-Pereira wants to merge 7 commits into opendatahub-io:main from

Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the

Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough

This PR makes small non-functional whitespace and formatting edits in several files and a behavioral change to controller wiring. It adjusts an inline struct tag comment in

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Issues:
- internal/controller/configmap_reconciler.go — Informer set may diverge from predicate (missing watches)
- internal/controller/configmap_reconciler.go — Informer lifecycle and cache sync ordering (race)
- internal/controller/predicates.go — Behavioral change from removing a well-known name
🚥 Pre-merge checks
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/user-guide/namespace-management.md (1)
296-306: ⚠️ Potential issue | 🟡 Minor

Scenario 3 example missing required label.

This inline ConfigMap example lacks wva.llmd.ai/config: "true". Per the "Required Label" section at lines 179-182, this ConfigMap would be invisible to the controller.

📝 Proposed fix

```diff
 # 3. Create namespace-local saturation config
 kubectl apply -f - <<EOF
 apiVersion: v1
 kind: ConfigMap
 metadata:
   name: wva-saturation-config
   namespace: my-app
+  labels:
+    wva.llmd.ai/config: "true"
 data:
   default: |
     kvCacheThreshold: 0.80
     queueLengthThreshold: 10
 EOF
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/user-guide/namespace-management.md` around lines 296 - 306, The ConfigMap example for "wva-saturation-config" is missing the required label so the controller will ignore it; update the example's metadata to include the required label key wva.llmd.ai/config set to "true" (add a metadata.labels block under the ConfigMap's metadata for name: wva-saturation-config in namespace: my-app) so the controller can discover this resource.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@docs/user-guide/namespace-management.md`:
- Around line 296-306: The ConfigMap example for "wva-saturation-config" is
missing the required label so the controller will ignore it; update the
example's metadata to include the required label key wva.llmd.ai/config set to
"true" (add a metadata.labels block under the ConfigMap's metadata for name:
wva-saturation-config in namespace: my-app) so the controller can discover this
resource.
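Putting the proposed fix together, the complete corrected manifest would look like the following sketch (the namespace and threshold values are taken from the doc's own example and are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: wva-saturation-config
  namespace: my-app
  labels:
    # Required label; without it the controller never sees this ConfigMap.
    wva.llmd.ai/config: "true"
data:
  default: |
    kvCacheThreshold: 0.80
    queueLengthThreshold: 10
```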
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: b2e3d99a-e496-4351-84f6-7bf7168ab698
📒 Files selected for processing (7)
- cmd/main.go
- config/manager/configmap-saturation-scaling.yaml
- config/manager/configmap.yaml
- config/openshift/configmap-patch.yaml
- config/samples/model-scale-to-zero-config.yaml
- docs/user-guide/namespace-management.md
- internal/constants/labels.go
two questions:
+1 on doing it upstream first
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/configmap_reconciler.go`:
- Around line 114-119: The list wellKnownNames currently includes
config.ConfigMapName() but Reconcile's switch (in the Reconcile method) has no
case to handle that name and falls to the default "Ignoring unrecognized
ConfigMap"; either remove config.ConfigMapName() from wellKnownNames if no
handling is required, or add a new case in Reconcile's switch to handle
config.ConfigMapName() (implement the desired handling logic similar to the
existing cases for config.SaturationConfigMapName,
config.DefaultScaleToZeroConfigMapName, or config.QMAnalyzerConfigMapName) so
events for config.ConfigMapName() are processed rather than ignored.
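The shape of the fix the prompt describes can be sketched as a plain function. The ConfigMap names and handler strings below are placeholders standing in for the project's config.ConfigMapName(), config.SaturationConfigMapName, and friends, whose real values are not shown in this PR:

```go
package main

import "fmt"

// Hypothetical well-known ConfigMap names; the real values come from the
// project's config package and are not defined here.
const (
	operatorConfigName    = "wva-operator-config"
	saturationConfigName  = "saturation-config"
	scaleToZeroConfigName = "model-scale-to-zero-config"
)

// handleConfigMap mirrors the shape of the Reconcile switch the review
// describes: every name listed in wellKnownNames needs a matching case,
// otherwise events for that name fall through to the "ignored" default.
func handleConfigMap(name string) string {
	switch name {
	case operatorConfigName:
		return "reload operator config"
	case saturationConfigName:
		return "reload saturation thresholds"
	case scaleToZeroConfigName:
		return "reload scale-to-zero config"
	default:
		return "ignored"
	}
}

func main() {
	fmt.Println(handleConfigMap(operatorConfigName))
	fmt.Println(handleConfigMap("some-other-cm"))
}
```

The point of the sketch is the invariant the reviewer calls out: the set of cases in the switch should match the wellKnownNames slice exactly, in both directions.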
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: c310f5cc-e638-455f-9e32-7445f85d0e0d
📒 Files selected for processing (5)
- internal/config/saturation_scaling.go
- internal/controller/configmap_reconciler.go
- internal/engines/pipeline/enforcer.go
- internal/engines/pipeline/greedy_score_optimizer_test.go
- internal/engines/saturation/engine_v2.go
💤 Files with no reviewable changes (1)
- internal/engines/pipeline/enforcer.go
✅ Files skipped from review due to trivial changes (3)
- internal/config/saturation_scaling.go
- internal/engines/saturation/engine_v2.go
- internal/engines/pipeline/greedy_score_optimizer_test.go
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/controller/configmap_reconciler.go (1)
135-140: Resync period of zero disables periodic resync.

Setting `resyncPeriod` to `0` means the informer relies exclusively on watch events. If a watch disconnects and misses events before reconnection, state could drift until the next full relist (triggered only by watch errors, not a timer).

If this is intentional for reducing API server load, document it. Otherwise, consider a non-zero resync interval (e.g., `30*time.Minute`) as a safety net.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/configmap_reconciler.go` around lines 135 - 140, The informer created by toolscache.NewSharedIndexInformer currently passes a resyncPeriod of 0 which disables periodic resync; change the resyncPeriod argument in the informer creation (the NewSharedIndexInformer call that constructs informer for &corev1.ConfigMap{}) to a non-zero duration such as 30*time.Minute (or make the interval configurable) so periodic relists act as a safety net, or if zero is intentional add a clear comment and/or configuration flag documenting the deliberate choice.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/configmap_reconciler.go`:
- Around line 142-152: The informer is being added as an "Others" runnable
(mgr.Add(manager.RunnableFunc(...)) with informer.Run) which can start after
controllers block on cache sync; replace this by using the manager's cache so
the informer is part of the Caches phase: either implement the manager Caches
group interface (add GetCache() returning cache.Cache) and return the cache used
to build the informer, or simply call mgr.GetCache() and create the informer
from that cache instead of registering informer.Run; then remove the custom
runnable and keep bldr.WatchesRawSource(&source.Informer{Informer: informer,
Handler: &handler.EnqueueRequestForObject{}}) so WaitForCacheSync is satisfied
before controllers start.
---
Nitpick comments:
In `@internal/controller/configmap_reconciler.go`:
- Around line 135-140: The informer created by toolscache.NewSharedIndexInformer
currently passes a resyncPeriod of 0 which disables periodic resync; change the
resyncPeriod argument in the informer creation (the NewSharedIndexInformer call
that constructs informer for &corev1.ConfigMap{}) to a non-zero duration such as
30*time.Minute (or make the interval configurable) so periodic relists act as a
safety net, or if zero is intentional add a clear comment and/or configuration
flag documenting the deliberate choice.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 90ec3ddd-41e4-4fa7-b262-d0c9162f3469
📒 Files selected for processing (2)
- internal/controller/configmap_reconciler.go
- internal/controller/predicates.go
```go
if err := mgr.Add(manager.RunnableFunc(func(ctx context.Context) error {
	informer.Run(ctx.Done())
	return nil
})); err != nil {
	return fmt.Errorf("registering informer for ConfigMap %q: %w", cmName, err)
}

bldr.WatchesRawSource(&source.Informer{
	Informer: informer,
	Handler:  &handler.EnqueueRequestForObject{},
})
```
🧩 Analysis chain
🌐 Web query:
Does controller-runtime manager.Add() guarantee runnables start before controller sources call WaitForCacheSync?
💡 Result:
No—Manager.Add() by itself does not guarantee that an arbitrary runnable has started before a controller’s sources call WaitForCacheSync.
What is guaranteed (by the manager’s start sequencing) is:
- The manager starts runnable groups in this order: HTTPServers → Webhooks → Caches → Others → Warmup → (then) LeaderElection runnables (after election / or immediately if leader election is disabled). Controllers run in the LeaderElection group, so they start after the cache group has been started (and its “ready” checks have completed). [1]
- If your runnable is classified into the Caches group (i.e., it matches `hasCache`), controller-runtime wires its "ready check" to `runnable.GetCache().WaitForCacheSync(ctx)`, and the group's `Start()` waits for those ready checks for items queued before `Start()` completes. [2][3]

But if your runnable ends up in Others or LeaderElection:
- It may start concurrently with (or after) controllers, and there is no ordering guarantee that it will run before a controller's sources execute their own `WaitForCacheSync` logic. Also, runnables added after a group has already started are just enqueued and started asynchronously, without any global ordering barrier against controllers. [2]
Practical takeaway: the only reliable “starts-before-controllers-wait-for-cache-sync” relationship is achieved by being in the Caches group (implementing hasCache) so the manager blocks on WaitForCacheSync before starting controllers; otherwise, Add() does not provide that guarantee. [1][2]
References:
[1] controller-runtime manager start order (internal.go, controllerManager.Start) (raw.githubusercontent.com)
[2] runnable grouping + “added after Start are started directly” (runnable_group.go) (raw.githubusercontent.com)
[3] cache sync semantics (Informers.WaitForCacheSync) (pkg.go.dev)
Citations:
- 1: https://raw.githubusercontent.com/kubernetes-sigs/controller-runtime/v0.23.3/pkg/manager/internal.go
- 2: https://raw.githubusercontent.com/kubernetes-sigs/controller-runtime/v0.23.3/pkg/manager/runnable_group.go
- 3: https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/cache/internal
Confirmed startup race: informer runnable may start after controller's WaitForCacheSync blocks.
mgr.Add(manager.RunnableFunc(...)) registers the informer in the "Others" runnable group. Controller-runtime manager startup order is: Caches group (blocks on sync) → LeaderElection group (controllers). Since controllers run in LeaderElection and start after the Caches group completes, and your informer runnable is in "Others", the informer may start concurrently with or after the controller. When source.Informer.Start() calls WaitForCacheSync, if informer.Run() hasn't populated the cache yet, it will block indefinitely or timeout.
Fix: Implement the Caches group interface (add a GetCache() method returning a cache.Cache) so the manager blocks on WaitForCacheSync before starting controllers, or use mgr.GetCache() directly instead of a custom informer runnable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controller/configmap_reconciler.go` around lines 142 - 152, The
informer is being added as an "Others" runnable
(mgr.Add(manager.RunnableFunc(...)) with informer.Run) which can start after
controllers block on cache sync; replace this by using the manager's cache so
the informer is part of the Caches phase: either implement the manager Caches
group interface (add GetCache() returning cache.Cache) and return the cache used
to build the informer, or simply call mgr.GetCache() and create the informer
from that cache instead of registering informer.Run; then remove the custom
runnable and keep bldr.WatchesRawSource(&source.Informer{Informer: informer,
Handler: &handler.EnqueueRequestForObject{}}) so WaitForCacheSync is satisfied
before controllers start.
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Find go.mod for controller-runtime version
cat go.mod | grep controller-runtime
# Find main.go
fd -e go -g 'main.go' --exec cat {}
# Check controller-runtime cache options usage
rg -n "cache.Options\|ByObject\|NewManager\|ctrl.Options" --type=go -C3
```

Length of output: 20729
@pierDipi — confirmed. The root cause in v0.22.5: manager.RunnableFunc does not implement LeaderElectionRunnable, so controller-runtime places it in the LeaderElection group — the same group as controllers. Both informer.Run() and the controller's source.Informer.Start() (which calls WaitForCacheSync) launch concurrently, creating the race.
The minimal fix is a small dedicated struct that implements NeedLeaderElection() bool { return false }, which moves the informer runnable to the Others group — guaranteed to start before the LeaderElection group:
```go
// informerRunnable wraps a SharedIndexInformer so it starts in the "Others"
// runnable group (NeedLeaderElection=false), which the manager starts before
// the LeaderElection group where controllers live. This ensures informer.Run()
// is called before source.Informer.Start() calls WaitForCacheSync.
type informerRunnable struct {
	informer toolscache.SharedIndexInformer
}

func (ir *informerRunnable) Start(ctx context.Context) error {
	ir.informer.Run(ctx.Done())
	return nil
}

// NeedLeaderElection returns false so the manager places this runnable in the
// "Others" group, which starts before the LeaderElection group (controllers).
func (ir *informerRunnable) NeedLeaderElection() bool {
	return false
}
```

Then replace the `mgr.Add(manager.RunnableFunc(...))` call in the loop:

```go
// Before (races with controller start — RunnableFunc → LeaderElection group):
if err := mgr.Add(manager.RunnableFunc(func(ctx context.Context) error {
	informer.Run(ctx.Done())
	return nil
})); err != nil {
	return fmt.Errorf("registering informer for ConfigMap %q: %w", cmName, err)
}

// After (informerRunnable → Others group, starts before controllers):
if err := mgr.Add(&informerRunnable{informer: informer}); err != nil {
	return fmt.Errorf("registering informer for ConfigMap %q: %w", cmName, err)
}
```

The `informerRunnable` type can live at the bottom of configmap_reconciler.go (or a shared file if more controllers need it). No changes to main.go or the source.Informer wiring are required.
I have attempted to solve this in 47f141dbf0ee907fd550b4082bc9354abfec466b, can we confirm this is no longer an issue @coderabbitai?
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Signed-off-by: greg pereira <grpereir@redhat.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
…where controllers start) to the Others group Signed-off-by: greg pereira <grpereir@redhat.com>
@coderabbitai can you do another review please?
✅ Actions performed
Full review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/configmap_reconciler.go`:
- Around line 125-130: The ListWatch is hard-coded to metav1.NamespaceAll;
change it to use the reconciler's configured watch namespace (the same namespace
variable used by shouldWatchNamespaceLocalConfigMap, e.g., r.watchNamespace or
cfg.WatchNamespace) so the call to toolscache.NewFilteredListWatchFromClient
uses that namespace instead of NamespaceAll, and preserve the existing behavior
for cluster-wide mode (fall back to metav1.NamespaceAll only when the configured
watch namespace is empty/explicitly set to all). Ensure the reference to cmName
and the FieldSelector logic stays unchanged.
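The fallback behavior the prompt asks to preserve reduces to a small pure function. The name resolveWatchNamespace is mine, and "configured" stands in for the reconciler's r.watchNamespace / cfg.WatchNamespace field, whichever the codebase actually uses:

```go
package main

import "fmt"

// namespaceAll mirrors metav1.NamespaceAll, which is the empty string.
const namespaceAll = ""

// resolveWatchNamespace returns the namespace the ListWatch should use:
// the configured watch namespace when set, falling back to all namespaces
// (cluster-wide mode) only when nothing is configured.
func resolveWatchNamespace(configured string) string {
	if configured == "" {
		return namespaceAll // cluster-wide mode
	}
	return configured
}

func main() {
	fmt.Printf("%q\n", resolveWatchNamespace("my-app"))
	fmt.Printf("%q\n", resolveWatchNamespace(""))
}
```

The hard-coded metav1.NamespaceAll the review flags is equivalent to always taking the fallback branch, regardless of configuration.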
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 022cd3be-3562-4f0d-86c8-d71777e2a598
📒 Files selected for processing (6)
- internal/config/saturation_scaling.go
- internal/controller/configmap_reconciler.go
- internal/controller/predicates.go
- internal/engines/pipeline/enforcer.go
- internal/engines/pipeline/greedy_score_optimizer_test.go
- internal/engines/saturation/engine_v2.go
💤 Files with no reviewable changes (1)
- internal/engines/pipeline/enforcer.go
Signed-off-by: greg pereira <grpereir@redhat.com>
47f141d to c265bbd (Compare)
configmaps for WVA based on label → configmaps for WVA
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/configmap_reconciler.go`:
- Around line 119-134: The controller currently only creates a ListWatch scoped
to r.Config.WatchNamespace(), which drops global ConfigMaps from
config.SystemNamespace() when WatchNamespace is set; update the loop that builds
the list watches (where NewFilteredListWatchFromClient is used for each cmName
in wellKnownNames) to also register a second ListWatch for
config.SystemNamespace() whenever watchNS != metav1.NamespaceAll and watchNS !=
config.SystemNamespace() (avoid duplicates when they match); ensure you capture
loop variables (cmName) as now, attach both watches to the controller so
ConfigMapPredicate and Reconcile will observe global config, and keep the
existing event filter (ConfigMapPredicate) unchanged.
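The dual-watch logic the comment asks for can be sketched as a pure function over namespace names. watchNS and systemNS stand in for r.Config.WatchNamespace() and config.SystemNamespace(); those are the PR's identifiers, not definitions shown here:

```go
package main

import "fmt"

// namespaceAll mirrors metav1.NamespaceAll (the empty string).
const namespaceAll = ""

// watchNamespaces returns the distinct namespaces the controller should
// register ListWatches in: the configured watch namespace plus the system
// namespace holding global config, without duplicates, and a single
// all-namespaces watch in cluster-wide mode.
func watchNamespaces(watchNS, systemNS string) []string {
	if watchNS == namespaceAll {
		// Cluster-wide mode: one all-namespaces watch already covers
		// the system namespace.
		return []string{namespaceAll}
	}
	if watchNS == systemNS {
		// Avoid registering the same namespace twice.
		return []string{watchNS}
	}
	// Namespaced mode: also watch the system namespace for global config.
	return []string{watchNS, systemNS}
}

func main() {
	fmt.Println(watchNamespaces("my-app", "wva-system"))
	fmt.Println(watchNamespaces("wva-system", "wva-system"))
	fmt.Println(watchNamespaces("", "wva-system"))
}
```

Building one ListWatch per returned namespace (per well-known name) is what keeps global ConfigMaps visible when WatchNamespace is set.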
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 00e8a16f-fd7e-41f9-b42f-623431aad04a
📒 Files selected for processing (6)
- internal/config/saturation_scaling.go
- internal/controller/configmap_reconciler.go
- internal/controller/predicates.go
- internal/engines/pipeline/enforcer.go
- internal/engines/pipeline/greedy_score_optimizer_test.go
- internal/engines/saturation/engine_v2.go
💤 Files with no reviewable changes (1)
- internal/engines/pipeline/enforcer.go
✅ Files skipped from review due to trivial changes (3)
- internal/engines/saturation/engine_v2.go
- internal/config/saturation_scaling.go
- internal/engines/pipeline/greedy_score_optimizer_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/controller/predicates.go
Signed-off-by: greg pereira <grpereir@redhat.com>
@coderabbitai please review
✅ Actions performed
Full review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/configmap_reconciler.go`:
- Around line 131-133: The ConfigMapPredicate passed to WithEventFilter on the
builder is not applied to WatchesRawSource calls, so update each
WatchesRawSource(...) call to attach the predicate directly to the underlying
informer (use source.Informer(...).WithPredicates(...) or equivalent) so
namespace/tracking filters run before enqueue; locate the WatchesRawSource
usages in the configmap controller (references: bldr, WatchesRawSource,
ConfigMapPredicate, WithEventFilter) and replace or augment them so each raw
source/informer is created with ConfigMapPredicate instead of relying solely on
WithEventFilter.
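Independent of the controller-runtime wiring, the filter the comment wants on every raw source reduces to a label check. The label key comes from the docs earlier in this PR; the function name is mine:

```go
package main

import "fmt"

// configLabelKey is the label the PR's docs say a ConfigMap must carry to be
// visible to the controller.
const configLabelKey = "wva.llmd.ai/config"

// wvaConfigPredicate mirrors the per-source filtering the comment asks for:
// it must run before enqueue on each raw source, not only via the builder's
// WithEventFilter, which does not apply to WatchesRawSource.
func wvaConfigPredicate(labels map[string]string) bool {
	return labels[configLabelKey] == "true"
}

func main() {
	fmt.Println(wvaConfigPredicate(map[string]string{configLabelKey: "true"}))
	fmt.Println(wvaConfigPredicate(map[string]string{"app": "web"}))
}
```

In the real controller this check would live in the Predicates attached to each source.Informer, so unlabeled ConfigMap events are dropped before they reach the workqueue.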
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 301c19fc-68ee-4ba7-8e54-7ea8fe8cb34d
📒 Files selected for processing (6)
- internal/config/saturation_scaling.go
- internal/controller/configmap_reconciler.go
- internal/controller/predicates.go
- internal/engines/pipeline/enforcer.go
- internal/engines/pipeline/greedy_score_optimizer_test.go
- internal/engines/saturation/engine_v2.go
💤 Files with no reviewable changes (1)
- internal/engines/pipeline/enforcer.go
…w sources) into each source.Informer's Predicates field directly Signed-off-by: greg pereira <grpereir@redhat.com>
Watching all the configmaps is creating resource issues. Now the cache filters by label so only a handful of ConfigMaps are stored in memory instead of every one in the cluster.
cc @vivekk16 @pierDipi
Summary by CodeRabbit
Refactor
Style