Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 43 additions & 0 deletions backend/cmd/headlamp_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1769,6 +1769,49 @@ func TestCacheMiddleware_CacheHitAndCacheMiss_RealK8s(t *testing.T) {
assert.Equal(t, "true", secondFromCache, "second request should be from cache")
}

// Integration test to verify CacheMiddleware does not cause excessive goroutine usage with a real K8s API server.
func TestCacheMiddleware_GoRoutines(t *testing.T) {
if os.Getenv("HEADLAMP_RUN_INTEGRATION_TESTS") != strconv.FormatBool(istrue) {
t.Skip("skipping integration test")
}

start := runtime.NumGoroutine()

c, clusterName := newRealK8sHeadlampConfig(t)
handler := createHeadlampHandler(c)

ts := httptest.NewServer(handler)
defer ts.Close()

apiPath := "/clusters/" + clusterName + "/api/v1/pods"
ctx := context.Background()

resp, err := httpRequestWithContext(ctx, ts.URL+apiPath, "GET")
require.NoError(t, err)

defer resp.Body.Close()

Comment on lines +1786 to +1793
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test doesn’t verify the request succeeded (status code) or that cache watching was actually started for the context. If authorization fails in handleCacheAuthorization, CheckForChanges can be skipped entirely, making goroutine assertions unrelated to the behavior under test. Add an explicit require.Equal(http.StatusOK, resp.StatusCode) (or expected status) and ensure the request path/headers exercise the codepath that calls k8cache.CheckForChanges.

Copilot uses AI. Check for mistakes.
time.Sleep(3 * time.Second)

running := runtime.NumGoroutine()

time.Sleep(3 * time.Second)

after := runtime.NumGoroutine()

expected := false
afterExpected := false

if (running >= 50 && running < 200) && (after >= 50 && after < 200) {
expected = true
afterExpected = true
}

assert.Equal(t, 3, start)
assert.Equal(t, running >= 50 && running < 200, expected)
assert.Equal(t, after >= 50 && after < 200, afterExpected)
Comment on lines +1794 to +1812
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is very likely to fail and/or be meaningless:

  • runtime.NumGoroutine() is not stable across Go versions/test runners, so assert.Equal(t, 3, start) is unreliable.
  • expected/afterExpected are derived from the same boolean being asserted, making the assertions tautological (they’ll pass regardless of goroutine count).
    Consider asserting on a delta from start (or an upper bound) and using require.Eventually to wait for informers to start/settle, rather than hard-coded absolute counts.
Suggested change
time.Sleep(3 * time.Second)
running := runtime.NumGoroutine()
time.Sleep(3 * time.Second)
after := runtime.NumGoroutine()
expected := false
afterExpected := false
if (running >= 50 && running < 200) && (after >= 50 && after < 200) {
expected = true
afterExpected = true
}
assert.Equal(t, 3, start)
assert.Equal(t, running >= 50 && running < 200, expected)
assert.Equal(t, after >= 50 && after < 200, afterExpected)
// Wait until at least one additional goroutine is started, or time out.
require.Eventually(t, func() bool {
return runtime.NumGoroutine() > start
}, 10*time.Second, 100*time.Millisecond)
running := runtime.NumGoroutine()
// Allow some time for background work to settle.
time.Sleep(3 * time.Second)
after := runtime.NumGoroutine()
// Assert on deltas rather than absolute goroutine counts, which are
// environment-dependent. We expect some increase but no unbounded growth.
const maxDelta = 500
deltaRunning := running - start
if deltaRunning < 0 {
deltaRunning = 0
}
deltaAfter := after - start
if deltaAfter < 0 {
deltaAfter = 0
}
assert.GreaterOrEqual(t, deltaRunning, 1, "expected at least one additional goroutine after request")
assert.LessOrEqual(t, deltaRunning, maxDelta, "too many goroutines shortly after request")
assert.LessOrEqual(t, deltaAfter, maxDelta, "too many goroutines after settling time")

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@illume would this assertion good way to check accurately how many go routines spawned

}

// TestCacheMiddleware_CacheInvalidation_RealK8s tests cache invalidation with a
// real Kubernetes API server. Creates a ConfigMap, invalidates via DELETE, then
// verifies the next GET fetches fresh data. Requires HEADLAMP_RUN_INTEGRATION_TESTS=true
Expand Down
38 changes: 37 additions & 1 deletion backend/pkg/k8cache/cacheInvalidation.go
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,43 @@ func returnGVRList(apiResourceLists []*metav1.APIResourceList) []schema.GroupVer
}
}

return gvrList
// filtering the gvrList to make sure spawning go routines for only important
// resources which are required to be watched and cached,
// this will help to reduce the performance issue and resource utilization.
Comment on lines +138 to +140
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments here have grammar/spelling issues (e.g., "make sure spawning go routines" and "go routines" -> "ensure we spawn goroutines"). Please rewrite for clarity since these comments explain a behavioral change that affects correctness.

Suggested change
// filtering the gvrList to make sure spawning go routines for only important
// resources which are required to be watched and cached,
// this will help to reduce the performance issue and resource utilization.
// Filter the gvrList so that we spawn goroutines only for important resources
// that need to be watched and cached. This helps improve performance and
// reduce resource utilization.

Copilot uses AI. Check for mistakes.
filtered := filterImportantResources(gvrList)

return filtered
Comment on lines +141 to +143
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says the filter was added “inside the CheckForChanges”, but the change is actually applied in returnGVRList (affecting all watcher setups). If the intent is different (e.g., conditional filtering), either adjust the implementation or update the PR description to match the code.

Copilot uses AI. Check for mistakes.
}
Comment on lines +138 to +144
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returnGVRList now filters the watch/invalidation GVRs down to a small allowlist, but the cache middleware still stores GET responses for all resources (see StoreK8sResponseInCache usage in backend/cmd/server.go). This means cached responses for non-allowlisted resources will no longer be promptly invalidated on external changes and can remain stale until TTL expiry. Consider either restricting caching to the same allowlist, or implementing a lazy/per-request watcher registration so any cached resource also gets an informer (or keep watching all list+watch resources).

Copilot uses AI. Check for mistakes.
Comment on lines +138 to +144
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment above returnGVRList says it returns “all the supported resources that are supported by the k8s server”, but after this change it returns only the filtered subset. Update the comment to reflect the new behavior so callers don’t assume full coverage.

Copilot uses AI. Check for mistakes.

Comment on lines +138 to +145
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filtering the discovered GVRs down to a hard-coded allowlist changes cache invalidation semantics: the cache layer stores GET responses for arbitrary resource URLs, but changes to any resource not in allowed will no longer trigger informer events to invalidate cached entries (except via TTL or non-GET requests through Headlamp). This can lead to serving stale data for those resource kinds. Consider either (a) restricting caching to the same allowlist, (b) making the watched resources configurable, or (c) expanding the watched set to include all kinds that can be cached/are used by the UI (e.g., namespaces, ingresses, pv/pvc, rbac objects, etc.).

Suggested change
// filtering the gvrList to make sure spawning go routines for only important
// resources which are required to be watched and cached,
// this will help to reduce the performance issue and resource utilization.
filtered := filterImportantResources(gvrList)
return filtered
}
// Return the full list of listable/watchable resources so that cache
// invalidation covers all kinds that may be cached by the middleware.
// This avoids serving stale data for resources that would otherwise
// not be watched due to a hard-coded allowlist.
return gvrList
}

Copilot uses AI. Check for mistakes.
// filterImportantResources filters the provided list of GroupVersionResources to
// include only those that are deemed important for caching and watching.
// This helps reduce the number of resources we watch and cache,
// improving performance and resource utilization.
func filterImportantResources(gvrList []schema.GroupVersionResource) []schema.GroupVersionResource {
allowed := map[string]struct{}{
"pods": {},
"services": {},
"deployments": {},
"replicasets": {},
"statefulsets": {},
"daemonsets": {},
"nodes": {},
"configmaps": {},
"secrets": {},
"jobs": {},
"cronjobs": {},
}

filtered := make([]schema.GroupVersionResource, 0, len(allowed))

for _, gvr := range gvrList {
if _, ok := allowed[gvr.Resource]; ok {
Comment on lines +151 to +168
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new allow-list only keys off gvr.Resource. That means any API group that happens to use the same resource name will be included (even if it shouldn't be), and it also makes it easy to miss commonly used resources (e.g., namespaces, ingresses, serviceaccounts, roles) without any compile-time signal. Consider keying by full GroupResource (group+resource) and/or centralizing this list with clear rationale and tests so future updates don’t silently change invalidation behavior.

Suggested change
allowed := map[string]struct{}{
"pods": {},
"services": {},
"deployments": {},
"replicasets": {},
"statefulsets": {},
"daemonsets": {},
"nodes": {},
"configmaps": {},
"secrets": {},
"jobs": {},
"cronjobs": {},
}
filtered := make([]schema.GroupVersionResource, 0, len(allowed))
for _, gvr := range gvrList {
if _, ok := allowed[gvr.Resource]; ok {
// Use full GroupResource (group + resource) for the allow-list so that
// different API groups that happen to use the same resource name are not
// conflated. Core resources use the empty group name "".
allowed := map[schema.GroupResource]struct{}{
{Group: "", Resource: "pods"}: {},
{Group: "", Resource: "services"}: {},
{Group: "", Resource: "deployments"}: {},
{Group: "", Resource: "replicasets"}: {},
{Group: "", Resource: "statefulsets"}: {},
{Group: "", Resource: "daemonsets"}: {},
{Group: "", Resource: "nodes"}: {},
{Group: "", Resource: "configmaps"}: {},
{Group: "", Resource: "secrets"}: {},
{Group: "", Resource: "jobs"}: {},
{Group: "", Resource: "cronjobs"}: {},
}
filtered := make([]schema.GroupVersionResource, 0, len(allowed))
for _, gvr := range gvrList {
gr := schema.GroupResource{Group: gvr.Group, Resource: gvr.Resource}
if _, ok := allowed[gr]; ok {

Copilot uses AI. Check for mistakes.
filtered = append(filtered, gvr)
}
}
Comment on lines +146 to +171
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filterImportantResources introduces behavior that directly affects cache invalidation correctness/performance, but there are no unit tests covering the filtering (and its interaction with returnGVRList). Add tests that verify the allowlist behavior (including that non-allowlisted resources are excluded) to prevent regressions when the list changes.

Copilot uses AI. Check for mistakes.

// return the filtered list of GroupVersionResources that are important for caching and watching
return filtered
Comment on lines +146 to +174
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New filtering behavior isn’t covered by tests. There are existing tests for this package (e.g., cache invalidation and informer wiring), but none validate that the GVR discovery output is filtered as intended. Add a unit test for filterImportantResources (and/or returnGVRList) to prevent regressions and to document the expected allowlist behavior.

Copilot uses AI. Check for mistakes.
}
Comment on lines +146 to 175
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filterImportantResources is new behavior but there’s no unit test coverage ensuring the filter returns the expected subset and preserves Group/Version fields for allowed resources. Since this package already has cacheInvalidation_test.go, adding tests for returnGVRList/filterImportantResources would help prevent accidental regressions (e.g., typos in resource names or missed resources).

Copilot uses AI. Check for mistakes.

// Corrected CheckForChanges.
Expand Down
Loading