fix: automatically detect and handle legacy vs Kubernetes-style dashboard APIs #481

Draft
sd2k wants to merge 8 commits into main from fix-grafana-legacy-vs-kubernetes-apis

@sd2k (Collaborator) commented Jan 14, 2026

This adds the core infrastructure for detecting whether a Grafana instance
supports kubernetes-style APIs (/apis) vs legacy APIs (/api).

It then updates the dashboard tools to use capability detection for automatic
fallback from legacy APIs to kubernetes-style APIs when the legacy API
returns a 406 error.

Lastly, it updates the explore deeplink generation to support the new schemaVersion=1
format used by Grafana 10+, while maintaining backward compatibility with
the legacy 'left=' parameter format.


Note

Medium Risk
Touches core Grafana request plumbing and changes how dashboards are fetched based on runtime error handling/caching, which could affect compatibility across Grafana versions. Behavior is guarded by fallback logic and covered by new unit/integration tests, but mis-detection or conversion issues could break dashboard retrieval.

Overview
Adds capability discovery/caching (/apis probing) and a new GrafanaInstance wrapper to track per-API-group mode (legacy vs kubernetes) and fetch dashboards via kubernetes-style endpoints when needed.

Updates the dashboard tool to try the legacy API first. On a legacy 406 Not Acceptable response, it parses the suggested /apis/... path, switches future calls for the dashboard API group to kubernetes, and converts kubernetes dashboard responses back into the legacy model for compatibility. Also updates explore deeplink generation to default to Grafana 10+ schemaVersion=1&panes=... URLs with an optional legacy left= fallback, and adds unit and integration tests around these behaviors.

Written by Cursor Bugbot for commit d97f3cf. This will update automatically on new commits.

@sd2k changed the title from "fix grafana legacy vs kubernetes apis" to "fix: automatically detect and handle legacy vs Kubernetes-style dashboard APIs" Jan 14, 2026
@sd2k marked this pull request as ready for review February 24, 2026 15:38
@sd2k requested a review from a team as a code owner February 24, 2026 15:38
sd2k added 3 commits February 24, 2026 15:40
Implements Phase 1 of the API capability detection plan (#300).

This adds the core infrastructure for detecting whether a Grafana instance
supports kubernetes-style APIs (/apis) vs legacy APIs (/api).

New files:
- capability.go: Types, cache, and discovery logic for API capabilities
- grafana_instance.go: GrafanaInstance type that wraps legacy client with
  capability-aware API access

Key features:
- Thread-safe capability cache with configurable TTL (default 1 minute)
- Automatic discovery of available API groups and versions via GET /apis
- Support for API key and basic auth in kubernetes API requests
- Parse406Error helper for extracting API version from 406 error messages
- Context functions for extracting GrafanaInstance from context

The GrafanaInstance type provides:
- DiscoverCapabilities(): Fetch and cache API capabilities
- HasKubernetesAPIs(): Check if kubernetes APIs are available
- GetAPIGroupInfo(): Get info about a specific API group
- GetDashboardKubernetes(): Fetch dashboard via kubernetes-style API
- ShouldUseKubernetesAPI(): Determine which API style to use

Integration tests confirm successful discovery of 9 API groups from
Grafana 12+, including dashboard.grafana.app with versions v1beta1,
v0alpha1, v2beta1, and v2alpha1.
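
As an illustration of the thread-safe, TTL-bounded cache described above, here is a minimal sketch. It is not the code from capability.go: the constructor NewCapabilityCache and the constants APICapabilityLegacy and APICapabilityKubernetes are assumed names, and the entry layout is simplified to the per-group map only.

```go
package main

import (
	"sync"
	"time"
)

// APICapability records which API style should be used for an API group.
type APICapability int

const (
	APICapabilityUnknown    APICapability = iota
	APICapabilityLegacy                   // assumed name, not confirmed by the PR
	APICapabilityKubernetes               // assumed name, not confirmed by the PR
)

// cacheEntry holds what is known about one Grafana instance.
type cacheEntry struct {
	detectedAt       time.Time
	perAPICapability map[string]APICapability
}

// CapabilityCache is a mutex-protected cache keyed by Grafana base URL.
type CapabilityCache struct {
	mu      sync.RWMutex
	ttl     time.Duration
	entries map[string]*cacheEntry
}

// NewCapabilityCache is a hypothetical constructor for this sketch.
func NewCapabilityCache(ttl time.Duration) *CapabilityCache {
	return &CapabilityCache{ttl: ttl, entries: make(map[string]*cacheEntry)}
}

// SetAPICapability records the capability for one API group. A missing or
// expired entry is replaced, which also refreshes detectedAt.
func (c *CapabilityCache) SetAPICapability(grafanaURL, apiGroup string, capability APICapability) {
	c.mu.Lock()
	defer c.mu.Unlock()

	entry, ok := c.entries[grafanaURL]
	if !ok || time.Since(entry.detectedAt) > c.ttl {
		entry = &cacheEntry{
			detectedAt:       time.Now(),
			perAPICapability: make(map[string]APICapability),
		}
		c.entries[grafanaURL] = entry
	}
	entry.perAPICapability[apiGroup] = capability
}

// GetAPICapability returns APICapabilityUnknown when nothing is cached for
// the group or when the entry is older than the TTL.
func (c *CapabilityCache) GetAPICapability(grafanaURL, apiGroup string) APICapability {
	c.mu.RLock()
	defer c.mu.RUnlock()

	entry, ok := c.entries[grafanaURL]
	if !ok || time.Since(entry.detectedAt) > c.ttl {
		return APICapabilityUnknown
	}
	// A missing map key yields the zero value, APICapabilityUnknown.
	return entry.perAPICapability[apiGroup]
}
```

A caller would create one cache with NewCapabilityCache(time.Minute), matching the default TTL mentioned above, and share it across requests.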
…se 2)

Implements Phase 2 of the API capability detection plan (#300).

This updates the dashboard tools to use capability detection for automatic
fallback from legacy APIs to kubernetes-style APIs when the legacy API
returns a 406 error.

Key changes to tools/dashboard.go:
- getDashboardByUID now checks if GrafanaInstance is available
- If capability is already set to kubernetes, uses kubernetes API directly
- Otherwise tries legacy API first (most compatible for Grafana Cloud)
- On 406 error, parses the error message to extract the suggested version
- Falls back to kubernetes API with the extracted or discovered version
- Converts kubernetes response format to legacy format for compatibility

New helper functions:
- getDashboardByUIDLegacy: Uses legacy /api/dashboards/uid endpoint
- getDashboardByUIDKubernetes: Uses kubernetes API with discovered version
- getDashboardByUIDKubernetesWithVersion: Uses kubernetes API with specific version
- convertKubernetesDashboardToLegacy: Converts k8s response to legacy format

Unit tests added for:
- Legacy API success case
- Legacy-only fallback (no GrafanaInstance)
- 406 fallback to kubernetes API
- Direct kubernetes when capability pre-set
- Not found error handling
- Kubernetes to legacy format conversion

All dependent functions (getDashboardSummary, getDashboardProperty,
getDashboardPanelQueries, updateDashboard) automatically benefit from
the capability detection through their use of getDashboardByUID.
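
The legacy-first flow with 406 fallback can be sketched roughly as follows. Everything here is illustrative: the wording of Grafana's 406 message is an assumption (the sketch only looks for an /apis/<group>/<version>/namespaces/<namespace>/ path inside it), and parse406Message, notAcceptableError, suggestedAPI, and getDashboard are hypothetical names standing in for the PR's Parse406Error and getDashboardByUID helpers.

```go
package main

import (
	"errors"
	"regexp"
)

// apisPathRe matches a kubernetes-style path such as
// /apis/dashboard.grafana.app/v2beta1/namespaces/default/dashboards/abc.
// The real 406 message format is an assumption in this sketch.
var apisPathRe = regexp.MustCompile(`/apis/([^/\s]+)/([^/\s]+)/namespaces/([^/\s]+)/`)

// suggestedAPI is the API group, version, and namespace found in a 406 message.
type suggestedAPI struct {
	Group     string
	Version   string
	Namespace string
}

// parse406Message extracts the suggested API coordinates, if any.
func parse406Message(msg string) (suggestedAPI, bool) {
	m := apisPathRe.FindStringSubmatch(msg)
	if m == nil {
		return suggestedAPI{}, false
	}
	return suggestedAPI{Group: m[1], Version: m[2], Namespace: m[3]}, true
}

// notAcceptableError stands in for an HTTP 406 from the legacy endpoint.
type notAcceptableError struct{ msg string }

func (e *notAcceptableError) Error() string { return e.msg }

// The fetchers are injected so the control flow can be shown without HTTP.
type fetchLegacy func(uid string) (string, error)
type fetchKubernetes func(api suggestedAPI, uid string) (string, error)

// getDashboard tries the legacy API first and falls back to the
// kubernetes-style API only when the legacy call fails with a 406 that
// names an /apis path. Any other error is returned unchanged.
func getDashboard(uid string, legacy fetchLegacy, k8s fetchKubernetes) (string, error) {
	body, err := legacy(uid)
	if err == nil {
		return body, nil
	}
	var na *notAcceptableError
	if !errors.As(err, &na) {
		return "", err
	}
	api, ok := parse406Message(na.msg)
	if !ok {
		return "", err
	}
	return k8s(api, uid)
}
```

In the PR the fallback additionally records the kubernetes capability for the dashboard API group so that later calls skip the legacy attempt; that caching step is left out here to keep the control flow visible.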
Implements Phase 3 of the API capability detection plan (#300).

Updates the explore deeplink generation to support the new schemaVersion=1
format used by Grafana 10+, while maintaining backward compatibility with
the legacy 'left=' parameter format.

Changes to GenerateDeeplinkParams:
- Added ExploreQuery field for specifying PromQL/LogQL query expressions
- Added UseLegacyExploreURL flag to force legacy format if needed
- Default behavior now uses new schemaVersion=1 format

New explore URL format (Grafana 10+):
/explore?schemaVersion=1&panes={"pane1":{"datasource":"uid",...}}

Legacy format (still supported):
/explore?left={"datasource":"uid"}

The new format provides better support for:
- Multiple query expressions
- Proper time range encoding
- Future multi-pane explore layouts

Tests updated to cover both formats.
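
To make the two URL shapes concrete, here is a small sketch that builds either format with the Go standard library. It is not the PR's generate_deeplink code: exploreURL and the struct types are hypothetical, and the queries and range field names inside the pane are assumptions beyond the datasource key shown above.

```go
package main

import (
	"encoding/json"
	"net/url"
	"strings"
)

// exploreQuery is one query expression inside a pane (assumed field names).
type exploreQuery struct {
	RefID string `json:"refId"`
	Expr  string `json:"expr"`
}

// timeRange is embedded in the pane JSON (assumed field names).
type timeRange struct {
	From string `json:"from"`
	To   string `json:"to"`
}

// explorePane is the value stored under a pane ID in the panes parameter.
type explorePane struct {
	Datasource string         `json:"datasource"`
	Queries    []exploreQuery `json:"queries,omitempty"`
	Range      *timeRange     `json:"range,omitempty"`
}

// exploreURL builds an explore link. With legacy=true it emits the
// left= format; otherwise it emits schemaVersion=1 with a panes object.
func exploreURL(baseURL, datasourceUID, expr string, legacy bool) (string, error) {
	q := url.Values{}
	if legacy {
		left, err := json.Marshal(map[string]string{"datasource": datasourceUID})
		if err != nil {
			return "", err
		}
		q.Set("left", string(left))
	} else {
		pane := explorePane{
			Datasource: datasourceUID,
			Range:      &timeRange{From: "now-1h", To: "now"},
		}
		if expr != "" {
			pane.Queries = []exploreQuery{{RefID: "A", Expr: expr}}
		}
		panes, err := json.Marshal(map[string]explorePane{"pane1": pane})
		if err != nil {
			return "", err
		}
		q.Set("schemaVersion", "1")
		q.Set("panes", string(panes))
	}
	// Encode percent-escapes the JSON and sorts parameters by key.
	return strings.TrimRight(baseURL, "/") + "/explore?" + q.Encode(), nil
}
```

Because url.Values.Encode escapes the JSON, the resulting link contains sequences such as %7B and %22 rather than the literal braces and quotes shown in the examples above.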
@sd2k force-pushed the fix-grafana-legacy-vs-kubernetes-apis branch from 1c773eb to 0db5a42 February 24, 2026 15:43

- Refresh detectedAt in SetAPICapability when cache entry is expired
- Extract namespace from 406 error instead of hardcoding "default"
- Embed time range in explore panes JSON for new deeplink format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Ensure hasKubernetesAPIs is set to true when a kubernetes capability
  is recorded, preventing stale false values from poisoning the cache
- Unexport DiscoverAPIs to discoverAPIs since it's only used in tests;
  production code uses GrafanaInstance.discoverAPIsAuthenticated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issues.

}

return entry.perAPICapability[apiGroup]
}

Capability cache ignores TTL for per-group

Medium Severity

CapabilityCache.GetAPICapability returns cached perAPICapability values without checking whether the entry has expired via ttl. This can cause ShouldUseKubernetesAPI to keep forcing kubernetes-style endpoints even after cache expiry, while other capability lookups re-discover via Get/DiscoverCapabilities.

Dashboard: dashboardJSON,
Meta: meta,
}, nil
}

Kubernetes dashboard UID derived from name

Medium Severity

convertKubernetesDashboardToLegacy populates dashboardJSON["uid"] from k8sDashboard.Metadata.Name when missing, ignoring k8sDashboard.Metadata.UID. If metadata.name differs from the Grafana dashboard UID, the converted legacy payload can contain an incorrect uid, breaking links and subsequent API calls that rely on the UID.

Bugbot Autofix determined this is a false positive.

In Grafana's kubernetes-style API, metadata.name IS the dashboard UID (used as resource identifier in URL path), while metadata.uid is a Kubernetes resource UUID unrelated to the Grafana dashboard UID.
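
A minimal sketch of the conversion rule under discussion, assuming the kubernetes-style response carries the dashboard body in spec and the identifier in metadata.name. The types and the function legacyDashboardJSON are hypothetical simplifications, not the PR's convertKubernetesDashboardToLegacy.

```go
package main

// k8sMetadata mirrors the subset of resource metadata used in this sketch.
type k8sMetadata struct {
	Name string `json:"name"` // the Grafana dashboard UID
	UID  string `json:"uid"`  // the Kubernetes resource UUID, unrelated to the dashboard UID
}

// k8sDashboard is a simplified kubernetes-style dashboard resource.
type k8sDashboard struct {
	Metadata k8sMetadata    `json:"metadata"`
	Spec     map[string]any `json:"spec"`
}

// legacyDashboardJSON copies spec into a legacy-style dashboard body and
// fills in "uid" from metadata.name when spec does not already carry one.
func legacyDashboardJSON(d k8sDashboard) map[string]any {
	out := make(map[string]any, len(d.Spec)+1)
	for k, v := range d.Spec {
		out[k] = v
	}
	if _, ok := out["uid"]; !ok {
		out["uid"] = d.Metadata.Name
	}
	return out
}
```

The copy leaves the input spec untouched, so the original response can still be inspected after conversion.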


entry := instance.cache.Get(config.URL)
require.NoError(t, err)
require.NotNil(t, entry)

Integration test reads cache with wrong key

Low Severity

TestDiscoverAPIs_Integration calls instance.cache.Get(config.URL), but the cache is populated using g.baseURL (trimmed URL) in DiscoverCapabilities. If config.URL contains a trailing slash or differs in normalization, this lookup returns nil and makes the test incorrectly fail.
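
The key mismatch is easy to reproduce in isolation. The sketch below is one possible alternative design, not what the PR does: it normalizes the URL on both the write and the read path so a trailing slash cannot cause a miss. normalizeBaseURL and entryCache are hypothetical names.

```go
package main

import "strings"

// normalizeBaseURL strips trailing slashes so that
// "http://localhost:3000/" and "http://localhost:3000" share one key.
func normalizeBaseURL(raw string) string {
	return strings.TrimRight(raw, "/")
}

// entryCache is a toy cache that normalizes keys on every access.
type entryCache struct {
	entries map[string]string
}

func newEntryCache() *entryCache {
	return &entryCache{entries: make(map[string]string)}
}

// put stores a value under the normalized form of rawURL.
func (c *entryCache) put(rawURL, value string) {
	c.entries[normalizeBaseURL(rawURL)] = value
}

// get looks up the normalized form of rawURL.
func (c *entryCache) get(rawURL string) (string, bool) {
	v, ok := c.entries[normalizeBaseURL(rawURL)]
	return v, ok
}
```

With this shape, a lookup using the raw config URL and a lookup using the trimmed base URL return the same entry.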

The unused linter doesn't see test files with build tags, so the
unexported function was flagged as dead code. Keep it exported with
a doc comment clarifying its test-only usage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor bot commented Feb 24, 2026

Bugbot Autofix prepared fixes for 2 of the 3 bugs found in the latest run.

  • ✅ Fixed: Capability cache ignores TTL for per-group
    • Added TTL expiration check to GetAPICapability method so it returns APICapabilityUnknown when the cache entry has expired, consistent with the Get method.
  • ✅ Fixed: Integration test reads cache with wrong key
    • Changed test to use instance.BaseURL() instead of config.URL for cache lookup, ensuring consistency with how DiscoverCapabilities stores the normalized URL key.

Or push these changes by commenting:

@cursor push 3cfc6293d3
Preview (3cfc6293d3)
diff --git a/capability.go b/capability.go
--- a/capability.go
+++ b/capability.go
@@ -168,7 +168,7 @@
 }
 
 // GetAPICapability returns the capability for a specific API group.
-// Returns APICapabilityUnknown if not set.
+// Returns APICapabilityUnknown if not set or if the cache entry has expired.
 func (c *CapabilityCache) GetAPICapability(grafanaURL, apiGroup string) APICapability {
 	c.mu.RLock()
 	defer c.mu.RUnlock()
@@ -178,6 +178,11 @@
 		return APICapabilityUnknown
 	}
 
+	// Check if entry has expired
+	if time.Since(entry.detectedAt) > c.ttl {
+		return APICapabilityUnknown
+	}
+
 	if entry.perAPICapability == nil {
 		return APICapabilityUnknown
 	}

diff --git a/capability_integration_test.go b/capability_integration_test.go
--- a/capability_integration_test.go
+++ b/capability_integration_test.go
@@ -107,7 +107,7 @@
 	err := instance.DiscoverCapabilities(ctx)
 	require.NoError(t, err)
 
-	entry := instance.cache.Get(config.URL)
+	entry := instance.cache.Get(instance.BaseURL())
 	require.NoError(t, err)
 	require.NotNil(t, entry)

@sd2k marked this pull request as draft February 24, 2026 21:12
@sd2k (Collaborator, Author) commented Feb 24, 2026

This really needs better e2e tests, ideally against a Grafana instance with only the new k8s-style APIs.

- Inject GrafanaInstance into integration test context so all existing
  tests exercise the capability-aware code path instead of the
  legacy-only early return
- Add mock-server lifecycle tests for 406 fallback: full lifecycle with
  cache verification, legacy-preferred when available, 404 after 406,
  independent API groups, and cache expiration behavior
- Add integration test verifying legacy Grafana uses legacy path by
  default (ShouldUseKubernetesAPI returns false, capabilities unknown)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The 406 fallback handler cached the kubernetes capability but not the
specific API version from the error (e.g. v2beta1). Subsequent calls
used GetPreferredVersion which returned v1beta1 from /apis discovery,
but v1beta1 returns null spec for v2 dashboards. Add SetPreferredVersion
to persist the version from the 406 error in the capability cache.
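
The version-pinning behavior described above can be sketched as a small lookup where a version learned from a 406 error takes precedence over the first version returned by discovery. The type versionCache and the method SetDiscovered are hypothetical; only the names SetPreferredVersion and GetPreferredVersion come from the commit message, and their real signatures may differ.

```go
package main

import "sync"

// versionCache tracks, per API group, the versions found via /apis
// discovery and an optional version pinned from a 406 error.
type versionCache struct {
	mu         sync.RWMutex
	discovered map[string][]string
	preferred  map[string]string
}

func newVersionCache() *versionCache {
	return &versionCache{
		discovered: make(map[string][]string),
		preferred:  make(map[string]string),
	}
}

// SetDiscovered records the versions listed by discovery, in server order.
func (c *versionCache) SetDiscovered(group string, versions []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.discovered[group] = append([]string(nil), versions...)
}

// SetPreferredVersion pins the version suggested by a 406 error.
func (c *versionCache) SetPreferredVersion(group, version string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.preferred[group] = version
}

// GetPreferredVersion returns the pinned version if one exists, otherwise
// the first discovered version, otherwise an empty string.
func (c *versionCache) GetPreferredVersion(group string) string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if v, ok := c.preferred[group]; ok {
		return v
	}
	if vs := c.discovered[group]; len(vs) > 0 {
		return vs[0]
	}
	return ""
}
```

Pinning is per API group, so a version learned for dashboard.grafana.app does not affect lookups for any other group.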

Replace mock-server tests with real E2E integration tests against a
second Grafana instance (grafana-k8s on port 3001) configured with
dualWriterMode=5. Tests create dashboards via the v2beta1 API which
triggers real 406 responses on the legacy endpoint, validating the
full fallback lifecycle, capability caching, and API group independence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
MCP Token Analysis

Passed

| Metric   | Value        |
|----------|--------------|
| Baseline | 14264 tokens |
| Current  | 14341 tokens |
| Change   | +77 (+0.5%)  |

Tool Changes

| Tool              | Change      | Tokens |
|-------------------|-------------|--------|
| generate_deeplink | ✏️ Modified | +77    |
