fix: automatically detect and handle legacy vs Kubernetes-style dashboard APIs#481
Implements Phase 1 of the API capability detection plan (#300).

This adds the core infrastructure for detecting whether a Grafana instance supports kubernetes-style APIs (/apis) vs legacy APIs (/api).

New files:
- capability.go: Types, cache, and discovery logic for API capabilities
- grafana_instance.go: GrafanaInstance type that wraps legacy client with capability-aware API access

Key features:
- Thread-safe capability cache with configurable TTL (default 1 minute)
- Automatic discovery of available API groups and versions via GET /apis
- Support for API key and basic auth in kubernetes API requests
- Parse406Error helper for extracting API version from 406 error messages
- Context functions for extracting GrafanaInstance from context

The GrafanaInstance type provides:
- DiscoverCapabilities(): Fetch and cache API capabilities
- HasKubernetesAPIs(): Check if kubernetes APIs are available
- GetAPIGroupInfo(): Get info about a specific API group
- GetDashboardKubernetes(): Fetch dashboard via kubernetes-style API
- ShouldUseKubernetesAPI(): Determine which API style to use

Integration tests confirm successful discovery of 9 API groups from Grafana 12+, including dashboard.grafana.app with versions v1beta1, v0alpha1, v2beta1, and v2alpha1.
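For readers who want a concrete picture of the "thread-safe capability cache with configurable TTL", here is a minimal sketch. It is not the code from capability.go: the names Capability, TTLCache, Set, and Get are hypothetical stand-ins, and the only behavior assumed is that entries are keyed by base URL, store a per-API-group value, and count as unknown once older than the TTL.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Capability is a hypothetical stand-in for the PR's APICapability type.
type Capability string

const (
	CapabilityUnknown    Capability = "unknown"
	CapabilityLegacy     Capability = "legacy"
	CapabilityKubernetes Capability = "kubernetes"
)

// cacheEntry holds what was detected for one Grafana base URL.
type cacheEntry struct {
	detectedAt time.Time
	perGroup   map[string]Capability
}

// TTLCache is an illustrative cache keyed by Grafana base URL.
type TTLCache struct {
	mu      sync.RWMutex
	ttl     time.Duration
	entries map[string]*cacheEntry
}

func NewTTLCache(ttl time.Duration) *TTLCache {
	return &TTLCache{ttl: ttl, entries: make(map[string]*cacheEntry)}
}

// Set records the capability for one API group. A missing or expired
// entry is replaced by a fresh one so stale data is not carried over.
func (c *TTLCache) Set(baseURL, group string, capability Capability) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[baseURL]
	if !ok || time.Since(e.detectedAt) > c.ttl {
		e = &cacheEntry{detectedAt: time.Now(), perGroup: make(map[string]Capability)}
		c.entries[baseURL] = e
	}
	e.perGroup[group] = capability
}

// Get returns CapabilityUnknown for missing or expired entries.
func (c *TTLCache) Get(baseURL, group string) Capability {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.entries[baseURL]
	if !ok || time.Since(e.detectedAt) > c.ttl {
		return CapabilityUnknown
	}
	if capability, found := e.perGroup[group]; found {
		return capability
	}
	return CapabilityUnknown
}

func main() {
	cache := NewTTLCache(time.Minute)
	cache.Set("http://localhost:3000", "dashboard.grafana.app", CapabilityKubernetes)
	fmt.Println(cache.Get("http://localhost:3000", "dashboard.grafana.app"))
}
```

Checking the TTL inside Get keeps per-group lookups consistent with whole-entry lookups, so an expired entry cannot keep steering requests to one API style.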
…se 2)

Implements Phase 2 of the API capability detection plan (#300).

This updates the dashboard tools to use capability detection for automatic fallback from legacy APIs to kubernetes-style APIs when the legacy API returns a 406 error.

Key changes to tools/dashboard.go:
- getDashboardByUID now checks if GrafanaInstance is available
- If capability is already set to kubernetes, uses kubernetes API directly
- Otherwise tries legacy API first (most compatible for Grafana Cloud)
- On 406 error, parses the error message to extract the suggested version
- Falls back to kubernetes API with the extracted or discovered version
- Converts kubernetes response format to legacy format for compatibility

New helper functions:
- getDashboardByUIDLegacy: Uses legacy /api/dashboards/uid endpoint
- getDashboardByUIDKubernetes: Uses kubernetes API with discovered version
- getDashboardByUIDKubernetesWithVersion: Uses kubernetes API with specific version
- convertKubernetesDashboardToLegacy: Converts k8s response to legacy format

Unit tests added for:
- Legacy API success case
- Legacy-only fallback (no GrafanaInstance)
- 406 fallback to kubernetes API
- Direct kubernetes when capability pre-set
- Not found error handling
- Kubernetes to legacy format conversion

All dependent functions (getDashboardSummary, getDashboardProperty, getDashboardPanelQueries, updateDashboard) automatically benefit from the capability detection through their use of getDashboardByUID.
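The 406 parsing step can be sketched as below. The exact wording of Grafana's 406 message is not shown in this PR, so the sample message, the regular expression, and the names suggestedAPI and parseSuggestedAPI are assumptions for illustration. This is not the PR's Parse406Error helper. The only thing assumed about the message is that it contains a path of the form /apis/<group>/<version>/namespaces/<namespace>/...

```go
package main

import (
	"fmt"
	"regexp"
)

// apisPathRe matches a kubernetes-style path embedded in an error message.
// The message layout is an assumption; only the path shape matters here.
var apisPathRe = regexp.MustCompile(`/apis/([^/\s]+)/([^/\s]+)/namespaces/([^/\s]+)/`)

// suggestedAPI is a hypothetical holder for the parts of the suggested path.
type suggestedAPI struct {
	Group, Version, Namespace string
}

// parseSuggestedAPI extracts group, version, and namespace from a message.
// It reports false when no /apis/... path is present.
func parseSuggestedAPI(msg string) (suggestedAPI, bool) {
	m := apisPathRe.FindStringSubmatch(msg)
	if m == nil {
		return suggestedAPI{}, false
	}
	return suggestedAPI{Group: m[1], Version: m[2], Namespace: m[3]}, true
}

func main() {
	// Made-up message text; only the embedded path follows a known shape.
	msg := "not acceptable, use /apis/dashboard.grafana.app/v2beta1/namespaces/default/dashboards/abc123"
	if s, ok := parseSuggestedAPI(msg); ok {
		fmt.Printf("group=%s version=%s namespace=%s\n", s.Group, s.Version, s.Namespace)
	}
}
```

Returning a boolean alongside the parsed value lets the caller fall back to the version found via /apis discovery when the message carries no usable path.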
Implements Phase 3 of the API capability detection plan (#300).

Updates the explore deeplink generation to support the new schemaVersion=1 format used by Grafana 10+, while maintaining backward compatibility with the legacy 'left=' parameter format.

Changes to GenerateDeeplinkParams:
- Added ExploreQuery field for specifying PromQL/LogQL query expressions
- Added UseLegacyExploreURL flag to force legacy format if needed
- Default behavior now uses new schemaVersion=1 format

New explore URL format (Grafana 10+):
/explore?schemaVersion=1&panes={"pane1":{"datasource":"uid",...}}

Legacy format (still supported):
/explore?left={"datasource":"uid"}

The new format provides better support for:
- Multiple query expressions
- Proper time range encoding
- Future multi-pane explore layouts

Tests updated to cover both formats.
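To make the two URL shapes concrete, here is a small sketch that builds both from a datasource UID. The function name exploreURL and its parameters are hypothetical, and the pane object carries only the datasource field, since the remaining fields are elided in the commit message. Note that url.Values.Encode sorts parameters by key, so panes appears before schemaVersion in the output. Query parameter order does not change the meaning of the URL.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// exploreURL builds an explore deeplink in either the schemaVersion=1
// format or the legacy left= format. Illustrative only.
func exploreURL(baseURL, datasourceUID string, legacy bool) (string, error) {
	params := url.Values{}
	if legacy {
		left, err := json.Marshal(map[string]any{"datasource": datasourceUID})
		if err != nil {
			return "", err
		}
		params.Set("left", string(left))
	} else {
		panes, err := json.Marshal(map[string]any{
			"pane1": map[string]any{"datasource": datasourceUID},
		})
		if err != nil {
			return "", err
		}
		params.Set("schemaVersion", "1")
		params.Set("panes", string(panes))
	}
	// Encode percent-escapes the JSON so it is safe inside a query string.
	return baseURL + "/explore?" + params.Encode(), nil
}

func main() {
	for _, legacy := range []bool{false, true} {
		u, err := exploreURL("http://localhost:3000", "prom", legacy)
		if err != nil {
			panic(err)
		}
		fmt.Println(u)
	}
}
```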
1c773eb to 0db5a42
- Refresh detectedAt in SetAPICapability when cache entry is expired
- Extract namespace from 406 error instead of hardcoding "default"
- Embed time range in explore panes JSON for new deeplink format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Ensure hasKubernetesAPIs is set to true when a kubernetes capability is recorded, preventing stale false values from poisoning the cache
- Unexport DiscoverAPIs to discoverAPIs since it's only used in tests; production code uses GrafanaInstance.discoverAPIsAuthenticated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
	}

	return entry.perAPICapability[apiGroup]
}
Capability cache ignores TTL for per-group
Medium Severity
CapabilityCache.GetAPICapability returns cached perAPICapability values without checking whether the entry has expired via ttl. This can cause ShouldUseKubernetesAPI to keep forcing kubernetes-style endpoints even after cache expiry, while other capability lookups re-discover via Get/DiscoverCapabilities.
		Dashboard: dashboardJSON,
		Meta:      meta,
	}, nil
}
Kubernetes dashboard UID derived from name
Medium Severity
convertKubernetesDashboardToLegacy populates dashboardJSON["uid"] from k8sDashboard.Metadata.Name when missing, ignoring k8sDashboard.Metadata.UID. If metadata.name differs from the Grafana dashboard UID, the converted legacy payload can contain an incorrect uid, breaking links and subsequent API calls that rely on the UID.
Bugbot Autofix determined this is a false positive.
In Grafana's kubernetes-style API, metadata.name IS the dashboard UID (used as resource identifier in URL path), while metadata.uid is a Kubernetes resource UUID unrelated to the Grafana dashboard UID.
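The point about metadata.name can be sketched as follows. The types k8sMetadata and k8sDashboard and the function legacyDashboardJSON are hypothetical simplifications, not the PR's convertKubernetesDashboardToLegacy. The sketch only shows the rule described above: copy the spec, and if it carries no uid, fill it from metadata.name rather than metadata.uid.

```go
package main

import "fmt"

// k8sMetadata mirrors the two identifiers discussed above.
type k8sMetadata struct {
	Name string `json:"name"` // Grafana dashboard UID (resource name in the URL path)
	UID  string `json:"uid"`  // Kubernetes resource UUID, unrelated to the dashboard UID
}

// k8sDashboard is a simplified, assumed shape of the kubernetes-style response.
type k8sDashboard struct {
	Metadata k8sMetadata    `json:"metadata"`
	Spec     map[string]any `json:"spec"`
}

// legacyDashboardJSON copies the spec and fills in "uid" from metadata.name
// when the spec does not already carry one. The input is not modified.
func legacyDashboardJSON(d k8sDashboard) map[string]any {
	out := make(map[string]any, len(d.Spec)+1)
	for k, v := range d.Spec {
		out[k] = v
	}
	if _, ok := out["uid"]; !ok {
		out["uid"] = d.Metadata.Name
	}
	return out
}

func main() {
	d := k8sDashboard{
		Metadata: k8sMetadata{Name: "abc123", UID: "0f9c1c1e-0000-0000-0000-000000000000"},
		Spec:     map[string]any{"title": "Example"},
	}
	fmt.Println(legacyDashboardJSON(d)["uid"])
}
```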
	entry := instance.cache.Get(config.URL)
	require.NoError(t, err)
	require.NotNil(t, entry)
Integration test reads cache with wrong key
Low Severity
TestDiscoverAPIs_Integration calls instance.cache.Get(config.URL), but the cache is populated using g.baseURL (trimmed URL) in DiscoverCapabilities. If config.URL contains a trailing slash or differs in normalization, this lookup returns nil and makes the test incorrectly fail.
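The key mismatch described here is easy to reproduce in isolation. The sketch below assumes the base URL is normalized by trimming trailing slashes. normalizeBaseURL is a hypothetical helper, and the plain map stands in for the capability cache.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeBaseURL trims trailing slashes so that equivalent URLs
// map to the same cache key. Hypothetical helper for illustration.
func normalizeBaseURL(raw string) string {
	return strings.TrimRight(raw, "/")
}

func main() {
	cache := map[string]string{}
	configURL := "http://localhost:3000/"

	// The writer stores under the normalized key.
	cache[normalizeBaseURL(configURL)] = "kubernetes"

	// A reader using the raw config URL misses; the normalized key hits.
	_, rawHit := cache[configURL]
	_, normalizedHit := cache[normalizeBaseURL(configURL)]
	fmt.Println(rawHit, normalizedHit)
}
```

Reading the key through the same accessor that the writer uses, as the suggested fix does with instance.BaseURL(), avoids having to repeat the normalization rule in tests.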
The unused linter doesn't see test files with build tags, so the unexported function was flagged as dead code. Keep it exported with a doc comment clarifying its test-only usage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bugbot Autofix prepared fixes for 2 of the 3 bugs found in the latest run.
Preview (3cfc6293d3)
diff --git a/capability.go b/capability.go
--- a/capability.go
+++ b/capability.go
@@ -168,7 +168,7 @@
 }
 
 // GetAPICapability returns the capability for a specific API group.
-// Returns APICapabilityUnknown if not set.
+// Returns APICapabilityUnknown if not set or if the cache entry has expired.
 func (c *CapabilityCache) GetAPICapability(grafanaURL, apiGroup string) APICapability {
 	c.mu.RLock()
 	defer c.mu.RUnlock()
@@ -178,6 +178,11 @@
 		return APICapabilityUnknown
 	}
 
+	// Check if entry has expired
+	if time.Since(entry.detectedAt) > c.ttl {
+		return APICapabilityUnknown
+	}
+
 	if entry.perAPICapability == nil {
 		return APICapabilityUnknown
 	}
diff --git a/capability_integration_test.go b/capability_integration_test.go
--- a/capability_integration_test.go
+++ b/capability_integration_test.go
@@ -107,7 +107,7 @@
 	err := instance.DiscoverCapabilities(ctx)
 	require.NoError(t, err)
-	entry := instance.cache.Get(config.URL)
+	entry := instance.cache.Get(instance.BaseURL())
 	require.NoError(t, err)
 	require.NotNil(t, entry)
This really needs better e2e tests, ideally against a Grafana instance with only the new k8s style APIs.
- Inject GrafanaInstance into integration test context so all existing tests exercise the capability-aware code path instead of the legacy-only early return
- Add mock-server lifecycle tests for 406 fallback: full lifecycle with cache verification, legacy-preferred when available, 404 after 406, independent API groups, and cache expiration behavior
- Add integration test verifying legacy Grafana uses legacy path by default (ShouldUseKubernetesAPI returns false, capabilities unknown)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 406 fallback handler cached the kubernetes capability but not the specific API version from the error (e.g. v2beta1). Subsequent calls used GetPreferredVersion which returned v1beta1 from /apis discovery, but v1beta1 returns null spec for v2 dashboards. Add SetPreferredVersion to persist the version from the 406 error in the capability cache.

Replace mock-server tests with real E2E integration tests against a second Grafana instance (grafana-k8s on port 3001) configured with dualWriterMode=5. Tests create dashboards via the v2beta1 API which triggers real 406 responses on the legacy endpoint, validating the full fallback lifecycle, capability caching, and API group independence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MCP Token Analysis: ✅ Passed
Tool Changes
This adds the core infrastructure for detecting whether a Grafana instance
supports kubernetes-style APIs (/apis) vs legacy APIs (/api).
It then updates the dashboard tools to use capability detection for automatic
fallback from legacy APIs to kubernetes-style APIs when the legacy API
returns a 406 error.
Lastly, it updates the explore deeplink generation to support the new schemaVersion=1
format used by Grafana 10+, while maintaining backward compatibility with
the legacy 'left=' parameter format.
Note
Medium Risk
Touches core Grafana request plumbing and changes how dashboards are fetched based on runtime error handling/caching, which could affect compatibility across Grafana versions. Behavior is guarded by fallback logic and covered by new unit/integration tests, but mis-detection or conversion issues could break dashboard retrieval.
Overview
Adds capability discovery/caching (/apis probing) and a new GrafanaInstance wrapper to track per-API-group mode (legacy vs kubernetes) and fetch dashboards via kubernetes-style endpoints when needed.

Updates the dashboard tool to try legacy first but on legacy 406 Not Acceptable parses the suggested /apis/... path, switches future calls to kubernetes for the dashboard API group, and converts kubernetes dashboard responses back into the legacy model for compatibility. Also updates explore deeplink generation to default to Grafana 10+ schemaVersion=1&panes=... URLs with optional legacy left= fallback, and adds comprehensive unit/integration tests around these behaviors.

Written by Cursor Bugbot for commit d97f3cf. This will update automatically on new commits.