[API] Clear request-scoped cache in gpu-metrics endpoint (#9265)

rohansonecha · claude · web-flow · commit 03a5a82dfa1e · 2026-04-06T17:27:47.000-07:00
* [API] Add on_gpu_metrics_collect plugin hook

Add a lifecycle hook to BasePlugin that the metrics server calls
before collecting GPU metrics. This allows plugins to sync
process-level state (e.g. KUBECONFIG) into the metrics server
process, which runs separately from request worker processes.

Without this, credentials uploaded via the credential manager
plugin only update the environment in the worker process that
handled the upload. The metrics server in the main process never
sees the change, requiring a pod restart for /gpu-metrics to
discover newly uploaded kubeconfigs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [API] Load plugins in metrics server on startup

The metrics server runs as a separate uvicorn instance in a
background thread, so plugins loaded by the main API server are
not available in its context. Add a startup event that loads
plugins if they haven't been loaded yet, enabling plugin hooks
like on_gpu_metrics_collect to work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [API] Ensure KUBECONFIG includes credential manager path in metrics server

Instead of relying on plugin loading (which fails in the metrics
server since install() requires a FastAPI app context), directly
ensure KUBECONFIG includes the credential manager kubeconfig path
before each gpu-metrics collection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [API] Add /gpu-metrics-debug endpoint for diagnostics

Temporary debug endpoint to inspect plugin state, KUBECONFIG, and
discovered contexts from within the metrics server process.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [API] Clear request-scoped cache before gpu-metrics collection

The metrics server runs as a daemon thread where request-scoped
caches (kubernetes API clients, context names) are never cleared
automatically. This causes stale results from boot time to persist
indefinitely — if a kubeconfig file didn't exist at boot, context
discovery caches the failure and never retries.

Call clear_request_level_cache() before each gpu-metrics scrape,
matching the pattern used by the billing daemon.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [API] Enhance debug endpoint to show cache clear effect

Show contexts before and after clearing the request-scoped cache
to verify the stale-cache hypothesis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [API] Clear request-scoped cache in gpu-metrics endpoint

The metrics server runs as a daemon thread where request-scoped
caches are never cleared automatically. This causes stale results
from boot time to persist — if a kubeconfig didn't exist at boot,
context discovery caches the failure and never retries.

Add clear_request_level_cache() before each gpu-metrics scrape,
matching the pattern used by other daemon threads (billing, GPU
healer).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/sky/server/metrics.py b/sky/server/metrics.py
@@ -20,6 +20,7 @@
 from sky import global_user_state
 from sky import sky_logging
 from sky.metrics import utils as metrics_utils
+from sky.utils import annotations
 
 logger = sky_logging.init_logger(__name__)
 
@@ -199,6 +200,11 @@ def metrics() -> fastapi.Response:
 @metrics_app.get('/gpu-metrics')
 async def gpu_metrics() -> fastapi.Response:
     """Gets the GPU metrics from multiple external k8s clusters"""
+    # The metrics server runs as a daemon thread, not as a normal request
+    # handler, so request-scoped caches (e.g. kubernetes API clients,
+    # context names) are never cleared automatically. Clear them on each
+    # scrape so that newly uploaded kubeconfigs are discovered.
+    annotations.clear_request_level_cache()
     contexts = core.get_all_contexts()
     all_metrics: List[str] = []
     successful_contexts = 0