perf: cache Prometheus queries in flow-control dispatch gates #111

Open
lioraron wants to merge 2 commits into llm-d-incubation:main from lioraron:perf/cache-prometheus-queries

@lioraron
Contributor

Summary

  • Add CachedMetricSource, a TTL-based wrapper around MetricSource that returns cached results within the TTL instead of querying Prometheus on every Budget() call.
  • GateFactory now wraps Prometheus-backed sources with the cache automatically. Default TTL is 5 seconds; configurable via NewGateFactoryWithCacheTTL. A TTL of 0 disables caching.
  • Existing callers of NewGateFactory get caching with no code changes.

Why

Budget() is called on every poll tick (default 1s) per queue. Each call executes a live PromQL HTTP query against Prometheus. With multiple queues, this creates unnecessary load on Prometheus for metrics that don't need sub-second freshness.

Files changed

| File | Change |
| --- | --- |
| pkg/async/inference/flowcontrol/cached_metric_source.go | New CachedMetricSource type implementing MetricSource |
| pkg/async/inference/flowcontrol/cached_metric_source_test.go | Tests: caching within TTL, refresh after TTL, error caching, gate integration |
| pkg/async/inference/flowcontrol/gate_factory.go | Add cacheTTL field, NewGateFactoryWithCacheTTL, wrap Prometheus sources with cache |

Test plan

  • New tests pass: TestCachedMetricSource_CachesWithinTTL, TestCachedMetricSource_RefreshesAfterTTL, TestCachedMetricSource_CachesErrors, TestCachedMetricSource_WorksWithGates
  • All existing flowcontrol tests pass (55 total)
  • All unit and integration tests pass (go test ./... excluding e2e)

Closes #101

🤖 Generated with Claude Code

Add CachedMetricSource, a TTL-based wrapper around MetricSource that
avoids hitting Prometheus on every Budget() call. The default TTL is
5 seconds, configurable via NewGateFactoryWithCacheTTL.

GateFactory now automatically wraps Prometheus-backed sources with
the cache. Existing callers of NewGateFactory get the default TTL
with no code changes.

Closes llm-d-incubation#101

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lioraron force-pushed the perf/cache-prometheus-queries branch from 919f806 to 6fcad43 on April 16, 2026 14:46
@lioraron marked this pull request as ready for review on April 16, 2026 14:52
shimib previously approved these changes on Apr 16, 2026
Keep both the DefaultCacheTTL constant from this branch and the
GateFactory interface compliance check added on main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>