perf: cache Prometheus queries in flow-control dispatch gates #111
Open
lioraron wants to merge 2 commits into llm-d-incubation:main
Add CachedMetricSource, a TTL-based wrapper around MetricSource that avoids hitting Prometheus on every Budget() call. The default TTL is 5 seconds, configurable via NewGateFactoryWithCacheTTL. GateFactory now automatically wraps Prometheus-backed sources with the cache. Existing callers of NewGateFactory get the default TTL with no code changes.

Closes llm-d-incubation#101

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
919f806 to 6fcad43
shimib previously approved these changes on Apr 16, 2026
Keep both the DefaultCacheTTL constant from this branch and the GateFactory interface compliance check added on main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Add CachedMetricSource, a TTL-based wrapper around MetricSource that returns cached results within the TTL instead of querying Prometheus on every Budget() call.
- GateFactory now wraps Prometheus-backed sources with the cache automatically. Default TTL is 5 seconds; configurable via NewGateFactoryWithCacheTTL. A TTL of 0 disables caching.
- Existing callers of NewGateFactory get caching with no code changes.

Why
Budget() is called on every poll tick (default 1s) per queue. Each call executes a live PromQL HTTP query against Prometheus. With multiple queues, this creates unnecessary load on Prometheus for metrics that do not need per-second freshness.

Files changed
- pkg/async/inference/flowcontrol/cached_metric_source.go: CachedMetricSource type implementing MetricSource
- pkg/async/inference/flowcontrol/cached_metric_source_test.go
- pkg/async/inference/flowcontrol/gate_factory.go: cacheTTL field, NewGateFactoryWithCacheTTL, wrap Prometheus sources with cache

Test plan
- TestCachedMetricSource_CachesWithinTTL, TestCachedMetricSource_RefreshesAfterTTL, TestCachedMetricSource_CachesErrors, TestCachedMetricSource_WorksWithGates
- go test ./... (excluding e2e)

Closes #101
🤖 Generated with Claude Code