Add pool topology, unified gate interface, labels, and tier-priority dispatch#186
Add pool topology, unified gate interface, labels, and tier-priority dispatch#186joeltg wants to merge 14 commits into
Conversation
|
Unsigned commits detected! Please sign your commits. For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation. |
873472d to
644d434
Compare
…dispatch - pipeline: add Labels type with Get/Has/Set/Merge/Clone helpers - pipeline: add Pool and GateConfig structs for pool topology - pipeline: add PoolDispatch (per-pool channel map) as RMP output - pipeline: add RegisterMergePolicy/NewMergePolicy/MergePolicyDeps registry - pipeline: add Release/AttachRelease/FireReleases to EmbelishedRequestMessage - pipeline: add Labels/PoolID/HTTPHeaders/ModelNameOverride to RequestChannel - pipeline: add RetryReason to RetryMessage, Pools() to Flow interface - api: add Labels to InternalRouting and ResultMessage - pkg/util: add ExpandEnvMapValues helper - pkg/async: register random-robin via init(), return PoolDispatch - pkg/pubsub: add FlowConfig/TopicConfig pool fields, Pools(), label seeding - pkg/redis: add Pools() to both flow impls, new channel type, label seeding - pkg/asyncworker: add FireReleases defer, Labels/ModelNameOverride support - cmd/main: use NewMergePolicy registry, spawn per-pool worker pools Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: joeltg <joel@reflection.ai>
…flowcontrol - pipeline: add Gate/Verdict/GateFunc/AlwaysContinue/ApplyChain/Release to gate.go - pipeline: add gate_test.go with full chain/snapshot/error tests - pipeline: remove DispatchGate/AttributeGate/DispatchGateFunc/ConstOpenGate/GateFactory(old); update GateFactory to return Gate - flowcontrol: delete prometheus/metric/composite/dispatch gate files (10 files) - flowcontrol: rewrite gate_factory.go; add deadline/local-rate-limit/local-max-concurrency/constant-decision gates - pkg/redis: delete dispatch_gate.go, quota_gate.go (superseded by Verdict-based Gate) - pkg/pubsub: migrate gate eval to Gate.Apply + Verdict handling - pkg/asyncworker: migrate to Gate.Apply + Verdict handling with FireReleases defer - tests: update worker test for new error payload format Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: joeltg <joel@reflection.ai>
…b flow - TopicConfig: add Gates []GateConfig for subscription-level gate config - PubSubMQFlow: build one pool gate chain per pool (shared by all subscriptions routing to that pool) so LocalMaxConcurrencyGate caps the pool, not the sub - countingGate: wraps pool gate chain with per-subscription dispatchInFlight stats - RequestChannelData: add subscriptionGates field for pull-time gate chain - requestWorker: evaluate subscription gate ApplyChain before pool gate, with full Verdict/Release handling; pass subscriptionGates param - Add buildPoolGateChain, buildSubscriptionGateChain, poolGateTypesOf helpers Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: joeltg <joel@reflection.ai>
Remove docs/dispatch-budget.md and docs/guides/e2e-deploy.md, which documented the float-budget DispatchGate system (prometheus-budget, prometheus-saturation) that was removed in the gate interface refactor. Rewrite the README gate system section to describe the Verdict/Gate interface and the two evaluation sites (subscription gates: fast, label-mutating; pool gates: may block). Replace the old flat gate_type-per-subscription config examples with the FlowConfig pools + subscriptions format. Update the merge policy description to reflect per-pool channels and the registration model. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: joeltg <joel@reflection.ai>
Contributes the policy-layer pieces that complete the priority-tier RFC
implementation:
**Tier-priority RMP** (`pkg/async/inference/mergepolicy/tierpriority/`)
Label-driven N-pass merge policy. Reads `tier` and `class` labels (key
names configurable), buckets per `(pool, tier, class)`, runs a strict-
priority loop across tiers with round-robin across `(team, model)` source
channels within each tier, and routes each dispatched message to its
pool's output channel. Stamps the engine-side priority header
(`x-priority`) based on `(tier, class)`. Pure routing and ordering —
the RMP never consults a gate and never reads a Verdict. Registers via
`pipeline.RegisterMergePolicy` under `"tier-priority"`.
**`tier-priority-admission` pool gate** (`pkg/async/inference/flowcontrol/`)
The runtime counterpart to the tier-priority RMP. Three-way verdict at
pool saturation, keyed on message labels:
- class=reserved (any tier) → Block until capacity or ctx done.
The message is already prefetched into the pod; blocking dispatches
the moment capacity opens with no transport redelivery round-trip.
- tier=interactive, class=overflow → Drop with 429-shaped Result.
Interactive's sub-second SLA: better to fail fast so the producer
can fall back than wait through nack-redeliver.
- other overflow (async, batch) → Refuse (nack-redeliver).
Blocking overflow would tie up workers and head-of-line-block
reserved messages behind it in the per-pool FIFO channel; Refuse
keeps workers cycling so the RMP's priority ordering is preserved
at consumption.
**Pluggable `PoolLoadSource`** — the saturation signal is swappable:
- `source: prometheus` — `PromQLLoadSource` runs a configurable PromQL
query on a background goroutine (atomic store; zero dispatch-path
latency). Fail-open on stale/error. Right for self-hosted IGW-fronted
pools where the EPP or engine emits saturation metrics.
- `source: redis-counter` — `RedisCounterLoadSource` uses a redis-backed
atomic INCR/DECR counter (same Lua-script pattern as the
reservation-classifier gate contributed separately). Strict accounting;
no overshoot. Right for external pools (e.g. provider APIs with hard
concurrency limits).
`GateFactory` gains `WithRedisClient` and `WithBackgroundContext` options.
`cmd/main.go` initialises the signal-handler context before the factory so
PromQL refresh goroutines are stopped on shutdown.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: joeltg <joel@reflection.ai>
Labels are operator-controlled pipeline state, not producer-visible results. Nothing in the transport publish path used ResultMessage.Labels, and the field comment described an intent (stamping transport-native attributes on result publish) that was never implemented and that we don't want. Remove the field from api.ResultMessage and the three worker call sites that set it. Also fix the RFC reference and clarify 'class' in the PR description. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> refactor: remove Pool.Labels Labels flow from subscriptions downward through gates — they are per-subscription identity (team, tier, model) that gates may classify further. A pool is a dispatch target, not a label source; pool-level labels conflate the two and add a confusing third merge layer. Subscriptions already declare any pool-wide label once via their own Labels field. The pool-ID auto-injection (labels["pool"] = Pool.ID) is preserved as it is framework-injected, not operator-declared. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: joeltg <joel@reflection.ai>
Cross-reference review against the private development fork surfaced several drift points from the original cherry-pick. This commit aligns upstream with the validated state. Critical fixes: - pubsubimpl: pool gates were being applied at the receive callback AND again at worker dispatch (double application). Removed the callback-site evaluation so pool gates run only in the per-pool worker, matching the PR description's architecture. - pubsubimpl: progressStats had data races (plain int64 fields written from multiple goroutines, no reader). Converted to atomic.Int64 with dispatchInFlightPeak and added the missing logProgress goroutine started in Start(). Functionality restored: - Added reservation_classifier_gate, the redis-backed multi-key in-flight cap that replaces the removed quota_gate functionality. Wired through gate_factory and cmd/main.go (shared redis client threaded into the factory via WithRedisClient). - Added Pool.Labels for pool-level static labels merged under subscription labels in pubsubimpl's channelLabels assembly. Tests: - Ported pipeline/labels_test.go, releases_test.go. - Ported behavioral tests for constant_decision, deadline, and local_max_concurrency gates. - Deleted broken integration tests referencing removed types (pipeline.AttributeGate, gate.Budget, redis-quota). - Deleted e2e tests + helm values for removed gate types (saturation, budget, redis-gate, quota, composite). Cleanup: - Removed the var _ = splitAndTrimCSV dead-code cloak (now used by reservation-classifier). - Clarified tier-priority policy package doc: class is the dominant axis, tier the tiebreaker — reserved/batch outranks overflow/interactive by design. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: joeltg <joel@reflection.ai>
…ntract P0 — Pub/Sub ack leak on subscription-gate Terminate The ack-handling goroutine was spawned after `ch <- emb`, but subscription-gate Drop/Refuse paths return before that point. The resultsChannel send sat unread, the resultChannels map entry leaked, inFlight was never decremented, and the Pub/Sub message stayed outstanding until ack-deadline expiry. Fixed by spawning the ack-handling goroutine immediately after creating the resultTracker (before any gate evaluation). Every terminal path — subscription-gate Terminate, normal dispatch + worker terminal, or ctx cancel — now drives ack/nack uniformly via resultsChannel. P1 — ApplyChain error semantics ApplyChain was converting any gate error into Refuse, which broke gates that intentionally return (Continue, err) for fail-open observability (reservation-classifier on redis outage, tier-priority admission on stale Prometheus). Those errs were being turned into nacks and redelivery loops instead of fail-open continues. New contract: the gate's Verdict is authoritative for control flow; errors are informational, joined via errors.Join, and returned alongside the final Verdict for caller-side logging. A gate that wants to nack on error returns Refuse explicitly. Updated callers (countingGate, pubsubimpl subscription-gate handler, asyncworker.Worker) to log err and honor the verdict. Updated the matching test in pipeline/gate_test.go and added one for the explicit Refuse + err case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: joeltg <joel@reflection.ai>
…ub buffering P1 — tierpriority strands unknown labels popNext only polled buckets in the configured ClassOrder × TierOrder (plus empty-string fallbacks). A message with tier=urgent or class=gold (typo or schema drift) landed in a bucket nothing consults — work stranded until the input channel closed. Added popUnconfigured: a final catch-all pass that drains any bucket whose (tier, class) key isn't covered by the configured order, at lowest priority. The policy is now total over its input domain. P1 — helm chart emits removed flags ap-deployments.yaml still emitted --redis.ss.gate-type and --redis.ss.gate-params (flags removed when the SS gate impl was dropped). Any operator setting ap.redis.gateType would crashloop on "flag provided but not defined". Removed the block, the matching values.yaml defaults, the dead gateParamsJson helper, and the e2e deploy values references. P1 — SS gating landing-state inconsistency WithGateFactory on RedisSortedSetFlow was stored-but-unread, main.go logged "with per-queue gating", and README told users to switch to SS for per-queue gating — all false (SS gates aren't wired). Removed WithGateFactory + SortedSetOption + gateFactory field, dropped the arg from main.go, fixed the log line, and corrected README to direct operators at gcp-pubsub-gated for actual per-queue gating. P2 — pubsub callback could block prefetch under saturation The per-subscription channel was unbuffered, so under sustained saturation a stalled merge goroutine propagated straight back to the receive callback, halting Pub/Sub prefetch. Worse, ctx cancellation mid-stall deadlocked the callback (Receive() never exits). Buffered the channel to the same prefetch depth as Pub/Sub's MaxOutstandingMessages (extracted prefetchBufferDepth helper) so the callback is non-blocking up to the prefetch window. Wrapped the send in select-with-ctx so shutdown unblocks and any subscription-gate releases fire cleanly on cancel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: joeltg <joel@reflection.ai>
…de-on-quota-gate Upstream PR llm-d-incubation#183 ("Adding a classifier mode to quota gate") adds a classifier path to the pre-refactor quota_gate / composite_gate / AttributeGate / DispatchGate machinery that this branch removed in favor of the unified Gate/Verdict model. The PR also stamps a typed Classification field on InternalRouting alongside the framework's generic Labels map. This branch already implements the same functional capability via the reservation-classifier gate (operator-configurable redis-backed multi-key in-flight cap that stamps a `class` label), which composes naturally with the tier-priority RMP and admission gate. There is no useful payload in PR llm-d-incubation#183's code beyond what we already have, and its shape conflicts with the redesigned interfaces. Resolution: -X ours wholesale, plus re-applying our deletions of the files PR llm-d-incubation#183 modified that this branch had already removed (composite_gate{,_test}.go, quota_gate{,_test}.go, test/integration/gate_factory_redis_test.go, test/integration/sortedset_quota_gate_test.go). Signed-off-by: joeltg <joel@reflection.ai>
644d434 to
31021ce
Compare
…ck; closed-channel exit; fail-closed on unknown gate types Four review findings, addressed together: 1. Pub/Sub redelivery race with parked workers. Pool gates intentionally block (tier-priority-admission reserved-branch, local-max-concurrency, local-rate-limit). With MaxExtension capped at 10m, a message parked behind saturation could be redelivered to another consumer while the first worker eventually dispatched it too. Added EmbelishedRequestMessage.TransportDeadline. The Pub/Sub callback stamps it from MaxExtension (minus a 10%-of-extension safety margin, clamped to [1s, 30s]) via pubSubTransportDeadline. The worker derives a gateCtx with that deadline and passes it to msg.Gate.Apply. On Verdict.Redeliver with gateCtx.Err() != nil, the worker calls nackImmediately — bypassing retryMessageWithReason's backoff so the broker's redelivery owns the next attempt rather than this worker holding the message past its lease. --pubsub.max-extension makes the budget operator-tunable. 2. "Publish result before ack" not enforced. publishPubSub now waits on result.Get and returns the error. resultWorker only acks the originating request after a successful publish; on publish failure it nacks. resultWorker takes a publishFunc so the contract can be tested with a stub. 3. Workers spin on closed dispatch channels. Worker's receive now uses msg, ok := <-requestChannel and returns on !ok. Otherwise after the merge policy closes its output channel, the worker would loop forever on zero-value messages. 4. Unknown gate types failed open. GateFactory.CreateGate returned AlwaysContinue for unknown gate names, so a YAML typo in tier-priority-admission silently disabled admission control. Now returns an error. Empty gate type stays open as the explicit constant case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-input readers were draining request channels into b.queues without any cap, while the dispatcher could block on the bounded pool output channel downstream. Under pool saturation, the receive callback could keep "dispatching" into the RMP while messages sat hidden in buckets past pubsub.max-extension — Pub/Sub then redelivered (because the TransportDeadline check upstream of the RMP didn't see the queued backlog), causing duplicate delivery and unbounded memory growth. Added a per-subscription queued slot count (poolState.queued + maxQueued) protected by the existing mu/cond. reader() now calls acquireInputSlot before each receive; the dispatcher calls releaseInputSlot when it pops and forwards a message. When subscription N has PerSubscriptionBuffer messages held in the RMP, that subscription's reader blocks before receiving, propagating backpressure to the Flow callback's `ch <- emb` and (transitively) to Pub/Sub prefetch flow control — which is where TransportDeadline now actually triggers a nack on saturated pools. Default PerSubscriptionBuffer = 2. Pub/Sub is the durable backlog; the RMP only needs a small lookahead per independently-filtered source. Tunable per-deployment via the per_subscription_buffer merge-policy config key. Tests cover: bound applies per reader; one subscription's bound doesn't block another subscription's read; config parser accepts/rejects values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a subscription gate returned Drop(result), the callback forwarded
the result to resultChannel and then independently sent ackAction{ack:
true} to resultsChannel — bypassing resultWorker's publish-confirmation
contract. If publishing failed, the Pub/Sub request was already acked
and the caller would never see a response.
Extracted the Terminate-verdict handling into
handleSubscriptionGateTerminate. When v.Result != nil, the handler
forwards the result to resultChannel and intentionally does NOT send an
ack here. resultWorker is the sole ack source for that path — it
publishes the result and only then sends ack/nack to the tracked
resultsChannel based on publish success.
Silent drops (v.Result == nil, !v.Redeliver) continue to direct-ack
because no publish is involved. Refuse paths nack with the existing
delay/no-delay logic.
Tests cover both branches: Drop(result) forwards to resultChannel
without acking; silent Drop(nil) acks directly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related changes for redis transports:
1. Closed-channel safety. drainBatch and the retry/result workers in
both redis-pubsub and redis-sortedset implementations could
busy-loop on a closed channel, receiving zero values forever.
Added ok-checks to every channel receive in those workers. Also
added a select-with-ctx around the redis-pubsub requestWorker's
msgChannel send and a nil-rmsg guard, so a transient subscription
hiccup doesn't deadlock the loop.
2. Sorted-set gate parity. The sortedset Flow had RequestChannel.Gate
plumbing but no way to actually populate it from config — the
WithGateFactory option was retained "for forward compatibility" but
never read. Wire it through:
- queueConfig gains GateType / GateParams (single pool gate, parity
with pubsub legacy) and Gates []GateConfig (queue gate chain,
parity with pubsub subscription gates).
- --redis.ss.gate-type / --redis.ss.gate-params flags for single-
queue mode (parity with pubsub flags).
- synthesizeSortedSetPool + buildRedisGateChain + gateFromChain
helpers build the pool gate chain at startup, attached to
RequestChannel.Gate so the worker runs it before dispatch.
- Queue-gate chain runs in processMessages between ZPopMin and the
msgChannel send. handleQueueGateTerminate routes Refuse →
retryChannel (or requeue if no retry channel), Drop(result) →
resultChannel, silent Drop → discard.
- Pools() now returns the full pool shape (RequestPath, HTTPHeaders,
ModelNameOverride, Labels) so per-pool worker spawning has the
complete dispatch context.
cmd/main.go now wires the gateFactory into the sortedset flow.
README documents the new flags and per-queue gate/Gates fields.
Tests cover: drainBatch closed-channel exit; queue-config drives both
pool and subscription gate factories with correct ordering; single-
queue flag mode parses gate-type/gate-params; queue-gate Drop(result)
forwards to resultChannel and does not dispatch; queue-gate Refuse
forwards to retryChannel with gate_refused reason.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lioraron
left a comment
There was a problem hiding this comment.
Review: 8 inline comments covering correctness, safety, and test coverage.
| "gatewayURL", pool.GatewayURL, | ||
| "workers", workers) | ||
| for w := 0; w < workers; w++ { | ||
| go asyncworker.Worker(ctx, impl.Characteristics(), inferenceClient, ch, impl.RetryChannel(), impl.ResultChannel(), requestTimeout) |
There was a problem hiding this comment.
Worker goroutines fire-and-forget: Workers are spawned with bare go asyncworker.Worker(...) and nothing waits for them to finish — the old WaitGroup was removed. After <-ctx.Done() returns (line 224), main() exits immediately, so in-flight requests are abandoned mid-dispatch. This can lose results or leave partial state in Redis.
Add a sync.WaitGroup (or similar) so the shutdown path waits for all workers to drain before exiting.
| // Start runs the refresh loop until ctx is canceled. Idempotent — a | ||
| // second call returns immediately because Run-once is the contract; | ||
| // the gate factory wires this when first used. | ||
| func (s *PromQLLoadSource) Start(ctx context.Context) { |
There was a problem hiding this comment.
Start is not idempotent despite the doc claiming it is: Each call launches a new goroutine with a new ticker. If called twice you get duplicate concurrent refresh loops hammering Prometheus.
Add a sync.Once guard (or an atomic flag) to match the documented contract.
| select { | ||
| case <-ctx.Done(): | ||
| return pipeline.Refuse(), ctx.Err() | ||
| case <-time.After(wait): |
There was a problem hiding this comment.
time.After timer leak: time.After(wait) inside this for loop allocates a timer per iteration that cannot be GC'd until it fires. Under sustained load with many goroutines blocked here, this leaks memory.
Replace with time.NewTimer + defer timer.Stop() (or reset the same timer across iterations).
| cacheTTL time.Duration | ||
| redisClients map[string]*goredis.Client | ||
| rdb *redis.Client | ||
| ctx context.Context |
There was a problem hiding this comment.
context.Context stored in struct: This is a well-documented Go anti-pattern (style guide). The context here drives background goroutines (PromQL refresh loops).
Consider passing it through a Start(ctx) method or a dedicated Run(ctx) lifecycle method instead of baking it into the struct.
| mergePolicyMu.Lock() | ||
| defer mergePolicyMu.Unlock() | ||
| if _, exists := mergePolicyRegistry[name]; exists { | ||
| panic(fmt.Sprintf("pipeline: merge policy %q already registered", name)) |
There was a problem hiding this comment.
panic on duplicate registration: Combined with init()-based registration, a misconfigured import crashes the process at startup with no recovery path. Return an error instead — callers (e.g. NewMergePolicy) already have error paths.
| } | ||
|
|
||
| // Set assigns value to key. Panics if l is nil. | ||
| func (l Labels) Set(key, value string) { |
There was a problem hiding this comment.
Set panics on nil receiver: Get and Has are explicitly nil-safe, but Set (and Merge) panic on nil. Since external gate authors call msg.Labels.Set(...), this is a footgun if Labels was never initialized.
Either auto-initialize in Set/Merge (return a new map), or add a prominent doc warning that callers must check for nil.
| } | ||
|
|
||
| // Continue is the zero-value Verdict: forward to the next stage. | ||
| var Continue = Verdict{} |
There was a problem hiding this comment.
Mutable package-level sentinel: var Continue = Verdict{} can be accidentally mutated by any gate (e.g. pipeline.Continue.Terminate = true), corrupting it globally for all callers.
Make it a function func Continue() Verdict { return Verdict{} }, or keep the var but make it unexported with a public accessor that returns a copy.
| for { | ||
| select { | ||
| case <-ctx.Done(): | ||
| return pipeline.Refuse(), ctx.Err() |
There was a problem hiding this comment.
No test coverage: This file has no _test.go counterpart. Token-bucket logic with timing (refill rates, burst capacity, edge cases around zero/negative inputs) is easy to get wrong and should have unit tests before merge.
|
Thanks @joeltg for this extensive work! This PR covers a lot of ground — gate interface migration, labels, pool topology, merge policy registry, flow control gates, and tier-priority dispatch. Each of these is a meaningful feature with its own design surface, and bundling them makes review harder and riskier (e.g. the 8 inline comments I left are spread across unrelated subsystems). Would it be possible to split this into smaller, independently mergeable PRs? A reasonable split might be:
I'd also suggest opening a corresponding issue for each PR that includes at least:
This makes each change independently reviewable, gives reviewers the context to evaluate design decisions, and creates a traceable record of why each piece exists. |
| Terminate bool | ||
| Redeliver bool | ||
| Result *api.ResultMessage |
There was a problem hiding this comment.
I would go for an alternative design here, instead of the double bool; the two-bool encoding allows a fourth state (Terminate=false, Redeliver=true) and forces every consumer to reason about their combination; I'd advise going with something like:
type VerdictKind int
const (
VerdictContinue VerdictKind = iota
VerdictDrop
VerdictRefuse
)
type Verdict struct {
Kind VerdictKind
Result *api.ResultMessage
}|
This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the |
Motivation
We want to support tier-priority dispatch across a shared inference fleet: strict priority across tiers (interactive, async, batch), per-team reservation quotas that are floors not ceilings so idle capacity isn't stranded, and interactive overflow that fails fast under saturation rather than waiting through redelivery delays. This PR implements the framework primitives that make this possible, and the policy itself.
A fundamental issue is that the existing gate interface — a float "dispatch budget" consulted at pull time — works against this:
Instead, we want to pull aggressively and always, and gate admission at the worker rather than at the pull loop. Messages wait in the in-process prefetch buffer at zero cost and dispatch the instant capacity opens, with no nack-and-redeliver round-trip.
To support this architecture, we implement the following scoped refactoring:
Pool topology
A named
Pooldeclares its gateway URL, worker count, and gate chain. Subscriptions reference a pool by ID. The merge policy now emits one buffered channel per pool; each pool gets its own dedicated worker pool so backpressure on one pool's downstream endpoint stalls only that pool's prefetch — not the entire pod's.An example FlowConfig schema illustrating this (and the rest of the changes):
Unified gate interface
The
DispatchGate.Budgetfloat interface is replaced byGate.Apply, which returns aVerdict:Gates run at two distinct sites with different blocking rules:
Subscription gates run in the Pub/Sub receive callback, before the merge policy. They must not block (blocking stalls prefetch). Their primary job is classification: they read and mutate
msg.Labelsto stamp decisions likeclass=reservedorclass=overflowthat the merge policy and pool gates act on downstream. Gate-written labels are the key mechanism for runtime classification — the subscription config supplies static identity labels (team,tier,model); gates supply dynamic classification labels whose values depend on system state at dispatch time.Pool gates run in the per-pool worker pool, after the merge policy routes the message, before HTTP dispatch. They may block — because the message is already prefetched and in memory, parking the worker until capacity opens dispatches it with zero extra latency and no nack round-trip.
In both sites, multiple gates are provided in an array, are evaluated in order, and short-circuit on the first non-
Continueresult.Labels
Subscriptions declare static labels (
team,tier,model) in their config; the Flow stamps these onto every pulled message. Labels are entirely operator-controlled: transport attributes from producers round-trip viabody.Metadata/result.Metadatainstead, which is a separate namespace. Labels are a plainmap[string]stringwith last-write-wins semantics: gates may overwrite any key, and each gate in the chain sees the mutations made by earlier ones. This is what makes runtime classification composable — a deadline gate can run first and drop the message; a reservation classifier runs next and stampsclass; the merge policy buckets on that value.Tier-priority RMP and admission gate
Two pieces that compose to implement the priority-tier dispatch:
The tier-priority RMP buckets messages by
(pool, tier, class)and dispatches in strict priority across tiers (e.g.interactive > async > batch) with round-robin across(team, model)source channels within each tier. Heretieris a static subscription label set by the operator;classis a gate-written label (reservedoroverflow) stamped at subscription-gate time based on whether the request is within the team's reservation cap — this is the reason the gate-labels pipeline exists. The RMP stamps the engine-side priority header and is pure ordering: it never consults a gate.The
tier-priority-admissionpool gate reads a pool saturation signal and returns a three-way verdict based on labels: block ifclass=reserved(don't nack work that legitimately deserves a slot; park and dispatch the instant capacity opens); Drop with 429 iftier=interactive, class=overflow(interactive's SLA is sub-second; fail fast so the producer can fall back); Refuse for all other overflow (nacking async/batch overflow is correct — blocking it would tie up workers and head-of-line-block reserved messages behind it in the per-pool FIFO channel). The saturation signal is pluggable: a Prometheus-backed source for self-hosted IGW pools, a redis-counter source for external providers with hard concurrency limits.