Releases: tcncloud/sati
v3.3.0
v3.2.3
What's Changed
- feat(tracing): forward tracingSamplingFraction through ExileClientManager by @namtzigla in #57
Full Changelog: v3.2.2...v3.2.3
v3.2.2
What's Changed
- feat(tracing): export spans to GCP Cloud Trace when opted in by @namtzigla in #56
Full Changelog: v3.2.1...v3.2.2
v3.2.1: fix(adaptive): min-gradient collapse on heterogeneous workloads
The min-gradient heuristic (clamp(bestLatency/EMA, 0.5, 1.0)) was
built on an unstated assumption: all samples represent the same kind
of work, so "fastest observed" is a meaningful reference for the
moving average. That assumption fails for plugins whose handlers
mix fast paths (cache hits, empty results, trivial validation) with
slow paths (DB round trips). The ratio then reflects workload
variance instead of queueing, and the controller sheds regardless
of whether anything is actually congested.
Reproduced live against finvi:
initial limit=10, 120 ListPools calls over ~3 min:
10 → 5 → 3 → 2 → 1 (one recompute every 25 samples)
sloGradient=1.0 throughout (p95=47ms, 9% of SLO)
resourceGradient=1.0 throughout (DB pool headroom)
errors=0 throughout
minGradient pinned at the 0.5 floor because decayingMin=1ms (anchored
by a sub-millisecond cache-hit sample) vs EMA=10ms gives a raw ratio
of 0.1, well below the clamp.
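The arithmetic behind that trace can be sketched as follows. This is an illustration, not the sati controller: the class and method names are hypothetical, and the rule "scale the limit by the gradient and round up" is an assumption chosen because it reproduces the observed sequence. The SLO and resource gradients are left out because both sat at 1.0 in the reproduction.

```java
// Illustrative sketch only; names and the rounding rule are assumptions.
final class MinGradientCollapse {
    // Lower bound from clamp(bestLatency/EMA, 0.5, 1.0) in the notes above.
    static final double GRADIENT_FLOOR = 0.5;

    // clamp(bestLatency / EMA, 0.5, 1.0)
    static double minGradient(long bestNanos, long emaNanos) {
        double ratio = (double) bestNanos / (double) emaNanos;
        return Math.max(GRADIENT_FLOOR, Math.min(1.0, ratio));
    }

    // ASSUMPTION: the new limit is limit * gradient, rounded up, never below 1.
    static int nextLimit(int limit, double gradient) {
        return Math.max(1, (int) Math.ceil(limit * gradient));
    }

    public static void main(String[] args) {
        long decayingMin = 1_000_000L; // 1 ms, anchored by a fast cache hit
        long ema = 10_000_000L;        // 10 ms moving average
        int limit = 10;
        StringBuilder trace = new StringBuilder("10");
        // 120 calls with one recompute every 25 samples gives 4 recomputes.
        for (int i = 0; i < 4; i++) {
            limit = nextLimit(limit, minGradient(decayingMin, ema));
            trace.append(" -> ").append(limit);
        }
        System.out.println(trace); // 10 -> 5 -> 3 -> 2 -> 1
    }
}
```

The raw ratio 1 ms / 10 ms = 0.1 is clamped to 0.5 on every recompute, so the limit halves each time even though nothing is queueing.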
Also surfaced a secondary bug: the decayingMin drift step
`prev + (prev >> 10)` adds zero whenever `prev <= 1023` nanos
(the shift truncates to 0), so once pinned at a small value the
running minimum had no recovery path.
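A small sketch of why the drift stalls. Only the expression `prev + (prev >> 10)` comes from the notes above; the class and method names are hypothetical.

```java
// Illustrative sketch; only the drift expression is taken from the release notes.
final class DecayingMinDrift {
    // Raise the running minimum by prev / 1024 (about 0.1%) using integer math.
    static long drift(long prevNanos) {
        return prevNanos + (prevNanos >> 10);
    }

    public static void main(String[] args) {
        System.out.println(drift(1_023L));     // 1023: increment truncates to 0
        System.out.println(drift(1_024L));     // 1025: smallest value that still moves
        System.out.println(drift(1_000_000L)); // 1000976: 1 ms drifts up by 976 ns
    }
}
```

Any value at or below 1023 ns is a fixed point of this update, which is why a tiny sample could hold the minimum down indefinitely.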
Two fixes in this PR:
1. Replace the running decayingMin with p5 of the ring buffer,
clamped up to a 1 ms noise floor. p5 can't be pinned by a
single outlier (needs >=5% of the window to move) and the
noise floor prevents cache-hit-only plugins from feeding a
degenerate "zero fast-case" into the ratio. Removes the
AtomicLong field + bit-shift drift math entirely.
2. Add a homogeneity guard: when p5/p95 < GRADIENT_FLOOR, the
workload is too variable for the min-gradient to be a
meaningful queueing signal, so skip it (minG = 1.0) and let
the SLO and resource gradients drive the limit alone. Vegas-
style reasoning only works on homogeneous workloads; this
guard makes the controller correct for the rest.
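The two fixes might be sketched roughly as below. This is not the sati implementation: the class name, the nearest-rank percentile method, the use of a plain `long[]` in place of the ring buffer, and the value 0.5 for `GRADIENT_FLOOR` are all assumptions. Whether the guard compares the raw or the clamped p5 against p95 is also not stated; the sketch uses the raw value.

```java
import java.util.Arrays;

// Illustrative sketch only; see the assumptions listed in the lead-in.
final class FastCaseGuard {
    static final long NOISE_FLOOR_NANOS = 1_000_000L; // 1 ms noise floor (from the notes)
    static final double GRADIENT_FLOOR = 0.5;         // ASSUMPTION: same 0.5 as the clamp floor

    // Nearest-rank percentile over a sorted copy; window must be non-empty.
    static long percentile(long[] window, int p) {
        long[] sorted = window.clone();
        Arrays.sort(sorted);
        int rank = (int) ((p * (long) sorted.length + 99) / 100); // ceil(p * n / 100)
        return sorted[Math.max(0, rank - 1)];
    }

    // Fix 1: fast-case reference is p5, clamped up to the noise floor.
    static long fastCaseNanos(long[] window) {
        return Math.max(percentile(window, 5), NOISE_FLOOR_NANOS);
    }

    // Fix 2: skip the min-gradient (return 1.0) when the window is too heterogeneous.
    static double minGradient(long[] window, long emaNanos) {
        if (window.length == 0) {
            return 1.0; // no samples, no signal
        }
        long p5 = percentile(window, 5);
        long p95 = percentile(window, 95);
        if ((double) p5 / (double) p95 < GRADIENT_FLOOR) {
            return 1.0; // leave the limit to the SLO and resource gradients
        }
        double ratio = (double) fastCaseNanos(window) / (double) emaNanos;
        return Math.max(GRADIENT_FLOOR, Math.min(1.0, ratio));
    }
}
```

With a window of 24 samples at 10 ms and one at 0.5 ms, p5 stays at 10 ms, so a single outlier no longer moves the reference. With half the samples at 1 ms and half at 40 ms, p5/p95 is 0.025 and the guard returns 1.0.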
No public API changes. `AdaptiveCapacity.decayingMinNanos()`
preserved for AdaptiveSnapshot compatibility — it now returns
`max(p5, noiseFloor)` instead of the running minimum, which is a
closer-to-intended "realistic fast-case" value.
Three new regression tests:
- singleSubMsOutlierDoesNotPinFastCase
- heterogeneousWorkloadDoesNotCollapseToMinLimit
- allSubMsSamplesDoNotCollapseLimit
All existing tests pass unchanged — the fix preserves behaviour
for homogeneous workloads (where min-gradient was working as
designed) and only changes behaviour for the broken case.
v3.2.0
What's Changed
- feat(client): drain in-flight work on close, preserve Results/Acks by @namtzigla in #54
Full Changelog: v3.1.2...v3.2.0
v3.1.2
What's Changed
- feat(client): expose AdaptiveSnapshot API for plugin diagnostics (follow-up to C4) by @namtzigla in #53
Full Changelog: v3.1.0...v3.1.2
v3.1.1
What's Changed
- bench: WorkStream v3 throughput harness by @namtzigla in #47
- Bump org.codehaus.plexus:plexus-utils from 4.0.2 to 4.0.3 in the gradle group across 1 directory by @dependabot[bot] in #48
- feat(plugin): add resourceLimits() and ResourceLimit record (C1) by @namtzigla in #49
- feat(internal): AdaptiveCapacity SLO-aware gradient controller (C2) by @namtzigla in #50
- feat(internal): WorkStreamClient refill-to-target credits + job/event signal split (C3) by @namtzigla in #51
- feat(client): wire AdaptiveCapacity as default + raise maxConcurrency to 100 (C4) by @namtzigla in #52
New Contributors
- @dependabot[bot] made their first contribution in #48
Full Changelog: v3.0.0...v3.1.1
v3.1.0
What's Changed
- bench: WorkStream v3 throughput harness by @namtzigla in #47
- Bump org.codehaus.plexus:plexus-utils from 4.0.2 to 4.0.3 in the gradle group across 1 directory by @dependabot[bot] in #48
- feat(plugin): add resourceLimits() and ResourceLimit record (C1) by @namtzigla in #49
- feat(internal): AdaptiveCapacity SLO-aware gradient controller (C2) by @namtzigla in #50
- feat(internal): WorkStreamClient refill-to-target credits + job/event signal split (C3) by @namtzigla in #51
- feat(client): wire AdaptiveCapacity as default + raise maxConcurrency to 100 (C4) by @namtzigla in #52
New Contributors
- @dependabot[bot] made their first contribution in #48
Full Changelog: v3.0.0...v3.1.0
v3.0.0
What's Changed
- Redesign sati as plain Java client for v3 protocol by @namtzigla in #42
Full Changelog: v2.32.1...v3.0.0
v3.0.0.rc3
fix BackoffTest: update expected values for 500ms base / 10s max