Releases: tcncloud/sati

v3.3.0

16 Apr 16:21

What's Changed

Full Changelog: v3.2.3...v3.3.0

v3.2.3

16 Apr 01:37

What's Changed

  • feat(tracing): forward tracingSamplingFraction through ExileClientManager by @namtzigla in #57

Full Changelog: v3.2.2...v3.2.3

v3.2.2

16 Apr 01:27

What's Changed

  • feat(tracing): export spans to GCP Cloud Trace when opted in by @namtzigla in #56

Full Changelog: v3.2.1...v3.2.2

v3.2.1: fix(adaptive): min-gradient collapse on heterogeneous workloads

16 Apr 00:16

The min-gradient heuristic (clamp(bestLatency/EMA, 0.5, 1.0)) was
built on an unstated assumption: all samples represent the same kind
of work, so "fastest observed" is a meaningful reference for the
moving average. That assumption fails for plugins whose handlers
mix fast paths (cache hits, empty results, trivial validation) with
slow paths (DB round trips). The ratio then reflects workload
variance instead of queueing, and the controller sheds regardless
of whether anything is actually congested.

Reproduced live against finvi:

  initial limit=10, 120 ListPools calls over ~3 min:
    10 → 5 → 3 → 2 → 1    (one recompute every 25 samples)
  sloGradient=1.0 throughout (p95=47ms, 9% of SLO)
  resourceGradient=1.0 throughout (DB pool headroom)
  errors=0 throughout
  minGradient pinned at 0.5 floor because decayingMin=1ms (anchored
    by a sub-millisecond cache-hit sample) vs EMA=10ms.
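
The collapse above can be replayed with a few lines of arithmetic. This is a minimal sketch, not sati's code: the class and method names are hypothetical, and the update rule (multiply the limit by the gradient, round, never go below 1) is an assumption chosen because it reproduces the observed 10 → 5 → 3 → 2 → 1 sequence. Only the clamp expression comes from the note itself.

```java
// Hypothetical sketch: names and the limit update rule are assumptions.
final class MinGradientCollapse {
    static final double GRADIENT_FLOOR = 0.5;
    static final double GRADIENT_CEIL = 1.0;

    /** clamp(bestLatency / EMA, 0.5, 1.0), the heuristic quoted in the note. */
    static double minGradient(double bestLatencyMs, double emaMs) {
        double ratio = bestLatencyMs / emaMs;
        return Math.max(GRADIENT_FLOOR, Math.min(GRADIENT_CEIL, ratio));
    }

    /** Assumed update rule: scale the limit by the gradient, round, keep >= 1. */
    static int nextLimit(int limit, double gradient) {
        return (int) Math.max(1L, Math.round(limit * gradient));
    }

    /** Limit after each recompute while bestLatency and EMA stay fixed. */
    static int[] trajectory(int initialLimit, double bestMs, double emaMs, int recomputes) {
        int[] limits = new int[recomputes + 1];
        limits[0] = initialLimit;
        for (int i = 1; i <= recomputes; i++) {
            limits[i] = nextLimit(limits[i - 1], minGradient(bestMs, emaMs));
        }
        return limits;
    }
}
```

With `bestMs = 1.0` and `emaMs = 10.0` the raw ratio is 0.1, which clamps to the 0.5 floor, so `trajectory(10, 1.0, 10.0, 4)` halves the limit on every recompute even though nothing is congested.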

Also surfaced a secondary bug: the increment in the decayingMin
drift `prev + (prev >> 10)` truncates to zero at `prev <= 1023`
nanos, so once pinned at a small value the running minimum had no
recovery path.
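
The integer truncation is easy to check in isolation. The sketch below uses only the expression `prev + (prev >> 10)` from the note; the class and helper names are hypothetical.

```java
// Hypothetical sketch around the drift expression quoted in the note.
final class DecayingMinDrift {
    /** One drift step: prev + (prev >> 10), i.e. roughly +0.1% per step. */
    static long drift(long prevNanos) {
        return prevNanos + (prevNanos >> 10);
    }

    /** True when the shift truncates to zero, so drifting never raises the value. */
    static boolean isStuck(long prevNanos) {
        return drift(prevNanos) == prevNanos;
    }

    /** Value after applying the drift step the given number of times. */
    static long driftedAfter(long startNanos, int steps) {
        long value = startNanos;
        for (int i = 0; i < steps; i++) {
            value = drift(value);
        }
        return value;
    }
}
```

`isStuck(1023)` is true and `isStuck(1024)` is false: any value below 1024 ns shifts down to an increment of zero, so no number of drift steps moves it.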

Two fixes in this PR:

1. Replace the running decayingMin with p5 of the ring buffer,
   clamped up to a 1 ms noise floor. p5 can't be pinned by a
   single outlier (needs >=5% of the window to move) and the
   noise floor prevents cache-hit-only plugins from feeding a
   degenerate "zero fast-case" into the ratio. Removes the
   AtomicLong field + bit-shift drift math entirely.

2. Add a homogeneity guard: when p5/p95 < GRADIENT_FLOOR, the
   workload is too variable for the min-gradient to be a
   meaningful queueing signal, so skip it (minG = 1.0) and let
   the SLO and resource gradients drive the limit alone. Vegas-
   style reasoning only works on homogeneous workloads; this
   guard makes the controller correct for the rest.
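
A minimal sketch of how the two fixes could fit together, under stated assumptions. The names are hypothetical; the nearest-rank percentile method, the value `GRADIENT_FLOOR = 0.5`, and the choice to evaluate the guard on the raw (unclamped) p5 are guesses, not details confirmed by the note. Only the 1 ms noise floor, the `p5/p95 < GRADIENT_FLOOR` guard, and the `max(p5, noiseFloor)` fast-case value are taken from the text.

```java
import java.util.Arrays;

// Hypothetical sketch: percentile method and constants are assumptions.
final class PercentileMinGradient {
    static final double GRADIENT_FLOOR = 0.5;          // assumed value
    static final long NOISE_FLOOR_NANOS = 1_000_000L;  // 1 ms, per the note

    /** Nearest-rank percentile over a sorted copy of the sample window. */
    static long percentile(long[] windowNanos, int pct) {
        long[] sorted = windowNanos.clone();
        Arrays.sort(sorted);
        int rank = (pct * sorted.length + 99) / 100;   // ceil(pct * n / 100)
        return sorted[Math.max(0, rank - 1)];
    }

    /** Fix 1: the fast-case reference is p5, clamped up to the noise floor. */
    static long fastCaseNanos(long[] windowNanos) {
        return Math.max(percentile(windowNanos, 5), NOISE_FLOOR_NANOS);
    }

    /** Fix 2: skip the min-gradient (return 1.0) when the window is too variable. */
    static double minGradient(long[] windowNanos, double emaNanos) {
        if (windowNanos.length == 0) {
            return 1.0;                                // no samples, no signal
        }
        long p5 = percentile(windowNanos, 5);
        long p95 = percentile(windowNanos, 95);
        if ((double) p5 / p95 < GRADIENT_FLOOR) {
            return 1.0;                                // homogeneity guard
        }
        double ratio = fastCaseNanos(windowNanos) / emaNanos;
        return Math.max(GRADIENT_FLOOR, Math.min(1.0, ratio));
    }
}
```

In a 25-sample window, one 0.2 ms outlier among 10 ms samples leaves p5 at 10 ms, because nearest-rank p5 reads the second-smallest sample. A window mixing 0.5 ms cache hits with 40 ms DB calls trips the guard and yields 1.0, leaving the SLO and resource gradients in charge.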

No public API changes. `AdaptiveCapacity.decayingMinNanos()`
preserved for AdaptiveSnapshot compatibility — it now returns
`max(p5, noiseFloor)` instead of the running minimum, which is a
closer-to-intended "realistic fast-case" value.

Three new regression tests:
  - singleSubMsOutlierDoesNotPinFastCase
  - heterogeneousWorkloadDoesNotCollapseToMinLimit
  - allSubMsSamplesDoNotCollapseLimit

All existing tests pass unchanged — the fix preserves behaviour
for homogeneous workloads (where min-gradient was working as
designed) and only changes behaviour for the broken case.

v3.2.0

15 Apr 21:43

What's Changed

  • feat(client): drain in-flight work on close, preserve Results/Acks by @namtzigla in #54

Full Changelog: v3.1.2...v3.2.0

v3.1.2

15 Apr 20:31

What's Changed

  • feat(client): expose AdaptiveSnapshot API for plugin diagnostics (follow-up to C4) by @namtzigla in #53

Full Changelog: v3.1.0...v3.1.2

v3.1.1

15 Apr 20:28

What's Changed

  • bench: WorkStream v3 throughput harness by @namtzigla in #47
  • Bump org.codehaus.plexus:plexus-utils from 4.0.2 to 4.0.3 in the gradle group across 1 directory by @dependabot[bot] in #48
  • feat(plugin): add resourceLimits() and ResourceLimit record (C1) by @namtzigla in #49
  • feat(internal): AdaptiveCapacity SLO-aware gradient controller (C2) by @namtzigla in #50
  • feat(internal): WorkStreamClient refill-to-target credits + job/event signal split (C3) by @namtzigla in #51
  • feat(client): wire AdaptiveCapacity as default + raise maxConcurrency to 100 (C4) by @namtzigla in #52

Full Changelog: v3.0.0...v3.1.1

v3.1.0

15 Apr 20:20

What's Changed

  • bench: WorkStream v3 throughput harness by @namtzigla in #47
  • Bump org.codehaus.plexus:plexus-utils from 4.0.2 to 4.0.3 in the gradle group across 1 directory by @dependabot[bot] in #48
  • feat(plugin): add resourceLimits() and ResourceLimit record (C1) by @namtzigla in #49
  • feat(internal): AdaptiveCapacity SLO-aware gradient controller (C2) by @namtzigla in #50
  • feat(internal): WorkStreamClient refill-to-target credits + job/event signal split (C3) by @namtzigla in #51
  • feat(client): wire AdaptiveCapacity as default + raise maxConcurrency to 100 (C4) by @namtzigla in #52

Full Changelog: v3.0.0...v3.1.0

v3.0.0

14 Apr 12:56

What's Changed

  • Redesign sati as plain Java client for v3 protocol by @namtzigla in #42

Full Changelog: v2.32.1...v3.0.0

v3.0.0.rc3

11 Apr 03:27

Pre-release
fix BackoffTest: update expected values for 500ms base / 10s max