feat(shard distributor): Persist Shard-Level Statistics for Load Balancing, and Add Cleanup Function #7354
Conversation
Force-pushed from d393051 to 6360f8a.
```go
}

func BuildShardKey(prefix string, namespace, shardID, keyType string) (string, error) {
	if keyType != ShardAssignedKey && keyType != ShardMetricsKey {
```
where/when is this used?
```go
	return parts[0], parts[1], nil
}

func BuildShardPrefix(prefix string, namespace string) string {
```
We need to have tests for BuildShardPrefix, BuildShardKey and ParseShardKey :)
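A table-driven round-trip test along these lines could cover all three helpers; the prefix/namespace values and the error expectations here are assumptions based on the snippets above, not the PR's actual tests:

```go
package etcdkeys

import (
	"testing"

	"github.com/stretchr/testify/require"
)

func TestShardKeyRoundTrip(t *testing.T) {
	cases := []struct {
		name    string
		keyType string
		wantErr bool
	}{
		{name: "assigned key", keyType: ShardAssignedKey},
		{name: "metrics key", keyType: ShardMetricsKey},
		{name: "unknown key type", keyType: "bogus", wantErr: true},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			key, err := BuildShardKey("/cadence", "ns1", "shard-1", tc.keyType)
			if tc.wantErr {
				require.Error(t, err)
				return
			}
			require.NoError(t, err)
			// Every built key should start with the namespace prefix and parse back.
			require.Contains(t, key, BuildShardPrefix("/cadence", "ns1"))
			shardID, keyType, err := ParseShardKey("/cadence", "ns1", key)
			require.NoError(t, err)
			require.Equal(t, "shard-1", shardID)
			require.Equal(t, tc.keyType, keyType)
		})
	}
}
```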
```go
shardID, shardKeyType, err := etcdkeys.ParseShardKey(s.prefix, namespace, string(kv.Key))
if err != nil {
	continue
}
```
If we don't want to abort metric emission, we should still log an error so that we have evidence that something is not working as expected.
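A minimal sketch of the suggested change, assuming the store carries a logger field and uses Cadence's common/log/tag helpers:

```go
shardID, shardKeyType, err := etcdkeys.ParseShardKey(s.prefix, namespace, string(kv.Key))
if err != nil {
	// Keep emitting metrics for the remaining keys, but leave evidence
	// that this key could not be parsed.
	s.logger.Error("failed to parse shard key during metric emission", tag.Error(err))
	continue
}
```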
```go
// Compute shard moves to update last_move_time metrics when ownership changes.
// Read current assignments for the namespace and compare with the new state.
// Concurrent changes will be caught by the revision comparisons later.
currentAssignments := make(map[string]string) // shardID -> executorID
```
Instead of building the whole executorsToShard mapping, can we rely on the shardCache *shardcache.ShardToExecutorCache?
```go
	}
}
now := time.Now().Unix()
// Collect metric updates now so we can apply them after committing the main transaction.
```
Can we move all the code for metric generation to a separate function? It will make the overall code more readable.
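For illustration, the staging could be pulled out roughly like this; the function name, receiver, and parameters are hypothetical (the real shardMetricsUpdate also tracks the shard's etcd key, per its doc comment):

```go
// stageShardStatisticsUpdates collects the per-shard statistics writes implied
// by the new assignments, so AssignShards can commit its main transaction first
// and apply these writes afterwards.
func (s *executorStoreImpl) stageShardStatisticsUpdates(
	currentAssignments map[string]string, // shardID -> previous executorID
	newAssignments map[string]string, // shardID -> new executorID
	now int64,
) []shardMetricsUpdate {
	updates := make([]shardMetricsUpdate, 0, len(newAssignments))
	for shardID, newOwner := range newAssignments {
		u := shardMetricsUpdate{defaultLastUpdate: now}
		if oldOwner, ok := currentAssignments[shardID]; !ok || oldOwner != newOwner {
			// Ownership changed: stamp the move so cooldown logic can see it.
			u.desiredLastMove = now
		}
		updates = append(updates, u)
	}
	return updates
}
```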
```go
metrics           store.ShardMetrics
modRevision       int64
desiredLastMove   int64 // intended LastMoveTime for this update
defaultLastUpdate int64
```
What is the defaultLastUpdate? Can we just call this LastUpdate?
```go
} else {
	update.metrics = store.ShardMetrics{
		SmoothedLoad:   0,
		LastUpdateTime: update.defaultLastUpdate,
```
Looking at the way defaultLastUpdate is used, I think we can simplify the code and just remove it; wouldn't it be equivalent to use desiredLastMove here?
```go
for i := range updates {
	update := &updates[i]

	for {
```
This is very difficult to read; I am not sure we will understand what it does in a few weeks. Can we remove this? :)
```go
		newAssignments[shardID] = executorID
	}
}
now := time.Now().Unix()
```
In general we use clock.TimeSource to handle time; it makes testing easier. I would suggest extending this using the same approach; you can check executorImpl for an example.
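A minimal sketch of that approach, assuming the store gains a timeSource field (the field name and wiring are assumptions):

```go
package etcd

import (
	"github.com/uber/cadence/common/clock"
)

type executorStoreImpl struct {
	timeSource clock.TimeSource // clock.NewRealTimeSource() in production
	// ... other fields ...
}

// nowUnix replaces direct time.Now().Unix() calls so tests can inject a
// mocked time source (e.g. clock.NewMockedTimeSource()) and advance it
// deterministically instead of sleeping.
func (s *executorStoreImpl) nowUnix() int64 {
	return s.timeSource.Now().Unix()
}
```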
```go
// shardMetricsUpdate tracks the etcd key, revision, and metrics used to update a shard
// after the main transaction in AssignShards for exec state.
// Retains metrics to safely merge concurrent updates before retrying.
type shardMetricsUpdate struct {
```
I would call this statistics; "metrics" has a pretty standard meaning.
Force-pushed from 1078343 to 6816b8e.
```go
}

func (p *namespaceProcessor) cleanupStaleShardStats(ctx context.Context) {
	namespaceState, err := p.shardStore.GetState(ctx, p.namespaceCfg.Name)
```
Consider querying GetState only once and passing the result as a parameter; even if we don't have the most up-to-date state, that is fine for cleaning the stats.
Good suggestion, added this in a recent commit: 3931fea.
Refactored runCleanupLoop to fetch state once and pass it to both cleanup functions. Had to update tests to pass state directly, instead of expecting GetState calls in cleanupStaleExecutors/cleanupStaleShardStats.
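Roughly the shape of that refactor (names follow the thread; the error handling is an assumption):

```go
func (p *namespaceProcessor) runCleanupLoop(ctx context.Context) {
	// Fetch the namespace state once per cleanup pass; a slightly stale
	// view is acceptable for both cleanup functions.
	namespaceState, err := p.shardStore.GetState(ctx, p.namespaceCfg.Name)
	if err != nil {
		p.logger.Error("failed to get namespace state for cleanup", tag.Error(err))
		return
	}
	p.cleanupStaleExecutors(ctx, namespaceState)
	p.cleanupStaleShardStats(ctx, namespaceState)
}
```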
```go
// 1. build set of active executors

// add all assigned shards from executors that are ACTIVE and not stale
```
What happens if the executor is in a draining state? Are we fine with losing the statistics for it? It is covered by the following case, where the shard is not in a DONE state, right?
The ShardStatus != DONE check in cleanupStaleShardStats keeps shard stats alive while a draining executor still reports them, and the TTL "grace period" only removes them after the shard has been marked DONE and stayed idle for that whole TTL window.
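A self-contained sketch of that retention rule; the type and names are stand-ins, not the PR's actual code:

```go
package cleanup

// shardStatus is a stand-in for the store's real shard status type.
type shardStatus int

const (
	shardStatusRunning shardStatus = iota
	shardStatusDone
)

// shouldDeleteShardStats returns true only when no active executor still
// reports the shard, the shard has reached DONE, and it has been idle
// longer than the TTL grace period.
func shouldDeleteShardStats(stillReported bool, status shardStatus, lastUpdateUnix, nowUnix, ttlSeconds int64) bool {
	if stillReported {
		return false // an ACTIVE (or draining) executor still reports this shard
	}
	if status != shardStatusDone {
		return false // not DONE yet: keep the statistics alive
	}
	return nowUnix-lastUpdateUnix > ttlSeconds // TTL grace period elapsed
}
```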
Commits (all signed off by Andreas Holt <[email protected]>):
- … is being reassigned in AssignShard
- …to not overload etcd's 128 max ops per txn
- …s txn and retry monotonically
- …ents
- …shard metrics, move out to staging to separate function
- … And more idiomatic naming of collection vs singular type
- …ook more like executor key tests
- …ey in BuildShardKey, as we don't use it
- …o "statistics"
- …ollow conventions
- …eartbeat TTL
Force-pushed from d5a13d9 to 2642080.
What changed?
- Introduce store.ShardMetrics (smoothed load + timestamps) and persist it under store/<namespace>/shards/<shardID>/metrics. SmoothedLoad will keep an EWMA of shard load, LastUpdateTime will be used for dynamically updating the alpha value used in the EWMA (see the sketch after this list), and LastMoveTime will be used to support cooldown logic that limits shard churn.
- Preserve LastMoveTime when reusing existing metrics, and apply per-shard metric updates after the main transaction to stay within etcd's 128-op transaction limit.
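For context, an EWMA update with a time-dependent alpha could look like the sketch below; the exponential-decay formula for alpha is one illustration of "dynamically updating alpha from LastUpdateTime", not necessarily what this PR will ship. The local ShardMetrics mirrors the fields described above.

```go
package loadstats

import "math"

// ShardMetrics mirrors the persisted per-shard statistics described above.
type ShardMetrics struct {
	SmoothedLoad   float64
	LastUpdateTime int64 // unix seconds
	LastMoveTime   int64 // unix seconds
}

// updateSmoothedLoad folds a new load sample into the EWMA. The weight of the
// new sample (alpha) grows with the time since the last update, so long-stale
// statistics converge toward fresh samples quickly.
func updateSmoothedLoad(m ShardMetrics, sample float64, nowUnix int64) ShardMetrics {
	const tau = 60.0 // illustrative time constant, in seconds
	elapsed := float64(nowUnix - m.LastUpdateTime)
	alpha := 1 - math.Exp(-elapsed/tau) // approaches 1 as the gap grows
	m.SmoothedLoad = alpha*sample + (1-alpha)*m.SmoothedLoad
	m.LastUpdateTime = nowUnix
	return m
}
```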
Why?
reported_shards is keyed by executor. That works for reporting the latest heartbeat, but it breaks down the moment a shard moves: the new owner can't see the old owner's smoothed load or timestamps, and the leader has to collect executor-specific parts just to reason about shard state. By giving each shard its own metrics key:
the smoothed load survives ownership changes, last_update_time feeds the smoothing factor (alpha) when applying the next sample, and last_move_time is what we'll use for cooldowns before moving a shard again.
A follow-up pull request will wire heartbeats to update these metrics on every report.
How did you test it?
Integration tests w/ etcd (added new test cases to ./service/sharddistributor/store/etcd/etcdstore_test.go), run with:
go test ./service/sharddistributor/store/etcd/executorstore
Also tested it by logging values while running the ephemeral service (which simulates executors and shards).
Potential risks
Added pressure on etcd and extra read operations when preparing metric updates.
Release notes
Shard distributor now persists shard metrics in etcd (smoothed load and timestamps) for future load balancing logic.
Documentation Changes