
store: lease-based leader election with decoupled refresh #248

Closed
sravotto wants to merge 2 commits into main from sr8_main_node

Conversation


@sravotto sravotto commented Mar 2, 2026

Replace the max(id) leader election with a sticky lease-based scheme using a dedicated single-row _visus.leader table.
The leader holds a lease that must be explicitly renewed; other nodes can only claim leadership after the lease expires.

Separate lease renewal from collection refresh so they run on independent cadences:

  • Lease renewal defaults to every 30s (--lease-interval), while collection refresh defaults to every 5m (--refresh).
  • Stale-node cleanup runs once at startup and then hourly.

Add --random-id flag to allow running multiple Visus instances against the same cluster without requiring a CockroachDB node ID (useful for CockroachDB deployments).
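The decoupled cadences in the description can be sketched as two independent tickers feeding one select loop, so a slow collection refresh never delays a lease-renewal deadline. This is a minimal illustration, not the visus implementation; the function names are hypothetical and the intervals are shrunk from the real 30s/5m defaults so the demo runs quickly.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runLoops drives lease renewal and collection refresh on independent
// tickers; each fires on its own cadence regardless of the other.
func runLoops(ctx context.Context, leaseEvery, refreshEvery time.Duration, renew, refresh func()) {
	leaseTick := time.NewTicker(leaseEvery)
	defer leaseTick.Stop()
	refreshTick := time.NewTicker(refreshEvery)
	defer refreshTick.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-leaseTick.C:
			renew() // cheap, frequent (30s in the real defaults)
		case <-refreshTick.C:
			refresh() // expensive, infrequent (5m in the real defaults)
		}
	}
}

// countEvents runs the loops briefly with shrunk intervals and reports
// how often each callback fired.
func countEvents() (renews, refreshes int) {
	ctx, cancel := context.WithTimeout(context.Background(), 120*time.Millisecond)
	defer cancel()
	runLoops(ctx, 10*time.Millisecond, 50*time.Millisecond,
		func() { renews++ }, func() { refreshes++ })
	return
}

func main() {
	renews, refreshes := countEvents()
	fmt.Println(renews > refreshes) // renewal fires more often than refresh
}
```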

@sravotto sravotto changed the title election: add stale cleanup, transition logging, and resilient error handling election: lease-based leader election with decoupled refresh Mar 3, 2026
@sravotto sravotto changed the title election: lease-based leader election with decoupled refresh store: lease-based leader election with decoupled refresh Mar 3, 2026
@sravotto sravotto force-pushed the sr8_main_node branch 8 times, most recently from 2aaf04e to fea3c64 on March 4, 2026 03:10
@sravotto sravotto marked this pull request as ready for review March 4, 2026 16:47
@sravotto sravotto requested a review from BramGruneir March 4, 2026 16:48
sravotto and others added 2 commits March 4, 2026 13:58
Replace the max(id) leader election with a sticky lease-based scheme
using a dedicated single-row _visus.leader table. The leader holds a
lease that must be explicitly renewed; other nodes can only claim
leadership after the lease expires.

Separate lease renewal from collection refresh so they run on
independent cadences: lease renewal defaults to every 30s
(--lease-interval), while collection refresh defaults to every 5m
(--refresh). Stale-node cleanup runs once at startup and then hourly.

Add --random-id flag to allow running multiple Visus instances against
the same cluster without requiring a CockroachDB node ID (useful for
CockroachDB deployments).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@BramGruneir BramGruneir left a comment


Overall, I think this needs a lot of work.

Here are my notes from reviewing the code this morning:


Upon further reflection here, I think you might want to alter this scheme.
Either store both the real node id and the random number for the process, or combine the two so the "nodeid" is NNNNNNN0XXXXXXX, where the Ns are the random number and the Xs are the actual node id, padded to always take up the same number of digits.

I think storing both is the better idea, but either way you can tell which process has the current lease.

Either way, you can remove the new --random-id flag and just use a random number every time.
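The combined-id layout suggested above can be sketched with a pair of hypothetical helpers (`combineID`/`splitID` and the 7-digit node-id width are assumptions, not anything from the PR):

```go
package main

import "fmt"

// nodeWidth is an assumed maximum digit width for a real node id; the
// extra digit in the scale leaves a literal 0 separator between the
// random prefix and the zero-padded node id (NNNNNNN0XXXXXXX).
const nodeWidth = 7

func scale() int64 {
	s := int64(1)
	for i := 0; i < nodeWidth+1; i++ {
		s *= 10
	}
	return s // 10^(nodeWidth+1)
}

// combineID packs the per-process random number and the real node id
// into one human-readable integer.
func combineID(random, nodeID int64) int64 {
	return random*scale() + nodeID
}

// splitID recovers both parts from a combined id.
func splitID(combined int64) (random, nodeID int64) {
	return combined / scale(), combined % scale()
}

func main() {
	id := combineID(1234567, 42)
	fmt.Println(id) // 123456700000042: random prefix, 0, padded node id
	r, n := splitID(id)
	fmt.Println(r, n) // 1234567 42
}
```

Printed in decimal, the stored value shows at a glance which process (random prefix) and which node (suffix) currently hold the lease.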


The more I see this, the more I think overriding the node id is silly. It seems like a hack instead of a real fix. I don't think you even need a random number, as long as each visus process is trying to grab the lease after the current one expires.

Add some jitter in there too; otherwise this leasing is going to create a storm. Think about a 1000-node cluster.


This isn't the first leasing code we've needed. Maybe time for a new powertool? One just for leases? And we can update visus and tempo to use the same library.

store := store.New(conn)
var st store.Store
if cfg.UseRandomID {
maxID := new(big.Int).SetInt64(1<<62 - 1)

This seems sloppy.

Won't this do the exact same thing in one line?

nodeID := rand.Int64N(1<<62-1) + 1


Also, why only up to 1<<62?

f.StringVar(&cfg.CaCert, "ca-cert", "",
"Path to the CA certificate")
f.DurationVar(&cfg.LeaseInterval, "lease-interval", 30*time.Second,
"How often to renew the leader lease.")

Don't call it a leader lease... that's going to conflict with CockroachDB nomenclature. It's just a lease, and should be a well-understood concept.

Comment thread internal/store/memory.go
err error // Error to return
mainNode bool
err error // Error to return
leader bool

Really not a fan of leader here. In CockroachDB we use it because of Raft. This is just a lease, so maybe leaseholder?

Comment thread internal/store/memory.go
mainNode bool
err error // Error to return
leader bool
}

Why store a bool? Shouldn't the leaseholder field be the timestamp of when the lease expires? A bool seems like a strange choice, since it won't automatically expire if the processing is slow for whatever reason.

Comment thread internal/store/memory.go
sync.RWMutex
err error // Error to return
mainNode bool
err error // Error to return

Error to return from what? What would set this error? Why would you hold it?

ALTER TABLE _visus.collection ADD COLUMN IF NOT EXISTS databases STRING DEFAULT '';
CREATE TABLE IF NOT EXISTS _visus.leader (
singleton BOOL PRIMARY KEY DEFAULT true CHECK (singleton = true),
node_id INT NOT NULL,

Store both node_random and node_id.


Or combine them into the id in a human-readable way.

);

CREATE TABLE IF NOT EXISTS _visus.node (
id INT PRIMARY KEY,

Will this get messed up now?

--
-- Try to renew this node's lease and return the current leader state.
-- Returns (renewed, lease_valid):
-- (true, true) – this node renewed its lease successfully.

This should return the new expiry or NULL.

-- Upsert the current node's heartbeat timestamp.
INSERT INTO _visus.node (id, updated)
VALUES ($1, current_timestamp())
ON CONFLICT (id) DO UPDATE SET updated = current_timestamp();

Maybe store both the current timestamp and the timestamp sent from visus as well? That will show lag, which may be useful when diagnosing issues.

-- SPDX-License-Identifier: Apache-2.0

-- Upsert the current node's heartbeat timestamp.
INSERT INTO _visus.node (id, updated)

Why is this file in this commit? Is it related to the lease? It doesn't look like it.

@sravotto sravotto closed this Mar 6, 2026
@sravotto sravotto deleted the sr8_main_node branch March 9, 2026 21:30