store: lease-based leader election with decoupled refresh #248
Conversation
Replace the max(id) leader election with a sticky lease-based scheme using a dedicated single-row _visus.leader table. The leader holds a lease that must be explicitly renewed; other nodes can only claim leadership after the lease expires.

Separate lease renewal from collection refresh so they run on independent cadences: lease renewal defaults to every 30s (--lease-interval), while collection refresh defaults to every 5m (--refresh). Stale-node cleanup runs once at startup and then hourly.

Add a --random-id flag to allow running multiple Visus instances against the same cluster without requiring a CockroachDB node ID (useful for CockroachDB deployments).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BramGruneir
left a comment
Overall, I think this needs a lot of work.
Here are my notes as I was reviewing the code this morning:
Upon further reflection here, I think you might want to alter this scheme.
Either store both the real node id and the random number for the process, or combine the two to make the "nodeid" be NNNNNNN0XXXXXXX where the Ns are the random number and the Xs are the actual node id, padded to always take up the same number of digits.
I think storing both is a better idea, but this way you can tell which process has the current lease.
Either way, you can remove the new --random-id flag and use a random number every time.
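The combined-encoding idea could be sketched like this (the 7-digit widths and the function name are illustrative assumptions, not something specified in this PR):

```go
package main

import "fmt"

// combineID packs a per-process random number and the real node id into a
// single int64 in a human-readable way: NNNNNNN0XXXXXXX, where the Ns are
// the random number and the Xs are the zero-padded node id. The 7-digit
// widths are an illustrative choice.
func combineID(random, nodeID int64) int64 {
	// random takes the high 7 digits, nodeID the low 7 digits, with a
	// literal 0 digit between them (hence the 10^8 multiplier).
	return random*100000000 + nodeID
}

func main() {
	fmt.Println(combineID(1234567, 42)) // 123456700000042
}
```

Reading the combined value off a row in _visus.leader would then tell you both which process holds the lease and which node it is running on.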
The more I see this, the more I think overriding the node id is silly. Seems like a hack instead of a real fix. I don't think you even need a random number as long as each visus process is trying to grab the lease after the current one expires.
Add some jitter in there too, as this leasing is going to create a storm if not. Think about a 1000-node cluster.
This isn't the first leasing code we've needed. Maybe time for a new powertool? One just for leases? And we can update visus and tempo to use the same library.
store := store.New(conn)
var st store.Store
if cfg.UseRandomID {
	maxID := new(big.Int).SetInt64(1<<62 - 1)
This seems sloppy.
Won't this do the exact same thing in one line?
nodeID := rand.Int64N(1<<62-1) + 1
f.StringVar(&cfg.CaCert, "ca-cert", "",
	"Path to the CA certificate")
f.DurationVar(&cfg.LeaseInterval, "lease-interval", 30*time.Second,
	"How often to renew the leader lease.")
don't call it a leader lease... that's going to conflict with cockroach nomenclature. It's just a lease and should be a well understood concept.
err error // Error to return
mainNode bool
err error // Error to return
leader bool
really not a fan of leader here. In cockroach we use it because of Raft. This is just a lease, so maybe leaseholder?
mainNode bool
err error // Error to return
leader bool
}
Why store a bool? Shouldn't the leaseholder state be the timestamp of when the lease expires? A bool seems like a strange choice, since it won't automatically expire if the processing is slow for whatever reason.
sync.RWMutex
err error // Error to return
mainNode bool
err error // Error to return
error to return from what? What would set this error? Why would you hold it?
ALTER TABLE _visus.collection ADD COLUMN IF NOT EXISTS databases STRING DEFAULT '';
CREATE TABLE IF NOT EXISTS _visus.leader (
	singleton BOOL PRIMARY KEY DEFAULT true CHECK (singleton = true),
	node_id INT NOT NULL,
store both node_random and node_id
or combine them into the id in a human readable way
);

CREATE TABLE IF NOT EXISTS _visus.node (
	id INT PRIMARY KEY,
--
-- Try to renew this node's lease and return the current leader state.
-- Returns (renewed, lease_valid):
-- (true, true) – this node renewed its lease successfully.
this should return the new expiry or null
-- Upsert the current node's heartbeat timestamp.
INSERT INTO _visus.node (id, updated)
VALUES ($1, current_timestamp())
ON CONFLICT (id) DO UPDATE SET updated = current_timestamp();
Maybe store both the current timestamp and the timestamp sent from visus as well? It will show lag that may be useful when diagnosing issues.
-- SPDX-License-Identifier: Apache-2.0

-- Upsert the current node's heartbeat timestamp.
INSERT INTO _visus.node (id, updated)
Why is this file in this commit? Is it related to the lease? It doesn't look like it.