
server, config: support updating leader lease online #10631

Open
JmPotato wants to merge 1 commit into tikv:master from JmPotato:bob/leader-lease-online-config

Conversation

@JmPotato
Member

@JmPotato JmPotato commented Apr 29, 2026

What problem does this PR solve?

Issue Number: ref #10630

What is changed and how does it work?

Support updating the PD leader lease timeout through POST /pd/api/v1/config with the top-level lease field.
Store the effective leader lease in PersistOptions and persist it with the existing config blob.
Reload persisted config before campaigning so the next PD leader campaign uses the runtime lease value.
Keep existing toml startup behavior unchanged and reject only non-positive API values.
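
As a rough illustration of the request described above, the following Go sketch builds a `POST /pd/api/v1/config` call with a top-level `lease` field. The helper name `buildLeaseRequest`, the `leaseUpdate` type, and the PD address are assumptions made for this example; only the endpoint path, the `lease` key, and the rejection of non-positive values come from the PR description.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// leaseUpdate mirrors the request body described in the PR: a single
// top-level "lease" key. The type name is hypothetical.
type leaseUpdate struct {
	Lease int64 `json:"lease"`
}

// buildLeaseRequest (hypothetical helper) prepares a POST to
// /pd/api/v1/config. It rejects non-positive values on the client side,
// mirroring the server-side rule stated in the PR description.
func buildLeaseRequest(pdAddr string, lease int64) (*http.Request, error) {
	if lease <= 0 {
		return nil, errors.New("lease must be positive")
	}
	body, err := json.Marshal(leaseUpdate{Lease: lease})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, pdAddr+"/pd/api/v1/config", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// The address is a placeholder; pass the request to http.DefaultClient.Do
	// against a real PD endpoint to actually send it.
	req, err := buildLeaseRequest("http://127.0.0.1:2379", 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Per the release note, a successful update would only take effect on the next PD leader campaign.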

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

Code changes

Side effects

  • Increased code complexity

Related changes

Manual test

  • go test ./server/config -count=1
  • go test -tags without_dashboard ./tests/server/api -run TestLeaderLeaseConfigAPI -count=1
  • go test -tags without_dashboard ./server/... -count=0
  • make build
  • git diff --check

Release note

Support updating the PD leader lease timeout online through `POST /pd/api/v1/config` with the `lease` field. The new value takes effect on the next PD leader campaign; use leader transfer or resign to apply it immediately.

Summary by CodeRabbit

  • New Features

    • Leader lease is now persisted and surfaced by the config API; server reads/uses the persisted lease and SetLeaderLease persists and validates positive values. Reloading can update only the lease without disturbing other persisted options.
  • Bug Fixes

    • Reload ignores non‑positive persisted lease values to avoid overwriting a valid in‑memory lease.
  • Tests

    • Added end‑to‑end and unit tests for API, persistence, reload semantics, and validation/error handling.

@ti-chi-bot ti-chi-bot Bot added the release-note and dco-signoff: yes labels Apr 29, 2026
@ti-chi-bot
Contributor

ti-chi-bot Bot commented Apr 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign okjiang for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Apr 29, 2026

📝 Walkthrough


Adds persisted leader-lease state and API handling: a new top-level lease config key is routed through the config API to update persisted PersistOptions.leaderLease; server reads the persisted lease for GetConfig and leader campaigning, validates (>0), and exposes SetLeaderLease for runtime updates.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Persist Options: `server/config/persist_options.go` | Add leaderLease atomic field, GetLeaderLease/SetLeaderLease, IsValidLeaderLease, persist LeaderLease into saved Config, add ReloadLeaderLease and centralize persisted-config loading. |
| Server core: `server/server.go` | Switch lease reads to persistOptions; add SetLeaderLease(lease int64) error that validates, updates persist options, persists with rollback on error; reload leader lease before campaigning and use persisted value for campaigning. |
| Config API: `server/api/config.go` | updateConfig recognizes top-level "lease" and delegates to updateLeaderLease, uses jsonutil.AddKeyValue, returns if the key is missing or jsonutil errors, and calls svr.SetLeaderLease on change. |
| Config tests: `server/config/config_test.go` | Add tests validating persistence and reload semantics for LeaderLease, including persistence via direct JSON lease, via PersistOptions.SetLeaderLease, ReloadLeaderLease behavior, and ignoring non-positive persisted values. |
| API tests: `tests/server/api/api_test.go` | Add TestLeaderLeaseConfigAPI to assert GET exposes numeric lease, POST updates persisted lease, and invalid inputs (0, -1) return 400 without mutating persisted value. |
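
To make the Persist Options row more concrete, here is a minimal Go sketch of an atomically stored lease with a validity check on reload. The `leaseOptions` type is a hypothetical, stripped-down stand-in for `PersistOptions`; the method names follow the summary above, but their signatures and bodies are assumptions, not the PR's actual code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// leaseOptions is a hypothetical stand-in for PersistOptions that holds
// only the leader lease.
type leaseOptions struct {
	leaderLease atomic.Int64
}

// isValidLeaderLease reports whether a lease value may be applied.
// The summary states that only positive values are accepted.
func isValidLeaderLease(lease int64) bool {
	return lease > 0
}

// GetLeaderLease returns the current in-memory lease.
func (o *leaseOptions) GetLeaderLease() int64 {
	return o.leaderLease.Load()
}

// SetLeaderLease stores a new in-memory lease without validating it.
func (o *leaseOptions) SetLeaderLease(lease int64) {
	o.leaderLease.Store(lease)
}

// ReloadLeaderLease applies a persisted value only when it is valid, so a
// non-positive persisted value never overwrites the in-memory lease.
// It reports whether the value was applied.
func (o *leaseOptions) ReloadLeaderLease(persisted int64) bool {
	if !isValidLeaderLease(persisted) {
		return false
	}
	o.SetLeaderLease(persisted)
	return true
}

func main() {
	opts := &leaseOptions{}
	opts.SetLeaderLease(3)
	fmt.Println(opts.ReloadLeaderLease(0), opts.GetLeaderLease())
	fmt.Println(opts.ReloadLeaderLease(5), opts.GetLeaderLease())
}
```

An atomic integer keeps concurrent reads from the campaign path cheap while API-driven writes stay safe without a mutex.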

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as Config API
    participant Server as PD Server
    participant Persist as PersistOptions
    participant Storage as Config Storage

    Client->>API: POST /pd/api/v1/config {"lease": 3000}
    API->>API: Detect top-level "lease" key
    API->>Server: SetLeaderLease(3000)
    Server->>Server: Validate lease > 0
    Server->>Persist: SetLeaderLease(3000)
    Persist->>Persist: Update atomic leaderLease
    Server->>Storage: Persist updated config
    Storage-->>Server: OK
    Server-->>API: 200 OK
    API-->>Client: 200 OK

    Note over Client,Storage: On restart / reload
    Server->>Storage: Load persisted config
    Storage-->>Server: Config{..., lease:3000}
    Server->>Persist: ReloadLeaderLease(...)
    Persist->>Persist: IsValidLeaderLease -> true
    Persist-->>Server: leaderLease=3000
    Server->>Server: Use persisted leaderLease for campaigning
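
The update path in the diagram (validate, update in memory, persist, roll back on failure) could look roughly like the Go sketch below. The `configStore` interface, the `memStore` fake, and the `server` struct are hypothetical simplifications introduced for illustration; the real `SetLeaderLease` in `server/server.go` may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// configStore is a hypothetical interface standing in for config storage.
type configStore interface {
	SaveLease(lease int64) error
}

// server is a hypothetical, minimal stand-in for the PD server.
type server struct {
	lease atomic.Int64
	store configStore
}

// SetLeaderLease validates the value, updates the in-memory lease, then
// persists it. If persisting fails, the previous value is restored so that
// memory and storage do not diverge.
func (s *server) SetLeaderLease(lease int64) error {
	if lease <= 0 {
		return fmt.Errorf("invalid leader lease %d: must be positive", lease)
	}
	old := s.lease.Load()
	s.lease.Store(lease)
	if err := s.store.SaveLease(lease); err != nil {
		s.lease.Store(old) // roll back the in-memory value
		return err
	}
	return nil
}

// memStore is an in-memory fake that can be told to fail.
type memStore struct {
	saved int64
	fail  bool
}

func (m *memStore) SaveLease(lease int64) error {
	if m.fail {
		return errors.New("storage unavailable")
	}
	m.saved = lease
	return nil
}

func main() {
	store := &memStore{}
	s := &server{store: store}
	s.lease.Store(3)

	fmt.Println(s.SetLeaderLease(5), s.lease.Load())
	store.fail = true
	fmt.Println(s.SetLeaderLease(9), s.lease.Load())
}
```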

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Suggested labels

size/M

Suggested reviewers

  • lhy1024
  • rleungx
  • bufferflies

Poem

🐰 I nibbled bytes beneath the moon,
A lease that hops and lasts quite soon,
Atoms hold the leader's span,
API nudges server's plan,
Storage hums — leadership resumes.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: enabling online updates to the PD leader lease through the config API. |
| Description check | ✅ Passed | The description addresses the problem statement, explains changes clearly, documents API/config/persistence impacts, includes a release note, and lists comprehensive test coverage including manual test commands. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@ti-chi-bot ti-chi-bot Bot added the size/L label Apr 29, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/config/persist_options.go`:
- Around line 829-831: Reload currently treats a default-initialized
cfg.LeaderLease as a present, valid value and overwrites any startup lease; fix
this by making the persisted presence explicit (either change cfg.LeaderLease to
a pointer type or add a boolean flag set by LoadConfig that indicates the field
was present) and only call IsValidLeaderLease and o.SetLeaderLease when that
presence indicator shows the field existed in the persisted blob; apply the same
change/guard for the other lease-related block at the 838-840 region so you only
apply persisted leases if the field was actually present in the loaded config.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b9f77e78-96a8-454f-a2b0-f0b11648ce41

📥 Commits

Reviewing files that changed from the base of the PR and between f941a7e and c8a100b.

📒 Files selected for processing (5)
  • server/api/config.go
  • server/config/config_test.go
  • server/config/persist_options.go
  • server/server.go
  • tests/server/api/api_test.go

Comment thread server/api/config.go Outdated
Comment thread server/config/persist_options.go
@JmPotato JmPotato force-pushed the bob/leader-lease-online-config branch from c8a100b to f66c676 Compare April 29, 2026 04:56

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (2)
server/config/persist_options.go (1)

829-831: ⚠️ Potential issue | 🟠 Major

Do not apply LeaderLease unless the persisted lease field actually exists.

On Line 829, IsValidLeaderLease(cfg.LeaderLease) is not enough to distinguish “field missing” vs “field present.” For older stored blobs without lease, reload can still overwrite a custom startup lease.

💡 Suggested fix (presence check before applying)
 func (o *PersistOptions) Reload(storage endpoint.ConfigStorage) error {
 	cfg := &persistedConfig{Config: &Config{}}
+	leaseProbe := struct {
+		LeaderLease *int64 `json:"lease"`
+	}{}
 	// Pass nil to initialize cfg to default values (all items undefined)
 	if err := cfg.Adjust(nil, true); err != nil {
 		return err
 	}
 
 	isExist, err := storage.LoadConfig(cfg)
 	if err != nil {
 		return err
 	}
+	_, _ = storage.LoadConfig(&leaseProbe)
 	adjustScheduleCfg(&cfg.Schedule)
 	...
 	if isExist {
 		...
-		if IsValidLeaderLease(cfg.LeaderLease) {
-			o.SetLeaderLease(cfg.LeaderLease)
+		if leaseProbe.LeaderLease != nil && IsValidLeaderLease(*leaseProbe.LeaderLease) {
+			o.SetLeaderLease(*leaseProbe.LeaderLease)
 		}
 		...
 	}
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/config/persist_options.go` around lines 829 - 831, The current code
calls o.SetLeaderLease when IsValidLeaderLease(cfg.LeaderLease) is true, which
overwrites a startup lease even if the persisted blob lacked a `lease` field;
change the condition to require the persisted field's presence as well (e.g.,
check the config wrapper/flag that indicates the `lease` field was present or
use a nil/presence check on cfg.LeaderLease) before calling o.SetLeaderLease;
update the if around IsValidLeaderLease(cfg.LeaderLease) to something like "if
cfg.HasLeaderLease() && IsValidLeaderLease(cfg.LeaderLease) {
o.SetLeaderLease(cfg.LeaderLease) }" (replace HasLeaderLease with the actual
presence indicator in the config struct).
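
The presence check suggested in this comment relies on a standard `encoding/json` behavior: a pointer field stays `nil` when its key is absent from the input. The sketch below demonstrates that in isolation. `leaseProbe` follows the name used in the suggested fix, while `resolveLease` and the sample blobs are hypothetical and not part of the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// leaseProbe uses a pointer so that a missing "lease" key decodes to nil
// instead of the zero value.
type leaseProbe struct {
	LeaderLease *int64 `json:"lease"`
}

// resolveLease (hypothetical helper) returns the persisted lease when the
// key is present and positive; otherwise it keeps the current value.
func resolveLease(blob []byte, current int64) (int64, error) {
	var probe leaseProbe
	if err := json.Unmarshal(blob, &probe); err != nil {
		return current, err
	}
	if probe.LeaderLease == nil || *probe.LeaderLease <= 0 {
		return current, nil
	}
	return *probe.LeaderLease, nil
}

func main() {
	// 7 plays the role of a custom startup lease.
	for _, blob := range []string{`{}`, `{"lease":0}`, `{"lease":5}`} {
		lease, err := resolveLease([]byte(blob), 7)
		fmt.Println(blob, lease, err)
	}
}
```

With this shape, an older stored blob without `lease` leaves the startup value untouched, which is the backward-compatibility case the comment is concerned about.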
server/api/config.go (1)

218-219: ⚠️ Potential issue | 🟠 Major

lease is updateable, but follower GET /config can still serve stale lease.

With Line 218 enabling dynamic lease, follower reads need to merge lease from leader too. Today follower GetConfig only syncs schedule/replication (Lines 81-82), so lease may lag until reload/leadership change.

💡 Suggested fix in follower merge path
 	mergedCfg := localCfg
 	mergedCfg.Replication = leaderCfg.Replication
 	mergedCfg.Schedule = leaderCfg.Schedule
+	mergedCfg.LeaderLease = leaderCfg.LeaderLease
 	h.rd.JSON(w, http.StatusOK, mergedCfg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/api/config.go` around lines 218 - 219, GetConfig on followers
currently syncs only schedule/replication and can return a stale lease; update
the follower merge path to include the leader's lease value as well. In the
follower GET /config handling (where schedule/replication are merged) add logic
to fetch and merge the leader's lease into the response so that the value
updated via updateLeaderLease is reflected to followers; ensure you reuse the
same merge/order semantics as schedule/replication so updateLeaderLease (case
"lease") changes propagate immediately to follower GetConfig responses.
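
A self-contained sketch of the merge this comment asks for is shown below. The `config` struct and `mergeFollowerConfig` function are hypothetical simplifications (the real fields are structured config types, not strings); only the three overlaid field names come from the suggested fix.

```go
package main

import "fmt"

// config is a hypothetical, flattened stand-in for the PD server config.
type config struct {
	Name        string
	LeaderLease int64
	Schedule    string
	Replication string
}

// mergeFollowerConfig starts from the follower's local config and overlays
// the fields that are owned by the leader, now including the lease.
func mergeFollowerConfig(local, leader config) config {
	merged := local
	merged.Replication = leader.Replication
	merged.Schedule = leader.Schedule
	merged.LeaderLease = leader.LeaderLease
	return merged
}

func main() {
	local := config{Name: "pd-2", LeaderLease: 3, Schedule: "old", Replication: "old"}
	leader := config{Name: "pd-1", LeaderLease: 5, Schedule: "new", Replication: "new"}
	fmt.Printf("%+v\n", mergeFollowerConfig(local, leader))
}
```

Node-local fields such as the name stay with the follower, while the lease follows the leader's runtime value.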
🧹 Nitpick comments (1)
server/config/config_test.go (1)

131-148: Add one backward-compat test for missing lease key.

Please add a case where stored config omits lease entirely and assert reload keeps the in-memory startup lease unchanged.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/config/config_test.go` around lines 131 - 148, Extend
TestLeaderLeaseReloadIgnoresNonPositivePersistedValues to include a case where
the persisted config omits the lease key entirely: call storage.SaveConfig with
an empty value (e.g., struct{}{} or a map with no "lease" key), then run the
same reload flow (NewConfig, cfg.Adjust, set cfg.LeaderLease to 7,
NewPersistOptions, opt.Reload) and assert opt.GetLeaderLease() remains 7;
reference TestLeaderLeaseReloadIgnoresNonPositivePersistedValues,
storage.SaveConfig, NewConfig, NewPersistOptions, opt.Reload, and
opt.GetLeaderLease to locate the code to change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5e0a21ab-1f69-44ae-a4af-914af870c79f

📥 Commits

Reviewing files that changed from the base of the PR and between c8a100b and f66c676.

📒 Files selected for processing (5)
  • server/api/config.go
  • server/config/config_test.go
  • server/config/persist_options.go
  • server/server.go
  • tests/server/api/api_test.go

Signed-off-by: JmPotato <github@ipotato.me>
@JmPotato JmPotato force-pushed the bob/leader-lease-online-config branch from f66c676 to 6168127 Compare April 29, 2026 06:03

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
server/server.go (1)

1013-1025: ⚠️ Potential issue | 🟠 Major

Keep lease in the follower merge path.

GetConfig() still only overlays Schedule and Replication from leaderCfg. After a runtime lease update, a follower can keep serving an old lease until its local cache reloads, so /pd/api/v1/config can disagree with the leader. Please copy LeaderLease from leaderCfg here too.

Suggested patch
 mergedCfg := localCfg
 mergedCfg.Replication = leaderCfg.Replication
 mergedCfg.Schedule = leaderCfg.Schedule
+mergedCfg.LeaderLease = leaderCfg.LeaderLease
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/server.go` around lines 1013 - 1025, GetConfig currently doesn't copy
the leader's runtime lease into the returned cfg causing followers to serve
stale LeaderLease; in GetConfig ensure you overlay LeaderLease from the
persisted leader config (use the value returned by persistOptions.GetLeaderLease
/ the leaderCfg's lease) into cfg so the returned config reflects the leader's
lease update (update the LeaderLease field in GetConfig to copy/clone the
leader's lease value rather than leaving a stale local one).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4124abaa-3025-491c-8fe5-0b2d872988cd

📥 Commits

Reviewing files that changed from the base of the PR and between f66c676 and 6168127.

📒 Files selected for processing (5)
  • server/api/config.go
  • server/config/config_test.go
  • server/config/persist_options.go
  • server/server.go
  • tests/server/api/api_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/config/persist_options.go

@ti-chi-bot
Contributor

ti-chi-bot Bot commented Apr 29, 2026

@JmPotato: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-unit-test-next-gen-3 | 6168127 | link | true | /test pull-unit-test-next-gen-3 |
| pull-integration-realcluster-test | 6168127 | link | true | /test pull-integration-realcluster-test |



Labels

  • dco-signoff: yes (indicates the PR's author has signed the dco)
  • release-note (denotes a PR that will be considered when it comes time to generate release notes)
  • size/L (denotes a PR that changes 100-499 lines, ignoring generated files)


1 participant