mcs: scheduling mcs support gctuner #10212

Open
bufferflies wants to merge 10 commits into tikv:master from bufferflies:feat/gc
Conversation

@bufferflies
Contributor

@bufferflies bufferflies commented Feb 4, 2026

What problem does this PR solve?

Issue Number: Ref #10213

What is changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

Summary by CodeRabbit

  • New Features
    • Configurable GC tuner with enable flag, GC threshold, server memory limit, and memory-trigger settings.
    • Dynamic GC tuner initialization and live updates from persisted server config; exposed across watcher, server APIs, and cluster components.
  • Bug Fixes
    • Improved thread-safety and stability for GC tuner initialization, updates, and stopping behavior.
  • Tests
    • Added unit and integration tests for GC tuner init, config watch, and dynamic updates.

Signed-off-by: bufferflies <1045931706@qq.com>
@ti-chi-bot
Contributor

ti-chi-bot Bot commented Feb 4, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added labels Feb 4, 2026: do-not-merge/needs-linked-issue; do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress); release-note-none (denotes a PR that doesn't merit a release note); dco-signoff: yes (indicates the PR's author has signed the DCO).
@coderabbitai

coderabbitai Bot commented Feb 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Add a thread-safe GC tuner (Config/State API) with Init/Update/Stop, wire it into server config, watcher, and cluster flows for dynamic tuning and memory-limit handling, and add unit/integration tests for config watch and tuner behavior.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| GC tuner core | pkg/gctuner/tuner.go, pkg/gctuner/tuner_test.go | Add exported Config and State types; thread-safe InitGCTuner, UpdateIfNeeded, Stop; mutex-protected tuning updates, memory-limit calculations, logging, and unit tests covering init/update/stop scenarios. |
| Server config surface | pkg/schedule/config/config.go | Add public ServerConfig type with GC-related fields and Clone() method. |
| Persisted config & wiring | pkg/mcs/scheduling/server/config/config.go, pkg/mcs/scheduling/server/config/watcher.go, pkg/mcs/scheduling/server/server.go, pkg/mcs/scheduling/server/apis/v1/api.go | Persist Server config, expose GetServerConfig() and GetGCTunerConfig(), initialize/update/stop gctuner from watcher and include Server config in API/Server GetConfig flows. |
| Cluster integration | server/cluster/cluster.go | Replace ad-hoc tuning with InitGCTuner(totalMem, c.getGCTunerConfig()), add getGCTunerConfig(), call UpdateIfNeeded() regularly, and stop tuner on shutdown. |
| Integration tests | tests/integrations/mcs/scheduling/config_test.go | Add TestGCTunerConfigWatch to validate watcher observes Server GC-related config changes and triggers updates. |

Sequence Diagram(s)

sequenceDiagram
  participant PersistConfig
  participant Watcher
  participant Mem as memory
  participant GCT as gctuner.State
  participant Cluster as RaftCluster

  PersistConfig->>Watcher: persist initial ServerConfig
  Watcher->>Mem: read total memory
  Watcher->>GCT: InitGCTuner(totalMem, PersistConfig.GetGCTunerConfig())
  GCT-->>Watcher: returned State

  Note right of PersistConfig: on config change
  PersistConfig->>Watcher: new ServerConfig
  Watcher->>GCT: state.UpdateIfNeeded(newConfig)
  alt config disables tuner
    GCT->>GCT: Stop() (disable tuning / set GC to default)
  else config updates thresholds/limits
    GCT->>Mem: compute trigger bytes
    GCT->>GCT: apply updated thresholds / memory limits
  end

  Cluster->>GCT: periodic UpdateIfNeeded(c.getGCTunerConfig())
  Cluster-->>GCT: Stop() on shutdown
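
The update path in the diagram can be sketched in Go as follows. This is a minimal illustration, not the PR's code: the `Config` and `State` types below are hypothetical, trimmed-down stand-ins, and the real tuner also adjusts the Go runtime, which is omitted here.

```go
package main

import (
	"fmt"
	"sync"
)

// Config is a hypothetical, trimmed-down stand-in for the tuner config.
type Config struct {
	Enable            bool
	ServerMemoryLimit float64 // fraction of total memory
	GCTunerThreshold  float64 // fraction of the memory limit
}

// State is a hypothetical stand-in for gctuner.State.
type State struct {
	mu             sync.Mutex
	totalMem       uint64
	enabled        bool
	thresholdBytes uint64
}

// UpdateIfNeeded applies cfg under the lock and reports whether anything changed.
func (s *State) UpdateIfNeeded(cfg Config) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !cfg.Enable {
		// Disabling is only a change if the tuner was enabled before.
		changed := s.enabled
		s.enabled, s.thresholdBytes = false, 0
		return changed
	}
	limit := uint64(float64(s.totalMem) * cfg.ServerMemoryLimit)
	threshold := uint64(float64(limit) * cfg.GCTunerThreshold)
	if s.enabled && threshold == s.thresholdBytes {
		return false
	}
	s.enabled, s.thresholdBytes = true, threshold
	return true
}

func main() {
	s := &State{totalMem: 1 << 30}
	on := Config{Enable: true, ServerMemoryLimit: 0.5, GCTunerThreshold: 0.5}
	fmt.Println(s.UpdateIfNeeded(on))       // first apply changes state
	fmt.Println(s.UpdateIfNeeded(on))       // same config is a no-op
	fmt.Println(s.UpdateIfNeeded(Config{})) // disabling changes state
}
```

Holding one mutex for the whole read-compare-write sequence is what lets both the watcher and the periodic cluster call invoke the same method safely.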

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

size/XXL

Suggested reviewers

  • lhy1024
  • JmPotato

Poem

"I hopped through code at break of dawn,
Tuned thresholds where old bugs yawn.
Mutex snug and memory light,
GC hums softly through the night.
— a rabbit, pleased with every byte 🐇"

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is incomplete: missing a formal 'Issue Number: Close #xxx' format (uses 'Ref' instead), empty commit-message block, and no 'What is changed and how does it work?' details despite significant API and configuration changes. | Add formal issue reference, provide detailed explanation of GC tuner integration changes and rationale, and fill the commit-message block for the final commit history. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 53.85% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title references the main change (GC tuner support in scheduling MCS) but uses abbreviated terminology ('mcs support gctuner') that could be clearer and more specific. |

@ti-chi-bot ti-chi-bot Bot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) Feb 4, 2026
@bufferflies bufferflies marked this pull request as ready for review February 4, 2026 08:33
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress and do-not-merge/needs-linked-issue labels Feb 4, 2026
@codecov

codecov Bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 93.51852% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.94%. Comparing base (b668f43) to head (3457ebf).
⚠️ Report is 51 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10212      +/-   ##
==========================================
+ Coverage   78.59%   78.94%   +0.35%     
==========================================
  Files         520      529       +9     
  Lines       70014    71162    +1148     
==========================================
+ Hits        55028    56180    +1152     
+ Misses      11008    10972      -36     
- Partials     3978     4010      +32     
Flag Coverage Δ
unittests 78.94% <93.51%> (+0.35%) ⬆️


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@pkg/gctuner/tuner_test.go`:
- Around line 129-153: Tests mutate package globals (EnableGOGCTuner,
memory.ServerMemoryLimit, GlobalMemoryLimitTuner/globalTuner via Tuning) and do
not restore them; update TestInitGCTuner, TestInitGCTunerWithZeroMemoryLimit,
and TestUpdateIfNeeded to capture current values at test start (e.g., prevEnable
:= EnableGOGCTuner.Load(), prevServerLimit := memory.ServerMemoryLimit,
prevGlobal := GlobalMemoryLimitTuner) and register t.Cleanup() to restore those
values (set EnableGOGCTuner back, reset memory.ServerMemoryLimit, and call
Tuning(prevGlobal) or reassign GlobalMemoryLimitTuner appropriately) so global
state is returned after each test.

In `@pkg/gctuner/tuner.go`:
- Around line 271-283: The update block currently only triggers when
newMemoryLimitBytes or newMemoryLimitGCTriggerBytes change, so changes to
newMemoryLimitGCTriggerRatio can be ignored due to rounding; modify the
conditional that gates the update to also compare newMemoryLimitGCTriggerRatio
against s.MemoryLimitGCTriggerRatio (and/or include an epsilon for float
comparison) so that when the GC trigger ratio changes you still set
s.MemoryLimitGCTriggerRatio, call GlobalMemoryLimitTuner.SetPercentage and
UpdateMemoryLimit; update references in this block to use
newMemoryLimitGCTriggerRatio, s.MemoryLimitGCTriggerRatio,
GlobalMemoryLimitTuner, memory.ServerMemoryLimit.Store and ensure updated is set
when the ratio-only change occurs.
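
A minimal sketch of the ratio-aware gate described above, written against hypothetical types rather than the PR's `State`. The tolerance `ratioEpsilon` and the `limits` struct are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"math"
)

// ratioEpsilon is an assumed tolerance for comparing ratios.
const ratioEpsilon = 1e-9

// limits is a hypothetical snapshot of the values the update gate compares.
type limits struct {
	limitBytes   uint64
	triggerRatio float64
}

// triggerBytes derives the byte value; truncation can hide small ratio changes.
func (l limits) triggerBytes() uint64 {
	return uint64(float64(l.limitBytes) * l.triggerRatio)
}

// needsUpdate also compares the ratio itself, not only the derived bytes.
func needsUpdate(prev, next limits) bool {
	return prev.limitBytes != next.limitBytes ||
		prev.triggerBytes() != next.triggerBytes() ||
		math.Abs(prev.triggerRatio-next.triggerRatio) > ratioEpsilon
}

func main() {
	prev := limits{limitBytes: 0, triggerRatio: 0.7}
	next := limits{limitBytes: 0, triggerRatio: 0.8}
	fmt.Println(prev.triggerBytes() == next.triggerBytes()) // derived bytes are identical
	fmt.Println(needsUpdate(prev, next))                     // the ratio check still fires
}
```

With a zero memory limit the derived trigger bytes are 0 for every ratio, so a gate that compares only byte values would miss the change.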

In `@pkg/mcs/scheduling/server/config/config.go`:
- Around line 149-152: The ServerConfig adjustment is using the wrong metadata
key: replace the call c.Server.Adjust(configMetaData.Child("server")) with
c.Server.Adjust(configMetaData.Child("pd-server")) so the Adjust on ServerConfig
(c.Server) uses the TOML tag name `pd-server` and preserves user-provided values
instead of treating them as undefined; update the metadata key where
c.Server.Adjust is invoked to "pd-server".

In `@pkg/mcs/scheduling/server/config/watcher.go`:
- Around line 92-97: Replace the plain string concatenation error creation with
errors.Wrap to preserve context: after calling memory.MemTotal() in the watcher
initialization, keep the cancel() call but return errors.Wrap(err, "fail to get
total memory") instead of errors.New("fail to get total memory: "+err.Error());
this uses github.com/pingcap/errors and preserves the original error and stack
trace while keeping the same control flow around memory.MemTotal() and cancel().

Comment thread pkg/gctuner/tuner_test.go
Comment thread pkg/gctuner/tuner.go
Comment thread pkg/mcs/scheduling/server/config/config.go Outdated
Comment thread pkg/mcs/scheduling/server/config/watcher.go
Signed-off-by: bufferflies <1045931706@qq.com>
Member

@okJiang okJiang left a comment

// Tuning sets the threshold of heap which will be respect by gogc tuner.
// When Tuning, the env GOGC will not be take effect.
// threshold: disable tuning if threshold == 0
func Tuning(threshold uint64) {
	// disable gc tuner if percent is zero
	if t := globalTuner.Load(); t == nil {
		t1 := newTuner(threshold)
		globalTuner.CompareAndSwap(nil, &t1)
	} else {
		if threshold <= 0 {
			(*t).stop()
			globalTuner.CompareAndSwap(t, nil)
		} else {
			(*t).setThreshold(threshold)
		}
	}
}

The doc comment on Tuning says that threshold == 0 disables tuning, but newTuner(0) still starts the finalizer loop when globalTuner is empty. Also, if CompareAndSwap(nil, &t1) fails during concurrent initialization, the tuner created by the losing goroutine is never stopped (it may survive for a long time and take part in parameter adjustment when tuning is enabled later).

@okJiang
Copy link
Copy Markdown
Member

okJiang commented Feb 5, 2026

Do other microservice components also need to support gctuner?

@rleungx
Copy link
Copy Markdown
Member

rleungx commented Feb 5, 2026

Do other microservice components also need to support gctuner?

We should support it, and we can enable it for the specified case.

@bufferflies
Copy link
Copy Markdown
Contributor Author

// Tuning sets the threshold of heap which will be respect by gogc tuner.
// When Tuning, the env GOGC will not be take effect.
// threshold: disable tuning if threshold == 0
func Tuning(threshold uint64) {
	// disable gc tuner if percent is zero
	if t := globalTuner.Load(); t == nil {
		t1 := newTuner(threshold)
		globalTuner.CompareAndSwap(nil, &t1)
	} else {
		if threshold <= 0 {
			(*t).stop()
			globalTuner.CompareAndSwap(t, nil)
		} else {
			(*t).setThreshold(threshold)
		}
	}
}

The doc comment on Tuning says that threshold == 0 disables tuning, but newTuner(0) still starts the finalizer loop when globalTuner is empty. Also, if CompareAndSwap(nil, &t1) fails during concurrent initialization, the tuner created by the losing goroutine is never stopped (it may survive for a long time and take part in parameter adjustment when tuning is enabled later).

Yes, this bug in Tuning should be fixed.

Signed-off-by: bufferflies <1045931706@qq.com>
@bufferflies
Contributor Author

// Tuning sets the threshold of heap which will be respect by gogc tuner.
// When Tuning, the env GOGC will not be take effect.
// threshold: disable tuning if threshold == 0
func Tuning(threshold uint64) {
	// disable gc tuner if percent is zero
	if t := globalTuner.Load(); t == nil {
		t1 := newTuner(threshold)
		globalTuner.CompareAndSwap(nil, &t1)
	} else {
		if threshold <= 0 {
			(*t).stop()
			globalTuner.CompareAndSwap(t, nil)
		} else {
			(*t).setThreshold(threshold)
		}
	}
}

The doc comment on Tuning says that threshold == 0 disables tuning, but newTuner(0) still starts the finalizer loop when globalTuner is empty. Also, if CompareAndSwap(nil, &t1) fails during concurrent initialization, the tuner created by the losing goroutine is never stopped (it may survive for a long time and take part in parameter adjustment when tuning is enabled later).

Yes, I have fixed this bug.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/gctuner/tuner.go (1)

67-82: ⚠️ Potential issue | 🟠 Major

Leaked tuner on concurrent CompareAndSwap failure.

When two goroutines call Tuning(threshold) concurrently with threshold > 0 and globalTuner is nil, both create a newTuner (which starts a finalizer loop). The loser of CompareAndSwap silently discards its tuner without calling stop(), leaking a goroutine. This was also identified by @okJiang in the PR discussion.

🔧 Proposed fix — stop the tuner if CAS fails
 	if t := globalTuner.Load(); t == nil {
 		// init gc tuner only when threshold > 0, otherwise do nothing
 		if threshold > 0 {
 			t1 := newTuner(threshold)
-			globalTuner.CompareAndSwap(nil, &t1)
+			if !globalTuner.CompareAndSwap(nil, &t1) {
+				t1.stop()
+				// Another goroutine won; update its threshold instead.
+				if t2 := globalTuner.Load(); t2 != nil {
+					(*t2).setThreshold(threshold)
+				}
+			}
 		}
🧹 Nitpick comments (2)
pkg/gctuner/tuner.go (2)

249-259: Duplicated computation between InitGCTuner and UpdateIfNeeded.

Lines 253-259 replicate the same derivation logic as lines 221-227 in InitGCTuner. Consider extracting a helper (e.g., computeState(totalMem uint64, cfg *Config) (...)) to keep the two paths in sync and reduce maintenance burden.

♻️ Sketch
func computeDerivedState(totalMem uint64, cfg *Config) (memLimitBytes, gcThresholdBytes uint64, gcTriggerRatio float64, gcTriggerBytes uint64) {
	memLimitBytes = uint64(float64(totalMem) * cfg.ServerMemoryLimit)
	gcThresholdBytes = uint64(float64(memLimitBytes) * cfg.GCTunerThreshold)
	if memLimitBytes == 0 {
		gcThresholdBytes = uint64(float64(totalMem) * cfg.GCTunerThreshold)
	}
	gcTriggerRatio = cfg.ServerMemoryLimitGCTrigger
	gcTriggerBytes = uint64(float64(memLimitBytes) * gcTriggerRatio)
	return
}

Then call it from both InitGCTuner and UpdateIfNeeded.
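
As a worked example of the derivation, the following runnable version repeats the sketch with a hypothetical `Config` struct (only the three field names come from the sketch itself) and evaluates it for 1 GiB of total memory. The ratios are chosen as exact binary fractions so that the float-to-integer conversions are exact.

```go
package main

import "fmt"

// Config mirrors the fields used by the review sketch; the struct itself is an assumption.
type Config struct {
	ServerMemoryLimit          float64
	GCTunerThreshold           float64
	ServerMemoryLimitGCTrigger float64
}

// computeDerivedState derives byte values from ratios, as in the sketch.
func computeDerivedState(totalMem uint64, cfg *Config) (memLimitBytes, gcThresholdBytes uint64, gcTriggerRatio float64, gcTriggerBytes uint64) {
	memLimitBytes = uint64(float64(totalMem) * cfg.ServerMemoryLimit)
	gcThresholdBytes = uint64(float64(memLimitBytes) * cfg.GCTunerThreshold)
	if memLimitBytes == 0 {
		// No memory limit configured: fall back to total memory.
		gcThresholdBytes = uint64(float64(totalMem) * cfg.GCTunerThreshold)
	}
	gcTriggerRatio = cfg.ServerMemoryLimitGCTrigger
	gcTriggerBytes = uint64(float64(memLimitBytes) * gcTriggerRatio)
	return
}

func main() {
	total := uint64(1 << 30) // 1 GiB
	cfg := &Config{ServerMemoryLimit: 0.5, GCTunerThreshold: 0.5, ServerMemoryLimitGCTrigger: 0.75}
	limit, threshold, _, trigger := computeDerivedState(total, cfg)
	fmt.Println(limit, threshold, trigger)

	cfg.ServerMemoryLimit = 0 // exercise the fallback branch
	limit, threshold, _, trigger = computeDerivedState(total, cfg)
	fmt.Println(limit, threshold, trigger)
}
```

With the limit at 0.5 the values are 536870912, 268435456, and 402653184 bytes. With the limit at 0 the threshold falls back to half of total memory (536870912) while the trigger bytes stay 0, which is the asymmetry the next nitpick points out.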


218-227: MemoryLimitGCTriggerBytes is never used downstream, so the inconsistency is a code-cleanliness issue, not a functional bug.

While the inconsistency is real—GCThresholdBytes has a fallback to totalMem when MemoryLimitBytes == 0 (lines 223-225), but MemoryLimitGCTriggerBytes is computed as 0 * ratio = 0 with no fallback (line 227)—the field itself is never read anywhere in the codebase. The actual memory limit tuning uses MemoryLimitGCTriggerRatio (the ratio) and ServerMemoryLimit directly; GlobalMemoryLimitTuner.calcMemoryLimit() has its own fallback logic that returns math.MaxInt64 when the computed limit is 0.

The same pattern exists in UpdateIfNeeded at line 259. Since MemoryLimitGCTriggerBytes is private and unused, consider removing it entirely or adding a fallback for consistency.

@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm label (indicates a PR needs 1 more LGTM) Feb 6, 2026
@bufferflies
Contributor Author

/ping @lhy1024

Comment thread pkg/gctuner/tuner.go
Comment thread pkg/mcs/scheduling/server/config/config.go Outdated

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/gctuner/tuner.go`:
- Around line 68-88: Concurrent calls to Tuning can allocate a newTuner whose
finalizer loop starts immediately and gets leaked if the goroutine loses the
CompareAndSwap on globalTuner; modify Tuning so that newTuner does not start its
finalizer until after globalTuner.CompareAndSwap(nil, &t1) returns true, or if
you must create the tuner before the CAS then call (*t1).stop() (or t1.stop())
when CompareAndSwap fails; specifically change the flow around
newTuner(threshold) + globalTuner.CompareAndSwap(nil, &t1) to either defer
starting the finalizer inside newTuner until after CAS success or invoke the
tuner's stop method on the loser path so no running finalizer is left behind,
and ensure code uses the same symbols Tuning, globalTuner, newTuner,
CompareAndSwap, and stop/setThreshold when implementing this fix.

In `@pkg/mcs/scheduling/server/config/config.go`:
- Around line 302-305: PersistConfig.setServerConfig currently stores the
incoming cfg directly, bypassing validation/clamping; call cfg.Adjust(nil) (or
equivalent clamp-only variant) before storing to ensure ServerMemoryLimit and
other fields are validated and defaults applied so downstream users like
GetGCTunerConfig() receive sane values; update PersistConfig.setServerConfig to
run ServerConfig.Adjust() on the provided cfg and then
o.serverConfig.Store(cfg).
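
A small sketch of clamping before storing. Everything here is hypothetical: the bounds, the `clamp` method (standing in for a clamp-only variant of `Adjust`), and the trimmed-down types are assumptions, and the real validation rules live in the PR's `ServerConfig.Adjust`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Assumed bounds for illustration only.
const (
	minServerMemoryLimit = 0.0
	maxServerMemoryLimit = 0.99
)

// ServerConfig is a hypothetical, single-field stand-in.
type ServerConfig struct{ ServerMemoryLimit float64 }

// clamp is a hypothetical clamp-only variant of Adjust.
func (c *ServerConfig) clamp() {
	if c.ServerMemoryLimit < minServerMemoryLimit {
		c.ServerMemoryLimit = minServerMemoryLimit
	} else if c.ServerMemoryLimit > maxServerMemoryLimit {
		c.ServerMemoryLimit = maxServerMemoryLimit
	}
}

// PersistConfig is a hypothetical stand-in holding the validated config.
type PersistConfig struct{ serverConfig atomic.Pointer[ServerConfig] }

// setServerConfig validates first, so readers never observe out-of-range values.
func (o *PersistConfig) setServerConfig(cfg *ServerConfig) {
	cfg.clamp()
	o.serverConfig.Store(cfg)
}

func main() {
	var o PersistConfig
	o.setServerConfig(&ServerConfig{ServerMemoryLimit: 1.5})
	fmt.Println(o.serverConfig.Load().ServerMemoryLimit)
}
```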

📥 Commits

Reviewing files that changed from the base of the PR and between b3dac8b and 33451c927a8cba355e7efdba40692f850c575da5.

📒 Files selected for processing (3)
  • pkg/gctuner/tuner.go
  • pkg/mcs/scheduling/server/config/config.go
  • pkg/mcs/scheduling/server/server.go

Comment thread pkg/gctuner/tuner.go Outdated
Comment thread pkg/mcs/scheduling/server/config/config.go
Signed-off-by: tongjian <1045931706@qq.com>
@bufferflies
Contributor Author

/retest

@bufferflies
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 12, 2026

✅ Actions performed

Full review triggered.

@bufferflies
Contributor Author

/retest


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
pkg/gctuner/tuner_test.go (1)

134-142: ⚠️ Potential issue | 🟡 Minor

Restore the previous process-wide GC settings in all three tests.

These cases still mutate EnableGOGCTuner, memory.ServerMemoryLimit, GlobalMemoryLimitTuner, and the singleton tuner, but only TestInitGCTuner has partial cleanup and the other two still finish with a hard Tuning(0). That can bleed into later tests, and it also clears any pre-existing singleton instead of restoring it. Based on learnings: globalTuner is intentionally process-scoped and should remain active with its last configured parameters even after a watcher or leadership change.

Also applies to: 161-180, 182-234

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gctuner/tuner_test.go` around lines 134 - 142, Tests mutate
process-global state (EnableGOGCTuner, memory.ServerMemoryLimit,
GlobalMemoryLimitTuner and the singleton globalTuner) but only TestInitGCTuner
partially restores state; update all three tests (including those covering lines
~161-180 and ~182-234) to capture and restore the prior values: snapshot
EnableGOGCTuner.Load(), memory.ServerMemoryLimit.Load(),
GlobalMemoryLimitTuner.Load() (or its equivalent), and the current globalTuner
singleton before mutating, then in t.Cleanup restore each saved value and
reassign the original singleton instead of calling Tuning(0); ensure cleanup
uses the exact symbols EnableGOGCTuner, memory.ServerMemoryLimit,
GlobalMemoryLimitTuner and globalTuner so the process-wide GC settings and
singleton are fully restored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/mcs/scheduling/server/config/watcher.go`:
- Around line 76-82: The persistedConfig struct's Server field is a value type
so unmarshalling older JSON that lacks "pd-server" yields a zeroed
sc.ServerConfig and wipes existing runtime config; change persistedConfig.Server
to a pointer (*sc.ServerConfig) and update the code paths that read it (notably
setServerConfig and UpdateIfNeeded) to treat nil as "no persisted
override"—i.e., if persistedConfig.Server == nil then keep the existing server
config unchanged, otherwise apply the persisted values; also update the other
occurrences mentioned (lines ~160-164) to handle the pointer-to-server config
safely.
- Around line 111-113: The code currently calls gctuner.InitGCTuner using
persistConfig.GetGCTunerConfig() during watcher initialization (cw.gcTunerState
= gctuner.InitGCTuner(totalMem, gcCfg)), which uses the bootstrap/local Server
values; instead, remove/skip this early InitGCTuner call and defer
initializing/updating the process-wide global tuner until the first successful
PD config load is applied. Concretely: stop calling gctuner.InitGCTuner in the
watcher startup path (where cw.gcTunerState and totalMem are used) and move the
InitGCTuner invocation into the code path that handles applying a newly loaded
PD config (the function that processes the first loaded persistConfig / applies
config changes), using the GCTuner config from that PD config; keep globalTuner
process-scoped so subsequent watcher restarts or leadership changes do not
overwrite the last applied tuner unless an actual PD config update occurs.
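
The pointer-typed field suggested in the first comment above can be demonstrated with `encoding/json`. The types are hypothetical stand-ins; only the `pd-server` key comes from the PR, and the `server-memory-limit` key and the `applyPersisted` helper are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServerConfig is a hypothetical stand-in; the JSON key is assumed.
type ServerConfig struct {
	ServerMemoryLimit float64 `json:"server-memory-limit"`
}

// persistedConfig uses a pointer so a missing "pd-server" key stays nil.
type persistedConfig struct {
	Server *ServerConfig `json:"pd-server"`
}

// applyPersisted returns the config to use after reading raw persisted JSON.
func applyPersisted(current ServerConfig, raw []byte) (ServerConfig, error) {
	var p persistedConfig
	if err := json.Unmarshal(raw, &p); err != nil {
		return current, err
	}
	if p.Server == nil {
		// Older payloads carry no override: keep the runtime config.
		return current, nil
	}
	return *p.Server, nil
}

func main() {
	current := ServerConfig{ServerMemoryLimit: 0.8}

	kept, _ := applyPersisted(current, []byte(`{}`))
	fmt.Println(kept.ServerMemoryLimit)

	updated, _ := applyPersisted(current, []byte(`{"pd-server":{"server-memory-limit":0.5}}`))
	fmt.Println(updated.ServerMemoryLimit)
}
```

With a value-typed field the first call would yield a zeroed config, because `json.Unmarshal` leaves absent fields at their zero value and the caller cannot tell "absent" from "explicitly zero".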

📥 Commits

Reviewing files that changed from the base of the PR and between 58252c5 and 81267f4.

📒 Files selected for processing (8)
  • pkg/gctuner/tuner.go
  • pkg/gctuner/tuner_test.go
  • pkg/mcs/scheduling/server/config/config.go
  • pkg/mcs/scheduling/server/config/watcher.go
  • pkg/mcs/scheduling/server/server.go
  • pkg/schedule/config/config.go
  • server/cluster/cluster.go
  • tests/integrations/mcs/scheduling/config_test.go

Comment thread pkg/mcs/scheduling/server/config/watcher.go
Comment thread pkg/mcs/scheduling/server/config/watcher.go Outdated
Comment thread pkg/mcs/scheduling/server/server.go Outdated
cfg.Replication = *s.persistConfig.GetReplicationConfig().Clone()
cfg.ClusterVersion = *s.persistConfig.GetClusterVersion()
cfg.Schedule.MaxMergeRegionKeys = cfg.Schedule.GetMaxMergeRegionKeys()
cfg.Server = *s.persistConfig.GetServerConfig()
Member

Shall we also modify the getConfig function for /config?

Contributor Author

No, it just shows the config.

@bufferflies
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 13, 2026

✅ Actions performed

Full review triggered.

Comment thread pkg/mcs/scheduling/server/config/watcher.go Outdated
@bufferflies bufferflies force-pushed the feat/gc branch 2 times, most recently from 92a6121 to 93d2afd Compare March 13, 2026 06:31
Signed-off-by: tongjian <1045931706@qq.com>
@bufferflies
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 13, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
pkg/gctuner/tuner_test.go (1)

161-180: ⚠️ Potential issue | 🟡 Minor

Restore mutated globals in the remaining two tests as well.

TestInitGCTunerWithZeroMemoryLimit and TestUpdateIfNeeded still leave EnableGOGCTuner and memory.ServerMemoryLimit mutated after completion, which can leak state across tests.

♻️ Suggested fix
 func TestInitGCTunerWithZeroMemoryLimit(t *testing.T) {
 	re := require.New(t)
+	prevEnable := EnableGOGCTuner.Load()
+	prevMemLimit := memory.ServerMemoryLimit.Load()
+	t.Cleanup(func() {
+		EnableGOGCTuner.Store(prevEnable)
+		memory.ServerMemoryLimit.Store(prevMemLimit)
+		Tuning(0)
+	})

 	totalMem := uint64(1000000000) // 1GB
@@
-	// Cleanup
-	Tuning(0)
 }

 func TestUpdateIfNeeded(t *testing.T) {
 	re := require.New(t)
+	prevEnable := EnableGOGCTuner.Load()
+	prevMemLimit := memory.ServerMemoryLimit.Load()
+	t.Cleanup(func() {
+		EnableGOGCTuner.Store(prevEnable)
+		memory.ServerMemoryLimit.Store(prevMemLimit)
+		Tuning(0)
+	})

 	totalMem := uint64(1000000000) // 1GB
@@
-	// Cleanup
-	Tuning(0)
 }

Also applies to: 182-234

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gctuner/tuner_test.go` around lines 161 - 180, The two tests
(TestInitGCTunerWithZeroMemoryLimit and TestUpdateIfNeeded) mutate package
globals EnableGOGCTuner and memory.ServerMemoryLimit and do not restore them;
fix each test by saving the original values at the start with Load() (e.g.,
prevEnable := EnableGOGCTuner.Load(); prevMemLimit :=
memory.ServerMemoryLimit.Load()), then register a t.Cleanup that restores them
with Store() (copying or reassigning the atomic variables themselves would not
be safe) so test state is isolated, and keep the existing Tuning(0) cleanup as
needed.
🧹 Nitpick comments (2)
pkg/mcs/scheduling/server/config/config.go (2)

84-85: Align the exported field comment with GoDoc style.

The comment should start with Server to match exported-identifier documentation style.

✍️ Suggested wording
-	// This config is sync with the pd server.
+	// Server is synchronized with the PD server configuration.
 	Server sc.ServerConfig `toml:"pd-server" json:"pd-server"`

As per coding guidelines, **/*.go: “Exported identifiers need GoDoc starting with the name.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/mcs/scheduling/server/config/config.go` around lines 84 - 85, Update the
comment for the exported field Server in the config struct so it follows GoDoc
style by starting with the identifier name "Server"; locate the struct field
Server (type sc.ServerConfig, tags `toml:"pd-server" json:"pd-server"`) and
replace the current comment "// This config is sync with the pd server." with a
GoDoc-style comment that begins "Server ..." and briefly describes the field's
purpose.

205-206: Avoid shared mutable aliasing for persisted server config.

setServerConfig stores the caller-owned pointer directly, while GetServerConfig claims to return a cloned config but returns the same pointer. This makes accidental external mutation possible and breaks the documented contract.

♻️ Suggested fix
 func NewPersistConfig(cfg *Config, ttl *cache.TTLString) *PersistConfig {
 	o := &PersistConfig{}
 	o.SetClusterVersion(&cfg.ClusterVersion)
 	o.schedule.Store(&cfg.Schedule)
 	o.replication.Store(&cfg.Replication)
-	o.serverConfig.Store(&cfg.Server)
+	o.setServerConfig(&cfg.Server)
 	// storeConfig will be fetched from TiKV by PD,
 	// so we just set an empty value here first.
 	o.storeConfig.Store(&sc.StoreConfig{})
@@
 func (o *PersistConfig) setServerConfig(cfg *sc.ServerConfig) {
-	// Some of the fields won't be persisted and watched,
-	o.serverConfig.Store(cfg)
+	if cfg == nil {
+		var empty *sc.ServerConfig
+		o.serverConfig.Store(empty)
+		return
+	}
+	o.serverConfig.Store(cfg.Clone())
 }
 
 // GetServerConfig returns the cloned server configuration.
 func (o *PersistConfig) GetServerConfig() *sc.ServerConfig {
-	return o.serverConfig.Load().(*sc.ServerConfig)
+	cfg := o.serverConfig.Load().(*sc.ServerConfig)
+	if cfg == nil {
+		return nil
+	}
+	return cfg.Clone()
 }

Also applies to: 215-215, 300-308

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/mcs/scheduling/server/config/config.go` around lines 205 - 206,
setServerConfig currently stores a caller-owned pointer into the shared
atomic.Value and GetServerConfig returns that same pointer, causing shared
mutable aliasing; change both to copy-on-write: in setServerConfig store a
deep-cloned copy (not the incoming pointer) into serverConfig, and in
GetServerConfig return a cloned/deep-copied instance (not the stored pointer).
Locate the atomic.Value named serverConfig and update setServerConfig and
GetServerConfig to perform cloning (use the existing config type's copy/clone
helper or implement a field-wise/deep clone) so no external mutation can affect
the persisted config.
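
A minimal sketch of the clone-on-store, clone-on-load approach described above. The types are hypothetical stand-ins, and the sketch stores the pointer in an `atomic.Pointer` instead of the `atomic.Value` the review mentions, which removes the type assertion but is otherwise the same idea.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ServerConfig is a hypothetical stand-in with only scalar fields.
type ServerConfig struct {
	ServerMemoryLimit float64
	GCTunerThreshold  float64
}

// Clone returns an independent copy; a shallow copy suffices for scalar fields.
func (c *ServerConfig) Clone() *ServerConfig {
	cp := *c
	return &cp
}

// PersistConfig is a hypothetical stand-in for the persisted config holder.
type PersistConfig struct {
	serverConfig atomic.Pointer[ServerConfig]
}

// SetServerConfig stores a private copy so later caller mutations are not visible.
func (o *PersistConfig) SetServerConfig(cfg *ServerConfig) {
	if cfg == nil {
		o.serverConfig.Store(nil)
		return
	}
	o.serverConfig.Store(cfg.Clone())
}

// GetServerConfig returns a copy so readers cannot mutate the stored value.
func (o *PersistConfig) GetServerConfig() *ServerConfig {
	cfg := o.serverConfig.Load()
	if cfg == nil {
		return nil
	}
	return cfg.Clone()
}

func main() {
	var o PersistConfig
	in := &ServerConfig{ServerMemoryLimit: 0.8}
	o.SetServerConfig(in)

	in.ServerMemoryLimit = 0.1                  // caller mutates its own copy
	o.GetServerConfig().ServerMemoryLimit = 0.2 // reader mutates a returned copy
	fmt.Println(o.GetServerConfig().ServerMemoryLimit)
}
```

If the real config gains slice or map fields, `Clone` would need to copy those explicitly; the shallow copy here is sufficient only because every field is a scalar.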
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/gctuner/tuner_test.go`:
- Around line 161-180: The two tests (TestInitGCTunerWithZeroMemoryLimit and
TestUpdateIfNeeded) mutate package globals EnableGOGCTuner and
memory.ServerMemoryLimit and do not restore them; fix each test by saving the
original values at the start and restoring them on exit (e.g., origEnable :=
EnableGOGCTuner.Load(); origLimit := memory.ServerMemoryLimit.Load(); defer
func(){ EnableGOGCTuner.Store(origEnable);
memory.ServerMemoryLimit.Store(origLimit) }()) so test state is isolated, and
keep the existing Tuning(0) cleanup as needed.

---

Nitpick comments:
In `@pkg/mcs/scheduling/server/config/config.go`:
- Around line 84-85: Update the comment for the exported field Server in the
config struct so it follows GoDoc style by starting with the identifier name
"Server"; locate the struct field Server (type sc.ServerConfig, tags
`toml:"pd-server" json:"pd-server"`) and replace the current comment "// This
config is sync with the pd server." with a GoDoc-style comment that begins
"Server ..." and briefly describes the field's purpose.
- Around line 205-206: setServerConfig currently stores a caller-owned pointer
into the shared atomic.Value and GetServerConfig returns that same pointer,
causing shared mutable aliasing; change both to copy-on-write: in
setServerConfig store a deep-cloned copy (not the incoming pointer) into
serverConfig, and in GetServerConfig return a cloned/deep-copied instance (not
the stored pointer). Locate the atomic.Value named serverConfig and update
setServerConfig and GetServerConfig to perform cloning (use the existing config
type's copy/clone helper or implement a field-wise/deep clone) so no external
mutation can affect the persisted config.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 40408e79-3640-45ab-9239-0592fff79bcd

📥 Commits

Reviewing files that changed from the base of the PR and between b2ac112 and 052746b.

📒 Files selected for processing (9)
  • pkg/gctuner/tuner.go
  • pkg/gctuner/tuner_test.go
  • pkg/mcs/scheduling/server/apis/v1/api.go
  • pkg/mcs/scheduling/server/config/config.go
  • pkg/mcs/scheduling/server/config/watcher.go
  • pkg/mcs/scheduling/server/server.go
  • pkg/schedule/config/config.go
  • server/cluster/cluster.go
  • tests/integrations/mcs/scheduling/config_test.go

@bufferflies
Contributor Author

/retest

@bufferflies bufferflies requested a review from rleungx March 17, 2026 02:31
@bufferflies
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 17, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
pkg/gctuner/tuner_test.go (1)

163-182: ⚠️ Potential issue | 🟡 Minor

Global state is still leaking across tests in these two cases.

InitGCTuner/UpdateIfNeeded mutate package-level state, but these tests don’t restore EnableGOGCTuner and memory.ServerMemoryLimit. Please add t.Cleanup (as done in TestInitGCTuner) to both tests.

🛠️ Suggested fix
func TestInitGCTunerWithZeroMemoryLimit(t *testing.T) {
	re := require.New(t)
+	prevEnable := EnableGOGCTuner.Load()
+	prevMemLimit := memory.ServerMemoryLimit.Load()
+	t.Cleanup(func() {
+		EnableGOGCTuner.Store(prevEnable)
+		memory.ServerMemoryLimit.Store(prevMemLimit)
+		Tuning(0)
+	})

	totalMem := uint64(1000000000) // 1GB
	...
}

func TestUpdateIfNeeded(t *testing.T) {
	re := require.New(t)
+	prevEnable := EnableGOGCTuner.Load()
+	prevMemLimit := memory.ServerMemoryLimit.Load()
+	t.Cleanup(func() {
+		EnableGOGCTuner.Store(prevEnable)
+		memory.ServerMemoryLimit.Store(prevMemLimit)
+		Tuning(0)
+	})

	totalMem := uint64(1000000000) // 1GB
	...
}

Also applies to: 184-236

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gctuner/tuner_test.go` around lines 163 - 182, The tests calling
InitGCTuner/UpdateIfNeeded are mutating package globals (EnableGOGCTuner and
memory.ServerMemoryLimit) and must restore them after the test; add t.Cleanup
handlers in the TestInitGCTunerWithZeroMemoryLimit test and the other test that
covers lines 184-236 to save the current values of EnableGOGCTuner and
memory.ServerMemoryLimit, and restore them in t.Cleanup so global state won't
leak between tests; reference InitGCTuner, UpdateIfNeeded, EnableGOGCTuner, and
memory.ServerMemoryLimit when locating where to add the cleanup.
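
As a rough sketch of the save-and-restore pattern requested above, the snippet below snapshots two package-level atomics and registers a restore callback. The globals `enableGOGCTuner` and `serverMemoryLimit`, the helper `preserveGlobals`, and the `fakeT` type are all hypothetical; `fakeT` exists only so the sketch runs outside `go test`. In a real test, `*testing.T` would be passed directly, since it provides `Cleanup`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical stand-ins for the package-level globals the tests mutate.
var (
	enableGOGCTuner   atomic.Bool
	serverMemoryLimit atomic.Uint64
)

// cleanupRegistrar is the one method of testing.TB used here;
// *testing.T satisfies it.
type cleanupRegistrar interface {
	Cleanup(func())
}

// preserveGlobals snapshots the globals and registers a callback that
// restores them when the test finishes.
func preserveGlobals(t cleanupRegistrar) {
	prevEnable := enableGOGCTuner.Load()
	prevLimit := serverMemoryLimit.Load()
	t.Cleanup(func() {
		enableGOGCTuner.Store(prevEnable)
		serverMemoryLimit.Store(prevLimit)
	})
}

// fakeT records cleanups and runs them last-in-first-out, as testing.T does.
type fakeT struct{ cleanups []func() }

func (f *fakeT) Cleanup(fn func()) { f.cleanups = append(f.cleanups, fn) }

func (f *fakeT) finish() {
	for i := len(f.cleanups) - 1; i >= 0; i-- {
		f.cleanups[i]()
	}
}

func main() {
	enableGOGCTuner.Store(true)
	serverMemoryLimit.Store(1 << 30)

	t := &fakeT{}
	preserveGlobals(t)

	// The "test body" mutates the globals.
	enableGOGCTuner.Store(false)
	serverMemoryLimit.Store(0)

	t.finish() // end of test: registered cleanups run

	fmt.Println(enableGOGCTuner.Load(), serverMemoryLimit.Load()) // prints "true 1073741824"
}
```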
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/gctuner/tuner.go`:
- Around line 294-297: State.Stop() must not reset the process-scoped GC tuner;
remove the Tuning(0) call so Stop does not call the global Tuning function and
thus does not reset globalTuner. Instead, only perform local State cleanup
(e.g., stop watcher goroutines or cancel contexts owned by State) without
invoking Tuning; ensure no other teardown paths for State call Tuning(0) so
globalTuner retains its last configuration across watcher/leadership
transitions.

In `@pkg/mcs/scheduling/server/config/watcher.go`:
- Around line 273-275: The Close() implementation currently calls
cw.gcTunerState.Stop(), which stops the process-scoped GC tuner and resets
global GC tuning; remove or guard that call so watcher teardown does not call
gcTunerState.Stop(). Specifically, in the Close() method do not call
cw.gcTunerState.Stop() (or only stop when explicitly created per-process, not
for the shared/global tuner) so that the global tuner state (globalTuner)
remains active with its last parameters across watcher/leadership transitions;
update any comments to note that cw.gcTunerState must not be stopped on watcher
teardown.

In `@server/cluster/cluster.go`:
- Around line 580-583: The cluster shutdown handler currently calls state.Stop()
when c.ctx.Done() fires, which resets the process-wide GC tuner; remove or guard
that call so the GC tuner (globalTuner) is NOT stopped on cluster context
cancellation/leadership changes. Specifically, in the block handling
<-c.ctx.Done() (where state.Stop() is invoked), stop calling state.Stop()
unconditionally — either delete that invocation or wrap it behind a true
process-termination condition (e.g., a distinct shutdown signal) so that
state.Stop() is only executed when the whole process is exiting, leaving
globalTuner configured across leadership transitions.

---

Duplicate comments:
In `@pkg/gctuner/tuner_test.go`:
- Around line 163-182: The tests calling InitGCTuner/UpdateIfNeeded are mutating
package globals (EnableGOGCTuner and memory.ServerMemoryLimit) and must restore
them after the test; add t.Cleanup handlers in the
TestInitGCTunerWithZeroMemoryLimit test and the other test that covers lines
184-236 to save the current values of EnableGOGCTuner and
memory.ServerMemoryLimit, and restore them in t.Cleanup so global state won't
leak between tests; reference InitGCTuner, UpdateIfNeeded, EnableGOGCTuner, and
memory.ServerMemoryLimit when locating where to add the cleanup.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 37a3c383-5331-4aa7-bd3f-37bb15e7d7b1

📥 Commits

Reviewing files that changed from the base of the PR and between b2ac112 and 241c67e.

📒 Files selected for processing (9)
  • pkg/gctuner/tuner.go
  • pkg/gctuner/tuner_test.go
  • pkg/mcs/scheduling/server/apis/v1/api.go
  • pkg/mcs/scheduling/server/config/config.go
  • pkg/mcs/scheduling/server/config/watcher.go
  • pkg/mcs/scheduling/server/server.go
  • pkg/schedule/config/config.go
  • server/cluster/cluster.go
  • tests/integrations/mcs/scheduling/config_test.go

Comment thread pkg/gctuner/tuner.go
Comment on lines +273 to +275
	if cw.gcTunerState != nil {
		cw.gcTunerState.Stop()
	}

⚠️ Potential issue | 🟠 Major

Do not stop process-scoped GC tuner in watcher teardown.

On Line 273, Close() stops gcTunerState, which resets global GC tuning when the watcher closes (e.g., leadership/watcher transitions). That breaks the intended persistence of tuner state.

🛠️ Suggested fix
func (cw *Watcher) Close() {
	cw.cancel()
-	if cw.gcTunerState != nil {
-		cw.gcTunerState.Stop()
-	}
	cw.wg.Wait()
}

Based on learnings: In tikv/pd pkg/gctuner, the GC tuner (globalTuner) is intentionally process-scoped and should remain active with its last configured parameters even after a watcher or leadership change, and callers must NOT reset global GC state on watcher teardown.

Suggested change
-	if cw.gcTunerState != nil {
-		cw.gcTunerState.Stop()
-	}
+func (cw *Watcher) Close() {
+	cw.cancel()
+	cw.wg.Wait()
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/mcs/scheduling/server/config/watcher.go` around lines 273 - 275, The
Close() implementation currently calls cw.gcTunerState.Stop(), which stops the
process-scoped GC tuner and resets global GC tuning; remove or guard that call
so watcher teardown does not call gcTunerState.Stop(). Specifically, in the
Close() method do not call cw.gcTunerState.Stop() (or only stop when explicitly
created per-process, not for the shared/global tuner) so that the global tuner
state (globalTuner) remains active with its last parameters across
watcher/leadership transitions; update any comments to note that cw.gcTunerState
must not be stopped on watcher teardown.

Comment thread server/cluster/cluster.go
Signed-off-by: tongjian <1045931706@qq.com>
@bufferflies
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Mar 17, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
pkg/gctuner/tuner_test.go (1)

163-236: ⚠️ Potential issue | 🟡 Minor

Add global state cleanup to TestInitGCTunerWithZeroMemoryLimit and TestUpdateIfNeeded.

Unlike TestInitGCTuner, these tests don't capture and restore EnableGOGCTuner and memory.ServerMemoryLimit. This can cause test pollution, especially since TestUpdateIfNeeded sets EnableGOGCTuner to false at line 214.

🛠️ Suggested fix for TestInitGCTunerWithZeroMemoryLimit
 func TestInitGCTunerWithZeroMemoryLimit(t *testing.T) {
 	re := require.New(t)
+	prevEnable := EnableGOGCTuner.Load()
+	prevMemLimit := memory.ServerMemoryLimit.Load()
+	t.Cleanup(func() {
+		EnableGOGCTuner.Store(prevEnable)
+		memory.ServerMemoryLimit.Store(prevMemLimit)
+		Tuning(0)
+	})

 	totalMem := uint64(1000000000) // 1GB
🛠️ Suggested fix for TestUpdateIfNeeded
 func TestUpdateIfNeeded(t *testing.T) {
 	re := require.New(t)
+	prevEnable := EnableGOGCTuner.Load()
+	prevMemLimit := memory.ServerMemoryLimit.Load()
+	t.Cleanup(func() {
+		EnableGOGCTuner.Store(prevEnable)
+		memory.ServerMemoryLimit.Store(prevMemLimit)
+		Tuning(0)
+	})

 	totalMem := uint64(1000000000) // 1GB
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/gctuner/tuner_test.go` around lines 163 - 236, Both
TestInitGCTunerWithZeroMemoryLimit and TestUpdateIfNeeded must save and restore
global state to avoid test pollution: capture the current EnableGOGCTuner value
and memory.ServerMemoryLimit before calling InitGCTuner (and any changes to
cfg), then defer restoring them at the start of each test; also ensure deferred
state.Stop() remains for InitGCTuner/InitGCTunerWithZeroMemoryLimit and that
TestUpdateIfNeeded restores EnableGOGCTuner after toggling it to false and
resets memory.ServerMemoryLimit to its original value so subsequent tests are
unaffected.
🧹 Nitpick comments (1)
pkg/mcs/scheduling/server/config/config.go (1)

782-795: Misleading comment: GetGCTunerConfig is used beyond tests.

The comment states "only used test" but this method is also used by watcher.go (line 161: gcCfg := cw.GetGCTunerConfig()). Consider updating the comment to reflect actual usage.

✏️ Suggested fix
-// GetGCTunerConfig returns the GC tuner configuration, only used test.
+// GetGCTunerConfig returns the GC tuner configuration derived from the server config.
 func (o *PersistConfig) GetGCTunerConfig() *gctuner.Config {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/mcs/scheduling/server/config/config.go` around lines 782 - 795, Update
the misleading function comment for GetGCTunerConfig: change "only used test" to
accurately reflect that this method is used by production code (e.g., watcher.go
via cw.GetGCTunerConfig) as well as tests; locate the GetGCTunerConfig function
in PersistConfig and update its GoDoc to describe its real purpose and callers
(returns GC tuner settings for server components such as the watcher) so the
comment matches usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/gctuner/tuner_test.go`:
- Around line 163-236: Both TestInitGCTunerWithZeroMemoryLimit and
TestUpdateIfNeeded must save and restore global state to avoid test pollution:
capture the current EnableGOGCTuner value and memory.ServerMemoryLimit before
calling InitGCTuner (and any changes to cfg), then defer restoring them at the
start of each test; also ensure deferred state.Stop() remains for
InitGCTuner/InitGCTunerWithZeroMemoryLimit and that TestUpdateIfNeeded restores
EnableGOGCTuner after toggling it to false and resets memory.ServerMemoryLimit
to its original value so subsequent tests are unaffected.

---

Nitpick comments:
In `@pkg/mcs/scheduling/server/config/config.go`:
- Around line 782-795: Update the misleading function comment for
GetGCTunerConfig: change "only used test" to accurately reflect that this method
is used by production code (e.g., watcher.go via cw.GetGCTunerConfig) as well as
tests; locate the GetGCTunerConfig function in PersistConfig and update its
GoDoc to describe its real purpose and callers (returns GC tuner settings for
server components such as the watcher) so the comment matches usage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cecda5a1-94d3-4a1c-9f00-5d79fcd354ac

📥 Commits

Reviewing files that changed from the base of the PR and between 161d571 and 3457ebf.

📒 Files selected for processing (9)
  • pkg/gctuner/tuner.go
  • pkg/gctuner/tuner_test.go
  • pkg/mcs/scheduling/server/apis/v1/api.go
  • pkg/mcs/scheduling/server/config/config.go
  • pkg/mcs/scheduling/server/config/watcher.go
  • pkg/mcs/scheduling/server/server.go
  • pkg/schedule/config/config.go
  • server/cluster/cluster.go
  • tests/integrations/mcs/scheduling/config_test.go

Member

@rleungx rleungx left a comment


A couple of correctness concerns from the review.

Comment thread pkg/gctuner/tuner.go
}

// Stop disables the GC tuner.
func (*State) Stop() {
Member


InitGCTuner / UpdateIfNeeded now also update the process-global memory-limit tuner, but Stop() only calls Tuning(0). When the scheduling primary steps down (or the cluster stops), memory.ServerMemoryLimit / debug.SetMemoryLimit can stay at the last applied value even though this watcher lifecycle is primary-scoped. Should we also reset/stop the memory-limit side here so we don't leave stale process-wide GC state behind?

Contributor Author


Yes, Stop() keeps the memory-limit setting in place; it only stops GC tuning.

	cw.SetScheduleConfig(&cfg.Schedule)
	cw.SetReplicationConfig(&cfg.Replication)
	cw.SetStoreConfig(&cfg.Store)
	if cfg.Server != nil {
Member


cfg.Server can be nil when the persisted config was written before the new pd-server field existed. In that case we skip the update here, but PersistConfig was initialized from config.NewConfig(), so serverConfig is still the zero value. That seems to change the effective behavior from PD defaults to EnableGOGCTuner=false / zero thresholds until someone persists a newer config. Could we initialize this from PD defaults or add an upgrade-path test for an old persisted config without pd-server?
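
The upgrade-path concern can be reproduced in isolation. The sketch below decodes a payload that predates the `pd-server` field and falls back to defaults when the pointer is nil. The struct layout, the `enable-gogc-tuner` JSON key, and the default value are assumptions for illustration only and do not reflect PD's actual defaults.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServerConfig and its field are hypothetical stand-ins.
type ServerConfig struct {
	EnableGOGCTuner bool `json:"enable-gogc-tuner"`
}

// persistedConfig mirrors a stored config whose pd-server section may be absent.
type persistedConfig struct {
	Server *ServerConfig `json:"pd-server,omitempty"`
}

// defaultServerConfig returns placeholder defaults (not PD's real ones).
func defaultServerConfig() *ServerConfig {
	return &ServerConfig{EnableGOGCTuner: true}
}

// loadServerConfig decodes the payload and falls back to defaults when the
// pd-server section is missing, instead of leaving a zero-value config.
func loadServerConfig(raw []byte) (*ServerConfig, error) {
	var cfg persistedConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	if cfg.Server == nil {
		return defaultServerConfig(), nil
	}
	return cfg.Server, nil
}

func main() {
	// Payload written before the pd-server field existed.
	old, err := loadServerConfig([]byte(`{"schedule":{}}`))
	if err != nil {
		panic(err)
	}
	// Payload that explicitly disables the tuner.
	cur, err := loadServerConfig([]byte(`{"pd-server":{"enable-gogc-tuner":false}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(old.EnableGOGCTuner, cur.EnableGOGCTuner) // prints "true false"
}
```

A test along these lines would distinguish "field absent" from "field explicitly set to false", which a non-pointer struct field cannot do.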

Signed-off-by: tongjian <1045931706@qq.com>

Labels

dco-signoff: yes Indicates the PR's author has signed the dco. lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.


4 participants