mcs: scheduling mcs support gctuner#10212
Conversation
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdd a thread-safe GC tuner (Config/State API) with Init/Update/Stop, wire it into server config, watcher, and cluster flows for dynamic tuning and memory-limit handling, and add unit/integration tests for config watch and tuner behavior. Changes
Sequence Diagram(s)sequenceDiagram
participant PersistConfig
participant Watcher
participant Mem as memory
participant GCT as gctuner.State
participant Cluster as RaftCluster
PersistConfig->>Watcher: persist initial ServerConfig
Watcher->>Mem: read total memory
Watcher->>GCT: InitGCTuner(totalMem, PersistConfig.GetGCTunerConfig())
GCT-->>Watcher: returned State
Note right of PersistConfig: on config change
PersistConfig->>Watcher: new ServerConfig
Watcher->>GCT: state.UpdateIfNeeded(newConfig)
alt config disables tuner
GCT->>GCT: Stop() (disable tuning / set GC to default)
else config updates thresholds/limits
GCT->>Mem: compute trigger bytes
GCT->>GCT: apply updated thresholds / memory limits
end
Cluster->>GCT: periodic UpdateIfNeeded(c.getGCTunerConfig())
Cluster-->>GCT: Stop() on shutdown
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10212 +/- ##
==========================================
+ Coverage 78.59% 78.94% +0.35%
==========================================
Files 520 529 +9
Lines 70014 71162 +1148
==========================================
+ Hits 55028 56180 +1152
+ Misses 11008 10972 -36
- Partials 3978 4010 +32
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@pkg/gctuner/tuner_test.go`:
- Around line 129-153: Tests mutate package globals (EnableGOGCTuner,
memory.ServerMemoryLimit, GlobalMemoryLimitTuner/globalTuner via Tuning) and do
not restore them; update TestInitGCTuner, TestInitGCTunerWithZeroMemoryLimit,
and TestUpdateIfNeeded to capture current values at test start (e.g., prevEnable
:= EnableGOGCTuner.Load(), prevServerLimit := memory.ServerMemoryLimit,
prevGlobal := GlobalMemoryLimitTuner) and register t.Cleanup() to restore those
values (set EnableGOGCTuner back, reset memory.ServerMemoryLimit, and call
Tuning(prevGlobal) or reassign GlobalMemoryLimitTuner appropriately) so global
state is returned after each test.
In `@pkg/gctuner/tuner.go`:
- Around line 271-283: The update block currently only triggers when
newMemoryLimitBytes or newMemoryLimitGCTriggerBytes change, so changes to
newMemoryLimitGCTriggerRatio can be ignored due to rounding; modify the
conditional that gates the update to also compare newMemoryLimitGCTriggerRatio
against s.MemoryLimitGCTriggerRatio (and/or include an epsilon for float
comparison) so that when the GC trigger ratio changes you still set
s.MemoryLimitGCTriggerRatio, call GlobalMemoryLimitTuner.SetPercentage and
UpdateMemoryLimit; update references in this block to use
newMemoryLimitGCTriggerRatio, s.MemoryLimitGCTriggerRatio,
GlobalMemoryLimitTuner, memory.ServerMemoryLimit.Store and ensure updated is set
when the ratio-only change occurs.
In `@pkg/mcs/scheduling/server/config/config.go`:
- Around line 149-152: The ServerConfig adjustment is using the wrong metadata
key: replace the call c.Server.Adjust(configMetaData.Child("server")) with
c.Server.Adjust(configMetaData.Child("pd-server")) so the Adjust on ServerConfig
(c.Server) uses the TOML tag name `pd-server` and preserves user-provided values
instead of treating them as undefined; update the metadata key where
c.Server.Adjust is invoked to "pd-server".
In `@pkg/mcs/scheduling/server/config/watcher.go`:
- Around line 92-97: Replace the plain string concatenation error creation with
errors.Wrap to preserve context: after calling memory.MemTotal() in the watcher
initialization, keep the cancel() call but return errors.Wrap(err, "fail to get
total memory") instead of errors.New("fail to get total memory: "+err.Error());
this uses github.com/pingcap/errors and preserves the original error and stack
trace while keeping the same control flow around memory.MemTotal() and cancel().
Signed-off-by: bufferflies <1045931706@qq.com>
okJiang
left a comment
There was a problem hiding this comment.
// Tuning sets the threshold of heap which will be respect by gogc tuner.
// When Tuning, the env GOGC will not be take effect.
// threshold: disable tuning if threshold == 0
func Tuning(threshold uint64) {
// disable gc tuner if percent is zero
if t := globalTuner.Load(); t == nil {
t1 := newTuner(threshold)
globalTuner.CompareAndSwap(nil, &t1)
} else {
if threshold <= 0 {
(*t).stop()
globalTuner.CompareAndSwap(t, nil)
} else {
(*t).setThreshold(threshold)
}
}
}The Tuning annotation says that threshold==0 is disabled, but newTuner(0) still starts the finalizer loop when globalTuner is empty; And if CompareAndSwap(nil, &t1) fails during concurrent initialization, the tuner created by the losing party will not stop (it may survive for a long time and participate in parameter adjustment when enabled later).
|
Do other microservice components also need to support gctunner? |
We should support it, and we can enable it for the specified case. |
Yes, the tuning should fix this bug |
|
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/gctuner/tuner.go (1)
67-82:⚠️ Potential issue | 🟠 MajorLeaked tuner on concurrent
CompareAndSwapfailure.When two goroutines call
Tuning(threshold)concurrently withthreshold > 0andglobalTuneris nil, both create anewTuner(which starts a finalizer loop). The loser ofCompareAndSwapsilently discards its tuner without callingstop(), leaking a goroutine. This was also identified by@bufferfliesin the PR discussion.🔧 Proposed fix — stop the tuner if CAS fails
if t := globalTuner.Load(); t == nil { // init gc tuner only when threshold > 0, otherwise do nothing if threshold > 0 { t1 := newTuner(threshold) - globalTuner.CompareAndSwap(nil, &t1) + if !globalTuner.CompareAndSwap(nil, &t1) { + t1.stop() + // Another goroutine won; update its threshold instead. + if t2 := globalTuner.Load(); t2 != nil { + (*t2).setThreshold(threshold) + } + } }
🧹 Nitpick comments (2)
pkg/gctuner/tuner.go (2)
249-259: Duplicated computation betweenInitGCTunerandUpdateIfNeeded.Lines 253-259 replicate the same derivation logic as lines 221-227 in
InitGCTuner. Consider extracting a helper (e.g.,computeState(totalMem uint64, cfg *Config) (...)) to keep the two paths in sync and reduce maintenance burden.♻️ Sketch
func computeDerivedState(totalMem uint64, cfg *Config) (memLimitBytes, gcThresholdBytes uint64, gcTriggerRatio float64, gcTriggerBytes uint64) { memLimitBytes = uint64(float64(totalMem) * cfg.ServerMemoryLimit) gcThresholdBytes = uint64(float64(memLimitBytes) * cfg.GCTunerThreshold) if memLimitBytes == 0 { gcThresholdBytes = uint64(float64(totalMem) * cfg.GCTunerThreshold) } gcTriggerRatio = cfg.ServerMemoryLimitGCTrigger gcTriggerBytes = uint64(float64(memLimitBytes) * gcTriggerRatio) return }Then call it from both
InitGCTunerandUpdateIfNeeded.
218-227:MemoryLimitGCTriggerBytesis never used downstream, so the inconsistency is a code-cleanliness issue, not a functional bug.While the inconsistency is real—
GCThresholdByteshas a fallback tototalMemwhenMemoryLimitBytes == 0(lines 223-225), butMemoryLimitGCTriggerBytesis computed as0 * ratio = 0with no fallback (line 227)—the field itself is never read anywhere in the codebase. The actual memory limit tuning usesMemoryLimitGCTriggerRatio(the ratio) andServerMemoryLimitdirectly;GlobalMemoryLimitTuner.calcMemoryLimit()has its own fallback logic that returnsmath.MaxInt64when the computed limit is 0.The same pattern exists in
UpdateIfNeededat line 259. SinceMemoryLimitGCTriggerBytesis private and unused, consider removing it entirely or adding a fallback for consistency.
|
/ping @lhy1024 |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/gctuner/tuner.go`:
- Around line 68-88: Concurrent calls to Tuning can allocate a newTuner whose
finalizer loop starts immediately and gets leaked if the goroutine loses the
CompareAndSwap on globalTuner; modify Tuning so that newTuner does not start its
finalizer until after globalTuner.CompareAndSwap(nil, &t1) returns true, or if
you must create the tuner before the CAS then call (*t1).stop() (or t1.stop())
when CompareAndSwap fails; specifically change the flow around
newTuner(threshold) + globalTuner.CompareAndSwap(nil, &t1) to either defer
starting the finalizer inside newTuner until after CAS success or invoke the
tuner's stop method on the loser path so no running finalizer is left behind,
and ensure code uses the same symbols Tuning, globalTuner, newTuner,
CompareAndSwap, and stop/setThreshold when implementing this fix.
In `@pkg/mcs/scheduling/server/config/config.go`:
- Around line 302-305: PersistConfig.setServerConfig currently stores the
incoming cfg directly, bypassing validation/clamping; call cfg.Adjust(nil) (or
equivalent clamp-only variant) before storing to ensure ServerMemoryLimit and
other fields are validated and defaults applied so downstream users like
GetGCTunerConfig() receive sane values; update PersistConfig.setServerConfig to
run ServerConfig.Adjust() on the provided cfg and then
o.serverConfig.Store(cfg).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c98acc6e-702e-47e1-a616-f5cf766226be
📥 Commits
Reviewing files that changed from the base of the PR and between b3dac8b and 33451c927a8cba355e7efdba40692f850c575da5.
📒 Files selected for processing (3)
pkg/gctuner/tuner.gopkg/mcs/scheduling/server/config/config.gopkg/mcs/scheduling/server/server.go
Signed-off-by: tongjian <1045931706@qq.com>
|
/retest |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
/retest |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
pkg/gctuner/tuner_test.go (1)
134-142:⚠️ Potential issue | 🟡 MinorRestore the previous process-wide GC settings in all three tests.
These cases still mutate
EnableGOGCTuner,memory.ServerMemoryLimit,GlobalMemoryLimitTuner, and the singleton tuner, but onlyTestInitGCTunerhas partial cleanup and the other two still finish with a hardTuning(0). That can bleed into later tests, and it also clears any pre-existing singleton instead of restoring it. Based on learnings:globalTuneris intentionally process-scoped and should remain active with its last configured parameters even after a watcher or leadership change.Also applies to: 161-180, 182-234
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/gctuner/tuner_test.go` around lines 134 - 142, Tests mutate process-global state (EnableGOGCTuner, memory.ServerMemoryLimit, GlobalMemoryLimitTuner and the singleton globalTuner) but only TestInitGCTuner partially restores state; update all three tests (including those covering lines ~161-180 and ~182-234) to capture and restore the prior values: snapshot EnableGOGCTuner.Load(), memory.ServerMemoryLimit.Load(), GlobalMemoryLimitTuner.Load() (or its equivalent), and the current globalTuner singleton before mutating, then in t.Cleanup restore each saved value and reassign the original singleton instead of calling Tuning(0); ensure cleanup uses the exact symbols EnableGOGCTuner, memory.ServerMemoryLimit, GlobalMemoryLimitTuner and globalTuner so the process-wide GC settings and singleton are fully restored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/mcs/scheduling/server/config/watcher.go`:
- Around line 76-82: The persistedConfig struct's Server field is a value type
so unmarshalling older JSON that lacks "pd-server" yields a zeroed
sc.ServerConfig and wipes existing runtime config; change persistedConfig.Server
to a pointer (*sc.ServerConfig) and update the code paths that read it (notably
setServerConfig and UpdateIfNeeded) to treat nil as "no persisted
override"—i.e., if persistedConfig.Server == nil then keep the existing server
config unchanged, otherwise apply the persisted values; also update the other
occurrences mentioned (lines ~160-164) to handle the pointer-to-server config
safely.
- Around line 111-113: The code currently calls gctuner.InitGCTuner using
persistConfig.GetGCTunerConfig() during watcher initialization (cw.gcTunerState
= gctuner.InitGCTuner(totalMem, gcCfg)), which uses the bootstrap/local Server
values; instead, remove/skip this early InitGCTuner call and defer
initializing/updating the process-wide global tuner until the first successful
PD config load is applied. Concretely: stop calling gctuner.InitGCTuner in the
watcher startup path (where cw.gcTunerState and totalMem are used) and move the
InitGCTuner invocation into the code path that handles applying a newly loaded
PD config (the function that processes the first loaded persistConfig / applies
config changes), using the GCTuner config from that PD config; keep globalTuner
process-scoped so subsequent watcher restarts or leadership changes do not
overwrite the last applied tuner unless an actual PD config update occurs.
---
Duplicate comments:
In `@pkg/gctuner/tuner_test.go`:
- Around line 134-142: Tests mutate process-global state (EnableGOGCTuner,
memory.ServerMemoryLimit, GlobalMemoryLimitTuner and the singleton globalTuner)
but only TestInitGCTuner partially restores state; update all three tests
(including those covering lines ~161-180 and ~182-234) to capture and restore
the prior values: snapshot EnableGOGCTuner.Load(),
memory.ServerMemoryLimit.Load(), GlobalMemoryLimitTuner.Load() (or its
equivalent), and the current globalTuner singleton before mutating, then in
t.Cleanup restore each saved value and reassign the original singleton instead
of calling Tuning(0); ensure cleanup uses the exact symbols EnableGOGCTuner,
memory.ServerMemoryLimit, GlobalMemoryLimitTuner and globalTuner so the
process-wide GC settings and singleton are fully restored.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 649d2a30-f9f8-47b1-909d-b037b2fa9b0a
📒 Files selected for processing (8)
pkg/gctuner/tuner.gopkg/gctuner/tuner_test.gopkg/mcs/scheduling/server/config/config.gopkg/mcs/scheduling/server/config/watcher.gopkg/mcs/scheduling/server/server.gopkg/schedule/config/config.goserver/cluster/cluster.gotests/integrations/mcs/scheduling/config_test.go
| cfg.Replication = *s.persistConfig.GetReplicationConfig().Clone() | ||
| cfg.ClusterVersion = *s.persistConfig.GetClusterVersion() | ||
| cfg.Schedule.MaxMergeRegionKeys = cfg.Schedule.GetMaxMergeRegionKeys() | ||
| cfg.Server = *s.persistConfig.GetServerConfig() |
There was a problem hiding this comment.
Shall we also modify the getConfig function for /config?
There was a problem hiding this comment.
no, it just show the config
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
92a6121 to
93d2afd
Compare
Signed-off-by: tongjian <1045931706@qq.com>
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/gctuner/tuner_test.go (1)
161-180:⚠️ Potential issue | 🟡 MinorRestore mutated globals in the remaining two tests as well.
TestInitGCTunerWithZeroMemoryLimitandTestUpdateIfNeededstill leaveEnableGOGCTunerandmemory.ServerMemoryLimitmutated after completion, which can leak state across tests.♻️ Suggested fix
func TestInitGCTunerWithZeroMemoryLimit(t *testing.T) { re := require.New(t) + prevEnable := EnableGOGCTuner.Load() + prevMemLimit := memory.ServerMemoryLimit.Load() + t.Cleanup(func() { + EnableGOGCTuner.Store(prevEnable) + memory.ServerMemoryLimit.Store(prevMemLimit) + Tuning(0) + }) totalMem := uint64(1000000000) // 1GB @@ - // Cleanup - Tuning(0) } func TestUpdateIfNeeded(t *testing.T) { re := require.New(t) + prevEnable := EnableGOGCTuner.Load() + prevMemLimit := memory.ServerMemoryLimit.Load() + t.Cleanup(func() { + EnableGOGCTuner.Store(prevEnable) + memory.ServerMemoryLimit.Store(prevMemLimit) + Tuning(0) + }) totalMem := uint64(1000000000) // 1GB @@ - // Cleanup - Tuning(0) }Also applies to: 182-234
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/gctuner/tuner_test.go` around lines 161 - 180, The two tests (TestInitGCTunerWithZeroMemoryLimit and TestUpdateIfNeeded) mutate package globals EnableGOGCTuner and memory.ServerMemoryLimit and do not restore them; fix each test by saving the original values for EnableGOGCTuner and memory.ServerMemoryLimit at the start, then defer restoring them (e.g., origEnable := EnableGOGCTuner; origLimit := memory.ServerMemoryLimit; defer func(){ EnableGOGCTuner = origEnable; memory.ServerMemoryLimit = origLimit }()) so test state is isolated, and keep the existing Tuning(0) cleanup as needed.
🧹 Nitpick comments (2)
pkg/mcs/scheduling/server/config/config.go (2)
84-85: Align the exported field comment with GoDoc style.The comment should start with
Serverto match exported-identifier documentation style.✍️ Suggested wording
- // This config is sync with the pd server. + // Server is synchronized with the PD server configuration. Server sc.ServerConfig `toml:"pd-server" json:"pd-server"`As per coding guidelines,
**/*.go: “Exported identifiers need GoDoc starting with the name.”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/mcs/scheduling/server/config/config.go` around lines 84 - 85, Update the comment for the exported field Server in the config struct so it follows GoDoc style by starting with the identifier name "Server"; locate the struct field Server (type sc.ServerConfig, tags `toml:"pd-server" json:"pd-server"`) and replace the current comment "// This config is sync with the pd server." with a GoDoc-style comment that begins "Server ..." and briefly describes the field's purpose.
205-206: Avoid shared mutable aliasing for persisted server config.
setServerConfigstores the caller-owned pointer directly, whileGetServerConfigclaims to return a cloned config but returns the same pointer. This makes accidental external mutation possible and breaks the documented contract.♻️ Suggested fix
func NewPersistConfig(cfg *Config, ttl *cache.TTLString) *PersistConfig { o := &PersistConfig{} o.SetClusterVersion(&cfg.ClusterVersion) o.schedule.Store(&cfg.Schedule) o.replication.Store(&cfg.Replication) - o.serverConfig.Store(&cfg.Server) + o.setServerConfig(&cfg.Server) // storeConfig will be fetched from TiKV by PD, // so we just set an empty value here first. o.storeConfig.Store(&sc.StoreConfig{}) @@ func (o *PersistConfig) setServerConfig(cfg *sc.ServerConfig) { - // Some of the fields won't be persisted and watched, - o.serverConfig.Store(cfg) + if cfg == nil { + var empty *sc.ServerConfig + o.serverConfig.Store(empty) + return + } + o.serverConfig.Store(cfg.Clone()) } // GetServerConfig returns the cloned server configuration. func (o *PersistConfig) GetServerConfig() *sc.ServerConfig { - return o.serverConfig.Load().(*sc.ServerConfig) + cfg := o.serverConfig.Load().(*sc.ServerConfig) + if cfg == nil { + return nil + } + return cfg.Clone() }Also applies to: 215-215, 300-308
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/mcs/scheduling/server/config/config.go` around lines 205 - 206, setServerConfig currently stores a caller-owned pointer into the shared atomic.Value and GetServerConfig returns that same pointer, causing shared mutable aliasing; change both to copy-on-write: in setServerConfig store a deep-cloned copy (not the incoming pointer) into serverConfig, and in GetServerConfig return a cloned/deep-copied instance (not the stored pointer). Locate the atomic.Value named serverConfig and update setServerConfig and GetServerConfig to perform cloning (use the existing config type's copy/clone helper or implement a field-wise/deep clone) so no external mutation can affect the persisted config.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/gctuner/tuner_test.go`:
- Around line 161-180: The two tests (TestInitGCTunerWithZeroMemoryLimit and
TestUpdateIfNeeded) mutate package globals EnableGOGCTuner and
memory.ServerMemoryLimit and do not restore them; fix each test by saving the
original values for EnableGOGCTuner and memory.ServerMemoryLimit at the start,
then defer restoring them (e.g., origEnable := EnableGOGCTuner; origLimit :=
memory.ServerMemoryLimit; defer func(){ EnableGOGCTuner = origEnable;
memory.ServerMemoryLimit = origLimit }()) so test state is isolated, and keep
the existing Tuning(0) cleanup as needed.
---
Nitpick comments:
In `@pkg/mcs/scheduling/server/config/config.go`:
- Around line 84-85: Update the comment for the exported field Server in the
config struct so it follows GoDoc style by starting with the identifier name
"Server"; locate the struct field Server (type sc.ServerConfig, tags
`toml:"pd-server" json:"pd-server"`) and replace the current comment "// This
config is sync with the pd server." with a GoDoc-style comment that begins
"Server ..." and briefly describes the field's purpose.
- Around line 205-206: setServerConfig currently stores a caller-owned pointer
into the shared atomic.Value and GetServerConfig returns that same pointer,
causing shared mutable aliasing; change both to copy-on-write: in
setServerConfig store a deep-cloned copy (not the incoming pointer) into
serverConfig, and in GetServerConfig return a cloned/deep-copied instance (not
the stored pointer). Locate the atomic.Value named serverConfig and update
setServerConfig and GetServerConfig to perform cloning (use the existing config
type's copy/clone helper or implement a field-wise/deep clone) so no external
mutation can affect the persisted config.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 40408e79-3640-45ab-9239-0592fff79bcd
📒 Files selected for processing (9)
pkg/gctuner/tuner.gopkg/gctuner/tuner_test.gopkg/mcs/scheduling/server/apis/v1/api.gopkg/mcs/scheduling/server/config/config.gopkg/mcs/scheduling/server/config/watcher.gopkg/mcs/scheduling/server/server.gopkg/schedule/config/config.goserver/cluster/cluster.gotests/integrations/mcs/scheduling/config_test.go
|
/retest |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
pkg/gctuner/tuner_test.go (1)
163-182:⚠️ Potential issue | 🟡 MinorGlobal state is still leaking across tests in these two cases.
InitGCTuner/UpdateIfNeededmutate package-level state, but these tests don’t restoreEnableGOGCTunerandmemory.ServerMemoryLimit. Please addt.Cleanup(as done inTestInitGCTuner) to both tests.🛠️ Suggested fix
func TestInitGCTunerWithZeroMemoryLimit(t *testing.T) { re := require.New(t) + prevEnable := EnableGOGCTuner.Load() + prevMemLimit := memory.ServerMemoryLimit.Load() + t.Cleanup(func() { + EnableGOGCTuner.Store(prevEnable) + memory.ServerMemoryLimit.Store(prevMemLimit) + Tuning(0) + }) totalMem := uint64(1000000000) // 1GB ... } func TestUpdateIfNeeded(t *testing.T) { re := require.New(t) + prevEnable := EnableGOGCTuner.Load() + prevMemLimit := memory.ServerMemoryLimit.Load() + t.Cleanup(func() { + EnableGOGCTuner.Store(prevEnable) + memory.ServerMemoryLimit.Store(prevMemLimit) + Tuning(0) + }) totalMem := uint64(1000000000) // 1GB ... }Also applies to: 184-236
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/gctuner/tuner_test.go` around lines 163 - 182, The tests calling InitGCTuner/UpdateIfNeeded are mutating package globals (EnableGOGCTuner and memory.ServerMemoryLimit) and must restore them after the test; add t.Cleanup handlers in the TestInitGCTunerWithZeroMemoryLimit test and the other test that covers lines 184-236 to save the current values of EnableGOGCTuner and memory.ServerMemoryLimit, and restore them in t.Cleanup so global state won't leak between tests; reference InitGCTuner, UpdateIfNeeded, EnableGOGCTuner, and memory.ServerMemoryLimit when locating where to add the cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/gctuner/tuner.go`:
- Around line 294-297: State.Stop() must not reset the process-scoped GC tuner;
remove the Tuning(0) call so Stop does not call the global Tuning function and
thus does not reset globalTuner. Instead, only perform local State cleanup
(e.g., stop watcher goroutines or cancel contexts owned by State) without
invoking Tuning; ensure no other teardown paths for State call Tuning(0) so
globalTuner retains its last configuration across watcher/leadership
transitions.
In `@pkg/mcs/scheduling/server/config/watcher.go`:
- Around line 273-275: The Close() implementation currently calls
cw.gcTunerState.Stop(), which stops the process-scoped GC tuner and resets
global GC tuning; remove or guard that call so watcher teardown does not call
gcTunerState.Stop(). Specifically, in the Close() method do not call
cw.gcTunerState.Stop() (or only stop when explicitly created per-process, not
for the shared/global tuner) so that the global tuner state (globalTuner)
remains active with its last parameters across watcher/leadership transitions;
update any comments to note that cw.gcTunerState must not be stopped on watcher
teardown.
In `@server/cluster/cluster.go`:
- Around line 580-583: The cluster shutdown handler currently calls state.Stop()
when c.ctx.Done() fires, which resets the process-wide GC tuner; remove or guard
that call so the GC tuner (globalTuner) is NOT stopped on cluster context
cancellation/leadership changes. Specifically, in the block handling
<-c.ctx.Done() (where state.Stop() is invoked), stop calling state.Stop()
unconditionally — either delete that invocation or wrap it behind a true
process-termination condition (e.g., a distinct shutdown signal) so that
state.Stop() is only executed when the whole process is exiting, leaving
globalTuner configured across leadership transitions.
---
Duplicate comments:
In `@pkg/gctuner/tuner_test.go`:
- Around line 163-182: The tests calling InitGCTuner/UpdateIfNeeded are mutating
package globals (EnableGOGCTuner and memory.ServerMemoryLimit) and must restore
them after the test; add t.Cleanup handlers in the
TestInitGCTunerWithZeroMemoryLimit test and the other test that covers lines
184-236 to save the current values of EnableGOGCTuner and
memory.ServerMemoryLimit, and restore them in t.Cleanup so global state won't
leak between tests; reference InitGCTuner, UpdateIfNeeded, EnableGOGCTuner, and
memory.ServerMemoryLimit when locating where to add the cleanup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 37a3c383-5331-4aa7-bd3f-37bb15e7d7b1
📒 Files selected for processing (9)
pkg/gctuner/tuner.gopkg/gctuner/tuner_test.gopkg/mcs/scheduling/server/apis/v1/api.gopkg/mcs/scheduling/server/config/config.gopkg/mcs/scheduling/server/config/watcher.gopkg/mcs/scheduling/server/server.gopkg/schedule/config/config.goserver/cluster/cluster.gotests/integrations/mcs/scheduling/config_test.go
| if cw.gcTunerState != nil { | ||
| cw.gcTunerState.Stop() | ||
| } |
There was a problem hiding this comment.
Do not stop process-scoped GC tuner in watcher teardown.
On Line 273, Close() stops gcTunerState, which resets global GC tuning when the watcher closes (e.g., leadership/watcher transitions). That breaks the intended persistence of tuner state.
🛠️ Suggested fix
func (cw *Watcher) Close() {
cw.cancel()
- if cw.gcTunerState != nil {
- cw.gcTunerState.Stop()
- }
cw.wg.Wait()
}Based on learnings: In tikv/pd pkg/gctuner, the GC tuner (globalTuner) is intentionally process-scoped and should remain active with its last configured parameters even after a watcher or leadership change, and callers must NOT reset global GC state on watcher teardown.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if cw.gcTunerState != nil { | |
| cw.gcTunerState.Stop() | |
| } | |
| func (cw *Watcher) Close() { | |
| cw.cancel() | |
| cw.wg.Wait() | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/mcs/scheduling/server/config/watcher.go` around lines 273 - 275, The
Close() implementation currently calls cw.gcTunerState.Stop(), which stops the
process-scoped GC tuner and resets global GC tuning; remove or guard that call
so watcher teardown does not call gcTunerState.Stop(). Specifically, in the
Close() method do not call cw.gcTunerState.Stop() (or only stop when explicitly
created per-process, not for the shared/global tuner) so that the global tuner
state (globalTuner) remains active with its last parameters across
watcher/leadership transitions; update any comments to note that cw.gcTunerState
must not be stopped on watcher teardown.
Signed-off-by: tongjian <1045931706@qq.com>
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/gctuner/tuner_test.go (1)
163-236:⚠️ Potential issue | 🟡 MinorAdd global state cleanup to
TestInitGCTunerWithZeroMemoryLimitandTestUpdateIfNeeded.Unlike
TestInitGCTuner, these tests don't capture and restoreEnableGOGCTunerandmemory.ServerMemoryLimit. This can cause test pollution, especially sinceTestUpdateIfNeededsetsEnableGOGCTunertofalseat line 214.🛠️ Suggested fix for TestInitGCTunerWithZeroMemoryLimit
func TestInitGCTunerWithZeroMemoryLimit(t *testing.T) { re := require.New(t) + prevEnable := EnableGOGCTuner.Load() + prevMemLimit := memory.ServerMemoryLimit.Load() + t.Cleanup(func() { + EnableGOGCTuner.Store(prevEnable) + memory.ServerMemoryLimit.Store(prevMemLimit) + Tuning(0) + }) totalMem := uint64(1000000000) // 1GB🛠️ Suggested fix for TestUpdateIfNeeded
func TestUpdateIfNeeded(t *testing.T) { re := require.New(t) + prevEnable := EnableGOGCTuner.Load() + prevMemLimit := memory.ServerMemoryLimit.Load() + t.Cleanup(func() { + EnableGOGCTuner.Store(prevEnable) + memory.ServerMemoryLimit.Store(prevMemLimit) + Tuning(0) + }) totalMem := uint64(1000000000) // 1GB🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/gctuner/tuner_test.go` around lines 163 - 236, Both TestInitGCTunerWithZeroMemoryLimit and TestUpdateIfNeeded must save and restore global state to avoid test pollution: capture the current EnableGOGCTuner value and memory.ServerMemoryLimit before calling InitGCTuner (and any changes to cfg), then defer restoring them at the start of each test; also ensure deferred state.Stop() remains for InitGCTuner/InitGCTunerWithZeroMemoryLimit and that TestUpdateIfNeeded restores EnableGOGCTuner after toggling it to false and resets memory.ServerMemoryLimit to its original value so subsequent tests are unaffected.
🧹 Nitpick comments (1)
pkg/mcs/scheduling/server/config/config.go (1)
782-795: Misleading comment:GetGCTunerConfigis used beyond tests.The comment states "only used test" but this method is also used by
watcher.go(line 161:gcCfg := cw.GetGCTunerConfig()). Consider updating the comment to reflect actual usage.✏️ Suggested fix
-// GetGCTunerConfig returns the GC tuner configuration, only used test. +// GetGCTunerConfig returns the GC tuner configuration derived from the server config. func (o *PersistConfig) GetGCTunerConfig() *gctuner.Config {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/mcs/scheduling/server/config/config.go` around lines 782 - 795, Update the misleading function comment for GetGCTunerConfig: change "only used test" to accurately reflect that this method is used by production code (e.g., watcher.go via cw.GetGCTunerConfig) as well as tests; locate the GetGCTunerConfig function in PersistConfig and update its GoDoc to describe its real purpose and callers (returns GC tuner settings for server components such as the watcher) so the comment matches usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/gctuner/tuner_test.go`:
- Around line 163-236: Both TestInitGCTunerWithZeroMemoryLimit and
TestUpdateIfNeeded must save and restore global state to avoid test pollution:
capture the current EnableGOGCTuner value and memory.ServerMemoryLimit before
calling InitGCTuner (and any changes to cfg), then defer restoring them at the
start of each test; also ensure deferred state.Stop() remains for
InitGCTuner/InitGCTunerWithZeroMemoryLimit and that TestUpdateIfNeeded restores
EnableGOGCTuner after toggling it to false and resets memory.ServerMemoryLimit
to its original value so subsequent tests are unaffected.
---
Nitpick comments:
In `@pkg/mcs/scheduling/server/config/config.go`:
- Around line 782-795: Update the misleading function comment for
GetGCTunerConfig: change "only used test" to accurately reflect that this method
is used by production code (e.g., watcher.go via cw.GetGCTunerConfig) as well as
tests; locate the GetGCTunerConfig function in PersistConfig and update its
GoDoc to describe its real purpose and callers (returns GC tuner settings for
server components such as the watcher) so the comment matches usage.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cecda5a1-94d3-4a1c-9f00-5d79fcd354ac
📒 Files selected for processing (9)
pkg/gctuner/tuner.gopkg/gctuner/tuner_test.gopkg/mcs/scheduling/server/apis/v1/api.gopkg/mcs/scheduling/server/config/config.gopkg/mcs/scheduling/server/config/watcher.gopkg/mcs/scheduling/server/server.gopkg/schedule/config/config.goserver/cluster/cluster.gotests/integrations/mcs/scheduling/config_test.go
rleungx
left a comment
There was a problem hiding this comment.
A couple of correctness concerns from the review.
| } | ||
|
|
||
| // Stop disables the GC tuner. | ||
| func (*State) Stop() { |
There was a problem hiding this comment.
InitGCTuner / UpdateIfNeeded now also update the process-global memory-limit tuner, but Stop() only calls Tuning(0). When the scheduling primary steps down (or the cluster stops), memory.ServerMemoryLimit / debug.SetMemoryLimit can stay at the last applied value even though this watcher lifecycle is primary-scoped. Should we also reset/stop the memory-limit side here so we don't leave stale process-wide GC state behind?
There was a problem hiding this comment.
Yes, it will keep the mem limit setting, but will stop tuning.
| cw.SetScheduleConfig(&cfg.Schedule) | ||
| cw.SetReplicationConfig(&cfg.Replication) | ||
| cw.SetStoreConfig(&cfg.Store) | ||
| if cfg.Server != nil { |
There was a problem hiding this comment.
cfg.Server can be nil when the persisted config was written before the new pd-server field existed. In that case we skip the update here, but PersistConfig was initialized from config.NewConfig(), so serverConfig is still the zero value. That seems to change the effective behavior from PD defaults to EnableGOGCTuner=false / zero thresholds until someone persists a newer config. Could we initialize this from PD defaults or add an upgrade-path test for an old persisted config without pd-server?
Signed-off-by: tongjian <1045931706@qq.com>
What problem does this PR solve?
Issue Number: Ref #10213
What is changed and how does it work?
Check List
Tests
Code changes
Side effects
Related changes
pingcap/docs/pingcap/docs-cn:pingcap/tiup:Release note
Summary by CodeRabbit