support etcd groups#10597
Conversation
|
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ping @AmoebaProtozoa |
📝 WalkthroughWalkthroughAdds meta-service-group support: new MetaServiceGroupManager, storage for assignment counts, keypath helper, keyspace integration (assignment/attachment/validation), API + CLI endpoints, server wiring, and tests across storage, keyspace, server, and client tooling. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API as "API Handler"
participant KSM as "Keyspace Manager"
participant MGM as "MetaServiceGroupManager"
participant Storage
Client->>API: CreateKeyspace(request)
API->>KSM: CreateKeyspace(cfg)
KSM->>MGM: AssignToGroup(requestCount)
MGM->>Storage: RunInTxn(GetAssignmentCount)
Storage-->>MGM: counts map
MGM->>Storage: RunInTxn(IncrementAssignmentCount(selected, +1))
Storage-->>MGM: updated count
MGM-->>KSM: selected groupID
KSM->>KSM: persist MetaServiceGroupIDKey (txn)
KSM->>MGM: AttachEndpoints(meta.Config)
MGM-->>KSM: meta.Config w/ MetaServiceGroupAddressesKey (if available)
KSM->>Storage: Save keyspace meta/config
Storage-->>KSM: success
KSM-->>API: Keyspace created
API-->>Client: 200 OK
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (2)
server/config/config.go (1)
915-921:Clone()does not deep-copyMetaServiceGroups, risking shared map mutation.The existing
Clone()method copiesPreAllocslice but doesn't deep-copy the newMetaServiceGroupsmap. If the cloned config's map is modified, it will affect the original.♻️ Proposed fix to deep-copy the map
func (c *KeyspaceConfig) Clone() *KeyspaceConfig { preAlloc := append(c.PreAlloc[:0:0], c.PreAlloc...) + metaServiceGroups := make(map[string]string, len(c.MetaServiceGroups)) + for k, v := range c.MetaServiceGroups { + metaServiceGroups[k] = v + } cfg := *c cfg.PreAlloc = preAlloc + cfg.MetaServiceGroups = metaServiceGroups return &cfg }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/config/config.go` around lines 915 - 921, The Clone method for KeyspaceConfig currently deep-copies PreAlloc but leaves MetaServiceGroups shared; update KeyspaceConfig.Clone to allocate a new map for MetaServiceGroups, iterate the original c.MetaServiceGroups and copy each key/value into the new map, then assign that new map to cfg.MetaServiceGroups before returning; if the map values are slices or pointer types, also deep-copy those values (e.g., copy slices element-by-element) to avoid any shared mutable state.tools/pd-ctl/pdctl/command/meta_service_group_command.go (1)
91-101: Consider validating that parsed ID is non-empty.After trimming whitespace, an input like
"=addr1,addr2"would result in an empty ID being sent to the server. While the server currently accepts this, it may cause issues with downstream operations.🛡️ Proposed fix to validate non-empty ID
for _, group := range metaServiceGroups { parts := strings.SplitN(group, "=", 2) if len(parts) != 2 { cmd.PrintErrf("Invalid --group format: %q (expected id=addr1,addr2,...)\n", group) return } + id := strings.TrimSpace(parts[0]) + if id == "" { + cmd.PrintErrf("Invalid --group format: %q (group ID cannot be empty)\n", group) + return + } params = append(params, handlers.AddMetaServiceGroupRequest{ - ID: strings.TrimSpace(parts[0]), + ID: id, Addresses: strings.TrimSpace(parts[1]), }) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/pd-ctl/pdctl/command/meta_service_group_command.go` around lines 91 - 101, The loop that builds params from metaServiceGroups can append an empty ID when input like "=addr1,addr2" is provided; after splitting and trimming (where AddMetaServiceGroupRequest.ID is set), validate that strings.TrimSpace(parts[0]) is non-empty and if it is, call cmd.PrintErrf with a clear message (e.g., "Invalid --group: empty id in %q") and return instead of appending; update the loop around parts/params and the AddMetaServiceGroupRequest creation to perform this check so only requests with non-empty ID are sent to the server.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/keyspace/keyspace.go`:
- Around line 816-822: The meta-service-group count update is happening inside
manager.mgm.UpdateAssignment which opens its own RunInTxn, causing divergent
commits if SaveKeyspaceMeta (called in updateKeyspaceConfigTxn) later fails;
change UpdateAssignment to accept the current kv.Txn (thread the txn from
updateKeyspaceConfigTxn) so the mgm count mutation runs inside the same
transaction instead of starting its own RunInTxn, and update all call sites
accordingly (remove the internal RunInTxn/commit in manager.mgm.UpdateAssignment
and use the provided txn), ensuring SaveKeyspaceMeta and the meta-group count
mutation commit or roll back together.
- Around line 286-296: The code calls manager.mgm.AssignToGroup(1) and stores
MetaServiceGroupIDKey into request.Config before the create flow is durable,
which can leave the meta-service-group count incremented on later failures; fix
by either moving the AssignToGroup(1) call (and setting
request.Config[MetaServiceGroupIDKey]) until after durable operations complete
(after saveNewKeyspace, splitKeyspaceRegion, UpdateKeyspaceStateByID,
UpdateKeyspaceForGroup) or, if you must assign early, add a compensating
decrement/unassign call to manager.mgm (the inverse of AssignToGroup) on every
error path that follows (including failures from saveNewKeyspace,
splitKeyspaceRegion, UpdateKeyspaceStateByID, UpdateKeyspaceForGroup) so the
meta-group counts remain consistent.
In `@pkg/keyspace/meta_service_group.go`:
- Around line 55-61: The txn callback passed to m.store.RunInTxn swallows errors
from m.store.GetAssignmentCount by returning nil; change the callback to return
the actual error (return err) so the RunInTxn call propagates the storage/read
error to callers (locate the anonymous func passed to m.store.RunInTxn and the
call to m.store.GetAssignmentCount and replace the callback's "return nil" in
the error branch with "return err").
In `@server/apiv2/handlers/meta_service_group.go`:
- Around line 83-90: The loop that builds newGroups must validate each request:
inside the for _, request := range requests loop (where currentGroups, newGroups
and c.AbortWithStatusJSON are used), reject empty request.ID or empty
request.Addresses by returning http.StatusBadRequest with a clear error message;
also detect duplicates within the same POST body by checking if request.ID
already exists in newGroups (not just currentGroups) and abort with
http.StatusBadRequest and a duplicate-ID message; keep the existing check
against currentGroups (using metaServiceGroupAlreadyExistsErr) but add these two
pre-checks before inserting into newGroups.
- Around line 76-79: Replace c.BindJSON with c.ShouldBindJSON in the handler to
avoid Gin auto-abort, then on bind error call c.AbortWithStatusJSON with a
string message (use errs.ErrBindJSON.Wrap(err).GenWithStackByCause().Error())
instead of passing the error object; likewise, where fmt.Errorf(...) is passed
into AbortWithStatusJSON, convert it to a string (fmt.Errorf(...).Error()) so
responses serialize correctly; locate usages by searching for c.BindJSON,
ShouldBindJSON, AbortWithStatusJSON, errs.ErrBindJSON.Wrap, and fmt.Errorf in
this handler and update them accordingly.
---
Nitpick comments:
In `@server/config/config.go`:
- Around line 915-921: The Clone method for KeyspaceConfig currently deep-copies
PreAlloc but leaves MetaServiceGroups shared; update KeyspaceConfig.Clone to
allocate a new map for MetaServiceGroups, iterate the original
c.MetaServiceGroups and copy each key/value into the new map, then assign that
new map to cfg.MetaServiceGroups before returning; if the map values are slices
or pointer types, also deep-copy those values (e.g., copy slices
element-by-element) to avoid any shared mutable state.
In `@tools/pd-ctl/pdctl/command/meta_service_group_command.go`:
- Around line 91-101: The loop that builds params from metaServiceGroups can
append an empty ID when input like "=addr1,addr2" is provided; after splitting
and trimming (where AddMetaServiceGroupRequest.ID is set), validate that
strings.TrimSpace(parts[0]) is non-empty and if it is, call cmd.PrintErrf with a
clear message (e.g., "Invalid --group: empty id in %q") and return instead of
appending; update the loop around parts/params and the
AddMetaServiceGroupRequest creation to perform this check so only requests with
non-empty ID are sent to the server.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e821edbc-53cf-4600-9804-dff389859b02
📒 Files selected for processing (22)
pkg/gc/gc_state_manager_test.gopkg/keyspace/keyspace.gopkg/keyspace/keyspace_test.gopkg/keyspace/meta_service_group.gopkg/keyspace/meta_service_group_test.gopkg/keyspace/tso_keyspace_group_test.gopkg/keyspace/util.gopkg/storage/endpoint/meta_service_group.gopkg/storage/meta_service_group_test.gopkg/storage/storage.gopkg/utils/keypath/meta_service_group.goserver/apiv2/handlers/meta_service_group.goserver/apiv2/router.goserver/cluster/cluster.goserver/cluster/metering_test.goserver/config/config.goserver/server.gotests/server/apiv2/handlers/keyspace_test.gotests/server/apiv2/handlers/meta_service_group_test.gotests/server/apiv2/handlers/testutil.gotools/pd-ctl/pdctl/command/meta_service_group_command.gotools/pd-ctl/pdctl/ctl.go
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
pkg/keyspace/keyspace.go (2)
821-827:⚠️ Potential issue | 🔴 CriticalKeep the meta-group count update in the outer transaction.
updateKeyspaceConfigTxnalready runs insidemanager.store.RunInTxn, butpkg/keyspace/meta_service_group.go:93-115showsUpdateAssignmentstarts and commits its own transaction. IfSaveKeyspaceMetathen fails on Line 829, the keyspace config rolls back while the meta-service-group counts stay mutated. Thread the currentkv.Txnthrough the meta-service-group manager so both writes commit or roll back together.Minimal caller-side shape
- if manager.mgm != nil && oldMetaServiceGroup != newMetaServiceGroup { - if err := manager.mgm.UpdateAssignment(oldMetaServiceGroup, newMetaServiceGroup); err != nil { + if manager.mgm != nil && oldMetaServiceGroup != newMetaServiceGroup { + if err := manager.mgm.UpdateAssignmentInTxn(txn, oldMetaServiceGroup, newMetaServiceGroup); err != nil { return err } }A matching change is needed in
pkg/keyspace/meta_service_group.goso the count mutation reuses the caller's txn instead of opening a new one.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/keyspace/keyspace.go` around lines 821 - 827, The code currently calls manager.mgm.UpdateAssignment(oldMetaServiceGroup, newMetaServiceGroup) which opens its own transaction, causing inconsistent commits if SaveKeyspaceMeta fails; change UpdateAssignment to accept and use the caller's kv.Txn (e.g., UpdateAssignment(txn kv.Txn, oldID, newID)) and modify its implementation in pkg/keyspace/meta_service_group.go to reuse the provided txn instead of starting/committing a new one; update the caller in updateKeyspaceConfigTxn to pass the current txn through (and update any other callers) so meta-group count mutations and SaveKeyspaceMeta share the same txn and commit/rollback together.
274-285:⚠️ Potential issue | 🔴 CriticalRollback the meta-group count when create fails later.
Both create paths persist
AssignToGroup(1)too early.CreateKeyspacecan still fail immediately afterward onGetKeyspaceConfigByKind, and both flows can fail onsaveNewKeyspace/split cleanup, leaving the assignment count permanently incremented for a keyspace that never became durable. Either move the assignment until after creation is durable, or add a compensatingUpdateAssignment(assignedGroup, "")on every failure exit.Suggested pattern
assignToMetaServiceGroup := manager.mgm != nil && len(manager.mgm.GetGroups()) > 0 + assignedMetaServiceGroup := "" + rollbackMetaServiceGroup := false if assignToMetaServiceGroup { - metaServiceGroup, err := manager.mgm.AssignToGroup(1) + assignedMetaServiceGroup, err = manager.mgm.AssignToGroup(1) if err != nil { return nil, err } if request.Config == nil { request.Config = make(map[string]string) } - request.Config[MetaServiceGroupIDKey] = metaServiceGroup + request.Config[MetaServiceGroupIDKey] = assignedMetaServiceGroup + rollbackMetaServiceGroup = true } + defer func() { + if rollbackMetaServiceGroup { + if err := manager.mgm.UpdateAssignment(assignedMetaServiceGroup, ""); err != nil { + log.Error("[create-keyspace] failed to roll back meta-service group assignment", + zap.String("group-id", assignedMetaServiceGroup), + zap.Error(err)) + } + } + }() ... if err := manager.kgm.UpdateKeyspaceForGroup(userKind, config[TSOKeyspaceGroupIDKey], keyspace.GetId(), opAdd); err != nil { return nil, err } + rollbackMetaServiceGroup = falseAlso applies to: 444-454
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/keyspace/keyspace.go` around lines 274 - 285, The code calls manager.mgm.AssignToGroup(1) early and never reverts it if subsequent steps (e.g., GetKeyspaceConfigByKind, saveNewKeyspace, split cleanup) fail; update the flow so that either AssignToGroup is deferred until after the keyspace is durably created or add a compensating rollback by calling manager.mgm.UpdateAssignment(assignedGroup, "") on every error exit after assignment; locate the AssignToGroup call in CreateKeyspace (and the duplicate at the second location) and ensure any path that returns an error after assignment performs UpdateAssignment(assignedGroup, "") (and handles/logs any rollback error) or move the assignment to after saveNewKeyspace succeeds.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/keyspace/keyspace.go`:
- Around line 274-285: The code currently persists caller-controlled
MetaServiceGroupAddressesKey into request.Config before AttachEndpoints (which
only mutates the in-memory map) so stored config can contain untrusted
addresses; before any save/persist path that writes request.Config (e.g., in the
keyspace creation/update handlers where request.Config is used and
AttachEndpoints/manager.mgm is referenced), explicitly remove or reject
MetaServiceGroupAddressesKey from request.Config (or validate and replace it
with server-derived values) prior to persisting; ensure AttachEndpoints (and
uses of manager.mgm) only mutates an in-memory copy and that
MetaServiceGroupIDKey continues to be set from manager.mgm as needed, and apply
the same stripping/validation at the other analogous sites referenced (the other
create/update code paths where request.Config is saved).
---
Duplicate comments:
In `@pkg/keyspace/keyspace.go`:
- Around line 821-827: The code currently calls
manager.mgm.UpdateAssignment(oldMetaServiceGroup, newMetaServiceGroup) which
opens its own transaction, causing inconsistent commits if SaveKeyspaceMeta
fails; change UpdateAssignment to accept and use the caller's kv.Txn (e.g.,
UpdateAssignment(txn kv.Txn, oldID, newID)) and modify its implementation in
pkg/keyspace/meta_service_group.go to reuse the provided txn instead of
starting/committing a new one; update the caller in updateKeyspaceConfigTxn to
pass the current txn through (and update any other callers) so meta-group count
mutations and SaveKeyspaceMeta share the same txn and commit/rollback together.
- Around line 274-285: The code calls manager.mgm.AssignToGroup(1) early and
never reverts it if subsequent steps (e.g., GetKeyspaceConfigByKind,
saveNewKeyspace, split cleanup) fail; update the flow so that either
AssignToGroup is deferred until after the keyspace is durably created or add a
compensating rollback by calling manager.mgm.UpdateAssignment(assignedGroup, "")
on every error exit after assignment; locate the AssignToGroup call in
CreateKeyspace (and the duplicate at the second location) and ensure any path
that returns an error after assignment performs UpdateAssignment(assignedGroup,
"") (and handles/logs any rollback error) or move the assignment to after
saveNewKeyspace succeeds.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 224f3a03-dec0-43b8-9faf-4a471db571a6
📒 Files selected for processing (2)
pkg/keyspace/keyspace.gopkg/keyspace/keyspace_test.go
✅ Files skipped from review due to trivial changes (1)
- pkg/keyspace/keyspace_test.go
|
/check-issue-triage-complete |
* PD support etcd groups (tikv#375) * init commit Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> * fix comment Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> * lint Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> * lint' Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> * lint Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> * rename Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> * add pd-ctl support Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> --------- Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> * fix new import path Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> * fix test package path Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> * lint Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> * lint Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> * lint Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> --------- Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> (cherry picked from commit 45bc455)
Signed-off-by: tongjian <1045931706@qq.com>
Signed-off-by: bufferflies <1045931706@qq.com>
d9a4852 to
28bb6b0
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/config/config.go (1)
915-921:⚠️ Potential issue | 🟡 Minor
Clone()does not deep-copyMetaServiceGroupsmap.The
Clone()method copiesPreAllocbut notMetaServiceGroups. If a cloned config'sMetaServiceGroupsis mutated, it affects the original.🛡️ Proposed fix
func (c *KeyspaceConfig) Clone() *KeyspaceConfig { preAlloc := append(c.PreAlloc[:0:0], c.PreAlloc...) + metaServiceGroups := make(map[string]string, len(c.MetaServiceGroups)) + for k, v := range c.MetaServiceGroups { + metaServiceGroups[k] = v + } cfg := *c cfg.PreAlloc = preAlloc + cfg.MetaServiceGroups = metaServiceGroups return &cfg }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/config/config.go` around lines 915 - 921, Clone() currently deep-copies PreAlloc but leaves the MetaServiceGroups map shared, so mutations to the clone will affect the original; update KeyspaceConfig.Clone to deep-copy MetaServiceGroups by allocating a new map, iterating over c.MetaServiceGroups and copying each key/value into the new map, assign it to cfg.MetaServiceGroups before returning, and preserve existing PreAlloc deep-copy logic (symbols: KeyspaceConfig, Clone, MetaServiceGroups, PreAlloc).
🧹 Nitpick comments (6)
pkg/storage/meta_service_group_test.go (1)
114-116: Consider using idiomatic range iteration.The loop iterates over indices then accesses by index. Using
for _, tc := range testCasesis more idiomatic and avoids the extra index lookup.♻️ More idiomatic iteration
- for testCase := range testCases { - mustRunTestCase(re, store, testCases[testCase]) + for _, tc := range testCases { + mustRunTestCase(re, store, tc) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/storage/meta_service_group_test.go` around lines 114 - 116, The loop over testCases should use idiomatic range to avoid indexing: replace the current for testCase := range testCases { mustRunTestCase(re, store, testCases[testCase]) } with a direct element iteration so you iterate as for _, tc := range testCases and call mustRunTestCase(re, store, tc); this touches the loop where testCases is iterated and invokes mustRunTestCase with re and store.tools/pd-ctl/pdctl/command/meta_service_group_command.go (1)
91-101: Consider validating empty ID or addresses after trimming.After
strings.TrimSpace, the ID or addresses could become empty strings (e.g.," =addr"or"id= "). The server validates this, but early client-side validation provides better UX.♻️ Add client-side validation
for _, group := range metaServiceGroups { parts := strings.SplitN(group, "=", 2) if len(parts) != 2 { cmd.PrintErrf("Invalid --group format: %q (expected id=addr1,addr2,...)\n", group) return } + id := strings.TrimSpace(parts[0]) + addresses := strings.TrimSpace(parts[1]) + if id == "" || addresses == "" { + cmd.PrintErrf("Invalid --group format: %q (id and addresses cannot be empty)\n", group) + return + } params = append(params, handlers.AddMetaServiceGroupRequest{ - ID: strings.TrimSpace(parts[0]), - Addresses: strings.TrimSpace(parts[1]), + ID: id, + Addresses: addresses, }) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/pd-ctl/pdctl/command/meta_service_group_command.go` around lines 91 - 101, After splitting each metaServiceGroups entry and trimming, validate that both the ID and Addresses are non-empty before appending the handlers.AddMetaServiceGroupRequest: check strings.TrimSpace(parts[0]) and strings.TrimSpace(parts[1]) and if either is empty call cmd.PrintErrf with a clear "Invalid --group format: empty id or addresses" message (including the original group value) and return; ensure you still use the trimmed values for ID and Addresses when creating the request.pkg/keyspace/meta_service_group.go (2)
27-32: Avoid storingcontext.Contextin struct.Per coding guidelines, contexts should not be stored in structs. Instead, pass
ctxas the first parameter to each method that needs it. Storing contexts can lead to using stale or inappropriate contexts for operations.♻️ Suggested change
type MetaServiceGroupManager struct { - ctx context.Context store endpoint.MetaServiceGroupStorage syncutil.RWMutex metaServiceGroups map[string]string }Then update methods to accept
ctx context.Contextas first parameter:func (m *MetaServiceGroupManager) GetAssignmentCounts(ctx context.Context) (map[string]int, error) func (m *MetaServiceGroupManager) SelectGroup(ctx context.Context) (string, error) // etc.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/keyspace/meta_service_group.go` around lines 27 - 32, The MetaServiceGroupManager currently stores a context.Context in its struct (field ctx) — remove that field from MetaServiceGroupManager and update all methods to accept ctx context.Context as their first parameter (e.g., GetAssignmentCounts(ctx context.Context), SelectGroup(ctx context.Context), and any other methods that used the stored ctx); update the constructor/new function (if any) and all call sites to pass through the caller's context, and ensure internal uses of m.ctx are replaced with the method parameter to avoid retaining a stale context.
62-79: Non-deterministic group selection with equal counts.When multiple groups have the same minimum count, map iteration order determines which one is selected. This could cause unpredictable load distribution. Consider using deterministic tie-breaking (e.g., lexicographic order by group ID).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/keyspace/meta_service_group.go` around lines 62 - 79, The selection in selectGroupTxn is non-deterministic when multiple groups share the same minimum count because map iteration order is random; change the tie-break to a deterministic lexicographic comparison: when iterating countMap in selectGroupTxn, on currentCount < minCount update minCount and assignedGroup as now, and when currentCount == minCount compare currentGroup and assignedGroup lexicographically (e.g., currentGroup < assignedGroup) and choose the smaller one; ensure assignedGroup is initialized appropriately so the comparison is valid and still return errNoAvailableMetaServiceGroups if no group was selected.pkg/storage/endpoint/meta_service_group.go (1)
27-28: Consider using[]stringinstead ofmap[string]stringforidsparameter.
GetAssignmentCounttakesmap[string]stringbut only uses the keys. The values are unused, making the API signature misleading. Consider using[]stringinstead for clarity.♻️ Suggested signature change
- GetAssignmentCount(txn kv.Txn, ids map[string]string) (map[string]int, error) + GetAssignmentCount(txn kv.Txn, ids []string) (map[string]int, error)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/storage/endpoint/meta_service_group.go` around lines 27 - 28, GetAssignmentCount currently declares ids as map[string]string but only uses the keys, making the API misleading; change the signature of GetAssignmentCount(txn kv.Txn, ids map[string]string) (map[string]int, error) to use a slice of strings: GetAssignmentCount(txn kv.Txn, ids []string) (map[string]int, error), update all call sites to pass []string (extracting keys where callers currently pass maps), and adjust any internal iteration in GetAssignmentCount to range over the slice; ensure IncrementAssignmentCount and other related functions remain untouched but update any tests or usages that relied on the old map parameter.server/apiv2/handlers/meta_service_group.go (1)
116-124: Consider sorting POST response for consistency.The
GEThandler sorts the response by ID for deterministic output (line 159-161), but thePOSThandler iterates over a map without sorting. For API consistency and easier testing/debugging, consider applying the same sorting here.♻️ Proposed fix
response := make([]MetaServiceGroupStatus, 0, len(newGroups)) for id, addresses := range newGroups { response = append(response, MetaServiceGroupStatus{ ID: id, Addresses: addresses, AssignedKeyspaces: assignmentCounts[id], }) } + sort.Slice(response, func(i, j int) bool { + return response[i].ID < response[j].ID + }) c.IndentedJSON(http.StatusOK, response)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/apiv2/handlers/meta_service_group.go` around lines 116 - 124, The POST handler builds the response by iterating newGroups map (creating MetaServiceGroupStatus entries with ID, Addresses, AssignedKeyspaces) which is non-deterministic; fix it by collecting the map keys (IDs) into a slice, sort that slice, and then iterate the sorted IDs to append to response so the output is deterministic and matches the GET handler's sorted-by-ID behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/keyspace/meta_service_group.go`:
- Around line 157-162: GetGroups returns the internal metaServiceGroups map
reference allowing callers to mutate it without the lock; fix by copying the map
while holding the lock and returning the new map. In
MetaServiceGroupManager.GetGroups, acquire the RLock/RUnlock, allocate a new
map[string]string, iterate over m.metaServiceGroups copying each key/value into
the new map, then return that new map so callers cannot mutate the internal
m.metaServiceGroups.
---
Outside diff comments:
In `@server/config/config.go`:
- Around line 915-921: Clone() currently deep-copies PreAlloc but leaves the
MetaServiceGroups map shared, so mutations to the clone will affect the
original; update KeyspaceConfig.Clone to deep-copy MetaServiceGroups by
allocating a new map, iterating over c.MetaServiceGroups and copying each
key/value into the new map, assign it to cfg.MetaServiceGroups before returning,
and preserve existing PreAlloc deep-copy logic (symbols: KeyspaceConfig, Clone,
MetaServiceGroups, PreAlloc).
---
Nitpick comments:
In `@pkg/keyspace/meta_service_group.go`:
- Around line 27-32: The MetaServiceGroupManager currently stores a
context.Context in its struct (field ctx) — remove that field from
MetaServiceGroupManager and update all methods to accept ctx context.Context as
their first parameter (e.g., GetAssignmentCounts(ctx context.Context),
SelectGroup(ctx context.Context), and any other methods that used the stored
ctx); update the constructor/new function (if any) and all call sites to pass
through the caller's context, and ensure internal uses of m.ctx are replaced
with the method parameter to avoid retaining a stale context.
- Around line 62-79: The selection in selectGroupTxn is non-deterministic when
multiple groups share the same minimum count because map iteration order is
random; change the tie-break to a deterministic lexicographic comparison: when
iterating countMap in selectGroupTxn, on currentCount < minCount update minCount
and assignedGroup as now, and when currentCount == minCount compare currentGroup
and assignedGroup lexicographically (e.g., currentGroup < assignedGroup) and
choose the smaller one; ensure assignedGroup is initialized appropriately so the
comparison is valid and still return errNoAvailableMetaServiceGroups if no group
was selected.
In `@pkg/storage/endpoint/meta_service_group.go`:
- Around line 27-28: GetAssignmentCount currently declares ids as
map[string]string but only uses the keys, making the API misleading; change the
signature of GetAssignmentCount(txn kv.Txn, ids map[string]string)
(map[string]int, error) to use a slice of strings: GetAssignmentCount(txn
kv.Txn, ids []string) (map[string]int, error), update all call sites to pass
[]string (extracting keys where callers currently pass maps), and adjust any
internal iteration in GetAssignmentCount to range over the slice; ensure
IncrementAssignmentCount and other related functions remain untouched but update
any tests or usages that relied on the old map parameter.
In `@pkg/storage/meta_service_group_test.go`:
- Around line 114-116: The loop over testCases should use idiomatic range to
avoid indexing: replace the current for testCase := range testCases {
mustRunTestCase(re, store, testCases[testCase]) } with a direct element
iteration so you iterate as for _, tc := range testCases and call
mustRunTestCase(re, store, tc); this touches the loop where testCases is
iterated and invokes mustRunTestCase with re and store.
In `@server/apiv2/handlers/meta_service_group.go`:
- Around line 116-124: The POST handler builds the response by iterating
newGroups map (creating MetaServiceGroupStatus entries with ID, Addresses,
AssignedKeyspaces) which is non-deterministic; fix it by collecting the map keys
(IDs) into a slice, sort that slice, and then iterate the sorted IDs to append
to response so the output is deterministic and matches the GET handler's
sorted-by-ID behavior.
In `@tools/pd-ctl/pdctl/command/meta_service_group_command.go`:
- Around line 91-101: After splitting each metaServiceGroups entry and trimming,
validate that both the ID and Addresses are non-empty before appending the
handlers.AddMetaServiceGroupRequest: check strings.TrimSpace(parts[0]) and
strings.TrimSpace(parts[1]) and if either is empty call cmd.PrintErrf with a
clear "Invalid --group format: empty id or addresses" message (including the
original group value) and return; ensure you still use the trimmed values for ID
and Addresses when creating the request.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0f48ee06-d5b7-4f5a-9265-46f4e1cbd8c0
📒 Files selected for processing (22)
pkg/gc/gc_state_manager_test.gopkg/keyspace/keyspace.gopkg/keyspace/keyspace_test.gopkg/keyspace/meta_service_group.gopkg/keyspace/meta_service_group_test.gopkg/keyspace/tso_keyspace_group_test.gopkg/keyspace/util.gopkg/storage/endpoint/meta_service_group.gopkg/storage/meta_service_group_test.gopkg/storage/storage.gopkg/utils/keypath/meta_service_group.goserver/apiv2/handlers/meta_service_group.goserver/apiv2/router.goserver/cluster/cluster.goserver/cluster/metering_test.goserver/config/config.goserver/server.gotests/server/apiv2/handlers/keyspace_test.gotests/server/apiv2/handlers/meta_service_group_test.gotests/server/apiv2/handlers/testutil.gotools/pd-ctl/pdctl/command/meta_service_group_command.gotools/pd-ctl/pdctl/ctl.go
✅ Files skipped from review due to trivial changes (4)
- server/cluster/metering_test.go
- tools/pd-ctl/pdctl/ctl.go
- pkg/utils/keypath/meta_service_group.go
- tests/server/apiv2/handlers/keyspace_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
- server/apiv2/router.go
- pkg/keyspace/tso_keyspace_group_test.go
- server/cluster/cluster.go
- server/server.go
- pkg/keyspace/keyspace_test.go
- pkg/gc/gc_state_manager_test.go
|
@AmoebaProtozoa: adding LGTM is restricted to approvers and reviewers in OWNERS files. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: AmoebaProtozoa The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Signed-off-by: bufferflies <1045931706@qq.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/keyspace/meta_service_group_test.go (1)
114-118: UseErrorIsfor sentinel error assertions.Line 117 should assert via
ErrorIsso wrapped sentinel errors still pass.✅ Proposed fix
func (suite *metaServiceGroupTestSuite) TestUpdateAssignmentUnknownNewGroup() { re := suite.Require() err := suite.manager.UpdateAssignment("", "nonexistent") - re.Equal(errUnknownMetaServiceGroup, err) + re.ErrorIs(err, errUnknownMetaServiceGroup) }As per coding guidelines "Use
errors.Is/Asfor sentinel error checks".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/keyspace/meta_service_group_test.go` around lines 114 - 118, The test TestUpdateAssignmentUnknownNewGroup currently asserts the sentinel error with re.Equal which fails for wrapped errors; update the assertion to use an errors.Is-style check (e.g., testify/require's ErrorIs) so wrapped errors match: locate TestUpdateAssignmentUnknownNewGroup in meta_service_group_test.go and replace re.Equal(errUnknownMetaServiceGroup, err) with re.ErrorIs(err, errUnknownMetaServiceGroup) (or use errors.Is(err, errUnknownMetaServiceGroup) with require.True) to follow the sentinel error guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/keyspace/meta_service_group.go`:
- Around line 99-110: AssignToGroup currently accepts any integer count which
can be zero or negative and corrupt totals; before acquiring the store
transaction, validate the input (in AssignToGroup) to ensure count > 0 and
return a clear error if not. Keep the existing flow that uses selectGroupTxn and
store.IncrementAssignmentCount(txn, assignedGroup, count), but add the
validation check at the start of AssignToGroup and avoid calling selectGroupTxn
or IncrementAssignmentCount when count is invalid.
- Around line 27-29: The MetaServiceGroupManager currently stores a context in
its ctx field—remove the ctx field from the MetaServiceGroupManager struct and
instead accept ctx context.Context as the first parameter on all methods that
perform storage/external effects: GetAssignmentCounts(ctx context.Context, ...),
SelectGroup(ctx context.Context, ...), AssignToGroup(ctx context.Context, ...),
and UpdateAssignment(ctx context.Context, ...); update all internal calls to use
the passed ctx when calling the store (endpoint.MetaServiceGroupStorage) and
adjust the constructor (e.g., NewMetaServiceGroupManager) to stop saving ctx on
the struct (only accept/use ctx for any one-time initialization if absolutely
needed).
---
Nitpick comments:
In `@pkg/keyspace/meta_service_group_test.go`:
- Around line 114-118: The test TestUpdateAssignmentUnknownNewGroup currently
asserts the sentinel error with re.Equal which fails for wrapped errors; update
the assertion to use an errors.Is-style check (e.g., testify/require's ErrorIs)
so wrapped errors match: locate TestUpdateAssignmentUnknownNewGroup in
meta_service_group_test.go and replace re.Equal(errUnknownMetaServiceGroup, err)
with re.ErrorIs(err, errUnknownMetaServiceGroup) (or use errors.Is(err,
errUnknownMetaServiceGroup) with require.True) to follow the sentinel error
guideline.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7fe30969-2d04-4dab-9cf4-df8e1f1216fe
📒 Files selected for processing (2)
pkg/keyspace/meta_service_group.gopkg/keyspace/meta_service_group_test.go
| type MetaServiceGroupManager struct { | ||
| ctx context.Context | ||
| store endpoint.MetaServiceGroupStorage |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
cat -n pkg/keyspace/meta_service_group.go | head -60Repository: tikv/pd
Length of output: 2220
🏁 Script executed:
rg "m\.ctx" pkg/keyspace/meta_service_group.go -A 2 -B 2Repository: tikv/pd
Length of output: 714
🏁 Script executed:
wc -l pkg/keyspace/meta_service_group.goRepository: tikv/pd
Length of output: 89
🏁 Script executed:
cat -n pkg/keyspace/meta_service_group.goRepository: tikv/pd
Length of output: 6386
Remove context field from struct; pass context.Context as a parameter to methods that perform storage operations.
The manager stores context in struct state (ctx field), which violates the guideline "First parameter must be context.Context for external effects; never store contexts in structs." Context lifetime should be determined by the caller, not at construction time.
Refactor by:
- Removing the
ctxfield from the struct - Adding
ctx context.Contextas the first parameter toGetAssignmentCounts(),SelectGroup(),AssignToGroup(), andUpdateAssignment() - Removing the
ctxparameter from the constructor or using it only for initialization if needed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/keyspace/meta_service_group.go` around lines 27 - 29, The
MetaServiceGroupManager currently stores a context in its ctx field—remove the
ctx field from the MetaServiceGroupManager struct and instead accept ctx
context.Context as the first parameter on all methods that perform
storage/external effects: GetAssignmentCounts(ctx context.Context, ...),
SelectGroup(ctx context.Context, ...), AssignToGroup(ctx context.Context, ...),
and UpdateAssignment(ctx context.Context, ...); update all internal calls to use
the passed ctx when calling the store (endpoint.MetaServiceGroupStorage) and
adjust the constructor (e.g., NewMetaServiceGroupManager) to stop saving ctx on
the struct (only accept/use ctx for any one-time initialization if absolutely
needed).
| func (m *MetaServiceGroupManager) AssignToGroup(count int) (string, error) { | ||
| m.RLock() | ||
| defer m.RUnlock() | ||
| var assignedGroup string | ||
| if err := m.store.RunInTxn(m.ctx, func(txn kv.Txn) error { | ||
| var err error | ||
| assignedGroup, err = m.selectGroupTxn(txn) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| return m.store.IncrementAssignmentCount(txn, assignedGroup, count) | ||
| }); err != nil { |
There was a problem hiding this comment.
Validate assignment delta before incrementing.
AssignToGroup accepts any count; zero/negative values make behavior inconsistent with the method contract ("increments count") and can corrupt totals.
🔧 Proposed fix
func (m *MetaServiceGroupManager) AssignToGroup(count int) (string, error) {
+ if count <= 0 {
+ return "", errors.New("assignment count must be positive")
+ }
m.RLock()
defer m.RUnlock()
var assignedGroup string🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/keyspace/meta_service_group.go` around lines 99 - 110, AssignToGroup
currently accepts any integer count which can be zero or negative and corrupt
totals; before acquiring the store transaction, validate the input (in
AssignToGroup) to ensure count > 0 and return a clear error if not. Keep the
existing flow that uses selectGroupTxn and store.IncrementAssignmentCount(txn,
assignedGroup, count), but add the validation check at the start of
AssignToGroup and avoid calling selectGroupTxn or IncrementAssignmentCount when
count is invalid.
|
@bufferflies: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Issue Number: ref #10516, close #10596
author: @AmoebaProtozoa
cp
45bc455dWhat changed
45bc455donto currentmasterto add meta service group support in PDpd-ctlpaths from the source changemaster, including restoring endpoint attachment inCreateKeyspaceand tightening the shared API test helper name length for current validationValidation
make gotest GOTEST_ARGS='./pkg/keyspace -run TestMetaServiceGroupTestSuite -count=1'make gotest GOTEST_ARGS='./pkg/storage -run TestMetaServiceGroupStorage -count=1'make gotest GOTEST_ARGS='./tests/server/apiv2/handlers -run TestMetaServiceGroupTestSuite -count=1'make checkSummary by CodeRabbit
New Features
Tests