
support etcd groups#10597

Open
bufferflies wants to merge 4 commits into tikv:master from bufferflies:pr-merge/45bc455d-support-etcd-groups

Conversation

@bufferflies
Contributor

@bufferflies bufferflies commented Apr 14, 2026

Issue Number: ref #10516, close #10596

author: @AmoebaProtozoa
cp 45bc455d

What changed

  • cherry-pick 45bc455d onto current master to add meta service group support in PD
  • include the matching storage, keyspace manager, API v2, and pd-ctl paths from the source change
  • keep the adaptation minimal for current master, including restoring endpoint attachment in CreateKeyspace and tightening the shared API test helper name length for current validation

Validation

  • make gotest GOTEST_ARGS='./pkg/keyspace -run TestMetaServiceGroupTestSuite -count=1'
  • make gotest GOTEST_ARGS='./pkg/storage -run TestMetaServiceGroupStorage -count=1'
  • make gotest GOTEST_ARGS='./tests/server/apiv2/handlers -run TestMetaServiceGroupTestSuite -count=1'
  • make check

Summary by CodeRabbit

  • New Features

    • Meta-service group management with automatic, balanced keyspace assignment and endpoint attachment
    • REST API to list and add meta-service groups
    • CLI commands to list and add meta-service groups
    • Configuration support and validation for meta-service group endpoints
  • Tests

    • Extensive tests covering storage, manager logic, keyspace integration, API handlers, and end-to-end behaviors

@ti-chi-bot
Contributor

ti-chi-bot Bot commented Apr 14, 2026

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected. Please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot added the dco-signoff: yes, do-not-merge/needs-triage-completed, and do-not-merge/release-note-label-needed labels Apr 14, 2026
@bufferflies
Contributor Author

/ping @AmoebaProtozoa

@coderabbitai

coderabbitai Bot commented Apr 14, 2026

📝 Walkthrough

Adds meta-service-group support: new MetaServiceGroupManager, storage for assignment counts, keypath helper, keyspace integration (assignment/attachment/validation), API + CLI endpoints, server wiring, and tests across storage, keyspace, server, and client tooling.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Meta-Service Group Core (pkg/keyspace/meta_service_group.go, pkg/keyspace/meta_service_group_test.go) | New MetaServiceGroupManager: in-memory group map + transactional assignment select/assign/update, AttachEndpoints, and unit/integration tests validating assignment/count semantics. |
| Keyspace Integration & Helpers (pkg/keyspace/keyspace.go, pkg/keyspace/keyspace_test.go, pkg/keyspace/util.go, pkg/keyspace/tso_keyspace_group_test.go, pkg/gc/gc_state_manager_test.go) | Config interface extended with Get/SetMetaServiceGroups and meta keys; keyspace create/load/update flows updated to persist/validate/select/attach meta-service-group assignments; added errors and IgnoreMetaServiceGroup; test constructors updated for new NewKeyspaceManager arity. |
| Storage Layer (pkg/storage/endpoint/meta_service_group.go, pkg/storage/meta_service_group_test.go, pkg/storage/storage.go, pkg/utils/keypath/meta_service_group.go) | New endpoint.MetaServiceGroupStorage interface; Storage embeds it; StorageEndpoint implements Increment/Get assignment count with txn helpers; keypath helper for assignment count keys; storage tests added. |
| Server Wiring & Config (server/server.go, server/config/config.go, server/cluster/cluster.go, server/cluster/metering_test.go) | Server gains metaServiceGroupManager field and accessor; constructed from cfg.Keyspace.MetaServiceGroups; KeyspaceConfig adds MetaServiceGroups with validation; cluster Server interface extended; tests updated for constructor change. |
| HTTP API & Tests (server/apiv2/handlers/meta_service_group.go, server/apiv2/router.go, tests/server/apiv2/handlers/meta_service_group_test.go) | New API handlers GET/POST /meta-service-groups with request validation and status responses including assignment counts; route registered; end-to-end API tests added. |
| Test Utilities & Client Helpers (tests/server/apiv2/handlers/testutil.go, tests/server/apiv2/handlers/keyspace_test.go) | Test helpers updated to use IgnoreMetaServiceGroup, added HTTP helpers for meta-service-group API, and keyspace name generation made time-based for uniqueness. |
| CLI (tools/pd-ctl/pdctl/command/meta_service_group_command.go, tools/pd-ctl/pdctl/ctl.go) | Added meta-service-group pd-ctl command with list and add subcommands; registered in root CLI. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as "API Handler"
    participant KSM as "Keyspace Manager"
    participant MGM as "MetaServiceGroupManager"
    participant Storage

    Client->>API: CreateKeyspace(request)
    API->>KSM: CreateKeyspace(cfg)
    KSM->>MGM: AssignToGroup(requestCount)
    MGM->>Storage: RunInTxn(GetAssignmentCount)
    Storage-->>MGM: counts map
    MGM->>Storage: RunInTxn(IncrementAssignmentCount(selected, +1))
    Storage-->>MGM: updated count
    MGM-->>KSM: selected groupID
    KSM->>KSM: persist MetaServiceGroupIDKey (txn)
    KSM->>MGM: AttachEndpoints(meta.Config)
    MGM-->>KSM: meta.Config w/ MetaServiceGroupAddressesKey (if available)
    KSM->>Storage: Save keyspace meta/config
    Storage-->>KSM: success
    KSM-->>API: Keyspace created
    API-->>Client: 200 OK

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

release-note, lgtm, approved

Suggested reviewers

  • okJiang
  • lhy1024
  • rleungx

Poem

🐰 I hopped through code with nimble paws,
New groups align with tidy laws.
Counts balance out, endpoints cling,
Keyspaces dance—assignments sing.
A joyful hop for each small PR cause.

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description references linked issues and provides validation test commands, but does not follow the required template structure with clear sections for problem statement, what changed, and impact analysis. | Complete the PR description using the template: add explicit Issue Number line, populate 'What problem does this PR solve?' section, detail changes in 'What is changed and how does it work?', and address all checklist items. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.14% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'support etcd groups' is vague and generic, using non-specific terminology that doesn't clearly convey the scope or nature of the changes despite the PR involving meta-service group management. | Consider using a more specific title like 'pkg/keyspace: add meta-service group management and assignment' to better describe the primary changes. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | The PR implements comprehensive meta-service group management across storage, keyspace manager, API handlers, and CLI tools to support external group assignment tracking and endpoints, addressing the requirements from issue #10596. |
| Out of Scope Changes check | ✅ Passed | All changes are focused on implementing meta-service group management and remain within scope of the linked issues; no unrelated refactoring or feature additions were detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@ti-chi-bot ti-chi-bot Bot added the size/XXL label Apr 14, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (2)
server/config/config.go (1)

915-921: Clone() does not deep-copy MetaServiceGroups, risking shared map mutation.

The existing Clone() method copies PreAlloc slice but doesn't deep-copy the new MetaServiceGroups map. If the cloned config's map is modified, it will affect the original.

♻️ Proposed fix to deep-copy the map
 func (c *KeyspaceConfig) Clone() *KeyspaceConfig {
 	preAlloc := append(c.PreAlloc[:0:0], c.PreAlloc...)
+	metaServiceGroups := make(map[string]string, len(c.MetaServiceGroups))
+	for k, v := range c.MetaServiceGroups {
+		metaServiceGroups[k] = v
+	}
 	cfg := *c
 	cfg.PreAlloc = preAlloc
+	cfg.MetaServiceGroups = metaServiceGroups
 	return &cfg
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/config/config.go` around lines 915 - 921, The Clone method for
KeyspaceConfig currently deep-copies PreAlloc but leaves MetaServiceGroups
shared; update KeyspaceConfig.Clone to allocate a new map for MetaServiceGroups,
iterate the original c.MetaServiceGroups and copy each key/value into the new
map, then assign that new map to cfg.MetaServiceGroups before returning; if the
map values are slices or pointer types, also deep-copy those values (e.g., copy
slices element-by-element) to avoid any shared mutable state.
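
The aliasing problem described above can be reproduced in isolation. The following is a minimal sketch, assuming a simplified stand-in type (`keyspaceConfig`, `shallowClone`, and `deepClone` are hypothetical names, not the PR's code). Unlike the proposed fix, this variant keeps a nil map nil, which is a design choice rather than something the review asks for.

```go
package main

import "fmt"

// keyspaceConfig is a simplified, hypothetical stand-in for KeyspaceConfig.
type keyspaceConfig struct {
	PreAlloc          []string
	MetaServiceGroups map[string]string
}

// shallowClone copies the slice but leaves the map shared with the original.
func (c *keyspaceConfig) shallowClone() *keyspaceConfig {
	cfg := *c
	cfg.PreAlloc = append(c.PreAlloc[:0:0], c.PreAlloc...)
	return &cfg
}

// deepClone additionally copies the map so the clone can be mutated safely.
func (c *keyspaceConfig) deepClone() *keyspaceConfig {
	cfg := c.shallowClone()
	if c.MetaServiceGroups != nil {
		cfg.MetaServiceGroups = make(map[string]string, len(c.MetaServiceGroups))
		for k, v := range c.MetaServiceGroups {
			cfg.MetaServiceGroups[k] = v
		}
	}
	return cfg
}

func main() {
	orig := &keyspaceConfig{MetaServiceGroups: map[string]string{"g1": "addr1"}}

	shallow := orig.shallowClone()
	shallow.MetaServiceGroups["g2"] = "addr2"
	fmt.Println(len(orig.MetaServiceGroups)) // 2: the write leaked into the original

	deep := orig.deepClone()
	deep.MetaServiceGroups["g3"] = "addr3"
	fmt.Println(len(orig.MetaServiceGroups)) // still 2: the deep clone is isolated
}
```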
tools/pd-ctl/pdctl/command/meta_service_group_command.go (1)

91-101: Consider validating that parsed ID is non-empty.

After trimming whitespace, an input like "=addr1,addr2" would result in an empty ID being sent to the server. While the server currently accepts this, it may cause issues with downstream operations.

🛡️ Proposed fix to validate non-empty ID
 	for _, group := range metaServiceGroups {
 		parts := strings.SplitN(group, "=", 2)
 		if len(parts) != 2 {
 			cmd.PrintErrf("Invalid --group format: %q (expected id=addr1,addr2,...)\n", group)
 			return
 		}
+		id := strings.TrimSpace(parts[0])
+		if id == "" {
+			cmd.PrintErrf("Invalid --group format: %q (group ID cannot be empty)\n", group)
+			return
+		}
 		params = append(params, handlers.AddMetaServiceGroupRequest{
-			ID:        strings.TrimSpace(parts[0]),
+			ID:        id,
 			Addresses: strings.TrimSpace(parts[1]),
 		})
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/pd-ctl/pdctl/command/meta_service_group_command.go` around lines 91 -
101, The loop that builds params from metaServiceGroups can append an empty ID
when input like "=addr1,addr2" is provided; after splitting and trimming (where
AddMetaServiceGroupRequest.ID is set), validate that strings.TrimSpace(parts[0])
is non-empty and if it is, call cmd.PrintErrf with a clear message (e.g.,
"Invalid --group: empty id in %q") and return instead of appending; update the
loop around parts/params and the AddMetaServiceGroupRequest creation to perform
this check so only requests with non-empty ID are sent to the server.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/keyspace/keyspace.go`:
- Around line 816-822: The meta-service-group count update is happening inside
manager.mgm.UpdateAssignment which opens its own RunInTxn, causing divergent
commits if SaveKeyspaceMeta (called in updateKeyspaceConfigTxn) later fails;
change UpdateAssignment to accept the current kv.Txn (thread the txn from
updateKeyspaceConfigTxn) so the mgm count mutation runs inside the same
transaction instead of starting its own RunInTxn, and update all call sites
accordingly (remove the internal RunInTxn/commit in manager.mgm.UpdateAssignment
and use the provided txn), ensuring SaveKeyspaceMeta and the meta-group count
mutation commit or roll back together.
- Around line 286-296: The code calls manager.mgm.AssignToGroup(1) and stores
MetaServiceGroupIDKey into request.Config before the create flow is durable,
which can leave the meta-service-group count incremented on later failures; fix
by either moving the AssignToGroup(1) call (and setting
request.Config[MetaServiceGroupIDKey]) until after durable operations complete
(after saveNewKeyspace, splitKeyspaceRegion, UpdateKeyspaceStateByID,
UpdateKeyspaceForGroup) or, if you must assign early, add a compensating
decrement/unassign call to manager.mgm (the inverse of AssignToGroup) on every
error path that follows (including failures from saveNewKeyspace,
splitKeyspaceRegion, UpdateKeyspaceStateByID, UpdateKeyspaceForGroup) so the
meta-group counts remain consistent.

In `@pkg/keyspace/meta_service_group.go`:
- Around line 55-61: The txn callback passed to m.store.RunInTxn swallows errors
from m.store.GetAssignmentCount by returning nil; change the callback to return
the actual error (return err) so the RunInTxn call propagates the storage/read
error to callers (locate the anonymous func passed to m.store.RunInTxn and the
call to m.store.GetAssignmentCount and replace the callback's "return nil" in
the error branch with "return err").

In `@server/apiv2/handlers/meta_service_group.go`:
- Around line 83-90: The loop that builds newGroups must validate each request:
inside the for _, request := range requests loop (where currentGroups, newGroups
and c.AbortWithStatusJSON are used), reject empty request.ID or empty
request.Addresses by returning http.StatusBadRequest with a clear error message;
also detect duplicates within the same POST body by checking if request.ID
already exists in newGroups (not just currentGroups) and abort with
http.StatusBadRequest and a duplicate-ID message; keep the existing check
against currentGroups (using metaServiceGroupAlreadyExistsErr) but add these two
pre-checks before inserting into newGroups.
- Around line 76-79: Replace c.BindJSON with c.ShouldBindJSON in the handler to
avoid Gin auto-abort, then on bind error call c.AbortWithStatusJSON with a
string message (use errs.ErrBindJSON.Wrap(err).GenWithStackByCause().Error())
instead of passing the error object; likewise, where fmt.Errorf(...) is passed
into AbortWithStatusJSON, convert it to a string (fmt.Errorf(...).Error()) so
responses serialize correctly; locate usages by searching for c.BindJSON,
ShouldBindJSON, AbortWithStatusJSON, errs.ErrBindJSON.Wrap, and fmt.Errorf in
this handler and update them accordingly.
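
For the `pkg/keyspace/meta_service_group.go` item above, the difference between swallowing and propagating a read error inside a transaction callback can be sketched as follows. This is an illustration under stated assumptions: `runInTxn`, `getCount`, and `loadCounts` are hypothetical helpers, not the PR's storage API.

```go
package main

import (
	"errors"
	"fmt"
)

// errRead simulates a storage failure.
var errRead = errors.New("storage read failed")

// runInTxn is a hypothetical minimal transaction runner: it returns whatever
// the callback returns.
func runInTxn(f func() error) error { return f() }

// getCount is a hypothetical read that always fails, standing in for a
// failing GetAssignmentCount call.
func getCount(string) (int, error) { return 0, errRead }

// loadCounts returns the callback's error to the caller instead of hiding it.
func loadCounts(groups []string) (map[string]int, error) {
	counts := make(map[string]int, len(groups))
	err := runInTxn(func() error {
		for _, id := range groups {
			n, err := getCount(id)
			if err != nil {
				return err // returning nil here would hide the failure
			}
			counts[id] = n
		}
		return nil
	})
	if err != nil {
		return nil, err
	}
	return counts, nil
}

func main() {
	_, err := loadCounts([]string{"g1"})
	fmt.Println(err) // storage read failed
}
```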


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e821edbc-53cf-4600-9804-dff389859b02

📥 Commits

Reviewing files that changed from the base of the PR and between b0a3c90 and fb0182c.

📒 Files selected for processing (22)
  • pkg/gc/gc_state_manager_test.go
  • pkg/keyspace/keyspace.go
  • pkg/keyspace/keyspace_test.go
  • pkg/keyspace/meta_service_group.go
  • pkg/keyspace/meta_service_group_test.go
  • pkg/keyspace/tso_keyspace_group_test.go
  • pkg/keyspace/util.go
  • pkg/storage/endpoint/meta_service_group.go
  • pkg/storage/meta_service_group_test.go
  • pkg/storage/storage.go
  • pkg/utils/keypath/meta_service_group.go
  • server/apiv2/handlers/meta_service_group.go
  • server/apiv2/router.go
  • server/cluster/cluster.go
  • server/cluster/metering_test.go
  • server/config/config.go
  • server/server.go
  • tests/server/apiv2/handlers/keyspace_test.go
  • tests/server/apiv2/handlers/meta_service_group_test.go
  • tests/server/apiv2/handlers/testutil.go
  • tools/pd-ctl/pdctl/command/meta_service_group_command.go
  • tools/pd-ctl/pdctl/ctl.go

Comment thread pkg/keyspace/keyspace.go Outdated
Comment thread pkg/keyspace/keyspace.go
Comment thread pkg/keyspace/meta_service_group.go
Comment thread server/apiv2/handlers/meta_service_group.go Outdated
Comment thread server/apiv2/handlers/meta_service_group.go Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
pkg/keyspace/keyspace.go (2)

821-827: ⚠️ Potential issue | 🔴 Critical

Keep the meta-group count update in the outer transaction.

updateKeyspaceConfigTxn already runs inside manager.store.RunInTxn, but pkg/keyspace/meta_service_group.go:93-115 shows UpdateAssignment starts and commits its own transaction. If SaveKeyspaceMeta then fails on Line 829, the keyspace config rolls back while the meta-service-group counts stay mutated. Thread the current kv.Txn through the meta-service-group manager so both writes commit or roll back together.

Minimal caller-side shape
-		if manager.mgm != nil && oldMetaServiceGroup != newMetaServiceGroup {
-			if err := manager.mgm.UpdateAssignment(oldMetaServiceGroup, newMetaServiceGroup); err != nil {
+		if manager.mgm != nil && oldMetaServiceGroup != newMetaServiceGroup {
+			if err := manager.mgm.UpdateAssignmentInTxn(txn, oldMetaServiceGroup, newMetaServiceGroup); err != nil {
 				return err
 			}
 		}

A matching change is needed in pkg/keyspace/meta_service_group.go so the count mutation reuses the caller's txn instead of opening a new one.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/keyspace/keyspace.go` around lines 821 - 827, The code currently calls
manager.mgm.UpdateAssignment(oldMetaServiceGroup, newMetaServiceGroup) which
opens its own transaction, causing inconsistent commits if SaveKeyspaceMeta
fails; change UpdateAssignment to accept and use the caller's kv.Txn (e.g.,
UpdateAssignment(txn kv.Txn, oldID, newID)) and modify its implementation in
pkg/keyspace/meta_service_group.go to reuse the provided txn instead of
starting/committing a new one; update the caller in updateKeyspaceConfigTxn to
pass the current txn through (and update any other callers) so meta-group count
mutations and SaveKeyspaceMeta share the same txn and commit/rollback together.
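
Why sharing one transaction matters can be shown with a toy store. This is a minimal sketch, assuming an in-memory transaction that stages writes and applies them only when the callback succeeds; `txn`, `store`, `updateAssignmentInTxn`, and `updateConfig` are hypothetical and much simpler than PD's kv.Txn.

```go
package main

import (
	"errors"
	"fmt"
)

// txn is a hypothetical stand-in for kv.Txn: writes are staged and applied
// only if the surrounding callback returns nil.
type txn struct{ staged map[string]string }

func (t *txn) Save(key, value string) { t.staged[key] = value }

type store struct{ data map[string]string }

// RunInTxn applies the staged writes only when f succeeds.
func (s *store) RunInTxn(f func(t *txn) error) error {
	t := &txn{staged: make(map[string]string)}
	if err := f(t); err != nil {
		return err
	}
	for k, v := range t.staged {
		s.data[k] = v
	}
	return nil
}

// updateAssignmentInTxn stages the count change on the caller's transaction
// instead of opening and committing its own.
func updateAssignmentInTxn(t *txn, groupID, count string) {
	t.Save("count/"+groupID, count)
}

// updateConfig performs both writes in one transaction; failSave simulates
// the keyspace meta save failing after the count was staged.
func updateConfig(s *store, failSave bool) error {
	return s.RunInTxn(func(t *txn) error {
		updateAssignmentInTxn(t, "g1", "1")
		if failSave {
			return errors.New("save keyspace meta failed")
		}
		t.Save("keyspace/1", "g1")
		return nil
	})
}

func main() {
	s := &store{data: make(map[string]string)}
	fmt.Println(updateConfig(s, true), len(s.data))  // error, nothing written
	fmt.Println(updateConfig(s, false), len(s.data)) // no error, both keys written
}
```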

274-285: ⚠️ Potential issue | 🔴 Critical

Rollback the meta-group count when create fails later.

Both create paths persist AssignToGroup(1) too early. CreateKeyspace can still fail immediately afterward on GetKeyspaceConfigByKind, and both flows can fail on saveNewKeyspace/split cleanup, leaving the assignment count permanently incremented for a keyspace that never became durable. Either move the assignment until after creation is durable, or add a compensating UpdateAssignment(assignedGroup, "") on every failure exit.

Suggested pattern
 	assignToMetaServiceGroup := manager.mgm != nil && len(manager.mgm.GetGroups()) > 0
+	assignedMetaServiceGroup := ""
+	rollbackMetaServiceGroup := false
 	if assignToMetaServiceGroup {
-		metaServiceGroup, err := manager.mgm.AssignToGroup(1)
+		assignedMetaServiceGroup, err = manager.mgm.AssignToGroup(1)
 		if err != nil {
 			return nil, err
 		}
 		if request.Config == nil {
 			request.Config = make(map[string]string)
 		}
-		request.Config[MetaServiceGroupIDKey] = metaServiceGroup
+		request.Config[MetaServiceGroupIDKey] = assignedMetaServiceGroup
+		rollbackMetaServiceGroup = true
 	}
+	defer func() {
+		if rollbackMetaServiceGroup {
+			if err := manager.mgm.UpdateAssignment(assignedMetaServiceGroup, ""); err != nil {
+				log.Error("[create-keyspace] failed to roll back meta-service group assignment",
+					zap.String("group-id", assignedMetaServiceGroup),
+					zap.Error(err))
+			}
+		}
+	}()
 	...
 	if err := manager.kgm.UpdateKeyspaceForGroup(userKind, config[TSOKeyspaceGroupIDKey], keyspace.GetId(), opAdd); err != nil {
 		return nil, err
 	}
+	rollbackMetaServiceGroup = false

Also applies to: 444-454

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/keyspace/keyspace.go` around lines 274 - 285, The code calls
manager.mgm.AssignToGroup(1) early and never reverts it if subsequent steps
(e.g., GetKeyspaceConfigByKind, saveNewKeyspace, split cleanup) fail; update the
flow so that either AssignToGroup is deferred until after the keyspace is
durably created or add a compensating rollback by calling
manager.mgm.UpdateAssignment(assignedGroup, "") on every error exit after
assignment; locate the AssignToGroup call in CreateKeyspace (and the duplicate
at the second location) and ensure any path that returns an error after
assignment performs UpdateAssignment(assignedGroup, "") (and handles/logs any
rollback error) or move the assignment to after saveNewKeyspace succeeds.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/keyspace/keyspace.go`:
- Around line 274-285: The code currently persists caller-controlled
MetaServiceGroupAddressesKey into request.Config before AttachEndpoints (which
only mutates the in-memory map) so stored config can contain untrusted
addresses; before any save/persist path that writes request.Config (e.g., in the
keyspace creation/update handlers where request.Config is used and
AttachEndpoints/manager.mgm is referenced), explicitly remove or reject
MetaServiceGroupAddressesKey from request.Config (or validate and replace it
with server-derived values) prior to persisting; ensure AttachEndpoints (and
uses of manager.mgm) only mutates an in-memory copy and that
MetaServiceGroupIDKey continues to be set from manager.mgm as needed, and apply
the same stripping/validation at the other analogous sites referenced (the other
create/update code paths where request.Config is saved).
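
One way to strip a server-managed key before persisting is to copy the caller's config and skip that key. The sketch below is illustrative only: the constant's string value and the `stripServerManagedKeys` helper are assumptions, since the PR's actual value for MetaServiceGroupAddressesKey is not shown here.

```go
package main

import "fmt"

// metaServiceGroupAddressesKey uses a hypothetical value for illustration;
// the real value of MetaServiceGroupAddressesKey is not shown in this review.
const metaServiceGroupAddressesKey = "meta_service_group_addresses"

// stripServerManagedKeys returns a copy of the caller's config without the
// server-derived addresses key, leaving the input map untouched.
func stripServerManagedKeys(config map[string]string) map[string]string {
	cleaned := make(map[string]string, len(config))
	for k, v := range config {
		if k == metaServiceGroupAddressesKey {
			continue
		}
		cleaned[k] = v
	}
	return cleaned
}

func main() {
	request := map[string]string{
		"user_key":                   "value",
		metaServiceGroupAddressesKey: "untrusted:2379",
	}
	fmt.Println(stripServerManagedKeys(request)) // map[user_key:value]
}
```

Returning a copy rather than deleting in place keeps the caller's request map unchanged, which matches the review's point that endpoint attachment should only touch an in-memory copy.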


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 224f3a03-dec0-43b8-9faf-4a471db571a6

📥 Commits

Reviewing files that changed from the base of the PR and between fb0182c and d9a4852.

📒 Files selected for processing (2)
  • pkg/keyspace/keyspace.go
  • pkg/keyspace/keyspace_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/keyspace/keyspace_test.go

Comment thread pkg/keyspace/keyspace.go
@bufferflies
Contributor Author

/check-issue-triage-complete

AmoebaProtozoa and others added 3 commits April 14, 2026 14:57
* PD support etcd groups (tikv#375)

* init commit

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>

* fix comment

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>

* lint

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>

* lint'

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>

* lint

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>

* rename

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>

* add pd-ctl support

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>

---------

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>

* fix new import path

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>

* fix test package path

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>

* lint

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>

* lint

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>

* lint

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>

---------

Signed-off-by: David <8039876+AmoebaProtozoa@users.noreply.github.com>
(cherry picked from commit 45bc455)
Signed-off-by: tongjian <1045931706@qq.com>
Signed-off-by: bufferflies <1045931706@qq.com>
@bufferflies bufferflies force-pushed the pr-merge/45bc455d-support-etcd-groups branch from d9a4852 to 28bb6b0 Compare April 14, 2026 07:01

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/config/config.go (1)

915-921: ⚠️ Potential issue | 🟡 Minor

Clone() does not deep-copy MetaServiceGroups map.

The Clone() method copies PreAlloc but not MetaServiceGroups. If a cloned config's MetaServiceGroups is mutated, it affects the original.

🛡️ Proposed fix
 func (c *KeyspaceConfig) Clone() *KeyspaceConfig {
 	preAlloc := append(c.PreAlloc[:0:0], c.PreAlloc...)
+	metaServiceGroups := make(map[string]string, len(c.MetaServiceGroups))
+	for k, v := range c.MetaServiceGroups {
+		metaServiceGroups[k] = v
+	}
 	cfg := *c
 	cfg.PreAlloc = preAlloc
+	cfg.MetaServiceGroups = metaServiceGroups
 	return &cfg
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/config/config.go` around lines 915 - 921, Clone() currently
deep-copies PreAlloc but leaves the MetaServiceGroups map shared, so mutations
to the clone will affect the original; update KeyspaceConfig.Clone to deep-copy
MetaServiceGroups by allocating a new map, iterating over c.MetaServiceGroups
and copying each key/value into the new map, assign it to cfg.MetaServiceGroups
before returning, and preserve existing PreAlloc deep-copy logic (symbols:
KeyspaceConfig, Clone, MetaServiceGroups, PreAlloc).
🧹 Nitpick comments (6)
pkg/storage/meta_service_group_test.go (1)

114-116: Consider using idiomatic range iteration.

The loop iterates over indices then accesses by index. Using for _, tc := range testCases is more idiomatic and avoids the extra index lookup.

♻️ More idiomatic iteration
-	for testCase := range testCases {
-		mustRunTestCase(re, store, testCases[testCase])
+	for _, tc := range testCases {
+		mustRunTestCase(re, store, tc)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/storage/meta_service_group_test.go` around lines 114 - 116, The loop over
testCases should use idiomatic range to avoid indexing: replace the current for
testCase := range testCases { mustRunTestCase(re, store, testCases[testCase]) }
with a direct element iteration so you iterate as for _, tc := range testCases
and call mustRunTestCase(re, store, tc); this touches the loop where testCases
is iterated and invokes mustRunTestCase with re and store.
tools/pd-ctl/pdctl/command/meta_service_group_command.go (1)

91-101: Consider validating empty ID or addresses after trimming.

After strings.TrimSpace, the ID or addresses could become empty strings (e.g., " =addr" or "id= "). The server validates this, but early client-side validation provides better UX.

♻️ Add client-side validation
 	for _, group := range metaServiceGroups {
 		parts := strings.SplitN(group, "=", 2)
 		if len(parts) != 2 {
 			cmd.PrintErrf("Invalid --group format: %q (expected id=addr1,addr2,...)\n", group)
 			return
 		}
+		id := strings.TrimSpace(parts[0])
+		addresses := strings.TrimSpace(parts[1])
+		if id == "" || addresses == "" {
+			cmd.PrintErrf("Invalid --group format: %q (id and addresses cannot be empty)\n", group)
+			return
+		}
 		params = append(params, handlers.AddMetaServiceGroupRequest{
-			ID:        strings.TrimSpace(parts[0]),
-			Addresses: strings.TrimSpace(parts[1]),
+			ID:        id,
+			Addresses: addresses,
 		})
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/pd-ctl/pdctl/command/meta_service_group_command.go` around lines 91 -
101, After splitting each metaServiceGroups entry and trimming, validate that
both the ID and Addresses are non-empty before appending the
handlers.AddMetaServiceGroupRequest: check strings.TrimSpace(parts[0]) and
strings.TrimSpace(parts[1]) and if either is empty call cmd.PrintErrf with a
clear "Invalid --group format: empty id or addresses" message (including the
original group value) and return; ensure you still use the trimmed values for ID
and Addresses when creating the request.
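
The same validation can be factored into a small helper that is easy to test on its own. This is a sketch under the assumption that the flag format is `id=addr1,addr2,...` as stated in the error message above; `parseGroup` is a hypothetical name and is not part of pd-ctl.

```go
package main

import (
	"fmt"
	"strings"
)

// parseGroup splits one --group value of the form id=addr1,addr2,... and
// rejects entries whose ID or addresses are empty after trimming.
func parseGroup(group string) (id, addresses string, err error) {
	parts := strings.SplitN(group, "=", 2)
	if len(parts) != 2 {
		return "", "", fmt.Errorf("invalid --group format: %q (expected id=addr1,addr2,...)", group)
	}
	id = strings.TrimSpace(parts[0])
	addresses = strings.TrimSpace(parts[1])
	if id == "" || addresses == "" {
		return "", "", fmt.Errorf("invalid --group format: %q (id and addresses cannot be empty)", group)
	}
	return id, addresses, nil
}

func main() {
	for _, input := range []string{"g1=a:2379,b:2379", "=a:2379", "g1= ", "no-separator"} {
		id, addresses, err := parseGroup(input)
		fmt.Println(id, addresses, err)
	}
}
```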
pkg/keyspace/meta_service_group.go (2)

27-32: Avoid storing context.Context in struct.

Per coding guidelines, contexts should not be stored in structs. Instead, pass ctx as the first parameter to each method that needs it. Storing contexts can lead to using stale or inappropriate contexts for operations.

♻️ Suggested change
 type MetaServiceGroupManager struct {
-	ctx   context.Context
 	store endpoint.MetaServiceGroupStorage
 	syncutil.RWMutex
 	metaServiceGroups map[string]string
 }

Then update methods to accept ctx context.Context as first parameter:

func (m *MetaServiceGroupManager) GetAssignmentCounts(ctx context.Context) (map[string]int, error)
func (m *MetaServiceGroupManager) SelectGroup(ctx context.Context) (string, error)
// etc.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/keyspace/meta_service_group.go` around lines 27 - 32, The
MetaServiceGroupManager currently stores a context.Context in its struct (field
ctx) — remove that field from MetaServiceGroupManager and update all methods to
accept ctx context.Context as their first parameter (e.g.,
GetAssignmentCounts(ctx context.Context), SelectGroup(ctx context.Context), and
any other methods that used the stored ctx); update the constructor/new function
(if any) and all call sites to pass through the caller's context, and ensure
internal uses of m.ctx are replaced with the method parameter to avoid retaining
a stale context.

62-79: Non-deterministic group selection with equal counts.

When multiple groups have the same minimum count, map iteration order determines which one is selected. This could cause unpredictable load distribution. Consider using deterministic tie-breaking (e.g., lexicographic order by group ID).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/keyspace/meta_service_group.go` around lines 62 - 79, The selection in
selectGroupTxn is non-deterministic when multiple groups share the same minimum
count because map iteration order is random; change the tie-break to a
deterministic lexicographic comparison: when iterating countMap in
selectGroupTxn, on currentCount < minCount update minCount and assignedGroup as
now, and when currentCount == minCount compare currentGroup and assignedGroup
lexicographically (e.g., currentGroup < assignedGroup) and choose the smaller
one; ensure assignedGroup is initialized appropriately so the comparison is
valid and still return errNoAvailableMetaServiceGroups if no group was selected.
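
A minimal sketch of the tie-break described above, assuming the counts are already loaded into a plain map. `selectLeastLoaded` and `errNoGroups` are illustrative stand-ins for selectGroupTxn and errNoAvailableMetaServiceGroups, not the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoGroups stands in for errNoAvailableMetaServiceGroups (hypothetical).
var errNoGroups = errors.New("no available meta-service groups")

// selectLeastLoaded returns the group with the smallest assignment count.
// When several groups share the minimum, the lexicographically smallest ID
// wins, so the result does not depend on map iteration order.
func selectLeastLoaded(counts map[string]int) (string, error) {
	var (
		best     string
		minCount int
		found    bool
	)
	for id, c := range counts {
		if !found || c < minCount || (c == minCount && id < best) {
			best, minCount, found = id, c, true
		}
	}
	if !found {
		return "", errNoGroups
	}
	return best, nil
}

func main() {
	group, err := selectLeastLoaded(map[string]int{"b": 1, "a": 1, "c": 2})
	fmt.Println(group, err)
}
```

Tracking `found` separately avoids relying on a sentinel initial value for the minimum and keeps the empty-map case explicit.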
pkg/storage/endpoint/meta_service_group.go (1)

27-28: Consider using []string instead of map[string]string for ids parameter.

GetAssignmentCount takes map[string]string but only uses the keys. The values are unused, making the API signature misleading. Consider using []string instead for clarity.

♻️ Suggested signature change
-	GetAssignmentCount(txn kv.Txn, ids map[string]string) (map[string]int, error)
+	GetAssignmentCount(txn kv.Txn, ids []string) (map[string]int, error)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/storage/endpoint/meta_service_group.go` around lines 27 - 28,
GetAssignmentCount currently declares ids as map[string]string but only uses the
keys, making the API misleading; change the signature of GetAssignmentCount(txn
kv.Txn, ids map[string]string) (map[string]int, error) to use a slice of
strings: GetAssignmentCount(txn kv.Txn, ids []string) (map[string]int, error),
update all call sites to pass []string (extracting keys where callers currently
pass maps), and adjust any internal iteration in GetAssignmentCount to range
over the slice; ensure IncrementAssignmentCount and other related functions
remain untouched but update any tests or usages that relied on the old map
parameter.
server/apiv2/handlers/meta_service_group.go (1)

116-124: Consider sorting POST response for consistency.

The GET handler sorts the response by ID for deterministic output (lines 159-161), but the POST handler iterates over a map without sorting. For API consistency and easier testing/debugging, consider applying the same sorting here.

♻️ Proposed fix
 	response := make([]MetaServiceGroupStatus, 0, len(newGroups))
 	for id, addresses := range newGroups {
 		response = append(response, MetaServiceGroupStatus{
 			ID:                id,
 			Addresses:         addresses,
 			AssignedKeyspaces: assignmentCounts[id],
 		})
 	}
+	sort.Slice(response, func(i, j int) bool {
+		return response[i].ID < response[j].ID
+	})
 	c.IndentedJSON(http.StatusOK, response)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/apiv2/handlers/meta_service_group.go` around lines 116 - 124, The POST
handler builds the response by iterating newGroups map (creating
MetaServiceGroupStatus entries with ID, Addresses, AssignedKeyspaces) which is
non-deterministic; fix it by collecting the map keys (IDs) into a slice, sort
that slice, and then iterate the sorted IDs to append to response so the output
is deterministic and matches the GET handler's sorted-by-ID behavior.
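
The key-sorting variant described in the prompt above might look like the sketch below. `groupStatus` is a local stand-in whose fields mirror those visible in the diff, and `buildResponse` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// groupStatus is a local stand-in for MetaServiceGroupStatus, using only the
// fields visible in the diff above.
type groupStatus struct {
	ID                string
	Addresses         string
	AssignedKeyspaces int
}

// buildResponse (hypothetical) collects the group IDs, sorts them, and then
// builds the response in that order so the output is stable across calls.
func buildResponse(groups map[string]string, counts map[string]int) []groupStatus {
	ids := make([]string, 0, len(groups))
	for id := range groups {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	resp := make([]groupStatus, 0, len(ids))
	for _, id := range ids {
		resp = append(resp, groupStatus{
			ID:                id,
			Addresses:         groups[id],
			AssignedKeyspaces: counts[id], // missing IDs default to 0
		})
	}
	return resp
}

func main() {
	groups := map[string]string{"g2": "etcd-2:2379", "g1": "etcd-1:2379"}
	counts := map[string]int{"g1": 4}
	for _, s := range buildResponse(groups, counts) {
		fmt.Printf("%s %s %d\n", s.ID, s.Addresses, s.AssignedKeyspaces)
	}
}
```

Sorting the keys first and sorting the finished slice with sort.Slice give the same order; which one to use is a matter of taste.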
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/keyspace/meta_service_group.go`:
- Around line 157-162: GetGroups returns the internal metaServiceGroups map
reference allowing callers to mutate it without the lock; fix by copying the map
while holding the lock and returning the new map. In
MetaServiceGroupManager.GetGroups, acquire the RLock/RUnlock, allocate a new
map[string]string, iterate over m.metaServiceGroups copying each key/value into
the new map, then return that new map so callers cannot mutate the internal
m.metaServiceGroups.
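
For illustration only, the copy-under-lock pattern could be sketched as follows. `groupManager` is a simplified stand-in that uses sync.RWMutex in place of syncutil.RWMutex; it is not the PR's type.

```go
package main

import (
	"fmt"
	"sync"
)

// groupManager is a simplified stand-in for MetaServiceGroupManager.
type groupManager struct {
	sync.RWMutex
	groups map[string]string
}

// GetGroups returns a copy made while holding the read lock, so callers can
// modify the result without touching the manager's internal map.
func (m *groupManager) GetGroups() map[string]string {
	m.RLock()
	defer m.RUnlock()
	out := make(map[string]string, len(m.groups))
	for id, addrs := range m.groups {
		out[id] = addrs
	}
	return out
}

func main() {
	m := &groupManager{groups: map[string]string{"g1": "etcd-1:2379"}}
	snapshot := m.GetGroups()
	snapshot["g1"] = "mutated" // only the copy changes
	fmt.Println(m.GetGroups()["g1"])
}
```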

---

Outside diff comments:
In `@server/config/config.go`:
- Around line 915-921: Clone() currently deep-copies PreAlloc but leaves the
MetaServiceGroups map shared, so mutations to the clone will affect the
original; update KeyspaceConfig.Clone to deep-copy MetaServiceGroups by
allocating a new map, iterating over c.MetaServiceGroups and copying each
key/value into the new map, assign it to cfg.MetaServiceGroups before returning,
and preserve existing PreAlloc deep-copy logic (symbols: KeyspaceConfig, Clone,
MetaServiceGroups, PreAlloc).
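
A hedged sketch of the deep copy, using a cut-down `keyspaceConfig` with only the two fields named above. The field types are assumptions made for the example (`[]string` for PreAlloc, `map[string]string` for MetaServiceGroups); the real KeyspaceConfig has more fields.

```go
package main

import "fmt"

// keyspaceConfig is a cut-down, hypothetical stand-in for KeyspaceConfig.
type keyspaceConfig struct {
	PreAlloc          []string
	MetaServiceGroups map[string]string
}

// Clone copies both reference-typed fields so the clone shares no state
// with the original.
func (c *keyspaceConfig) Clone() *keyspaceConfig {
	cfg := *c
	cfg.PreAlloc = append([]string(nil), c.PreAlloc...)
	if c.MetaServiceGroups != nil {
		cfg.MetaServiceGroups = make(map[string]string, len(c.MetaServiceGroups))
		for id, addrs := range c.MetaServiceGroups {
			cfg.MetaServiceGroups[id] = addrs
		}
	}
	return &cfg
}

func main() {
	orig := &keyspaceConfig{MetaServiceGroups: map[string]string{"g1": "etcd-1:2379"}}
	clone := orig.Clone()
	clone.MetaServiceGroups["g1"] = "changed"
	fmt.Println(orig.MetaServiceGroups["g1"])
}
```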

---

Nitpick comments:
In `@pkg/storage/meta_service_group_test.go`:
- Around line 114-116: The loop over testCases should use idiomatic range to
avoid indexing: replace the current for testCase := range testCases {
mustRunTestCase(re, store, testCases[testCase]) } with a direct element
iteration so you iterate as for _, tc := range testCases and call
mustRunTestCase(re, store, tc); this touches the loop where testCases is
iterated and invokes mustRunTestCase with re and store.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0f48ee06-d5b7-4f5a-9265-46f4e1cbd8c0

📥 Commits

Reviewing files that changed from the base of the PR and between d9a4852 and 28bb6b0.

📒 Files selected for processing (22)
  • pkg/gc/gc_state_manager_test.go
  • pkg/keyspace/keyspace.go
  • pkg/keyspace/keyspace_test.go
  • pkg/keyspace/meta_service_group.go
  • pkg/keyspace/meta_service_group_test.go
  • pkg/keyspace/tso_keyspace_group_test.go
  • pkg/keyspace/util.go
  • pkg/storage/endpoint/meta_service_group.go
  • pkg/storage/meta_service_group_test.go
  • pkg/storage/storage.go
  • pkg/utils/keypath/meta_service_group.go
  • server/apiv2/handlers/meta_service_group.go
  • server/apiv2/router.go
  • server/cluster/cluster.go
  • server/cluster/metering_test.go
  • server/config/config.go
  • server/server.go
  • tests/server/apiv2/handlers/keyspace_test.go
  • tests/server/apiv2/handlers/meta_service_group_test.go
  • tests/server/apiv2/handlers/testutil.go
  • tools/pd-ctl/pdctl/command/meta_service_group_command.go
  • tools/pd-ctl/pdctl/ctl.go
✅ Files skipped from review due to trivial changes (4)
  • server/cluster/metering_test.go
  • tools/pd-ctl/pdctl/ctl.go
  • pkg/utils/keypath/meta_service_group.go
  • tests/server/apiv2/handlers/keyspace_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • server/apiv2/router.go
  • pkg/keyspace/tso_keyspace_group_test.go
  • server/cluster/cluster.go
  • server/server.go
  • pkg/keyspace/keyspace_test.go
  • pkg/gc/gc_state_manager_test.go

Comment thread pkg/keyspace/meta_service_group.go
@ti-chi-bot
Contributor

ti-chi-bot Bot commented Apr 14, 2026

@AmoebaProtozoa: adding LGTM is restricted to approvers and reviewers in OWNERS files.


@ti-chi-bot
Contributor

ti-chi-bot Bot commented Apr 14, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AmoebaProtozoa
Once this PR has been reviewed and has the lgtm label, please assign yisaer, yudongusa for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: bufferflies <1045931706@qq.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/keyspace/meta_service_group_test.go (1)

114-118: Use ErrorIs for sentinel error assertions.

Line 117 should assert via ErrorIs so wrapped sentinel errors still pass.

✅ Proposed fix
 func (suite *metaServiceGroupTestSuite) TestUpdateAssignmentUnknownNewGroup() {
 	re := suite.Require()
 	err := suite.manager.UpdateAssignment("", "nonexistent")
-	re.Equal(errUnknownMetaServiceGroup, err)
+	re.ErrorIs(err, errUnknownMetaServiceGroup)
 }

As per coding guidelines "Use errors.Is/As for sentinel error checks".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/keyspace/meta_service_group_test.go` around lines 114 - 118, The test
TestUpdateAssignmentUnknownNewGroup currently asserts the sentinel error with
re.Equal which fails for wrapped errors; update the assertion to use an
errors.Is-style check (e.g., testify/require's ErrorIs) so wrapped errors match:
locate TestUpdateAssignmentUnknownNewGroup in meta_service_group_test.go and
replace re.Equal(errUnknownMetaServiceGroup, err) with re.ErrorIs(err,
errUnknownMetaServiceGroup) (or use errors.Is(err, errUnknownMetaServiceGroup)
with require.True) to follow the sentinel error guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/keyspace/meta_service_group.go`:
- Around line 99-110: AssignToGroup currently accepts any integer count which
can be zero or negative and corrupt totals; before acquiring the store
transaction, validate the input (in AssignToGroup) to ensure count > 0 and
return a clear error if not. Keep the existing flow that uses selectGroupTxn and
store.IncrementAssignmentCount(txn, assignedGroup, count), but add the
validation check at the start of AssignToGroup and avoid calling selectGroupTxn
or IncrementAssignmentCount when count is invalid.
- Around line 27-29: The MetaServiceGroupManager currently stores a context in
its ctx field—remove the ctx field from the MetaServiceGroupManager struct and
instead accept ctx context.Context as the first parameter on all methods that
perform storage/external effects: GetAssignmentCounts(ctx context.Context, ...),
SelectGroup(ctx context.Context, ...), AssignToGroup(ctx context.Context, ...),
and UpdateAssignment(ctx context.Context, ...); update all internal calls to use
the passed ctx when calling the store (endpoint.MetaServiceGroupStorage) and
adjust the constructor (e.g., NewMetaServiceGroupManager) to stop saving ctx on
the struct (only accept/use ctx for any one-time initialization if absolutely
needed).


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7fe30969-2d04-4dab-9cf4-df8e1f1216fe

📥 Commits

Reviewing files that changed from the base of the PR and between 28bb6b0 and 8a5b30f.

📒 Files selected for processing (2)
  • pkg/keyspace/meta_service_group.go
  • pkg/keyspace/meta_service_group_test.go

Comment on lines +27 to +29
type MetaServiceGroupManager struct {
	ctx   context.Context
	store endpoint.MetaServiceGroupStorage

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n pkg/keyspace/meta_service_group.go | head -60

Repository: tikv/pd

Length of output: 2220


🏁 Script executed:

rg "m\.ctx" pkg/keyspace/meta_service_group.go -A 2 -B 2

Repository: tikv/pd

Length of output: 714


🏁 Script executed:

wc -l pkg/keyspace/meta_service_group.go

Repository: tikv/pd

Length of output: 89


🏁 Script executed:

cat -n pkg/keyspace/meta_service_group.go

Repository: tikv/pd

Length of output: 6386


Remove context field from struct; pass context.Context as a parameter to methods that perform storage operations.

The manager stores context in struct state (ctx field), which violates the guideline "First parameter must be context.Context for external effects; never store contexts in structs." Context lifetime should be determined by the caller, not at construction time.

Refactor by:

  1. Removing the ctx field from the struct
  2. Adding ctx context.Context as the first parameter to GetAssignmentCounts(), SelectGroup(), AssignToGroup(), and UpdateAssignment()
  3. Removing the ctx parameter from the constructor or using it only for initialization if needed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/keyspace/meta_service_group.go` around lines 27 - 29, The
MetaServiceGroupManager currently stores a context in its ctx field—remove the
ctx field from the MetaServiceGroupManager struct and instead accept ctx
context.Context as the first parameter on all methods that perform
storage/external effects: GetAssignmentCounts(ctx context.Context, ...),
SelectGroup(ctx context.Context, ...), AssignToGroup(ctx context.Context, ...),
and UpdateAssignment(ctx context.Context, ...); update all internal calls to use
the passed ctx when calling the store (endpoint.MetaServiceGroupStorage) and
adjust the constructor (e.g., NewMetaServiceGroupManager) to stop saving ctx on
the struct (only accept/use ctx for any one-time initialization if absolutely
needed).

Comment on lines +99 to +110
func (m *MetaServiceGroupManager) AssignToGroup(count int) (string, error) {
	m.RLock()
	defer m.RUnlock()
	var assignedGroup string
	if err := m.store.RunInTxn(m.ctx, func(txn kv.Txn) error {
		var err error
		assignedGroup, err = m.selectGroupTxn(txn)
		if err != nil {
			return err
		}
		return m.store.IncrementAssignmentCount(txn, assignedGroup, count)
	}); err != nil {

⚠️ Potential issue | 🟠 Major

Validate assignment delta before incrementing.

AssignToGroup accepts any count; zero/negative values make behavior inconsistent with the method contract ("increments count") and can corrupt totals.

🔧 Proposed fix
 func (m *MetaServiceGroupManager) AssignToGroup(count int) (string, error) {
+	if count <= 0 {
+		return "", errors.New("assignment count must be positive")
+	}
 	m.RLock()
 	defer m.RUnlock()
 	var assignedGroup string
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/keyspace/meta_service_group.go` around lines 99 - 110, AssignToGroup
currently accepts any integer count which can be zero or negative and corrupt
totals; before acquiring the store transaction, validate the input (in
AssignToGroup) to ensure count > 0 and return a clear error if not. Keep the
existing flow that uses selectGroupTxn and store.IncrementAssignmentCount(txn,
assignedGroup, count), but add the validation check at the start of
AssignToGroup and avoid calling selectGroupTxn or IncrementAssignmentCount when
count is invalid.

@ti-chi-bot
Contributor

ti-chi-bot Bot commented Apr 14, 2026

@bufferflies: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-unit-test-next-gen-2 | 8a5b30f | link | true | /test pull-unit-test-next-gen-2 |
| pull-unit-test-next-gen-3 | 8a5b30f | link | true | /test pull-unit-test-next-gen-3 |

Full PR test history. Your PR dashboard.


@bufferflies bufferflies requested review from okJiang and rleungx April 15, 2026 06:53

Labels

  • dco-signoff: yes (Indicates the PR's author has signed the dco.)
  • do-not-merge/release-note-label-needed (Indicates that a PR should not merge because it's missing one of the release note labels.)
  • size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

45bc455d support etcd groups

2 participants