fix: deletion of nodes by Despire · Pull Request #1919 · berops/claudie

Despire · 2025-12-08T11:19:30Z

Fixes deletion of nodes in newer version of kubectl (>= 1.33)

the export ETCDCTL_API=3 is no longer needed as the lowest supported kuberenetes version (1.32.x) ships with etcd version = 3.5.x

and the env has been made as a default since version 3.4.x

Also changed the dumping of the etcd members as a JSON, instead of parsing raw Stdout output

The issue with the approach in master was that etcd started to print warnings which would be parsed instead of the Stdout output, leading to panics in the kuber service.

Summary by CodeRabbit

Bug Fixes
- Adjusted ETCD environment handling to reduce inconsistent etcdctl behavior during cluster node removal.
Refactor
- Improved ETCD member discovery and removal flow with structured member handling and clearer warnings when nodes aren’t found.
Chores
- Updated many image tags.
- Bumped bundled kubectl installer and streamlined install steps.
- Added Docker Buildx setup to CI.
Configuration
- Enabled installation proxy and upgraded numerous test cluster node types.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-08T11:19:41Z

Walkthrough

Reformatted ETCD-related kubectl constants; node deletion now obtains etcd members via etcdctl member list -w json --hex=true, unmarshals into a new etcdMemberList type and removes members by ID; many manifest image tags and test-set serverType values updated; Dockerfile kubectl bumped to v1.33.0; CI adds Buildx setup.

Changes

Cohort / File(s)	Summary
Kubectl ETCD constant formatting `internal/kubectl/kubectl.go`	Reflowed ETCD-related command constant(s): `exportEtcdEnvsCmd` backtick placement adjusted and `ETCDCTL_API=3` moved out of the original multi-line backtick block (formatting/line-break change).
ETCD member JSON handling & node deletion `services/kuber/server/domain/utils/nodes/delete.go`	Added `encoding/json` and new `etcdMemberList` type; `getEtcdMembers` now returns an `etcdMemberList` by unmarshalling `etcdctl member list -w json --hex=true`; removed ad-hoc pod parsing; `deleteFromEtcd` iterates `Members` and uses member `Id` to remove etcd members; function signature updated.
Kustomize image tag updates `manifests/claudie/kustomization.yaml`, `manifests/testing-framework/kustomization.yaml`	Updated image `newTag` values across kustomization manifests (tag bump).
Testing framework test-sets (serverType & proxy) `manifests/testing-framework/test-sets/...`	Many test-set YAMLs updated: `serverType` value changes (e.g., `cpx11`→`cpx22`, `e2-small`→`e2-medium`, `t3.small`→`t3.medium`, `VM.Standard1.1`→`VM.Standard1.2`); one test set adds `installationProxy: mode: "on"`; some node-pool names/refs adjusted.
Kubectl installer bump & Dockerfile changes `services/kuber/Dockerfile`	Bumped `KC_VERSION` from `v1.27.0` to `v1.33.0`, switched download URL to `https://dl.k8s.io/release/${KC_VERSION}/bin/linux/$TARGETARCH/kubectl`, and expanded post-install RUN steps (apk update, install bash, cleanup).
CI Buildx setup `.github/workflows/CI-pipeline.yml`	Added `docker/setup-buildx-action@v3` step in the `build-and-push` job to set up Docker Buildx where BuildKit may be absent.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Operator
    participant Deleter as deleteFromEtcd
    participant Kubectl as Kubectl.Exec
    participant Etcd as etcdctl

    Operator->>Deleter: trigger node deletion
    Deleter->>Kubectl: exec exportEtcdEnvsCmd && etcdctl member list -w json --hex=true
    Kubectl->>Etcd: run "member list" (JSON, hex)
    Etcd-->>Kubectl: JSON members
    Kubectl-->>Deleter: stdout (members JSON)
    Deleter->>Deleter: unmarshal JSON -> etcdMemberList
    alt members found
        loop for each Member in Members
            Deleter->>Kubectl: exec exportEtcdEnvsCmd && etcdctl member remove <Member.Id>
            Kubectl->>Etcd: run "member remove <Id>"
            Etcd-->>Kubectl: removal result
            Kubectl-->>Deleter: result
        end
    else no matching member
        Deleter->>Deleter: log warning for missing node
    end

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: deletion of nodes' directly relates to the main changes in the PR which fix node deletion for newer kubectl versions (>= 1.33) by modifying etcd-related commands and environment setup.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/deletion-nodes

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad27216 and 4f4136c.

📒 Files selected for processing (2)

manifests/claudie/kustomization.yaml (1 hunks)
manifests/testing-framework/kustomization.yaml (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: Despire
Repo: berops/claudie PR: 1749
File: manifests/testing-framework/test-sets/test-set5/1.yaml:65-65
Timestamp: 2025-06-10T14:27:44.312Z
Learning: When Despire declines to address consistency issues like version string prefixes in a focused PR (e.g., kubeone updates), they prefer to handle such standardization in separate issues rather than expanding the scope of the current PR.

📚 Learning: 2025-06-10T08:04:36.596Z

Learnt from: Despire
Repo: berops/claudie PR: 1735
File: manifests/claudie/kustomization.yaml:59-61
Timestamp: 2025-06-10T08:04:36.596Z
Learning: In Claudie's deployment process, different images in the same kustomization.yaml file may have different tags intentionally. This is part of their e2e process where only relevant images are updated when changes propagate, rather than updating all images to the same tag. This selective updating approach is deliberate and not an inconsistency.

Applied to files:

manifests/claudie/kustomization.yaml
manifests/testing-framework/kustomization.yaml

🔇 Additional comments (2)

manifests/claudie/kustomization.yaml (1)

59-73: Image tags correctly updated for coordinated deployment.

All eight Claudie service images have been consistently updated to the same tag (4a73b29-3766), indicating a single coordinated build. This aligns with the PR's objective to fix node deletion for newer kubectl versions.

manifests/testing-framework/kustomization.yaml (1)

92-93: Testing framework image tag updated consistently.

The testing framework image has been updated to match the coordinated tag (4a73b29-3766) used across all Claudie services, ensuring test infrastructure alignment with the deployed code.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

services/kuber/server/domain/utils/nodes/delete.go (2)
245-261: JSON parsing of etcdctl member list is a clean improvement; watch for mixed stderr/stdout

The new getEtcdMembers helper is a clean abstraction and correctly surfaces both kubectl/exec errors and JSON parsing issues. One subtle point: KubectlExecEtcd calls runWithOutput, which explicitly uses CombinedOutput() (line 298 of internal/kubectl/kubectl.go), combining both stdout and stderr into a single byte stream. Since etcdctl 3.5.x writes deprecation warnings and other logging output to stderr while JSON goes to stdout, any such warnings will cause json.Unmarshal to fail. That's safer than mis‑parsing warnings as members, but if you start seeing frequent "failed to unmarshal etcd member list output" errors, consider an alternative that reads only etcdctl's stdout for this specific JSON call.

23-34: Add explicit JSON tags to etcdMemberList for clarity and robustness

The struct currently works due to Go's case-insensitive JSON field matching (falling back when no exact case match exists), but adding explicit tags improves clarity and future-proofs against changes. Replace the inline struct with a named type:
type etcdMember struct {
	Id   string `json:"ID"`
	Name string `json:"name"`
}

type etcdMemberList struct {
	Members []etcdMember
}
This makes the JSON mapping explicit and self-documenting.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2815488 and fcfddf4.

📒 Files selected for processing (2)

internal/kubectl/kubectl.go (1 hunks)
services/kuber/server/domain/utils/nodes/delete.go (4 hunks)

🧰 Additional context used

🧠 Learnings (3)

📚 Learning: 2025-07-29T13:01:46.081Z

Learnt from: Despire
Repo: berops/claudie PR: 1803
File: internal/kubectl/kubectl.go:279-279
Timestamp: 2025-07-29T13:01:46.081Z
Learning: In kubectl wrapper functions in internal/kubectl/kubectl.go, the commands executed through bash -c are internally constructed using predefined templates, not user-supplied commands. Parameters like resource names and manifests come from internal codebase functions rather than external user input.

Applied to files:

internal/kubectl/kubectl.go

📚 Learning: 2025-08-13T13:49:44.424Z

Learnt from: Despire
Repo: berops/claudie PR: 1819
File: services/kuber/server/domain/utils/nodes/patch.go:22-24
Timestamp: 2025-08-13T13:49:44.424Z
Learning: In the Claudie codebase, kubectl commands have a linear backoff algorithm with up to 3 retries built-in (as indicated by MaxKubectlRetries: 3 in kubectl.Kubectl configuration).

Applied to files:

internal/kubectl/kubectl.go

📚 Learning: 2025-08-25T06:54:50.759Z

Learnt from: Despire
Repo: berops/claudie PR: 1838
File: services/kube-eleven/server/main.go:37-37
Timestamp: 2025-08-25T06:54:50.759Z
Learning: In services/kube-eleven/server/main.go, Despire prefers to use environment variable values (like KUBE_ELEVEN_CONCURRENT_CLUSTERS) directly without adding validation guards or clamping logic, keeping the behavior simple where env values are used "as is".

Applied to files:

internal/kubectl/kubectl.go

🧬 Code graph analysis (1)

services/kuber/server/domain/utils/nodes/delete.go (1)

internal/kubectl/kubectl.go (1)

Kubectl (20-26)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: check-changes

🔇 Additional comments (1)

internal/kubectl/kubectl.go (1)

30-37: etcd command templates align with supported versions; behavior is correct

Defining getEtcdPodsCmd and exportEtcdEnvsCmd as constants keeps KubectlGetEtcdPods and KubectlExecEtcd readable. The getEtcdPodsCmd with the grep etcd pattern correctly filters for etcd pods (matching the standard etcd-<nodeName> naming convention in Kubernetes), and removing ETCDCTL_API=3 from exportEtcdEnvsCmd is appropriate—etcd v3.5 and v3.4 use the v3 API by default, so the explicit export is unnecessary for supported clusters. The etcdctl member remove command correctly accepts the hex IDs produced by etcdctl member list as documented in etcd's membership management examples.

jakubhlavacka

LGTM

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

services/kuber/server/domain/utils/nodes/delete.go (1)

192-201: Optional: log unmatched masters and break after successful etcd member removal

Within deleteFromEtcd, if a master’s k8sName never matches any member.Name, we silently skip etcd removal for that node. Given this leaves a stale etcd member, a debug/info log would be useful for operators to spot name mismatches or drift (this was also noted in an earlier review).

You could also break out of the inner loop once a match is found to avoid unnecessary scans:

-	for _, node := range d.masterNodes {
-		for _, member := range etcd.Members {
-			if node.k8sName == member.Name {
-				d.logger.Debug().Msgf("Deleting etcd member %s, with hash %s", member.Name, member.Id)
-				etcdctlCmd := fmt.Sprintf("etcdctl member remove %s", member.Id)
-				if _, err := kc.KubectlExecEtcd(etcdPods[0], etcdctlCmd); err != nil {
-					return fmt.Errorf("error while executing \"etcdctl member remove\" on node %s, cluster %s: %w", member.Name, d.clusterPrefix, err)
-				}
-			}
-		}
-	}
+	for _, node := range d.masterNodes {
+		found := false
+		for _, member := range etcd.Members {
+			if node.k8sName != member.Name {
+				continue
+			}
+			found = true
+			d.logger.Debug().Msgf("Deleting etcd member %s, with ID %s", member.Name, member.Id)
+			etcdctlCmd := fmt.Sprintf("etcdctl member remove %s", member.Id)
+			if _, err := kc.KubectlExecEtcd(etcdPods[0], etcdctlCmd); err != nil {
+				return fmt.Errorf("error while executing \"etcdctl member remove\" on node %s, cluster %s: %w", member.Name, d.clusterPrefix, err)
+			}
+			break
+		}
+		if !found {
+			d.logger.Debug().Msgf("No etcd member found for master %s; skipping etcd removal", node.k8sName)
+		}
+	}

Purely optional, but would make mismatches much easier to diagnose.

🧹 Nitpick comments (1)

services/kuber/server/domain/utils/nodes/delete.go (1)
23-34: Structured etcd member parsing is a good move; please make JSON mapping explicit and confirm ID encoding

Switching from ad‑hoc text parsing to etcdctl ... -w json with a typed etcdMemberList is a solid improvement for robustness. A couple of follow‑ups:

The etcd docs show member list -w json returning a members array with fields ID (numeric) and name (string). (pkg.go.dev) Your struct expects a string Id and assumes this is a hex‑encoded identifier. Because behavior of --hex=true with JSON isn’t clearly documented, it’s worth explicitly verifying on the etcd versions you target that:

members[].ID is indeed emitted as a string (or that member remove happily accepts the decimal form you’re passing), and

the JSON shape (members, ID, name) matches what encoding/json is unmarshalling into here.

For maintainability, I’d also consider adding explicit JSON tags (and optionally a named member type) so the contract with etcdctl is clear and resilient to refactors, e.g.:
-type etcdMemberList struct {
-	Members []struct {
-		// Hex encoded id of the member within etcd.
-		Id string
-
-		// Name of the node within the cluster.
-		Name string
-	}
-}
+type etcdMemberList struct {
+	Members []struct {
+		// Hex-encoded ID of the member within etcd, as printed by etcdctl.
+		ID string `json:"ID"`
+
+		// Name of the node within the cluster.
+		Name string `json:"name"`
+	} `json:"members"`
+}
@@
-				d.logger.Debug().Msgf("Deleting etcd member %s, with hash %s", member.Name, member.Id)
-				etcdctlCmd := fmt.Sprintf("etcdctl member remove %s", member.Id)
+				d.logger.Debug().Msgf("Deleting etcd member %s, with ID %s", member.Name, member.ID)
+				etcdctlCmd := fmt.Sprintf("etcdctl member remove %s", member.ID)
(Adjust naming/ID format as needed once you’ve confirmed what -w json --hex=true actually returns on your clusters.)

Also applies to: 257-272

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dafa057 and 3320171.

📒 Files selected for processing (1)

services/kuber/server/domain/utils/nodes/delete.go (4 hunks)

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-08-13T14:22:14.668Z

Learnt from: Despire
Repo: berops/claudie PR: 1819
File: services/kuber/server/domain/utils/nodes/patch.go:230-237
Timestamp: 2025-08-13T14:22:14.668Z
Learning: In services/kuber/server/domain/utils/nodes/patch.go, the user Despire has observed that using "replace" operation for JSON-Patch on /spec/taints works fine on newly added taints in their testing environment, even though the JSON-Patch specification suggests "replace" should fail on missing fields.

Applied to files:

services/kuber/server/domain/utils/nodes/delete.go

📚 Learning: 2025-06-18T12:23:27.872Z

Learnt from: Despire
Repo: berops/claudie PR: 1770
File: services/builder/domain/usecases/kube_eleven_caller.go:15-18
Timestamp: 2025-06-18T12:23:27.872Z
Learning: In the claudie builder workflow, the terraform stage comes before the kube-eleven stage. If DNS building fails during terraform, the process wouldn't proceed to the kube-eleven stage. Therefore, nil-checks for ep.Dns are unnecessary in reconcileK8sCluster since the DNS is guaranteed to be properly built by that point in the workflow.

Applied to files:

services/kuber/server/domain/utils/nodes/delete.go

🧬 Code graph analysis (1)

services/kuber/server/domain/utils/nodes/delete.go (1)

internal/kubectl/kubectl.go (1)

Kubectl (20-26)

🔇 Additional comments (1)

services/kuber/server/domain/utils/nodes/delete.go (1)

182-189: Safer etcd pod parsing confirmed; Go 1.25 meets strings.SplitSeq requirement

The rework of getEtcdPodNames to iterate strings.SplitSeq, trim whitespace, drop empty lines, and explicitly error on len(lines) == 0 makes the subsequent etcdPods[0] access in deleteFromEtcd safe and avoids empty pod names.

The go.mod directive is set to go 1.25, which meets and exceeds the Go 1.24+ requirement for strings.SplitSeq. No toolchain adjustments needed.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

services/kuber/server/domain/utils/nodes/delete.go (1)
26-34: Add explicit json tags for clarity and correctness.

The etcd JSON output uses specific casing (members, ID, name). While Go's json.Unmarshal performs case-insensitive fallback matching, adding explicit tags prevents subtle bugs and improves readability.
 type etcdMemberList struct {
-	Members []struct {
+	Members []struct {
 		// Hex encoded id of the member within etcd.
-		Id string
+		Id string `json:"ID"`
 
 		// Name of the node within the cluster.
-		Name string
-	}
+		Name string `json:"name"`
+	} `json:"members"`
 }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3320171 and cab8fbf.

📒 Files selected for processing (1)

services/kuber/server/domain/utils/nodes/delete.go (4 hunks)

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-08-13T14:22:14.668Z

Learnt from: Despire
Repo: berops/claudie PR: 1819
File: services/kuber/server/domain/utils/nodes/patch.go:230-237
Timestamp: 2025-08-13T14:22:14.668Z
Learning: In services/kuber/server/domain/utils/nodes/patch.go, the user Despire has observed that using "replace" operation for JSON-Patch on /spec/taints works fine on newly added taints in their testing environment, even though the JSON-Patch specification suggests "replace" should fail on missing fields.

Applied to files:

services/kuber/server/domain/utils/nodes/delete.go

📚 Learning: 2025-06-18T12:23:27.872Z

Learnt from: Despire
Repo: berops/claudie PR: 1770
File: services/builder/domain/usecases/kube_eleven_caller.go:15-18
Timestamp: 2025-06-18T12:23:27.872Z
Learning: In the claudie builder workflow, the terraform stage comes before the kube-eleven stage. If DNS building fails during terraform, the process wouldn't proceed to the kube-eleven stage. Therefore, nil-checks for ep.Dns are unnecessary in reconcileK8sCluster since the DNS is guaranteed to be properly built by that point in the workflow.

Applied to files:

services/kuber/server/domain/utils/nodes/delete.go

🧬 Code graph analysis (1)

services/kuber/server/domain/utils/nodes/delete.go (1)

internal/kubectl/kubectl.go (1)

Kubectl (20-26)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: check-changes

🔇 Additional comments (2)

services/kuber/server/domain/utils/nodes/delete.go (2)

180-214: LGTM! Past review concerns addressed.

The unsafe etcdPods[0] access concern is now mitigated by getEtcdPodNames returning an error when no pods are found (lines 260-262). The silent no-op concern is addressed with the warning log at lines 208-210 when a node is not found among etcd members.

267-283: LGTM! JSON-based parsing is a robust improvement.

The shift from text parsing to JSON unmarshalling eliminates the risk of parsing etcd warning text that could be printed to stdout. The --hex=true flag ensures the member ID is returned as a hex string, matching the struct definition. Error handling is properly contextualized.

the eviction used to stuck when the kuber removes the master node from etcd members first

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

services/kuber/server/domain/utils/nodes/delete.go (1)
99-123: deleteFromEtcd removes all masters on every iteration; sequencing contradicts "cordon/drain first" comment

Inside DeleteNodes, you call d.deleteFromEtcd(kubectl, etcdEpNode) for each master in a loop (line 116):
for _, master := range d.masterNodes {
    // cordon + drain master
    if err := d.deleteFromEtcd(kubectl, etcdEpNode); err != nil { ... }
    // deleteNodesByName(master, ...)
}
But deleteFromEtcd itself loops over all d.masterNodes (line 210) and removes every matching etcd member in a single call. The net effect:

On the first iteration, etcd membership for all masters is removed, even for those not yet cordoned/drained.

Subsequent iterations re-scan members that no longer exist and log "is not a member of etcd, ignoring" warnings.

This contradicts the "IMPORTANT: first you have to cordon and drain, and only after that you can remove from etcd" comment and performs redundant etcdctl calls.

Pass the current master node to deleteFromEtcd instead of processing all masters:
-func (d *Deleter) deleteFromEtcd(kc kubectl.Kubectl, etcdEpNode *spec.Node) error {
+func (d *Deleter) deleteFromEtcd(kc kubectl.Kubectl, etcdEpNode *spec.Node, node nodeInfo) error {
And call it with the current master at line 116:
-    if err := d.deleteFromEtcd(kubectl, etcdEpNode); err != nil {
+    if err := d.deleteFromEtcd(kubectl, etcdEpNode, master); err != nil {
Then update the inner loop in deleteFromEtcd (line 210) to operate on the single passed node instead of iterating over d.masterNodes. This is safe—there is only one call site in the codebase.

🧹 Nitpick comments (2)

services/kuber/server/domain/utils/nodes/delete.go (2)
23-33: Make etcd JSON shape explicit with a named member type and tags

The etcdMemberList struct and its anonymous inner struct rely on default, case-insensitive JSON field-name matching for members/ID/name. This works but is a bit implicit and harder to reuse or test.

Consider introducing a named type with explicit tags for clarity and future-proofing:
-type etcdMemberList struct {
-	Members []struct {
-		// Hex encoded id of the member within etcd.
-		Id string
-
-		// Name of the node within the cluster.
-		Name string
-	}
-}
+type etcdMember struct {
+	// Hex-encoded ID of the member within etcd.
+	ID   string `json:"ID"`
+	Name string `json:"name"`
+}
+
+type etcdMemberList struct {
+	Members []etcdMember `json:"members"`
+}
And update the usage in deleteFromEtcd accordingly (e.g. member.ID instead of member.Id).

Also applies to: 280-295

132-141: Cordon/drain error handling is consistent for workers; optional: aggregate drain failures too

Worker handling mirrors the master path: cordon errors are accumulated into errDel while KubectlDrain failures are treated as fatal and abort DeleteNodes. That’s reasonable given the risk around draining.

If you ever want “best-effort” deletion of multiple workers in one call, you could mirror the cordon pattern and aggregate drain failures into errDel instead of returning immediately, but that’s more of a product decision than a correctness issue.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 891245f and 71d12fe.

📒 Files selected for processing (1)

services/kuber/server/domain/utils/nodes/delete.go (6 hunks)

🧰 Additional context used

🧠 Learnings (4)

📓 Common learnings

Learnt from: Despire
Repo: berops/claudie PR: 1749
File: manifests/testing-framework/test-sets/test-set5/1.yaml:65-65
Timestamp: 2025-06-10T14:27:44.312Z
Learning: When Despire declines to address consistency issues like version string prefixes in a focused PR (e.g., kubeone updates), they prefer to handle such standardization in separate issues rather than expanding the scope of the current PR.

📚 Learning: 2025-08-13T14:22:14.668Z

Learnt from: Despire
Repo: berops/claudie PR: 1819
File: services/kuber/server/domain/utils/nodes/patch.go:230-237
Timestamp: 2025-08-13T14:22:14.668Z
Learning: In services/kuber/server/domain/utils/nodes/patch.go, the user Despire has observed that using "replace" operation for JSON-Patch on /spec/taints works fine on newly added taints in their testing environment, even though the JSON-Patch specification suggests "replace" should fail on missing fields.

Applied to files:

services/kuber/server/domain/utils/nodes/delete.go

📚 Learning: 2025-06-18T12:23:27.872Z

Learnt from: Despire
Repo: berops/claudie PR: 1770
File: services/builder/domain/usecases/kube_eleven_caller.go:15-18
Timestamp: 2025-06-18T12:23:27.872Z
Learning: In the claudie builder workflow, the terraform stage comes before the kube-eleven stage. If DNS building fails during terraform, the process wouldn't proceed to the kube-eleven stage. Therefore, nil-checks for ep.Dns are unnecessary in reconcileK8sCluster since the DNS is guaranteed to be properly built by that point in the workflow.

Applied to files:

services/kuber/server/domain/utils/nodes/delete.go

📚 Learning: 2025-06-10T14:27:44.312Z

Learnt from: Despire
Repo: berops/claudie PR: 1749
File: manifests/testing-framework/test-sets/test-set5/1.yaml:65-65
Timestamp: 2025-06-10T14:27:44.312Z
Learning: When Despire declines to address consistency issues like version string prefixes in a focused PR (e.g., kubeone updates), they prefer to handle such standardization in separate issues rather than expanding the scope of the current PR.

Applied to files:

services/kuber/server/domain/utils/nodes/delete.go

🧬 Code graph analysis (1)

services/kuber/server/domain/utils/nodes/delete.go (1)

internal/kubectl/kubectl.go (1)

Kubectl (20-26)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: check-changes

🔇 Additional comments (1)

services/kuber/server/domain/utils/nodes/delete.go (1)
259-278: Improved etcd pod name parsing and bounds safety looks good

The new getEtcdPodNames implementation that iterates line-by-line, trims whitespace, filters out empty lines, and errors when no pods are found makes the subsequent etcdPods[0] access safe and avoids spurious empty entries from trailing newlines:
var lines []string
for line := range strings.SplitSeq(string(etcdPodsBytes), "\n") {
    if line := strings.TrimSpace(line); line != "" {
        lines = append(lines, line)
    }
}
if len(lines) == 0 {
    return nil, errors.New("no etcd pods found in cluster")
}
return lines, nil
This is a solid robustness improvement over the previous direct strings.Split usage.

supporting kubeAPI v1.32.x, v1.33.x, and v1.34.x

the newer version of actions-runner (2.329.x and later) don't have the BuildKit

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

services/kuber/Dockerfile (1)
6-7: Update to kubectl v1.33.0 from official distribution site aligns with PR objectives.

The changes correctly update kubectl to v1.33.0 and switch to the official distribution URL (dl.k8s.io rather than googleapis.com). This addresses the PR's goal to fix node deletion for newer kubectl versions.

One minor note: the wget command lacks an explicit -O kubectl flag to specify the output filename. While this works implicitly (wget will use the basename kubectl from the URL and save to /go/kubectl, matching the subsequent COPY expectation), being explicit would improve clarity:
-RUN KC_VERSION=v1.33.0 && \
-    wget -q https://dl.k8s.io/release/${KC_VERSION}/bin/linux/$TARGETARCH/kubectl
+RUN KC_VERSION=v1.33.0 && \
+    wget -q -O kubectl https://dl.k8s.io/release/${KC_VERSION}/bin/linux/$TARGETARCH/kubectl

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1496192 and d3fd7aa.

📒 Files selected for processing (1)

services/kuber/Dockerfile (2 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: Despire
Repo: berops/claudie PR: 1749
File: manifests/testing-framework/test-sets/test-set5/1.yaml:65-65
Timestamp: 2025-06-10T14:27:44.312Z
Learning: When Despire declines to address consistency issues like version string prefixes in a focused PR (e.g., kubeone updates), they prefer to handle such standardization in separate issues rather than expanding the scope of the current PR.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: gotest

🔇 Additional comments (1)

services/kuber/Dockerfile (1)

39-42: Post-install steps appropriately prepare the runtime image.

The multi-line RUN command correctly chains executable permissions, package updates, bash installation, and cache cleanup in a single layer. This follows Docker best practices for reducing image size while meeting the service's shell execution requirements (noted in the comment at line 48).

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

manifests/testing-framework/test-sets/proxy-without-hetzner/8.yaml (1)

31-44: Server type bump looks fine; just be mindful of cost/quota impact for this test set.
No functional concerns in the manifest itself; this will increase resources for gcp-cmpt-nodes, which may affect CI budget/quotas if these test-sets run frequently.

🧹 Nitpick comments (1)

manifests/testing-framework/test-sets/succeeds-on-last-2/2.yaml (1)

35-45: The intermediate test file uses an obviously-invalid placeholder OCID that differs from realistic patterns in other test-sets (e.g., ami-00000000000000001). While this file is meant to fail and won't block the overall test progression to file 3.yaml, consider using a more realistic-looking invalid OCID or clarifying in the comment what specific failure mode is expected (image validation, availability, etc.) to improve debugging clarity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91a2e55 and 64ff97a.

📒 Files selected for processing (50)

manifests/testing-framework/test-sets/autoscaling-1/1.yaml (1 hunks)
manifests/testing-framework/test-sets/autoscaling-1/2.yaml (1 hunks)
manifests/testing-framework/test-sets/autoscaling-1/3.yaml (1 hunks)
manifests/testing-framework/test-sets/autoscaling-2/1.yaml (2 hunks)
manifests/testing-framework/test-sets/autoscaling-2/2.yaml (2 hunks)
manifests/testing-framework/test-sets/proxy-with-hetzner/1.yaml (2 hunks)
manifests/testing-framework/test-sets/proxy-with-hetzner/2.yaml (2 hunks)
manifests/testing-framework/test-sets/proxy-with-hetzner/3.yaml (2 hunks)
manifests/testing-framework/test-sets/proxy-with-hetzner/4.yaml (2 hunks)
manifests/testing-framework/test-sets/proxy-with-hetzner/5.yaml (2 hunks)
manifests/testing-framework/test-sets/proxy-with-hetzner/6.yaml (2 hunks)
manifests/testing-framework/test-sets/proxy-with-hetzner/7.yaml (2 hunks)
manifests/testing-framework/test-sets/proxy-with-hetzner/8.yaml (3 hunks)
manifests/testing-framework/test-sets/proxy-without-hetzner/1.yaml (1 hunks)
manifests/testing-framework/test-sets/proxy-without-hetzner/10.yaml (1 hunks)
manifests/testing-framework/test-sets/proxy-without-hetzner/2.yaml (1 hunks)
manifests/testing-framework/test-sets/proxy-without-hetzner/3.yaml (1 hunks)
manifests/testing-framework/test-sets/proxy-without-hetzner/4.yaml (1 hunks)
manifests/testing-framework/test-sets/proxy-without-hetzner/5.yaml (1 hunks)
manifests/testing-framework/test-sets/proxy-without-hetzner/6.yaml (1 hunks)
manifests/testing-framework/test-sets/proxy-without-hetzner/7.yaml (1 hunks)
manifests/testing-framework/test-sets/proxy-without-hetzner/8.yaml (1 hunks)
manifests/testing-framework/test-sets/proxy-without-hetzner/9.yaml (1 hunks)
manifests/testing-framework/test-sets/rolling-update-2/1.yaml (3 hunks)
manifests/testing-framework/test-sets/rolling-update-2/2.yaml (2 hunks)
manifests/testing-framework/test-sets/rolling-update/1.yaml (1 hunks)
manifests/testing-framework/test-sets/rolling-update/2.yaml (1 hunks)
manifests/testing-framework/test-sets/succeeds-on-last-1/1.yaml (1 hunks)
manifests/testing-framework/test-sets/succeeds-on-last-1/2.yaml (1 hunks)
manifests/testing-framework/test-sets/succeeds-on-last-2/1.yaml (1 hunks)
manifests/testing-framework/test-sets/succeeds-on-last-2/2.yaml (1 hunks)
manifests/testing-framework/test-sets/succeeds-on-last-2/3.yaml (1 hunks)
manifests/testing-framework/test-sets/succeeds-on-last-4/1.yaml (1 hunks)
manifests/testing-framework/test-sets/succeeds-on-last-4/2.yaml (1 hunks)
manifests/testing-framework/test-sets/succeeds-on-last-4/3.yaml (1 hunks)
manifests/testing-framework/test-sets/test-set1/1.yaml (8 hunks)
manifests/testing-framework/test-sets/test-set1/2.yaml (7 hunks)
manifests/testing-framework/test-sets/test-set2/1.yaml (1 hunks)
manifests/testing-framework/test-sets/test-set2/2.yaml (1 hunks)
manifests/testing-framework/test-sets/test-set2/3.yaml (1 hunks)
manifests/testing-framework/test-sets/test-set3/1.yaml (3 hunks)
manifests/testing-framework/test-sets/test-set3/2.yaml (2 hunks)
manifests/testing-framework/test-sets/test-set3/3.yaml (1 hunks)
manifests/testing-framework/test-sets/test-set3/4.yaml (1 hunks)
manifests/testing-framework/test-sets/test-set4/1.yaml (1 hunks)
manifests/testing-framework/test-sets/test-set4/2.yaml (1 hunks)
manifests/testing-framework/test-sets/test-set4/3.yaml (1 hunks)
manifests/testing-framework/test-sets/test-set5/1.yaml (1 hunks)
manifests/testing-framework/test-sets/test-set5/2.yaml (1 hunks)
manifests/testing-framework/test-sets/test-set5/3.yaml (1 hunks)

✅ Files skipped from review due to trivial changes (2)

manifests/testing-framework/test-sets/proxy-with-hetzner/1.yaml
manifests/testing-framework/test-sets/test-set3/4.yaml

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-10-07T15:17:32.154Z

Learnt from: m-brando
Repo: berops/claudie PR: 1857
File: manifests/testing-framework/test-sets/test-set1/1.yaml:70-78
Timestamp: 2025-10-07T15:17:32.154Z
Learning: In the Claudie repository, secrets referenced in test manifests under `manifests/testing-framework/test-sets/` (such as ovh-secret, gcp-secret, hetzner-secret, etc.) are provisioned via an external secret management system in Kubernetes and are not stored as static Secret manifests in the repository. Do not flag missing Secret manifests for these secretRef entries as they are managed externally at runtime.

Applied to files:

manifests/testing-framework/test-sets/proxy-with-hetzner/8.yaml
manifests/testing-framework/test-sets/proxy-with-hetzner/7.yaml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Run golangci-lint
GitHub Check: build-and-push

🔇 Additional comments (61)

manifests/testing-framework/test-sets/succeeds-on-last-1/2.yaml (1)

18-27: Validate CI/e2e quota & intent for t3.medium in eu-west-3a.

Bumping serverType to t3.medium can affect AWS capacity/quota and test cost; please confirm the e2e AWS account has sufficient vCPU limits in eu-west-3a and that this aligns with the broader test-set sizing changes in the PR.

manifests/testing-framework/test-sets/succeeds-on-last-1/1.yaml (1)

24-24: Clarify the rationale for the instance type upgrade.

The serverType has been changed from t3.small to t3.medium. While the AI summary indicates this is part of a broader set of test-manifest updates, the connection to the PR's core objective (fixing node deletion for kubectl >= 1.33) is unclear.

Is this instance size increase required for the node deletion fix to function correctly, or is it a separate infrastructure improvement? If the latter, consider tracking it as a distinct concern or PR to maintain focus on the node deletion fix.

manifests/testing-framework/test-sets/succeeds-on-last-4/2.yaml (1)

52-53: Server flavor bump looks fine; just confirm availability/quota/cost assumptions for Hetzner cpx22.
Since htz-ctrl-nodes is used as both control+compute (and LB targets), this upsizes the whole cluster footprint for this test-set.

manifests/testing-framework/test-sets/succeeds-on-last-4/3.yaml (1)

51-52: Server flavor bump looks fine; just confirm availability/quota/cost assumptions for Hetzner cpx22.
This node pool backs control+compute (and LB targets), so the increase applies to the full test cluster.

manifests/testing-framework/test-sets/succeeds-on-last-4/1.yaml (1)

24-25: Server flavor bump looks fine; just confirm availability/quota/cost assumptions for Hetzner cpx22.

manifests/testing-framework/test-sets/autoscaling-1/1.yaml (1)

44-44: Verify that the e2-medium upgrade is necessary for Kubernetes 1.34.0 compatibility.

The manifest upgrades the gcp-cmpt-nodes serverType from e2-small to e2-medium. While the change is safe and the manifest structure is valid, this appears orthogonal to the core PR objective (fixing node deletion for kubectl >= 1.33). Clarify whether this upgrade is required for the new Kubernetes version or if it's part of a separate infrastructure improvement.

manifests/testing-framework/test-sets/autoscaling-1/2.yaml (1)

42-42: Verify that the e2-medium upgrade is necessary for Kubernetes 1.34.0 compatibility.

The manifest upgrades the gcp-cmpt-nodes serverType from e2-small to e2-medium. While the change is safe and the manifest structure is valid, this appears orthogonal to the core PR objective (fixing node deletion for kubectl >= 1.33). Clarify whether this upgrade is required for the new Kubernetes version or if it's part of a separate infrastructure improvement.

manifests/testing-framework/test-sets/autoscaling-1/3.yaml (1)

44-44: Verify that the e2-medium upgrade is necessary for Kubernetes 1.34.0 compatibility.

The manifest upgrades the gcp-cmpt-nodes serverType from e2-small to e2-medium. While the change is safe and the manifest structure is valid, this appears orthogonal to the core PR objective (fixing node deletion for kubectl >= 1.33). Clarify whether this upgrade is required for the new Kubernetes version or if it's part of a separate infrastructure improvement.

manifests/testing-framework/test-sets/test-set5/3.yaml (1)

34-34: Instance type upgrade for compute nodes looks good.

The serverType upgrade from t3.small to t3.medium for the aws-cmpt-nodes pool aligns with the broader test infrastructure improvements in this PR to support node deletion fixes for newer kubectl versions.

manifests/testing-framework/test-sets/test-set5/1.yaml (1)

34-34: Instance type upgrade for control plane nodes is appropriate.

The serverType change from t3.small to t3.medium for aws-ctrl-nodes provides additional headroom for control plane operations and aligns with the infrastructure updates across test-set5 manifests in this PR.

manifests/testing-framework/test-sets/test-set5/2.yaml (1)

34-34: Control plane instance type upgrade is consistent.

The serverType change from t3.small to t3.medium for aws-ctrl-nodes maintains consistency with the parallel test-set5/1.yaml and supports the enhanced node deletion handling required for kubectl >= 1.33.

manifests/testing-framework/test-sets/test-set4/2.yaml (1)

42-42: Configuration update approved.

The serverType change from VM.Standard1.1 to VM.Standard1.2 for the OCI node pool is a straightforward infrastructure upgrade. This aligns with similar updates in test-set4/1.yaml and test-set4/3.yaml.

manifests/testing-framework/test-sets/test-set4/1.yaml (1)

24-24: Configuration update approved.

The serverType change from VM.Standard1.1 to VM.Standard1.2 is consistent with the test-set4 infrastructure update across all related manifests.

manifests/testing-framework/test-sets/test-set4/3.yaml (1)

42-42: Configuration update approved.

The serverType change from VM.Standard1.1 to VM.Standard1.2 is consistent with the broader test-set4 infrastructure update.

manifests/testing-framework/test-sets/test-set3/2.yaml (1)

33-34: Verify provider availability/cost impact of the larger instance types (cpx22, e2-medium).
This is a pure capacity bump, but it can break CI if the types aren’t available in nbg1 / europe-west1 or if quotas/budget assumptions were tuned for cpx11 / e2-small.

Also applies to: 43-45

manifests/testing-framework/test-sets/test-set3/3.yaml (1)

51-54: Confirm cpx22 is available for Hetzner in nbg1 and won’t regress test reliability/cost.

manifests/testing-framework/test-sets/test-set3/1.yaml (1)

84-88: Double-check quotas/availability + LB pool sizing consistency.

Ensure t3.medium (AWS eu-west-2a), cpx22 (Hetzner nbg1), and e2-medium (GCP europe-west1) won’t hit quota/capacity issues in CI.

gcp-ldbl-nodes is still e2-small (Line 126) while other GCP pools are e2-medium—confirm that’s intentional (LB workload) vs an inconsistency with other updated test-set3 manifests.

Also applies to: 95-98, 114-128

manifests/testing-framework/test-sets/autoscaling-2/1.yaml (1)

33-33: Verify the necessity of serverType upgrade for kubectl 1.33 compatibility.

The serverType changes from t3.small to t3.medium are consistent across both control and compute node pools. However, the PR objectives focus on etcd API and node deletion logic changes. Please clarify the relationship between the kubectl 1.33 upgrade and the increased instance sizes, and consider documenting this rationale in the commit message or a comment within the manifest if the upgrade is resource-driven.

Also applies to: 49-49

manifests/testing-framework/test-sets/autoscaling-2/2.yaml (1)

35-35: Consistency with autoscaling-2/1.yaml confirmed.

The serverType changes in this file are consistent with autoscaling-2/1.yaml, applying the same upgrade pattern across both control and compute pools. Note that this file uses template tag v0.9.14 (vs. v0.9.12 in the other file), which may indicate different test scenarios or progression stages.

Also applies to: 51-51

manifests/testing-framework/test-sets/test-set2/1.yaml (1)

78-78: Test infrastructure configuration updated appropriately for K8s 1.34.0 testing.

The serverType upgrade to cpx22 for htz-kube-nodes and the addition of installationProxy for the OCI cluster are consistent with the PR's objective to validate node deletion with newer kubectl versions (1.34.0). The manifest structure is valid and all references are properly configured.

Also applies to: 169-171

manifests/testing-framework/test-sets/test-set2/2.yaml (1)

87-87: Test infrastructure configuration updated appropriately for K8s 1.34.0 testing.

The serverType upgrade to cpx22 for htz-kube-nodes and the addition of installationProxy for the OCI cluster maintain consistency across test-set2 manifests. The manifest is valid and properly configured for the newer Kubernetes version.

Also applies to: 179-181

manifests/testing-framework/test-sets/test-set2/3.yaml (1)

78-78: Test infrastructure configuration updated appropriately for K8s 1.34.0 testing.

The serverType upgrade to cpx22 and installationProxy addition are consistent with the broader test-set2 updates. The manifest properly configures the test environment for validating node deletion with newer Kubernetes versions.

Also applies to: 170-172

manifests/testing-framework/test-sets/test-set1/2.yaml (2)

87-87: Verify serverType upgrades are aligned across test infrastructure.

Instance type changes across all providers have been consistently upgraded (cpx11→cpx22, e2-small→e2-standard-4, etc.), which aligns with infrastructure scaling for newer Kubernetes versions. However, ensure these changes are mirrored across the parallel test-set1/1.yaml file and that the upgraded instance sizes provide adequate resources for the test scenarios.

Also applies to: 100-100, 127-127, 151-151, 162-162, 175-175, 236-236

265-267: Confirm intentionality: OCI cluster has installationProxy while others do not.

The OCI cluster (ts1-oci-cluster-test-set-no1) is the only cluster in this file with an installationProxy configuration (mode: "on"). Verify whether this is intentional or if installationProxy should be added to other clusters. Check test-set1/1.yaml for consistency on this configuration pattern.

manifests/testing-framework/test-sets/test-set1/1.yaml (2)

87-87: Approve: serverType upgrades are consistent with test-set1/2.yaml.

The instance type upgrades in this file align exactly with the changes in test-set1/2.yaml, indicating consistent test infrastructure scaling across both test-set configurations.

Also applies to: 100-100, 125-125, 152-152, 163-163, 176-176, 237-237

292-293: Discrepancy detected: installationProxy added to OVH cluster here, but not in test-set1/2.yaml.

Line 292-293 adds installationProxy: mode "on" to the OVH cluster (ts1-ovh-cluster-test-set-no1) in this file. However, the corresponding OVH cluster in test-set1/2.yaml does not have this configuration. Verify whether this is intentional (test-set1/1.yaml uses proxy, test-set1/2.yaml does not) or if the configuration should be mirrored in both files for consistency.

manifests/testing-framework/test-sets/rolling-update/2.yaml (1)

92-92: Test infrastructure upgrade approved.

The upgrade from cpx11 to cpx22 aligns with the broader test infrastructure updates in this PR. Verify that this server type is available in the Hetzner region (nbg1) and that resource quotas permit this upgrade.

manifests/testing-framework/test-sets/rolling-update/1.yaml (1)

92-92: Test infrastructure upgrade approved.

Consistent with the parallel upgrade in rolling-update/2.yaml, this Hetzner server type upgrade from cpx11 to cpx22 is appropriate for the test infrastructure changes in this PR.

manifests/testing-framework/test-sets/rolling-update-2/2.yaml (2)

34-34: Hetzner server type upgrade approved.

The upgrade from cpx11 to cpx22 is consistent with parallel changes in other rolling-update test-sets.

48-48: AWS instance type upgrade approved.

The upgrade from t3.small to t3.medium aligns with the broader AWS infrastructure upgrades documented across this PR's test-set changes.

manifests/testing-framework/test-sets/rolling-update-2/1.yaml (3)

70-70: Hetzner server type upgrade approved.

The upgrade of htz-ctrl-nodes from cpx11 to cpx22 is consistent with parallel infrastructure upgrades in this PR.

95-95: AWS instance type upgrade approved.

The upgrade from t3.small to t3.medium aligns with AWS infrastructure upgrades documented across this PR's test-set changes.

106-106: Hetzner server type upgrade approved.

The upgrade of htz-cmpt-nodes from cpx11 to cpx22 is consistent with parallel infrastructure upgrades across rolling-update test-sets.

manifests/testing-framework/test-sets/proxy-with-hetzner/2.yaml (2)

24-25: Server type bump looks consistent with the rest of the Hetzner proxy test-sets.

37-40: Server type bump looks consistent with the rest of the Hetzner proxy test-sets.

manifests/testing-framework/test-sets/proxy-with-hetzner/7.yaml (2)

25-33: Pool rename + serverType update are internally consistent (definition matches later control reference).

63-65: Control pool reference correctly follows the renamed dynamic pool.

manifests/testing-framework/test-sets/proxy-with-hetzner/5.yaml (2)

31-34: Hetzner serverType bump (cpx22) is straightforward.

45-48: GCP serverType bump (e2-medium) is straightforward.

manifests/testing-framework/test-sets/proxy-with-hetzner/8.yaml (3)

25-33: Renamed control pool + serverType bump are consistent within the manifest.

40-49: Compute pool serverType bump is consistent with the rest of the Hetzner proxy set.

63-66: Control pool reference correctly follows the renamed pool.

manifests/testing-framework/test-sets/proxy-with-hetzner/6.yaml (2)

31-34: Hetzner serverType bump (cpx22) is straightforward.

45-48: GCP serverType bump (e2-medium) is straightforward.

manifests/testing-framework/test-sets/proxy-with-hetzner/3.yaml (2)

23-26: Server type bump looks consistent with the rest of the Hetzner proxy test-sets.

36-41: Server type bump looks consistent with the rest of the Hetzner proxy test-sets.

manifests/testing-framework/test-sets/proxy-with-hetzner/4.yaml (2)

23-26: Server type bump looks consistent with the rest of the Hetzner proxy test-sets.

36-41: Server type bump looks consistent with the rest of the Hetzner proxy test-sets.

manifests/testing-framework/test-sets/proxy-without-hetzner/10.yaml (1)

37-37: LGTM. Straightforward infrastructure update aligning with broader test-set sizing improvements.

manifests/testing-framework/test-sets/proxy-without-hetzner/3.yaml (1)

37-37: LGTM. Consistent with broader infrastructure upgrades across proxy-without-hetzner test-sets.

manifests/testing-framework/test-sets/proxy-without-hetzner/7.yaml (1)

37-37: LGTM. Consistent infrastructure upgrade pattern.

manifests/testing-framework/test-sets/proxy-without-hetzner/6.yaml (1)

37-37: LGTM. Consistent sizing upgrade; proxy mode variation ("off") is intentional for test coverage.

manifests/testing-framework/test-sets/proxy-without-hetzner/1.yaml (1)

37-37: LGTM. Consistent infrastructure upgrade across test-sets.

manifests/testing-framework/test-sets/proxy-without-hetzner/5.yaml (1)

37-37: LGTM. Consistent sizing upgrade; proxy disabled for this test scenario.

manifests/testing-framework/test-sets/proxy-without-hetzner/2.yaml (1)

37-37: LGTM. Consistent sizing upgrade; proxy disabled for test variation.

manifests/testing-framework/test-sets/proxy-without-hetzner/4.yaml (1)

37-37: LGTM. Consistent sizing upgrade; proxy enabled for this test scenario.

manifests/testing-framework/test-sets/proxy-without-hetzner/9.yaml (1)

31-44: Server type bump is OK; verify the intent is to standardize compute sizing across proxy-without-hetzner cases.
Given this is a test-set manifest, the only practical concern is higher resource usage (and potential quota pressure) in CI.

manifests/testing-framework/test-sets/succeeds-on-last-2/1.yaml (1)

16-35: Verify Hetzner flavor availability/quotas after cpx22 bump.
serverType: cpx22 (Line 24) is likely fine, but test stability will depend on whether the Hetzner account/region used by CI supports this flavor and has sufficient quota (and cost expectations).

manifests/testing-framework/test-sets/succeeds-on-last-2/3.yaml (2)

26-34: Hetzner pool flavor bump looks consistent—just re-check CI capacity.
serverType: cpx22 (Line 33) matches the broader test-set upgrades; please ensure CI credentials/quotas in nbg1 still cover this flavor.

35-44: Verify OCI shape availability for the test tenancy/AD.
serverType: VM.Standard1.2 (Line 41) is an infra sizing change that can fail if the tenancy/region/AD lacks capacity or shape entitlement; worth validating for eu-frankfurt-1 / AD-1.

manifests/testing-framework/test-sets/succeeds-on-last-2/2.yaml (1)

26-34: Hetzner cpx22 change is fine; just validate quota/cost for the ordinary test set.
Since this is an e2e test-set manifest, ensure the Hetzner project used by CI has capacity for cpx22 so runs don’t become flaky.

jakubhlavacka · 2025-12-16T07:37:53Z

Closing because the deletion of the nodes works.
The other issues will be addressed later.

fix deletion of nodes

fcfddf4

Despire requested review from jakubhlavacka and m-brando December 8, 2025 11:19

Despire added the test-set-ordinary Will select ordinary test-sets that tests the general functionally of building/modifying clusters label Dec 8, 2025

coderabbitai Bot reviewed Dec 8, 2025

View reviewed changes

Comment thread services/kuber/server/domain/utils/nodes/delete.go

Auto commit - update kustomization.yaml

dafa057

jakubhlavacka approved these changes Dec 8, 2025

View reviewed changes

check number of etcd pods

3320171

coderabbitai Bot reviewed Dec 8, 2025

View reviewed changes

add warn log on not found etcd member

cab8fbf

coderabbitai Bot reviewed Dec 8, 2025

View reviewed changes

Auto commit - update kustomization.yaml

891245f

jakubhlavacka self-assigned this Dec 8, 2025

cordon and drain the master node before removing from etcd members

71d12fe

the eviction used to stuck when the kuber removes the master node from etcd members first

coderabbitai Bot reviewed Dec 9, 2025

View reviewed changes

CI/CD pipeline and others added 12 commits December 9, 2025 16:44

Auto commit - update kustomization.yaml

dc7bfb6

trigger CI

91df78f

Auto commit - update kustomization.yaml

68cc743

trigger CI

c4bde40

Auto commit - update kustomization.yaml

29aa76f

upgrade kubectl to v1.33.0

7b39677

supporting kubeAPI v1.32.x, v1.33.x, and v1.34.x

update kubectl download link in kuber Dockerfile

c02aadd

Auto commit - update kustomization.yaml

dc8d688

Trigger CI

237919f

trigger CI

ff2982b

install docker BuildKit in CI pipeline

1496192

the newer version of actions-runner (2.329.x and later) don't have the BuildKit

run apk update in kuber Dockerfile before installing new package

d3fd7aa

coderabbitai Bot reviewed Dec 11, 2025

View reviewed changes

Auto commit - update kustomization.yaml

ab602cb

jakubhlavacka and others added 9 commits December 12, 2025 09:39

use proxy in ts1-ovh-cluster

cda031c

Auto commit - update kustomization.yaml

9e191a0

increase ovh, gcp, and oci resources in test-set1

fb5d750

increase gcp, oci, and ovh resources in test-set1

cfe3774

Auto commit - update kustomization.yaml

91a2e55

upgrade OCI machines in e2e tests

d3999c4

upgrade ovh and gcp nodepool machines in e2e tests

d43aeb9

upgrade aws machines in the e2e tests

02a4c19

upgrade htz machines in e2e tests

64ff97a

coderabbitai Bot reviewed Dec 12, 2025

View reviewed changes

CI/CD pipeline and others added 7 commits December 12, 2025 14:21

Auto commit - update kustomization.yaml

53f692f

trigger CI

5040213

Auto commit - update kustomization.yaml

14006af

trigger CI

8782a85

Auto commit - update kustomization.yaml

78c75c5

use proxy in ts1-ovh test-set1

ad27216

Auto commit - update kustomization.yaml

4f4136c

jakubhlavacka added this pull request to the merge queue Dec 16, 2025

Merged via the queue into master with commit 7a2ccd3 Dec 16, 2025

jakubhlavacka deleted the fix/deletion-nodes branch December 16, 2025 07:38

jakubhlavacka mentioned this pull request Dec 16, 2025

Bug: e2e failure due to the kubelet's missing certificate #1920

Closed

coderabbitai Bot mentioned this pull request Dec 17, 2025

upgrade GCP, Azure, OCI, Hetzner, and OVH VMs in e2e #1921

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: deletion of nodes#1919

fix: deletion of nodes#1919
jakubhlavacka merged 35 commits into
masterfrom
fix/deletion-nodes

Despire commented Dec 8, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Dec 8, 2025 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

jakubhlavacka left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

jakubhlavacka commented Dec 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Despire commented Dec 8, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jakubhlavacka left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

jakubhlavacka commented Dec 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Despire commented Dec 8, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Dec 8, 2025 •

edited

Loading