operator,storage-operator: bump operator-sdk to v1.42.0 #4791

Closed
ezekiel-alexrod wants to merge 10 commits into development/133.0 from improvement/bump-operator-sdk-v1.42.0

@ezekiel-alexrod
Contributor

@ezekiel-alexrod ezekiel-alexrod commented Feb 23, 2026

Bump operator-sdk to v1.42.0 in metalk8s-operator and storage-operator

Summary

This PR bumps the operator-sdk from v1.37.x (baseline) to v1.42.0 incrementally in both
the metalk8s-operator and storage-operator, following the official migration guides for each
version. It also takes the opportunity to improve error handling in the ClusterConfig controller
and fix a test reliability issue.


Changes

operator-sdk incremental bumps (operator & storage-operator)

Each bump was applied step-by-step following the upstream migration guide:

  • v1.38.0 — Remove kube-rbac-proxy; expose metrics via a native HTTPS endpoint; update
    golangci-lint to v1.59.1; update controller-tools to v0.15.0 and kustomize to v5.4.2.
  • v1.39.0 — Upgrade Kubernetes dependencies to v0.31.14 and controller-runtime to v0.19.5;
    update kustomize to v5.4.3 and controller-tools to v0.16.1; add network-policy scaffolding
    to protect metrics and webhook endpoints.
  • v1.40.0 — Upgrade Go to 1.23.12 and Kubernetes dependencies to v0.32.12 with
    controller-runtime v0.20.4; add metrics certificate watcher support via --metrics-cert-path
    flags; replace static ENVTEST versions with dynamic go list computation; add network-policy
    TLS patch scaffolding for Prometheus and cert-manager metrics integration.
  • v1.41.0 — Upgrade Go to 1.24.13 and Kubernetes dependencies to v0.33.8 with
    controller-runtime v0.21.0; migrate golangci-lint to v2 config format (v2.8.0); update
    controller-tools to v0.18.0; add Kind cluster targets for e2e tests (setup-test-e2e,
    cleanup-test-e2e).
  • v1.42.0 — No migrations required for this release; make metalk8s and ./doit.sh codegen
    were re-run to regenerate artifacts.
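
As an illustration of the "dynamic go list computation" in the v1.40.0 step: the idea is to derive the ENVTEST Kubernetes version from the Kubernetes module version in go.mod rather than hard-coding it. The sketch below only shows the version mapping (module v0.X.Y to Kubernetes 1.X), assuming the version string comes from the k8s.io/api module as in the upstream scaffolding. The real computation lives in the Makefile as shell, and the helper name envtestVersion is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// envtestVersion maps a Kubernetes module version such as "v0.32.12"
// to the matching Kubernetes minor release, "1.32".
// Hypothetical helper for illustration; not part of the PR.
func envtestVersion(moduleVersion string) (string, error) {
	parts := strings.Split(strings.TrimPrefix(moduleVersion, "v"), ".")
	if len(parts) < 2 || parts[0] != "0" {
		return "", fmt.Errorf("unexpected module version %q", moduleVersion)
	}
	// The minor component must be numeric to be a valid release number.
	if _, err := strconv.Atoi(parts[1]); err != nil {
		return "", fmt.Errorf("unexpected minor version in %q: %w", moduleVersion, err)
	}
	return "1." + parts[1], nil
}

func main() {
	// Module versions taken from the bumps listed above.
	for _, v := range []string{"v0.31.14", "v0.32.12", "v0.33.8"} {
		k8s, err := envtestVersion(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> ENVTEST %s\n", v, k8s)
	}
}
```

With this mapping, bumping the Kubernetes dependencies is enough to move the ENVTEST version along with them.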

Align operator files with operator-sdk testdata examples

After all version bumps, the configuration files for both operators (config/, cmd/main.go,
Makefile, Dockerfile, README.md, etc.) were realigned with the reference testdata examples
provided by the operator-sdk, ensuring consistency with upstream best practices.

Fix: replace panic with structured log + os.Exit(1) in ClusterConfig controller

In operator/pkg/controller/clusterconfig/controller.go, the previous behaviour of calling
panic(fmt.Errorf(...)) when the main ClusterConfig object was unexpectedly deleted has been
replaced with a cleaner approach: log the error via logr and exit with code 1. This produces
a proper log entry in the operator's output and still triggers a pod restart, while avoiding an
uncontrolled panic and its potentially confusing stack trace.

Fix: rollout restart of operator after ingress tests

A new autouse module-scoped pytest fixture was added to tests/post/steps/test_ingress.py.
Ingress tests heavily modify the ClusterConfig, which can cause the operator to enter
CrashLoopBackOff with exponentially growing backoff delays. The fixture performs a
kubectl rollout restart of the operator deployment at the end of the ingress test module,
ensuring the operator pod starts fresh (restartCount=0) before subsequent test modules
(e.g. test_operator) that depend on quick operator recovery.

Closes: MK8S-110, MK8S-111

@bert-e
Contributor

bert-e commented Feb 23, 2026

Hello ezekiel-alexrod,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e
Contributor

bert-e commented Feb 23, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

@ezekiel-alexrod ezekiel-alexrod force-pushed the improvement/bump-operator-sdk-v1.42.0 branch 3 times, most recently from 249f34d to 4eb574b on February 25, 2026 13:52
@bert-e
Contributor

bert-e commented Feb 25, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

@ezekiel-alexrod ezekiel-alexrod force-pushed the improvement/bump-operator-sdk-v1.42.0 branch from 4eb574b to ce3caa1 on February 25, 2026 16:00
@bert-e
Contributor

bert-e commented Feb 25, 2026

Conflict

There is a conflict between your branch improvement/bump-operator-sdk-v1.42.0 and the
destination branch development/133.0.

Please resolve the conflict on the feature branch (improvement/bump-operator-sdk-v1.42.0).

git fetch && \
git checkout origin/improvement/bump-operator-sdk-v1.42.0 && \
git merge origin/development/133.0

Resolve merge conflicts and commit

git push origin HEAD:improvement/bump-operator-sdk-v1.42.0

Following the guide https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.38.0/:
- Remove kube-rbac-proxy, expose metrics via HTTPS endpoint
- Update golangci-lint to v1.59.1
- Update controller-tools to v0.15.0, kustomize to v5.4.2, and go-install-tool with symlink pattern

Following the guide https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.39.0/:
- Upgrade K8s dependencies to v0.31.14 and controller-runtime to v0.19.5 (K8s 1.31).
- Update kustomize to v5.4.3, controller-tools to v0.16.1, ENVTEST to 1.31.
- Add network-policy scaffolding to protect metrics and webhook endpoints.

Following the guide https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.40.0/:
- Upgrade Go to 1.23.12 and K8s dependencies to v0.32.12 with controller-runtime v0.20.4.
- Add metrics certificate watcher support via --metrics-cert-path flags.
- Replace static ENVTEST versions with dynamic go list computation.
- Upgrade OPM to v1.55.0, add lint-config and setup-envtest targets.
- Add network-policy TLS patch scaffolding for Prometheus and cert-manager metrics integration.

Following the guide https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.41.0/:
- Upgrade Go to 1.24.13 and K8s dependencies to v0.33.8 with controller-runtime v0.21.0. Migrate golangci-lint to v2.8.0 with v2 config format.
- Update controller-tools to v0.18.0.
- Add Kind cluster targets for e2e tests (setup-test-e2e, cleanup-test-e2e).

Following the guide https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.42.0/:
No migrations required for this release.

Command launched in each operator directory:
make metalk8s

Command launched in the root directory:
./doit.sh codegen
@ezekiel-alexrod ezekiel-alexrod force-pushed the improvement/bump-operator-sdk-v1.42.0 branch from d30c7d6 to abe9f85 on February 25, 2026 16:18
@bert-e
Contributor

bert-e commented Feb 25, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

@ezekiel-alexrod ezekiel-alexrod force-pushed the improvement/bump-operator-sdk-v1.42.0 branch from abe9f85 to 5533d5d on February 25, 2026 16:29
@ezekiel-alexrod ezekiel-alexrod force-pushed the improvement/bump-operator-sdk-v1.42.0 branch from 5533d5d to db708e5 on February 26, 2026 11:17
@bert-e
Contributor

bert-e commented Feb 26, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

@bert-e
Contributor

bert-e commented Feb 26, 2026

Conflict

There is a conflict between your branch improvement/bump-operator-sdk-v1.42.0 and the
destination branch development/133.0.

Please resolve the conflict on the feature branch (improvement/bump-operator-sdk-v1.42.0).

git fetch && \
git checkout origin/improvement/bump-operator-sdk-v1.42.0 && \
git merge origin/development/133.0

Resolve merge conflicts and commit

git push origin HEAD:improvement/bump-operator-sdk-v1.42.0

@bert-e
Contributor

bert-e commented Feb 26, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Peer approvals must include at least 1 approval from the following list:

@ezekiel-alexrod ezekiel-alexrod marked this pull request as ready for review February 27, 2026 08:33
@ezekiel-alexrod ezekiel-alexrod requested a review from a team as a code owner February 27, 2026 08:33
Collaborator

@TeddyAndrieux TeddyAndrieux left a comment

I'm sorry, but there are too many changes to make: I reviewed only 28 files and already have 17 comments (and I may have missed some).
I'm really not confident reviewing this. It's not your fault; this update mechanism is really error-prone. May I suggest creating a fresh operator-sdk project for each operator (operator and storage-operator) and just copying our code logic into it, so that we are more confident about the content?

@@ -88,7 +88,7 @@ RUN curl --fail -L -o /tmp/go.tar.gz https://go.dev/dl/go${GO_VERSION}.linux-amd
rm -rf /tmp/go.tar.gz

# Install golangci-lint
Collaborator

I think the Go version should be bumped here as well.

Comment on lines +208 to +210
// Add a Recorder to the reconciler.
// This allows the operator author to emit events during reconcilliation.
// Recorder: mgr.GetEventRecorderFor("clusterconfig-controller"),
Collaborator

To me, this is not part of the default operator-sdk scaffolding; it is specific to testdata, so I suggest removing these (especially since we already create a recorder in the code. Maybe we should change the way we handle the recorder, but that is not the goal of this PR).

Comment on lines +218 to +220
// Add a Recorder to the reconciler.
// This allows the operator author to emit events during reconcilliation.
// Recorder: mgr.GetEventRecorderFor("virtualippool-controller"),
Collaborator

ditto

@@ -0,0 +1,20 @@
# The following manifests contain a self-signed issuer CR and a metrics certificate CR.
Collaborator

This one is only added when we add a webhook. We do not have any here, so let's remove it.

@@ -0,0 +1,20 @@
# The following manifests contain a self-signed issuer CR and a certificate CR.
Collaborator

ditto

@@ -0,0 +1,3 @@
resources:
- allow-webhook-traffic.yaml
Collaborator

ditto remove this one

@@ -0,0 +1,27 @@
# This rule is not used by the project memcached-operator itself.
Collaborator

memcached-operator

# This rule is not used by the project memcached-operator itself.
# It is provided to allow the cluster admin to help manage permissions for users.
#
# Grants full permissions ('*') over cache.example.com.
Collaborator

ditto

Comment on lines -533 to +525
name: metalk8s-operator-manager-role
name: metalk8s-operator-clusterconfig-admin-role
Collaborator

We have to handle this change during upgrade/downgrade (removing the "old" one)

app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: operator
app.kubernetes.io/part-of: metalk8s
name: metalk8s-operator-clusterconfig-editor-role
Collaborator

ditto

@ezekiel-alexrod
Contributor Author

ezekiel-alexrod commented Mar 12, 2026

Closing this PR to clean it up and restarting with a new one here: #4818

@ezekiel-alexrod ezekiel-alexrod deleted the improvement/bump-operator-sdk-v1.42.0 branch March 24, 2026 09:54