operator,storage-operator: bump operator-sdk to v1.42.0 #4791
ezekiel-alexrod wants to merge 10 commits into development/133.0
Hello ezekiel-alexrod,
My role is to assist you with the merge of this pull request.
Status report is not available.
Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
249f34d to 4eb574b
4eb574b to ce3caa1
Conflict
There is a conflict between your branch and the destination branch. Please resolve the conflict on the feature branch:
git fetch && \
git checkout origin/improvement/bump-operator-sdk-v1.42.0 && \
git merge origin/development/133.0
Resolve merge conflicts and commit, then push:
git push origin HEAD:improvement/bump-operator-sdk-v1.42.0
Following the guide https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.38.0/ :
- Remove kube-rbac-proxy, expose metrics via HTTPS endpoint
- Update golangci-lint to v1.59.1
- Update controller-tools to v0.15.0, kustomize to v5.4.2, and go-install-tool with symlink pattern

Following the guide https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.39.0/ :
- Upgrade K8s dependencies to v0.31.14 and controller-runtime to v0.19.5 (K8s 1.31).
- Update kustomize to v5.4.3, controller-tools to v0.16.1, ENVTEST to 1.31.
- Add network-policy scaffolding to protect metrics and webhook endpoints.

Following the guide https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.40.0/ :
- Upgrade Go to 1.23.12 and K8s dependencies to v0.32.12 with controller-runtime v0.20.4.
- Add metrics certificate watcher support via --metrics-cert-path flags.
- Replace static ENVTEST versions with dynamic go list computation.
- Upgrade OPM to v1.55.0, add lint-config and setup-envtest targets.
- Add network-policy TLS patch scaffolding for Prometheus and cert-manager metrics integration.

Following the guide https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.41.0/ :
- Upgrade Go to 1.24.13 and K8s dependencies to v0.33.8 with controller-runtime v0.21.0.
- Migrate golangci-lint to v2.8.0 with v2 config format.
- Update controller-tools to v0.18.0.
- Add Kind cluster targets for e2e tests (setup-test-e2e, cleanup-test-e2e).

Following the guide https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.42.0/ :
No migrations required for this release.
Command launched in each operator directory: make metalk8s
Command launched in the root directory: ./doit.sh codegen
d30c7d6 to abe9f85
abe9f85 to 5533d5d
5533d5d to db708e5
TeddyAndrieux left a comment
I'm sorry, but there are too many changes to make: I reviewed only 28 files and already have 17 comments (and I may have missed some).
I'm really not confident reviewing this. It's not your fault; this update mechanism is really error-prone. May I suggest creating a fresh operator-sdk project for each one (operator and storage-operator) and just copying our code logic into it, so that we are more confident about the content?
@@ -88,7 +88,7 @@ RUN curl --fail -L -o /tmp/go.tar.gz https://go.dev/dl/go${GO_VERSION}.linux-amd
rm -rf /tmp/go.tar.gz

# Install golangci-lint
I think the Go version should be bumped here as well.
// Add a Recorder to the reconciler.
// This allows the operator author to emit events during reconcilliation.
// Recorder: mgr.GetEventRecorderFor("clusterconfig-controller"),
To me this is not part of the default operator-sdk scaffolding; it is specific to testdata, so I suggest removing these comments (especially since we already create a recorder in the code; maybe we should change the way we handle the recorder, but that is not the goal of this PR).
// Add a Recorder to the reconciler.
// This allows the operator author to emit events during reconcilliation.
// Recorder: mgr.GetEventRecorderFor("virtualippool-controller"),
@@ -0,0 +1,20 @@
# The following manifests contain a self-signed issuer CR and a metrics certificate CR.
This one is only added when we add a webhook; we do not have any here, so let's remove it.
@@ -0,0 +1,20 @@
# The following manifests contain a self-signed issuer CR and a certificate CR.
@@ -0,0 +1,3 @@
resources:
- allow-webhook-traffic.yaml
Ditto, remove this one.
@@ -0,0 +1,27 @@
# This rule is not used by the project memcached-operator itself.
# This rule is not used by the project memcached-operator itself.
# It is provided to allow the cluster admin to help manage permissions for users.
#
# Grants full permissions ('*') over cache.example.com.
name: metalk8s-operator-manager-role
name: metalk8s-operator-clusterconfig-admin-role
We have to handle this change during upgrade/downgrade (by removing the "old" one).
app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: operator
app.kubernetes.io/part-of: metalk8s
name: metalk8s-operator-clusterconfig-editor-role
Closing to clean up the PR and restarting it here: #4818
Bump operator-sdk to v1.42.0 in metalk8s-operator and storage-operator

Summary
This PR bumps the operator-sdk from v1.37.x (baseline) to v1.42.0 incrementally in both the metalk8s-operator and storage-operator, following the official migration guides for each version. It also takes the opportunity to improve error handling in the ClusterConfig controller and fix a test reliability issue.

Changes

operator-sdk incremental bumps (operator & storage-operator)
Each bump was applied step-by-step following the upstream migration guide:
- v1.38.0: remove kube-rbac-proxy; expose metrics via a native HTTPS endpoint; update golangci-lint to v1.59.1; update controller-tools to v0.15.0 and kustomize to v5.4.2.
- v1.39.0: upgrade K8s dependencies to v0.31.14 and controller-runtime to v0.19.5; update kustomize to v5.4.3 and controller-tools to v0.16.1; add network-policy scaffolding to protect metrics and webhook endpoints.
- v1.40.0: upgrade Go to 1.23.12 and K8s dependencies to v0.32.12 with controller-runtime v0.20.4; add metrics certificate watcher support via --metrics-cert-path flags; replace static ENVTEST versions with dynamic go list computation; add network-policy TLS patch scaffolding for Prometheus and cert-manager metrics integration.
- v1.41.0: upgrade Go to 1.24.13 and K8s dependencies to v0.33.8 with controller-runtime v0.21.0; migrate golangci-lint to v2 config format (v2.8.0); update controller-tools to v0.18.0; add Kind cluster targets for e2e tests (setup-test-e2e, cleanup-test-e2e).
- v1.42.0: no migrations required; make metalk8s and ./doit.sh codegen re-run to regenerate artifacts.
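To illustrate the --metrics-cert-path flags from the v1.40.0 step, here is a minimal sketch using only the standard flag package. The companion flag names (metrics-cert-name, metrics-cert-key), their defaults, and the helper names parseMetricsCertFlags and files are assumptions made for illustration; the actual scaffolded cmd/main.go in this PR may differ, and the real code hands these paths to a certificate watcher rather than printing them.

```go
package main

import (
	"flag"
	"fmt"
	"path/filepath"
)

// metricsCertOptions holds the certificate-related flag values (hypothetical type).
type metricsCertOptions struct {
	Path string // directory containing the certificate and key
	Name string // certificate file name inside Path
	Key  string // key file name inside Path
}

// parseMetricsCertFlags parses the flags from args with a dedicated FlagSet,
// so the sketch does not touch the process-wide flag.CommandLine.
func parseMetricsCertFlags(args []string) (metricsCertOptions, error) {
	var o metricsCertOptions
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	fs.StringVar(&o.Path, "metrics-cert-path", "", "directory that contains the metrics server certificate")
	// Assumed companion flags and defaults; check the generated main.go.
	fs.StringVar(&o.Name, "metrics-cert-name", "tls.crt", "name of the metrics server certificate file")
	fs.StringVar(&o.Key, "metrics-cert-key", "tls.key", "name of the metrics server key file")
	err := fs.Parse(args)
	return o, err
}

// files returns the certificate and key paths. ok is false when no directory
// was given, in which case no certificate files are configured.
func (o metricsCertOptions) files() (cert, key string, ok bool) {
	if o.Path == "" {
		return "", "", false
	}
	return filepath.Join(o.Path, o.Name), filepath.Join(o.Path, o.Key), true
}

func main() {
	opts, err := parseMetricsCertFlags([]string{"--metrics-cert-path=/tmp/metrics-certs"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	if cert, key, ok := opts.files(); ok {
		fmt.Println("certificate:", cert)
		fmt.Println("key:", key)
	}
}
```

Keeping the path resolution in a small pure function makes the empty-path case explicit and easy to test without starting a manager.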
Align operator files with operator-sdk testdata examples
After all version bumps, the configuration files for both operators (config/, cmd/main.go, Makefile, Dockerfile, README.md, etc.) were realigned with the reference testdata examples provided by the operator-sdk, ensuring consistency with upstream best practices.
Fix: replace panic with structured log + os.Exit(1) in ClusterConfig controller
In operator/pkg/controller/clusterconfig/controller.go, the previous behaviour of calling panic(fmt.Errorf(...)) when the main ClusterConfig object was unexpectedly deleted has been replaced with a cleaner approach: log the error via logr and exit with code 1. This produces a proper log entry in the operator's output and still triggers a pod restart, while avoiding an uncontrolled panic and its potentially confusing stack trace.
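The log-then-exit pattern described here can be sketched as follows. This is not the code from controller.go: errorLogger is a hypothetical stand-in that mirrors the shape of logr's Error method so the example needs only the standard library, and failOnDeletedClusterConfig, loggerFunc, and the object name "main" are invented for illustration. The exit function is injected so the behaviour can be exercised without killing the process; real code would pass os.Exit.

```go
package main

import (
	"fmt"
	"os"
)

// errorLogger is a hypothetical stand-in for the one logger method used here:
// an error, a message, then alternating key/value pairs.
type errorLogger interface {
	Error(err error, msg string, keysAndValues ...any)
}

// loggerFunc adapts a plain function to the errorLogger interface.
type loggerFunc func(err error, msg string, keysAndValues ...any)

func (f loggerFunc) Error(err error, msg string, keysAndValues ...any) {
	f(err, msg, keysAndValues...)
}

// failOnDeletedClusterConfig logs the unexpected deletion and then calls exit(1)
// instead of panicking, so the output contains a log entry rather than a stack trace.
func failOnDeletedClusterConfig(log errorLogger, name string, exit func(int)) {
	err := fmt.Errorf("ClusterConfig %q was unexpectedly deleted", name)
	log.Error(err, "main ClusterConfig object is gone, exiting so the pod restarts", "name", name)
	exit(1)
}

func main() {
	// Illustrative logger that writes a single line to stderr.
	log := loggerFunc(func(err error, msg string, keysAndValues ...any) {
		fmt.Fprintln(os.Stderr, "ERROR", msg, "error:", err, keysAndValues)
	})
	// Demo exit function; the operator would pass os.Exit here.
	exit := func(code int) { fmt.Println("would exit with code", code) }
	failOnDeletedClusterConfig(log, "main", exit)
}
```

Either approach ends the process with a non-zero status, so the pod still restarts; the difference is only in what the operator writes to its output on the way out.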
Fix: rollout restart of operator after ingress tests
A new autouse module-scoped pytest fixture was added to tests/post/steps/test_ingress.py. Ingress tests heavily modify the ClusterConfig, which can cause the operator to enter CrashLoopBackOff with exponentially growing backoff delays. The fixture performs a kubectl rollout restart of the operator deployment at the end of the ingress test module, ensuring the operator pod starts fresh (restartCount=0) before subsequent test modules (e.g. test_operator) that depend on quick operator recovery.

Closes: MK8S-110, MK8S-111