@gpei gpei commented Jun 6, 2025

As a follow-up to #65478, we found that there is no need to run the RHEL upgrade playbook before the MCO update for the 4.14 to 4.16 and 4.16 to 4.18 upgrades. This PR verifies whether the UPGRADE_RHEL_WORKER_BEFOREHAND parameter is still necessary in the current 4.15 to 4.16 and 4.16 to 4.17 upgrade jobs.
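
Once the parameter is dropped, a quick text scan can confirm that no job stanza still sets it. This is only a minimal sketch and not part of the PR: `find_env_settings` and the sample stanza are hypothetical, and the scan only matches the variable when it is used as a YAML key at the start of a line.

```python
import re

ENV_VAR = "UPGRADE_RHEL_WORKER_BEFOREHAND"


def find_env_settings(config_text: str, name: str = ENV_VAR) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line that sets `name` as a YAML key."""
    pattern = re.compile(rf"^\s*{re.escape(name)}\s*:")
    return [
        (number, line.strip())
        for number, line in enumerate(config_text.splitlines(), start=1)
        if pattern.match(line)
    ]


# Hypothetical job stanza, for illustration only; not copied from the real config.
SAMPLE = """\
- as: example-upgrade-job
  steps:
    env:
      UPGRADE_RHEL_WORKER_BEFOREHAND: "true"
      OTHER_SETTING: "value"
"""

if __name__ == "__main__":
    for number, line in find_env_settings(SAMPLE):
        print(f"line {number}: {line}")
```

An empty result for every affected config file would mean the parameter is gone from those jobs.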

The following rehearsal jobs passed, showing that we can now remove the UPGRADE_RHEL_WORKER_BEFOREHAND env var from the jobs.

1. "podsPerCore": 100 was added in KubeletConfiguration

2. /etc/crio/crio.conf.d/01-ctrcfg-pidsLimit was created to set the following option

[crio]
  [crio.runtime]
    pids_limit = 2048

The following log from https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/pr-logs/pull/openshift_release/65773/rehearse-65773-periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-4.17-upgrade-from-stable-4.16-azure-ipi-proxy-workers-rhcos-rhel8-f28/1930937876522995712/artifacts/azure-ipi-proxy-workers-rhcos-rhel8-f28/openshift-extended-upgrade-pre/build-log.txt also shows the KubeletConfig and ContainerRuntimeConfig being created:

  Jun  6 12:44:43.046: INFO: Running 'oc --namespace=e2e-test-mco-upgrade-lrg86 --kubeconfig=/tmp/kubeconfig-2784990640 process --ignore-unknown-parameters=true -f /tmp/fixture-testdata-dir1617868448/test/extended/testdata/mco/generic-kubelet-config.yaml -p NAME=mco-tc-62154-kubeletconfig -p KUBELETCONFIG={"podsPerCore": 100}'
  Jun  6 12:44:43.310: INFO: the file of resource is /tmp/e2e-test-mco-upgrade-lrg86-e0787hk3config.json.stdout
  Jun  6 12:44:43.310: INFO: Running 'oc --kubeconfig=/tmp/kubeconfig-2784990640 create -f /tmp/e2e-test-mco-upgrade-lrg86-e0787hk3config.json.stdout'
  kubeletconfig.machineconfiguration.openshift.io/mco-tc-62154-kubeletconfig created
  STEP: create ContainerRuntimeConfig 06/06/25 12:44:43.568
  Jun  6 12:44:46.570: INFO: showInfo is true
  Jun  6 12:44:46.570: INFO: Running 'oc --namespace=e2e-test-mco-upgrade-lrg86 --kubeconfig=/tmp/kubeconfig-2784990640 process --ignore-unknown-parameters=true -f /tmp/fixture-testdata-dir1617868448/test/extended/testdata/mco/generic-container-runtime-config.yaml -p NAME=mco-tc-62154-crconfig -p CRCONFIG={"pidsLimit": 2048}'
  Jun  6 12:44:46.838: INFO: the file of resource is /tmp/e2e-test-mco-upgrade-lrg86-pb2ropk6config.json.stdout
  Jun  6 12:44:46.838: INFO: Running 'oc --kubeconfig=/tmp/kubeconfig-2784990640 create -f /tmp/e2e-test-mco-upgrade-lrg86-pb2ropk6config.json.stdout'
  containerruntimeconfig.machineconfiguration.openshift.io/mco-tc-62154-crconfig created
  STEP: wait for worker pool to be ready 06/06/25 12:44:47.117
  Jun  6 12:44:47.117: INFO: Running 'oc --kubeconfig=/tmp/kubeconfig-2784990640 get mcp worker -o jsonpath={.status.machineCount}'
...
  Jun  6 12:55:47.947: ERROR: Degraded MC:
...
                  "lastTransitionTime": "2025-06-06T12:55:07Z",
                  "message": "Node ci-op-ygvsqis0-c057b-s4jpr-rhel-1 is reporting: \"reboot command failed, something is seriously wrong\"",
                  "reason": "1 nodes are reporting degraded status on sync",

I will check with the MCO team about this degraded node, but it does not affect our removal of the UPGRADE_RHEL_WORKER_BEFOREHAND parameter.
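
To pull the degraded-node message out of the pool status without reading the whole log, something like the following might help. It is only a sketch: it assumes the JSON comes from `oc get mcp worker -o json`, and that the condition quoted in the log has type NodeDegraded and status True (the excerpt elides both fields, so they are assumptions). `degraded_messages` and `SAMPLE` are hypothetical names.

```python
import json


def degraded_messages(mcp_json: str) -> list[str]:
    """Return the messages of all *Degraded conditions that are currently True."""
    pool = json.loads(mcp_json)
    conditions = pool.get("status", {}).get("conditions", [])
    return [
        condition.get("message", "")
        for condition in conditions
        if condition.get("type", "").endswith("Degraded")
        and condition.get("status") == "True"
    ]


# Sample built from the log excerpt; "type" and "status" are assumed values,
# and the "Updated" entry is added only to show that it is filtered out.
SAMPLE = json.dumps({
    "status": {
        "conditions": [
            {"type": "Updated", "status": "False", "message": ""},
            {
                "type": "NodeDegraded",
                "status": "True",
                "lastTransitionTime": "2025-06-06T12:55:07Z",
                "message": 'Node ci-op-ygvsqis0-c057b-s4jpr-rhel-1 is reporting: '
                           '"reboot command failed, something is seriously wrong"',
                "reason": "1 nodes are reporting degraded status on sync",
            },
        ]
    }
})

if __name__ == "__main__":
    for message in degraded_messages(SAMPLE):
        print(message)
```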

@openshift-ci openshift-ci bot requested review from Xia-Zhao-rh and memodi June 6, 2025 10:34

gpei commented Jun 6, 2025

/uncc memodi
/uncc Xia-Zhao-rh

@openshift-ci openshift-ci bot removed request for Xia-Zhao-rh and memodi June 6, 2025 10:35

gpei commented Jun 6, 2025

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-4.17-upgrade-from-stable-4.16-azure-ipi-proxy-workers-rhcos-rhel8-f28 periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-azure-ipi-proxy-workers-rhcos-rhel8-f28

@openshift-ci-robot

@gpei: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.


gpei commented Jun 7, 2025

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-4.17-upgrade-from-stable-4.16-azure-ipi-proxy-workers-rhcos-rhel8-f28

@openshift-ci-robot

@gpei: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@gpei gpei changed the title from "[DEBUG]remove UPGRADE_RHEL_WORKER_BEFOREHAND to have a try" to "Remove UPGRADE_RHEL_WORKER_BEFOREHAND workaround in RHEL upgrade jobs" Jun 7, 2025
@openshift-ci-robot

[REHEARSALNOTIFIER]
@gpei: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-azure-ipi-proxy-workers-rhcos-rhel8-f28 | N/A | periodic | Ci-operator config changed |
| periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-4.17-upgrade-from-stable-4.16-azure-ipi-proxy-workers-rhcos-rhel8-f28 | N/A | periodic | Ci-operator config changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

gpei commented Jun 9, 2025

@jianlinliu please help to take a look, thx

@jianlinliu

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 9, 2025

gpei commented Jun 9, 2025

@liangxia for the approval, thanks


openshift-ci bot commented Jun 9, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gpei, jianlinliu, liangxia

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 9, 2025

gpei commented Jun 9, 2025

/pj-rehearse ack

@openshift-ci-robot

@gpei: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Jun 9, 2025

openshift-ci bot commented Jun 9, 2025

@gpei: all tests passed!

@openshift-merge-bot openshift-merge-bot bot merged commit 5ba445e into openshift:master Jun 9, 2025
13 checks passed
mehabhalodiya pushed a commit to mehabhalodiya/release that referenced this pull request Jun 12, 2025