@jianlinliu
Contributor

No description provided.

@openshift-ci-robot
Contributor

openshift-ci-robot commented May 27, 2025

@jianlinliu: This pull request references CORS-3991 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 27, 2025
@openshift-ci openshift-ci bot requested review from andfasano and neisw May 27, 2025 02:51
@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label May 27, 2025
@openshift-ci-robot openshift-ci-robot removed the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label May 27, 2025
@jianlinliu
Contributor Author

/test ci-operator-config

@jianlinliu
Contributor Author

/pj-rehearse pull-ci-openshift-installer-release-4.20-e2e-aws-private

@openshift-ci-robot
Contributor

@jianlinliu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@jianlinliu
Contributor Author

/pj-rehearse pull-ci-openshift-installer-release-4.20-e2e-aws-private

@openshift-ci-robot
Contributor

@jianlinliu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@jianlinliu
Contributor Author

jianlinliu commented May 27, 2025

cc @tthvo and @yunjiang29 to review.

For the failure in the e2e job, it sounds like some e2e test cases are not applicable to private clusters, or those e2e cases need to be updated to work with private clusters. I am not sure whether that is a blocker for merging this PR.

Member

@tthvo tthvo left a comment


For the failure in the e2e job, it sounds like some e2e test cases are not applicable to private clusters, or those e2e cases need to be updated to work with private clusters

Right, if I understand it correctly, some LoadBalancer Services created by the tests are external (public), and because the cluster has no public subnets (i.e. all subnets are private), it cannot support such Services. The CCM throws the error below.

Those Services should have the annotation service.beta.kubernetes.io/aws-load-balancer-internal: "true" on a private cluster.

2025-05-27T07:52:48.779302762Z I0527 07:52:48.779254       1 controller.go:401] Ensuring load balancer for service e2e-service-lb-test-tqvbf/service-test
2025-05-27T07:52:48.779352103Z I0527 07:52:48.779314       1 aws.go:2153] EnsureLoadBalancer(kubernetes, e2e-service-lb-test-tqvbf, service-test, us-west-2, , [{ TCP  80 {0 80 } 32066}], map[service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold:2 service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval:8 service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold:3])
2025-05-27T07:52:48.779440555Z I0527 07:52:48.779406       1 event.go:389] "Event occurred" object="e2e-service-lb-test-tqvbf/service-test" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
2025-05-27T07:52:49.297880635Z I0527 07:52:49.297825       1 aws.go:1666] Ignoring private subnet for public ELB "subnet-0d4e48fbd321b5601"
2025-05-27T07:52:49.297880635Z I0527 07:52:49.297847       1 aws.go:1666] Ignoring private subnet for public ELB "subnet-0b210adbabc637755"
2025-05-27T07:52:49.297880635Z I0527 07:52:49.297853       1 aws.go:1666] Ignoring private subnet for public ELB "subnet-0bec46812ac6747b5"
2025-05-27T07:52:49.297909266Z E0527 07:52:49.297896       1 controller.go:301] "Unhandled Error" err="error processing service e2e-service-lb-test-tqvbf/service-test (retrying with exponential backoff): failed to ensure load balancer: could not find any suitable subnets for creating the ELB" logger="UnhandledError"
2025-05-27T07:52:49.297976907Z I0527 07:52:49.297936       1 event.go:389] "Event occurred" object="e2e-service-lb-test-tqvbf/service-test" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: could not find any suitable subnets for creating the ELB"
2025-05-27T07:57:20.722143960Z I0527 07:57:20.721314       1 controller.go:958] Removing finalizer from service e2e-deployment-8004/test-rolling-update-with-lb
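To make the failure mode in the log concrete, here is a minimal Python sketch of the subnet-selection behavior the CCM messages suggest. It is a simplified, hypothetical model, not the AWS cloud provider's actual code: the Subnet type, the select_elb_subnets helper, and the rule that only the string "true" marks a load balancer as internal are assumptions for illustration. Whether a subnet is public is given as a flag here; a real provider has to infer it (for example, from route tables).

```python
from dataclasses import dataclass
from typing import Dict, List

# Annotation named in the review comment; Kubernetes annotation values are strings.
INTERNAL_ANNOTATION = "service.beta.kubernetes.io/aws-load-balancer-internal"


@dataclass
class Subnet:
    # Hypothetical model: subnet ID plus a precomputed "is public" flag.
    subnet_id: str
    public: bool


def is_internal(annotations: Dict[str, str]) -> bool:
    # Simplifying assumption: only the value "true" requests an internal ELB.
    return annotations.get(INTERNAL_ANNOTATION) == "true"


def select_elb_subnets(subnets: List[Subnet], annotations: Dict[str, str]) -> List[str]:
    """Return the subnet IDs usable for the ELB, or raise if none qualify."""
    internal = is_internal(annotations)
    chosen = []
    for subnet in subnets:
        if not internal and not subnet.public:
            # Corresponds to "Ignoring private subnet for public ELB" in the log.
            continue
        chosen.append(subnet.subnet_id)
    if not chosen:
        # Same message the CCM reported in the SyncLoadBalancerFailed event.
        raise RuntimeError("could not find any suitable subnets for creating the ELB")
    return chosen


if __name__ == "__main__":
    # A private cluster: every subnet is private (IDs taken from the log).
    private_only = [
        Subnet("subnet-0d4e48fbd321b5601", public=False),
        Subnet("subnet-0b210adbabc637755", public=False),
        Subnet("subnet-0bec46812ac6747b5", public=False),
    ]
    try:
        select_elb_subnets(private_only, {})  # external Service: no annotation
    except RuntimeError as err:
        print("external Service:", err)
    # With the internal annotation, the same private subnets become eligible.
    print("internal Service:", select_elb_subnets(private_only, {INTERNAL_ANNOTATION: "true"}))
```

Under this model, the external Service fails exactly as the CCM reported, while adding the internal annotation lets the same private subnets be used. That matches the two options discussed here: annotate the test Services on private clusters, or provide public subnets.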

Though, this looks good to me! Do we plan to have this for 4.19 too?

@jianlinliu
Contributor Author

Right, if I understand it correctly, some LoadBalancer Services created by the tests are external (public), and because the cluster has no public subnets (i.e. all subnets are private), it cannot support such Services. The CCM throws the error below.

Those Services should have the annotation service.beta.kubernetes.io/aws-load-balancer-internal: "true" on a private cluster.

Thanks for your analysis. This reminds me that we have a similar step, aws-provision-tags-for-byo-vpc; let me add it and try again to see whether it improves the results.

Though, this looks good to me! Do we plan to have this for 4.19 too?

Yeah, that was the original plan. But during #64371, @jinyunma found some e2e bugs and fixed them in openshift/origin#29759. That PR landed in 4.20 but has not been backported to 4.19 yet, so the plan changed: I am going to drop 4.19. Do you have any advice?

@openshift-ci-robot
Contributor

@jianlinliu, pj-rehearse: unable to determine affected jobs. This could be due to a branch that needs to be rebased. ERROR:

could not load configuration from base revision of release repo: could not checkout worktree: '[git checkout 08c2da5bc951eb940154e995159c2ace8d2802f3]' failed with out:  and error exec: Stdout already set
Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@jianlinliu
Contributor Author

/pj-rehearse pull-ci-openshift-installer-release-4.20-e2e-aws-private

@openshift-ci-robot
Contributor

@jianlinliu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@tthvo
Member

tthvo commented May 28, 2025

Yeah, that was the original plan. But during #64371, @jinyunma found some e2e bugs and fixed them in openshift/origin#29759. That PR landed in 4.20 but has not been backported to 4.19 yet, so the plan changed: I am going to drop 4.19. Do you have any advice?

Oh right, it makes sense to hold off on these jobs for 4.19 if openshift/origin#29759 is not yet backported. I guess we can revisit this if bugs related to private clusters are found in 4.19 and we need test coverage.

@yunjiang29
Contributor

LGTM

@openshift-ci
Contributor

openshift-ci bot commented May 28, 2025

@jianlinliu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/rehearse/openshift/installer/release-4.20/e2e-aws-private
Commit: ce057b1
Required: unknown
Rerun command: /pj-rehearse pull-ci-openshift-installer-release-4.20-e2e-aws-private


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jianlinliu
Contributor Author

This reminds me that we have a similar step, aws-provision-tags-for-byo-vpc; let me add it and try again to see whether it improves the results.

@tthvo After adding aws-provision-tags-for-byo-vpc, the test results look the same as before.

Based on your analysis, the CCM expects public subnets, so the added step does not help with that.

I think we can move forward and get this merged: the new job is optional, so the e2e failure does not block any PR merge. Of course, if in the future we need the whole e2e test suite to pass, we can file Jira issues with the individual component teams for future enhancements.

@jianlinliu
Contributor Author

/pj-rehearse ack

@openshift-ci-robot
Contributor

@jianlinliu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label May 28, 2025
Member

@tthvo tthvo left a comment


/lgtm
/approve

Looks great!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 28, 2025
@jianlinliu
Contributor Author

@vrutkovs would you mind reviewing and approving this PR?

@openshift-ci
Contributor

openshift-ci bot commented May 29, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jianlinliu, tthvo, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 29, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 18defac into openshift:master May 29, 2025
17 of 18 checks passed
abraham2512 pushed a commit to abraham2512/release that referenced this pull request May 29, 2025

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. rehearsals-ack Signifies that rehearsal jobs have been acknowledged

5 participants