Skip to content

OCPBUGS-18027: explain use of multiple AWS security groups #89569

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: enterprise-4.15
Choose a base branch
from

Conversation

jeana-redhat
Copy link
Contributor

@jeana-redhat jeana-redhat commented Mar 4, 2025

Version(s):
4.12-4.15

Issue:
OCPBUGS-18027

Link to docs preview:
Sample YAML for a compute machine set custom resource on AWS:

QE review:

  • QE has approved this change.

Additional information:
Also fixes mixed up callouts

@jeana-redhat jeana-redhat added this to the Continuous Release milestone Mar 4, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 4, 2025
@openshift-ci-robot
Copy link

@jeana-redhat: This pull request references Jira Issue OCPBUGS-18027, which is invalid:

  • expected Jira Issue OCPBUGS-18027 to depend on a bug in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Version(s):
4.12-4.15

Issue:
OCPBUGS-18027

Link to docs preview:

QE review:

  • QE has approved this change.

Additional information:
Also fixes mixed up callouts

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 4, 2025
@jeana-redhat
Copy link
Contributor Author

jeana-redhat commented Mar 4, 2025

@damdo do you remember https://issues.redhat.com/browse/OCPBUGS-11524 from 2023? trying to get some of these bugs outta here this sprint 😅

(https://issues.redhat.com/browse/OCPBUGS-18027 is the corresponding docs bug)

Edit: something is messed up with the edge annotations here and is making the build fail. But the new content should be reviewable and I can fix the numbering tomorrow :)

Comment on lines +199 to +200
<7> Optional: Specify one or more additional security groups using the format shown.
When you specify multiple security groups, the rules are merged and applied to the instance as described in link:https://docs.aws.amazon.com/vpc/latest/userguide/security-group-rules.html[{aws-short} documentation about security group rules].
Refer to {aws-short} documentation for guidance on quotas and other limitations.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved some other things around to fix numbering, but this one is the new content

Comment on lines +131 to +134
- filters: <7>
- name: tag:Name
values:
- <optional_security_group>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New sample YAML

@jeana-redhat jeana-redhat force-pushed the OCPBUGS-18027-aws-mapi-multi-sec-grps branch 2 times, most recently from d585d90 to 99d2f19 Compare March 4, 2025 22:26
Copy link
Member

@damdo damdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jeana-redhat thanks for putting this together 👍

I had a look at the bug you linked.
Wouldn't this yield scenario 2 from the testing @sunzhaohua2 conducted here ?

If there are multiple groups and both have the kubernetes.io/cluster/<infra-id> tag, this is considered an error state and the CCM will not reconcile the security group rules

@sunzhaohua2 could you provide guidance on this?

@jeana-redhat
Copy link
Contributor Author

Hey @jeana-redhat thanks for putting this together 👍

I had a look at the bug you linked. Wouldn't this yield scenario 2 from the testing @sunzhaohua2 conducted here ?

If there are multiple groups and both have the kubernetes.io/cluster/<infra-id> tag, this is considered an error state and the CCM will not reconcile the security group rules

@sunzhaohua2 could you provide guidance on this?

Yeah, I am probably not understanding where that tag plays in. I thought by having the value <optional_security_group> different than <infrastructure_id>-worker-sg, that got around the issue, but I likely am just misunderstanding something about the config here :)

I see kubernetes.io/cluster/<infrastructure_id> is down in providerSpec.value.tags, but it's still impacting the SG setting I guess? Snipped out spec bits:

providerSpec:
        value:
          [...]
          securityGroups:
            - filters:
                - name: tag:Name
                  values:
                    - <infrastructure_id>-worker-sg
            - filters: 
                - name: tag:Name
                  values:
                    - <optional_security_group>
          [...]
          tags:
            - name: kubernetes.io/cluster/<infrastructure_id>
              value: owned
            - name: <custom_tag_name> 
              value: <custom_tag_value>

So do I need to somehow relate these things to each other in a way that only one SG gets the kubernetes.io/cluster/<infrastructure_id> tag? Not sure how exactly to make that happen 🤔

@jeana-redhat jeana-redhat force-pushed the OCPBUGS-18027-aws-mapi-multi-sec-grps branch from 99d2f19 to 1b9bfb7 Compare March 5, 2025 14:41
publicIp: true
endif::edge[]
tags:
- name: kubernetes.io/cluster/<infrastructure_id> <1>
tags: <9>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the callout for tags to the top level to make it easier to talk about this section more generally

ifndef::edge[]
<8> Specify the infrastructure ID and zone.
endif::edge[]
ifdef::edge[]
<8> The ID of the public subnet that you created in AWS {zone-type}. You created this public subnet ID when you finished the procedure for "Creating a subnet in an AWS zone".
endif::edge[]
<9> Optional: Specify custom tag data for your cluster in addition to the existing `kubernetes.io/cluster/<infrastructure_id>` tag.
Copy link
Contributor Author

@jeana-redhat jeana-redhat Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this from

Optional: Specify custom tag data for your cluster.

to

Optional: Specify custom tag data for your cluster in addition to the existing `kubernetes.io/cluster/<infrastructure_id>` tag.

because I moved the callout from the custom tag name line to the tags stanza

@jeana-redhat
Copy link
Contributor Author

Refreshed the PR with fixes to the callouts so that it will build correctly

Copy link

openshift-ci bot commented Mar 5, 2025

@jeana-redhat: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@sunzhaohua2
Copy link

@jeana-redhat @damdo I will check this tomorrow

@jeana-redhat
Copy link
Contributor Author

@jeana-redhat @damdo I will check this tomorrow

Thanks! No rush :)

@sunzhaohua2
Copy link

Hi @jeana-redhat I checked this bug and I tested again on a 4.15 cluster, I can confirm Joel mentioned here are correct.
When you have multiple groups:

  • If there are multiple groups and neither has the Kubernetes.io/cluster/<infra-id> tag, this is considered an error state and the CCM will not reconcile the security group rules
  • If there are multiple groups and both have the Kubernetes.io/cluster/<infra-id> tag, this is considered an error state and the CCM will not reconcile the security group rules
  • If there are multiple groups and a single group has the Kubernetes.io/cluster/<infra-id> tag, this is considered correct and the tagged group is reconciled with the security group rules

But regarding the traffic issue mentioned by Dam, I am not sure whether we should mention that multiple SGs are supported in the doc, even though the machine creation was successful and CCM did not report any errors. @damdo, do you have any concerns about this?

And for the new versions, by default there are 2 SGs with tags in machineset, the tag is not Kubernetes.io/cluster/<infra-id> , is sigs.k8s.io/cluster-api-provider-aws/cluster/<infra-id>, I am not sure whether the conclusion mentioned by Joel above is still valid. I need some testing.

4.19.0-0.nightly-2025-03-05-160850 by default:

$ oc edit machineset zhsun-aws37-nrfd5-worker-us-east-2c
          securityGroups:
          - filters:
            - name: tag:Name
              values:
              - zhsun-aws37-nrfd5-node
          - filters:
            - name: tag:Name
              values:
              - zhsun-aws37-nrfd5-lb

for 4.15.0-0.nightly-2025-03-05-202732 by default:

$ oc edit machineset zhsun-aws15-5xnl9-worker-us-east-2b
          securityGroups:
          - filters:
            - name: tag:Name
              values:
              - zhsun-aws15-5xnl9-worker-sg

@jeana-redhat
Copy link
Contributor Author

Hey, thank you!

Can you post a spec that shows where these tags are?

And for the new versions, by default there are 2 SGs with tags in machineset, the tag is not Kubernetes.io/cluster/<infra-id> , is sigs.k8s.io/cluster-api-provider-aws/cluster/<infra-id>, I am not sure whether the conclusion mentioned by Joel above is still valid. I need some testing.

I ask because I see one in providerSpec.value.tags in the sample spec, but I am not sure how it relates to the SG fields.

@sunzhaohua2
Copy link

Hey, thank you!

Can you post a spec that shows where these tags are?

And for the new versions, by default there are 2 SGs with tags in machineset, the tag is not Kubernetes.io/cluster/<infra-id> , is sigs.k8s.io/cluster-api-provider-aws/cluster/<infra-id>, I am not sure whether the conclusion mentioned by Joel above is still valid. I need some testing.

I ask because I see one in providerSpec.value.tags in the sample spec, but I am not sure how it relates to the SG fields.

Hi @jeana-redhat I was checked from aws console, for the 2 SGs in machineset, I checked again zhsun-aws37-nrfd5-lb is lb sg, it has tag Kubernetes.io/cluster/<infra-id>
Screenshot 2025-03-11 at 15 40 43
last week I checked the wrong sg zhsun-aws37-nrfd5-node, it doesn't have tag Kubernetes.io/cluster/<infra-id>
Screenshot 2025-03-11 at 15 44 10

And I tested again on 4.19, the results are same with before. For this issue If there are multiple groups and both have the kubernetes.io/cluster/<infra-id> tag, this is considered an error state and the CCM will not reconcile the security group rules maybe we need to add a note to remind users not to add tag Kubernetes.io/cluster/<infra-id> when creating SG.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch/enterprise-4.12 branch/enterprise-4.13 branch/enterprise-4.14 branch/enterprise-4.15 jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants