
🐛 Prevent access entries delete+recreate cycle on EKS reconciliation #6007

Open
raykrueger wants to merge 1 commit into kubernetes-sigs:main from raykrueger:fix/6003-access-entries-recreate-cycle


@raykrueger (Contributor) commented May 9, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:

When accessEntries with an unspecified type field are defined in AWSManagedControlPlane, the reconciler detects drift on every cycle: the empty desired Type ("") is compared against "STANDARD", the default type reported by the EKS API. This triggers a continuous delete-and-recreate loop roughly every 7 seconds. Each deletion removes the access policy association, and each recreation races against the next reconcile, so the entry is deleted again before its policy can be re-associated.

This change:

  1. Normalizes an empty Type to AccessEntryTypeStandard before the comparison in updateAccessEntry, so an unspecified type no longer registers as drift.
  2. Skips re-associating access policies that already match the desired scope, avoiding unnecessary API calls.
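To make the first change concrete, here is a minimal Go sketch of normalizing the type before drift detection. It is an illustration under stated assumptions, not the code in pkg/cloud/services/eks/accessentry.go: the type AccessEntryType, the constant value "STANDARD", and the helpers normalizeType and typeDrifted are hypothetical stand-ins, and the sketch assumes the desired type is compared directly against the upper-case value EKS returns.

```go
package main

import "fmt"

// AccessEntryType is a hypothetical stand-in for the type field on an
// accessEntries item; it is not the real CAPA API type.
type AccessEntryType string

// AccessEntryTypeStandard assumes "STANDARD" is the value EKS reports for
// entries created without an explicit type.
const AccessEntryTypeStandard AccessEntryType = "STANDARD"

// normalizeType maps an unspecified (empty) type to STANDARD so that it
// compares equal to what EKS reports back.
func normalizeType(t AccessEntryType) AccessEntryType {
	if t == "" {
		return AccessEntryTypeStandard
	}
	return t
}

// typeDrifted reports whether the desired type differs from the type
// observed on the existing access entry, after normalization.
func typeDrifted(desired AccessEntryType, observed string) bool {
	return string(normalizeType(desired)) != observed
}

func main() {
	var unspecified AccessEntryType // the zero value is ""

	// Naive comparison: "" != "STANDARD", so every reconcile sees drift.
	fmt.Println("naive drift:", string(unspecified) != "STANDARD")

	// Normalized comparison: the unspecified type matches the EKS default.
	fmt.Println("normalized drift:", typeDrifted(unspecified, "STANDARD"))
}
```

Applying the default on the desired side means an explicit STANDARD and an omitted type are treated identically, while a real change such as EC2_LINUX still registers as drift.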

Which issue(s) this PR fixes:
Fixes #6003

Special notes for your reviewer:

The fix touches two functions in pkg/cloud/services/eks/accessentry.go:

  • updateAccessEntry: normalizes an empty type before drift detection
  • reconcileAccessPolicies: skips policy associations that already match; a new helper, policyScopeMatches, compares the desired and existing scopes
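The second change, skipping associations that already match, can be sketched the same way. This is a minimal illustration, not the PR's implementation: AccessScope only mirrors the shape of an EKS access scope (a "cluster" or "namespace" type plus a namespace list), policiesToAssociate is a hypothetical helper, and this version of policyScopeMatches assumes namespace order is irrelevant. It uses the standard slices package, so it needs Go 1.21 or later.

```go
package main

import (
	"fmt"
	"slices"
)

// AccessScope is a hypothetical stand-in for an EKS access scope: Type is
// "cluster" or "namespace", and Namespaces lists the namespaces for a
// namespace-scoped association.
type AccessScope struct {
	Type       string
	Namespaces []string
}

// policyScopeMatches is an illustrative version of the helper named in the
// PR; it compares scope types and namespace sets, ignoring namespace order.
func policyScopeMatches(desired, existing AccessScope) bool {
	if desired.Type != existing.Type {
		return false
	}
	a := slices.Clone(desired.Namespaces)
	b := slices.Clone(existing.Namespaces)
	slices.Sort(a)
	slices.Sort(b)
	return slices.Equal(a, b)
}

// policiesToAssociate (hypothetical) returns only the desired policy ARNs
// whose current association is missing or has a different scope.
func policiesToAssociate(desired, existing map[string]AccessScope) []string {
	var out []string
	for arn, want := range desired {
		if have, ok := existing[arn]; ok && policyScopeMatches(want, have) {
			continue // already associated with the desired scope; skip the API call
		}
		out = append(out, arn)
	}
	slices.Sort(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	admin := "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
	view := "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy"

	desired := map[string]AccessScope{
		admin: {Type: "cluster"},
		view:  {Type: "namespace", Namespaces: []string{"b", "a"}},
	}
	existing := map[string]AccessScope{
		admin: {Type: "cluster"},
		view:  {Type: "namespace", Namespaces: []string{"a"}},
	}

	fmt.Println(policiesToAssociate(desired, existing))
}
```

Only the view policy is returned, because its namespace list changed; the cluster-scoped admin policy is already associated as desired, so no association call would be made for it.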

AI Usage:

This PR benefited from AI assistance (Qwen3.6-27B via Opencode) for:

  • Initial code exploration and understanding the drift detection logic
  • Drafting the PR description and release note
  • Reviewing case-sensitivity of the AccessEntryType.APIValue() comparison

Checklist:

  • squashed commits
  • includes documentation
  • includes AI generated content
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

Fix infinite delete-and-recreate loop for EKS access entries when the type field is left unspecified (defaults to STANDARD).


Signed-off-by: Ray Krueger <raykrueger@gmail.com>
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels May 9, 2026
@k8s-ci-robot k8s-ci-robot requested review from damdo and dlipovetsky May 9, 2026 00:09
@k8s-ci-robot k8s-ci-robot added needs-priority size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 9, 2026
@k8s-ci-robot (Contributor) commented:

Hi @raykrueger. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 9, 2026
@Ankitasw (Member) commented:

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 11, 2026
@raykrueger (Contributor Author) commented:

/retest

@richardcase (Member) commented:

This looks good to me, thanks @raykrueger

@richardcase (Member) commented:

/test ?

@richardcase (Member) commented:

/test pull-cluster-api-provider-aws-e2e-eks

@richardcase (Member) commented:

When the e2e passes feel free to unhold.

/hold

But from my side:

/approve

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 22, 2026
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: richardcase

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 22, 2026
@k8s-ci-robot (Contributor) commented:

@raykrueger: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-cluster-api-provider-aws-e2e-eks | bf223da | link | false | /test pull-cluster-api-provider-aws-e2e-eks |

Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@raykrueger (Contributor Author) commented:

This is failing on the upgrade policy test, so we'll still need #5992.



Development

Successfully merging this pull request may close these issues.

accessEntries reconciliation causes continuous delete+recreate cycle, breaking auth

4 participants