Skip to content

🐛 fix: fully bypass CloudFormation when -skip-cloudformation-creation is set#5999

Open
SAY-5 wants to merge 2 commits into
kubernetes-sigs:mainfrom
SAY-5:fix/skip-cloudformation-fully-5944
Open

🐛 fix: fully bypass CloudFormation when -skip-cloudformation-creation is set#5999
SAY-5 wants to merge 2 commits into
kubernetes-sigs:mainfrom
SAY-5:fix/skip-cloudformation-fully-5944

Conversation

@SAY-5

@SAY-5 SAY-5 commented May 3, 2026

Copy link
Copy Markdown

What type of PR is this?

/kind bug

What this PR does / why we need it:

-skip-cloudformation-creation only guarded the createCloudFormationStack call. Two follow-on calls assumed the stack existed in the current region:

  • ensureStackTags (suite.go:155) calls CloudFormation:DescribeStacks and fails with ValidationError: Stack ... does not exist when the stack lives in a different region (CloudFormation is regional).
  • newUserAccessKey (aws.go:877) calls IAM:ListAccessKeys for the bootstrap user. If the existing stack was deployed without BootstrapUser.Enable = true, the user does not exist; the error from ListAccessKeys was being silently discarded (keyOuts, _ := ...) and the next loop iteration nil-derefs keyOuts.AccessKeyMetadata.

This PR moves both calls inside the existing if !SkipCloudFormationCreation guard. When skipping, it derives BootstrapAccessKey from the calling session's resolved credentials so downstream consumers (encodeCredentials, BootstrapUserAWSSession, EnsureServiceQuotas, etc.) keep working. It also surfaces the previously discarded ListAccessKeys error so any future failure produces a real diagnostic instead of a nil-pointer panic.

Which issue(s) this PR fixes:
Fixes #5944

Special notes for your reviewer:

The skip path assumes the calling session uses long-lived IAM credentials (which is the typical setup when the CF stack has been pre-deployed). iamtypes.AccessKey has no SessionToken field, so STS temporary credentials would lose their token; this matches the behavior of NewAWSSessionWithKey already in the file.

AI Usage:

Claude Code (Sonnet/Opus) assisted with locating the affected call sites and drafting the patch. I reviewed every line before commit, validated the control flow against the issue reporter's repro, and confirmed go build -tags=e2e ./test/e2e/shared/... and go vet pass cleanly.

Checklist:

  • squashed commits
  • includes documentation
  • includes AI generated content
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

Fix `-skip-cloudformation-creation` so it also bypasses the CloudFormation `DescribeStacks` tag check and the bootstrap user access-key creation, allowing e2e tests to run against a pre-deployed stack in a different region or one without `BootstrapUser` enabled. Also surface a previously discarded `ListAccessKeys` error that masked the real failure as a nil-pointer panic.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. kind/bug Categorizes issue or PR as related to a bug. labels May 3, 2026
@linux-foundation-easycla

linux-foundation-easycla Bot commented May 3, 2026

Copy link
Copy Markdown

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: SAY-5 / name: Sai Asish Y (1ff67fe, 28482fa)
  • ✅ login: raykrueger / name: Ray Krueger (28482fa)

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign neolit123 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from dlipovetsky May 3, 2026 06:56
@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label May 3, 2026
@k8s-ci-robot k8s-ci-robot requested a review from richardcase May 3, 2026 06:56
@k8s-ci-robot

Copy link
Copy Markdown
Contributor

Welcome @SAY-5!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-aws 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-aws has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 3, 2026
@k8s-ci-robot

Copy link
Copy Markdown
Contributor

Hi @SAY-5. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@SAY-5 SAY-5 force-pushed the fix/skip-cloudformation-fully-5944 branch from 1e59cf2 to cb9841d Compare May 3, 2026 07:32
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label May 3, 2026
@SAY-5

SAY-5 commented May 3, 2026

Copy link
Copy Markdown
Author

CLA pending, will link the commit email to the GitHub account shortly to clear EasyCLA.

@richardcase

Copy link
Copy Markdown
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 6, 2026
@raykrueger

Copy link
Copy Markdown
Contributor

I opened this issue, and I do actually have code for this you can grab from as neeed.
https://github.com/raykrueger/cluster-api-provider-aws/tree/fix/skip-cloudformation-credential-fallback

I never opened the PR because I've been using it as a local cherry-pick patch, and haven't sat down to make it into a PR

@SAY-5

SAY-5 commented May 6, 2026

Copy link
Copy Markdown
Author

@raykrueger thanks, pulled the runtime iamUserExists probe and the SessionToken-preserving credential encode from your branch (credited in the commit message). Switched my flag-based fallback over to the runtime check so the path also works when -skip-cloudformation-creation is set but the bootstrap user happens to exist.

@richardcase

Copy link
Copy Markdown
Member

/retest

SAY-5 and others added 2 commits May 26, 2026 22:28
…s set

Resolves issue 5944.

ensureStackTags and newUserAccessKey ran unconditionally after the CF
creation block, calling CloudFormation:DescribeStacks and IAM access
key APIs that fail when the stack is in a different region or was
deployed without BootstrapUser. Move both calls inside the existing
guard. When skipping, derive BootstrapAccessKey from the calling
session credentials so downstream consumers keep working.

Also surface the previously discarded ListAccessKeys error instead of
leaving keyOuts nil and panicking with a nil-pointer dereference.

Signed-off-by: SAY-5 <saiasish.cnp@gmail.com>
Signed-off-by: SAY-5 <say.apm35@gmail.com>
Per @raykrueger's branch fix/skip-cloudformation-credential-fallback:
detect the bootstrap user at runtime with iamUserExists. The user can be
present even when -skip-cloudformation-creation is set (deployed manually,
re-run, etc.), so a flag-based fallback is too coarse. When the user is
absent, reuse the calling session and encode credentials via
encodeCredentialsFromSession so SessionToken is preserved for STS-backed
principals (assumed roles, SSO, instance profiles).

Co-authored-by: Ray Krueger <raykrueger@gmail.com>
Signed-off-by: SAY-5 <saiasish.cnp@gmail.com>
@SAY-5 SAY-5 force-pushed the fix/skip-cloudformation-fully-5944 branch from 4ab808a to 28482fa Compare May 27, 2026 05:29
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 27, 2026
@richardcase

Copy link
Copy Markdown
Member

/test ?

@richardcase

Copy link
Copy Markdown
Member

/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e

@SAY-5

SAY-5 commented Jun 9, 2026

Copy link
Copy Markdown
Author

The eks job failed on a 40 minute provision timeout in the upgrade-policy spec (WaitForClusterToProvision), not in the bootstrap path this PR touches; the suite setup and the other 12 specs passed. Looks like an infra flake.

/retest

@damdo

damdo commented Jun 11, 2026

Copy link
Copy Markdown
Member

/test pull-cluster-api-provider-aws-e2e-eks

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

@SAY-5: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-aws-e2e-eks 28482fa link false /test pull-cluster-api-provider-aws-e2e-eks

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-priority ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

skip-cloudformation-creation flag does not fully bypass CloudFormation stack dependency

5 participants