🐛 fix: fully bypass CloudFormation when -skip-cloudformation-creation is set #5999
SAY-5 wants to merge 2 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. |
|
Welcome @SAY-5! |
|
Hi @SAY-5. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. Regular contributors should join the org to skip this step. |
1e59cf2 to cb9841d
|
CLA pending, will link the commit email to the GitHub account shortly to clear EasyCLA. |
|
/ok-to-test |
|
I opened this issue, and I do actually have code for this that you can grab from as needed. I never opened the PR because I've been using it as a local cherry-pick patch and haven't sat down to turn it into a PR. |
|
@raykrueger thanks, pulled the runtime |
|
/retest |
…s set

Resolves issue 5944. ensureStackTags and newUserAccessKey ran unconditionally after the CF creation block, calling CloudFormation:DescribeStacks and IAM access key APIs that fail when the stack is in a different region or was deployed without BootstrapUser.

Move both calls inside the existing guard. When skipping, derive BootstrapAccessKey from the calling session credentials so downstream consumers keep working. Also surface the previously discarded ListAccessKeys error instead of leaving keyOuts nil and panicking with a nil-pointer dereference.

Signed-off-by: SAY-5 <saiasish.cnp@gmail.com>
Signed-off-by: SAY-5 <say.apm35@gmail.com>
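The control flow described in this commit message can be sketched as follows. This is only an illustration under stated assumptions: `bootstrapSteps` and the step name `deriveBootstrapAccessKeyFromSession` are hypothetical, and the real setup code calls AWS rather than returning step names.

```go
package main

import "fmt"

// bootstrapSteps returns, in order, the names of the steps the suite setup
// would run. The first three names mirror functions named in the commit
// message; the function itself and the skip-path step name are hypothetical.
func bootstrapSteps(skipCloudFormationCreation bool) []string {
	var steps []string
	if !skipCloudFormationCreation {
		// Everything that depends on the stack or its bootstrap user
		// lives inside the guard, not after it.
		steps = append(steps,
			"createCloudFormationStack",
			"ensureStackTags",
			"newUserAccessKey",
		)
	} else {
		// Skip path: no CloudFormation or IAM access key calls at all.
		steps = append(steps, "deriveBootstrapAccessKeyFromSession")
	}
	return steps
}

func main() {
	fmt.Println("normal run:", bootstrapSteps(false))
	fmt.Println("skip run:  ", bootstrapSteps(true))
}
```

Keeping the dependent calls inside the same branch as the stack creation means the skip flag cannot leave a stray DescribeStacks or ListAccessKeys call behind.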
Per @raykrueger's branch fix/skip-cloudformation-credential-fallback: detect the bootstrap user at runtime with iamUserExists. The user can be present even when -skip-cloudformation-creation is set (deployed manually, re-run, etc.), so a flag-based fallback is too coarse.

When the user is absent, reuse the calling session and encode credentials via encodeCredentialsFromSession so SessionToken is preserved for STS-backed principals (assumed roles, SSO, instance profiles).

Co-authored-by: Ray Krueger <raykrueger@gmail.com>
Signed-off-by: SAY-5 <saiasish.cnp@gmail.com>
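A rough sketch of the runtime fallback in this commit, not the PR's actual code. The types `sessionCredentials` and `userLookup`, the functions `chooseCredentialSource`, `renderProfile`, and `encodeProfile`, and the INI-style profile layout are all assumptions made for illustration; only the idea (check for the user at runtime, and carry the session token when falling back) comes from the commit message.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// sessionCredentials is a hypothetical stand-in for resolved AWS credentials.
type sessionCredentials struct {
	AccessKeyID     string
	SecretAccessKey string
	SessionToken    string // empty for long-lived IAM user keys
}

// userLookup is a hypothetical stand-in for an IAM existence check such as
// the iamUserExists helper named in the commit message.
type userLookup func(userName string) (bool, error)

// chooseCredentialSource decides at runtime, not from a flag, which
// credentials the suite should use.
func chooseCredentialSource(exists userLookup, userName string) (string, error) {
	ok, err := exists(userName)
	if err != nil {
		return "", fmt.Errorf("checking IAM user %q: %w", userName, err)
	}
	if ok {
		return "bootstrap-user", nil
	}
	return "calling-session", nil
}

// renderProfile writes the credentials as an INI-style profile (assumed
// layout). The session token line is emitted only when a token is present.
func renderProfile(c sessionCredentials, region string) string {
	var b strings.Builder
	b.WriteString("[default]\n")
	fmt.Fprintf(&b, "aws_access_key_id = %s\n", c.AccessKeyID)
	fmt.Fprintf(&b, "aws_secret_access_key = %s\n", c.SecretAccessKey)
	fmt.Fprintf(&b, "region = %s\n", region)
	if c.SessionToken != "" {
		fmt.Fprintf(&b, "aws_session_token = %s\n", c.SessionToken)
	}
	return b.String()
}

// encodeProfile base64-encodes the rendered profile.
func encodeProfile(c sessionCredentials, region string) string {
	return base64.StdEncoding.EncodeToString([]byte(renderProfile(c, region)))
}

func main() {
	// Pretend the bootstrap user was never created.
	absent := func(string) (bool, error) { return false, nil }
	source, err := chooseCredentialSource(absent, "example-bootstrap-user")
	if err != nil {
		panic(err)
	}
	fmt.Println("credential source:", source)

	// Placeholder values only; these are not real credentials.
	creds := sessionCredentials{
		AccessKeyID:     "AKIAEXAMPLE",
		SecretAccessKey: "example-secret",
		SessionToken:    "example-token",
	}
	fmt.Print(renderProfile(creds, "us-west-2"))
	fmt.Println("encoded length:", len(encodeProfile(creds, "us-west-2")))
}
```

Dropping the token line for STS-issued keys would produce a profile that looks valid but is rejected at call time, which is why the sketch treats the token as part of the credential rather than an optional extra.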
4ab808a to 28482fa
|
/test ? |
|
/test pull-cluster-api-provider-aws-e2e-eks |
|
The eks job failed on a 40-minute provision timeout in the upgrade-policy spec (WaitForClusterToProvision), not in the bootstrap path this PR touches; the suite setup and the other 12 specs passed. Looks like an infra flake.
/retest |
|
/test pull-cluster-api-provider-aws-e2e-eks |
|
@SAY-5: The following test failed, say
What type of PR is this?
/kind bug
What this PR does / why we need it:
-skip-cloudformation-creation only guarded the createCloudFormationStack call. Two follow-on calls assumed the stack existed in the current region:
- ensureStackTags (suite.go:155) calls CloudFormation:DescribeStacks and fails with ValidationError: Stack ... does not exist when the stack lives in a different region (CloudFormation is regional).
- newUserAccessKey (aws.go:877) calls IAM:ListAccessKeys for the bootstrap user. If the existing stack was deployed without BootstrapUser.Enable = true, the user does not exist; the error from ListAccessKeys was being silently discarded (keyOuts, _ := ...) and the next loop iteration nil-derefs keyOuts.AccessKeyMetadata.
This PR moves both calls inside the existing if !SkipCloudFormationCreation guard. When skipping, it derives BootstrapAccessKey from the calling session's resolved credentials so downstream consumers (encodeCredentials, BootstrapUserAWSSession, EnsureServiceQuotas, etc.) keep working. It also surfaces the previously discarded ListAccessKeys error so any future failure produces a real diagnostic instead of a nil-pointer panic.
Which issue(s) this PR fixes:
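The discarded-error problem described above can be illustrated with a small self-contained sketch. It does not use the AWS SDK: `listAccessKeysOutput`, `keyLister`, `existingKeyIDs`, and `missingUser` are hypothetical stand-ins, and the user name is a placeholder. The point is only that checking the error before touching the result turns a nil-pointer panic into a readable diagnostic.

```go
package main

import (
	"errors"
	"fmt"
)

// listAccessKeysOutput loosely mirrors the shape of an IAM ListAccessKeys
// response. It is a hypothetical stand-in, not the SDK type.
type listAccessKeysOutput struct {
	AccessKeyMetadata []string
}

// keyLister is a hypothetical stand-in for the IAM client call.
type keyLister func(userName string) (*listAccessKeysOutput, error)

// existingKeyIDs returns the user's access key IDs. Unlike the
// `keyOuts, _ := ...` pattern, it propagates the lookup error, so a missing
// user is reported instead of dereferencing a nil result.
func existingKeyIDs(list keyLister, userName string) ([]string, error) {
	out, err := list(userName)
	if err != nil {
		return nil, fmt.Errorf("listing access keys for %q: %w", userName, err)
	}
	if out == nil {
		// Defensive: treat a nil result without an error as "no keys".
		return nil, nil
	}
	return out.AccessKeyMetadata, nil
}

// errNoSuchUser imitates the kind of error IAM returns for a missing user.
var errNoSuchUser = errors.New("NoSuchEntity: user not found")

// missingUser simulates a lookup for a bootstrap user that was never created.
func missingUser(string) (*listAccessKeysOutput, error) {
	return nil, errNoSuchUser
}

func main() {
	_, err := existingKeyIDs(missingUser, "example-bootstrap-user")
	fmt.Println("error:", err)
}
```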
Fixes #5944
Special notes for your reviewer:
The skip path assumes the calling session uses long-lived IAM credentials (which is the typical setup when the CF stack has been pre-deployed).
iamtypes.AccessKey has no SessionToken field, so STS temporary credentials would lose their token; this matches the behavior of NewAWSSessionWithKey already in the file.
AI Usage:
Claude Code (Sonnet/Opus) assisted with locating the affected call sites and drafting the patch. I reviewed every line before commit, validated the control flow against the issue reporter's repro, and confirmed go build -tags=e2e ./test/e2e/shared/... and go vet pass cleanly.
Checklist:
Release note: