skip-cloudformation-creation flag does not fully bypass CloudFormation stack dependency #5944

@raykrueger

/kind bug

What steps did you take and what happened:

Deploy the CAPA CloudFormation bootstrap stack in us-east-1, then run e2e tests in a different region with the skip flag:

AWS_REGION=us-east-2 make test-e2e \
  GINKGO_FOCUS="..." \
  E2E_ARGS="-skip-cloudformation-creation -skip-cloudformation-deletion"

The suite panics/fails in two places not guarded by -skip-cloudformation-creation:

1. ensureStackTags (test/e2e/shared/suite.go:155)

Calls CloudFormation:DescribeStacks unconditionally after the creation block. Fails with:

ValidationError: Stack with id cluster-api-provider-aws-sigs-k8s-io does not exist

2. newUserAccessKey (test/e2e/shared/aws.go:883)

This is related, but different. Our existing stack was deployed without the BootstrapUser, so the IAM:ListAccessKeys call fails. That error is silently discarded (keyOuts, _ := ...), which leaves keyOuts nil and turns an API error that could have been handled into a nil-pointer panic:

runtime error: invalid memory address or nil pointer dereference
    test/e2e/shared/aws.go:883

What did you expect to happen:

-skip-cloudformation-creation should bypass all operations that assume the CF stack exists in the current region, not just the stack creation call itself.

Anything else you would like to add:

CloudFormation:DescribeStacks is regional. A stack deployed in us-east-1 is not visible in us-east-2. IAM is global, so a bootstrap user created by the stack should exist cross-region, but the flag guards only the creation step, leaving two unconditional calls that require the stack to exist locally. Additionally, if the stack was deployed without BootstrapUser.Enable = true, the bootstrap user does not exist at all, making the newUserAccessKey failure inevitable regardless of region.

The silent error discard in newUserAccessKey (keyOuts, _ := ...) also independently deserves fixing. It turns a real API error into a confusing nil-pointer panic with no context.
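A minimal sketch of what checking that error could look like. This is not the code in test/e2e/shared/aws.go: the types, the accessKeyLister seam, and existingAccessKeyIDs are hypothetical stand-ins so the example compiles without the AWS SDK.

```go
package main

import (
	"errors"
	"fmt"
)

// accessKeyMetadata and listAccessKeysOutput are simplified, hypothetical
// stand-ins for the IAM SDK response types; only the fields used here exist.
type accessKeyMetadata struct {
	AccessKeyID string
}

type listAccessKeysOutput struct {
	AccessKeyMetadata []accessKeyMetadata
}

// accessKeyLister is a hypothetical seam around the IAM:ListAccessKeys call.
type accessKeyLister func(userName string) (*listAccessKeysOutput, error)

// errNilOutput guards against a lister that returns neither a result nor an error.
var errNilOutput = errors.New("ListAccessKeys returned no output and no error")

// existingAccessKeyIDs returns the user's access key IDs, or an error that
// names the user, instead of letting a nil result be dereferenced later.
func existingAccessKeyIDs(list accessKeyLister, userName string) ([]string, error) {
	out, err := list(userName)
	if err != nil {
		return nil, fmt.Errorf("listing access keys for user %q: %w", userName, err)
	}
	if out == nil {
		return nil, errNilOutput
	}
	ids := make([]string, 0, len(out.AccessKeyMetadata))
	for _, key := range out.AccessKeyMetadata {
		ids = append(ids, key.AccessKeyID)
	}
	return ids, nil
}
```

With a check like this, a missing bootstrap user would surface as an error naming the user rather than as a panic at aws.go:883.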

Suggested fix: guard ensureStackTags and newUserAccessKey inside the existing if !e2eCtx.Settings.SkipCloudFormationCreation block, falling back to the caller's session credentials when skipping.
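Roughly, the guard could take the following shape. This is a sketch, not the suite's actual code: settings mirrors only the one flag, and stackStep and runStackSteps are hypothetical names standing in for calls such as ensureStackTags and newUserAccessKey.

```go
package main

import "fmt"

// settings mirrors only the flag discussed here; the real e2e settings
// struct carries many more fields.
type settings struct {
	SkipCloudFormationCreation bool
}

// stackStep is a hypothetical stand-in for any step that assumes the
// bootstrap stack exists in the current region.
type stackStep func(stackName string) error

// runStackSteps runs every stack-dependent step unless the skip flag is set.
// The boolean result reports whether the caller should fall back to its own
// session credentials because the steps were skipped.
func runStackSteps(s settings, stackName string, steps ...stackStep) (bool, error) {
	if s.SkipCloudFormationCreation {
		return true, nil
	}
	for i, step := range steps {
		if err := step(stackName); err != nil {
			return false, fmt.Errorf("stack step %d for %q: %w", i, stackName, err)
		}
	}
	return false, nil
}
```

The point of returning the fallback signal is that skipping is not just "do nothing": something still has to supply credentials for the rest of the suite.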

The ListAccessKeys nil pointer is more complicated. My CloudFormation stack was deployed without the BootstrapUser. This is a shared environment, so creating the user is not something I can let the e2e tests do dynamically.

My local workaround was to use &creds and &iamtypes.AccessKey to populate the e2eCtx.Environment.BootstrapAccessKey.
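For illustration only, that workaround might look something like the sketch below. It is not the exact local change: sessionCredentials and accessKey are hypothetical, trimmed-down stand-ins for the caller's resolved credentials and for iamtypes.AccessKey, and accessKeyFromSession is an invented helper name.

```go
package main

import "errors"

// sessionCredentials is a hypothetical, trimmed-down view of the caller's
// resolved AWS credentials.
type sessionCredentials struct {
	AccessKeyID     string
	SecretAccessKey string
	SessionToken    string
}

// accessKey is a hypothetical stand-in for iamtypes.AccessKey, keeping only
// the two fields this workaround needs.
type accessKey struct {
	AccessKeyId     *string
	SecretAccessKey *string
}

// accessKeyFromSession copies the caller's key ID and secret into an
// access-key value that could be assigned to a field such as
// e2eCtx.Environment.BootstrapAccessKey. It does not carry the session token.
func accessKeyFromSession(creds sessionCredentials) (*accessKey, error) {
	if creds.AccessKeyID == "" || creds.SecretAccessKey == "" {
		return nil, errors.New("session credentials are missing an access key ID or secret")
	}
	id, secret := creds.AccessKeyID, creds.SecretAccessKey
	return &accessKey{AccessKeyId: &id, SecretAccessKey: &secret}, nil
}
```

One caveat with this approach: the access-key shape in this sketch has no field for a session token, so temporary credentials would need the token passed along separately.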

Environment:

  • Cluster-api-provider-aws version: main
  • Kubernetes version: v1.32.0
  • OS: macOS

I fully admit to, and apologize for, violating the new AI Policy and having Claude open this Issue. I didn't see the policy till just now, an hour after opening this issue. I do stand by the issue. I had to work around this with local hacks in order to run the e2e tests. -Ray

Labels: kind/bug, needs-priority, needs-triage