feat(ci): clean up stale stacks with global vitest setup hook by Hweinstock · Pull Request #1499 · aws/agentcore-cli

Hweinstock · 2026-06-09T23:35:13Z

Problem

Stacks are being orphaned by e2e tests, resulting in 1700+ deployed CloudFormation stacks. This wastes resources and risks hitting resource limits.
#1493

Solution

Add a pre-test hook that cleans up stacks older than 3 hours.
The hook is resilient to failures and throttling, and does not fail tests if it throws.
Create a utils folder for this functionality so that it can be reused.
Move deleteCredentialProvider into utils.

Note: we use a global hook instead of a beforeAll in e2e-helper so that cleanup runs ONCE per e2e test invocation, rather than ONCE per test suite, to reduce noisy API calls.
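
A minimal sketch of the best-effort shape described above. The names (`StackSummary`, `selectStaleStacks`, `runBestEffort`) are hypothetical and are not taken from the PR's code. It shows the age and prefix filter, plus a wrapper that logs a failure instead of propagating it, so a cleanup error cannot fail the test run.

```typescript
// Hypothetical shape; the real code would use the AWS SDK's stack summary type.
interface StackSummary {
  name: string;
  createdAt: Date;
}

const STALE_AFTER_MS = 3 * 60 * 60 * 1000; // 3 hours, per the PR description

// Keep only stacks with the e2e prefix that were created before the cutoff.
function selectStaleStacks(
  stacks: StackSummary[],
  prefix: string,
  now: Date = new Date(),
): StackSummary[] {
  const cutoff = now.getTime() - STALE_AFTER_MS;
  return stacks.filter(s => s.name.startsWith(prefix) && s.createdAt.getTime() < cutoff);
}

// Run one cleanup step; report failure through the return value rather than throwing.
async function runBestEffort(label: string, step: () => Promise<void>): Promise<boolean> {
  try {
    await step();
    return true;
  } catch (err) {
    console.warn(`[global-setup]:${label} failed, continuing: ${String(err)}`);
    return false;
  }
}
```

In Vitest, helpers like these would be called from the file named by the `globalSetup` config option, which runs once per `vitest` invocation rather than once per test file.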

Testing

Ran this with the retry flag set very high to delete most of the existing old stacks.

Also verified this doesn't affect subsequent test runs by running with a single file:

Running: e2e-tests/byo-custom-jwt.test.ts

 RUN  v4.1.8 **/agentcore-cli

[global-setup]:starting global setup in region: us-east-1
[global-setup]:cleaning up stale stacks...
[global-setup:stack-cleanup]:listing stacks with cutoff=2026-06-10T10:21:01.520Z, prefix=AgentCore-E2e
(node:1515769) Warning: NodeVersionSupportWarning: The AWS SDK for JavaScript (v3)
versions published after the first week of January 2027
will require node >=22. You are running node v20.20.2.

To continue receiving updates to AWS services, bug fixes,
and security updates please upgrade to node >=22.

More information can be found at: https://a.co/c895JFp
(Use `node --trace-warnings ...` to show where the warning was created)
[global-setup:stack-cleanup]:found 0 stacks
[global-setup:stack-cleanup]:no stacks found!
[global-setup]:done cleaning up stacks after 0.108 seconds
[global-setup]:cleaning up stale credential providers...
(node:1515868) Warning: NodeVersionSupportWarning: The AWS SDK for JavaScript (v3)
versions published after the first week of January 2027
will require node >=22. You are running node v20.20.2.

To continue receiving updates to AWS services, bug fixes,
and security updates please upgrade to node >=22.

More information can be found at: https://a.co/c895JFp
(Use `node --trace-warnings ...` to show where the warning was created)

agentcore-devx-automation · 2026-06-09T23:35:57Z

Claude Security Review: no high-confidence findings. (run)

github-actions · 2026-06-09T23:36:07Z

Package Tarball

aws-agentcore-0.18.0.tgz

How to install

gh release download pr-1499-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.18.0.tgz

agentcore-cli-automation

Nice safety net for the orphaned-stack problem. A few things worth addressing before merge — main concerns are around the production behavior diverging from what was actually tested manually, and unbounded parallelism against the CloudFormation API.

agentcore-cli-automation · 2026-06-09T23:38:19Z

+
+  const cfn = new CloudFormationClient({ region: REGION, maxAttempts: 10 });
+  try {
+    await cleanUpOldStacks(cfn);


The PR description says this was tested with the retry flag set very high to clean up the existing backlog, but the production call on line 118 invokes cleanUpOldStacks(cfn) with no options — so options?.retries is undefined and the retry block on line 100 never fires. As written, the hook is effectively single-shot.

If single-shot is intentional, that's fine but worth a comment so it doesn't drift. If you want retries in CI:

Pass an explicit retries (e.g. cleanUpOldStacks(cfn, { retries: 2 })), or

Default retries inside cleanUpOldStacks so the production path matches what was tested.

Either way the manually-tested configuration isn't what will run in CI on every e2e invocation, which is a gap worth closing.

This is intentional. If we retry, it will loop and take significantly longer. This one-shot run is a best-effort cleanup attempt; if an operator needs to, they can raise the retries on local runs to make the cleanup more powerful.
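
To illustrate the behavior discussed in this thread, here is a sketch of an optional-retries cleanup loop. The names (`CleanupOptions`, `cleanUpWithRetries`, `attempt`) are hypothetical, and the real `cleanUpOldStacks` may be structured differently (the review mentions a recursive retry). The point is only that omitting `retries` yields exactly one pass.

```typescript
interface CleanupOptions {
  retries?: number; // hypothetical option; undefined means single-shot
}

// `attempt` performs one cleanup pass and resolves to the number of stacks still left.
async function cleanUpWithRetries(
  attempt: () => Promise<number>,
  options?: CleanupOptions,
): Promise<{ passes: number; remaining: number }> {
  // One initial pass plus however many retries were requested (default: none).
  const maxPasses = 1 + (options?.retries ?? 0);
  let passes = 0;
  let remaining = Infinity;
  while (passes < maxPasses && remaining > 0) {
    remaining = await attempt();
    passes += 1;
  }
  return { passes, remaining };
}
```

Under these assumptions, CI would call `cleanUpWithRetries(pass)` for a single best-effort pass, while an operator draining a backlog locally could pass a larger `retries` value.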

agentcore-cli-automation · 2026-06-09T23:38:19Z

+    const names = stacks.map(s => s.StackName!);
+
+    logger.info(`deleting ${names.length} stacks with names=${names.join(',')}`);
+    const results = await Promise.allSettled(names.map(name => deleteStackAndVerify(client, name)));


With no maxStacksDeleted cap passed from the production call site (line 118), listStacks returns every matching stack, and then Promise.allSettled(names.map(...)) (line 96) fans out a DeleteStackCommand plus a waitUntilStackDeleteComplete polling loop for every single one in parallel. Given the PR description mentions a 1700+ stack backlog, that's potentially 1700 concurrent waiters each calling DescribeStacks every 15s — this will hammer the CFN API and almost certainly hit throttling that even maxAttempts: 10 won't absorb cleanly. It will also drag out the e2e setup time meaningfully.

A couple of options:

Set a sensible default maxStacksDeleted (e.g. 50) so each CI run nibbles at the backlog instead of trying to drain it.

Process in batches of N concurrent deletes (e.g. with p-limit or a simple chunked loop) instead of unbounded Promise.allSettled.

Both — cap total per run and limit concurrency.

Option 3 gives the best behavior: bounded blast radius on the API and bounded time impact on e2e setup.

It does get throttled; this is what the retries were for. However, in practice, if this runs on every invocation, stacks shouldn't build up to the point where we get throttled.
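
For reference, the "cap total per run and limit concurrency" option from this thread could be sketched as a chunked loop. This is not the merged implementation: `deleteInBatches` and its default values are hypothetical, and `deleteOne` stands in for the PR's `deleteStackAndVerify`.

```typescript
// Delete at most `maxStacks` names, `concurrency` at a time.
async function deleteInBatches(
  names: string[],
  deleteOne: (name: string) => Promise<boolean>, // hypothetical stand-in
  maxStacks = 50,
  concurrency = 5,
): Promise<{ deleted: string[]; failed: string[] }> {
  const capped = names.slice(0, maxStacks);
  const deleted: string[] = [];
  const failed: string[] = [];

  for (let i = 0; i < capped.length; i += concurrency) {
    const batch = capped.slice(i, i + concurrency);
    // allSettled keeps one rejected delete from aborting the rest of the batch.
    const results = await Promise.allSettled(batch.map(name => deleteOne(name)));
    batch.forEach((name, j) => {
      const result = results[j];
      const ok = result !== undefined && result.status === 'fulfilled' && result.value;
      (ok ? deleted : failed).push(name);
    });
  }
  return { deleted, failed };
}
```

Each batch finishes before the next one starts, so at most `concurrency` deletes (and their pollers) are in flight at once, and anything beyond `maxStacks` is left for a later run.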

agentcore-cli-automation · 2026-06-09T23:38:19Z

+  const startTime = Date.now();
+  try {
+    const result = await waitUntilStackDeleteComplete(
+      { client: cfn, maxWaitTime: 60 * 3, minDelay: 15 },


maxWaitTime: 60 * 3 is 180 seconds. AgentCore stacks contain ECR repos, IAM roles, log groups, CodeBuild projects, etc. Stack deletion under throttling (which is likely given the unbounded parallelism above) will frequently exceed 3 minutes, especially when many stacks are deleting concurrently and each waiter is also being throttled on its DescribeStacks polls.

When the waiter times out, deleteStackAndVerify returns false even though the underlying DeleteStackCommand was accepted by CFN — so the stack will still get deleted asynchronously, but this hook's accounting (deleted X of Y) and the recursive retry decision will be misled.

Suggestions:

Bump maxWaitTime to something like 60 * 10 or 60 * 15.

Or: skip the waiter entirely. The point of this hook is to issue DeleteStack calls; CFN will finish them asynchronously and the next CI run will clean up anything that didn't finish. That also dramatically reduces API call volume from this hook.

Experimentally verified that deletion took 30-45 seconds for existing stacks. If a stack takes longer than 3 minutes, we skip it to avoid bloating the e2e test run with unnecessary cleanup time.
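
The trade-off in this thread (a fixed time budget per stack, after which the stack is skipped) can be sketched with a simplified poller. This is a stand-in for illustration, not the AWS SDK's `waitUntilStackDeleteComplete`, whose internal backoff differs; `waitForDelete`, `getState`, and the injectable `sleep` are hypothetical.

```typescript
type StackState = 'DELETE_IN_PROGRESS' | 'DELETE_COMPLETE' | 'DELETE_FAILED';

// Poll `getState` until the stack is gone or the time budget runs out.
// Returns false on timeout or failure instead of throwing, so callers can skip the stack.
// Only sleep time is counted against the budget; time spent in `getState` is ignored.
async function waitForDelete(
  getState: () => Promise<StackState>,
  maxWaitSeconds = 60 * 3,
  delaySeconds = 15,
  sleep: (ms: number) => Promise<void> = ms =>
    new Promise<void>(resolve => setTimeout(resolve, ms)),
): Promise<boolean> {
  let waited = 0;
  while (true) {
    const state = await getState();
    if (state === 'DELETE_COMPLETE') return true;
    if (state === 'DELETE_FAILED') return false;
    if (waited + delaySeconds > maxWaitSeconds) return false; // budget exhausted
    await sleep(delaySeconds * 1000);
    waited += delaySeconds;
  }
}
```

A `false` result here only means "not confirmed within the budget". As the review notes, an accepted delete request continues asynchronously in CloudFormation.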

agentcore-devx-automation · 2026-06-10T02:05:26Z

Claude Security Review: no high-confidence findings. (run)

agentcore-devx-automation · 2026-06-10T13:16:21Z

Claude Security Review: no high-confidence findings. (run)

agentcore-devx-automation · 2026-06-10T13:26:28Z

Claude Security Review: no high-confidence findings. (run)

aidandaly24 · 2026-06-10T13:52:18Z

+    bedrockCPClient.destroy();
+  }
+
+  logger.info(`setup finished in ${Date.now() - startTime / 1000} seconds`);


Operator precedence bug in the duration: / binds tighter than -, so this evaluates as Date.now() - (startTime / 1000). That subtracts only startTime / 1000 (~1.78 billion) from a millisecond timestamp of ~1.78 trillion, so it logs a nonsense duration (~1.78 trillion "seconds"). Should be:

logger.info(`setup finished in ${(Date.now() - startTime) / 1000} seconds`);

Log-only (no functional impact), but the two timing logs just above (lines 34 and the stack-cleanup one) already parenthesize correctly, so this one reads inconsistently.

good catch, lemme fix this.
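
A small worked example of the precedence issue, using fixed timestamps (chosen to be roughly June 2026 in epoch milliseconds) so the arithmetic is reproducible. The variable names are illustrative only.

```typescript
// Fixed timestamps (ms since epoch) so the arithmetic is reproducible.
const startTime = 1_781_000_000_000;
const now = startTime + 2_500; // 2.5 seconds later

const buggy = now - startTime / 1000;   // parsed as now - (startTime / 1000)
const fixed = (now - startTime) / 1000; // elapsed ms converted to seconds

console.log(buggy); // 1779219002500
console.log(fixed); // 2.5
```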

agentcore-devx-automation · 2026-06-10T14:07:08Z

Claude Security Review: no high-confidence findings. (run)

feat(ci): clean up stale stacks with global vest setup hook

a650bda

Hweinstock had a problem deploying to e2e-testing June 9, 2026 23:35 — with GitHub Actions Error

github-actions Bot added size/m PR size: M agentcore-harness-reviewing AgentCore Harness review in progress labels Jun 9, 2026

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 9, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 9, 2026

agentcore-cli-automation suggested changes Jun 9, 2026

github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 9, 2026

feat(e2e): pull out utils into a folder

4e416cc

github-actions Bot removed the size/m PR size: M label Jun 10, 2026

Hweinstock had a problem deploying to e2e-testing June 10, 2026 02:05 — with GitHub Actions Error

github-actions Bot added the size/m PR size: M label Jun 10, 2026

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 10, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 10, 2026

Hweinstock changed the title ~~feat(ci): clean up stale stacks with global vest setup hook~~ feat(ci): clean up stale stacks with global vitest setup hook Jun 10, 2026

refactor(e2e): inject behavior for credential provider cleanup

7d7842a

github-actions Bot removed the size/m PR size: M label Jun 10, 2026

Hweinstock had a problem deploying to e2e-testing June 10, 2026 13:15 — with GitHub Actions Error

github-actions Bot added the size/m PR size: M label Jun 10, 2026

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 10, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 10, 2026

feat(e2e): log total cleanup time

ddb8455

github-actions Bot added size/m PR size: M and removed size/m PR size: M labels Jun 10, 2026

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 10, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 10, 2026

Hweinstock marked this pull request as ready for review June 10, 2026 13:39

Hweinstock requested a review from a team June 10, 2026 13:39

Hweinstock had a problem deploying to e2e-testing June 10, 2026 13:46 — with GitHub Actions Error

aidandaly24 reviewed Jun 10, 2026

fix(e2e): ensure seconds logged for setup are accurate

f010428

github-actions Bot added size/m PR size: M and removed size/m PR size: M labels Jun 10, 2026

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 10, 2026

aidandaly24 approved these changes Jun 10, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 10, 2026

Hweinstock had a problem deploying to e2e-testing June 10, 2026 14:16 — with GitHub Actions Error

Hweinstock merged commit 9966e9d into aws:main Jun 10, 2026
29 of 30 checks passed

Hweinstock deleted the feat/avoid-stale-stacks branch June 10, 2026 15:13
