
fix(smoke): treat UPDATE_ROLLBACK_COMPLETE as needs-recreate #329

Merged
chrisns merged 1 commit into main from fix/smoke-nuke-update-rollback-complete on May 20, 2026

@chrisns chrisns commented May 20, 2026

Summary

Post-merge smoke on main found `all-demo` in `UPDATE_ROLLBACK_COMPLETE` and called `use_canonical` (because CFN technically accepts updates from that state). The subsequent CFN deploy then failed:

```
StorageFileSystem61EA7B3D DELETE_FAILED:
File system 'fs-...' has data pending export to S3.
Use forceDelete=true to force delete without exporting pending data.
```

The leaf update failed, and CFN's internal rollback then tried to delete the `AWS::S3Files::FileSystem`, whose CFN handler does not pass `forceDelete=true`. The rollback's delete fails, so the update fails and the umbrella ends up wedged again.

Fix

Mirror the `ROLLBACK_COMPLETE` branch: nuke from `UPDATE_ROLLBACK_COMPLETE` and let the loop re-evaluate. Next iteration hits `DOES_NOT_EXIST` → `use_canonical` → full resource sweep (which does call `s3files delete-file-system --force-delete`), then fresh deploy.

Cost: one extra ~60 min recreate cycle each time CFN has rolled back. Benefit: the umbrella self-recovers from S3Files-stuck rollbacks rather than needing human intervention.
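
A minimal sketch of the decision described under Fix, not the actual smoke script. The function names `classify_stack_status` and `plan_for`, the action label `nuke_and_retry`, and the `unhandled` fallback are hypothetical; only the status names and `use_canonical` come from this PR. The loop simulates a successful delete by assuming the stack no longer exists afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: helper names and the "unhandled" fallback are assumptions,
# not taken from the real smoke script.

# Map a stack status to the action the loop would take.
classify_stack_status() {
  case "$1" in
    ROLLBACK_COMPLETE|UPDATE_ROLLBACK_COMPLETE)
      # Both states are treated as needs-recreate: delete the stack, then re-evaluate.
      echo "nuke_and_retry" ;;
    DOES_NOT_EXIST)
      # Fresh deploy path, which runs the full resource sweep first.
      echo "use_canonical" ;;
    *)
      # Other states are outside the scope of this sketch.
      echo "unhandled" ;;
  esac
}

# Simulate the loop: print the sequence of actions for a starting status.
plan_for() {
  local status="$1" action steps=""
  while :; do
    action="$(classify_stack_status "$status")"
    steps="${steps:+$steps }$action"
    [ "$action" = "nuke_and_retry" ] || break
    # Stand-in for a delete-stack plus wait that succeeds (assumption).
    status="DOES_NOT_EXIST"
  done
  echo "$steps"
}

plan_for UPDATE_ROLLBACK_COMPLETE   # prints: nuke_and_retry use_canonical
```

Keeping the status-to-action mapping in one pure function makes the new branch easy to check without touching a real stack.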

Test plan

  • `bash -n` clean
  • Post-merge smoke on main reaches `success` on the half-deleted umbrella

…loyable

Post-merge smoke on main found all-demo in UPDATE_ROLLBACK_COMPLETE,
called use_canonical, and the subsequent CFN update failed because the
internal rollback (triggered by a leaf failure) tried to delete the
StorageFileSystem61EA7B3D — which has pending S3 export data and needs
forceDelete=true. The AWS::S3Files::FileSystem CFN handler does not
pass forceDelete, so the rollback fails and the deploy reports
"Failed to create/update the stack".

CFN technically accepts updates from UPDATE_ROLLBACK_COMPLETE, but for
the all-demo umbrella that state always hides this kind of half-cleaned
S3Files / nested-stack debris. Safer to mirror the ROLLBACK_COMPLETE
branch: delete-stack, wait, retain-on-DELETE_FAILED, then `continue`
the loop so the next iteration hits DOES_NOT_EXIST → use_canonical →
full resource sweep (which DOES force-delete file systems).

Cost: one extra ~60m recreate cycle when CFN rolled back. Benefit: the
umbrella self-recovers from S3Files-stuck rollbacks instead of needing
human cleanup.
@chrisns chrisns added this pull request to the merge queue May 20, 2026
Merged via the queue into main with commit 1d91b00 May 20, 2026
9 of 11 checks passed
@chrisns chrisns deleted the fix/smoke-nuke-update-rollback-complete branch May 20, 2026 08:40