
fix(ecs): detect rollback after service stability wait #859

Open
ebihimself wants to merge 3 commits into aws-actions:master from ebihimself:master


@ebihimself ebihimself commented Apr 20, 2026

Summary

Fixes #191 by validating the final ECS deployment after service stabilization.

Problem

The action currently waits for ECS service stability. When the deployment circuit breaker is enabled with rollback, a failed deployment can still end with a stable service, because ECS rolls back to the previous deployment and the stability wait succeeds.

Change

After waitUntilServicesStable(...), call DescribeServices and:

  • fail if the deployment for the expected task definition is FAILED
  • fail if the PRIMARY deployment task definition is not the expected task definition
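The two checks above could be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the helper name `assertDeploymentWon` and the error messages are hypothetical, and it assumes the caller passes in one service object from a DescribeServices response, whose deployments carry the `status`, `rolloutState`, `rolloutStateReason`, and `taskDefinition` fields.

```javascript
// Hypothetical helper (not from the PR): validates the service state
// returned by DescribeServices after the stability wait has finished.
function assertDeploymentWon(service, expectedTaskDefArn) {
  const deployments = service.deployments || [];

  // Check 1: if a deployment for the expected task definition is still
  // listed, it must not be in the FAILED rollout state.
  const expected = deployments.find(
    (d) => d.taskDefinition === expectedTaskDefArn
  );
  if (expected && expected.rolloutState === 'FAILED') {
    const reason = expected.rolloutStateReason || 'no reason reported';
    throw new Error(`Deployment ${expected.id} failed: ${reason}`);
  }

  // Check 2: the PRIMARY deployment must run the expected task definition.
  // After a rollback, PRIMARY points at the previous task definition.
  const primary = deployments.find((d) => d.status === 'PRIMARY');
  if (!primary) {
    throw new Error('No PRIMARY deployment found');
  }
  if (primary.taskDefinition !== expectedTaskDefArn) {
    throw new Error(
      `Rolled back: PRIMARY is ${primary.taskDefinition}, ` +
        `expected ${expectedTaskDefArn}`
    );
  }
  return primary;
}
```

Keeping the check as a pure function over the response object makes it easy to unit test without mocking the ECS client.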

Why

This ensures the action only succeeds when the intended deployment actually wins, rather than when ECS stabilizes via rollback.

@ebihimself ebihimself changed the title fix(ecs): detect rollback after service stability wait WIP: fix(ecs): detect rollback after service stability wait Apr 20, 2026
@ebihimself ebihimself changed the title WIP: fix(ecs): detect rollback after service stability wait fix(ecs): detect rollback after service stability wait Apr 20, 2026
@ebihimself ebihimself closed this Apr 20, 2026
@ebihimself ebihimself reopened this Apr 20, 2026
@NahutabDevelop

Howdy @ebihimself , you're missing the changes for the /dist bundle. Could you add that to your pull request?

@ebihimself
Author

ebihimself commented Apr 20, 2026

Hey @NahutabDevelop, wazap? 🤠 Pushed! 🚀

Comment thread index.js
// even if the service is "stable", ECS may have rolled back to the
// previous task definition. In that case, the PRIMARY deployment
// will not match the task definition we expected to promote.
if (primaryDeployment.taskDefinition !== expectedTaskDefArn) {
Contributor

@s3cube s3cube Apr 21, 2026


There are a few parameters on the action for which this approach doesn't work: parameters passed in the UpdateService call that trigger a deployment on their own, such as service-managed-ebs-volume or force-new-deployment.

Basically, the task definition would not change across deployments for these.

Author


Makes sense! I will change the design. I think the safest way to check is to use the deployment ID.
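A rough sketch of what a deployment-ID check might look like is below. It is an illustration only, not the design that was eventually adopted: the helper names are hypothetical, and it assumes the service object returned by UpdateService lists the newly created deployment with status `PRIMARY`, so its `id` can be recorded before the stability wait and compared afterwards.

```javascript
// Hypothetical helpers (not from the PR).

// Returns the ID of the PRIMARY deployment of a service object, or
// undefined if none is listed.
function primaryDeploymentId(service) {
  const primary = (service.deployments || []).find(
    (d) => d.status === 'PRIMARY'
  );
  return primary ? primary.id : undefined;
}

// Throws if the PRIMARY deployment after the wait is not the one that
// was recorded right after UpdateService. This does not depend on the
// task definition changing, so it also covers force-new-deployment.
function assertSameDeployment(serviceAfterWait, expectedDeploymentId) {
  const actual = primaryDeploymentId(serviceAfterWait);
  if (actual !== expectedDeploymentId) {
    throw new Error(
      `PRIMARY deployment is ${actual}, expected ${expectedDeploymentId}; ` +
        'the service may have been rolled back'
    );
  }
}
```

The intended flow would be to call `primaryDeploymentId` on the UpdateService response, wait for stability, then call `assertSameDeployment` on the DescribeServices result.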

Contributor


Yes. There might be a cleaner way; let me confirm.



Development

Successfully merging this pull request may close these issues.

GitHub action should fail if deployment circuit breaker / rollback is triggered

3 participants