Skip to content

Add healthchecks to ECS tasks#2022

Merged
whi-tw merged 1 commit into
mainfrom
whi-tw/add-container-healthcheck
Mar 19, 2026
Merged

Add healthchecks to ECS tasks#2022
whi-tw merged 1 commit into
mainfrom
whi-tw/add-container-healthcheck

Conversation

@whi-tw
Copy link
Copy Markdown
Contributor

@whi-tw whi-tw commented Mar 17, 2026

What problem does this pull request solve?

Trello card: https://trello.com/c/smia5MUD/834-configure-container-health-check-for-our-apps

Previously, we didn't have any health checks configured in ECS, only
at the load balancer. This meant that if the application inside the
container was unhealthy, the load balancer would stop sending traffic to
it, but ECS would keep running the task and not attempt to replace it.

Also, introducing the otel sidecar (which has a healthcheck configured)
caused the task to always appear with 'Unknown' health status in the ECS
console, which is confusing.

These health check values are cribbed from the review apps configuration
for each app, and should be suitable for our production workloads.

Finally, we exclude the health check configuration from the
task_container_definition output: we only consume this output when
creationg scheduled tasks / other copies of the task definition, and in
those cases we don't want to inherit the health check configuration
for the long-running service tasks. We could override the health check
configuration in those cases, but it's simpler to just not include it in
the output at all and require that any task that needs a health check
sets it explicitly.

Things to consider when reviewing

  • Ensure that you consider the wider context.
  • Does it work when run on your machine?
  • Is it clear what the code is doing?
  • Do the commit messages explain why the changes were made?
  • Are there all the unit tests needed?
  • Has all relevant documentation been updated?

Reminders

If you've made changes to the deployer role (files in modules/deployer-access):

  • Remember to run make <environment> forms/account apply on the relevant environments (dev, staging, user-research, and/or prod)
  • Check the #govuk-forms-deployment-notifications Slack channel to ensure the apply-forms-terraform-<environment> pipelines have run successfully

Previously, we didn't have any health checks configured in ECS, only
at the load balancer. This meant that if the application inside the
container was unhealthy, the load balancer would stop sending traffic to
it, but ECS would keep running the task and not attempt to replace it.

Also, introducing the otel sidecar (which has a healthcheck configured)
caused the task to always appear with 'Unknown' health status in the ECS
console, which is confusing.

These health check values are cribbed from the review apps configuration
for each app, and should be suitable for our production workloads.

Finally, we exclude the health check configuration from the
`task_container_definition` output: we only consume this output when
creationg scheduled tasks / other copies of the task definition, and in
those cases we don't want to inherit the health check configuration
for the long-running service tasks. We could override the health check
configuration in those cases, but it's simpler to just not include it in
the output at all and require that any task that needs a health check
sets it explicitly.
Copilot AI review requested due to automatic review settings March 17, 2026 10:48
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds ECS container-level health checks to the long-running Forms services so ECS can replace unhealthy tasks (and avoid “Unknown” task health when using the ADOT sidecar), implemented via a new healthcheck input on the shared ecs-service Terraform module.

Changes:

  • Add a healthcheck variable to infra/modules/ecs-service and wire it into the generated container definition.
  • Configure /up health checks for forms-admin, forms-runner, and forms-product-page ECS services.
  • Attempt to prevent health checks from being inherited by consumers of task_container_definition output (for scheduled/derived tasks).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
infra/modules/forms-runner/main.tf Adds container health check config and centralizes container port in a local.
infra/modules/forms-product-page/main.tf Adds container health check config and centralizes container port in a local.
infra/modules/forms-admin/main.tf Adds container health check config and centralizes container port in a local.
infra/modules/ecs-service/variables.tf Introduces healthcheck module input type.
infra/modules/ecs-service/ecs.tf Adds healthCheck into the generated task container definition.
infra/modules/ecs-service/outputs.tf Modifies task_container_definition output to try to exclude health check config.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread infra/modules/ecs-service/ecs.tf
Comment thread infra/modules/ecs-service/variables.tf
Comment thread infra/modules/ecs-service/outputs.tf
Comment thread infra/modules/forms-runner/main.tf
Comment thread infra/modules/forms-product-page/main.tf
Comment thread infra/modules/forms-admin/main.tf
@whi-tw whi-tw requested a review from a team March 18, 2026 10:23
Copy link
Copy Markdown
Contributor

@cadmiumcat cadmiumcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So glad to have this! thanks. And good call on making the output null

@whi-tw whi-tw added this pull request to the merge queue Mar 19, 2026
Merged via the queue into main with commit 513b849 Mar 19, 2026
24 checks passed
@whi-tw whi-tw deleted the whi-tw/add-container-healthcheck branch March 19, 2026 09:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants