
@JuliaBrigitte
Contributor

@JuliaBrigitte JuliaBrigitte commented Jan 21, 2026

What does this change?

Adds a new security group for access to the registration DB. The aim is to eventually replace the VPC default group and move towards least-privilege database access. Previously, any application could reach the database through its use of the VPC default group [1]. Now an application needs to explicitly use the DatabaseAccessSecurityGroup. For convenience, an SSM parameter that references the group is also created.

How to test

The healthcheck has been updated to make a simple database request: if the healthcheck passes, we are able to connect to the database. To test, therefore, a deployment to CODE needs to succeed. To confirm we're only making the database query once, the log line "Performing a query to check DB connectivity" should appear only once per instance, regardless of how often the healthcheck request is made.


Co-authored by: @akash1810

Footnotes

  1. Applications could be using the default group for DB access as well as for other functionality.

@JuliaBrigitte JuliaBrigitte requested a review from a team as a code owner January 21, 2026 18:16
@JuliaBrigitte JuliaBrigitte force-pushed the aajb/import-db-yaml-from-mobile-platform branch from 144c117 to 6174a3e Compare January 22, 2026 10:16
@github-actions
Contributor

github-actions bot commented Jan 22, 2026


@JuliaBrigitte JuliaBrigitte marked this pull request as draft January 22, 2026 14:04
@akash1810 akash1810 force-pushed the aajb/import-db-yaml-from-mobile-platform branch 5 times, most recently from dd6fa8a to f286f55 Compare January 22, 2026 15:26
Co-authored-by: Akash <akash1810@users.noreply.github.com>
@akash1810 akash1810 force-pushed the aajb/import-db-yaml-from-mobile-platform branch from f286f55 to 6dc5ffa Compare January 22, 2026 15:34
JuliaBrigitte and others added 2 commits January 22, 2026 15:48
Co-authored-by: Akash <akash1810@users.noreply.github.com>
Co-authored-by: Julia <JuliaBrigitte@users.noreply.github.com>
@akash1810 akash1810 force-pushed the aajb/import-db-yaml-from-mobile-platform branch from 6dc5ffa to e462e3e Compare January 22, 2026 15:48
@akash1810 akash1810 marked this pull request as ready for review January 22, 2026 15:52

def healthCheck: Action[AnyContent] = Action.async {
  // Check if we can talk to the registration database
  registrar.dbHealthCheck()
Contributor

@jacobwinch jacobwinch Jan 22, 2026

Currently I think this is going to make every instance query the DB for every healthcheck request from the load balancer (and also whenever anyone on the public internet sends a request to /healthcheck, which doesn't seem to require authentication). This might not be desirable.

It might also have unintended consequences, for example if the DB becomes unavailable briefly then all instances will cycle at once, which could worsen an outage.

If the goal is to check DB connectivity, can we just check from when an instance launches until connectivity is established and then stop checking?

Alternatively, if this has served its purpose in terms of testing this PR, then we could just fall back to the original healthcheck now?
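
The "check from launch until connectivity is established, then stop checking" option could be sketched roughly as follows. This is not code from the PR: `StartupGate`, `probe`, and `isHealthy` are hypothetical names, and `probe` stands in for whatever call actually queries the database.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical helper: `probe` stands in for the real DB connectivity query.
class StartupGate(probe: () => Future[Boolean])(implicit ec: ExecutionContext) {
  // Flips to true after the first successful probe and never flips back.
  private val connected = new AtomicBoolean(false)

  def isHealthy(): Future[Boolean] =
    if (connected.get()) {
      // Connectivity was already established: answer without touching the DB.
      Future.successful(true)
    } else {
      probe()
        .map { ok =>
          if (ok) connected.set(true)
          ok
        }
        // A failed probe counts as "not yet connected"; the next request retries.
        .recover { case _ => false }
    }
}
```

In this sketch, requests that arrive before the first success may each run a probe, so the database is queried at most a handful of times per instance during startup and not at all afterwards.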

Member

@akash1810 akash1810 Jan 22, 2026

> It might also have unintended consequences, for example if the DB becomes unavailable briefly then all instances will cycle at once, which could worsen an outage.

Ah, great point. I think every route hits the database, not sure if that changes things...

> If the goal is to check DB connectivity, can we just check from when an instance launches until connectivity is established and then stop checking?

Interesting. As an in-memory flag, or as part of the user-data?

> Alternatively, if this has served its purpose in terms of testing this PR, then we could just fall back to the original healthcheck now?

I think it would be good to have a way to confirm connectivity on PROD, at least once.

Contributor

@jacobwinch jacobwinch Jan 22, 2026

> Ah, great point. I think every route hits the database, not sure if that changes things...

It's hard to be sure whether this is a good idea without doing some testing and/or reviewing the error handling in detail. For example, if we currently serve 500s when we can't talk to the DB and we start serving 503s (due to 0 registered targets), then clients might behave unexpectedly.

It might actually end up being a desirable change in some scenarios (e.g. it might help to protect the DB from being overwhelmed) but I'd be reluctant to change something with potentially wide-ranging consequences when it's not really the goal of this PR.

> Interesting. As an in-memory flag, or as part of the user-data?

My initial thought would be just making it a lazy val in the class so that we check once when the class is initialised and then cache the value.

> I think it would be good to have a way to confirm connectivity on PROD, at least once.

Fair enough; in that case I would probably go with the approach of checking we can connect once before we allow the instance to go healthy. This means that the old instances won't be terminated if something goes wrong, so users are protected.
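
A minimal sketch of the `lazy val` approach, under stated assumptions: the `Registrar` trait below is a hypothetical stand-in (only the method name `dbHealthCheck()` appears in this PR, and its real return type may differ), and the class returns plain status codes in place of Play results so that the example is self-contained.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical stand-in: only the method name dbHealthCheck() appears in the PR.
trait Registrar {
  def dbHealthCheck(): Future[Boolean]
}

class HealthCheck(registrar: Registrar)(implicit ec: ExecutionContext) {
  // Evaluated on first access only; later accesses reuse the same Future,
  // so the log line and the query happen once per instance.
  private lazy val dbConnectivity: Future[Boolean] = {
    println("Performing a query to check DB connectivity")
    registrar.dbHealthCheck()
  }

  // Status codes are assumptions: 200 when the cached check passed, 500 otherwise.
  def healthCheck(): Future[Int] =
    dbConnectivity
      .map(ok => if (ok) 200 else 500)
      .recover { case _ => 500 }
}
```

Note that in this sketch a failed result is cached as well: an instance whose first check fails keeps reporting unhealthy and never retries, which fits the goal of not letting a new instance go healthy if it cannot connect.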

Member

> Fair enough; in that case I would probably go with the approach of checking we can connect once before we allow the instance to go healthy. This means that the old instances won't be terminated if something goes wrong, so users are protected.

Done in d894af9.

To avoid strain on the database, check connectivity only once on instance start (`lazy val`).
We should see the log line exactly once per instance, regardless of how often the healthcheck endpoint is queried.

Co-authored-by: Julia <JuliaBrigitte@users.noreply.github.com>
@akash1810 akash1810 force-pushed the aajb/import-db-yaml-from-mobile-platform branch from 3cbbf5c to d894af9 Compare January 26, 2026 10:45
@JuliaBrigitte JuliaBrigitte merged commit 1e156eb into main Jan 26, 2026
9 checks passed
@JuliaBrigitte JuliaBrigitte deleted the aajb/import-db-yaml-from-mobile-platform branch January 26, 2026 12:12
@akash1810
Member

Confirming this deployed to PROD (we have three log lines as we have three instances):

image
