-
Notifications
You must be signed in to change notification settings - Fork 5
New security group for Postgres access to registration db #1636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
144c117 to
6174a3e
Compare
dd6fa8a to
f286f55
Compare
Co-authored-by: Akash <akash1810@users.noreply.github.com>
f286f55 to
6dc5ffa
Compare
Co-authored-by: Akash <akash1810@users.noreply.github.com>
Co-authored-by: Julia <JuliaBrigitte@users.noreply.github.com>
6dc5ffa to
e462e3e
Compare
|
|
||
| def healthCheck: Action[AnyContent] = Action.async { | ||
| // Check if we can talk to the registration database | ||
| registrar.dbHealthCheck() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently I think this is going to make every instance query the DB for every healthcheck request from the load balancer (and also whenever anyone on the public internet sends a request to /healthcheck, which doesn't seem to require authentication). This might not be desirable.
It might also have unintended consequences, for example if the DB becomes unavailable briefly then all instances will cycle at once, which could worsen an outage.
If the goal is to check DB connectivity, can we just check from when an instance launches until connectivity is established and then stop checking?
Alternatively, if this has served its purpose in terms of testing this PR then we could just fallback to the original healthcheck now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also have unintended consequences, for example if the DB becomes unavailable briefly then all instances will cycle at once, which could worsen an outage.
Ah, great point. I think every route hits the database, not sure if that changes things...
If the goal is to check DB connectivity, can we just check from when an instance launches until connectivity is established and then stop checking?
Interesting. As an in memory flag, or as part of the user-data?
Alternatively, if this has served its purpose in terms of testing this PR then we could just fallback to the original healthcheck now?
I think it would be good to have a way to confirm connectivity on PROD, at least once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, great point. I think every route hits the database, not sure if that changes things...
It's hard to be sure if this is a good idea without doing some testing and/or reviewing the error handling in detail. For example, if we currently serve 500s when we can't talk to the DB and we start serving 503s (due to 0 registered targets), then clients might behave unexpectedly.
It might actually end up being a desirable change in some scenarios (e.g. it might help to protect the DB from being overwhelmed) but I'd be reluctant to change something with potentially wide-ranging consequences when it's not really the goal of this PR.
Interesting. As an in memory flag, or as part of the user-data?
My initial thought would be just making it a lazy val in the class so that we check once when the class is initialised and then cache the value.
I think it would be good to have a way to confirm connectivity on PROD, at least once.
Fair enough; in that case I would probably go with the approach of checking we can connect once before we allow the instance to go healthy. This means that the old instances won't be terminated if something goes wrong, so users are protected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough; in that case I would probably go with the approach of checking we can connect once before we allow the instance to go healthy. This means that the old instances won't be terminated if something goes wrong, so users are protected.
Done in d894af9.
To avoid strain on the database, check connectivity only once on instance start (`lazy val`). We should see the log line exactly once per instance, regardless of how often the healthcheck endpoint is queried. Co-authored-by: Julia <JuliaBrigitte@users.noreply.github.com>
3cbbf5c to
d894af9
Compare
|
Confirming this deployed to PROD (we have three log lines as we have three instances):
|

What does this change?
Added new security group for access to the registration db. This is to eventually replace the VPC default group and increase least privilege database access. Previously any application could reach the database due to their use of the VPC default group1. Now an application needs to explicitly use the
DatabaseAccessSecurityGroup. For ease, an SSM parameter is also created to reference the group.How to test
The healthcheck has been updated to make a simple database request - if the healthcheck passes then we're able to connect to the database. Therefore, to test, deployment to CODE needs to succeed. To confirm we're only making the database query once, "Performing a query to check DB connectivity" should only appear once, regardless of how often the healthcheck request is made.
Co-authored by: @akash1810
Footnotes
They could be using the group for DB access, and other functionality. ↩