Enhance default readiness probe to check health status by parsing the payload (requires curl) #48
Requirements
Related issues
See launchdarkly/ld-relay#259 on the main ld-relay project.
Describe the solution you've provided
The revised readiness probe does not put the relay into service (make it discoverable) until its /status payload contains the string "healthy". That string is unique to the top-level "status" property: per-environment states are connected or disconnected, never healthy. The probe also removes the relay from service (so it is no longer discoverable for net-new connections) after a single probe response that does not contain "healthy". For example, the top-level status can be "degraded" under various conditions, including when any one environment is not connected.
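To illustrate the check, here is a minimal Go sketch of the two ways the payload could be evaluated. It is not the probe itself (the probe in this PR runs curl inside the container). The function names and sample payloads are hypothetical and trimmed down for illustration. `containsHealthy` mirrors the substring match described above, and `topLevelHealthy` is a stricter variant that parses the JSON and compares only the top-level "status" property.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// containsHealthy mirrors the substring check described in this PR:
// the payload passes if the string "healthy" appears anywhere in it.
func containsHealthy(payload string) bool {
	return strings.Contains(payload, "healthy")
}

// topLevelHealthy is a stricter, hypothetical alternative: it parses the
// payload as JSON and passes only if the top-level "status" is "healthy".
func topLevelHealthy(payload string) bool {
	var s struct {
		Status string `json:"status"`
	}
	if err := json.Unmarshal([]byte(payload), &s); err != nil {
		return false // an unparseable payload is treated as not ready
	}
	return s.Status == "healthy"
}

func main() {
	// Sample payloads are assumptions, reduced to the fields discussed above.
	healthy := `{"environments":{"env1":{"status":"connected"}},"status":"healthy"}`
	degraded := `{"environments":{"env1":{"status":"disconnected"}},"status":"degraded"}`

	fmt.Println(containsHealthy(healthy), topLevelHealthy(healthy))
	fmt.Println(containsHealthy(degraded), topLevelHealthy(degraded))
}
```

The substring form needs nothing beyond curl and a text match in the image, which is why it suits a probe command. Its trade-off is that it would also accept any other value that happens to contain "healthy", so it relies on the per-environment states never using that word.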
Describe alternatives you've considered
See launchdarkly/ld-relay#259 for a potential alternative.
Additional context
I've tested that this works when curl is included in the image. If curl isn't in the image, the probe will not work. I am unsure how to account for older relay images.
I haven't tested what happens to already-established SDK connections if the readiness state transitions from true to false (the probe fails). I'm assuming they will not be interrupted, as my understanding is that the readiness state only affects discoverability.
I am unsure whether the version in Chart.yaml should be incremented for this change, as only values.yaml has changed, not Chart.yaml.
The validation check here will need to be changed to match the new readiness probe:
ld-relay-helm/test/deployment_test.go
Line 284 in dca08b0
I'm not sure how to represent the new probe definition in that test code.