Description
What happened:
The /readiness endpoint returns HTTP 200 OK even when MongoDB is down. It detects the failure (the response body contains "database":"down") but still returns a success status code, so traffic continues to be routed to an unhealthy pod.
The /status endpoint performs no health check at all and returns a hardcoded success status.
I have attached a screenshot showing this behavior.

What you expected to happen:
Both endpoints should return HTTP 503 Service Unavailable when the database is down. This ensures that unhealthy pods receive no traffic until they recover.
Where can this issue be corrected?
chaoscenter/graphql/server/pkg/handlers/readiness_handler.go
chaoscenter/graphql/server/pkg/handlers/status_handler.go
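A minimal sketch of the expected behavior, not the actual Litmus code: it uses only the standard library's net/http so it stays self-contained, whereas the real handlers may be written against a different router. The names dbPing, readinessHandler, probeStatus, healthyPing, and failingPing are hypothetical, and the 2-second timeout is an assumed value. The idea is that the status code should follow the result of the database check instead of always being 200.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// dbPing is a hypothetical stand-in for the real MongoDB health check
// (for example, a thin wrapper around the driver's ping call).
type dbPing func(ctx context.Context) error

// readinessHandler reports database health in the body and mirrors it in the
// status code: 200 when the ping succeeds, 503 when it fails.
func readinessHandler(ping dbPing) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Assumed timeout so a hung database cannot stall the probe.
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()

		dbState, code := "up", http.StatusOK
		if err := ping(ctx); err != nil {
			dbState, code = "down", http.StatusServiceUnavailable
		}

		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code) // the status must be set before the body is written
		_ = json.NewEncoder(w).Encode(map[string]string{"database": dbState})
	}
}

// probeStatus runs a handler against an in-memory recorder and returns
// the status code it produced.
func probeStatus(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readiness", nil))
	return rec.Code
}

// Hypothetical pings that simulate a reachable and an unreachable database.
func healthyPing(context.Context) error { return nil }
func failingPing(context.Context) error { return errors.New("mongodb unreachable") }

func main() {
	fmt.Println("healthy:", probeStatus(readinessHandler(healthyPing)))
	fmt.Println("db down:", probeStatus(readinessHandler(failingPing)))
}
```

Running the sketch prints 200 for the healthy case and 503 for the failing one. The same pattern would apply to /status if it is meant to reflect database health rather than return a fixed response.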
How to reproduce it (as minimally and precisely as possible):
In a Kubernetes deployment:
# Stop all MongoDB instances
kubectl scale statefulset chaos-mongodb -n litmus --replicas=0
kubectl scale statefulset chaos-mongodb-arbiter -n litmus --replicas=0
# Test the endpoints
kubectl exec -it -n litmus <graphql-server-pod> -- curl -i http://localhost:8081/readiness
kubectl exec -it -n litmus <graphql-server-pod> -- curl -i http://localhost:8081/status
In local development:
# Port-forward the service
kubectl port-forward svc/chaos-litmus-server-service -n litmus 8081:9002
# Stop MongoDB
kubectl scale statefulset chaos-mongodb -n litmus --replicas=0
kubectl scale statefulset chaos-mongodb-arbiter -n litmus --replicas=0
# Test the endpoints
curl -i http://localhost:8081/readiness
curl -i http://localhost:8081/status
Anything else we need to know?:
I am currently working on this issue and will open a PR shortly.