
Readiness and Status probes return HTTP 200 when Database is down #5414

@Maximus-08

Description

What happened:

The /readiness endpoint returns HTTP 200 OK even when MongoDB is down. It detects the failure ("database":"down" in the response body) but returns the wrong status code, so traffic keeps being routed to an unhealthy pod.

The /status endpoint performs no health check at all and returns a hardcoded success status.

A screenshot of the responses is attached:

[Image]
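
For reference, the behaviour described above boils down to a probe that never consults the database. The snippet below is a hypothetical illustration of that pattern, not the actual Litmus source:

package handlers

import "net/http"

// Hypothetical illustration only: a probe that always reports success,
// regardless of database state, yields HTTP 200 even while MongoDB is down.
func statusAlwaysOK(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK) // hardcoded success, no dependency check
	w.Write([]byte(`{"status":"up"}`))
}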

What you expected to happen:
Both endpoints should return HTTP 503 Service Unavailable when the database is down, so that unhealthy pods receive no traffic until they recover. A rough sketch of such a check is included after the list of affected files below.

Where can this issue be corrected?

chaoscenter/graphql/server/pkg/handlers/readiness_handler.go
chaoscenter/graphql/server/pkg/handlers/status_handler.go
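
As a starting point, here is a minimal sketch of the intended behaviour. It assumes plain net/http and the official mongo-go-driver client; the real handlers may be wired through a different router, so treat the signature as illustrative rather than the actual fix:

package handlers

import (
	"context"
	"encoding/json"
	"net/http"
	"time"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/readpref"
)

// ReadinessHandler pings MongoDB and maps a failed ping to 503 so Kubernetes
// stops routing traffic to the pod; the same mapping should apply to /status.
func ReadinessHandler(client *mongo.Client) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()

		dbStatus := "up"
		code := http.StatusOK
		if err := client.Ping(ctx, readpref.Primary()); err != nil {
			dbStatus = "down"
			code = http.StatusServiceUnavailable // 503 instead of 200
		}

		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		json.NewEncoder(w).Encode(map[string]string{"database": dbStatus})
	}
}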

How to reproduce it (as minimally and precisely as possible):

In a Kubernetes deployment:

# Stop all MongoDB instances
kubectl scale statefulset chaos-mongodb -n litmus --replicas=0
kubectl scale statefulset chaos-mongodb-arbiter -n litmus --replicas=0

# Test the endpoints
kubectl exec -it -n litmus <graphql-server-pod> -- curl -i http://localhost:8081/readiness
kubectl exec -it -n litmus <graphql-server-pod> -- curl -i http://localhost:8081/status

In local development:

# Port-forward the service
kubectl port-forward svc/chaos-litmus-server-service -n litmus 8081:9002

# Stop MongoDB
kubectl scale statefulset chaos-mongodb -n litmus --replicas=0
kubectl scale statefulset chaos-mongodb-arbiter -n litmus --replicas=0

# Test the endpoints
curl -i http://localhost:8081/readiness
curl -i http://localhost:8081/status
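
With MongoDB scaled down, the responses currently come back as HTTP 200 with the failure only visible in the body; after the fix they should carry 503. Roughly (body shape taken from the screenshot, exact fields may differ):

# Current
HTTP/1.1 200 OK
{"database":"down"}

# Expected
HTTP/1.1 503 Service Unavailable
{"database":"down"}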

Anything else we need to know?:
I am currently working on this issue and will open a PR shortly.
