
feat(api): expand /health API to include database and Redis status#3826

Merged
brian-hussey merged 14 commits into IBM:main from PawelK2012:expand-health-api
Apr 16, 2026


@PawelK2012 (Contributor) commented Mar 24, 2026

🔗 Related Issue

Closes #3817


📝 Summary

Goal: The /health API currently provides a basic health signal by validating Postgres connectivity and returning a generic healthy response. To improve observability and support more granular health monitoring, the endpoint should be expanded to include explicit status checks for additional infrastructure dependencies.
This change will enhance the /health response to report the health of both Postgres and Redis while continuing to expose a clear overall service health indicator.

Service healthy:

[Screenshot, 2026-03-24 14:28:03]

Response headers with _apply_runtime_mode_headers(response: Response):

[Screenshot, 2026-03-24 14:39:39]

Redis not configured:

[Screenshot, 2026-03-24 14:50:15]

Redis configured but unhealthy:

[Screenshot, 2026-03-24 14:26:08]
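To make the three cache scenarios above concrete, here is a minimal sketch of how a cache status item could be derived. It assumes a status item shaped like the `status_items` entries in the /ready payload later in this thread (name, status_code, message). The names `StatusItem` and `cache_status_item`, the `ping` callable, the 1-second timeout, and the codes and messages for the not-configured and failure cases are all hypothetical; this is not the PR's actual code.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class StatusItem:
    """One dependency check result (hypothetical; mirrors a status_items entry)."""

    name: str
    status_code: int
    message: str


async def cache_status_item(
    ping: Optional[Callable[[], Awaitable[bool]]],
    timeout_s: float = 1.0,
) -> StatusItem:
    """Report cache health for three scenarios: healthy, not configured, unhealthy.

    `ping` is None when Redis is not configured; otherwise it is an async
    callable (for example a Redis client's ping) that returns True on success.
    """
    if ping is None:
        # Not configured: reported, but assumed here not to count as a failure.
        return StatusItem("Cache", 200, "Cache not configured")
    try:
        # Bound the round-trip so a slow Redis cannot stall the whole response.
        ok = await asyncio.wait_for(ping(), timeout=timeout_s)
    except Exception as exc:  # connection errors and timeouts alike
        return StatusItem("Cache", 503, f"Cache Connection Failed: {type(exc).__name__}")
    if ok:
        return StatusItem("Cache", 200, "Cache Connection Successful")
    return StatusItem("Cache", 503, "Cache Connection Failed")
```

Passing `None` models the "Redis not configured" case, so the caller decides once, at startup, whether a cache check applies at all.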

🏷️ Type of Change

  • Bug fix
  • Feature / Enhancement
  • Documentation
  • Refactor
  • Chore (deps, CI, tooling)
  • Other (describe below)

🧪 Verification

Check             Command          Status
Lint suite        make lint
Unit tests        make test
Coverage ≥ 80%    make coverage

✅ Checklist

  • Code formatted (make black isort pre-commit)
  • Tests added/updated for changes
  • Documentation updated (if applicable)
  • No secrets or credentials committed


@brian-hussey (Member) left a comment:

Hey @PawelK2012, thank you for the submission. I've requested a few minor changes from a security and maintainability point of view.

Overall this is a very good PR: it has test coverage, a clear separation of concerns between the cache details and the database details, and it maintains backward compatibility with the existing health check behaviour.

If you can address the requested changes, that would be great; if you can't, please let me know and I'll take care of them so this one can get merged.

Thanks, Brian

Review comment threads: mcpgateway/main.py (5 threads, 4 marked outdated), mcpgateway/schemas.py (3 threads, all marked outdated).
@brian-hussey self-assigned this Mar 27, 2026
@crivetimihai changed the title "Expand /health API to Include Database and Redis Status" to "feat(api): expand /health API to include database and Redis status" Mar 29, 2026
@crivetimihai added the enhancement (New feature or request), COULD (P3: Nice-to-have features with minimal impact if left out; included if time permits), and api (REST API Related item) labels Mar 29, 2026
@crivetimihai added this to the Release 1.1.0 milestone Mar 29, 2026

@crivetimihai (Member):

Thanks for the contribution @PawelK2012 — welcome to the project! 🎉 This is a thoughtful improvement and I appreciate the effort. Let me share some context on what we already have and a few suggestions:

What already exists:

  • The current /health endpoint (in main.py) already checks database connectivity by running SELECT 1 and returns {"status": "healthy/unhealthy", "mcp_runtime": {...}}. So database health is already covered.
  • We also have /ready (readiness check) and /health/security (admin-only security audit) as separate endpoints.
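The database check described in the first bullet boils down to a trivial round-trip query. A minimal sketch, assuming a DB-API 2.0 connection factory: `sqlite3` stands in for Postgres purely so the example runs anywhere, and `database_reachable` is a hypothetical name (the gateway's own check in main.py is not reproduced in this thread).

```python
import sqlite3
from typing import Any, Callable


def database_reachable(connect: Callable[[], Any]) -> bool:
    """Return True if a trivial SELECT 1 round-trip succeeds.

    `connect` is any zero-argument callable returning a DB-API 2.0
    connection; sqlite3 stands in for Postgres so the sketch runs anywhere.
    """
    try:
        conn = connect()
    except Exception:
        return False  # could not even open a connection
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        return cur.fetchone() == (1,)
    except Exception:
        return False  # connection opened but the query failed
    finally:
        conn.close()


if __name__ == "__main__":
    print(database_reachable(lambda: sqlite3.connect(":memory:")))  # True
```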

So the main value-add from your PR is the Redis health check and the structured response schema — both of which are useful!

A few things to consider:

  1. Latency on the liveness path — Kubernetes liveness probes typically hit /health. Adding a Redis ping() call means every liveness check now depends on Redis being responsive. If Redis is slow or temporarily unreachable, the liveness probe could fail and restart the pod — even though the application itself is fine. A common pattern is to keep /health (liveness) lightweight (just DB) and put deeper dependency checks in /ready (readiness). Would you consider adding Redis status to /ready instead, or creating a new /health/detailed endpoint?

  2. Backwards compatibility — The response shape changes from {"status": "healthy"} to {"status": "healthy", "statusItems": [...]}. Existing monitoring dashboards or scripts that parse the response might break. One option: keep the existing fields and add statusItems as an additional field, so existing consumers aren't affected.

  3. Small typo — "Databse" appears twice in the code (should be "Database") — easy fix!

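  4. DCO requirement — All commits need a Signed-off-by line. For the most recent commit you can add it with git commit --amend -s; for earlier commits, git rebase --signoff <base-branch> adds it to every rebased commit. Then push with git push --force-with-lease.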

This is a good direction — I'd love to see it land with the adjustments above. Let me know if you have questions or want to discuss the approach!
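Point 1 can be sketched as two framework-agnostic coroutines: a liveness check that touches only the database, and a readiness check that runs every dependency check concurrently, each under its own timeout. This is a sketch under stated assumptions: the names `run_check`, `liveness`, and `readiness`, the 2-second timeout, the 503 code on failure, and the "not ready" status string are hypothetical; in the gateway these would be HTTP routes in main.py.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# A check is an async callable that raises if its dependency is unhealthy.
Check = Callable[[], Awaitable[None]]


async def run_check(name: str, check: Check, timeout_s: float) -> Dict[str, object]:
    """Run one dependency check under a timeout and report it as a status item."""
    try:
        await asyncio.wait_for(check(), timeout=timeout_s)
        return {"name": name, "status_code": 200, "message": f"{name} Connection Successful"}
    except Exception as exc:  # includes asyncio timeouts
        return {"name": name, "status_code": 503, "message": f"{name} check failed: {type(exc).__name__}"}


async def liveness(db_check: Check, timeout_s: float = 2.0) -> Tuple[int, Dict[str, object]]:
    """/health-style probe: database only, so a slow Redis cannot cause restarts."""
    item = await run_check("Database", db_check, timeout_s)
    healthy = item["status_code"] == 200
    return (200 if healthy else 503), {"status": "healthy" if healthy else "unhealthy"}


async def readiness(checks: Dict[str, Check], timeout_s: float = 2.0) -> Tuple[int, Dict[str, object]]:
    """/ready-style probe: every dependency, checked concurrently."""
    items: List[Dict[str, object]] = list(
        await asyncio.gather(*(run_check(name, check, timeout_s) for name, check in checks.items()))
    )
    ready = all(item["status_code"] == 200 for item in items)
    return (200 if ready else 503), {"status": "ready" if ready else "not ready", "status_items": items}
```

With a split like this, a Kubernetes livenessProbe would target /health and a readinessProbe /ready: a failing readiness probe only takes the pod out of Service endpoints, whereas a failing liveness probe restarts the container.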

@PawelK2012 (Contributor, Author):


Hi @crivetimihai, thank you for the feedback!

I implemented the changes you suggested:

  1. Moved the functionality under the /ready API.
  2. Below is the payload /ready now returns:
  • the existing contract is preserved
  • the new detail is additive only
  • existing consumers are unaffected
{
  "status": "ready",
  "status_items": [
    {
      "name": "Database",
      "status_code": 200,
      "message": "Database Connection Successful"
    },
    {
      "name": "Cache",
      "status_code": 200,
      "message": "Cache Connection Successful"
    }
  ],
  "mcp_runtime": {
    "mode": "python",
    "mounted": "python",
    "rust_build_included": false,
    "rust_runtime_enabled": false,
    "session_core_mode": "python",
    "event_store_mode": "python",
    "resume_core_mode": "python",
    "live_stream_core_mode": "python",
    "affinity_core_mode": "python",
    "session_auth_reuse_mode": "python",
    "rust_session_core_enabled": false,
    "rust_event_store_enabled": false,
    "rust_resume_core_enabled": false,
    "rust_live_stream_core_enabled": false,
    "rust_affinity_core_enabled": false,
    "rust_session_auth_reuse_enabled": false
  }
}
  3. Fixed the typo.
  4. All commits are now signed off.
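The "additive only" contract for the /ready payload can be illustrated with a small stdlib sketch: the pre-existing `status` and `mcp_runtime` fields are untouched, `status_items` is a new field with a default, and a consumer written before the change keeps working. `ReadyResponse`, `StatusItem`, and `legacy_consumer` are hypothetical names standing in for whatever schema the PR defines (the review threads touch mcpgateway/schemas.py).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class StatusItem:
    name: str
    status_code: int
    message: str


@dataclass
class ReadyResponse:
    """Hypothetical mirror of the /ready payload shown above."""

    status: str  # pre-existing field, unchanged
    mcp_runtime: Dict[str, Any]  # pre-existing field, unchanged
    status_items: List[StatusItem] = field(default_factory=list)  # new, additive


def legacy_consumer(payload: str) -> bool:
    """A monitor written before this change: it only reads `status`."""
    return json.loads(payload)["status"] == "ready"


if __name__ == "__main__":
    resp = ReadyResponse(
        status="ready",
        mcp_runtime={"mode": "python"},
        status_items=[
            StatusItem("Database", 200, "Database Connection Successful"),
            StatusItem("Cache", 200, "Cache Connection Successful"),
        ],
    )
    payload = json.dumps(asdict(resp))
    print(legacy_consumer(payload))  # True
```

Giving the new field a default means code that builds the response without it still type-checks, and JSON consumers that ignore unknown keys see no change; with Pydantic models the equivalent is a field with a default value.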

@brian-hussey @crivetimihai please let me know if any other changes are required.

@PawelK2012 marked this pull request as ready for review April 1, 2026 10:02
@jonpspri (Collaborator) commented Apr 9, 2026

@brian-hussey Where did we land on this?

@brian-hussey (Member):

@jonpspri This is still in my review queue. I'll get back to it shortly and get it merged.

@brian-hussey (Member) previously approved these changes Apr 16, 2026:

@PawelK2012 Thanks for making the corrections based on all the feedback. This is looking good now; I'll add it to our merge queue and get it merged shortly.

mrpawelgawel and others added 2 commits April 16, 2026 18:34
Signed-off-by: PawelK2012 <Kaim.paul@gmail.com>
Signed-off-by: Brian Hussey <brian.hussey@ie.ibm.com>
PawelK2012 and others added 11 commits April 16, 2026 18:34
Signed-off-by: PawelK2012 <Kaim.paul@gmail.com>
Signed-off-by: Brian Hussey <brian.hussey@ie.ibm.com>
@brian-hussey merged commit 708c272 into IBM:main Apr 16, 2026
30 checks passed
@PawelK2012 (Contributor, Author):

@brian-hussey thank you for the review!


Labels: api (REST API Related item), COULD (P3: Nice-to-have features with minimal impact if left out; included if time permits), enhancement (New feature or request)

Projects: none yet

5 participants