feat: Add health monitoring server with comprehensive system checks, closes #49 #303

ricardo-perello · 2025-09-04T00:54:33Z

feat: Add health monitoring server with comprehensive system checks, closes #49

Overview

This PR adds a comprehensive health monitoring server to rindexer that provides real-time monitoring of system components and services.

Features Added

Health Server Module

New health server module with HTTP endpoint at /health
Real-time monitoring of database, indexing, and sync status
Health configuration in manifest with default port 8080
Integration into all CLI start commands (indexer, graphql, all)

Health Checks Implemented

Database connectivity and health - Verifies PostgreSQL connection status
Indexing service status - Monitors active indexing tasks and service state
Data synchronization status - Checks PostgreSQL tables or CSV files for data presence
Overall system health - Provides timestamped health status with service breakdown

Dependencies Added

axum - HTTP server framework for the health endpoint

Configuration

Health server enabled by default in new project templates
Backward compatible - existing manifests without health configuration use sensible defaults
Parallel operation - runs alongside existing services without impacting core indexing functionality

Technical Details

Health Endpoint Response

{
  "status": "healthy",
  "timestamp": "2025-09-04T02:48:40.370651Z",
  "services": {
    "database": "healthy",
    "indexing": "healthy", 
    "sync": "healthy"
  },
  "indexing": {
    "active_tasks": 1,
    "is_running": true
  }
}
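The response shape above can be modelled with one plain type. This is a minimal sketch, not the code from this PR: the struct name `HealthResponse`, its field layout, and the hand-rolled `to_json` method are assumptions chosen so the example needs only the standard library. An axum handler would more typically return `Json<T>` over a serde-serializable type.

```rust
/// Hypothetical model of the /health payload shown above.
/// Statuses are plain strings here, matching the JSON values.
pub struct HealthResponse<'a> {
    pub status: &'a str,
    pub timestamp: &'a str,
    pub database: &'a str,
    pub indexing: &'a str,
    pub sync: &'a str,
    pub active_tasks: usize,
    pub is_running: bool,
}

impl<'a> HealthResponse<'a> {
    /// Renders the payload as compact JSON.
    /// Assumption: no value contains characters that need JSON escaping.
    pub fn to_json(&self) -> String {
        format!(
            r#"{{"status":"{}","timestamp":"{}","services":{{"database":"{}","indexing":"{}","sync":"{}"}},"indexing":{{"active_tasks":{},"is_running":{}}}}}"#,
            self.status,
            self.timestamp,
            self.database,
            self.indexing,
            self.sync,
            self.active_tasks,
            self.is_running
        )
    }
}
```

The per-service statuses sit under `services`, while the `indexing` object carries the task count and running flag, mirroring the two levels in the sample response.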

Health Server Architecture

Isolated postgres client - Health server creates its own database connection for monitoring without interfering with main indexer operations
Non-blocking operation - Runs in separate tokio task alongside GraphQL server
Graceful error handling - Continues operation even if health checks fail

Bug Fixes

Fixed database race condition - Resolved issue where health server was interfering with main indexer's database schema setup
Proper client isolation - Health server now only creates monitoring client without calling setup_postgres()

Examples Updated

Updated examples to include health server configuration
Health server runs on port 8080 by default
Accessible at http://localhost:8080/health

Testing

✅ Health server starts successfully alongside indexer
✅ Database connectivity monitoring works
✅ Indexing status monitoring works
✅ No interference with core indexing functionality
✅ Backward compatibility maintained

Closes

Health monitoring requirements #49

…loses joshstevens19#49

- Add new health server module with HTTP endpoint at /health
- Implement health checks for database, indexing, and sync status
- Add health configuration to manifest with default port 8080
- Integrate health server into all CLI start commands (indexer, graphql, all)
- Add axum and port-killer dependencies for HTTP server functionality
- Enable health server by default in new project templates
- Update examples to include health server configuration

The health server provides real-time monitoring of:

- Database connectivity and health
- Indexing service status and active tasks
- Data synchronization status (PostgreSQL tables or CSV files)
- Overall system health with timestamp

All changes are backward compatible: existing manifests without health configuration will use sensible defaults. Health server runs in parallel with existing services without impacting core indexing functionality.

Closes: Health monitoring requirements joshstevens19#49

vercel · 2025-09-04T00:54:37Z

Project	Deployment	Preview	Comments	Updated (UTC)
rindexer-documentation	Ready	Preview	Comment	Sep 26, 2025 10:19am

joshstevens19

Good PR. It just needs some general cleanup to make it more readable, plus a few other changes.

cli/src/commands/start.rs

core/src/health.rs

core/src/manifest/health.rs

core/src/start.rs

refactor: Update health server configuration and checks

- Remove the `enabled` field from `HealthOverrideSettings` in multiple locations.
- Change `status` field in `HealthStatus` to use `HealthStatusType` enum for better type safety.
- Implement detailed health checks for database, indexing, and sync services, returning appropriate health status.
- Add port conflict checks between GraphQL and health servers to prevent runtime issues.
- Update health server initialization to use the new health status checks and ensure it runs only when indexing is active.
- Modify example configurations to reflect changes in health server settings.

These changes enhance the health monitoring capabilities and ensure better integration with existing services.

- Move health configuration to global.health_override_port (always enabled)
- Remove separate health manifest section and HealthSettings struct
- Add port conflict validation between GraphQL and health servers
- Refactor health_handler into smaller, more readable functions
- Optimize database queries using query_one_or_none instead of COUNT(*)
- Use initialize_database function for health server postgres client
- Remove unnecessary comments and improve code organization
- Update CLI to remove health field from manifest creation
- Health server now follows indexer lifecycle (only runs when indexer is running)

Addresses all feedback from PR review:

- Port conflict checking between GraphQL and health servers
- Simplified configuration with health always enabled
- Better code structure and readability
- Performance improvements for database queries
- Proper integration with existing database initialization

ricardo-perello · 2025-09-14T14:33:42Z

Here's the PR text with a small explanation about the SQL query:

PR Review Response - All Feedback Addressed

Hi @joshstevens19! Thanks for the detailed feedback. I've addressed all the issues you highlighted. Here's what I've implemented:

Port Conflict Validation

Issue: Need to cross-check GraphQL vs health server ports to prevent conflicts
Solution: Added port conflict validation in start.rs that checks both ports before starting either service and bails out with a clear error message if they match.
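The check described above amounts to comparing the two configured ports before either server binds. This is a minimal sketch under stated assumptions: the function name `check_port_conflict`, the `Result<(), String>` return type, and the error wording are hypothetical, and the actual code in start.rs may be structured differently.

```rust
/// Returns an error when the GraphQL and health servers are configured
/// to listen on the same port. Intended to run before either one starts.
pub fn check_port_conflict(graphql_port: u16, health_port: u16) -> Result<(), String> {
    if graphql_port == health_port {
        return Err(format!(
            "GraphQL and health servers are both configured to use port {}; \
             set different ports before starting",
            graphql_port
        ));
    }
    Ok(())
}
```

Running the check before startup means a misconfiguration fails immediately with a readable message, rather than surfacing later as an "address already in use" error from whichever server binds second.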

Simplified Configuration Architecture

Issue: Health server configuration was clunky with separate manifest section
Solution:

Moved health configuration to global.health_override_port (always enabled by default)
Removed separate HealthSettings struct and health.rs manifest file
Health server now runs on port 8080 by default, configurable via global settings
Much cleaner and more intuitive configuration

Code Quality Improvements

Issue: Large health_handler function was hard to read
Solution: Refactored into smaller, focused functions:

build_health_status() - constructs the response object
check_database_health() - database connectivity check
check_indexing_health() - indexing service status
check_sync_health() - data synchronization status
determine_overall_status() - calculates overall health
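As an illustration of the last helper, here is one way per-service results could be folded into an overall status. This is a sketch, not the PR's code: the `HealthStatusType` name comes from the commit message, but its variants, the function signature, and the policy (any unhealthy service makes the whole system unhealthy, while `NoData` does not) are all assumptions.

```rust
/// Assumed variants; only "healthy" and "no_data" are named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HealthStatusType {
    Healthy,
    NoData,
    Unhealthy,
}

/// Folds per-service results into one overall status.
/// Assumed policy: a single unhealthy service degrades the whole system;
/// `NoData` (nothing synced yet) is not treated as a failure.
pub fn determine_overall_status(services: &[HealthStatusType]) -> HealthStatusType {
    if services.iter().any(|s| *s == HealthStatusType::Unhealthy) {
        HealthStatusType::Unhealthy
    } else {
        HealthStatusType::Healthy
    }
}
```

Using an enum instead of free-form strings lets the compiler reject misspelled statuses, which is the type-safety benefit the refactor commit refers to.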

Performance Optimizations

Issue: Database queries were inefficient
Solution:

Replaced COUNT(*) queries with query_one_or_none() for better performance
Uses an existence check (SELECT 1 with LIMIT 1) instead of counting all rows
Health server now uses initialize_database() function for proper client setup

Database Sync Health Check: The sync health check uses an optimized SQL query that checks for the existence of user data tables while filtering out system tables:

SELECT 1 FROM information_schema.tables 
WHERE table_schema NOT IN ('information_schema', 'pg_catalog', 'rindexer_internal') 
AND table_name NOT LIKE 'latest_block' 
AND table_name NOT LIKE '%_last_known_%' 
AND table_name NOT LIKE '%_last_run_%' 
LIMIT 1

This efficiently determines if any meaningful user data exists (returns 'healthy') or if the indexer hasn't synced any events yet (returns 'no_data').
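To make the filter easier to reason about, the WHERE clause can be mirrored in plain Rust. This is a sketch for illustration only: `is_user_data_table` and `sync_status` are hypothetical names, and the real check runs in PostgreSQL. One simplification is labeled in the code: underscores are matched literally here, whereas `_` in a SQL LIKE pattern matches any single character.

```rust
/// Mirrors the WHERE clause of the sync health query.
/// Simplification: `_` is treated as a literal underscore, while in
/// SQL LIKE it is a single-character wildcard.
pub fn is_user_data_table(schema: &str, table: &str) -> bool {
    const SYSTEM_SCHEMAS: [&str; 3] = ["information_schema", "pg_catalog", "rindexer_internal"];
    !SYSTEM_SCHEMAS.contains(&schema)
        && table != "latest_block"
        && !table.contains("_last_known_")
        && !table.contains("_last_run_")
}

/// Maps the existence check to the two statuses named in the PR:
/// "healthy" if at least one user data table exists, otherwise "no_data".
pub fn sync_status(tables: &[(&str, &str)]) -> &'static str {
    if tables
        .iter()
        .any(|&(schema, table)| is_user_data_table(schema, table))
    {
        "healthy"
    } else {
        "no_data"
    }
}
```

Like `LIMIT 1` in the query, `any` stops at the first matching table, so the cost does not grow with the number of user tables once one is found.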

Better Integration

Issue: Health server wasn't properly integrated with existing systems
Solution:

Health server follows indexer lifecycle (only runs when indexer is running)
Uses existing initialize_database() function instead of custom postgres setup
Proper error handling and graceful degradation
No interference with core indexing functionality

The health server is now production-ready with a clean, maintainable codebase that follows all the feedback you provided. All the architectural concerns have been addressed while maintaining the core functionality.

examples/rindexer_demo_cli/rindexer.yaml

core/src/health.rs

joshstevens19

Two small comments. We also need to add docs about the health server.

…nfiguration

- Changed the health check query to use a raw string literal for better readability.
- Updated the RPC endpoint in the rindexer demo CLI configuration to a new URL.

These changes improve the clarity of the health check implementation and ensure the demo configuration points to a valid RPC endpoint.

- Added comprehensive health monitoring sections to AWS, GCP, and Railway deployment guides.
- Included details on health server lifecycle, accessing health endpoints, health status types, and monitoring in production.
- Updated CLI documentation to reflect automatic health server startup and configuration options.
- Improved clarity and consistency in health monitoring information across all relevant documentation.

These updates provide users with better insights into the health monitoring capabilities of rindexer and how to effectively utilize them in various deployment environments.

documentation/docs/pages/docs/start-building/running.mdx

- Add comprehensive health monitoring guide in start-building section
- Simplify running.mdx by moving detailed health docs to dedicated page
- Add health monitoring sections to all deployment guides (AWS, Railway, GCP)
- Update CLI documentation with health server information
- Fix health server lifecycle documentation for all start modes
- Add detailed service health check explanations based on implementation
- Update navigation to include health monitoring in start-building section
- Remove standalone monitoring section in favor of integrated approach

Improves onboarding experience by keeping running.mdx focused while providing comprehensive health monitoring documentation when needed.

- Corrected the link to the health monitoring documentation from the old path to the new path in the running.mdx file.
- Ensures users are directed to the correct and updated health monitoring guide.

This change improves the accuracy of documentation references, enhancing the user experience.

- Added the health endpoint to the changelog.
- Consolidated health details initialization in the start command to enhance clarity.
- Simplified struct initialization for `HealthOverrideSettings` across multiple files.
- Improved formatting and organization of code in health server and start modules for better maintainability.

vercel bot deployed to Preview September 4, 2025 00:55

joshstevens19 requested changes Sep 4, 2025

ricardo-perello added 2 commits September 14, 2025 12:40

vercel bot deployed to Preview September 14, 2025 14:24

ricardo-perello marked this pull request as ready for review September 14, 2025 14:33

ricardo-perello requested a review from joshstevens19 September 14, 2025 19:52

joshstevens19 reviewed Sep 23, 2025

examples/rindexer_demo_cli/rindexer.yaml (outdated, resolved)

joshstevens19 reviewed Sep 23, 2025

core/src/health.rs (outdated, resolved)

joshstevens19 requested changes Sep 23, 2025

ricardo-perello added 2 commits September 23, 2025 21:43

ricardo-perello requested a review from joshstevens19 September 23, 2025 20:09

vercel bot deployed to Preview September 23, 2025 20:10

joshstevens19 reviewed Sep 24, 2025

documentation/docs/pages/docs/start-building/running.mdx (resolved)

vercel bot had a problem deploying to Preview September 25, 2025 08:48 Failure

vercel bot deployed to Preview September 25, 2025 09:01

joshstevens19 approved these changes Sep 25, 2025

vercel bot deployed to Preview September 25, 2025 17:06

style: clippy

36afdfb

vercel bot deployed to Preview September 26, 2025 10:19

joshstevens19 merged commit e8ad16e into joshstevens19:master Sep 26, 2025
10 checks passed
