@ricardo-perello
Contributor

feat: Add health monitoring server with comprehensive system checks, closes #49

Overview

This PR adds a health monitoring server to rindexer. It exposes an HTTP endpoint that reports the real-time status of the database, the indexing service, and data synchronization.

Features Added

Health Server Module

  • New health server module with HTTP endpoint at /health
  • Real-time monitoring of database, indexing, and sync status
  • Health configuration in manifest with default port 8080
  • Integration into all CLI start commands (indexer, graphql, all)

Health Checks Implemented

  • Database connectivity and health - Verifies PostgreSQL connection status
  • Indexing service status - Monitors active indexing tasks and service state
  • Data synchronization status - Checks PostgreSQL tables or CSV files for data presence
  • Overall system health - Provides timestamped health status with service breakdown

Dependencies Added

  • axum - HTTP server framework for the health endpoint

Configuration

  • Health server enabled by default in new project templates
  • Backward compatible - existing manifests without health configuration use sensible defaults
  • Parallel operation - runs alongside existing services without impacting core indexing functionality

Technical Details

Health Endpoint Response

{
  "status": "healthy",
  "timestamp": "2025-09-04T02:48:40.370651Z",
  "services": {
    "database": "healthy",
    "indexing": "healthy", 
    "sync": "healthy"
  },
  "indexing": {
    "active_tasks": 1,
    "is_running": true
  }
}

Health Server Architecture

  • Isolated postgres client - Health server creates its own database connection for monitoring without interfering with main indexer operations
  • Non-blocking operation - Runs in separate tokio task alongside GraphQL server
  • Graceful error handling - The health server keeps running even if individual health checks fail
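
The architecture points above can be illustrated with a minimal sketch. It deliberately swaps in std::thread and std::net::TcpListener for the tokio task and axum router this PR uses, so it runs with the standard library alone. The function names (health_body, handle, spawn_health_server) and the fixed JSON body are hypothetical and are not taken from the PR's code.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Builds a fixed JSON body. A real handler would compute these values
/// from the database, indexing, and sync checks.
fn health_body() -> String {
    String::from(r#"{"status":"healthy"}"#)
}

/// Answers one HTTP request: 200 with JSON for `GET /health`, 404 otherwise.
fn handle(mut stream: TcpStream) -> std::io::Result<()> {
    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf)?;
    let request = String::from_utf8_lossy(&buf[..n]);
    let (status, body) = if request.starts_with("GET /health ") {
        ("200 OK", health_body())
    } else {
        ("404 Not Found", String::from("{}"))
    };
    let response = format!(
        "HTTP/1.1 {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        status,
        body.len(),
        body
    );
    stream.write_all(response.as_bytes())
}

/// Binds the listener, then serves connections on a background thread so
/// the caller (the indexer, in this analogy) is never blocked.
fn spawn_health_server(addr: &str) -> std::io::Result<SocketAddr> {
    let listener = TcpListener::bind(addr)?;
    let local_addr = listener.local_addr()?;
    thread::spawn(move || {
        for conn in listener.incoming() {
            // A failed request is logged and skipped; the server keeps running.
            if let Err(e) = conn.and_then(handle) {
                eprintln!("health request failed: {}", e);
            }
        }
    });
    Ok(local_addr)
}
```

Calling spawn_health_server("127.0.0.1:8080") returns as soon as the port is bound, and requests are then answered on the background thread. Binding before spawning means a port that is already in use surfaces as an error to the caller instead of failing silently inside the thread.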

Bug Fixes

  • Fixed database race condition - Resolved issue where health server was interfering with main indexer's database schema setup
  • Proper client isolation - Health server now only creates monitoring client without calling setup_postgres()

Examples Updated

  • Updated examples to include health server configuration
  • Health server runs on port 8080 by default
  • Accessible at http://localhost:8080/health

Testing

  • ✅ Health server starts successfully alongside indexer
  • ✅ Database connectivity monitoring works
  • ✅ Indexing status monitoring works
  • ✅ No interference with core indexing functionality
  • ✅ Backward compatibility maintained

Closes

Health monitoring requirements #49

…loses joshstevens19#49

- Add new health server module with HTTP endpoint at /health
- Implement health checks for database, indexing, and sync status
- Add health configuration to manifest with default port 8080
- Integrate health server into all CLI start commands (indexer, graphql, all)
- Add axum and port-killer dependencies for HTTP server functionality
- Enable health server by default in new project templates
- Update examples to include health server configuration

The health server provides real-time monitoring of:
- Database connectivity and health
- Indexing service status and active tasks
- Data synchronization status (PostgreSQL tables or CSV files)
- Overall system health with timestamp

All changes are backward compatible - existing manifests without health
configuration will use sensible defaults. Health server runs in parallel
with existing services without impacting core indexing functionality.

Closes: Health monitoring requirements joshstevens19#49

Owner

@joshstevens19 joshstevens19 left a comment

good PR just need some general cleaning up / making it more readable and a few other changes

refactor: Update health server configuration and checks

- Remove the `enabled` field from `HealthOverrideSettings` in multiple locations.
- Change `status` field in `HealthStatus` to use `HealthStatusType` enum for better type safety.
- Implement detailed health checks for database, indexing, and sync services, returning appropriate health status.
- Add port conflict checks between GraphQL and health servers to prevent runtime issues.
- Update health server initialization to use the new health status checks and ensure it runs only when indexing is active.
- Modify example configurations to reflect changes in health server settings.

These changes enhance the health monitoring capabilities and ensure better integration with existing services.
- Move health configuration to global.health_override_port (always enabled)
- Remove separate health manifest section and HealthSettings struct
- Add port conflict validation between GraphQL and health servers
- Refactor health_handler into smaller, more readable functions
- Optimize database queries using query_one_or_none instead of COUNT(*)
- Use initialize_database function for health server postgres client
- Remove unnecessary comments and improve code organization
- Update CLI to remove health field from manifest creation
- Health server now follows indexer lifecycle (only runs when indexer is running)

Addresses all feedback from PR review:
- Port conflict checking between GraphQL and health servers
- Simplified configuration with health always enabled
- Better code structure and readability
- Performance improvements for database queries
- Proper integration with existing database initialization
@ricardo-perello
Contributor Author

PR Review Response - All Feedback Addressed

Hi @joshstevens19! Thanks for the detailed feedback. I've addressed all the issues you highlighted. Here's what I've implemented:

Port Conflict Validation

Issue: Need to cross-check GraphQL vs health server ports to prevent conflicts
Solution: Added port conflict validation in start.rs that checks both ports before starting either service and bails out with a clear error message if they match.
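
A minimal sketch of this kind of check, assuming both ports are known as plain u16 values before either service starts. The helper name validate_ports and the error wording are hypothetical and are not copied from start.rs.

```rust
/// Hypothetical helper: fails when both services would bind the same port.
/// Intended to be called before either server is started.
fn validate_ports(graphql_port: u16, health_port: u16) -> Result<(), String> {
    if graphql_port == health_port {
        return Err(format!(
            "GraphQL and health servers are both configured for port {}; choose different ports",
            graphql_port
        ));
    }
    Ok(())
}
```

With distinct ports, such as validate_ports(3001, 8080), the call returns Ok(()). With equal ports it returns an Err whose message names the conflicting port, which the caller can surface before exiting.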

Simplified Configuration Architecture

Issue: Health server configuration was clunky with separate manifest section
Solution:

  • Moved health configuration to global.health_override_port (the health server is always enabled)
  • Removed separate HealthSettings struct and health.rs manifest file
  • Health server now runs on port 8080 by default, configurable via global settings
  • Much cleaner and more intuitive configuration
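
The port resolution described above might look roughly like the following sketch. It assumes the override is an optional u16. The GlobalSettings struct here is a hypothetical stand-in that carries only the relevant field, and the real rindexer manifest types may be shaped differently.

```rust
/// Default health port stated in this PR.
const DEFAULT_HEALTH_PORT: u16 = 8080;

/// Hypothetical stand-in for the manifest's `global` section.
struct GlobalSettings {
    /// Assumed to be optional: absent in manifests that predate this PR.
    health_override_port: Option<u16>,
}

/// Uses the override when present, otherwise falls back to the default.
fn resolve_health_port(global: &GlobalSettings) -> u16 {
    global.health_override_port.unwrap_or(DEFAULT_HEALTH_PORT)
}
```

Modelling the field as an Option is what keeps existing manifests working in this sketch: a manifest with no override still resolves to 8080.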

Code Quality Improvements

Issue: Large health_handler function was hard to read
Solution: Refactored into smaller, focused functions:

  • build_health_status() - constructs the response object
  • check_database_health() - database connectivity check
  • check_indexing_health() - indexing service status
  • check_sync_health() - data synchronization status
  • determine_overall_status() - calculates overall health
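
To illustrate the last step, here is a hedged sketch of how per-service results could be combined into one overall status. The enum name HealthStatusType and the "no_data" status appear in this PR, but the variant list and the aggregation rule below (any unhealthy service makes the whole system unhealthy, while missing data does not) are assumptions and may not match the merged code.

```rust
/// Assumed variants; the PR only confirms that "healthy" and "no_data" exist.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HealthStatusType {
    Healthy,
    Unhealthy,
    NoData,
}

/// Hypothetical aggregation rule: one unhealthy service marks the whole
/// system unhealthy; `NoData` alone is not treated as a failure.
fn overall_status(services: &[HealthStatusType]) -> HealthStatusType {
    if services.iter().any(|s| *s == HealthStatusType::Unhealthy) {
        HealthStatusType::Unhealthy
    } else {
        HealthStatusType::Healthy
    }
}
```

Keeping the rule in a pure function like this makes it easy to unit test without a database or a running indexer.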

Performance Optimizations

Issue: Database queries were inefficient
Solution:

  • Replaced COUNT(*) queries with query_one_or_none() for better performance
  • Checks for existence by fetching at most one row (SELECT 1 ... LIMIT 1) instead of counting all rows
  • Health server now uses initialize_database() function for proper client setup

Database Sync Health Check: The sync health check uses an optimized SQL query that checks for the existence of user data tables while filtering out system tables:

SELECT 1 FROM information_schema.tables 
WHERE table_schema NOT IN ('information_schema', 'pg_catalog', 'rindexer_internal') 
AND table_name NOT LIKE 'latest_block' 
AND table_name NOT LIKE '%_last_known_%' 
AND table_name NOT LIKE '%_last_run_%' 
LIMIT 1

This efficiently determines if any meaningful user data exists (returns 'healthy') or if the indexer hasn't synced any events yet (returns 'no_data').
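
The mapping from query result to status can be sketched as below. The query outcome is modelled as a generic Result<Option<R>, E>, which is the usual shape of a "fetch at most one row" call. The SyncStatus enum and the function name are hypothetical, and treating a failed query as unhealthy is an assumption that the PR text does not spell out.

```rust
/// Hypothetical status type for the sync check.
#[derive(Debug, PartialEq, Eq)]
enum SyncStatus {
    Healthy,
    NoData,
    Unhealthy,
}

/// Maps the outcome of an at-most-one-row query to a sync status.
/// Ok(Some(_)): a user data table exists. Ok(None): nothing matched.
fn sync_status_from_row<R, E>(row: Result<Option<R>, E>) -> SyncStatus {
    match row {
        Ok(Some(_)) => SyncStatus::Healthy,
        Ok(None) => SyncStatus::NoData,
        // Assumption: a query error is reported as unhealthy.
        Err(_) => SyncStatus::Unhealthy,
    }
}
```

Because the query stops at the first matching row, the cost of the check does not grow with the amount of indexed data.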

Better Integration

Issue: Health server wasn't properly integrated with existing systems
Solution:

  • Health server follows indexer lifecycle (only runs when indexer is running)
  • Uses existing initialize_database() function instead of custom postgres setup
  • Proper error handling and graceful degradation
  • No interference with core indexing functionality

The health server is now production-ready with a clean, maintainable codebase that follows all the feedback you provided. All the architectural concerns have been addressed while maintaining the core functionality.

@ricardo-perello ricardo-perello marked this pull request as ready for review September 14, 2025 14:33
Owner

@joshstevens19 joshstevens19 left a comment

2 small comments and also need to add docs about health etc

…nfiguration

- Changed the health check query to use a raw string literal for better readability.
- Updated the RPC endpoint in the rindexer demo CLI configuration to a new URL.

These changes improve the clarity of the health check implementation and ensure the demo configuration points to a valid RPC endpoint.
- Added comprehensive health monitoring sections to AWS, GCP, and Railway deployment guides.
- Included details on health server lifecycle, accessing health endpoints, health status types, and monitoring in production.
- Updated CLI documentation to reflect automatic health server startup and configuration options.
- Improved clarity and consistency in health monitoring information across all relevant documentation.

These updates provide users with better insights into the health monitoring capabilities of rindexer and how to effectively utilize them in various deployment environments.
- Add comprehensive health monitoring guide in start-building section
- Simplify running.mdx by moving detailed health docs to dedicated page
- Add health monitoring sections to all deployment guides (AWS, Railway, GCP)
- Update CLI documentation with health server information
- Fix health server lifecycle documentation for all start modes
- Add detailed service health check explanations based on implementation
- Update navigation to include health monitoring in start-building section
- Remove standalone monitoring section in favor of integrated approach

Improves onboarding experience by keeping running.mdx focused while providing
comprehensive health monitoring documentation when needed.
- Corrected the link to the health monitoring documentation from the old path to the new path in the running.mdx file.
- Ensures users are directed to the correct and updated health monitoring guide.

This change improves the accuracy of documentation references, enhancing the user experience.
- Added the health endpoint to the changelog.

- Consolidated health details initialization in the start command to enhance clarity.
- Simplified struct initialization for `HealthOverrideSettings` across multiple files.
- Improved formatting and organization of code in health server and start modules for better maintainability.
@joshstevens19 joshstevens19 merged commit e8ad16e into joshstevens19:master Sep 26, 2025
10 checks passed

Development

Successfully merging this pull request may close these issues.

expose GET /health endpoint

2 participants