fix: health endpoint now detects stopped block production
This commit fixes issue #2643 where the health endpoint still reports
OK when a node has stopped producing blocks.
Changes:
- Updated HealthServer to accept store, config, and logger dependencies
- Implemented block production monitoring in the Livez endpoint:
  * For aggregator nodes, checks if LastBlockTime is recent
  * Returns WARN if block production is slow (> 3x block time)
  * Returns FAIL if block production has stopped (> 5x block time)
  * Uses LazyBlockInterval for lazy mode aggregators
  * Non-aggregator nodes continue to return PASS
- Added constants for health check thresholds:
  * healthCheckWarnMultiplier = 3
  * healthCheckFailMultiplier = 5
- Added comprehensive unit tests covering all scenarios:
  Server tests (pkg/rpc/server/server_test.go):
    * Non-aggregator nodes
    * Aggregator with no blocks
    * Aggregator with recent blocks (PASS)
    * Aggregator with slow production (WARN)
    * Aggregator with stopped production (FAIL)
    * Lazy aggregator with correct thresholds
    * Error handling
  Client tests (pkg/rpc/client/client_test.go):
    * Non-aggregator returns PASS
    * Aggregator with recent blocks returns PASS
    * Aggregator with slow block production returns WARN
    * Aggregator with stopped block production returns FAIL
- Updated setupTestServer to pass new dependencies
- Added createCustomTestServer helper for testing with custom configs
The thresholds are derived from the node's BlockTime (or
LazyBlockInterval for lazy mode aggregators), so the health check
adapts to each node's configuration.
Fixes #2643