feat(net): expose per-kind reputation-change and ban counters #180
Open
constwz wants to merge 1 commit into
Adds a `ReputationMetrics` struct (scope `network`) with a `Counter` per `ReputationChangeKind` plus three outcome counters (`bans_total`, `disconnect_and_bans_total`, `unbans_total`), and instruments `PeersManager::apply_reputation_change` to increment them.

New Prometheus metrics:

- `network_reputation_changes_bad_message`
- `network_reputation_changes_bad_block`
- `network_reputation_changes_bad_transactions`
- `network_reputation_changes_bad_announcement`
- `network_reputation_changes_already_seen_transaction`
- `network_reputation_changes_timeout`
- `network_reputation_changes_bad_protocol`
- `network_reputation_changes_failed_to_connect`
- `network_reputation_changes_dropped`
- `network_reputation_changes_reset`
- `network_reputation_changes_other`
- `network_bans_total`
- `network_disconnect_and_bans_total`
- `network_unbans_total`

Diagnostic motivation: today, a peer-pool drain induced by reputation bans is invisible at the metrics layer. Banned peers go into the `ban_list`, which has no exposed gauge or counter, and the existing `DisconnectMetrics` counters cannot tell a graceful close apart from a reputation-driven disconnect: both increment `disconnect_requested`.

Concretely this affects BSC node operators investigating the "peers drop to zero after sync" pattern (bnb-chain/reth-bsc#320): without these counters, distinguishing "we are banning peers because of repeated `BadBlock` penalties" from "peers are leaving us for unrelated reasons" requires log inspection. With them, a single PromQL `rate(network_reputation_changes_bad_block[5m])` correlated against `network_connected_peers` makes the diagnosis a Grafana panel.

The kind counter increments before the trusted-peer / unknown-peer guards, so it answers "what's hitting us"; the outcome counters answer "did we punish for it". Trusted-peer exemption is preserved. No behaviour change beyond the new counters.
Summary
Adds per-`ReputationChangeKind` and per-outcome counters to the network's `PeersManager`, so reputation-driven peer drops are visible at the Prometheus layer.
Why
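Today, a peer-pool drain induced by reputation bans is invisible at the metrics layer:

- Banned peers go into the `ban_list`, which has no exposed gauge or counter.
- The existing `DisconnectMetrics` counters cannot tell a graceful close apart from a reputation-driven disconnect: both increment `disconnect_requested`.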
Concretely, this affects BSC node operators investigating the "peers drop to zero after sync" pattern (bnb-chain/reth-bsc#320). Without these counters, distinguishing "we are banning peers because of repeated `BadBlock` penalties" from "peers are leaving us for unrelated reasons" requires log inspection. With them, a single PromQL `rate(network_reputation_changes_bad_block[5m])` correlated against `network_connected_peers` turns the diagnosis into a Grafana panel.
New metrics
```
network_reputation_changes_bad_message
network_reputation_changes_bad_block
network_reputation_changes_bad_transactions
network_reputation_changes_bad_announcement
network_reputation_changes_already_seen_transaction
network_reputation_changes_timeout
network_reputation_changes_bad_protocol
network_reputation_changes_failed_to_connect
network_reputation_changes_dropped
network_reputation_changes_reset
network_reputation_changes_other
network_bans_total
network_disconnect_and_bans_total
network_unbans_total
```
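The eleven per-kind names above follow one pattern: the `network` scope, a `reputation_changes_` prefix, and the kind in snake case. The sketch below illustrates that mapping only. It is not the code in this PR: `Kind` is a simplified stand-in for `ReputationChangeKind` with variant names inferred from the metric names, and `suffix` and `metric_name` are hypothetical helpers.

```rust
/// Simplified stand-in for `ReputationChangeKind` (an assumption: the
/// variants are inferred from the metric names, and the real enum may
/// carry payloads or differ in other ways).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    BadMessage,
    BadBlock,
    BadTransactions,
    BadAnnouncement,
    AlreadySeenTransaction,
    Timeout,
    BadProtocol,
    FailedToConnect,
    Dropped,
    Reset,
    Other,
}

impl Kind {
    /// Every kind, in the order the metrics are listed.
    const ALL: [Kind; 11] = [
        Kind::BadMessage,
        Kind::BadBlock,
        Kind::BadTransactions,
        Kind::BadAnnouncement,
        Kind::AlreadySeenTransaction,
        Kind::Timeout,
        Kind::BadProtocol,
        Kind::FailedToConnect,
        Kind::Dropped,
        Kind::Reset,
        Kind::Other,
    ];

    /// Snake-case suffix used in the metric name (hypothetical helper).
    fn suffix(self) -> &'static str {
        match self {
            Kind::BadMessage => "bad_message",
            Kind::BadBlock => "bad_block",
            Kind::BadTransactions => "bad_transactions",
            Kind::BadAnnouncement => "bad_announcement",
            Kind::AlreadySeenTransaction => "already_seen_transaction",
            Kind::Timeout => "timeout",
            Kind::BadProtocol => "bad_protocol",
            Kind::FailedToConnect => "failed_to_connect",
            Kind::Dropped => "dropped",
            Kind::Reset => "reset",
            Kind::Other => "other",
        }
    }
}

/// Builds the full metric name: scope, prefix, then the kind suffix.
fn metric_name(kind: Kind) -> String {
    format!("network_reputation_changes_{}", kind.suffix())
}
```

Mapping `metric_name` over `Kind::ALL` reproduces the first eleven names in the list. The three outcome counters (`network_bans_total`, `network_disconnect_and_bans_total`, `network_unbans_total`) are not per-kind and fall outside this pattern.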
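Behaviour

- The kind counter increments before the trusted-peer / unknown-peer guards, so it answers "what's hitting us"; the outcome counters answer "did we punish for it".
- Trusted-peer exemption is preserved.
- No behaviour change beyond the new counters.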
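As a rough illustration of the ordering stated in the commit message (kind counter first, then the trusted-peer / unknown-peer guards, then the outcome counters), here is a minimal self-contained sketch. It is not the PR's implementation: `Manager`, `Peer`, `Counters`, `penalty`, the plain `u64` counters, the integer peer ids, and the `BAN_THRESHOLD` value are all hypothetical, and the trusted-peer guard is simplified to an unconditional early return, which may be broader than the real exemption.

```rust
use std::collections::HashMap;

/// Two illustrative kinds only; the real enum has more variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Kind {
    BadBlock,
    Timeout,
}

/// Plain integers standing in for Prometheus counters (assumption).
#[derive(Default)]
struct Counters {
    by_kind: HashMap<Kind, u64>,
    bans_total: u64,
}

struct Peer {
    trusted: bool,
    reputation: i32,
}

/// Hypothetical threshold and penalty weights, chosen for the example.
const BAN_THRESHOLD: i32 = -100;

fn penalty(kind: Kind) -> i32 {
    match kind {
        Kind::BadBlock => -60,
        Kind::Timeout => -10,
    }
}

struct Manager {
    peers: HashMap<u64, Peer>,
    counters: Counters,
}

impl Manager {
    fn apply_reputation_change(&mut self, peer_id: u64, kind: Kind) {
        // 1. Count the kind unconditionally: "what's hitting us".
        *self.counters.by_kind.entry(kind).or_insert(0) += 1;

        // 2. Guards: unknown and trusted peers are not punished.
        let Some(peer) = self.peers.get_mut(&peer_id) else {
            return;
        };
        if peer.trusted {
            return;
        }

        // 3. Apply the penalty and count the outcome: "did we punish for it".
        peer.reputation += penalty(kind);
        if peer.reputation <= BAN_THRESHOLD {
            self.counters.bans_total += 1;
        }
    }
}
```

With this shape, a `BadBlock` report against a trusted or unknown peer still raises the per-kind count but leaves `bans_total` untouched, so the two counter families can diverge. That divergence is what makes them useful side by side on a dashboard.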
Test plan
Refs bnb-chain/reth-bsc#320.