Network metrics per client #7445

iri031 · 2025-05-12T20:18:23Z

Issue Addressed

Which issue # does this PR address?
#7389

Proposed Changes

Metrics for:

Blocks received and imported per client
Sync time per client
Bytes and messages received per client

dapplion · 2025-05-14T20:09:50Z

beacon_node/network/src/network_beacon_processor/gossip_methods.rs

@@ -214,6 +214,19 @@ impl<T: BeaconChainTypes> NetworkBeaconProcessor<T> {
        reprocess_tx: Option<mpsc::Sender<ReprocessQueueMessage>>,
        seen_timestamp: Duration,
    ) {
+        let attestation_size = attestation.as_ssz_bytes().len() as u64;


These calls force us to spend cycles re-serializing a lot of SSZ objects. To track this for free, we should do the tracking in the lighthouse-network layer, though I'm not sure it has access to peer client information. If it isn't possible to track this for free, skip tracking bytes per client for now.

iri031 · 2025-05-25

Moved to lighthouse-network.
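To illustrate the "track it for free" idea, here is a minimal sketch that counts bytes from the length of the buffer already received off the wire, so nothing is serialized again. `BytesPerClient` and its methods are hypothetical names, and the `HashMap` is only a stand-in for the real metrics counter; this is not the code from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a counter labelled by client name.
#[derive(Default)]
pub struct BytesPerClient {
    totals: HashMap<String, u64>,
}

impl BytesPerClient {
    /// Add the size of a message using the raw bytes that were already
    /// received, instead of re-encoding the decoded object to measure it.
    pub fn record(&mut self, client: &str, raw_message: &[u8]) {
        *self.totals.entry(client.to_string()).or_insert(0) += raw_message.len() as u64;
    }

    /// Total bytes recorded for a client, or 0 if none were seen.
    pub fn total(&self, client: &str) -> u64 {
        self.totals.get(client).copied().unwrap_or(0)
    }
}
```

The point of the sketch is only where the measurement happens: at the layer that still holds the encoded bytes, taking `len()` is a constant-time read.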

dapplion · 2025-05-14T20:10:42Z

beacon_node/network/src/network_beacon_processor/gossip_methods.rs

+
+        metrics::inc_counter_vec(
+            &metrics::MESSAGES_RECEIVED_PER_CLIENT,
+            &[&client]


You should label this metric by [client, object_type], where object_type could be:

gossip_attestation

gossip_block

range_sync_block

backfill_sync_block

etc
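As a rough sketch of what labelling by [client, object_type] buys, the counter below keys on both values so that gossip and sync traffic from the same client are counted separately. `MessagesReceived` is a hypothetical std-only stand-in for the real counter vec, and the label strings are the ones suggested above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a counter vec with two labels.
#[derive(Default)]
pub struct MessagesReceived {
    counts: HashMap<(String, &'static str), u64>,
}

impl MessagesReceived {
    /// Increment the counter for one (client, object_type) label pair.
    pub fn inc(&mut self, client: &str, object_type: &'static str) {
        *self.counts.entry((client.to_string(), object_type)).or_insert(0) += 1;
    }

    /// Current value for a label pair, or 0 if it was never incremented.
    pub fn get(&self, client: &str, object_type: &'static str) -> u64 {
        self.counts
            .get(&(client.to_string(), object_type))
            .copied()
            .unwrap_or(0)
    }
}
```

Using `&'static str` for the object type keeps the label set fixed at compile time, which helps bound the number of time series.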

dapplion · 2025-05-14T20:12:33Z

beacon_node/network/src/network_beacon_processor/gossip_methods.rs

@@ -241,6 +254,21 @@ impl<T: BeaconChainTypes> NetworkBeaconProcessor<T> {
        packages: GossipAttestationBatch<T::EthSpec>,
        reprocess_tx: Option<mpsc::Sender<ReprocessQueueMessage>>,
    ) {
+        for package in &packages {


You can move attestation metrics to process_gossip_attestation_result and only track valid objects

dapplion · 2025-05-14T20:14:31Z

beacon_node/network/src/network_beacon_processor/gossip_methods.rs

+        metrics::inc_counter_vec(
+            &metrics::MESSAGES_RECEIVED_PER_CLIENT,
+            &[&client]
+        );


You could add a helper method

    fn track_messages_received_per_client(&self, peer_id: PeerId, object_type: &'static str) {
        let client = self.network_globals.client(&peer_id).kind.to_string();
        metrics::inc_counter_vec(
            &metrics::MESSAGES_RECEIVED_PER_CLIENT,
            &[&client, object_type]
        );
    }

to de-duplicate a lot of code

dapplion · 2025-05-14T20:16:23Z

beacon_node/lighthouse_network/src/peer_manager/peerdb.rs

@@ -169,6 +169,11 @@ impl<E: EthSpec> PeerDB<E> {
        }
    }

+    /// Returns the sync start time of the peer if it exists
+    pub fn sync_start_time(&self, peer_id: &PeerId) -> Option<&Instant> {
+        self.peers.get(peer_id).and_then(|info| info.sync_start_time())


I don't follow the meaning/purpose of sync start time, could you detail it?

iri031 · 2025-06-01

I added this to calculate sync time per client. When a sync request is received, I update this field to record when the sync started. Later, when the peer's state changes to is_synced, I compute the time taken and include it in the metric.
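The bookkeeping described here can be sketched as follows. `SyncTimer`, its method names, and the `u64` peer key are all hypothetical; the real code stores the start time on the peer's info in `PeerDB`. Keeping the first start time when several sync requests arrive is an assumption of this sketch, not something the PR states.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical tracker for per-peer sync durations.
#[derive(Default)]
pub struct SyncTimer {
    // Keyed by a stand-in peer id; the real code would use PeerId.
    started: HashMap<u64, Instant>,
}

impl SyncTimer {
    /// Record when a sync with this peer began. A later call for the same
    /// peer does not reset the start time (assumption of this sketch).
    pub fn on_sync_started(&mut self, peer: u64, now: Instant) {
        self.started.entry(peer).or_insert(now);
    }

    /// Called when the peer becomes synced. Returns the elapsed time and
    /// clears the entry, or None if no start was recorded.
    pub fn on_synced(&mut self, peer: u64, now: Instant) -> Option<Duration> {
        self.started
            .remove(&peer)
            .map(|start| now.saturating_duration_since(start))
    }
}
```

The returned `Duration` is what would be observed into a per-client histogram. Passing `now` in explicitly keeps the logic testable without sleeping.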

dapplion · 2025-05-14T20:17:31Z

beacon_node/network/src/sync/network_context.rs

+            .peer_info(peer_id)
+            .map(|info| info.client().version.clone())
+            .unwrap_or_default()
+    }


You have a single metric, not one per client type and one for client version. On mainnet, is the version field set for most clients? Could you test this and report back? If so, what is it set to?

iri031 · 2025-06-01

How can I test this on mainnet?

gitToki · 2025-06-07

One potential way to do it would be to test directly on mainnet by adding temporary logging to your Lighthouse client.

First, add the analysis to network_context:

    // Add to SyncNetworkContext impl
    pub fn analyze_peer_versions_sample(&self) -> (usize, usize, Vec<String>) {
        let peers: Vec<PeerId> = self.network_globals()
            .peers
            .read()
            .connected_peers()
            .take(50) // sample 50 but you can adjust that
            .cloned()
            .collect();

        let mut with_version = 0;
        let mut without_version = 0;
        let mut examples = Vec::new();

        for peer_id in peers {
            let version = self.client_version(&peer_id);
            let client_type = self.client_type(&peer_id);
            if version.is_empty() {
                without_version += 1;
            } else {
                with_version += 1;
                if examples.len() < 8 {
                    examples.push(format!("{}: {}", client_type, version));
                }
            }
        }

        (with_version, without_version, examples)
    }

then call this function every X min in sync manager:

    // Add this in main sync loop or network service
    let mut last_version_check = Instant::now();

    // Inside your main loop:
    if last_version_check.elapsed() > Duration::from_secs(300) { // here I took 5 min
        let (with_ver, without_ver, examples) = self.network_context.analyze_peer_versions_sample();
        let total = with_ver + without_ver;
        let percentage = if total > 0 {
            (with_ver as f64 / total as f64) * 100.0
        } else {
            0.0
        };
        info!("MAINNET VERSION TEST: {}/{} peers have versions ({:.1}%). Examples: {:?}", with_ver, total, percentage, examples);
        last_version_check = Instant::now();
    }

and then build and enable the log to get the info

    cargo build --release
    ./target/release/lighthouse bn \
      --network mainnet \
      --datadir /your/datadir \
      --http \
      --log-level info \
      2>&1 | tee mainnet_version_test.log

You should get something like: MAINNET VERSION TEST: 42/50 peers have versions (84.0%). Examples: ["xx"]

There are around 8200 CL nodes that you can communicate with on mainnet, so adjusting the sample size might be a good idea.

Another way would be to use a crawler: use discv5 to discover nodes (starting from the boot nodes), then connect to the discovered nodes over libp2p and look at the version field. The first option might be better, but it depends.

gitToki · 2025-06-08

Hi again, I modified my crawler to collect the data we needed and have had it running for the past few hours. Here is what I got, @dapplion.

The "Unknown" client names are the result of parsing failures. They make the data inaccurate, my bad for that.

For better data you can look here -> https://monitoreth.io/nodes
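Since the "Unknown" entries come from parsing failures, here is a hedged sketch of a tolerant parser for agent strings. It assumes the strings are slash-separated with a version segment that starts with `v` and a digit (for example `Name/v1.2.3-abcdef/platform`); that shape is an assumption, not every client follows it, and `parse_agent` and `version_coverage` are hypothetical helpers rather than the crawler's code.

```rust
/// Split an agent string into (client name, optional version).
/// Assumes a slash-separated layout; anything unparsable yields "Unknown".
pub fn parse_agent(agent: &str) -> (String, Option<String>) {
    let mut parts = agent.split('/').map(str::trim).filter(|p| !p.is_empty());
    let name = match parts.next() {
        Some(n) => n.to_string(),
        None => return ("Unknown".to_string(), None),
    };
    // Take the first later segment that looks like "v<digit>...".
    let version = parts
        .find(|p| {
            let mut chars = p.chars();
            chars.next() == Some('v') && chars.next().map_or(false, |c| c.is_ascii_digit())
        })
        .map(str::to_string);
    (name, version)
}

/// Percentage of agent strings for which a version could be extracted.
pub fn version_coverage(agents: &[&str]) -> f64 {
    if agents.is_empty() {
        return 0.0;
    }
    let with_version = agents
        .iter()
        .filter(|agent| parse_agent(agent).1.is_some())
        .count();
    with_version as f64 / agents.len() as f64 * 100.0
}
```

Keeping the name even when no version is found separates "client known, version missing" from a true "Unknown", which makes the coverage number easier to interpret.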


dapplion · 2025-05-26T22:03:39Z

FYI @iri031, there are still some unaddressed comments.

initial instrumentation

de962e6

iri031 requested a review from jxs as a code owner May 12, 2025 20:18

iri031 changed the title ~~initial instrumentation~~ Network metrics per client May 12, 2025

minor fix

9221645

dapplion reviewed May 14, 2025


iri031 added 4 commits May 20, 2025 21:52

add bytes received metrics in lighthouse-network

d5690b8

remove code duplication

caac880

Merge branch 'unstable' of https://github.com/iri031/lighthouse into …

85bd42f

…metrics-by-client

refactor

fb590e1

iri031 added 3 commits June 1, 2025 08:35

remove de-dup

4cfddbb

fix

46a31bb

refactor

817d88b
