
Lots of Prometheus metrics have the wrong type #978

Open
@firecow

Description

What version of nebula are you using?

1.7.2

What operating system are you using?

Linux

Describe the Bug

While scraping Nebula's metrics, I found that many metric entries have incorrect metric types.

curl localhost:9102/metrics
...
# HELP nebula_firewall_incoming_dropped_local_ip firewall.incoming.dropped.local_ip
# TYPE nebula_firewall_incoming_dropped_local_ip gauge
nebula_firewall_incoming_dropped_local_ip 0
# HELP nebula_firewall_incoming_dropped_no_rule firewall.incoming.dropped.no_rule
# TYPE nebula_firewall_incoming_dropped_no_rule gauge
nebula_firewall_incoming_dropped_no_rule 1693
# HELP nebula_firewall_incoming_dropped_remote_ip firewall.incoming.dropped.remote_ip
# TYPE nebula_firewall_incoming_dropped_remote_ip gauge
nebula_firewall_incoming_dropped_remote_ip 0
# HELP nebula_firewall_outgoing_dropped_local_ip firewall.outgoing.dropped.local_ip
# TYPE nebula_firewall_outgoing_dropped_local_ip gauge
nebula_firewall_outgoing_dropped_local_ip 37005
# HELP nebula_firewall_outgoing_dropped_no_rule firewall.outgoing.dropped.no_rule
# TYPE nebula_firewall_outgoing_dropped_no_rule gauge
nebula_firewall_outgoing_dropped_no_rule 0
# HELP nebula_firewall_outgoing_dropped_remote_ip firewall.outgoing.dropped.remote_ip
# TYPE nebula_firewall_outgoing_dropped_remote_ip gauge
nebula_firewall_outgoing_dropped_remote_ip 0
...
# HELP nebula_handshake_manager_initiated handshake_manager.initiated
# TYPE nebula_handshake_manager_initiated gauge
nebula_handshake_manager_initiated 129
# HELP nebula_handshake_manager_timed_out handshake_manager.timed_out
# TYPE nebula_handshake_manager_timed_out gauge
nebula_handshake_manager_timed_out 85
...
# HELP nebula_messages_rx_recv_error messages.rx.recv_error
# TYPE nebula_messages_rx_recv_error gauge
nebula_messages_rx_recv_error 51
# HELP nebula_messages_tx_punchy messages.tx.punchy
# TYPE nebula_messages_tx_punchy gauge
nebula_messages_tx_punchy 1.359801e+06
# HELP nebula_messages_tx_recv_error messages.tx.recv_error
# TYPE nebula_messages_tx_recv_error gauge
nebula_messages_tx_recv_error 39
# HELP nebula_network_packets_duplicate network.packets.duplicate
# TYPE nebula_network_packets_duplicate gauge
nebula_network_packets_duplicate 0
# HELP nebula_network_packets_lost network.packets.lost
# TYPE nebula_network_packets_lost gauge
nebula_network_packets_lost 765710
# HELP nebula_network_packets_out_of_window network.packets.out_of_window
# TYPE nebula_network_packets_out_of_window gauge
nebula_network_packets_out_of_window 0
...

At least these metrics need to be changed from gauge to counter, since they are cumulative event counts that only ever go up.

The HELP lines aren't very helpful either 😄, but I can live with that.

Some parts of the code already know that certain metrics are counters (https://github.com/slackhq/nebula/blob/master/bits.go#L13), but for some reason this information is not propagated properly to the Prometheus output.
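One possible shape of a fix, sketched under assumptions: if the registry already holds distinct counter and gauge objects (as the counter fields referenced in bits.go suggest), the exporter can choose the TYPE with a type switch instead of emitting `gauge` for everything. `Counter`, `Gauge`, `promName` and `export` below are simplified, hypothetical stand-ins, not Nebula's or any metrics library's actual types, and `example.active_tunnels` is a made-up gauge name used only for contrast.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Counter and Gauge are hypothetical, simplified stand-ins for the metric
// objects a registry might hold; they are NOT Nebula's real types.
type Counter struct{ n int64 }

func (c *Counter) Inc(d int64)  { c.n += d }
func (c *Counter) Count() int64 { return c.n }

type Gauge struct{ v int64 }

func (g *Gauge) Update(v int64) { g.v = v }
func (g *Gauge) Value() int64   { return g.v }

// promName turns a dotted name like "firewall.incoming.dropped.no_rule"
// into "nebula_firewall_incoming_dropped_no_rule" for namespace "nebula".
func promName(namespace, name string) string {
	return namespace + "_" + strings.ReplaceAll(name, ".", "_")
}

// export renders every registered metric, taking the TYPE from the Go type
// of the metric instead of reporting everything as a gauge.
func export(namespace string, registry map[string]any) string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output order

	var sb strings.Builder
	for _, name := range names {
		pn := promName(namespace, name)
		switch m := registry[name].(type) {
		case *Counter:
			fmt.Fprintf(&sb, "# TYPE %s counter\n%s %d\n", pn, pn, m.Count())
		case *Gauge:
			fmt.Fprintf(&sb, "# TYPE %s gauge\n%s %d\n", pn, pn, m.Value())
		default:
			// Other metric kinds are skipped in this sketch.
		}
	}
	return sb.String()
}

func main() {
	dropped := &Counter{}
	dropped.Inc(1693)
	tunnels := &Gauge{} // hypothetical gauge, for contrast
	tunnels.Update(3)

	fmt.Print(export("nebula", map[string]any{
		"firewall.incoming.dropped.no_rule": dropped,
		"example.active_tunnels":            tunnels,
	}))
}
```

With this approach a counter keeps its `counter` type end to end, so Prometheus functions meant for counters, such as `rate()` and `increase()`, behave as intended.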

Logs from affected hosts

No response

Config files from affected hosts

No response

Metadata

Assignees

No one assigned

Labels

NeedsDecision: Feedback is required from experts, contributors, and/or the community before a change can be made.
