Infiniband metrics: still not collected when irdma is loaded (PE 1.7.0) #2846

Open
@mtds

Host operating system: output of uname -a

Linux (...) 4.18.0-477.27.1.el8_8.x86_64 #1 SMP Wed Sep 20 15:55:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Host operating system: Rocky Linux 8.8

node_exporter version: output of node_exporter --version

~$ node_exporter --version
node_exporter, version 1.7.0 (branch: HEAD, revision: 7333465abf9efba81876303bb57e6fadb946041b)
  build user:       root@35918982f6d8
  build date:       20231112-23:53:35
  go version:       go1.21.4
  platform:         linux/amd64
  tags:             netgo osusergo static_build

node_exporter command line flags

--no-collector.arp --collector.netdev.device-include=ib0 \
--collector.textfile.directory /var/lib/prometheus/node-exporter/textfile_collector \
--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|run|cvmfs|d|u|lustre|WWW|etc|misc)($|/)

node_exporter log output

  • At launch, on a test run listening on a non-default port:
~# node_exporter --web.disable-exporter-metrics --web.listen-address=":9111" --log.level=debug --collector.disable-defaults --collector.infiniband
ts=2023-11-13T08:50:17.916Z caller=node_exporter.go:192 level=info msg="Starting node_exporter" version="(version=1.7.0, branch=HEAD, revision=7333465abf9efba81876303bb57e6fadb946041b)"
ts=2023-11-13T08:50:17.916Z caller=node_exporter.go:193 level=info msg="Build context" build_context="(go=go1.21.4, platform=linux/amd64, user=root@35918982f6d8, date=20231112-23:53:35, tags=netgo osusergo static_build)"
ts=2023-11-13T08:50:17.916Z caller=node_exporter.go:195 level=warn msg="Node Exporter is running as root user. This exporter is designed to run as unprivileged user, root is not required."
ts=2023-11-13T08:50:17.916Z caller=node_exporter.go:198 level=debug msg="Go MAXPROCS" procs=1
ts=2023-11-13T08:50:17.916Z caller=node_exporter.go:110 level=info msg="Enabled collectors"
ts=2023-11-13T08:50:17.916Z caller=node_exporter.go:117 level=info collector=infiniband
ts=2023-11-13T08:50:17.923Z caller=tls_config.go:274 level=info msg="Listening on" address=0.0.0.0:9111
ts=2023-11-13T08:50:17.923Z caller=tls_config.go:277 level=info msg="TLS is disabled." http2=false address=0.0.0.0:9111

Are you running node_exporter in Docker?

No.

What did you do that produced an error?

No error is reported: the exporter starts normally but does not collect any IB metrics while the irdma module is loaded (see the following sections).

What did you expect to see?

When the irdma module is not loaded, Node Exporter correctly collects and reports IB metrics:

ts=2023-11-14T10:56:03.868Z caller=node_exporter.go:78 level=debug msg="collect query:" filters="unsupported value type"
ts=2023-11-14T10:56:03.874Z caller=collector.go:173 level=debug msg="collector succeeded" name=infiniband duration_seconds=0.006788827

What did you see instead?

Infiniband metrics are not collected when the irdma module is loaded:

(...)
ts=2023-11-13T08:50:33.312Z caller=node_exporter.go:78 level=debug msg="collect query:" filters="unsupported value type"
ts=2023-11-13T08:50:33.312Z caller=infiniband_linux.go:119 level=debug collector=infiniband msg="infiniband statistics not found, skipping"
ts=2023-11-13T08:50:33.313Z caller=collector.go:167 level=debug msg="collector returned no data" name=infiniband duration_seconds=0.000573153 err="collector returned no data"
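
To help narrow this down, it may be useful to compare what sysfs exposes with and without irdma loaded. The following is only a diagnostic sketch: it assumes the RDMA devices show up under /sys/class/infiniband with a ports/<n>/counters layout, and the function name ib_port_report is made up for this example. It does not claim to mirror what the collector checks internally.

```shell
#!/bin/sh
# List every device/port under a sysfs-style root and report whether a
# "counters" directory exists for that port.
# Assumption: /sys/class/infiniband is the relevant root on this host.
ib_port_report() {
    root="${1:-/sys/class/infiniband}"
    if [ ! -d "$root" ]; then
        echo "no such directory: $root"
        return 1
    fi
    for port in "$root"/*/ports/*; do
        # Skip the unexpanded glob when nothing matches.
        [ -d "$port" ] || continue
        dev=$(basename "$(dirname "$(dirname "$port")")")
        if [ -d "$port/counters" ]; then
            echo "$dev port $(basename "$port"): counters present"
        else
            echo "$dev port $(basename "$port"): counters missing"
        fi
    done
}
```

Running ib_port_report once with irdma loaded and once after unloading it should show whether the set of devices or their counters directories differs between the two cases.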

Workaround

  • Explicitly unload the irdma module:
modprobe -r irdma
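
If the workaround is scripted (for example at boot), a guard like the one below avoids calling modprobe when the module is not loaded. This is a minimal sketch that assumes /proc/modules lists loaded modules one per line with the module name in the first field; module_loaded is a hypothetical helper name, not part of any existing tool.

```shell
#!/bin/sh
# Return 0 if the named kernel module appears in a /proc/modules-style file,
# non-zero otherwise. The second argument is optional and mainly for testing.
module_loaded() {
    name="$1"
    modules_file="${2:-/proc/modules}"
    # The first whitespace-separated field of each line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Example use (requires root, left commented out on purpose):
#   if module_loaded irdma; then modprobe -r irdma; fi
```

Note that unloading irdma may fail if other modules or applications still depend on it.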
