DCGM exporter v4 always sets Hostname label to localhost #71

@Deezzir

Description

The DCGM exporter changed how it sets the Hostname label in v4. The Grafana dashboard exported by hardware-observer relies on this label. If its value is localhost (currently the case for metrics exported by the DCGM snap v4), the dashboard breaks and the graphs show no data.

Problem

The problem is that we always pass localhost as the host that dcgm-exporter uses to connect to the nv-hostengine service. Reference.

The upstream dcgm-exporter sets the label to the hostname provided through the -r argument. See this and this.

Because we always pass this argument when starting the exporter, the label is always set to localhost.

Solution 1

I propose adding a new boolean snap config option, dcgm-exporter-use-hostname. When it is set, the snap will use the output of the hostname command instead of localhost, i.e.:

if [ -n "$nv_hostengine_port" ]; then
    if [ "$dcgm_exporter_use_hostname" = "true" ]; then
        args+=("-r" "$(hostname):$nv_hostengine_port")
    else
        args+=("-r" "localhost:$nv_hostengine_port")
    fi
fi

The dcgm-exporter service will still be able to connect to the nv-hostengine service, because the hostname resolves to the local machine in this case, and the label will get the correct value, allowing the dashboard to work correctly.

HOWEVER, this will only work if /etc/hosts contains an entry mapping the hostname to 127.0.0.1:

...
127.0.0.1 {hostname}
...

On my local machine, I have this entry, but on the swob machine from Testflinger, there are only the following entries:

127.0.1.1 swob.maas swob
127.0.0.1 localhost
...

The exporter will fail to start on that machine until I manually add the entry.
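One way to avoid that startup failure would be to check how the hostname resolves before using it, and fall back to localhost otherwise. The following is only a sketch under assumptions: the helper names choose_hostengine_host and resolve_ipv4 are hypothetical, the port value 5555 is a placeholder for $nv_hostengine_port, and the requirement that the name resolve to exactly 127.0.0.1 is taken from the behaviour observed above rather than from the exporter's documentation.

```shell
#!/usr/bin/env bash

# Hypothetical helper: pick the host part of the -r argument.
#   $1 = candidate hostname
#   $2 = IPv4 address the candidate resolves to (may be empty)
# Prints the candidate only when it resolves to 127.0.0.1,
# otherwise prints "localhost" so the exporter can still start.
choose_hostengine_host() {
    local name="$1" addr="$2"
    if [ "$addr" = "127.0.0.1" ]; then
        printf '%s\n' "$name"
    else
        printf '%s\n' "localhost"
    fi
}

# Hypothetical helper: first IPv4 address for a name, resolved
# through NSS (which includes /etc/hosts). Prints nothing on failure.
resolve_ipv4() {
    getent ahostsv4 "$1" 2>/dev/null | awk 'NR == 1 { print $1 }'
}

nv_hostengine_port="5555"   # placeholder value for illustration
name="$(hostname 2>/dev/null)"
host="$(choose_hostengine_host "$name" "$(resolve_ipv4 "$name")")"
echo "would pass: -r ${host}:${nv_hostengine_port}"
```

With this fallback, a machine like swob (where the hostname maps to 127.0.1.1) would keep the localhost label, so the dashboard would still be broken there, but the exporter would start instead of failing.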

Solution 2

Use juju_instance to filter by host in the GPU dashboard. I am not completely sure how it works or whether Juju will set the correct labels; in my testing environments, I only see the label set to localhost:9400.
