Host-wide measures are double-counted in the GUI #7098

Open
@crusaderky

The Bokeh GUI computes several totals incorrectly whenever there's more than one worker per host. The problems are particularly glaring on LocalClusters.

  • In the Workers tab:
    • The cluster total for the columns net read, net write, disk read and disk write sums up the value for each worker. However, these are host-wide measures, so if two or more workers sit on the same host, the total will be double-counted.
    • The cluster total for the columns gpu_memory_used and gpu_utilization sums up the value for each worker. However, two workers may share the same GPU, depending on the CUDA_VISIBLE_DEVICES environment variable (see nvml.py). If the variable is not set, and in general on single-GPU hosts, all workers on the same host share the same GPU. Again, this leads to double-counting.
    • The cluster total of the column event_loop_interval is a sum over the workers. This makes no sense; it should be a mean.
  • The More... -> Workers Disk and More... -> Workers Network tabs show one bar per worker. This is misleading; there should be one bar per host.
  • The More... -> GPU Memory and More... -> GPU Utilization tabs show one bar per worker. This is misleading; there should be one bar per GPU.
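
To illustrate the totals described in the Workers tab bullets, here is a minimal sketch of what the aggregation could look like. It is not the dashboard's actual code: the `cluster_totals` function, the plain-dict worker records, and the `host` key are all hypothetical, and keeping the first sample seen per host is an assumption (workers on the same host may report slightly different readings). GPU columns are left out because deduplicating them needs a per-GPU identifier, which this sketch does not assume.

```python
from statistics import mean

# Columns that the issue describes as host-wide measures.
HOST_WIDE = ("net_read", "net_write", "disk_read", "disk_write")


def cluster_totals(workers):
    """Aggregate per-worker metrics without double-counting hosts.

    ``workers`` is a list of dicts (hypothetical layout) with a "host" key,
    the HOST_WIDE columns, and "event_loop_interval".
    """
    # Keep one record per host: the first sample seen (an assumption).
    per_host = {}
    for w in workers:
        per_host.setdefault(w["host"], w)

    # Host-wide measures are summed once per host, not once per worker.
    totals = {
        name: sum(w[name] for w in per_host.values()) for name in HOST_WIDE
    }

    # event_loop_interval is a per-worker measure; average it instead of summing.
    totals["event_loop_interval"] = (
        mean(w["event_loop_interval"] for w in workers) if workers else 0.0
    )
    return totals


if __name__ == "__main__":
    sample = [
        {"host": "10.0.0.1", "net_read": 100, "net_write": 50,
         "disk_read": 10, "disk_write": 5, "event_loop_interval": 0.02},
        {"host": "10.0.0.1", "net_read": 100, "net_write": 50,
         "disk_read": 10, "disk_write": 5, "event_loop_interval": 0.04},
        {"host": "10.0.0.2", "net_read": 30, "net_write": 20,
         "disk_read": 1, "disk_write": 2, "event_loop_interval": 0.03},
    ]
    print(cluster_totals(sample))
```

With the sample above, the two workers on 10.0.0.1 contribute their network and disk figures only once, so net_read totals 130 rather than 230. The same per-host grouping would presumably also drive the one-bar-per-host charts.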

Labels: good second issue (Clearly described, educational, but less trivial than "good first issue".)