Description
Please, what is the correct table and approach to generate a time series of CPU and GPU utilization (with respect to total cluster CPU and GPU availability) from the cluster-trace-gpu-v2020 dataset? Currently, I am joining the pai_machine_metric.csv and pai_machine_spec.csv tables and then calculating the 1-minute utilization over the joined dataframe, roughly as sketched below.
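The join itself looks roughly like this (a minimal sketch: machine_metric_df and machine_spec_df stand for the two tables already loaded with the column names from the trace documentation, and I am assuming the shared machine column is the join key):

import pandas as pd

# attach each machine's capacities (cap_cpu, cap_gpu) from pai_machine_spec
# to its per-worker measurement rows in pai_machine_metric
df = machine_metric_df.merge(machine_spec_df, on='machine', how='left')

With this joined dataframe df, I then compute the 1-minute utilization as: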
# generate timestamps at 1-minute resolution between the earliest start time and the latest end time
date_range = pd.date_range(df['start_time'].min(), df['end_time'].max(), freq='60s')

records = []
for d in date_range:
    # find all records in pai_machine_metric whose start/end interval contains the current timestamp d
    match_df = df[(df['start_time'] < d) & (df['end_time'] > d)].copy()
    # machine_cpu is a percentage of the machine's cap_cpu cores, so machine_cpu * cap_cpu / 100
    # gives the number of CPUs in use; dividing by the total available CPUs gives a value in [0, 1]
    cpu_utilization = (match_df['machine_cpu'] * match_df['cap_cpu'] / 100).sum() / match_df['cap_cpu'].sum()
    # machine_gpu / 100 is treated as the number of GPUs in use, divided by the total available GPUs
    gpu_utilization = (match_df['machine_gpu'] / 100).sum() / match_df['cap_gpu'].sum()
    records.append(dict(
        timestamp=d,
        cpu_utilization=cpu_utilization,
        gpu_utilization=gpu_utilization,
    ))

utilization_df = pd.DataFrame(records)
Is this a correct way to go about it, or should I be making use of a different table and/or approach?
Also, could you please clarify what the machine_load_1 variable in pai_machine_metric is reporting? Specifically, what is the load referring to?
Lastly, I am finding datapoints where cap_gpu is less than the machine_gpu/100 value, implying that more GPUs were utilized than are available on the machine. How should I interpret such datapoints?
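For example, the rows I am referring to can be picked out of the joined dataframe like this (a small snippet using the same df as above):

# rows where the implied number of busy GPUs exceeds the machine's GPU capacity
over_capacity = df[(df['machine_gpu'] / 100) > df['cap_gpu']]
print(len(over_capacity), 'of', len(df), 'rows have machine_gpu/100 > cap_gpu')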