It can be somewhat hard to determine when / why the scheduler decides to scale the cluster under adaptive mode. Ideally a dashboard page could shed some light here.
We currently have `/json/counts.json`, which provides `desired_workers`. I think that's it.
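For reference, here is a minimal sketch of reading that value today. It assumes the dashboard is reachable over HTTP at an address you supply (the `http://localhost:8787` in the usage note is only an example) and that the response is a JSON object with a `desired_workers` key; the helper names are made up for illustration.

```python
import json
from urllib.request import urlopen


def counts_url(dashboard_address: str) -> str:
    """Build the URL of the counts endpoint from a dashboard base address."""
    return dashboard_address.rstrip("/") + "/json/counts.json"


def desired_workers(payload: dict):
    """Return the desired_workers field, or None if the payload lacks it."""
    return payload.get("desired_workers")


def fetch_desired_workers(dashboard_address: str, timeout: float = 5.0):
    """Fetch the counts endpoint and extract desired_workers (hypothetical helper)."""
    with urlopen(counts_url(dashboard_address), timeout=timeout) as resp:
        return desired_workers(json.load(resp))
```

Something like `fetch_desired_workers("http://localhost:8787")` would return the current target, but it says nothing about why the scheduler chose it, which is the gap this issue is about.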
I think there are two main pieces of information to convey:
- Stock: The current state of things, including the current CPU load, the current CPU capacity, and the current desired CPU capacity. Likewise for memory.
- Flow: The history of decisions on when to scale the cluster up / down, ideally with information on why those decisions were made (the state at that time).
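To make the two pieces concrete, here is a rough sketch of the data a dashboard page might need. None of these classes exist in the scheduler; every name and field here is hypothetical and only mirrors the stock / flow split described above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class ResourceStock:
    """Load, capacity, and desired capacity for one resource (CPU or memory)."""
    load: float
    capacity: float
    desired: float


@dataclass(frozen=True)
class ClusterStock:
    """Stock: a snapshot of the current state of the cluster."""
    cpu: ResourceStock
    memory: ResourceStock


@dataclass(frozen=True)
class ScalingDecision:
    """Flow: one scaling decision, plus the state that led to it."""
    timestamp: float
    action: str  # "up" or "down"
    reason: str
    stock: ClusterStock


class DecisionLog:
    """Bounded history of scaling decisions, oldest entries dropped first."""

    def __init__(self, maxlen: int = 1000) -> None:
        self._events: Deque[ScalingDecision] = deque(maxlen=maxlen)

    def record(self, decision: ScalingDecision) -> None:
        self._events.append(decision)

    def recent(self, n: int = 10) -> List[ScalingDecision]:
        """Return up to the n most recent decisions, oldest first."""
        return list(self._events)[-n:]
```

Storing the `ClusterStock` snapshot inside each `ScalingDecision` is what would let a history view answer "why did it scale here?" without reconstructing past state.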
Here's a rough sketch for number 1.
cc @rsignell-usgs, @jsignell for adaptive things, and @jacobtomlinson for dashboard design things.